Architectures: 6 · Max H_path: 2.05 nats · Max Ω: 0.91 · Min Ω: 0.39

📊 Architecture Comparison

MLP · CNN · Transformer
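| Architecture | Depth | H_path (nats) | Ω Index | Regime |
|---|---|---|---|---|
| MLP (2L) | 2 | 0.31 | 0.91 | Reducible |
| MLP (6L) | 6 | 0.68 | 0.74 | Reducible |
| MLP (12L) | 12 | 1.14 | 0.61 | Reducible |
| CNN (8L) | 8 | 0.87 | 0.68 | Reducible |
| Transformer (12L) | 12 | 1.42 | 0.52 | Borderline |
| Transformer (24L) | 24 | 2.05 | 0.39 | Irreducible |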

Values are illustrative. Empirical calibration is required for specific architectures.
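Since the Observability Index is defined as Ω = 1 − H_irr / H_path, each row implies an irreducible entropy of H_irr = (1 − Ω) · H_path. The following is a minimal Python sketch of that arithmetic on the illustrative values; the `ROWS` list and the `implied_h_irr` helper are names introduced here for illustration only.

```python
# Illustrative values copied from the Architecture Comparison table:
# (architecture, H_path in nats, Omega index).
ROWS = [
    ("MLP (2L)", 0.31, 0.91),
    ("MLP (6L)", 0.68, 0.74),
    ("MLP (12L)", 1.14, 0.61),
    ("CNN (8L)", 0.87, 0.68),
    ("Transformer (12L)", 1.42, 0.52),
    ("Transformer (24L)", 2.05, 0.39),
]


def implied_h_irr(h_path: float, omega: float) -> float:
    """Invert Omega = 1 - H_irr / H_path to get H_irr in nats."""
    return (1.0 - omega) * h_path


for name, h_path, omega in ROWS:
    print(f"{name:18s} H_irr = {implied_h_irr(h_path, omega):.4f} nats")
```

For example, the Transformer (24L) row gives (1 − 0.39) · 2.05 ≈ 1.25 nats of irreducible entropy, against roughly 0.03 nats for MLP (2L).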

📐 Core Constructs

Formal Definitions

Def 01 · Local Path Entropy

H_path(l) = −Σ_k p_{l,k} log p_{l,k}
Interpretation: Shannon entropy at layer l
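As a minimal sketch of Def 01, assuming p_{l,k} is a discrete probability distribution over the index k at layer l, the local path entropy can be computed as below. The function name `local_path_entropy` is hypothetical, and natural logarithms are used so the result is in nats.

```python
import math
from typing import Sequence


def local_path_entropy(p: Sequence[float]) -> float:
    """Shannon entropy (nats) of one layer's distribution p_{l,k}."""
    if any(x < 0 for x in p) or not math.isclose(sum(p), 1.0, abs_tol=1e-9):
        raise ValueError("p must be a probability distribution")
    # Terms with p = 0 contribute 0 by the usual convention 0 * log 0 = 0.
    return -sum(x * math.log(x) for x in p if x > 0)


# A uniform distribution over two outcomes has entropy log 2 (about 0.693 nats).
print(local_path_entropy([0.5, 0.5]))
```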

Def 02 · Cumulative Path Entropy

H_path^(L) = Σ_{l=1}^{L} H(P_l)
Interpretation: Total entropy across L layers, with H(P_l) the local path entropy H_path(l) of Def 01
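The cumulative quantity is a plain sum of per-layer entropies. A short sketch, assuming each layer is described by a discrete distribution P_l; the three-layer example is hypothetical and not drawn from the comparison table.

```python
import math
from typing import Sequence


def cumulative_path_entropy(layers: Sequence[Sequence[float]]) -> float:
    """Sum of per-layer Shannon entropies (nats) over all L layers."""
    total = 0.0
    for p in layers:
        total += -sum(x * math.log(x) for x in p if x > 0)
    return total


# Hypothetical 3-layer network: uniform over 2, 4 and 2 outcomes.
layers = [[0.5, 0.5], [0.25] * 4, [0.5, 0.5]]
print(cumulative_path_entropy(layers))  # log 2 + log 4 + log 2 = 4 log 2
```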

Def 05 · Observability Index

Ω(N) = 1 − H_irr / H_path
Range: [0, 1] · 1 = fully observable

Corollary · Entropic Leakage

Δ(L) = H_path^(L) − I(x; h_L)
Interpretation: Path entropy left unexplained by the mutual information between the input x and the final-layer representation h_L

🔬 Reducibility Conditions

Def 03 & 04

Def 03 · Reducible Layer

Condition: I(h_l; M_l(y)) ≥ H_path(l) − δ*
δ* = ε · max H(P_l), ε ∈ (0, 1)
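The reducibility test is a threshold comparison, sketched below. The text does not spell out what the maximum in "max H(P_l)" is taken over, so the sketch takes it as an explicit parameter `h_max`; that parameter, the function names, and the example numbers are all assumptions made for illustration.

```python
def delta_star(epsilon: float, h_max: float) -> float:
    """Tolerance delta* = epsilon * h_max, with epsilon in (0, 1)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    return epsilon * h_max


def is_reducible(mi: float, h_path_l: float, epsilon: float, h_max: float) -> bool:
    """True when I(h_l; M_l(y)) >= H_path(l) - delta*."""
    return mi >= h_path_l - delta_star(epsilon, h_max)


# Hypothetical numbers in nats: the threshold is 0.68 - 0.1 * 1.0 = 0.58.
print(is_reducible(mi=0.60, h_path_l=0.68, epsilon=0.1, h_max=1.0))  # True
print(is_reducible(mi=0.50, h_path_l=0.68, epsilon=0.1, h_max=1.0))  # False
```

A smaller ε tightens the tolerance, so fewer layers qualify as reducible.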

Def 04 · Irreducible Path Entropy

H_irr^(L) = H_path^(L) − H_red^(L)
Interpretation: Entropy not recoverable
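Substituting this definition into the Observability Index gives an equivalent form. The derivation below assumes that the H_irr and H_path appearing in Def 05 are the cumulative quantities H_irr^(L) and H_path^(L), and that H_path^(L) > 0.

```latex
\Omega(N)
  = 1 - \frac{H_{\mathrm{irr}}^{(L)}}{H_{\mathrm{path}}^{(L)}}
  = 1 - \frac{H_{\mathrm{path}}^{(L)} - H_{\mathrm{red}}^{(L)}}{H_{\mathrm{path}}^{(L)}}
  = \frac{H_{\mathrm{red}}^{(L)}}{H_{\mathrm{path}}^{(L)}}
```

Under these assumptions Ω is the fraction of path entropy that is reducible: Ω = 1 when H_red^(L) = H_path^(L), and Ω = 0 when H_red^(L) = 0.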

📋 Estimation Protocol

Reproducibility
Step 1 · Activation recording: For dataset D = {x_i}, record the activation vectors {h_l(x_i)} at each layer l.
Step 2 · Entropy estimation: Apply a non-parametric entropy estimator (k-NN) to the empirical activation distribution to obtain H_path(l).
Step 3 · Mutual information estimation: Estimate I(h_l; y) with a binned or neural MI estimator to obtain H_red^(L).
Step 4 · Observability computation: Compute Ω = 1 − H_irr / H_path.
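The binned option in Step 3 can be sketched with a plug-in estimator. This is a simplified illustration, assuming a single scalar activation feature and discrete labels y; the function name, the equal-width binning, and the default of 8 bins are choices made here, not part of the protocol. How the per-layer estimates are combined into H_red^(L) is not specified in the protocol and is left out.

```python
import math
from collections import Counter
from typing import Sequence


def binned_mutual_information(x: Sequence[float], y: Sequence[int], bins: int = 8) -> float:
    """Plug-in estimate of I(X; Y) in nats after equal-width binning of x."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when x is constant
    # Map each value to a bin index, clamping the maximum into the last bin.
    xb = [min(int((v - lo) / width), bins - 1) for v in x]
    n = len(x)
    joint = Counter(zip(xb, y))
    px, py = Counter(xb), Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


# Hypothetical toy data: the activation separates the two labels perfectly.
print(binned_mutual_information([0.1, 0.2, 0.9, 0.8], [0, 0, 1, 1], bins=2))  # log 2
```

Plug-in estimates of this kind are sensitive to the bin count, which is one reason the protocol asks for consistent discretisation across runs.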

Reproducibility conditions: fixed weights, consistent discretisation, fixed estimator parameters (k=5), fixed dataset, fixed random seed (42).
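Step 2 with the fixed parameters above (k = 5, seed 42) can be sketched using the Kozachenko-Leonenko k-NN estimator, one common non-parametric choice; the protocol does not name a specific k-NN variant, so this selection is an assumption. The estimator returns a differential entropy for continuous activations, and synthetic Gaussian samples stand in for recorded activations here. The brute-force neighbour search is for clarity only.

```python
import math
import random
from typing import Sequence


def digamma_int(n: int) -> float:
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    euler_gamma = 0.5772156649015329
    return -euler_gamma + sum(1.0 / j for j in range(1, n))


def knn_entropy(samples: Sequence[Sequence[float]], k: int = 5) -> float:
    """Kozachenko-Leonenko k-NN estimate of differential entropy, in nats."""
    n, d = len(samples), len(samples[0])
    if n <= k:
        raise ValueError("need more samples than neighbours")
    # Log-volume of the d-dimensional Euclidean unit ball.
    log_unit_ball = (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)
    log_dist_sum = 0.0
    for i, x in enumerate(samples):
        dists = sorted(math.dist(x, y) for j, y in enumerate(samples) if j != i)
        # Distance to the k-th nearest neighbour; exact duplicates
        # (distance 0) would make the logarithm undefined.
        log_dist_sum += math.log(dists[k - 1])
    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * log_dist_sum / n


rng = random.Random(42)  # fixed seed, as in the reproducibility conditions
activations = [[rng.gauss(0.0, 1.0)] for _ in range(500)]  # stand-in for h_l(x_i)
estimate = knn_entropy(activations, k=5)
reference = 0.5 * math.log(2.0 * math.pi * math.e)  # entropy of a unit Gaussian
print(f"k-NN estimate: {estimate:.3f} nats, Gaussian reference: {reference:.3f} nats")
```

For real activation matrices a tree-based neighbour search would replace the quadratic loop; the estimator itself is unchanged.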

📊 H_path vs Ω Visualization

Comparative Chart
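MLP (2L): H_path = 0.31 nats, Ω = 0.91
MLP (6L): H_path = 0.68 nats, Ω = 0.74
MLP (12L): H_path = 1.14 nats, Ω = 0.61
CNN (8L): H_path = 0.87 nats, Ω = 0.68
Transformer (12L): H_path = 1.42 nats, Ω = 0.52
Transformer (24L): H_path = 2.05 nats, Ω = 0.39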

Red bars indicate low-observability (irreducible) regimes.