📖 Overview
"The algorithms that govern network behaviour are designed at the architectural level, while the precise dynamics of inference remain inaccessible even to their designers." — Geoffrey Hinton (2023)
ENTRO-PATH introduces Irreducible Path Entropy (H_path), a quantitative metric for characterising the structural accumulation and reducibility of entropy across computational decision trajectories in artificial neural networks. The construct is grounded exclusively in information-theoretic and systems-level analysis, without recourse to semantic, cognitive, or anthropomorphic assumptions.
📐 Mathematical Formalism
Let N denote a neural network comprising L successive transformation layers {f₁, f₂, ..., f_L}. For an input x ∈ X, the network computes a sequence of internal representations {h₀, h₁, ..., h_L}, where h₀ = x and h_l = f_l(h_{l-1}) for l = 1, ..., L.
At each layer l, the conditional distribution over possible activation states is denoted P_l(h_l | h_{l-1}). The Shannon entropy of this distribution is:
H(P_l) = -Σ_k P_l(h_l^{(k)}) log P_l(h_l^{(k)})
🏗️ Core Constructs
Def 01 · Local Path Entropy
H_path(l) = -Σ_k p_{l,k} log p_{l,k}
Shannon entropy of the conditional activation distribution at layer l, where p_{l,k} = P_l(h_l^{(k)}) is the probability of the k-th activation state.
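Def 01 can be illustrated with a plug-in (frequency-count) estimate over discretised activation states. This is a minimal sketch, not the package's estimator: the function name `plugin_entropy` and the toy states are assumptions made for illustration, and the natural logarithm is used so the result is in nats.

```python
import math
from collections import Counter


def plugin_entropy(states):
    """Plug-in Shannon entropy (nats) of a sequence of discrete states.

    Each p_{l,k} is estimated as the relative frequency of state k.
    """
    counts = Counter(states)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


# Hypothetical discretised activation states recorded at one layer.
layer_states = ["a", "a", "b", "b"]
print(plugin_entropy(layer_states))  # two equally likely states give log(2) nats
```

The plug-in estimate is biased downwards for small samples, which is one reason a k-NN estimator may be preferred for continuous activations.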
Def 02 · Cumulative Path Entropy
H_path^(L) = Σ_{l=1}^{L} H(P_l)
Total informational uncertainty accumulated across all L layers, over the full computational trajectory from input to output.
Def 04 · Irreducible Path Entropy
H_irr^(L) = H_path^(L) - Σ_{reducible} I(h_l; M_l(y))
The component of path entropy that cannot be recovered from external observations. The sum runs over the layers l classified as reducible under Def 03.
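Defs 02 and 04 reduce to two sums once per-layer quantities are available. The sketch below assumes the per-layer entropies, the per-layer mutual information values I(h_l; M_l(y)), and the reducibility flags (Def 03) have already been obtained; the function names and all numbers are hypothetical.

```python
def cumulative_path_entropy_sketch(layer_entropies):
    """Def 02: sum of the per-layer entropies H(P_l)."""
    return sum(layer_entropies)


def irreducible_path_entropy_sketch(layer_entropies, layer_mi, reducible):
    """Def 04: cumulative path entropy minus the mutual information
    contributed by the layers flagged as reducible."""
    h_path = cumulative_path_entropy_sketch(layer_entropies)
    h_red = sum(mi for mi, is_red in zip(layer_mi, reducible) if is_red)
    return h_path - h_red


# Hypothetical values for a three-layer network (nats).
entropies = [0.5, 0.4, 0.3]
mi_values = [0.47, 0.10, 0.28]
flags = [True, False, True]  # layers 1 and 3 are treated as reducible

print(cumulative_path_entropy_sketch(entropies))                     # about 1.20
print(irreducible_path_entropy_sketch(entropies, mi_values, flags))  # about 0.45
```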
Corollary · Entropic Leakage
Δ(L) = H_path^(L) - I(x; h_L)
Large values indicate the network introduces substantial uncertainty across its computation that is not explained by the retained input information — a signature of high irreducibility.
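As a small worked illustration of the corollary, the leakage is a single subtraction once H_path^(L) and I(x; h_L) have been estimated. The function name and the numbers below are hypothetical and are not taken from any measured network.

```python
def entropic_leakage_sketch(h_path_total, mi_input_final):
    """Δ(L) = H_path^(L) - I(x; h_L), both arguments in nats."""
    return h_path_total - mi_input_final


# Hypothetical estimates: 1.2 nats accumulated along the path,
# 0.5 nats of input information retained in the final representation.
print(entropic_leakage_sketch(1.2, 0.5))  # about 0.7 nats of leakage
```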
🔬 Reducibility Conditions
Def 03 · Reducible Path Entropy
The path entropy at layer l is said to be reducible if there exists a measurement operator M_l acting on the observable output space Y such that:
I(h_l; M_l(y)) ≥ H_path(l) − δ*
where I(· ; ·) denotes mutual information, y ∈ Y is the observable output, and δ* > 0 is a tolerance parameter governing the acceptable residual uncertainty.
Reducibility Threshold
δ* = ε · max H(P_l), ε ∈ (0,1)
The choice of ε governs the sensitivity of the reducibility classification. Smaller values impose stricter observability requirements; larger values permit greater residual uncertainty before a layer is classified as irreducible.
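The effect of ε on the classification can be sketched as follows. The helper names and the per-layer values are assumptions for illustration; the rule itself follows Def 03 together with the threshold δ* = ε · max H(P_l).

```python
def reducibility_threshold(layer_entropies, epsilon):
    """δ* = ε · max_l H(P_l), with ε restricted to the open interval (0, 1)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie in the open interval (0, 1)")
    return epsilon * max(layer_entropies)


def classify_layers(layer_entropies, layer_mi, epsilon):
    """True where I(h_l; M_l(y)) >= H_path(l) - δ*, i.e. the layer is reducible."""
    delta = reducibility_threshold(layer_entropies, epsilon)
    return [mi >= h - delta for h, mi in zip(layer_entropies, layer_mi)]


# Hypothetical per-layer entropies and mutual information values (nats).
entropies = [0.5, 0.4, 0.3]
mi_values = [0.47, 0.10, 0.28]

print(classify_layers(entropies, mi_values, epsilon=0.10))  # [True, False, True]
print(classify_layers(entropies, mi_values, epsilon=0.01))  # [False, False, False]
```

With ε = 0.10 the tolerance is 0.05 nats and two layers pass; tightening ε to 0.01 shrinks the tolerance to 0.005 nats and no layer passes, which matches the stricter observability requirement described above.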
🎯 Observability Index
Def 05 · Observability Index
Ω(N) = 1 - H_irr^(L) / H_path^(L)
Properties
- Monotonicity: Ω is non-increasing as network depth increases under constant layer-wise entropy.
- Architecture dependence: Ω varies with activation function, normalisation strategy, and connectivity structure.
- Boundedness: 0 ≤ Ω ≤ 1 for any finite-depth network with non-zero path entropy.
- Experimental accessibility: Ω can be estimated from activation recordings via probing classifiers or mutual information estimators.
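The boundedness property can be checked directly from Def 05. The following is a sketch with a hypothetical function name, separate from the package's `observability_index()`; it rejects inputs that fall outside 0 ≤ H_irr ≤ H_path, which is the condition under which 0 ≤ Ω ≤ 1 holds.

```python
def observability_index_sketch(h_path_total, h_irr_total):
    """Ω = 1 - H_irr^(L) / H_path^(L), defined for non-zero path entropy."""
    if h_path_total <= 0:
        raise ValueError("path entropy must be positive")
    if not 0 <= h_irr_total <= h_path_total:
        raise ValueError("irreducible entropy must lie in [0, H_path]")
    return 1.0 - h_irr_total / h_path_total


print(observability_index_sketch(1.2, 0.0))   # 1.0: everything is recoverable
print(observability_index_sketch(1.2, 1.2))   # 0.0: nothing is recoverable
print(observability_index_sketch(1.2, 0.45))  # about 0.625
```

Estimators that operate on continuous activations can return values that violate these bounds; whether to raise, clip, or report such cases is a design choice left open here.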
📦 Installation
    # From PyPI (stable)
    pip install entropath

    # From source
    git clone https://github.com/gitdeeper12/ENTRO-PATH.git
    cd ENTRO-PATH && pip install -e .

    # Quick test
    python -c "from entropath import entropy_estimator, observability_index; print('ENTRO-PATH ready')"
🔧 API Reference
entropy_estimator.py
    from entropath import knn_entropy, mutual_information, observability_index

    # Estimate entropy from activation samples
    H_path = knn_entropy(activations, k=5)

    # Estimate mutual information for reducibility
    I_hl_y = mutual_information(h_l, y, bins=32)

    # Compute observability index (H_irr must be computed beforehand, see Def 04)
    omega = observability_index(H_path, H_irr)
Core Functions
| Function | Description | Parameters |
|---|---|---|
| knn_entropy() | k-NN Kozachenko-Leonenko estimator | X, k=5 |
| binned_mi() | Binned mutual information estimator | X, Y, bins=32 |
| cumulative_path_entropy() | Sum H(P_l) across layers | layer_activations |
| observability_index() | Ω = 1 − H_irr / H_path | H_path, H_irr |
| entropic_leakage() | Δ(L) = H_path − I(x; h_L) | H_path, mi_x_hL |
📋 Estimation Protocol
The quantities H_path^(L) and Ω can be estimated experimentally through the following protocol:
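1. For a dataset D = {x_i}_{i=1}^{n}, record the activation vectors {h_l(x_i)} at each layer l.
2. Apply a non-parametric entropy estimator (k-NN) to the empirical distribution at each layer → H_path(l), then sum over layers → H_path^(L).
3. Estimate I(h_l; y) via a binned or neural MI estimator and sum over the reducible layers → H_red^(L).
4. Compute H_irr^(L) = H_path^(L) − H_red^(L), then Ω = 1 − H_irr^(L) / H_path^(L).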
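The protocol can be walked through end to end on toy data. The sketch below is standard-library Python under several loud assumptions: a hypothetical two-layer scalar "network" stands in for a real model, a histogram plug-in estimator replaces the k-NN estimator named in the protocol, the output y is used directly in place of M_l(y), and ε = 0.5 is an arbitrary choice. It does not call the `entropath` package.

```python
import math
import random
from collections import Counter


def entropy(symbols):
    """Plug-in Shannon entropy (nats) of a list of discrete symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log(c / n) for c in Counter(symbols).values())


def mutual_information(a, b):
    """Plug-in I(A; B) = H(A) + H(B) - H(A, B) for paired discrete samples."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def discretise(values, lo, hi, bins):
    """Map real values to bin indices with one fixed, uniform binning scheme."""
    width = (hi - lo) / bins
    return [min(bins - 1, max(0, int((v - lo) / width))) for v in values]


# Step 1: fixed dataset and recorded activations (toy network, hypothetical).
random.seed(42)
dataset = [random.uniform(-1.0, 1.0) for _ in range(2000)]
h1 = [math.tanh(2.0 * x) for x in dataset]   # layer 1 activations
h2 = [max(0.0, v) for v in h1]               # layer 2 activations (ReLU)
y = [1 if v > 0.5 else 0 for v in h2]        # observable output

# Step 2: per-layer entropy from the binned activations, same bins for all layers.
layers = [discretise(h, -1.0, 1.0, 8) for h in (h1, h2)]
layer_entropies = [entropy(states) for states in layers]
h_path = sum(layer_entropies)

# Step 3: per-layer mutual information with the output, summed over reducible layers.
layer_mi = [mutual_information(states, y) for states in layers]
epsilon = 0.5  # arbitrary tolerance factor for this illustration
delta = epsilon * max(layer_entropies)
reducible = [mi >= h - delta for h, mi in zip(layer_entropies, layer_mi)]
h_red = sum(mi for mi, is_red in zip(layer_mi, reducible) if is_red)

# Step 4: irreducible entropy and observability index.
h_irr = h_path - h_red
omega = 1.0 - h_irr / h_path
print(f"H_path={h_path:.3f}  H_red={h_red:.3f}  H_irr={h_irr:.3f}  Omega={omega:.3f}")
```

Because the plug-in mutual information never exceeds the plug-in entropy of either argument, H_red cannot exceed H_path in this sketch, so Ω stays within [0, 1] up to floating-point rounding.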
🔄 Reproducibility Conditions
For the framework to be experimentally reproducible, the following conditions must be satisfied:
| Condition | Requirement |
|---|---|
| Fixed weights | No stochastic inference-time modifications (e.g., MC dropout) |
| Consistent discretisation | Activation binning scheme fixed across layers |
| Fixed estimator parameters | Bandwidth / neighbourhood k held constant (k=5) |
| Fixed dataset | D = {x_i} held constant across comparative measurements |
| Fixed random seed | seed=42 for deterministic behaviour |
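
One way to keep these conditions fixed across comparative runs is to collect them in an immutable settings object. This is a sketch only: the class, its field names, and the `bin_range` value are hypothetical; the defaults for the seed, k, and bin count are taken from the values quoted in this README.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MeasurementConfig:
    """Settings held constant across comparative measurements."""
    seed: int = 42                   # fixed random seed
    k_neighbours: int = 5            # k for the k-NN entropy estimator
    bins: int = 32                   # bin count shared by all layers
    bin_range: tuple = (-1.0, 1.0)   # hypothetical shared activation range


def draw_evaluation_subset(population, size, config):
    """Draw a reproducible subset using a local, seeded generator."""
    rng = random.Random(config.seed)  # independent of the global RNG state
    return rng.sample(population, size)


config = MeasurementConfig()
population = list(range(1000))
first = draw_evaluation_subset(population, 10, config)
second = draw_evaluation_subset(population, 10, config)
print(first == second)  # True: the same seed yields the same subset
```

Freezing the dataclass means an accidental change to `k_neighbours` or `bins` part-way through a comparison raises an error instead of silently altering the measurement.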
🏛️ Architecture Findings
| Architecture | Depth | H_path (nats) | Ω Index | Regime |
|---|---|---|---|---|
| MLP (2L) | 2 | 0.31 | 0.91 | Reducible |
| MLP (6L) | 6 | 0.68 | 0.74 | Reducible |
| MLP (12L) | 12 | 1.14 | 0.61 | Reducible |
| CNN (8L) | 8 | 0.87 | 0.68 | Reducible |
| Transformer (12L) | 12 | 1.42 | 0.52 | Borderline |
| Transformer (24L) | 24 | 2.05 | 0.39 | Irreducible |
Values are illustrative; empirical calibration is required for specific architectures.
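
Rearranging Def 05 gives H_irr^(L) = (1 − Ω) · H_path^(L), so the irreducible entropy implied by each row can be recovered from the two tabulated columns. The sketch below only performs that arithmetic on the illustrative figures above; the function name is hypothetical and no new measurements are introduced.

```python
def implied_irreducible_entropy(h_path, omega):
    """Invert Ω = 1 - H_irr / H_path to obtain H_irr (nats)."""
    return (1.0 - omega) * h_path


# (architecture, H_path in nats, Ω) copied from the illustrative table.
rows = [
    ("MLP (2L)", 0.31, 0.91),
    ("MLP (6L)", 0.68, 0.74),
    ("MLP (12L)", 1.14, 0.61),
    ("CNN (8L)", 0.87, 0.68),
    ("Transformer (12L)", 1.42, 0.52),
    ("Transformer (24L)", 2.05, 0.39),
]

for name, h_path, omega in rows:
    h_irr = implied_irreducible_entropy(h_path, omega)
    print(f"{name:<18} H_irr = {h_irr:.2f} nats")
```

For the 24-layer Transformer row, for example, this gives (1 − 0.39) × 2.05 ≈ 1.25 nats.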
📝 Citation
"The framework is grounded exclusively in information-theoretic and systems-level analysis, without recourse to semantic, cognitive, or anthropomorphic assumptions." — ENTRO-PATH v1.0.0