# Training Inventory

Generated: 2026-04-25. Source: `results/` directory scan.

## Summary

| Dimension | Count |
|---|---|
| **Total training runs** | **125** |
| **Unique models** (method × arch × seed × setting) | **125** |
| **Total training epochs** | **9,410** |
| **Estimated total GPU time** | **~10.5 hours** |
| **Training methods** | **10** (6 primary + 4 frozen-baseline variants) |
| **Architectures** | **5 base + 5 depth variants** |
| **Experiment settings** | **~20 distinct** |
| **Method × architecture combinations** | **31** |

## By Method (10 types)

| Method | Runs | Description |
|---|---|---|
| DFA | 45 | Direct Feedback Alignment (Nøkland 2016). Output error is projected directly to each layer via a fixed random d×C matrix. |
| **FA** (NEW) | **21** | Vanilla Feedback Alignment (Lillicrap 2016). Sequential backward credit propagation with fixed random d×d matrices. |
| EP | 13 | Equilibrium Propagation. Contrastive energy-based, nudged-vs-free phase. Internal control (trustworthy). |
| BP | 12 | Backpropagation. End-to-end exact gradients. Gold standard control. |
| State Bridge (SB) | 11 | Diagnostic probe: learns to predict h_L from (h_l, t_l, s), derives credit via target-prop-style gradient. |
| Credit Bridge (CB) | 11 | Diagnostic probe: learns a value network V(h_l, t_l, s), derives credit via synthetic-gradient-style input gradient. |
| BP-shallow | 3 | BP training only embedding + head, blocks frozen. Frozen baseline variant. |
| BP-frozen | 3 | BP training only embedding + head, blocks frozen at random init. |
| DFA-shallow | 3 | DFA training only embedding + head, blocks frozen. Architecture-matched baseline for diagnostic (d). |
| DFA-frozen | 3 | DFA training only embedding + head, blocks frozen at random init. |

## By Architecture (5 base + depth variants)

| Architecture | Runs | Role in paper |
|---|---|---|
| **ResMLP d=256 L=4 + terminal LN** | 89 | Primary audit architecture. 4-block pre-LayerNorm residual MLP, CIFAR-10. |
| **SmallCNN BatchNorm** | 15 | Cross-architecture control. No terminal LN → Mode 1(b) does not fire. |
| **ResMLP d=256 L=4 no-terminal-LN** | 3 | Same-backbone causal control for terminal LN. Preserves Mode 1(a), eliminates 1(b). |
| **ResMLP d=256 L=4 no-residual-skip** | 3 | Falsification control. Shows residual skip is NOT necessary for Mode 1. |
| **ViT-Mini d=128 L=4 (cls token + LN)** | 3 | Cross-architecture: transformer with terminal LN. Both Mode 1 diagnostics fire by epochs 2-3. |
| **ResMLP d=512 L=2** | 2 | Depth sweep (DFA + FA). |
| **ResMLP d=512 L=4** | 4 | Depth sweep (DFA 3-seed + FA). |
| **ResMLP d=512 L=6** | 2 | Depth sweep (DFA + FA). |
| **ResMLP d=512 L=8** | 2 | Depth sweep (DFA + FA). |
| **ResMLP d=512 L=12** | 2 | Depth sweep (DFA + FA). |
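
The per-method and per-architecture run counts can be cross-checked against the 125-run total in the Summary. The snippet below is a minimal sanity check, with the counts copied by hand from the two tables; the dictionary names are illustrative and not part of any project code.

```python
# Run counts copied from the "By Method" and "By Architecture" tables.
runs_by_method = {
    "DFA": 45, "FA": 21, "EP": 13, "BP": 12, "SB": 11, "CB": 11,
    "BP-shallow": 3, "BP-frozen": 3, "DFA-shallow": 3, "DFA-frozen": 3,
}
runs_by_architecture = {
    "ResMLP d=256 L=4 + terminal LN": 89,
    "SmallCNN BatchNorm": 15,
    "ResMLP d=256 L=4 no-terminal-LN": 3,
    "ResMLP d=256 L=4 no-residual-skip": 3,
    "ViT-Mini d=128 L=4": 3,
    "ResMLP d=512 L=2": 2,
    "ResMLP d=512 L=4": 4,
    "ResMLP d=512 L=6": 2,
    "ResMLP d=512 L=8": 2,
    "ResMLP d=512 L=12": 2,
}
TOTAL_RUNS = 125  # from the Summary table

method_total = sum(runs_by_method.values())
arch_total = sum(runs_by_architecture.values())

# Both partitions should account for every run exactly once.
print(f"by method: {method_total}, by architecture: {arch_total}, expected: {TOTAL_RUNS}")
```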

## By Experiment Setting (~20 types)

| Setting | Runs | Epochs | Description |
|---|---|---|---|
| Main audit | 18 | 100 | 6 methods × 3 seeds on d=256 ResMLP. Table 1. |
| Cross-architecture | 18 | 100 | DFA/BP/EP/SB/CB on CNN + DFA on ViT-Mini. §5.2. |
| Depth sweep | 12 | 100 | DFA + FA at d=512, L∈{2,4,6,8,12}. Appendix H. |
| Frozen blocks baseline | 12 | 100 | BP/DFA × shallow/frozen × 3 seeds. Diagnostic (d). |
| Penalty rescue λ=1e-2 | ~12 | 30 | DFA/FA/SB/CB + penalty. §5.1 + §4.2. |
| Penalty sweep λ=1e-4 | 6 | 30 | DFA + FA at weaker penalty. §5.1 λ sweep. |
| Penalty λ=1e-1 | 1 | 30 | DFA at stronger penalty. |
| BP+penalty control | 3 | 30 | Capacity-cost control. §5.2. |
| Matched 30ep no-pen | 9 | 30 | BP/DFA/FA without penalty at 30 epochs. §5.2 matched controls. |
| SB/CB vanilla baseline | 2 | 30 | Appendix J. |
| No-terminal-LN ablation | 3 | 100 | DFA on same backbone minus terminal LN. §3.2. |
| No-residual ablation | 3 | 100 | DFA on same backbone minus skip path. §3.1 falsification. |
| Random-target ablation | 4 | 100 | DFA/FA/SB+CB/EP with i.i.d. random labels. §3.1. |
| Early checkpoints | 6 | 5 | DFA + FA at 5 epochs for Mode 2 identification. §4.1. |
| Snapshot evolution | 3 | 100 | DFA per-epoch trajectory (h_L, g_L logged). §3.1. |
| DFA+pen 100ep | 3 | 100 | Long-horizon penalty rescue. |
| EP baseline (6 seeds) | 6 | 100 | Extended EP replication. |

## FA vs DFA Comparison (the key new finding)

Both methods use the same local loss ⟨f_l, a_l⟩, the same architecture, and the same optimizer. The only difference is how the credit signal a_l is computed.

| | FA (21 runs) | DFA (45 runs) |
|---|---|---|
| Credit signal | Sequential backward: a_l = B_l @ a_{l+1} (d×d random) | Direct projection: a_l = B_l^T @ e_T (d×C random) |
| Test acc (d=256, 100ep, 3-seed) | **0.401 ± 0.009** | 0.306 ± 0.008 |
| vs frozen baseline (0.349) | **+5.2 pp** (above) | -4.3 pp (below) |
| Deep cos | **+0.33** (genuine) | ~0 (degenerate) |
| ‖h_L‖ | ~10⁵ | ~5×10⁸ |
| ‖g_L‖ | ~10⁻⁶ (meaningful) | ~10⁻¹⁰ (floor) |
| Mode 1(b) fires? | **NO** | YES |
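
The two credit-signal rules in the table can be illustrated with a toy sketch in pure Python. This is not the project's training code: the helper names, the Gaussian scaling of the feedback matrices, and the use of a separate d×C matrix `B_top` to seed the FA chain at the top layer are all assumptions made for illustration. Each DFA matrix is stored here as d×C and applied directly to the output error, so the orientation may differ from the `B_l^T` convention in the table.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def random_matrix(rows, cols, rng):
    """Fixed random feedback matrix; the 1/sqrt(cols) scale is illustrative."""
    return [[rng.gauss(0.0, 1.0 / cols ** 0.5) for _ in range(cols)]
            for _ in range(rows)]

def dfa_credit(e, B_direct):
    """DFA: every layer receives the output error through its own d x C matrix."""
    return [matvec(B_l, e) for B_l in B_direct]

def fa_credit(e, B_top, B_seq):
    """FA: the top layer gets B_top @ e; each lower layer gets B_l @ a_{l+1}."""
    a = matvec(B_top, e)
    credits = [a]
    for B_l in reversed(B_seq):   # walk from the top block down to the first
        a = matvec(B_l, a)
        credits.append(a)
    credits.reverse()             # credits[0] now belongs to the first block
    return credits

rng = random.Random(0)
d, C, L = 8, 10, 4  # toy width; the audited ResMLP uses d=256, L=4
e = [rng.gauss(0.0, 1.0) for _ in range(C)]  # stand-in for the output error

B_direct = [random_matrix(d, C, rng) for _ in range(L)]   # DFA: L matrices, d x C
B_top = random_matrix(d, C, rng)                          # FA: top-layer seed
B_seq = [random_matrix(d, d, rng) for _ in range(L - 1)]  # FA: d x d chain

a_dfa = dfa_credit(e, B_direct)
a_fa = fa_credit(e, B_top, B_seq)
print(len(a_dfa), len(a_fa), len(a_fa[0]))
```

In the sketch, every DFA signal depends only on `e`, while each FA signal below the top layer depends on the signal of the layer above it.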

## Cross-Method Functional Triangulation (under penalty rescue)

| Metric | SB+pen | CB+pen | DFA+pen | Ordering |
|---|---|---|---|---|
| Accuracy | **0.453** | 0.360 | 0.360 | SB ≫ CB ≈ DFA |
| Nudging (loss Δ) | **-1.93e-3** | -4.26e-4 | -4.98e-5 | SB ≫ CB ≈ DFA |
| Training loss decrease | **-0.447** | -0.121 | -0.095 | SB ≫ CB ≈ DFA |
| Deep cos | +0.322 | **+0.679** | +0.151 | CB > SB > DFA |

The three functional metrics (accuracy, nudging, training loss decrease) agree on the ordering SB ≫ CB ≈ DFA. Deep cosine is the only metric that disagrees, ranking CB > SB > DFA.
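
The disagreement can be reproduced directly from the table values. The sketch below picks the best method per metric, under the assumption that a more negative value is better for the two loss-based metrics (consistent with the bolded SB entries); the metric keys are illustrative names.

```python
# Values copied from the triangulation table (penalty-rescue setting).
metrics = {
    "accuracy": {"SB": 0.453, "CB": 0.360, "DFA": 0.360},
    "nudging_loss_delta": {"SB": -1.93e-3, "CB": -4.26e-4, "DFA": -4.98e-5},
    "train_loss_decrease": {"SB": -0.447, "CB": -0.121, "DFA": -0.095},
    "deep_cos": {"SB": 0.322, "CB": 0.679, "DFA": 0.151},
}

# Assumption: for the loss-based metrics, a more negative value is better.
LOWER_IS_BETTER = {"nudging_loss_delta", "train_loss_decrease"}

def best_method(name, values):
    """Return the method with the best value for the given metric."""
    pick = min if name in LOWER_IS_BETTER else max
    return pick(values, key=values.get)

winners = {name: best_method(name, vals) for name, vals in metrics.items()}
for name, winner in winners.items():
    print(f"{name:>20}: best = {winner}")
```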

## File Locations

All training results are in `results/`. Key directories:
- `results/protocol_audit/` — main 5-method audit + temporal evolution + CNN + d=512
- `results/fa_main_audit/` — FA 100ep 3-seed
- `results/fa_depth_scan_d512_L{2,4,6,8,12}/` — FA depth sweep
- `results/fa_penalty_30ep/`, `results/fa_penalty_lam1e-4_30ep/` — FA + penalty
- `results/fa_random_targets_s42/`, `results/fa_early_ckpts/` — FA ablations
- `results/dfa_pen_short/` — DFA penalty sweep (lam=0.01, 0.0001, 0.1)
- `results/round38_*_penalty_30ep*/` — SB+pen, CB+pen 3-seed
- `results/round41_dfa_penalty_30ep*/` — DFA+pen 3-seed diagnostics
- `results/bp_with_penalty/`, `results/bp_no_penalty_30ep/` — BP ± penalty
- `results/dfa_no_penalty_30ep/`, `results/fa_no_penalty_30ep/` — matched 30ep
- `results/snapshot_no_outln_v1/` — no-terminal-LN ablation
- `results/h2_no_residual_full_s*/` — no-residual ablation
- `results/snapshot_vit_v1/` — ViT-Mini
- `results/ep_baseline/` — EP 6-seed
- `results/resmlp_frozen_blocks_s*.log` — frozen baseline
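
Several entries above use shell-style patterns (`{2,4,6,8,12}` brace groups and `*` wildcards). A minimal sketch for resolving them from Python follows; `expand_braces` and `find_result_paths` are hypothetical helpers, not part of the repository, and the function only returns paths that actually exist on disk.

```python
import re
from pathlib import Path

def expand_braces(pattern):
    """Expand {a,b,c} groups recursively, the way a shell would."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        expanded.extend(expand_braces(head + option + tail))
    return expanded

def find_result_paths(root, pattern):
    """Return sorted existing paths under `root` matching a brace/glob pattern."""
    root = Path(root)
    found = set()
    for concrete in expand_braces(pattern):
        found.update(root.glob(concrete))  # glob resolves the * wildcards
    return sorted(found)

if __name__ == "__main__":
    # Patterns are given relative to the repository root (assumed to be ".").
    for pattern in ("results/fa_depth_scan_d512_L{2,4,6,8,12}",
                    "results/h2_no_residual_full_s*"):
        for path in find_result_paths(".", pattern):
            print(path)
```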

Reproducibility notebook: `reproduce_all.ipynb`
All ± values use ddof=1 (sample std with Bessel correction).
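
For reference, the ddof=1 convention corresponds to `statistics.stdev` in the Python standard library (the n-1 denominator), as opposed to `statistics.pstdev` (the n denominator). The sketch below uses hypothetical per-seed accuracies for illustration only; they are not values from any run in this inventory.

```python
import statistics

def mean_pm_std(values):
    """Return (mean, sample std) using the n-1 (Bessel) denominator."""
    return statistics.mean(values), statistics.stdev(values)

# Hypothetical per-seed test accuracies, for illustration only.
seed_accuracies = [0.40, 0.41, 0.39]

mean, std = mean_pm_std(seed_accuracies)
population_std = statistics.pstdev(seed_accuracies)  # ddof=0, for comparison

print(f"sample (ddof=1):     {mean:.3f} ± {std:.3f}")
print(f"population (ddof=0): {mean:.3f} ± {population_std:.3f}")
```

With only three seeds the two conventions differ noticeably, so it matters which one a reported ± refers to.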