| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-06-14 04:06:32 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-06-14 04:06:32 -0500 |
| commit | aa73718eb6427d7da3b9cb416275802d90c4b2ed (patch) | |
| tree | b68b0a664fb650744ef934a1c22abd740a7b62a6 /TRAINING_INVENTORY.md | |
| parent | 827c658fa9a750f3c6ebdb87703762f10f69f6ff (diff) | |
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Diffstat (limited to 'TRAINING_INVENTORY.md')
| -rw-r--r-- | TRAINING_INVENTORY.md | 115 |
1 files changed, 115 insertions, 0 deletions
diff --git a/TRAINING_INVENTORY.md b/TRAINING_INVENTORY.md
new file mode 100644
index 0000000..bdcacfa
--- /dev/null
+++ b/TRAINING_INVENTORY.md
@@ -0,0 +1,115 @@
+# Training Inventory
+
+Generated: 2026-04-25. Source: `results/` directory scan.
+
+## Summary
+
+| Dimension | Count |
+|---|---|
+| **Total training runs** | **125** |
+| **Unique models** (method × arch × seed × setting) | **125** |
+| **Total training epochs** | **9,410** |
+| **Estimated total GPU time** | **~10.5 hours** |
+| **Training methods** | **10** (6 primary + 4 frozen-baseline variants) |
+| **Architectures** | **5 base + 5 depth variants** |
+| **Experiment settings** | **~20 distinct** |
+| **Method × architecture combinations** | **31** |
+
+## By Method (10 types)
+
+| Method | Runs | Description |
+|---|---|---|
+| DFA | 45 | Direct Feedback Alignment (Nokland 2016). Output error projected directly to each layer via fixed random d×C matrix. |
+| **FA** (NEW) | **21** | Vanilla Feedback Alignment (Lillicrap 2016). Sequential backward credit propagation with fixed random d×d matrices. |
+| EP | 13 | Equilibrium Propagation. Contrastive energy-based, nudged-vs-free phase. Internal control (trustworthy). |
+| BP | 12 | Backpropagation. End-to-end exact gradients. Gold standard control. |
+| State Bridge (SB) | 11 | Diagnostic probe: learns to predict h_L from (h_l, t_l, s), derives credit via target-prop-style gradient. |
+| Credit Bridge (CB) | 11 | Diagnostic probe: learns a value network V(h_l, t_l, s), derives credit via synthetic-gradient-style input gradient. |
+| BP-shallow | 3 | BP training only embedding + head, blocks frozen. Frozen baseline variant. |
+| BP-frozen | 3 | BP training only embedding + head, blocks frozen at random init. |
+| DFA-shallow | 3 | DFA training only embedding + head, blocks frozen. Architecture-matched baseline for diagnostic (d). |
+| DFA-frozen | 3 | DFA training only embedding + head, blocks frozen at random init. |
+
+## By Architecture (5 base + depth variants)
+
+| Architecture | Runs | Role in paper |
+|---|---|---|
+| **ResMLP d=256 L=4 + terminal LN** | 89 | Primary audit architecture. 4-block pre-LayerNorm residual MLP, CIFAR-10. |
+| **SmallCNN BatchNorm** | 15 | Cross-architecture control. No terminal LN → Mode 1(b) does not fire. |
+| **ResMLP d=256 L=4 no-terminal-LN** | 3 | Same-backbone causal control for terminal LN. Preserves Mode 1(a), eliminates 1(b). |
+| **ResMLP d=256 L=4 no-residual-skip** | 3 | Falsification control. Shows residual skip is NOT necessary for Mode 1. |
+| **ViT-Mini d=128 L=4 (cls token + LN)** | 3 | Cross-architecture: transformer with terminal LN. Both Mode 1 diagnostics fire by epochs 2-3. |
+| **ResMLP d=512 L=2** | 2 | Depth sweep (DFA + FA). |
+| **ResMLP d=512 L=4** | 4 | Depth sweep (DFA 3-seed + FA). |
+| **ResMLP d=512 L=6** | 2 | Depth sweep (DFA + FA). |
+| **ResMLP d=512 L=8** | 2 | Depth sweep (DFA + FA). |
+| **ResMLP d=512 L=12** | 2 | Depth sweep (DFA + FA). |
+
+## By Experiment Setting (~20 types)
+
+| Setting | Runs | Epochs | Description |
+|---|---|---|---|
+| Main audit | 18 | 100 | 6 methods × 3 seeds on d=256 ResMLP. Table 1. |
+| Cross-architecture | 18 | 100 | DFA/BP/EP/SB/CB on CNN + DFA on ViT-Mini. §5.2. |
+| Depth sweep | 12 | 100 | DFA + FA at d=512, L∈{2,4,6,8,12}. Appendix H. |
+| Frozen blocks baseline | 12 | 100 | BP/DFA × shallow/frozen × 3 seeds. Diagnostic (d). |
+| Penalty rescue λ=1e-2 | ~12 | 30 | DFA/FA/SB/CB + penalty. §5.1 + §4.2. |
+| Penalty sweep λ=1e-4 | 6 | 30 | DFA + FA at weaker penalty. §5.1 λ sweep. |
+| Penalty λ=1e-1 | 1 | 30 | DFA at stronger penalty. |
+| BP+penalty control | 3 | 30 | Capacity-cost control. §5.2. |
+| Matched 30ep no-pen | 9 | 30 | BP/DFA/FA without penalty at 30 epochs. §5.2 matched controls. |
+| SB/CB vanilla baseline | 2 | 30 | Appendix J. |
+| No-terminal-LN ablation | 3 | 100 | DFA on same backbone minus terminal LN. §3.2. |
+| No-residual ablation | 3 | 100 | DFA on same backbone minus skip path. §3.1 falsification. |
+| Random-target ablation | 4 | 100 | DFA/FA/SB+CB/EP with i.i.d. random labels. §3.1. |
+| Early checkpoints | 6 | 5 | DFA + FA at 5 epochs for Mode 2 identification. §4.1. |
+| Snapshot evolution | 3 | 100 | DFA per-epoch trajectory (h_L, g_L logged). §3.1. |
+| DFA+pen 100ep | 3 | 100 | Long-horizon penalty rescue. |
+| EP baseline (6 seeds) | 6 | 100 | Extended EP replication. |
+
+## FA vs DFA Comparison (the key new finding)
+
+Same local loss ⟨f_l, a_l⟩, same architecture, same optimizer. Only difference: how a_l is computed.
+
+| | FA (21 runs) | DFA (45 runs) |
+|---|---|---|
+| Credit signal | Sequential backward: a_l = B_l @ a_{l+1} (d×d random) | Direct projection: a_l = B_l^T @ e_T (d×C random) |
+| Test acc (d=256, 100ep, 3-seed) | **0.401 ± 0.009** | 0.306 ± 0.008 |
+| vs frozen baseline 0.349 | **+5.2 pp above** | -4.3 pp below |
+| Deep cos | **+0.33** (genuine) | ~0 (degenerate) |
+| ‖h_L‖ | ~10⁵ | ~5×10⁸ |
+| ‖g_L‖ | ~10⁻⁶ (meaningful) | ~10⁻¹⁰ (floor) |
+| Mode 1(b) fires? | **NO** | YES |
+
+## Cross-Method Functional Triangulation (under penalty rescue)
+
+| Metric | SB+pen | CB+pen | DFA+pen | Ordering |
+|---|---|---|---|---|
+| Accuracy | **0.453** | 0.360 | 0.360 | SB ≫ CB ≈ DFA |
+| Nudging (loss Δ) | **-1.93e-3** | -4.26e-4 | -4.98e-5 | SB ≫ CB ≈ DFA |
+| Training loss decrease | **-0.447** | -0.121 | -0.095 | SB ≫ CB ≈ DFA |
+| Deep cos | +0.322 | **+0.679** | +0.151 | CB > SB > DFA |
+
+3 functional metrics agree on SB ≫ CB ≈ DFA. Deep cosine is the only one that disagrees.
+
+## File Locations
+
+All training results are in `results/`. Key directories:
+- `results/protocol_audit/` — main 5-method audit + temporal evolution + CNN + d=512
+- `results/fa_main_audit/` — FA 100ep 3-seed
+- `results/fa_depth_scan_d512_L{2,4,6,8,12}/` — FA depth sweep
+- `results/fa_penalty_30ep/`, `results/fa_penalty_lam1e-4_30ep/` — FA + penalty
+- `results/fa_random_targets_s42/`, `results/fa_early_ckpts/` — FA ablations
+- `results/dfa_pen_short/` — DFA penalty sweep (lam=0.01, 0.0001, 0.1)
+- `results/round38_*_penalty_30ep*/` — SB+pen, CB+pen 3-seed
+- `results/round41_dfa_penalty_30ep*/` — DFA+pen 3-seed diagnostics
+- `results/bp_with_penalty/`, `results/bp_no_penalty_30ep/` — BP ± penalty
+- `results/dfa_no_penalty_30ep/`, `results/fa_no_penalty_30ep/` — matched 30ep
+- `results/snapshot_no_outln_v1/` — no-terminal-LN ablation
+- `results/h2_no_residual_full_s*/` — no-residual ablation
+- `results/snapshot_vit_v1/` — ViT-Mini
+- `results/ep_baseline/` — EP 6-seed
+- `results/resmlp_frozen_blocks_s*.log` — frozen baseline
+
+Reproducibility notebook: `reproduce_all.ipynb`
+All ± values use ddof=1 (sample std with Bessel correction).
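
The inventory states that every ± value is a sample standard deviation with Bessel's correction (ddof=1). As a rough illustration of that convention, not the repository's actual aggregation code, the sketch below computes a mean and ddof=1 standard deviation over per-seed scores. The helper name `mean_pm_std` and the three accuracy values are hypothetical and are not taken from the results above.

```python
import statistics


def mean_pm_std(values):
    """Return (mean, sample std); statistics.stdev divides by n - 1 (ddof=1)."""
    if len(values) < 2:
        raise ValueError("sample std with ddof=1 needs at least two values")
    return statistics.fmean(values), statistics.stdev(values)


# Hypothetical per-seed test accuracies, for illustration only.
accs = [0.30, 0.31, 0.32]
mean, std = mean_pm_std(accs)
print(f"{mean:.3f} ± {std:.3f}")
```

With only three seeds the choice of divisor matters: the population formula (ddof=0) would report a spread smaller by a factor of sqrt(2/3), so stating the convention makes the ± values comparable across tables.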
