author: YurenHao0426 <Blackhao0426@gmail.com>, 2026-04-23 11:18:59 -0500
committer: YurenHao0426 <Blackhao0426@gmail.com>, 2026-04-23 11:18:59 -0500
commit: 5937af903fdcb473cb3dd39cd3d0a86c1dbe0a05
tree: 5b233aefa3c41fb511128d5b08355144aa2e3e0c /protocol
parent: 05c935ab03ee0bdb8597d19466192dfb92ee889d
Update NOTE.md + EVIDENCE_SUMMARY.md with FA results (2026-04-23)
NOTE.md: added comprehensive current-status section at the top with the full 6-method audit table (BP/FA/EP/DFA/CB/SB), FA vs DFA key comparison, depth sweep, penalty rescue comparison, cross-method functional triangulation, and open items. Old Phase 10A content kept below as historical reference.

EVIDENCE_SUMMARY.md: added "Vanilla FA vs DFA" section with the paper-changing finding (FA 0.401 ± 0.009 vs DFA 0.306 ± 0.008, FA has genuine deep cos +0.33, no Mode 1(b) collapse) and the d=512 depth sweep table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diffstat (limited to 'protocol')
-rw-r--r-- protocol/EVIDENCE_SUMMARY.md | 26
1 files changed, 26 insertions, 0 deletions
diff --git a/protocol/EVIDENCE_SUMMARY.md b/protocol/EVIDENCE_SUMMARY.md
index bab8764..d6d3945 100644
--- a/protocol/EVIDENCE_SUMMARY.md
+++ b/protocol/EVIDENCE_SUMMARY.md
@@ -129,6 +129,32 @@ on deep layers. **Caught by direct per-layer cosine measurement.**
| BP | 0.585 ± 0.001 | **0.532 ± 0.006** | −5.3 pp (capacity loss) |
| DFA | 0.301 ± 0.005 | 0.360 ± 0.001 | +5.9 pp (rescue) |
+### Vanilla FA vs DFA (2026-04-22, commit 88ff85c)
+
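+**PAPER-CHANGING FINDING.** FA (Lillicrap et al. 2016: sequential backward pass through d×d random matrices) behaves very differently from DFA on the same architecture: it finishes above the frozen 0.349 accuracy, keeps a positive deep cosine, and does not fire Mode 1(b).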
+
+| Metric | FA | DFA |
+|---|---|---|
+| Test acc (100ep, 3-seed, d=256) | **0.401 ± 0.009** | 0.306 ± 0.008 |
+| Δ vs frozen (0.349) | **+5.2 pp** | −4.3 pp |
+| Deep cos | **+0.33** | ~0 (degenerate) |
+| ‖h_L‖ | ~10⁵ | ~5×10⁸ |
+| ‖g_L‖ | ~10⁻⁶ (meaningful) | ~10⁻¹⁰ (floor) |
+| Mode 1(b) fires? | **NO** | YES |
+
+Same local loss ⟨f_l, a_l⟩, same architecture, same optimizer. Only difference: how a_l is computed (sequential vs direct projection). FA's sequential backward preserves credit quality → prevents catastrophic Mode 1 growth. **Strongest empirical support for Mode 2 → Mode 1 causal hypothesis.**
+
+Source: `results/fa_main_audit/results_cifar10.json`
+
+FA depth sweep (d=512, 100ep, s42):
+| L | FA acc | FA deep cos | DFA acc | DFA deep cos |
+|---|---|---|---|---|
+| 2 | 0.350 | +0.96 | — | — |
+| 4 | 0.424 | +0.29 | — | — |
+| 6 | 0.401 | +0.16 | — | — |
+| 8 | 0.409 | +0.11 | 0.306 | ~0 |
+| 12 | 0.404 | +0.09 | 0.309 | ~0 |
+
### Round 20 phrasing for the gap
**Lower bound on non-capacity gap**: matched penalty controls show that only part of DFA's deficit is attributable to the representational/optimization cost of the penalty itself; a substantial residual remains and is consistent with poorer credit assignment.
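
The one difference named in the "Vanilla FA vs DFA" section, how a_l is computed (sequential vs direct projection), can be sketched in a few lines. This is a minimal, dependency-free illustration and not the repository's implementation: the tanh layers, the square d×d shapes (including DFA's projection), the Gaussian initialisation, and every function name are assumptions made for the example. It also assumes that "deep cos" refers to the per-layer cosine between a method's feedback signal and the true backprop signal. At random initialisation both FA and DFA should sit near chance on hidden layers, so any alignment like the reported one would have to emerge during training.

```python
import math
import random


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def rand_matrix(rows, cols, rng, scale):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def forward(weights, x):
    """Return [h_0, ..., h_L] with h_{l+1} = tanh(weights[l] @ h_l)."""
    hs = [x]
    for W in weights:
        hs.append([math.tanh(z) for z in matvec(W, hs[-1])])
    return hs


def sequential_signals(feedback, hs, err):
    """Pass err backward one layer at a time (BP or FA).

    feedback[l] carries the signal from layer l+1 down to layer l:
    weights[l]^T gives exact backprop, a fixed random matrix gives FA.
    Returns [a_1, ..., a_L], where a_l pairs with activation h_l.
    """
    L = len(hs) - 1
    signals = [None] * (L + 1)  # index 0 is unused
    signals[L] = list(err)
    for l in range(L - 1, 0, -1):
        # apply the tanh derivative of layer l+1 before projecting down
        delta = [a * (1.0 - h * h) for a, h in zip(signals[l + 1], hs[l + 1])]
        signals[l] = matvec(feedback[l], delta)
    return signals[1:]


def direct_signals(direct, err, L):
    """DFA: each hidden layer gets its own direct projection of err."""
    return [matvec(direct[l], err) for l in range(1, L)] + [list(err)]


def compare(d=8, L=4, seed=0):
    """Return (layer, cos(FA, BP), cos(DFA, BP)) for layers 1..L."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(d)
    weights = [rand_matrix(d, d, rng, scale) for _ in range(L)]
    fa_B = [rand_matrix(d, d, rng, scale) for _ in range(L)]   # hypothetical FA matrices
    dfa_B = [rand_matrix(d, d, rng, scale) for _ in range(L)]  # hypothetical DFA matrices
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    err = [rng.gauss(0.0, 1.0) for _ in range(d)]  # stand-in for the output error
    hs = forward(weights, x)
    bp = sequential_signals([transpose(W) for W in weights], hs, err)
    fa = sequential_signals(fa_B, hs, err)
    dfa = direct_signals(dfa_B, err, L)
    return [(l + 1, cosine(fa[l], bp[l]), cosine(dfa[l], bp[l])) for l in range(L)]


if __name__ == "__main__":
    for layer, fa_cos, dfa_cos in compare():
        print(f"layer {layer}: FA cos {fa_cos:+.3f}  DFA cos {dfa_cos:+.3f}")
```

In this sketch FA's signal at layer l depends on every feedback matrix and activation derivative above it, while DFA's depends only on the output error and that layer's own matrix. The top layer receives the raw error in all three variants, so its cosine is 1 by construction and only the hidden layers are informative.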