From 5937af903fdcb473cb3dd39cd3d0a86c1dbe0a05 Mon Sep 17 00:00:00 2001
From: YurenHao0426 <Blackhao0426@gmail.com>
Date: Thu, 23 Apr 2026 11:18:59 -0500
Subject: Update NOTE.md + EVIDENCE_SUMMARY.md with FA results (2026-04-23)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

NOTE.md: added comprehensive current-status section at the top with
the full 6-method audit table (BP/FA/EP/DFA/CB/SB), FA vs DFA key
comparison, depth sweep, penalty rescue comparison, cross-method
functional triangulation, and open items. Old Phase 10A content kept
below as historical reference.

EVIDENCE_SUMMARY.md: added "Vanilla FA vs DFA" section with the
paper-changing finding (FA 0.401 ± 0.009 vs DFA 0.306 ± 0.008,
FA has genuine deep cos +0.33, no Mode 1(b) collapse) and the
d=512 depth sweep table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 protocol/EVIDENCE_SUMMARY.md | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

(limited to 'protocol/EVIDENCE_SUMMARY.md')
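
Note on the FA vs DFA section added below: the stated difference, "how a_l is computed (sequential vs direct projection)", can be sketched as follows. This is a minimal NumPy illustration under assumptions, not code from this repository: the function names, layer sizes, matrix scaling, and the omission of activation derivatives are all hypothetical simplifications.

```python
import numpy as np

def fa_signals(e, B_out, B_hidden):
    """Feedback alignment: carry the output error backward layer by layer.

    e        : (n_out,) output error
    B_out    : (d, n_out) fixed random matrix, output -> last hidden layer
    B_hidden : list of (d, d) fixed random matrices, ordered from the last
               hidden layer down to the first
    Returns the signals a_l ordered from the first hidden layer to the last.
    Activation derivatives are omitted (assumption) to keep the sketch linear.
    """
    a = B_out @ e
    signals = [a]
    for B in B_hidden:
        a = B @ a  # each layer's signal depends on the layer above it
        signals.append(a)
    return signals[::-1]

def dfa_signals(e, B_direct):
    """Direct feedback alignment: every layer gets its own projection of e."""
    return [B @ e for B in B_direct]

# Hypothetical toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
d, n_out, n_hidden = 8, 3, 4
e = rng.standard_normal(n_out)
B_out = rng.standard_normal((d, n_out)) / np.sqrt(n_out)
B_hidden = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_hidden - 1)]
B_direct = [rng.standard_normal((d, n_out)) / np.sqrt(n_out) for _ in range(n_hidden)]

fa = fa_signals(e, B_out, B_hidden)
dfa = dfa_signals(e, B_direct)
```

In this sketch the FA signal for a layer is built from the signal of the layer above it, while each DFA signal is a single independent projection of the output error.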

diff --git a/protocol/EVIDENCE_SUMMARY.md b/protocol/EVIDENCE_SUMMARY.md
index bab8764..d6d3945 100644
--- a/protocol/EVIDENCE_SUMMARY.md
+++ b/protocol/EVIDENCE_SUMMARY.md
@@ -129,6 +129,32 @@ on deep layers. **Caught by direct per-layer cosine measurement.**
 | BP | 0.585 ± 0.001 | **0.532 ± 0.006** | −5.3 pp (capacity loss) |
 | DFA | 0.301 ± 0.005 | 0.360 ± 0.001 | +5.9 pp (rescue) |
 
+### Vanilla FA vs DFA (2026-04-22, commit 88ff85c)
+
+**PAPER-CHANGING FINDING.** FA (Lillicrap et al. 2016: sequential backward pass through d×d random matrices) behaves very differently from DFA on the same architecture.
+
+| | FA | DFA |
+|---|---|---|
+| Test acc (100 epochs, 3 seeds, d=256) | **0.401 ± 0.009** | 0.306 ± 0.008 |
+| vs frozen (0.349) | **+5.2 pp** (above) | −4.3 pp (below) |
+| Deep cos | **+0.33** | ~0 (degenerate) |
+| ‖h_L‖ | ~10⁵ | ~5×10⁸ |
+| ‖g_L‖ | ~10⁻⁶ (meaningful) | ~10⁻¹⁰ (floor) |
+| Mode 1(b) fires? | **NO** | YES |
+
+Same local loss ⟨f_l, a_l⟩, same architecture, same optimizer. The only difference is how a_l is computed (sequential backward pass vs direct projection). FA's sequential backward pass preserves credit quality, which prevents the catastrophic Mode 1 growth seen under DFA. **Strongest empirical support for the Mode 2 → Mode 1 causal hypothesis.**
+
+Source: `results/fa_main_audit/results_cifar10.json`
+
+FA depth sweep (d=512, 100 epochs, seed 42):
+| L | FA acc | FA deep cos | DFA acc | DFA deep cos |
+|---|---|---|---|---|
+| 2 | 0.350 | +0.96 | — | — |
+| 4 | 0.424 | +0.29 | — | — |
+| 6 | 0.401 | +0.16 | — | — |
+| 8 | 0.409 | +0.11 | 0.306 | ~0 |
+| 12 | 0.404 | +0.09 | 0.309 | ~0 |
+
 ### Round 20 phrasing for the gap
 
 **Lower bound on non-capacity gap**: matched penalty controls show that only part of DFA's deficit is attributable to the representational/optimization cost of the penalty itself; a substantial residual remains and is consistent with poorer credit assignment.
-- 
cgit v1.2.3