From 5937af903fdcb473cb3dd39cd3d0a86c1dbe0a05 Mon Sep 17 00:00:00 2001
From: YurenHao0426
Date: Thu, 23 Apr 2026 11:18:59 -0500
Subject: Update NOTE.md + EVIDENCE_SUMMARY.md with FA results (2026-04-23)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

NOTE.md: added comprehensive current-status section at the top with the full 6-method audit table (BP/FA/EP/DFA/CB/SB), FA vs DFA key comparison, depth sweep, penalty rescue comparison, cross-method functional triangulation, and open items. Old Phase 10A content kept below as historical reference.

EVIDENCE_SUMMARY.md: added "Vanilla FA vs DFA" section with the paper-changing finding (FA 0.401 ± 0.009 vs DFA 0.306 ± 0.008, FA has genuine deep cos +0.33, no Mode 1(b) collapse) and the d=512 depth sweep table.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 protocol/EVIDENCE_SUMMARY.md | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

(limited to 'protocol/EVIDENCE_SUMMARY.md')

diff --git a/protocol/EVIDENCE_SUMMARY.md b/protocol/EVIDENCE_SUMMARY.md
index bab8764..d6d3945 100644
--- a/protocol/EVIDENCE_SUMMARY.md
+++ b/protocol/EVIDENCE_SUMMARY.md
@@ -129,6 +129,32 @@ on deep layers. **Caught by direct per-layer cosine measurement.**
 | BP | 0.585 ± 0.001 | **0.532 ± 0.006** | −5.3 pp (capacity loss) |
 | DFA | 0.301 ± 0.005 | 0.360 ± 0.001 | +5.9 pp (rescue) |
 
+### Vanilla FA vs DFA (2026-04-22, commit 88ff85c)
+
+**PAPER-CHANGING FINDING.** FA (Lillicrap 2016 sequential backward with d×d random matrices) is dramatically different from DFA on the same architecture.
+
+| | FA | DFA |
+|---|---|---|
+| Test acc (100ep, 3-seed, d=256) | **0.401 ± 0.009** | 0.306 ± 0.008 |
+| vs frozen 0.349 | **+5.2 pp above** | -4.3 pp below |
+| Deep cos | **+0.33** | ~0 (degenerate) |
+| ‖h_L‖ | ~10⁵ | ~5×10⁸ |
+| ‖g_L‖ | ~10⁻⁶ (meaningful) | ~10⁻¹⁰ (floor) |
+| Mode 1(b) fires? | **NO** | YES |
+
+Same local loss ⟨f_l, a_l⟩, same architecture, same optimizer. Only difference: how a_l is computed (sequential vs direct projection). FA's sequential backward preserves credit quality → prevents catastrophic Mode 1 growth. **Strongest empirical support for Mode 2 → Mode 1 causal hypothesis.**
+
+Source: `results/fa_main_audit/results_cifar10.json`
+
+FA depth sweep (d=512, 100ep, s42):
+| L | FA acc | FA deep cos | DFA acc | DFA deep cos |
+|---|---|---|---|---|
+| 2 | 0.350 | +0.96 | — | — |
+| 4 | 0.424 | +0.29 | — | — |
+| 6 | 0.401 | +0.16 | — | — |
+| 8 | 0.409 | +0.11 | 0.306 | ~0 |
+| 12 | 0.404 | +0.09 | 0.309 | ~0 |
+
 ### Round 20 phrasing for the gap
 
 **Lower bound on non-capacity gap**: matched penalty controls show that only part of DFA's deficit is attributable to the representational/optimization cost of the penalty itself; a substantial residual remains and is consistent with poorer credit assignment.
--
cgit v1.2.3
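
Reviewer note: the added section says FA and DFA differ only in how a_l is computed (sequential vs direct projection). The sketch below illustrates that distinction under stated assumptions; it is not the repository's implementation. The function names, the 1/sqrt(fan-in) Gaussian initialisation, the `dphi` activation-derivative factors, and the toy sizes are all hypothetical choices made for illustration. The only details taken from the patch are that FA passes the signal down through d×d random matrices while DFA projects the output error directly.

```python
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def rand_matrix(rows, cols, rng):
    """Fixed random feedback matrix; the 1/sqrt(cols) scale is an assumption."""
    s = cols ** -0.5
    return [[rng.gauss(0.0, s) for _ in range(cols)] for _ in range(rows)]

def fa_signals(e, B_out, B_hidden, dphi):
    """Sequential FA: project the output error into the top hidden layer,
    then pass it down one layer at a time through fixed d x d matrices."""
    L = len(dphi)
    a = [None] * L
    a[L - 1] = [g * x for g, x in zip(dphi[L - 1], matvec(B_out, e))]
    for l in range(L - 2, -1, -1):
        # B_hidden[l] maps the signal of layer l + 1 to layer l.
        a[l] = [g * x for g, x in zip(dphi[l], matvec(B_hidden[l], a[l + 1]))]
    return a

def dfa_signals(e, B_direct, dphi):
    """DFA: every layer receives the output error through its own d x c matrix."""
    return [[g * x for g, x in zip(dphi[l], matvec(B_direct[l], e))]
            for l in range(len(dphi))]

# Toy sizes (hypothetical): 4 hidden layers, width 8, 10 output classes.
rng = random.Random(0)
L, d, c = 4, 8, 10
e = [rng.gauss(0.0, 1.0) for _ in range(c)]   # output error vector
dphi = [[1.0] * d for _ in range(L)]          # identity activation derivative
B_out = rand_matrix(d, c, rng)
B_hidden = [rand_matrix(d, d, rng) for _ in range(L - 1)]
B_direct = [rand_matrix(d, c, rng) for _ in range(L)]

a_fa = fa_signals(e, B_out, B_hidden, dphi)
a_dfa = dfa_signals(e, B_direct, dphi)
```

In this sketch the two schemes coincide at the top hidden layer when they share the output projection, and diverge only below it: each FA signal depends on the signal of the layer above, whereas each DFA signal depends on the output error alone.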