| field | value | date |
|---|---|---|
| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-23 11:18:59 -0500 |
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-23 11:18:59 -0500 |
| commit | 5937af903fdcb473cb3dd39cd3d0a86c1dbe0a05 | |
| tree | 5b233aefa3c41fb511128d5b08355144aa2e3e0c /protocol | |
| parent | 05c935ab03ee0bdb8597d19466192dfb92ee889d | |
Update NOTE.md + EVIDENCE_SUMMARY.md with FA results (2026-04-23)

NOTE.md: added comprehensive current-status section at the top with
the full 6-method audit table (BP/FA/EP/DFA/CB/SB), FA vs DFA key
comparison, depth sweep, penalty rescue comparison, cross-method
functional triangulation, and open items. Old Phase 10A content kept
below as historical reference.

EVIDENCE_SUMMARY.md: added "Vanilla FA vs DFA" section with the
paper-changing finding (FA 0.401 ± 0.009 vs DFA 0.306 ± 0.008,
FA has genuine deep cos +0.33, no Mode 1(b) collapse) and the
d=512 depth sweep table.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Diffstat (limited to 'protocol')
| -rw-r--r-- | protocol/EVIDENCE_SUMMARY.md | 26 |
1 files changed, 26 insertions, 0 deletions
diff --git a/protocol/EVIDENCE_SUMMARY.md b/protocol/EVIDENCE_SUMMARY.md
index bab8764..d6d3945 100644
--- a/protocol/EVIDENCE_SUMMARY.md
+++ b/protocol/EVIDENCE_SUMMARY.md
@@ -129,6 +129,32 @@ on deep layers. **Caught by direct per-layer cosine measurement.**
 | BP | 0.585 ± 0.001 | **0.532 ± 0.006** | −5.3 pp (capacity loss) |
 | DFA | 0.301 ± 0.005 | 0.360 ± 0.001 | +5.9 pp (rescue) |
 
+### Vanilla FA vs DFA (2026-04-22, commit 88ff85c)
+
+**PAPER-CHANGING FINDING.** FA (Lillicrap 2016 sequential backward with d×d random matrices) is dramatically different from DFA on the same architecture.
+
+| | FA | DFA |
+|---|---|---|
+| Test acc (100ep, 3-seed, d=256) | **0.401 ± 0.009** | 0.306 ± 0.008 |
+| vs frozen 0.349 | **+5.2 pp above** | -4.3 pp below |
+| Deep cos | **+0.33** | ~0 (degenerate) |
+| ‖h_L‖ | ~10⁵ | ~5×10⁸ |
+| ‖g_L‖ | ~10⁻⁶ (meaningful) | ~10⁻¹⁰ (floor) |
+| Mode 1(b) fires? | **NO** | YES |
+
+Same local loss ⟨f_l, a_l⟩, same architecture, same optimizer. Only difference: how a_l is computed (sequential vs direct projection). FA's sequential backward preserves credit quality → prevents catastrophic Mode 1 growth. **Strongest empirical support for Mode 2 → Mode 1 causal hypothesis.**
+
+Source: `results/fa_main_audit/results_cifar10.json`
+
+FA depth sweep (d=512, 100ep, s42):
+| L | FA acc | FA deep cos | DFA acc | DFA deep cos |
+|---|---|---|---|---|
+| 2 | 0.350 | +0.96 | — | — |
+| 4 | 0.424 | +0.29 | — | — |
+| 6 | 0.401 | +0.16 | — | — |
+| 8 | 0.409 | +0.11 | 0.306 | ~0 |
+| 12 | 0.404 | +0.09 | 0.309 | ~0 |
+
 ### Round 20 phrasing for the gap
 
 **Lower bound on non-capacity gap**: matched penalty controls show that only part of DFA's deficit is attributable to the representational/optimization cost of the penalty itself; a substantial residual remains and is consistent with poorer credit assignment.
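
The added section states that FA and DFA differ only in how a_l is computed (sequential vs direct projection). The following is a minimal toy sketch of that distinction, not the repository's implementation. It assumes a small tanh network with square layers, a random output error, and the textbook forms of the two rules: FA passes the signal down layer by layer through fixed random matrices, while DFA projects the output error straight to each layer. All names (`feedback_signals`, `W`, `B`, `D`) are hypothetical, and the cosine against the backprop signal is only an assumed stand-in for the "deep cos" column, whose exact definition is not given here.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def rand_matrix(rng, rows, cols):
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def feedback_signals(depth=4, d=8, seed=0):
    """Return per-layer signals (bp, fa, dfa) for one random toy input."""
    rng = random.Random(seed)
    W = [rand_matrix(rng, d, d) for _ in range(depth)]  # forward weights
    B = [rand_matrix(rng, d, d) for _ in range(depth)]  # FA: fixed, layer to layer
    D = [rand_matrix(rng, d, d) for _ in range(depth)]  # DFA: fixed, output to layer

    # Forward pass, keeping tanh'(z_l) = 1 - h_l^2 for every layer.
    h = [rng.gauss(0.0, 1.0) for _ in range(d)]
    derivs = []
    for l in range(depth):
        h = [math.tanh(z) for z in matvec(W[l], h)]
        derivs.append([1.0 - y * y for y in h])

    e = [rng.gauss(0.0, 1.0) for _ in range(d)]  # toy output error (assumption)

    top = depth - 1
    bp, fa, dfa = [None] * depth, [None] * depth, [None] * depth
    # All three rules share the same signal at the output layer.
    bp[top] = fa[top] = dfa[top] = [x * g for x, g in zip(e, derivs[top])]
    for l in range(top - 1, -1, -1):
        bp_up = matvec(transpose(W[l + 1]), bp[l + 1])  # true transpose
        fa_up = matvec(B[l + 1], fa[l + 1])             # sequential, random B
        dfa_up = matvec(D[l], e)                        # direct from output error
        bp[l] = [x * g for x, g in zip(bp_up, derivs[l])]
        fa[l] = [x * g for x, g in zip(fa_up, derivs[l])]
        dfa[l] = [x * g for x, g in zip(dfa_up, derivs[l])]
    return bp, fa, dfa


if __name__ == "__main__":
    bp, fa, dfa = feedback_signals()
    for l in range(len(bp)):
        print(l, round(cosine(fa[l], bp[l]), 3), round(cosine(dfa[l], bp[l]), 3))
```

With untrained random weights the printed cosines carry no meaning on their own; the alignment figures reported in the tables come from trained networks. The sketch only shows where the two rules diverge structurally: FA's signal at layer l depends on the signal at layer l+1, whereas DFA's depends only on the output error.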
