1 files changed, 26 insertions, 0 deletions
diff --git a/protocol/EVIDENCE_SUMMARY.md b/protocol/EVIDENCE_SUMMARY.md
index bab8764..d6d3945 100644
--- a/protocol/EVIDENCE_SUMMARY.md
+++ b/protocol/EVIDENCE_SUMMARY.md
@@ -129,6 +129,32 @@ on deep layers. **Caught by direct per-layer cosine measurement.**
 | BP | 0.585 ± 0.001 | **0.532 ± 0.006** | −5.3 pp (capacity loss) |
 | DFA | 0.301 ± 0.005 | 0.360 ± 0.001 | +5.9 pp (rescue) |
 
+### Vanilla FA vs DFA (2026-04-22, commit 88ff85c)
+
+**PAPER-CHANGING FINDING.** FA (Lillicrap et al. 2016: sequential backward pass with d×d random matrices) behaves very differently from DFA on the same architecture: +9.5 pp test accuracy and non-degenerate deep-layer alignment.
+
+| | FA | DFA |
+|---|---|---|
+| Test acc (100ep, 3-seed, d=256) | **0.401 ± 0.009** | 0.306 ± 0.008 |
+| vs frozen (0.349) | **+5.2 pp** | −4.3 pp |
+| Deep cos | **+0.33** | ~0 (degenerate) |
+| ‖h_L‖ | ~10⁵ | ~5×10⁸ |
+| ‖g_L‖ | ~10⁻⁶ (meaningful) | ~10⁻¹⁰ (floor) |
+| Mode 1(b) fires? | **NO** | YES |
+
+Same local loss ⟨f_l, a_l⟩, same architecture, same optimizer. The only difference is how a_l is computed: sequential backward pass (FA) vs direct projection (DFA). FA's sequential backward pass preserves credit quality, which prevents catastrophic Mode 1 growth. **Strongest empirical support for the Mode 2 → Mode 1 causal hypothesis.**
+
+Source: `results/fa_main_audit/results_cifar10.json`
+
+FA depth sweep (d=512, 100 epochs, seed 42):
+| L | FA acc | FA deep cos | DFA acc | DFA deep cos |
+|---|---|---|---|---|
+| 2 | 0.350 | +0.96 | — | — |
+| 4 | 0.424 | +0.29 | — | — |
+| 6 | 0.401 | +0.16 | — | — |
+| 8 | 0.409 | +0.11 | 0.306 | ~0 |
+| 12 | 0.404 | +0.09 | 0.309 | ~0 |
+
 ### Round 20 phrasing for the gap
 
 **Lower bound on non-capacity gap**: matched penalty controls show that only part of DFA's deficit is attributable to the representational/optimization cost of the penalty itself; a substantial residual remains and is consistent with poorer credit assignment.