From 3012cba6032ee04cc0b82c178fbf8df8e47c7d2f Mon Sep 17 00:00:00 2001
From: YurenHao0426
Date: Mon, 23 Mar 2026 19:46:56 -0500
Subject: Add sweep results confirming terminal gradient matching is essential

12-config sweep: no hyperparameter combination recovers useful credit gradients without terminal gradient matching (best cos ~0.3 early, decays to ~0).
---
 report/REPORT.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'report/REPORT.md')

diff --git a/report/REPORT.md b/report/REPORT.md
index 2fa8e31..1df77e1 100644
--- a/report/REPORT.md
+++ b/report/REPORT.md
@@ -30,7 +30,7 @@ Key constraint: **No hidden BP anchor** — intermediate layers never receive ex
 **Key findings:**
 - Credit bridge matches state bridge (~0.94 cosine) on the linear system.
 - Both far exceed DFA, which provides essentially zero directional credit.
-- Credit bridge requires **terminal gradient matching** to succeed. Without it, the value function learns correct values but has uninformative gradients (cosine collapses to ~0.03). Terminal gradient matching uses output-layer-local info only (not hidden BP).
+- Credit bridge requires **terminal gradient matching** to succeed. Without it, the value function learns correct values but has uninformative gradients (cosine collapses to ~0.03). This was verified across a 12-config hyperparameter sweep — no combination of noise (σ=0.03–1.0), temperature (λ=0.1–1.0), architecture, or learning rate recovers useful gradients without terminal gradient matching. Terminal gradient matching uses output-layer-local info only (not hidden BP).
 - FM auxiliary provides marginal additional improvement (0.946 vs 0.940).
 
 ![Per-layer diagnostics](toy_per_layer_diagnostics.png)
-- 
cgit v1.2.3