summaryrefslogtreecommitdiff
path: root/protocol/REPORTING_TEMPLATE.md
diff options
context:
space:
mode:
authorYurenHao0426 <Blackhao0426@gmail.com>2026-04-07 22:20:48 -0500
committerYurenHao0426 <Blackhao0426@gmail.com>2026-04-07 22:20:48 -0500
commit7b64702ad970c16171142665365e16a8e1737190 (patch)
tree30df9313fc195aa037f05c0598fdc0df8b32bbcc /protocol/REPORTING_TEMPLATE.md
parent2dc8e7efb5f2ff827fbd97be87f0127aa5ab2757 (diff)
Add FA diagnostic protocol reference implementation
Codex round 15 #1 priority for the E&D-track paper: - protocol/protocol.py: 4 diagnostics (residual norms, BP grad norms, cross-batch direction stability, and a frozen-baseline comparator) - protocol/report.py: DiagnosticReport with per-diagnostic verdicts and pretty-printer - protocol/smoke_test.py: validates BP/DFA/EP checkpoints produce the expected verdicts (BP/EP trustworthy; DFA walked back via residual explosion + BP grad at floor) - protocol/README.md: usage, audit cases, threshold rationale - protocol/CHECKLIST.md: 6 evaluation pipeline pitfalls (norm(-1), cosine_similarity eps clamp, fp16 underflow, Bs reproducibility, aggregation, layer-0 dominance) - protocol/REPORTING_TEMPLATE.md: per-method fillable form for FA papers
Diffstat (limited to 'protocol/REPORTING_TEMPLATE.md')
-rw-r--r--protocol/REPORTING_TEMPLATE.md69
1 files changed, 69 insertions, 0 deletions
diff --git a/protocol/REPORTING_TEMPLATE.md b/protocol/REPORTING_TEMPLATE.md
new file mode 100644
index 0000000..5d4a187
--- /dev/null
+++ b/protocol/REPORTING_TEMPLATE.md
@@ -0,0 +1,69 @@
+# FA Evaluation Reporting Template
+
+A minimal, fillable reporting template for any paper that evaluates a
+local-credit (FA-style) method on a residual architecture. Reviewers can
+use this as a checklist to verify whether the reported numbers are in the
+metric's meaningful regime.
+
+Copy the table below into your paper appendix or supplementary material.
+Fill in one row per `(method × architecture × dataset × seed)`.
+
+---
+
+## Method × architecture identification
+
+| field | value |
+|---|---|
+| Method | _e.g._ DFA, EP, SB, CB, BP |
+| Method local-credit signal | _formula_, _e.g._ `e_T B_l^T` for DFA |
+| Architecture | _e.g._ 4-block d=256 pre-LN ResidualMLP |
+| Has terminal LayerNorm before head | yes / no |
+| Dataset | _e.g._ CIFAR-10 |
+| Number of seeds reported | _e.g._ 3 |
+| Random feedback `Bs` (if any): seed/spec | |
+
+## Headline numbers
+
+| field | value |
+|---|---|
+| Test accuracy (mean ± std over seeds) | |
+| Headline Γ (cosine to BP gradient) | |
+| Aggregation: how is Γ collapsed? | _e.g._ "mean over layers, then mean over samples" |
+| Per-layer Γ table reported in §X | yes / no |
+
+## Diagnostic protocol numbers (this is the new content)
+
+| diagnostic | per-layer values | flag? |
+|---|---|---|
+| (a) ‖h_l‖₂ for l = 0 .. L | _e.g._ [255, 226, 211, 205, 203] | scale OK / **EXPLODED** |
+| (b) ‖g_l‖₂ for l = 0 .. L | _e.g._ [4e-4, 4e-4, 4e-4, 4e-4, 3e-4] | grad OK / **AT FLOOR** |
+| (c) cross-batch direction stability at layer L/2 | _e.g._ 0.099 (N=128, ≥8 batches) | OK / **DRIFT** |
+| (d) frozen-blocks baseline acc | _e.g._ 0.349 | trainable acc clears it by ≥2 pp / **UNDERCUT** |
+
+If any of (a)-(d) fires a flag, headline accuracy and Γ should be walked
+back or accompanied by an explicit caveat. We recommend using the bundled
+`protocol.diagnose(...)` function and pasting its `__str__` output as the
+appendix table.
+
+## Pipeline-pitfalls disclosure
+
+State explicitly:
+
+- [ ] We grepped for `tensor.norm(-1)` (the L₋₁ footgun) and confirmed all
+ gradient-norm computations use `tensor.norm(dim=-1)`.
+- [ ] We did not compute Γ in fp16 (or, if we did, we verified no underflow
+ and reported the precision).
+- [ ] We disclose the random feedback `Bs` used during training (if
+ applicable) and reproduced reported Γ with those exact `Bs`.
+- [ ] We report per-layer Γ in addition to any aggregate.
+- [ ] We report `‖h_l‖`, `‖g_l‖`, and `Γ_l` *together* for every layer.
+- [ ] We ran an architecture-matched frozen-random-blocks baseline and
+ report its accuracy alongside the trainable-blocks variant.
+
+## Internal control (strongly recommended)
+
+If your method is *not* the only candidate on this architecture, run the
+protocol on a second method that you have reason to believe operates in
+the meaningful regime (e.g., BP itself, or EP). The protocol should pass
+on the control. If it does not, you have an architecture problem rather
+than a method problem, and the paper should reflect that.