1 files changed, 69 insertions, 0 deletions
diff --git a/protocol/REPORTING_TEMPLATE.md b/protocol/REPORTING_TEMPLATE.md
new file mode 100644
index 0000000..5d4a187
--- /dev/null
+++ b/protocol/REPORTING_TEMPLATE.md
@@ -0,0 +1,69 @@
+# FA Evaluation Reporting Template
+
+A minimal, fillable reporting template for any paper that evaluates a
+local-credit (FA-style) method on a residual architecture. Reviewers can
+use this as a checklist to verify whether the reported numbers are in the
+metric's meaningful regime.
+
+Copy the table below into your paper appendix or supplementary material.
+Fill in one row per `(method × architecture × dataset × seed)`.
+
+---
+
+## Method × architecture identification
+
+| field | value |
+|---|---|
+| Method | _e.g._ DFA, EP, SB, CB, BP |
+| Method local-credit signal | _formula_, _e.g._ `e_T B_l^T` for DFA |
+| Architecture | _e.g._ 4-block d=256 pre-LN ResidualMLP |
+| Has terminal LayerNorm before head | yes / no |
+| Dataset | _e.g._ CIFAR-10 |
+| Number of seeds reported | _e.g._ 3 |
+| Random feedback `Bs` (if any): seed/spec | |
+
+## Headline numbers
+
+| field | value |
+|---|---|
+| Test accuracy (mean ± std over seeds) | |
+| Headline Γ (cosine to BP gradient) | |
+| Aggregation: how is Γ collapsed? | _e.g._ "mean over layers, then mean over samples" |
+| Per-layer Γ table reported in §X | yes / no |
+
+## Diagnostic protocol numbers (this is the new content)
+
+| diagnostic | per-layer values | flag? |
+|---|---|---|
+| (a) ‖h_l‖₂ for l = 0 .. L | _e.g._ [255, 226, 211, 205, 203] | scale OK / **EXPLODED** |
+| (b) ‖g_l‖₂ for l = 0 .. L | _e.g._ [4e-4, 4e-4, 4e-4, 4e-4, 3e-4] | grad OK / **AT FLOOR** |
+| (c) cross-batch direction stability at layer L/2 | _e.g._ 0.099 (N=128, ≥8 batches) | OK / **DRIFT** |
+| (d) frozen-blocks baseline acc | _e.g._ 0.349 | trainable acc clears it by ≥2 pp / **UNDERCUT** |
+
+If any of (a)-(d) fires a flag, headline accuracy and Γ should be walked
+back or accompanied by an explicit caveat. We recommend using the bundled
+`protocol.diagnose(...)` function and pasting its `__str__` output as the
+appendix table.
+
+## Pipeline-pitfalls disclosure
+
+State explicitly:
+
+- [ ] We grepped for `tensor.norm(-1)` (the L₋₁ footgun) and confirmed all
+      gradient-norm computations use `tensor.norm(dim=-1)`.
+- [ ] We did not compute Γ in fp16 (or, if we did, we verified no underflow
+      and reported the precision).
+- [ ] We disclose the random feedback `Bs` used during training (if
+      applicable) and reproduced reported Γ with those exact `Bs`.
+- [ ] We report per-layer Γ in addition to any aggregate.
+- [ ] We report `‖h_l‖`, `‖g_l‖`, and `Γ_l` *together* for every layer.
+- [ ] We ran an architecture-matched frozen-random-blocks baseline and
+      report its accuracy alongside the trainable-blocks variant.
+
+## Internal control (strongly recommended)
+
+If your method is *not* the only candidate on this architecture, run the
+protocol on a second method that you have reason to believe operates in
+the meaningful regime (e.g., BP itself, or EP). The protocol should pass
+on the control. If it does not, you have an architecture problem rather
+than a method problem, and the paper should reflect that.