diff options
Diffstat (limited to 'protocol/REPORTING_TEMPLATE.md')
| -rw-r--r-- | protocol/REPORTING_TEMPLATE.md | 69 |
1 files changed, 69 insertions, 0 deletions
diff --git a/protocol/REPORTING_TEMPLATE.md b/protocol/REPORTING_TEMPLATE.md new file mode 100644 index 0000000..5d4a187 --- /dev/null +++ b/protocol/REPORTING_TEMPLATE.md @@ -0,0 +1,69 @@ +# FA Evaluation Reporting Template + +A minimal, fillable reporting template for any paper that evaluates a +local-credit (FA-style) method on a residual architecture. Reviewers can +use this as a checklist to verify whether the reported numbers are in the +metric's meaningful regime. + +Copy the table below into your paper appendix or supplementary material. +Fill in one row per `(method × architecture × dataset × seed)`. + +--- + +## Method × architecture identification + +| field | value | +|---|---| +| Method | _e.g._ DFA, EP, SB, CB, BP | +| Method local-credit signal | _formula_, _e.g._ `e_T B_l^T` for DFA | +| Architecture | _e.g._ 4-block d=256 pre-LN ResidualMLP | +| Has terminal LayerNorm before head | yes / no | +| Dataset | _e.g._ CIFAR-10 | +| Number of seeds reported | _e.g._ 3 | +| Random feedback `Bs` (if any): seed/spec | | + +## Headline numbers + +| field | value | +|---|---| +| Test accuracy (mean ± std over seeds) | | +| Headline Γ (cosine to BP gradient) | | +| Aggregation: how is Γ collapsed? | _e.g._ "mean over layers, then mean over samples" | +| Per-layer Γ table reported in §X | yes / no | + +## Diagnostic protocol numbers (this is the new content) + +| diagnostic | per-layer values | flag? | +|---|---|---| +| (a) ‖h_l‖₂ for l = 0 .. L | _e.g._ [255, 226, 211, 205, 203] | scale OK / **EXPLODED** | +| (b) ‖g_l‖₂ for l = 0 .. L | _e.g._ [4e-4, 4e-4, 4e-4, 4e-4, 3e-4] | grad OK / **AT FLOOR** | +| (c) cross-batch direction stability at layer L/2 | _e.g._ 0.099 (N=128, ≥8 batches) | OK / **DRIFT** | +| (d) frozen-blocks baseline acc | _e.g._ 0.349 | trainable acc clears it by ≥2 pp / **UNDERCUT** | + +If any of (a)-(d) fires a flag, headline accuracy and Γ should be walked +back or accompanied by an explicit caveat. We recommend using the bundled +`protocol.diagnose(...)` function and pasting its `__str__` output as the +appendix table. + +## Pipeline-pitfalls disclosure + +State explicitly: + +- [ ] We grepped for `tensor.norm(-1)` (the L₋₁ footgun) and confirmed all + gradient-norm computations use `tensor.norm(dim=-1)`. +- [ ] We did not compute Γ in fp16 (or, if we did, we verified no underflow + and reported the precision). +- [ ] We disclose the random feedback `Bs` used during training (if + applicable) and reproduced reported Γ with those exact `Bs`. +- [ ] We report per-layer Γ in addition to any aggregate. +- [ ] We report `‖h_l‖`, `‖g_l‖`, and `Γ_l` *together* for every layer. +- [ ] We ran an architecture-matched frozen-random-blocks baseline and + report its accuracy alongside the trainable-blocks variant. + +## Internal control (strongly recommended) + +If your method is *not* the only candidate on this architecture, run the +protocol on a second method that you have reason to believe operates in +the meaningful regime (e.g., BP itself, or EP). The protocol should pass +on the control. If it does not, you have an architecture problem rather +than a method problem, and the paper should reflect that. |
