# FA Evaluation Reporting Template A minimal, fillable reporting template for any paper that evaluates a local-credit (FA-style) method on a residual architecture. Reviewers can use this as a checklist to verify whether the reported numbers are in the metric's meaningful regime. Copy the table below into your paper appendix or supplementary material. Fill in one row per `(method × architecture × dataset × seed)`. --- ## Method × architecture identification | field | value | |---|---| | Method | _e.g._ DFA, EP, SB, CB, BP | | Method local-credit signal | _formula_, _e.g._ `e_T B_l^T` for DFA | | Architecture | _e.g._ 4-block d=256 pre-LN ResidualMLP | | Has terminal LayerNorm before head | yes / no | | Dataset | _e.g._ CIFAR-10 | | Number of seeds reported | _e.g._ 3 | | Random feedback `Bs` (if any): seed/spec | | ## Headline numbers | field | value | |---|---| | Test accuracy (mean ± std over seeds) | | | Headline Γ (cosine to BP gradient) | | | Aggregation: how is Γ collapsed? | _e.g._ "mean over layers, then mean over samples" | | Per-layer Γ table reported in §X | yes / no | ## Diagnostic protocol numbers (this is the new content) | diagnostic | per-layer values | flag? | |---|---|---| | (a) ‖h_l‖₂ for l = 0 .. L | _e.g._ [255, 226, 211, 205, 203] | scale OK / **EXPLODED** | | (b) ‖g_l‖₂ for l = 0 .. L | _e.g._ [4e-4, 4e-4, 4e-4, 4e-4, 3e-4] | grad OK / **AT FLOOR** | | (c) cross-batch direction stability at layer L/2 | _e.g._ 0.099 (N=128, ≥8 batches) | OK / **DRIFT** | | (d) frozen-blocks baseline acc | _e.g._ 0.349 | trainable acc clears it by ≥2 pp / **UNDERCUT** | If any of (a)-(d) fires a flag, headline accuracy and Γ should be walked back or accompanied by an explicit caveat. We recommend using the bundled `protocol.diagnose(...)` function and pasting its `__str__` output as the appendix table. ## Pipeline-pitfalls disclosure State explicitly: - [ ] We grepped for `tensor.norm(-1)` (the L₋₁ footgun) and confirmed all gradient-norm computations use `tensor.norm(dim=-1)`. - [ ] We did not compute Γ in fp16 (or, if we did, we verified no underflow and reported the precision). - [ ] We disclose the random feedback `Bs` used during training (if applicable) and reproduced reported Γ with those exact `Bs`. - [ ] We report per-layer Γ in addition to any aggregate. - [ ] We report `‖h_l‖`, `‖g_l‖`, and `Γ_l` *together* for every layer. - [ ] We ran an architecture-matched frozen-random-blocks baseline and report its accuracy alongside the trainable-blocks variant. ## Internal control (strongly recommended) If your method is *not* the only candidate on this architecture, run the protocol on a second method that you have reason to believe operates in the meaningful regime (e.g., BP itself, or EP). The protocol should pass on the control. If it does not, you have an architecture problem rather than a method problem, and the paper should reflect that.