protocol/REPORTING_TEMPLATE.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69

# FA Evaluation Reporting Template

A minimal, fillable reporting template for any paper that evaluates a
local-credit (FA-style) method on a residual architecture. Reviewers can
use this as a checklist to verify whether the reported numbers are in the
metric's meaningful regime.

Copy the table below into your paper appendix or supplementary material.
Fill in one row per `(method × architecture × dataset × seed)`.

---

## Method × architecture identification

| field | value |
|---|---|
| Method | _e.g._ DFA, EP, SB, CB, BP |
| Method local-credit signal | _formula_, _e.g._ `e_T B_l^T` for DFA |
| Architecture | _e.g._ 4-block d=256 pre-LN ResidualMLP |
| Has terminal LayerNorm before head | yes / no |
| Dataset | _e.g._ CIFAR-10 |
| Number of seeds reported | _e.g._ 3 |
| Random feedback `Bs` (if any): seed/spec | |

## Headline numbers

| field | value |
|---|---|
| Test accuracy (mean ± std over seeds) | |
| Headline Γ (cosine to BP gradient) | |
| Aggregation: how is Γ collapsed? | _e.g._ "mean over layers, then mean over samples" |
| Per-layer Γ table reported in §X | yes / no |

## Diagnostic protocol numbers (this is the new content)

| diagnostic | per-layer values | flag? |
|---|---|---|
| (a) ‖h_l‖₂ for l = 0 .. L | _e.g._ [255, 226, 211, 205, 203] | scale OK / **EXPLODED** |
| (b) ‖g_l‖₂ for l = 0 .. L | _e.g._ [4e-4, 4e-4, 4e-4, 4e-4, 3e-4] | grad OK / **AT FLOOR** |
| (c) cross-batch direction stability at layer L/2 | _e.g._ 0.099 (N=128, ≥8 batches) | OK / **DRIFT** |
| (d) frozen-blocks baseline acc | _e.g._ 0.349 | trainable acc clears it by ≥2 pp / **UNDERCUT** |

If any of (a)-(d) fires a flag, headline accuracy and Γ should be walked
back or accompanied by an explicit caveat. We recommend using the bundled
`protocol.diagnose(...)` function and pasting its `__str__` output as the
appendix table.

## Pipeline-pitfalls disclosure

State explicitly:

- [ ] We grepped for `tensor.norm(-1)` (the L₋₁ footgun) and confirmed all
      gradient-norm computations use `tensor.norm(dim=-1)`.
- [ ] We did not compute Γ in fp16 (or, if we did, we verified no underflow
      and reported the precision).
- [ ] We disclose the random feedback `Bs` used during training (if
      applicable) and reproduced reported Γ with those exact `Bs`.
- [ ] We report per-layer Γ in addition to any aggregate.
- [ ] We report `‖h_l‖`, `‖g_l‖`, and `Γ_l` *together* for every layer.
- [ ] We ran an architecture-matched frozen-random-blocks baseline and
      report its accuracy alongside the trainable-blocks variant.

## Internal control (strongly recommended)

If your method is *not* the only candidate on this architecture, run the
protocol on a second method that you have reason to believe operates in
the meaningful regime (e.g., BP itself, or EP). The protocol should pass
on the control. If it does not, you have an architecture problem rather
than a method problem, and the paper should reflect that.