diff options
| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-07 22:20:48 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-07 22:20:48 -0500 |
| commit | 7b64702ad970c16171142665365e16a8e1737190 (patch) | |
| tree | 30df9313fc195aa037f05c0598fdc0df8b32bbcc /protocol/REPORTING_TEMPLATE.md | |
| parent | 2dc8e7efb5f2ff827fbd97be87f0127aa5ab2757 (diff) | |
Add FA diagnostic protocol reference implementation
Codex round 15 #1 priority for the E&D-track paper:
- protocol/protocol.py: 4 diagnostics (residual norms, BP grad norms,
cross-batch direction stability, and a frozen-baseline comparator)
- protocol/report.py: DiagnosticReport with per-diagnostic verdicts and
pretty-printer
- protocol/smoke_test.py: validates BP/DFA/EP checkpoints produce the
expected verdicts (BP/EP trustworthy; DFA walked back via residual
explosion + BP grad at floor)
- protocol/README.md: usage, audit cases, threshold rationale
- protocol/CHECKLIST.md: 6 evaluation pipeline pitfalls (norm(-1),
cosine_similarity eps clamp, fp16 underflow, Bs reproducibility,
aggregation, layer-0 dominance)
- protocol/REPORTING_TEMPLATE.md: per-method fillable form for FA papers
Diffstat (limited to 'protocol/REPORTING_TEMPLATE.md')
| -rw-r--r-- | protocol/REPORTING_TEMPLATE.md | 69 |
1 files changed, 69 insertions, 0 deletions
diff --git a/protocol/REPORTING_TEMPLATE.md b/protocol/REPORTING_TEMPLATE.md new file mode 100644 index 0000000..5d4a187 --- /dev/null +++ b/protocol/REPORTING_TEMPLATE.md @@ -0,0 +1,69 @@ +# FA Evaluation Reporting Template + +A minimal, fillable reporting template for any paper that evaluates a +local-credit (FA-style) method on a residual architecture. Reviewers can +use this as a checklist to verify whether the reported numbers are in the +metric's meaningful regime. + +Copy the table below into your paper appendix or supplementary material. +Fill in one row per `(method × architecture × dataset × seed)`. + +--- + +## Method × architecture identification + +| field | value | +|---|---| +| Method | _e.g._ DFA, EP, SB, CB, BP | +| Method local-credit signal | _formula_, _e.g._ `e_T B_l^T` for DFA | +| Architecture | _e.g._ 4-block d=256 pre-LN ResidualMLP | +| Has terminal LayerNorm before head | yes / no | +| Dataset | _e.g._ CIFAR-10 | +| Number of seeds reported | _e.g._ 3 | +| Random feedback `Bs` (if any): seed/spec | | + +## Headline numbers + +| field | value | +|---|---| +| Test accuracy (mean ± std over seeds) | | +| Headline Γ (cosine to BP gradient) | | +| Aggregation: how is Γ collapsed? | _e.g._ "mean over layers, then mean over samples" | +| Per-layer Γ table reported in §X | yes / no | + +## Diagnostic protocol numbers (this is the new content) + +| diagnostic | per-layer values | flag? | +|---|---|---| +| (a) ‖h_l‖₂ for l = 0 .. L | _e.g._ [255, 226, 211, 205, 203] | scale OK / **EXPLODED** | +| (b) ‖g_l‖₂ for l = 0 .. L | _e.g._ [4e-4, 4e-4, 4e-4, 4e-4, 3e-4] | grad OK / **AT FLOOR** | +| (c) cross-batch direction stability at layer L/2 | _e.g._ 0.099 (N=128, ≥8 batches) | OK / **DRIFT** | +| (d) frozen-blocks baseline acc | _e.g._ 0.349 | trainable acc clears it by ≥2 pp / **UNDERCUT** | + +If any of (a)-(d) fires a flag, headline accuracy and Γ should be walked +back or accompanied by an explicit caveat. We recommend using the bundled +`protocol.diagnose(...)` function and pasting its `__str__` output as the +appendix table. + +## Pipeline-pitfalls disclosure + +State explicitly: + +- [ ] We grepped for `tensor.norm(-1)` (the L₋₁ footgun) and confirmed all + gradient-norm computations use `tensor.norm(dim=-1)`. +- [ ] We did not compute Γ in fp16 (or, if we did, we verified no underflow + and reported the precision). +- [ ] We disclose the random feedback `Bs` used during training (if + applicable) and reproduced reported Γ with those exact `Bs`. +- [ ] We report per-layer Γ in addition to any aggregate. +- [ ] We report `‖h_l‖`, `‖g_l‖`, and `Γ_l` *together* for every layer. +- [ ] We ran an architecture-matched frozen-random-blocks baseline and + report its accuracy alongside the trainable-blocks variant. + +## Internal control (strongly recommended) + +If your method is *not* the only candidate on this architecture, run the +protocol on a second method that you have reason to believe operates in +the meaningful regime (e.g., BP itself, or EP). The protocol should pass +on the control. If it does not, you have an architecture problem rather +than a method problem, and the paper should reflect that. |
