Verified by extracting per-layer gamma_dfa from the existing ViT-Mini snapshot
JSON (3 seeds, final epoch). On ViT, all 4 layers have a per-layer cosine
near zero (~0.001 with the eps clamp); no layer dominates. Compare with ResMLP,
where layer 0 has +0.42 and layers 1-4 are essentially zero.
The pitfall is real on ResMLP, but the specific 'layer 0 dominates' framing
doesn't generalize to ViT. Reframed as 'aggregation hides per-layer
structure'; the lesson is to always report per-layer values, whatever
architecture-specific pattern the aggregate might be hiding.
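A minimal pure-Python sketch of why a single aggregate hides per-layer structure, assuming Gamma is a cosine between a method's local credit signal and the BP gradient. The helper names (cosine, gamma_summary), the eps-clamping scheme, and the toy two-layer vectors are illustrative assumptions, not the project's code. Profile A has one aligned layer and one orthogonal layer; profile B has two moderately aligned layers. Both give the same mean-of-layers and flattened aggregate.

```python
import math

def cosine(a, b, eps=1e-8):
    # Cosine similarity; each norm is clamped below at eps so a zero vector
    # yields 0 instead of ZeroDivisionError (an assumed scheme; library
    # implementations differ in exactly where they apply eps).
    dot = sum(x * y for x, y in zip(a, b))
    na = max(math.sqrt(sum(x * x for x in a)), eps)
    nb = max(math.sqrt(sum(y * y for y in b)), eps)
    return dot / (na * nb)

def gamma_summary(local_grads, bp_grads):
    """Return per-layer cosines plus two common scalar aggregates."""
    per_layer = [cosine(g, h) for g, h in zip(local_grads, bp_grads)]
    mean_of_layers = sum(per_layer) / len(per_layer)
    # Concatenate all layers into one vector and take a single cosine.
    flat_local = [x for g in local_grads for x in g]
    flat_bp = [x for h in bp_grads for x in h]
    flattened = cosine(flat_local, flat_bp)
    return per_layer, mean_of_layers, flattened

# Toy profile A: layer 0 perfectly aligned, layer 1 orthogonal.
a_local = [[1.0, 0.0], [1.0, 0.0]]
a_bp = [[1.0, 0.0], [0.0, 1.0]]
# Toy profile B: both layers moderately aligned (cosine 0.5 each).
b_local = [[1.0, math.sqrt(3.0)], [1.0, math.sqrt(3.0)]]
b_bp = [[1.0, 0.0], [1.0, 0.0]]

for name, loc, bp in [("A", a_local, a_bp), ("B", b_local, b_bp)]:
    per_layer, mean_g, flat_g = gamma_summary(loc, bp)
    print(name, [round(c, 3) for c in per_layer], round(mean_g, 3), round(flat_g, 3))
# prints:
# A [1.0, 0.0] 0.5 0.5
# B [0.5, 0.5] 0.5 0.5
```

Both aggregates read 0.5 for A and B; only the per-layer list tells a "one layer carries it" profile apart from a uniform one.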
not saved
Discovered in our own cnn_baseline.py: when the random feedback matrices Bs
(for DFA) or the bridge predictor (for SB/CB) are not persisted alongside the
model checkpoint, post-hoc Gamma computation cannot reconstruct the
local credit signal. Instead of raising an error, the script falls back to
cos(BP_grad, BP_grad), which is trivially 1.0 for any nonzero gradient, and
records that as Gamma. A reader who doesn't notice the small 'Gamma_note'
field interprets 1.0 as perfect alignment.
Recommendation: always save aux nets alongside checkpoints; if they're
missing, report Gamma as N/A, not 1.0.
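A minimal sketch of the recommendation, assuming the local credit signal is None whenever the Bs or bridge predictor could not be loaded. buggy_gamma reproduces the fallback described above for contrast; safe_gamma and format_gamma are hypothetical helpers, not functions from cnn_baseline.py.

```python
import math
from typing import Optional, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def buggy_gamma(bp_grad, local_signal):
    # The failure mode: with no local signal, compare the BP gradient with
    # itself, which is 1.0 for any nonzero gradient.
    if local_signal is None:
        return cosine(bp_grad, bp_grad)
    return cosine(local_signal, bp_grad)

def safe_gamma(bp_grad, local_signal) -> Optional[float]:
    # Recommended: missing auxiliary nets mean Gamma is undefined, not 1.0.
    if local_signal is None:
        return None
    return cosine(local_signal, bp_grad)

def format_gamma(g: Optional[float]) -> str:
    # Render an undefined Gamma explicitly so it cannot be misread.
    return "N/A" if g is None else f"{g:.3f}"

bp = [0.3, -1.2, 0.5]
print(format_gamma(buggy_gamma(bp, None)))  # 1.000 (looks like perfect alignment)
print(format_gamma(safe_gamma(bp, None)))   # N/A
```

Returning None rather than a sentinel float forces every downstream consumer (tables, plots, averages) to handle the missing case explicitly.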
Codex round 15 #1 priority for the E&D-track paper:
- protocol/protocol.py: 4 diagnostics (residual norms, BP grad norms,
cross-batch direction stability, and a frozen-baseline comparator)
- protocol/report.py: DiagnosticReport with per-diagnostic verdicts and
pretty-printer
- protocol/smoke_test.py: validates that BP/DFA/EP checkpoints produce the
  expected verdicts (BP/EP trustworthy; DFA walked back via residual
  explosion + BP grad at floor)
- protocol/README.md: usage, audit cases, threshold rationale
- protocol/CHECKLIST.md: 6 evaluation-pipeline pitfalls (norm(-1),
  cosine_similarity eps clamp, fp16 underflow, Bs reproducibility,
  aggregation, layer-0 dominance)
- protocol/REPORTING_TEMPLATE.md: per-method fillable form for FA papers
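One way a report with per-diagnostic verdicts and a pretty-printer could be structured, as a hypothetical sketch: the class and field names, the pass/fail rule, and the two diagnostics and thresholds shown are assumptions for illustration, not the contents of protocol/report.py. A report is "trustworthy" only if every diagnostic passes; otherwise the verdict names the failing diagnostics, mirroring how the smoke test walks DFA back.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Diagnostic:
    name: str
    value: float
    threshold: float
    higher_is_bad: bool = True  # e.g. residual growth: bad when large

    @property
    def passed(self) -> bool:
        if self.higher_is_bad:
            return self.value <= self.threshold
        return self.value >= self.threshold  # e.g. grad norm: bad when at floor

@dataclass
class DiagnosticReport:
    method: str
    diagnostics: List[Diagnostic] = field(default_factory=list)

    @property
    def verdict(self) -> str:
        failed = [d.name for d in self.diagnostics if not d.passed]
        if not failed:
            return "trustworthy"
        return "walked back (" + ", ".join(failed) + ")"

    def pretty(self) -> str:
        lines = [f"{self.method}: {self.verdict}"]
        for d in self.diagnostics:
            mark = "PASS" if d.passed else "FAIL"
            lines.append(f"  [{mark}] {d.name} = {d.value:.3g} (threshold {d.threshold:.3g})")
        return "\n".join(lines)

# Placeholder values chosen only to trigger both failure paths; not measurements.
dfa = DiagnosticReport("DFA", [
    Diagnostic("residual_norm_growth", value=50.0, threshold=10.0),
    Diagnostic("bp_grad_norm", value=1e-9, threshold=1e-6, higher_is_bad=False),
])
print(dfa.pretty())
```

Keeping each value and its threshold in the printed line lets a reader audit a verdict without rerunning the protocol.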