faeval.git

Age	Commit message	Author
2026-04-08	protocol/protocol.py: sync (c) range docstring with v2.31.13 paper update	YurenHao0426
	The cross_batch_direction_stability docstring claimed healthy values "~0.05-0.18" and drift-dominated "~0.5-0.99". These were the same loose ranges that v2.31.13 corrected in the paper §6 ¶3.

	Re-aggregated from results/protocol_audit/audit_table_s42_s123_s456.json (K=10 batches of 128 samples):
	  Healthy 6 BP+EP values:   range [-0.036, 0.120], median 0.093
	  Degen 9 DFA/SB/CB values: range [-0.005, 0.992], median 0.352
	                            5/9 above 0.30 cutoff
	                            3/9 above 0.50

	Updated docstring to match the actual audit data and point at the JSON source. Now the paper §6 ¶3 prose and the protocol.py docstring agree exactly on the (c) calibration ranges.

	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
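	The re-aggregation described in this commit boils down to a range, a median, and a count of values above each cutoff. Below is a minimal sketch of that summary step, assuming the per-run stability values have already been read out of the JSON into a flat list. The function name `summarize` and the sample values are hypothetical and are not taken from the audit table or from protocol.py.

```python
from statistics import median


def summarize(values, cutoffs=(0.30, 0.50)):
    """Return count, range, median, and how many values lie strictly above each cutoff.

    Hypothetical helper for illustration; not part of protocol.py.
    """
    if not values:
        raise ValueError("values must be non-empty")
    return {
        "n": len(values),
        "range": (min(values), max(values)),
        "median": median(values),
        "above": {c: sum(v > c for v in values) for c in cutoffs},
    }


# Made-up stability values, for illustration only (not the audit data).
example = [0.02, 0.10, 0.35, 0.60, 0.90]
stats = summarize(example)
print(stats)
```

	Reporting the counts per cutoff next to the range makes it visible when a wide range is driven by only a few runs.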
2026-04-07	Add FA diagnostic protocol reference implementation	YurenHao0426
	Codex round 15 #1 priority for the E&D-track paper:
	- protocol/protocol.py: 4 diagnostics (residual norms, BP grad norms, cross-batch direction stability, and a frozen-baseline comparator)
	- protocol/report.py: DiagnosticReport with per-diagnostic verdicts and pretty-printer
	- protocol/smoke_test.py: validates BP/DFA/EP checkpoints produce the expected verdicts (BP/EP trustworthy; DFA walked back via residual explosion + BP grad at floor)
	- protocol/README.md: usage, audit cases, threshold rationale
	- protocol/CHECKLIST.md: 6 evaluation pipeline pitfalls (norm(-1), cosine_similarity eps clamp, fp16 underflow, Bs reproducibility, aggregation, layer-0 dominance)
	- protocol/REPORTING_TEMPLATE.md: per-method fillable form for FA papers
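	The contents of CHECKLIST.md are not shown in this log, so the following is only one plausible reading of the "cosine_similarity eps clamp" item. It assumes the pitfall is that a denominator clamped from below by eps deflates the cosine of very small vectors. Both function names are hypothetical. The sketch compares a clamped cosine with one that computes in Python floats and reports zero vectors explicitly.

```python
import math


def clamped_cosine(u, v, eps=1e-8):
    """Cosine with the norm product clamped from below by eps (assumed pitfall form)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / max(nu * nv, eps)


def safe_cosine(u, v):
    """Unclamped cosine in double precision; returns None if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return None  # direction is undefined; do not report 0.0 as "orthogonal"
    return dot / (nu * nv)


# Two identical, very small vectors: the true cosine is 1.
tiny = [1e-6, 0.0]
clamped = clamped_cosine(tiny, tiny)
unclamped = safe_cosine(tiny, tiny)
print(clamped, unclamped)
```

	With these inputs the norm product is about 1e-12, well below eps, so the clamped variant reports a cosine near zero for two parallel vectors while the unclamped one stays near 1.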