<feed xmlns='http://www.w3.org/2005/Atom'>
<title>faeval.git/protocol/protocol.py, branch master</title>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/'/>
<entry>
<title>protocol/protocol.py: sync (c) range docstring with v2.31.13 paper update</title>
<updated>2026-04-09T00:21:56+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-09T00:21:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=bd873fea53ec917a01618799eeb97f770081ba53'/>
<id>bd873fea53ec917a01618799eeb97f770081ba53</id>
<content type='text'>
The cross_batch_direction_stability docstring claimed healthy values of
"~0.05-0.18" and drift-dominated values of "~0.5-0.99". These were the
same loose ranges that v2.31.13 corrected in the paper, §6 ¶3.

Re-aggregated from results/protocol_audit/audit_table_s42_s123_s456.json
(K=10 batches of 128 samples):
  Healthy 6 BP+EP values: range [-0.036, 0.120], median 0.093
  Degen 9 DFA/SB/CB values: range [-0.005, 0.992], median 0.352
                           5/9 above 0.30 cutoff
                           3/9 above 0.50

Updated docstring to match the actual audit data and point at the
JSON source. Now the paper §6 ¶3 prose and the protocol.py docstring
agree exactly on the (c) calibration ranges.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The cross_batch_direction_stability docstring claimed healthy values of
"~0.05-0.18" and drift-dominated values of "~0.5-0.99". These were the
same loose ranges that v2.31.13 corrected in the paper, §6 ¶3.

Re-aggregated from results/protocol_audit/audit_table_s42_s123_s456.json
(K=10 batches of 128 samples):
  Healthy 6 BP+EP values: range [-0.036, 0.120], median 0.093
  Degen 9 DFA/SB/CB values: range [-0.005, 0.992], median 0.352
                           5/9 above 0.30 cutoff
                           3/9 above 0.50

Updated docstring to match the actual audit data and point at the
JSON source. Now the paper §6 ¶3 prose and the protocol.py docstring
agree exactly on the (c) calibration ranges.

Co-Authored-By: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Add FA diagnostic protocol reference implementation</title>
<updated>2026-04-08T03:20:48+00:00</updated>
<author>
<name>YurenHao0426</name>
<email>Blackhao0426@gmail.com</email>
</author>
<published>2026-04-08T03:20:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.blackhao.com/faeval.git/commit/?id=7b64702ad970c16171142665365e16a8e1737190'/>
<id>7b64702ad970c16171142665365e16a8e1737190</id>
<content type='text'>
Codex round 15 #1 priority for the E&amp;D-track paper:
  - protocol/protocol.py: 4 diagnostics (residual norms, BP grad norms,
    cross-batch direction stability, and a frozen-baseline comparator)
  - protocol/report.py: DiagnosticReport with per-diagnostic verdicts and
    pretty-printer
  - protocol/smoke_test.py: validates BP/DFA/EP checkpoints produce the
    expected verdicts (BP/EP trustworthy; DFA walked back via residual
    explosion + BP grad at floor)
  - protocol/README.md: usage, audit cases, threshold rationale
  - protocol/CHECKLIST.md: 6 evaluation pipeline pitfalls (norm(-1),
    cosine_similarity eps clamp, fp16 underflow, Bs reproducibility,
    aggregation, layer-0 dominance)
  - protocol/REPORTING_TEMPLATE.md: per-method fillable form for FA papers
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Codex round 15 #1 priority for the E&amp;D-track paper:
  - protocol/protocol.py: 4 diagnostics (residual norms, BP grad norms,
    cross-batch direction stability, and a frozen-baseline comparator)
  - protocol/report.py: DiagnosticReport with per-diagnostic verdicts and
    pretty-printer
  - protocol/smoke_test.py: validates BP/DFA/EP checkpoints produce the
    expected verdicts (BP/EP trustworthy; DFA walked back via residual
    explosion + BP grad at floor)
  - protocol/README.md: usage, audit cases, threshold rationale
  - protocol/CHECKLIST.md: 6 evaluation pipeline pitfalls (norm(-1),
    cosine_similarity eps clamp, fp16 underflow, Bs reproducibility,
    aggregation, layer-0 dominance)
  - protocol/REPORTING_TEMPLATE.md: per-method fillable form for FA papers
</pre>
</div>
</content>
</entry>
</feed>
