faeval.git

author	YurenHao0426 <Blackhao0426@gmail.com>	2026-04-08 18:47:26 -0500
committer	YurenHao0426 <Blackhao0426@gmail.com>	2026-04-08 18:47:26 -0500
commit	ffbb53cb59eeea47f7967c4b4654cf2ee73395a9 (patch)
tree	b01bfe2b75ab21072cee6eedbf8996b9db93d6c8 /results/confirmatory/A2_cifar_state_vs_credit.csv
parent	348e0f4e19be654febc9061ae873493a58080f91 (diff)

paper v2.31.13: §6 ¶3 (c) ranges replaced with audit-data ranges

The (c) calibration ranges "0.05-0.18 healthy, 0.5-0.99 drift-dominated" overstated the separation. Re-aggregated from results/protocol_audit/audit_table_s42_s123_s456.json:

  Healthy (BP+EP), 6 values:
    [-0.036, -0.024, 0.087, 0.099, 0.114, 0.120]
    range = [-0.036, 0.120], median 0.093
    (NOT "0.05 to 0.18": has negative values, max < 0.18)

  Degen (DFA+SB+CB), 9 values:
    [-0.005, 0.035, 0.047, 0.250, 0.352, 0.436, 0.518, 0.561, 0.992]
    range = [-0.005, 0.992], median 0.352
    (only 5/9 above 0.30, only 3/9 above 0.50)

The (c) discriminator has substantial overlap between healthy and degen distributions on this metric. The paper already calls (c) a "sub-mode discriminator", not a binary detector, so the loose calibration is acknowledged in framing, but the numerical ranges should match the data.

Updated to: "healthy methods cluster near zero with all six BP/EP values in [-0.04, +0.12], while drift-dominated cases reach high tails up to +0.99, and 5/9 degenerate values exceed the 0.30 default cutoff".

This is more honest and points at the audit JSON.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
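
The re-aggregated figures quoted in this message can be checked with a short sketch. It assumes only the fifteen values listed above; the layout of the audit JSON is not shown here, so the values are hardcoded rather than loaded, and the names `healthy`, `degen`, and `summarize` are hypothetical.

```python
from statistics import median

# Values copied from the commit message. The audit JSON schema is not
# shown in the source, so nothing is read from disk here.
healthy = [-0.036, -0.024, 0.087, 0.099, 0.114, 0.120]  # BP + EP
degen = [-0.005, 0.035, 0.047, 0.250, 0.352, 0.436, 0.518, 0.561, 0.992]  # DFA + SB + CB


def summarize(values, cutoffs=(0.30, 0.50)):
    """Return count, range, median, and the number of values above each cutoff."""
    return {
        "n": len(values),
        "range": (min(values), max(values)),
        "median": round(median(values), 3),
        "above": {c: sum(v > c for v in values) for c in cutoffs},
    }


healthy_summary = summarize(healthy)
degen_summary = summarize(degen)
print("healthy:", healthy_summary)
print("degen:  ", degen_summary)
```

Under these assumptions the summaries reproduce the stated medians (0.093 and 0.352) and the 5/9 and 3/9 counts above the 0.30 and 0.50 cutoffs.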

Diffstat (limited to 'results/confirmatory/A2_cifar_state_vs_credit.csv')

0 files changed, 0 insertions, 0 deletions

