faeval.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2026-04-08	paper v2.31: matched 30-epoch BP/DFA controls (was unsourced 0.609/0.308)	YurenHao0426
	The §5 ¶3 BP-no-penalty value of 0.609 ± 0.004 and DFA-no-penalty value of 0.308 ± 0.014 turned out to be unsourced — they were carried over from a hardcoded comment in experiments/bp_with_penalty_control.py ("BP-trainable (3-seed mean): 0.609") that nobody had actually measured with a matched 30-epoch run. Ran the missing matched controls under the same recipe as BP+pen (lam=0, 30 epochs, AdamW 1e-3, wd 0.01, cosine schedule, batch 128, 3 seeds 42/123/456): BP no-pen 30ep: per-seed 0.5851, 0.5845, 0.5863 → 0.585 ± 0.001 (paper said 0.609 ± 0.004, off by 0.024) DFA no-pen 30ep: per-seed 0.3070, 0.2985, 0.2966 → 0.301 ± 0.005 (paper said 0.308 ± 0.014) Also re-grounded DFA+penalty 30ep using the dfa_pen_short 3-seed run (0.3593, 0.3610, 0.3604 → 0.360 ± 0.001), which is what the deep-cosine +0.155 figure was computed on. The paper had 0.363 ± 0.001 — that came from the 100-epoch run, not the 30-epoch run, so it was an apples-to- oranges comparison with BP+pen 30-ep. Paper changes (§5 ¶3): BP penalty cost: -8 pp → -5.5 pp DFA pen rescue: +5.5 → +5.9 pp DFA+pen margin vs frozen: +1.4 → +1.1 pp BP-to-DFA gap: 17 → 17.0 pp (unchanged) BP-to-SB gap: 7.7 → 7.7 pp (unchanged) BP-to-DFA gap is still the lower-bound credit-quality cost claim; 17 pp gap is unchanged in magnitude. Also updated: - §5 ¶1 prose: 0.363 → 0.360, 0.308 → 0.301 - §4 ¶4 prose: DFA+pen 0.363 → 0.360 - Appendix J Table 9 caption: 0.363 → 0.360, +9.0 → +9.3 pp gap to SB - Appendix L paragraph: +5.5 → +5.9 pp DFA penalty rescue - Figure 3 panel C bar values + title pen-cost annotation - New results/matched_30ep_control_summary.json as auditable record Page layout preserved: 9 main pages + refs p10, 18 total, 0 overfull. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08	Figures 3 and 4: fix aspect ratio (fig3 was squeezed strip) and key-finding ↵	YurenHao0426
	label overlap (fig4) Per user feedback: - fig4_penalty_rescue.pdf (Figure 3 in paper): was figsize=(13, 3.5), aspect 3.7:1, which rendered as a thin strip with squeezed subplot content. Increased height to figsize=(13, 6.0), aspect 2.2:1. Much taller panels that actually show axis labels and legends readably. - fig5_cross_arch_summary.pdf (Figure 4 in paper): the 'Key finding' italic text annotation at y=-1.0 in axes transform was overlapping with the multiline architecture y-tick labels at the bottom of the second subplot. Moved to y=-1.55 and increased figsize height from 3.5 to 4.2 so the lower annotation still fits in bbox_inches='tight' crop. - Also bumped includegraphics width from 0.92\linewidth to \linewidth for both figures so they use the full text width. Main content still exactly 9 pages within E&D budget. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08	Fill in tables 1-3 + generate figures 2/4/5 from existing data	YurenHao0426
	Tables filled with real values: Table 1: 5-method audit (3-seed mean ± std for acc, headline Γ, verdict) Table 2: 4-condition mode 2 validation (cos and ρ values from existing checkpoint measurements) Table 3: protocol thresholds (50×, 1e-7, 0.30, 2pp) Figures generated from existing data: fig2_decision_utility.pdf: 5×7 verdict heatmap from results/protocol_audit/ablation_decision_utility.json fig4_penalty_rescue.pdf: 3-panel — trajectory + cos/ρ bars + 2×2 acc from snapshot_evolution_v2 + dfa_residual_penalty + bp_with_penalty fig5_cross_arch_summary.pdf: 5×4 BP/DFA verdict matrix across architectures Compiles to 8 pages with all tables/figures rendered. §1-§7 main body still has only paragraph topic sentences (TODO: per-section prose filling via codex). Figure numbering is wrong (codex put figures in section order not numerical order — need fixing).