faeval.git/paper/figures, branch master

Add new experiment scripts, figures, and paper assets; untrack pyc/build artifacts

2026-06-14T09:06:32+00:00

Co-Authored-By: Claude Opus 4.8 (1M context)

paper v2.35: add Figure 2 - cross-method cos-vs-accuracy dissociation

2026-04-09T01:17:43+00:00

User said "you don't need to worry about page count for now", which
freed up the page budget for substantive additions. The highest-yield
one is a figure for the §4 ¶4 cross-method dissociation, which the
user previously flagged as the paper's strongest new observation but
which is currently text-only.

New figure: paper/figures/fig_cos_acc_dissociation.pdf
- Parallel-coordinates / slope-chart style
- 4 columns: deep cos | accuracy | |nudging| | training-loss decrease
- 3 lines: SB+pen (blue), CB+pen (red), DFA+pen (gray)
- Each metric normalized to [0, 1] with raw values annotated
- Shaded "cos: CB top" region on the left vs labeled
  "accuracy / nudging / training-loss: SB top" on the right
- The X-pattern between cos and accuracy makes the dissociation
  visually immediate: SB rises from middle (cos) to top (functional),
  CB falls from top (cos) to tied with DFA (functional)
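
A minimal sketch of the per-metric normalization described above, assuming
plain min-max scaling within each column. The function name and the tie
handling are illustrative assumptions, not the figure script's actual code,
and the numbers are placeholders rather than values from the paper.

```python
def minmax_normalize(values):
    """Rescale one metric's raw values (one per method) to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: if every method ties on a metric, draw them at mid-height.
        return [0.5 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Placeholder column, NOT data from the paper.
column = [2.0, 4.0, 3.0]
normalized = minmax_normalize(column)
print(normalized)
```

Because each column is scaled independently, only the rank order and relative
spacing within a metric survive, which is why the raw values are annotated on
the figure.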

Inserted between §4 ¶4 (Mode 2 mechanism) and §5 (intervention).
Referenced from the §4 ¶4 functional measurements paragraph as
"Figure 2".

Why the figure is worth the space: the X-pattern carries the argument
at a single glance, where the prose required parsing a full paragraph.
Reviewers will see that deep cosine ranks the methods differently from
the three functional metrics without needing to track the numbers.

Important design choice: did NOT include deep ρ in the figure, even
though it's in §4 ¶2, because ρ ranks CB > SB > DFA (same as cos),
not the SB > CB > DFA pattern of the functional metrics. ρ groups
with cos as a "directional alignment" metric, while the functional
triad (accuracy, nudging, training-loss) groups around forward-state
usefulness. The figure caption notes this distinction implicitly
by listing only the three functional metrics.

Page impact: total 18 → 19 pages; main content §1-§7 now spans
p1-p10 (was p1-p9). Per the user's relaxed constraint, page count is
no longer binding. The new figure shifts the numbering:
cos_acc_dissoc is now Figure 2, temporal_cross_arch becomes Figure 3,
penalty_rescue → Figure 4, cross_arch_summary → Figure 5. All figure
references use \ref{}, so they update automatically.

Co-Authored-By: Claude Opus 4.6 (1M context)

paper v2.34.3: fix Figure 4 cross-arch verdict matrix (data + layout)

2026-04-09T01:10:14+00:00

User flagged Figure 4 issues. Found three problems:

1. **Row 4 (no-terminal-LN ResMLP), cell (d) frozen** was encoded as 0
   (passes), but the actual data is no-outln DFA acc 0.327 ± 0.012
   (3-seed) vs frozen baseline 0.349 ± 0.002 → margin -2.2 pp, beyond
   the 2 pp threshold → (d) FIRES. Updated to 1 (WB).

2. **Row 5 (CNN BN) cells (c) and (d)** were encoded as 0 (passes) but
   the CNN audit (results/protocol_audit/audit_cnn_3seed.json) only
   measured (a) and (b); there is no CNN frozen baseline and no CNN
   stability run. Showing them as ✓ was misleading. Added a third color
   (gray, "—") for "not measured" and marked CNN (c)+(d) accordingly.

3. **Layout** had massive empty vertical space below the panels with
   the key-finding text floating far below. Compressed:
   - figsize (11, 4.2) → (11, 3.2) [shorter figure, less dead space]
   - Key-finding text moved from axes-coordinates y=-1.55 (way below
     plot) to figure-coordinates y=-0.05 (directly under panels)
   - BP panel title clarified: "BP-trained: protocol passes" →
     "BP-trained: protocol passes everywhere"

Also marked ViT-Mini (c) and no-LN ResMLP (c) as "not measured" since
neither has a saved cross_batch_stability value (the audit_cnn,
audit_d512, snapshot_vit_v1, and snapshot_no_outln_v1 files don't
include this diagnostic).
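
The row 4 (d) correction comes down to one margin computation. A sketch,
assuming (d) fires when the trained model falls more than the threshold below
the frozen baseline. That reading matches how this row is described, but the
protocol's exact rule is not restated in this message, and the function names
are hypothetical.

```python
def margin_pp(acc, frozen_acc):
    """Accuracy margin over the frozen baseline, in percentage points."""
    return (acc - frozen_acc) * 100.0


def frozen_diagnostic_fires(acc, frozen_acc, threshold_pp=2.0):
    # Assumed rule: fire when the margin is below -threshold_pp.
    return margin_pp(acc, frozen_acc) < -threshold_pp


# no-outln DFA 3-seed mean vs frozen baseline, as quoted above.
m = margin_pp(0.327, 0.349)
fires = frozen_diagnostic_fires(0.327, 0.349)
print(f"margin {m:+.1f} pp, (d) fires: {fires}")
```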

New verdict matrix:
                    (a) (b) (c) (d)
  ResMLP-d256 LN    WB  WB  ✓   WB
  ResMLP-d512 LN    WB  WB  ✓   WB
  ViT-Mini          WB  WB  —   WB
  ResMLP-d256 no-LN WB  ✓   —   WB   ← row 4 (d) was wrong
  CNN BN            WB  ✓   —   —    ← row 5 (c)+(d) were misleading

Key finding "(b) only fires on terminal-LN architectures" is unchanged
and now visually clearer (rows 1-3 have WB in (b), rows 4-5 have ✓).

Page impact: total page count 19 → 18 (the more compact figure
reclaimed an entire page). §1-§7 main content still fits on 9 pages.

Updated docstring with full data sources for each row.

Co-Authored-By: Claude Opus 4.6 (1M context)

paper v2.32: BP+penalty multi-seeded (was single-seed s42)

2026-04-09T00:11:40+00:00

The §5 ¶3 BP+penalty value (0.530, +18.1 pp margin) was single-seed s42.
Ran s123 and s456 to multi-seed it, matching the BP-no-pen 3-seed control.

3-seed BP+pen 30ep results (lam=0.01, AdamW lr=1e-3 wd=0.01, cosine, batch 128):
  s42:  0.5303, +18.13 pp vs frozen
  s123: 0.5262, +17.72 pp
  s456: 0.5397, +19.07 pp
  3-seed mean: 0.5321 ± 0.0057, +18.31 pp
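
The summary line can be re-derived from the per-seed values. A sketch using
only the standard library; the population standard deviation (ddof=0)
reproduces the reported ± 0.0057, so that convention is inferred from the
numbers rather than stated anywhere in the repo.

```python
from statistics import fmean, pstdev

# Per-seed BP+pen 30ep results as listed above (s42, s123, s456).
acc = [0.5303, 0.5262, 0.5397]
margin_pp = [18.13, 17.72, 19.07]

mean_acc = fmean(acc)
std_acc = pstdev(acc)  # population std; inferred convention, see note above
mean_margin = fmean(margin_pp)

print(f"{mean_acc:.4f} ± {std_acc:.4f}, {mean_margin:+.2f} pp")
```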

Updates:
- §5 ¶3: BP+pen "0.530 (single seed)" → "0.532 ± 0.006" (3-seed)
- §5 ¶3: BP penalty cost -5.5 pp → -5.3 pp
- §5 ¶3: BP+pen margin +18.1 → +18.3 pp
- §5 ¶3: BP-to-DFA gap 17.0 → 17.2 pp
- §4 ¶4: BP+pen +18.1 → +18.3 pp comparison
- Figure 3 panel C bar values: BP with_pen 0.530 → 0.532
- Figure 3 panel C title: BP-pen-cost -5.5pp → -5.3pp

The +18.3 pp 3-seed mean is essentially the same as the s42 single-seed
+18.13 pp, so the headline conclusion (BP+pen far above frozen baseline,
huge gap vs DFA+pen) is unchanged. This commit removes the last
single-seed value labeled as a key control.

New auditable file: results/bp_with_penalty_3seed_summary.json

Page layout preserved: 9 pages main, refs p10, 0 overfull boxes.

Co-Authored-By: Claude Opus 4.6 (1M context)

paper v2.31.9: relabel "StudentNet" → "no-terminal-LN ResMLP"

2026-04-08T23:32:23+00:00

The §3 ¶3 / §5 ¶3 / Figure 5 / §7 mentions of "StudentNet" as a
cross-architecture validation case were a misleading rebrand of the
no-terminal-LN ResMLP-d256 ablation. Verified by tracing the data:

  results/protocol_audit/temporal_evolution_s{42,123,456}.json
    final_acc 0.332/0.313/0.336 (matches no-outln 3-seed 0.327±0.012)
    first_fire_a {18, 14, 25}
    first_fire_b None / None / None

The actual synth StudentNet (results/snapshot_synth_v1, d=128 alpha=1.0)
has max-per-block growth ~6.88 over 80 epochs and never reaches the
50× threshold, so diagnostic (a) does NOT fire on the real synth
StudentNet at all. Calling the no-outln data "StudentNet"
double-counted the same architecture under two names (as the
same-backbone causal control AND as the cross-arch generalization test).
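
A sketch of how a first-fire epoch for diagnostic (a) could be read off a
growth trajectory, assuming epochs are 1-indexed and the diagnostic fires once
growth reaches the 50× threshold. The function name and the >= comparison are
assumptions, and the trajectories below are toy inputs except for the 6.88
peak quoted above.

```python
def first_fire_epoch(growth_by_epoch, threshold=50.0):
    """Return the first 1-indexed epoch whose growth reaches the threshold.

    Returns None when the threshold is never reached, mirroring the
    first_fire_b None entries in the JSON summary.
    """
    for epoch, growth in enumerate(growth_by_epoch, start=1):
        if growth >= threshold:
            return epoch
    return None


# Toy trajectory that crosses 50x at its third epoch.
print(first_fire_epoch([1.0, 10.0, 60.0]))
# A trajectory peaking at ~6.88 never fires.
print(first_fire_epoch([1.0, 3.2, 6.88]))
```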

Relabeled to "no-terminal-LN ResMLP" everywhere it appeared:
- §3 ¶3 paragraph 1 cross-arch list
- §3 ¶3 paragraph 2 (now with explicit per-seed first-fire epochs {18,14,25})
- §5 paragraph (the conclusion)
- §7 conclusion (cross-arch list)
- Figure 5 caption
- Figure 5 row label (with re-rendered PDF)

The remaining cross-arch generalization claim is now: ViT-Mini fires
both diagnostics, ResMLP at d=256/d=512 fires both, no-terminal-LN
ResMLP and BatchNorm CNN fire only (a) — three real architecture
classes, with the no-LN ablation being the same-backbone control rather
than a separate architecture. The cross-arch story is slightly weaker
("3 architecture classes" not "4") but truthful and self-consistent.

Co-Authored-By: Claude Opus 4.6 (1M context)

paper v2.31: matched 30-epoch BP/DFA controls (was unsourced 0.609/0.308)

2026-04-08T23:03:16+00:00

The §5 ¶3 BP-no-penalty value of 0.609 ± 0.004 and DFA-no-penalty value
of 0.308 ± 0.014 turned out to be unsourced — they were carried over
from a hardcoded comment in experiments/bp_with_penalty_control.py
("BP-trainable (3-seed mean): 0.609") that nobody had actually measured
with a matched 30-epoch run.

Ran the missing matched controls under the same recipe as BP+pen
(lam=0, 30 epochs, AdamW 1e-3, wd 0.01, cosine schedule, batch 128,
3 seeds 42/123/456):

  BP no-pen 30ep: per-seed 0.5851, 0.5845, 0.5863  →  0.585 ± 0.001
                  (paper said 0.609 ± 0.004, off by 0.024)
  DFA no-pen 30ep: per-seed 0.3070, 0.2985, 0.2966 →  0.301 ± 0.005
                  (paper said 0.308 ± 0.014)
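
One way to keep the controls matched is to derive every run from a single
recipe object and vary only the penalty weight. A sketch with a hypothetical
Recipe dataclass; the field names are illustrative and do not come from
experiments/bp_with_penalty_control.py, and lam=0.01 is taken here as the
penalty weight of the with-penalty runs.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class Recipe:
    lam: float                    # penalty weight; 0.0 disables the penalty
    epochs: int = 30
    optimizer: str = "AdamW"
    lr: float = 1e-3
    weight_decay: float = 0.01
    schedule: str = "cosine"
    batch_size: int = 128
    seeds: tuple = (42, 123, 456)


with_penalty = Recipe(lam=0.01)
no_penalty = replace(with_penalty, lam=0.0)  # identical except for lam

# Fields on which the two recipes differ; should be lam only.
differing = [
    k for k, v in asdict(with_penalty).items() if asdict(no_penalty)[k] != v
]
print(differing)
```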

Also re-grounded DFA+penalty 30ep using the dfa_pen_short 3-seed run
(0.3593, 0.3610, 0.3604 → 0.360 ± 0.001), which is the run the
deep-cosine +0.155 figure was computed on. The paper had 0.363 ± 0.001,
which came from the 100-epoch run rather than the 30-epoch run, so
comparing it with BP+pen 30-ep was apples-to-oranges.

Paper changes (§5 ¶3):
  BP penalty cost:  -8 pp  →  -5.5 pp
  DFA pen rescue:   +5.5 → +5.9 pp
  DFA+pen margin vs frozen: +1.4 → +1.1 pp
  BP-to-DFA gap:     17 → 17.0 pp (unchanged)
  BP-to-SB gap:      7.7 → 7.7 pp (unchanged)
  The BP-to-DFA gap remains the basis for the lower-bound
  credit-quality cost claim.
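
The updated deltas follow directly from the rounded means. A sketch of the
arithmetic, assuming the BP+pen value is the 0.530 single-seed figure in use
at this commit and the frozen baseline is the 0.349 quoted elsewhere in this
log; the helper name pp is hypothetical.

```python
def pp(a, b):
    """Difference a - b in percentage points, rounded to one decimal."""
    return round((a - b) * 100.0, 1)


bp_no_pen, bp_pen = 0.585, 0.530
dfa_no_pen, dfa_pen = 0.301, 0.360
frozen = 0.349  # frozen baseline, taken from the Figure 4 fix notes

print("BP penalty cost:         ", pp(bp_pen, bp_no_pen))
print("DFA pen rescue:          ", pp(dfa_pen, dfa_no_pen))
print("DFA+pen margin vs frozen:", pp(dfa_pen, frozen))
print("BP-to-DFA gap:           ", pp(bp_pen, dfa_pen))
```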

Also updated:
- §5 ¶1 prose: 0.363 → 0.360, 0.308 → 0.301
- §4 ¶4 prose: DFA+pen 0.363 → 0.360
- Appendix J Table 9 caption: 0.363 → 0.360, +9.0 → +9.3 pp gap to SB
- Appendix L paragraph: +5.5 → +5.9 pp DFA penalty rescue
- Figure 3 panel C bar values + title pen-cost annotation
- New results/matched_30ep_control_summary.json as auditable record

Page layout preserved: 9 main pages + refs p10, 18 total, 0 overfull.

Co-Authored-By: Claude Opus 4.6 (1M context)

Figures 3 and 4: fix aspect ratio (fig3 was squeezed strip) and key-finding label overlap (fig4)

2026-04-08T19:35:20+00:00

Per user feedback:
- fig4_penalty_rescue.pdf (Figure 3 in paper): was figsize=(13, 3.5), aspect 3.7:1,
  which rendered as a thin strip with squeezed subplot content. Increased height
  to figsize=(13, 6.0), aspect 2.2:1. The taller panels now show axis labels and
  legends readably.
- fig5_cross_arch_summary.pdf (Figure 4 in paper): the 'Key finding' italic text
  annotation at y=-1.0 in axes transform was overlapping with the multiline
  architecture y-tick labels at the bottom of the second subplot. Moved to
  y=-1.55 and increased figsize height from 3.5 to 4.2 so the lower annotation
  still fits in bbox_inches='tight' crop.
- Also bumped includegraphics width from 0.92\linewidth to \linewidth for both
  figures so they use the full text width.
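
The aspect ratios quoted above are just width over height of the figsize
tuple. A small sketch of that arithmetic; the helper name is hypothetical.

```python
def aspect(figsize):
    """Width-to-height ratio of a (width, height) figsize, one decimal."""
    width, height = figsize
    return round(width / height, 1)


before = aspect((13, 3.5))  # the squeezed strip
after = aspect((13, 6.0))   # the taller layout
print(f"{before}:1 -> {after}:1")
```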

Main content still exactly 9 pages within E&D budget.

Co-Authored-By: Claude Opus 4.6 (1M context)

Fill in tables 1-3 + generate figures 2/4/5 from existing data

2026-04-08T09:46:59+00:00

Tables filled with real values:
  Table 1: 5-method audit (3-seed mean ± std for acc, headline Γ, verdict)
  Table 2: 4-condition mode 2 validation (cos and ρ values from existing
           checkpoint measurements)
  Table 3: protocol thresholds (50×, 1e-7, 0.30, 2pp)

Figures generated from existing data:
  fig2_decision_utility.pdf: 5×7 verdict heatmap from
    results/protocol_audit/ablation_decision_utility.json
  fig4_penalty_rescue.pdf: 3-panel — trajectory + cos/ρ bars + 2×2 acc
    from snapshot_evolution_v2 + dfa_residual_penalty + bp_with_penalty
  fig5_cross_arch_summary.pdf: 5×4 BP/DFA verdict matrix across
    architectures

Compiles to 8 pages with all tables/figures rendered. The §1-§7 main
body still has only paragraph topic sentences (TODO: per-section prose
filling via codex). Figure numbering is wrong: codex put figures in
section order, not numerical order (needs fixing).

v2 skeleton from round 25: section structure now matches round 23

2026-04-08T09:43:39+00:00

Round 24's skeleton had 3 deviations from round 23 redo:
  - Made §3 'Diagnostic Protocol' instead of 'Failure Mode 1'
  - Collapsed Mode 1 + Mode 2 into one §4
  - Added §6 'Reference Implementation' (was supposed to be dropped)

Round 25 fixed all three. New §3-§7 match round 23 redo exactly:
  §3 Failure Mode 1: Measurement Degeneracy
  §4 Failure Mode 2: Low Intrinsic Credit-Direction Quality
  §5 Intervention and Cross-Architecture Evidence
  §6 Recommended FA Evaluation Protocol
  §7 Discussion, Limits, Conclusion

Also added:
  - In-line bibliography with 12 \bibitem entries (Paleka, O'Bray, Jordan
    + FA literature) — citations resolve correctly now
  - Appendices A-G with actual prose content (not just headers)
  - 7-pitfall catalog with descriptions
  - Walk-back chain methodology paragraph
  - 7-validation summary table

Compiles to 9 pages with figures 1+3 inline (existing PNGs) and figures
2/4/5 as placeholder text PDFs (TODO: regenerate). Tables 1/2/3 still
have TODO placeholders for numerical values.

Next: fill in tables 1-3 with existing JSON data, generate figures 2/4/5
from existing data, then consult codex per-section for prose filling.