faeval.git/paper, branch master

Add new experiment scripts, figures, and paper assets; untrack pyc/build artifacts

2026-06-14T09:06:32+00:00

Co-Authored-By: Claude Opus 4.8 (1M context)

paper v2.39: restructure §1-§7 into subsection hierarchy (codex proposal)

2026-04-09T04:03:58+00:00

Per codex consultation, refactored §1-§7 from "section + paragraph
headers" into a proper \subsection{} hierarchy. 15 subsections total.

Key decisions (from codex):
- §3 (Mode 1) and §4 (Mode 2) kept SEPARATE, not merged. Avoids
  overcommitting to the Mode 2 → Mode 1 hypothesis too early in the
  narrative.
- §5 (Intervention + Cross-arch) kept as one section with two subsections.
  Splitting would create thin sections.
- Moved old §4 ¶4 "Method-dependent severity..." into new §5.1 "Penalty
  rescue and sweep" — it's intervention-stage evidence about method
  severity after Mode 1 is alleviated, fits with rescue/sweep/cost
  evidence rather than core Mode 2 identification.

New structure:

  §1 Introduction
    1.1 Standard FA reporting
    1.2 Two failure modes and contributions
  §2 Audit: Standard Reporting Walks Back Nothing
    2.1 Audit setup and probes
    2.2 The status-quo reading fails
  §3 Failure Mode 1: Measurement Degeneracy
    3.1 Mode 1 signatures
    3.2 Terminal-LN control
  §4 Failure Mode 2: Low Intrinsic Credit-Direction Quality
    4.1 Mode 2 under valid measurement
    4.2 Functional triangulation [paper's strongest new claim]
    4.3 From Mode 2 to Mode 1? [the causal hypothesis]
  §5 Intervention and Cross-Architecture Evidence
    5.1 Penalty rescue and sweep [absorbed old §4 ¶4]
    5.2 Cost and transfer
  §6 Recommended FA Evaluation Protocol
    6.1 Validity-first screening
    6.2 Diagnostic roles and calibration
  §7 Discussion, Limits, Conclusion
    7.1 Scope and reporting recommendation
    7.2 Open questions

All \paragraph{} inline bold headers from §1-§7 were stripped since the
\subsection{} hierarchy now carries the structure. Appendix A and B
(Pitfalls Catalog) retain their \paragraph{} headers since they're
structured by pitfall/concept rather than subsection.

Page count: 20 (unchanged). 0 errors, 0 overfull boxes.

Co-Authored-By: Claude Opus 4.6 (1M context)

paper v2.38: ddof=1 statistical convention sweep (sample std with Bessel)

2026-04-09T03:43:52+00:00

User picked option B: convert all 3-seed std values across the paper
from ddof=0 (population) to ddof=1 (Bessel-corrected sample std).

Bessel correction for n=3: the ddof=1 std is √(3/2) ≈ 1.22× the
ddof=0 std, i.e. about 22% larger per value.
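
A minimal sketch of the conversion, assuming Python's standard
statistics module. The per-seed values are hypothetical and
bessel_factor is an illustrative helper, not part of the repository.
Scaling an already-rounded ddof=0 value by the factor need not
reproduce the reported ddof=1 value, so recomputing from the per-seed
values is the safer route.

```python
import math
import statistics


def bessel_factor(n: int) -> float:
    """Ratio of the ddof=1 (sample) std to the ddof=0 (population) std."""
    if n < 2:
        raise ValueError("sample std needs at least two values")
    return math.sqrt(n / (n - 1))


# Hypothetical per-seed accuracies, for illustration only
# (NOT values taken from the paper).
seeds = [0.30, 0.31, 0.33]

pop_std = statistics.pstdev(seeds)    # ddof=0, population std
sample_std = statistics.stdev(seeds)  # ddof=1, Bessel-corrected std

print(f"factor for n=3: {bessel_factor(3):.4f}")
print(f"ddof=0: {pop_std:.4f}  ddof=1: {sample_std:.4f}")
```

The two estimators differ only by the constant factor for a fixed n,
which is why the means and per-seed values are untouched by the sweep.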

49 ± value replacements across §1, §2, §4, §5, Tables 1+2+9, Appendix
H, Appendix L, and intermediate prose. Major updates:

  Table 1 (5-method audit accuracies):
    BP   0.615 ± 0.003 → 0.615 ± 0.004
    EP   0.316 ± 0.030 → 0.316 ± 0.037
    DFA  0.306 ± 0.006 → 0.306 ± 0.008
    SB   0.205 ± 0.032 → 0.205 ± 0.039
    CB   0.289 ± 0.026 → 0.289 ± 0.031

  Frozen baseline:
    0.349 ± 0.002 → 0.349 ± 0.003 (4 occurrences)

  §5 matched 30-ep controls:
    BP no-pen 0.585 ± 0.001 stays (ddof=1 = 0.0009, rounds to 0.001)
    BP+pen    0.532 ± 0.006 → 0.532 ± 0.007
    DFA no-pen 0.301 ± 0.005 → 0.301 ± 0.006
    DFA+pen   0.360 ± 0.001 → 0.360 ± 0.002
    SB+pen    0.453 ± 0.003 stays (0.0030 → 0.003)
    CB+pen    0.360 ± 0.003 → 0.360 ± 0.004

  §4 ¶4 cosines:
    SB+pen cos +0.322 ± 0.007 → +0.322 ± 0.008
    CB+pen cos +0.679 ± 0.008 → +0.679 ± 0.010
    DFA+pen cos +0.151 ± 0.025 stays (pooled n=12 ddof=1=0.0247)

  §4 ¶4 perturbation rho:
    SB+pen rho +0.402 ± 0.015 → +0.402 ± 0.019
    CB+pen rho +0.464 ± 0.025 → +0.464 ± 0.030
    DFA+pen rho +0.080 ± 0.011 → +0.080 ± 0.012

  §4 ¶4 nudging test:
    SB  -1.93 ± 0.11×10^-3 → -1.93 ± 0.14×10^-3
    CB  -4.26 ± 0.24×10^-4 → -4.26 ± 0.29×10^-4
    DFA -4.98 ± 0.44×10^-5 → -4.98 ± 0.53×10^-5

  §4 ¶4 training loss decrease:
    SB  -0.447 ± 0.008 → -0.447 ± 0.010
    CB  -0.121 ± 0.003 stays
    DFA -0.095 ± 0.007 → -0.095 ± 0.008

  §4 ¶1 vanilla DFA early-epoch deep cos / rho:
    -0.008 ± 0.013 → -0.008 ± 0.016
    -0.018 ± 0.018 → -0.018 ± 0.017
    -0.003 ± 0.005 → -0.003 ± 0.004

  Appendix H L=4 d=512 3-seed:
    DFA layer-0 +0.412 ± 0.011 → +0.412 ± 0.013
    DFA deep -0.0004 ± 0.0008 → -0.0004 ± 0.0009
    CB deep +0.039 ± 0.010 → +0.039 ± 0.012

  Appendix J Table 9 std rows updated to ddof=1 means.

  Appendix L drift values:
    DFA+pen w2 18.6 ± 0.5 → 18.6 ± 0.6
    DFA+pen embed 94.6 ± 1.4 → 94.6 ± 1.8
    (others unchanged within rounding)

The §3 ¶3 no-outln value 0.327 ± 0.012 was already ddof=1 (was the
single inconsistent place that motivated this sweep). Now the entire
paper uses ddof=1 consistently.

All means and per-seed values are unchanged (only the std reported
changes). Headline science conclusions all unchanged. Page count
20 (unchanged).

Co-Authored-By: Claude Opus 4.6 (1M context)

paper v2.37.3: add paragraph headers to Appendix A

2026-04-09T03:21:44+00:00

Appendix A (Reference Implementation) was the only main section
without \paragraph{} headers. Added two:

  - Release scope (existing ¶1: what the release contains)
  - Repository organization (existing ¶2: how the code is structured)

Now every multi-paragraph section in the paper has explicit paragraph
headers for skim-readability. Other appendices either are single-paragraph
already (so don't need headers) or have explicit \paragraph{Pitfall N:}
structure (Appendix B).

Page count: 20 (unchanged).

Co-Authored-By: Claude Opus 4.6 (1M context)

paper v2.37.2: §7 ¶2 soften the reference-impl extensibility claim

2026-04-09T02:50:39+00:00

The v2.37 §7 ¶2 said "the protocol code in Appendix A is structured
to make these extensions a configuration change rather than a new
experimental design." On re-reading Appendix A, this overclaims —
Appendix A describes the reference implementation as organized around
4 claim types but doesn't explicitly say "extensions are configuration
changes". The "configuration change rather than new experimental
design" framing is aspirational, not verified.

Softened to: "The reference implementation in Appendix A is intended
to support such extensions at the level of training-recipe and
architecture-class configuration so the audit pipeline itself does
not need to be re-derived."

Now the §7 forward-looking claim only commits to what Appendix A
actually describes (claim-organized, four artifact types) plus a
modest "intended to support" caveat.

Page count: 20 (unchanged).

Co-Authored-By: Claude Opus 4.6 (1M context)

paper v2.37.1: abstract mentions nudging + training-loss confirmation

2026-04-09T02:21:22+00:00

Earlier (during the page-budget-constrained polish loop) I tried to add
the nudging-test mention to the abstract but had to revert because it
pushed §7 onto p10. With page budget relaxed, re-attempting the update.

Old abstract sentence about Mode 2 dissociation:
  "...while Credit Bridge attains much higher deep BP cosine than DFA
  at the same final accuracy, a dissociation that motivates reporting
  layerwise credit quality jointly with a depth-utilization baseline."

New abstract sentence:
  "...while Credit Bridge attains roughly 4× DFA's deep BP cosine yet
  matches DFA's accuracy—a dissociation that single-step nudging and
  integrated training-loss decrease both confirm against the reverse
  cosine ordering, and that motivates reporting layerwise credit quality
  jointly with a depth-utilization baseline."

This now references the v2.33 functional triangulation in the abstract,
matching the §4 main-text framing. A reader of just the abstract now
sees the strongest form of the cos-vs-acc dissociation: it's not just
"CB has higher cos but same acc" (which could be a noisy single
measurement) but "three independent functional metrics rank the
methods opposite to deep cosine".

Page count: 20 (unchanged).

Co-Authored-By: Claude Opus 4.6 (1M context)

paper v2.37: §7 add 'Open questions and concrete next experiments'

2026-04-09T01:51:04+00:00

§7 currently has only the Scope/limits/recommendation paragraph.
Adding a second paragraph that explicitly flags the Mode 2 → Mode 1
hypothesis status as an open question and proposes two concrete
falsification tests, plus a wider-scope replication path.

The new paragraph:

1. Acknowledges the Mode 2 → Mode 1 causal reading is a hypothesis,
   not a theorem, and that the parallel-failure reading is also
   formally consistent with the data.

2. Proposes a *direct* test: measure per-block forward-state-change
   content along the training trajectory and check whether per-block
   loss decrease tracks per-block credit usefulness more tightly than
   per-block cosine.

3. Proposes a *falsification* test for the downstream-of-Mode-2 reading:
   substitute the random B_l with a high-quality credit signal (sparse,
   learned, or weight-transport-restored à la Akrout 2019) at fixed
   ‖f_l‖ and check whether Mode 1 activation growth still appears. If
   yes, Mode 1 is NOT downstream of Mode 2.

4. Notes the wider-scope replication path: CIFAR-100, Tiny-ImageNet,
   architectures outside ResMLP/ViT-Mini, with a pointer to Appendix A
   as the structured configuration entry point.

This explicitly answers the reviewer question "what would falsify
your hypothesis?" without overclaiming. It positions the paper as
honest about open questions and points at concrete next steps.

Page count: 20 (unchanged) — the paragraph fit within the existing
slack.

Co-Authored-By: Claude Opus 4.6 (1M context)

paper v2.36.1: extend \paragraph{} headers to §1, §2, §7

2026-04-09T01:23:54+00:00

v2.36 added paragraph headers to §3-§6. Extending the same treatment
to §1, §2, and §7 for consistent skim-readability across the whole
main content.

§1 (Introduction):
  - Feedback alignment and the standard reporting pair
  - The standard pair fails to validate
  - Two failure modes and their separability
  - Contribution: a methodology paper, not a new FA variant

§2 (Audit):
  - Setup: 5-method audit on a 4-block pre-LayerNorm ResMLP
  - State Bridge and Credit Bridge: diagnostic probes constructed for this paper
  - Status-quo reading: every method looks acceptable
  - EP as the internal control: low accuracy without invalid measurement
  - Frozen-blocks baseline overturns the status-quo reading

§7 (Discussion):
  - Scope, limits, and reporting recommendation

All §1-§7 main content paragraphs now have inline bold headers
naming what each paragraph argues for.

Page count: 20 (unchanged from v2.36) — paragraph headers don't take
much extra space, and the layout already had ~1 line of slack.

Co-Authored-By: Claude Opus 4.6 (1M context)

paper v2.36: add \paragraph{} headers throughout §3-§6 for readability

2026-04-09T01:22:05+00:00

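User feedback (translated from Chinese): "Can you break this into
subsections or even subsubsections? Or at least paragraphs. Right now
it doesn't read smoothly because the blocks look too big."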

The recent v2.34-v2.35 expansions made several §4 paragraphs very long
and dense. The user asked for visible structure (subsection /
subsubsection / at least paragraph headers).

Added \paragraph{} bold inline headers throughout §3-§6:

§3 (Mode 1):
  - The two parts of Mode 1
  - Falsification chain: four alternative attributions
  - Causal control: removing terminal LayerNorm on the same backbone

§4 (Mode 2):
  - Mode 2 is present even when measurement is meaningful
  - A second metric with different failure modes agrees
  - Per-layer reporting is mandatory: layer-0 dominance
  - Method-dependent severity once Mode 1 is alleviated
  - Three functional metrics rank the methods consistently; cosine disagrees
  - A three-part proposition: observation, inference, mechanism hypothesis
  - Mode 1 may be a downstream symptom of Mode 2
  - Hypothesis status and reporting rule

§5 (Intervention):
  - The penalty rescues the measurement regime
  - Penalty alleviates Mode 2 only partially; the λ sweep separates the modes
  - Capacity-cost control: BP under the same penalty
  - Cross-architecture and depth-sweep evidence

§6 (Protocol):
  - Start from measurement validity
  - Decision value: which diagnostics actually walk back which methods
  - Diagnostic roles and calibration

Each paragraph now has a bold inline header that names what it argues
for, so a reader skimming the §3-§6 mainline can navigate by header
rather than parsing dense prose blocks.

Used \paragraph{} (not \subsection{}) because:
1. \subsection{} would renumber the TOC and add another level of
   heading depth
2. \paragraph{} is the standard LaTeX inline-bold header that's
   visually distinct without breaking the section structure
3. Doesn't affect the figure/table numbering

Page impact: total pages 19 → 20 (paragraph headers add ~1 line per
header). Per the user's relaxed page budget, this is acceptable. §7
still starts on p10, references on p10-12.

Co-Authored-By: Claude Opus 4.6 (1M context)

paper v2.35: add Figure 2 - cross-method cos-vs-accuracy dissociation

2026-04-09T01:17:43+00:00

User said "you don't need to worry about page count for now", which
freed up the page budget for substantive additions. Highest-yield
substantive addition: a visual figure for the §4 ¶4 cross-method
dissociation that the user previously flagged as the paper's
strongest new observation but is currently text-only.

New figure: paper/figures/fig_cos_acc_dissociation.pdf
- Parallel-coordinates / slope-chart style
- 4 columns: deep cos | accuracy | |nudging| | training-loss decrease
- 3 lines: SB+pen (blue), CB+pen (red), DFA+pen (gray)
- Each metric normalized to [0, 1] with raw values annotated
- Shaded "cos: CB top" region on the left vs labeled
  "accuracy / nudging / training-loss: SB top" on the right
- The X-pattern between cos and accuracy makes the dissociation
  visually immediate: SB rises from middle (cos) to top (functional),
  CB falls from top (cos) to tied with DFA (functional)
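
The normalization and the resulting orderings can be sketched as
follows. This is an illustration under stated assumptions, not the
plotting script itself: min-max scaling is assumed as the "[0, 1]"
normalization, the means are the ones quoted in the v2.38 message,
and the signed nudging and training-loss values are taken as
magnitudes. The names metrics, min_max, normalized and top are
hypothetical.

```python
# Means as quoted in the v2.38 commit message. Nudging and training-loss
# decrease are reported there as negative numbers; magnitudes are used here
# (an assumption about how the figure orients these axes).
metrics = {
    "deep cos": {"SB+pen": 0.322, "CB+pen": 0.679, "DFA+pen": 0.151},
    "accuracy": {"SB+pen": 0.453, "CB+pen": 0.360, "DFA+pen": 0.360},
    "|nudging|": {"SB+pen": 1.93e-3, "CB+pen": 4.26e-4, "DFA+pen": 4.98e-5},
    "training-loss decrease": {"SB+pen": 0.447, "CB+pen": 0.121, "DFA+pen": 0.095},
}


def min_max(values):
    """Scale one metric's per-method values to [0, 1] (assumed normalization)."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    # If all methods tie, place them at the midpoint instead of dividing by zero.
    return {k: (v - lo) / span if span else 0.5 for k, v in values.items()}


normalized = {name: min_max(vals) for name, vals in metrics.items()}

# Method with the largest value on each axis.
top = {name: max(vals, key=vals.get) for name, vals in metrics.items()}

for name, vals in normalized.items():
    row = ", ".join(f"{m}={v:.2f}" for m, v in vals.items())
    print(f"{name:>24}: {row}  (top: {top[name]})")
```

Under these assumptions CB+pen is on top only on the deep-cos axis and
SB+pen is on top on the three functional axes, which is the crossing
the figure is meant to show.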

Inserted between §4 ¶4 (Mode 2 mechanism) and §5 (intervention).
Referenced from the §4 ¶4 functional measurements paragraph as
"Figure 2".

Why the figure takes over the burden of proof from the prose-only
argument: the X-pattern can be read at a glance instead of parsed from
a paragraph. Reviewers will see that deep cosine ranks the methods
differently from the 3 functional metrics without needing to track the
numbers.

Important design choice: did NOT include deep ρ in the figure, even
though it's in §4 ¶2, because ρ ranks CB > SB > DFA (same as cos),
not the SB > CB > DFA pattern of the functional metrics. ρ groups
with cos as a "directional alignment" metric, while the functional
triad (accuracy, nudging, training-loss) groups around forward-state
usefulness. The figure caption notes this distinction implicitly
by listing only the three functional metrics.

Page impact: total 18 → 19 pages, main content §1-§7 now spans
p1-p10 (was p1-p9). Per user's relaxed constraint, page count is no
longer the binding constraint. Adding the figure auto-shifts the figure
numbering: cos_acc_dissoc is now Figure 2, temporal_cross_arch
becomes Figure 3, penalty_rescue → Figure 4, cross_arch_summary
→ Figure 5. All figure references use \ref{} so they auto-update.

Co-Authored-By: Claude Opus 4.6 (1M context)