{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Reproduce all figures and tables\n", "\n", "**Paper**: *Beyond Accuracy and Alignment: A Diagnostic Protocol for Evaluating Feedback Alignment* (NeurIPS 2026 E&D track)\n", "\n", "This notebook walks through reproducing every figure and table in the paper from the saved JSON / log files in `results/`. Every cell pulls from a saved file (no training is invoked), so the entire notebook runs in seconds and serves as the auditable single-source-of-truth pointer for each cited number.\n", "\n", "**What this notebook reproduces**:\n", "- Table 1 (5-method audit accuracies)\n", "- Table 2 (mode validation)\n", "- Table 3 (protocol definition — static, no data)\n", "- Table 5 (depth sweep, Appendix H)\n", "- Table 6 (no-residual ablation, Appendix H)\n", "- Table 9 (SB+CB penalty rescue, Appendix J)\n", "- Figure 1 (audit hero) — references the saved figure file\n", "- Figure 2 (cross-method dissociation) — re-renders from saved JSON\n", "- Figure 3 (temporal cross-arch) — references saved\n", "- Figure 4 (penalty rescue) — re-renders from saved JSON\n", "- Figure 5 (cross-arch verdict matrix) — re-renders from hand-encoded data\n", "- §4 ¶4 cross-method functional triangulation (nudging + training-loss decrease)\n", "- §5 ¶3 BP+penalty 2x2 control\n", "- Appendix L drift values\n", "- Appendix M layer-0 dominance per-seed table\n", "\n", "**What this notebook does NOT do**:\n", "- Re-train any model (use the experiment scripts in `experiments/` for that)\n", "- Re-measure cosines on saved checkpoints (use `experiments/measure_direction_quality_existing_ckpt.py`)\n", "\n", "All values that the paper cites are derived in cells below by loading the corresponding `results/*.json` file." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import json\n", "import numpy as np\n", "from pathlib import Path\n", "import os\n", "\n", "# Absolute path of the original checkout; change this to your local clone.\n", "REPO_ROOT = Path('/home/yurenh2/fa')\n", "os.chdir(REPO_ROOT)\n", "\n", "def both_stds(vals):\n", "    \"\"\"Return (mean, ddof=0 std, ddof=1 std) for a list of measurements.\n", "\n", "    The paper uses ddof=1 (sample std with Bessel correction).\n", "    \"\"\"\n", "    return np.mean(vals), np.std(vals, ddof=0), np.std(vals, ddof=1)\n", "\n", "def load_json(rel):\n", "    \"\"\"Load a JSON file given its path relative to REPO_ROOT.\"\"\"\n", "    with open(REPO_ROOT / rel) as fh:\n", "        return json.load(fh)\n", "\n", "print('repo root:', REPO_ROOT)\n", "print('saved auditable files:')\n", "for f in sorted((REPO_ROOT / 'results').glob('*.json')):\n", "    print(' ', f.name)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table 1 — 5-method audit on the 4-block d=256 pre-LayerNorm ResMLP\n", "\n", "**Source**: `results/protocol_audit/audit_table_s42_s123_s456.json` (3 seeds × 5 methods)\n", "\n", "Each row of the paper's table shows test accuracy ± sample std (ddof=1) and headline Γ at the converged 100-epoch checkpoint; the cell below reproduces the accuracy column and lists the per-seed accuracies." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = load_json('results/protocol_audit/audit_table_s42_s123_s456.json')\n", "print(f'{\"method\":<14} {\"acc (±ddof=1)\":<18} {\"per-seed accs\"}')\n", "print('-' * 70)\n", "for m, label in [('bp', 'BP'), ('ep', 'EP'), ('dfa', 'DFA'),\n", "                 ('state_bridge', 'State Bridge'), ('credit_bridge', 'Credit Bridge')]:\n", "    accs = [d['reports'][f'{m}_s{s}']['headline_acc'] for s in [42, 123, 456]]\n", "    mean, _, ddof1 = both_stds(accs)\n", "    print(f'{label:<14} {mean:.3f} ± {ddof1:.3f} {[f\"{a:.4f}\" for a in accs]}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Frozen-blocks baseline\n", "\n", "**Source**: `results/resmlp_frozen_blocks_s{42,123,456}.log`\n", "\n", "DFA-shallow accuracy (the architecture-matched baseline used as the comparison for diagnostic (d))."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import re\n", "shallow = []\n", "for s in [42, 123, 456]:\n", "    log = (REPO_ROOT / f'results/resmlp_frozen_blocks_s{s}.log').read_text()\n", "    m = re.search(r'FINAL DFA-shallow: (\\d+\\.\\d+)', log)\n", "    # Fail loudly rather than silently averaging over fewer than 3 seeds.\n", "    if m is None:\n", "        raise ValueError(f'no FINAL DFA-shallow line found in the s{s} log')\n", "    shallow.append(float(m.group(1)))\n", "mean, _, ddof1 = both_stds(shallow)\n", "print(f'DFA-shallow (frozen baseline): {mean:.3f} ± {ddof1:.3f}')\n", "print(f' per-seed: {shallow}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## §5 — matched 30-epoch BP/DFA controls (with and without penalty)\n", "\n", "**Sources**:\n", "- BP no-pen: `results/bp_no_penalty_30ep/bp_pen_lam0.0_s{42,123,456}.json`\n", "- BP+pen: `results/bp_with_penalty/bp_pen_lam0.01_s{42,123,456}.json`\n", "- DFA no-pen: `results/dfa_no_penalty_30ep/results_cifar10.json`\n", "- DFA+pen: `results/dfa_pen_short/dfa_pen_lam0.01_s{42,123,456}.json`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# BP no-pen\n", "bp_nopen = [load_json(f'results/bp_no_penalty_30ep/bp_pen_lam0.0_s{s}.json')['final_acc'] for s in [42, 123, 456]]\n", "mean, _, ddof1 = both_stds(bp_nopen)\n", "print(f'BP no-pen 30ep: {mean:.3f} ± {ddof1:.3f}')\n", "\n", "# BP+pen\n", "bp_pen = [load_json(f'results/bp_with_penalty/bp_pen_lam0.01_s{s}.json')['final_acc'] for s in [42, 123, 456]]\n", "mean, _, ddof1 = both_stds(bp_pen)\n", "print(f'BP+pen 30ep: {mean:.3f} ± {ddof1:.3f}')\n", "\n", "# DFA no-pen\n", "d = load_json('results/dfa_no_penalty_30ep/results_cifar10.json')\n", "dfa_nopen = [d[str(s)]['dfa']['log']['test_acc'][-1] for s in [42, 123, 456]]\n", "mean, _, ddof1 = both_stds(dfa_nopen)\n", "print(f'DFA no-pen 30ep: {mean:.3f} ± {ddof1:.3f}')\n", "\n", "# DFA+pen\n", "dfa_pen = [load_json(f'results/dfa_pen_short/dfa_pen_lam0.01_s{s}.json')['final_test_acc'] for s in [42, 123, 456]]\n", "mean, _, ddof1 = both_stds(dfa_pen)\n", "print(f'DFA+pen 30ep: {mean:.3f}
± {ddof1:.3f}')\n", "\n", "# Penalty cost / margin math\n", "# Frozen-blocks (DFA-shallow) baseline accuracy, as reported by the frozen-blocks cell above.\n", "frozen = 0.349\n", "print()\n", "print('§5 ¶3 derived quantities:')\n", "print(f' BP penalty cost: {(np.mean(bp_nopen) - np.mean(bp_pen))*100:.1f} pp')\n", "print(f' DFA penalty rescue: {(np.mean(dfa_pen) - np.mean(dfa_nopen))*100:.1f} pp')\n", "print(f' BP+pen margin vs frozen: {(np.mean(bp_pen) - frozen)*100:.1f} pp')\n", "print(f' DFA+pen margin vs frozen: {(np.mean(dfa_pen) - frozen)*100:.1f} pp')\n", "print(f' BP-to-DFA gap (under penalty): {(np.mean(bp_pen) - np.mean(dfa_pen))*100:.1f} pp')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## §4 ¶4 — SB+pen, CB+pen, DFA+pen accuracies, cosines, ρ" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# (results directory, seed key) pairs for each method\n", "files_by_method = {\n", "    'state_bridge': [\n", "        ('round38_sbcb_penalty_30ep', '42'),\n", "        ('round38_sb_penalty_30ep_s123', '123'),\n", "        ('round38_sb_penalty_30ep_s456', '456'),\n", "    ],\n", "    'credit_bridge': [\n", "        ('round38_sbcb_penalty_30ep', '42'),\n", "        ('round38_cb_penalty_30ep_s123', '123'),\n", "        ('round38_cb_penalty_30ep_s456', '456'),\n", "    ],\n", "    'dfa': [\n", "        ('round41_dfa_penalty_30ep', '42'),\n", "        ('round41_dfa_penalty_30ep_s123', '123'),\n", "        ('round41_dfa_penalty_30ep_s456', '456'),\n", "    ],\n", "}\n", "\n", "labels = {'state_bridge': 'SB+pen', 'credit_bridge': 'CB+pen', 'dfa': 'DFA+pen'}\n", "for m, files in files_by_method.items():\n", "    accs, cos_deep, rho_deep = [], [], []\n", "    for tag, sk in files:\n", "        d = load_json(f'results/{tag}/results_cifar10.json')\n", "        accs.append(d[sk][m]['log']['test_acc'][-1])\n", "        diag = d[sk][m]['diagnostics']\n", "        # [1:] skips layer 0, so the means cover the deep layers only\n", "        cos_deep.append(np.mean(diag['bp_cosine'][1:]))\n", "        rho_deep.append(np.mean(diag['perturbation_rho'][1:]))\n", "    a_m, _, a_s = both_stds(accs)\n", "    c_m, _, c_s = both_stds(cos_deep)\n", "    r_m, _, r_s = both_stds(rho_deep)\n", "    print(f'{labels[m]:<8} acc {a_m:.3f}±{a_s:.3f} cos {c_m:+.3f}±{c_s:.3f} rho
{r_m:+.3f}±{r_s:.3f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## §4 ¶4 — Nudging test 3-seed (the strongest functional metric)\n", "\n", "**Source**: `results/nudging_test_3seed_summary.json`\n", "\n", "Single-step loss change for a step of size η=0.01 along the per-layer credit direction at the converged checkpoint, averaged over the deep blocks (l1+)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n = load_json('results/nudging_test_3seed_summary.json')\n", "for m, label in [('state_bridge', 'SB+pen'), ('credit_bridge', 'CB+pen'), ('dfa', 'DFA+pen')]:\n", "    vals = [v['deep_mean'] for v in n['methods'][m]['per_seed'].values()]\n", "    mean, _, ddof1 = both_stds(vals)\n", "    print(f'{label:<8}: {mean:.2e} ± {ddof1:.2e} (per seed: {[f\"{v:.2e}\" for v in vals]})')\n", "\n", "sb = n['methods']['state_bridge']['three_seed_deep_mean']\n", "cb = n['methods']['credit_bridge']['three_seed_deep_mean']\n", "dfa = n['methods']['dfa']['three_seed_deep_mean']\n", "print()\n", "print(f'SB / CB ratio: {sb / cb:.2f}')\n", "print(f'SB / DFA ratio: {sb / dfa:.2f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## §4 ¶4 — Training loss decrease 3-seed\n", "\n", "**Source**: `results/training_loss_decrease_3seed.json`\n", "\n", "Loss[ep1] − Loss[ep30] for each method, averaged over 3 seeds."
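, "\n",
"As a worked form of the quantity above (the symbols are notation assumed here, not taken from the paper): write $L_s^{(e)}$ for the training loss of seed $s$ at epoch $e$, and $\\Delta_s = L_s^{(1)} - L_s^{(30)}$ for the per-seed decrease, assuming the stored `decrease` field is exactly this difference. With $s$ ranging over the seeds 42, 123, 456, the reported mean and ddof=1 sample std are then:\n",
"\n",
"$$\\bar{\\Delta} = \\frac{1}{3} \\sum_{s} \\Delta_s, \\qquad \\sigma_{\\Delta} = \\sqrt{\\frac{1}{3-1} \\sum_{s} \\left( \\Delta_s - \\bar{\\Delta} \\right)^2}$$"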
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "t = load_json('results/training_loss_decrease_3seed.json')\n", "for m, label in [('state_bridge', 'SB+pen'), ('credit_bridge', 'CB+pen'), ('dfa', 'DFA+pen')]:\n", "    vals = [v['decrease'] for v in t['per_method'][m]['per_seed'].values()]\n", "    mean, _, ddof1 = both_stds(vals)\n", "    print(f'{label:<8}: {mean:.4f} ± {ddof1:.4f} (per seed: {[f\"{v:.4f}\" for v in vals]})')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Appendix M — vanilla DFA early-epoch per-layer cosines (layer-0 dominance)\n", "\n", "**Source**: `results/vanilla_dfa_early_ckpts/per_layer_cos_3seed.json`\n", "\n", "Per-seed × per-epoch × per-layer cosine measurements showing that the headline Γ on vanilla DFA is driven entirely by layer 0, with all deep layers (1-4) at noise." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = load_json('results/vanilla_dfa_early_ckpts/per_layer_cos_3seed.json')\n", "print(f'{\"key\":<12} {\"l0\":>8} {\"l1\":>8} {\"l2\":>8} {\"l3\":>8} {\"l4\":>8} {\"||g_2||\"}')\n", "for k, v in d.items():\n", "    cos = v['per_layer_cos']\n", "    g2 = v['per_layer_g_norm_median'][2]\n", "    print(f'{k:<12} ' + ' '.join(f'{c:+8.3f}' for c in cos) + f' {g2:.2e}')\n", "\n", "# Aggregate stats\n", "ep1 = [np.mean(d[f's{s}_ep1']['per_layer_cos'][1:]) for s in [42, 123, 456]]\n", "mean, _, ddof1 = both_stds(ep1)\n", "print(f'\\nep 1 deep mean (3-seed): {mean:.4f} ± {ddof1:.4f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## §6 ¶1 — protocol calibration gaps (for the 4-diagnostic protocol)\n", "\n", "**Source**: `results/protocol_audit/audit_table_s42_s123_s456.json`\n", "\n", "The 24,338× and 63× gaps between healthy (BP/EP) and degenerate (DFA/SB/CB) reference quantities."
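, "\n",
"Read off the computation in the cell below, the two gap factors can be written as follows. The symbols are notation assumed here, not taken from the paper: $g_L$ is the last entry of `bp_grad_norms`, $r_i$ is the $i$-th entry of `residual_norms`, $H$ is the set of BP/EP runs and $D$ the set of DFA/SB/CB runs, each over the 3 seeds.\n",
"\n",
"$$\\text{gap}_{g} = \\frac{\\min_{H} \\lVert g_L \\rVert}{\\max_{D} \\lVert g_L \\rVert}, \\qquad \\text{gap}_{\\text{growth}} = \\frac{\\min_{D} \\max_i \\left( r_{i+1} / r_i \\right)}{\\max_{H} \\max_i \\left( r_{i+1} / r_i \\right)}$$\n",
"\n",
"Each factor compares the closest pair of runs across the two groups, so a value above 1 means the healthy and degenerate groups do not overlap on that quantity."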
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "d = load_json('results/protocol_audit/audit_table_s42_s123_s456.json')\n", "\n", "# Per-seed g_L (deepest BP gradient norm)\n", "healthy_g, degen_g = [], []\n", "for m in ['bp', 'ep']:\n", "    for s in [42, 123, 456]:\n", "        g = d['reports'][f'{m}_s{s}']['bp_grad_norms'][-1]\n", "        healthy_g.append(g)\n", "for m in ['dfa', 'state_bridge', 'credit_bridge']:\n", "    for s in [42, 123, 456]:\n", "        g = d['reports'][f'{m}_s{s}']['bp_grad_norms'][-1]\n", "        degen_g.append(g)\n", "\n", "print(f'min healthy ||g_L|| = {min(healthy_g):.2e}')\n", "print(f'max degenerate ||g_L|| = {max(degen_g):.2e}')\n", "print(f'gap factor = {min(healthy_g) / max(degen_g):.0f}×')\n", "print()\n", "\n", "# Per-seed max-per-block growth: largest ratio res[i+1] / res[i] between consecutive residual norms\n", "healthy_growth, degen_growth = [], []\n", "for m in ['bp', 'ep']:\n", "    for s in [42, 123, 456]:\n", "        res = d['reports'][f'{m}_s{s}']['residual_norms']\n", "        ratios = [res[i+1]/res[i] for i in range(len(res)-1)]\n", "        healthy_growth.append(max(ratios))\n", "for m in ['dfa', 'state_bridge', 'credit_bridge']:\n", "    for s in [42, 123, 456]:\n", "        res = d['reports'][f'{m}_s{s}']['residual_norms']\n", "        ratios = [res[i+1]/res[i] for i in range(len(res)-1)]\n", "        degen_growth.append(max(ratios))\n", "\n", "print(f'max healthy per-block growth = {max(healthy_growth):.2f}')\n", "print(f'min degenerate per-block growth = {min(degen_growth):.2f}')\n", "print(f'gap factor = {min(degen_growth) / max(healthy_growth):.1f}×')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## §6 ¶2 — fresh-B null calibration (penalty creates real signal)\n", "\n", "**Source**: `results/null_calibration_penalized_dfa.json`\n", "\n", "20 fresh random-B draws on the penalized DFA s42 checkpoint, vs the training-Bs deep cosine."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "n = load_json('results/null_calibration_penalized_dfa.json')\n", "print(f'training-Bs deep cos (s42): {n[\"training_Bs_deep_cos\"]:+.4f}')\n", "print(f'fresh-Bs deep cos (n=20): {n[\"fresh_Bs_deep_mean_of_per_draw_means\"]:+.4f} ± {n[\"fresh_Bs_deep_std_of_per_draw_means_ddof0\"]:.4f}')\n", "print()\n", "print('per-layer std across 20 fresh-B draws:')\n", "for i, s in enumerate(n['fresh_Bs_per_layer_std_ddof0']):\n", "    print(f' l{i}: {s:.4f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## §3 ¶3 — no-terminal-LN ResMLP same-backbone control\n", "\n", "**Source**: `results/snapshot_no_outln_v1/snapshot_noLN_s{42,123,456}.json`\n", "\n", "Removing terminal LN from the same backbone preserves Mode 1(a) but eliminates Mode 1(b)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "hL_vals, gL_vals, accs = [], [], []\n", "for s in [42, 123, 456]:\n", "    d = load_json(f'results/snapshot_no_outln_v1/snapshot_noLN_s{s}.json')\n", "    final = d['dfa_log'][-1]\n", "    hL_vals.append(final['hidden_norms'][-1])\n", "    gL_vals.append(final['bp_grad_per_sample_l2_med'][-1])\n", "    accs.append(final['acc_eval'])\n", "\n", "print('no-outln DFA 100ep, 3 seeds:')\n", "print(f' ||h_L|| 3-seed mean: {np.mean(hL_vals):.2e} (per seed: {[f\"{v:.2e}\" for v in hL_vals]})')\n", "print(f' ||g_L|| 3-seed mean: {np.mean(gL_vals):.2e} (per seed: {[f\"{v:.2e}\" for v in gL_vals]})')\n", "mean, _, ddof1 = both_stds(accs)\n", "print(f' test acc: {mean:.3f} ± {ddof1:.3f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Figure 2 — re-render the cross-method dissociation visualization\n", "\n", "**Renderer**: `paper/figures/render_fig_cos_acc_dissociation.py`\n", "\n", "Re-running the renderer regenerates `paper/figures/fig_cos_acc_dissociation.pdf`."
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import subprocess\n", "result = subprocess.run(['python3', 'paper/figures/render_fig_cos_acc_dissociation.py'],\n", " capture_output=True, text=True)\n", "print(result.stdout)\n", "print(result.stderr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Figure 4 — re-render penalty rescue panels\n", "\n", "**Renderer**: `paper/figures/render_fig4_penalty_rescue.py`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = subprocess.run(['python3', 'paper/figures/render_fig4_penalty_rescue.py'],\n", " capture_output=True, text=True)\n", "print(result.stdout)\n", "print(result.stderr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Figure 5 — re-render cross-architecture verdict matrix\n", "\n", "**Renderer**: `paper/figures/render_fig5_cross_arch.py`\n", "\n", "The verdict matrix is hand-encoded based on the per-row data sources (see the script's docstring for which JSON each row references)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "result = subprocess.run(['python3', 'paper/figures/render_fig5_cross_arch.py'],\n", " capture_output=True, text=True)\n", "print(result.stdout)\n", "print(result.stderr)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Compile the paper PDF\n", "\n", "Final step: re-run tectonic on `paper/main.tex` to produce a fresh PDF that incorporates any updated figures." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import subprocess\n", "\n", "result = subprocess.run(['tectonic', 'paper/main.tex'],\n", "                        capture_output=True, text=True, cwd=str(REPO_ROOT))\n", "# print last 500 chars of stderr (tectonic warnings/errors)\n", "print(result.stderr[-500:] if result.stderr else 'no stderr')\n", "print()\n", "# report the page count of the compiled PDF (requires the pdfinfo tool)\n", "info = subprocess.run(['pdfinfo', 'paper/main.pdf'], capture_output=True, text=True, cwd=str(REPO_ROOT))\n", "for line in info.stdout.split('\\n'):\n", "    if 'Pages' in line:\n", "        print(line)\n", "print(f'\\nPDF: {REPO_ROOT}/paper/main.pdf')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Summary\n", "\n", "All paper figures and tables can be reproduced from the following saved files:\n", "\n", "| Source | Used by |\n", "|---|---|\n", "| `results/protocol_audit/audit_table_s42_s123_s456.json` | Table 1, Figure 1, §6 ¶1 |\n", "| `results/protocol_audit/audit_d512_3seed.json` | Appendix H d=512 |\n", "| `results/protocol_audit/audit_cnn_3seed.json` | §3 ¶3 / §5 ¶3 CNN values, Figure 5 |\n", "| `results/protocol_audit/temporal_evolution_s{42,123,456}.json` | §3 ¶3 ep-4 g_L, Figure 5 row 4 |\n", "| `results/snapshot_no_outln_v1/snapshot_noLN_s{42,123,456}.json` | §3 ¶3 no-outln control |\n", "| `results/snapshot_evolution_v2/snapshot_evolution_s{42,123,456}.json` | §3 ¶1 endpoint values |\n", "| `results/dfa_pen_short/dfa_pen_lam0.01_s{42,123,456}.json` | DFA+pen 30ep |\n", "| `results/dfa_pen_short/dfa_pen_lam0.0001_s{42,123,456}.json` | §5 ¶2 λ=1e-4 |\n", "| `results/round38_sbcb_penalty_30ep/results_cifar10.json` (s42) | SB+pen, CB+pen s42 |\n", "| `results/round38_{sb,cb}_penalty_30ep_s{123,456}/results_cifar10.json` | SB+pen, CB+pen s123/s456 |\n", "| `results/round41_dfa_penalty_30ep{,_s{123,456}}/results_cifar10.json` | DFA+pen 30ep diagnostics |\n", "| `results/bp_no_penalty_30ep/bp_pen_lam0.0_s{42,123,456}.json` | §5 ¶3 BP no-pen matched |\n", "| 
`results/bp_with_penalty/bp_pen_lam0.01_s{42,123,456}.json` | §5 ¶3 BP+pen multi-seed |\n", "| `results/dfa_no_penalty_30ep/results_cifar10.json` | §5 ¶3 DFA no-pen matched |\n", "| `results/resmlp_frozen_blocks_s{42,123,456}.log` | Frozen baseline 0.349 |\n", "| `results/h2_no_residual_full_s{42,123,456}/snapshot_evolution_s{42,123,456}.json` | Appendix H no-residual ablation |\n", "| `results/optionA_random_targets_s42/snapshot_evolution_s42.json` | Appendix I random-target DFA |\n", "| `results/optionSBCB_smoke/results_cifar10.json` | Appendix I random-target SB/CB 3ep |\n", "| `results/optionSBCB_random_targets_s42/results_cifar10.json` | Appendix I random-target SB/CB 100ep |\n", "| `results/optionEP_smoke/ep_random_s42.pt` | EP random-target 5ep |\n", "| `results/optionEP_random_targets_full/ep_random_s42.pt` | EP random-target 100ep |\n", "| `results/ep_random_h_L_summary.json` | EP random-target h_L 3-seed |\n", "| `results/null_calibration_penalized_dfa.json` | §6 ¶2 fresh-B null |\n", "| `results/nudging_test_3seed_summary.json` | §4 ¶4 nudging test 3-seed |\n", "| `results/training_loss_decrease_3seed.json` | §4 ¶4 training-loss trajectory 3-seed |\n", "| `results/matched_30ep_control_summary.json` | §5 ¶3 matched 30-ep summary |\n", "| `results/bp_with_penalty_3seed_summary.json` | §5 ¶3 BP+pen 3-seed |\n", "| `results/vanilla_dfa_early_ckpts/per_layer_cos_3seed.json` | Appendix M layer-0 dominance |\n", "| `results/threshold_sensitivity_output.txt` | Appendix E threshold sweep |\n", "\n", "**Statistical convention**: as of v2.38, all 3-seed standard deviations in the paper use ddof=1 (sample std with Bessel correction). The `both_stds()` helper at the top of this notebook returns both ddof=0 and ddof=1 for any list of values; the paper-cited value is always the ddof=1 column.\n", "\n", "**To re-run the experiments themselves** (for re-training or re-measuring), see the corresponding scripts in `experiments/` and `protocol/examples/`. 
The training scripts each take a `--seed` argument; the standard 3-seed set is {42, 123, 456}." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python", "version": "3.13" } }, "nbformat": 4, "nbformat_minor": 4 }