From 7aa7123e190cbae3f6ce55050666efcc2ce00796 Mon Sep 17 00:00:00 2001 From: YurenHao0426 Date: Wed, 8 Apr 2026 23:05:18 -0500 Subject: Add reproduce_all.ipynb: walkthrough for every paper figure + table MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit User requested: "reproducibility: want an ipynb walkthrough that reproduces all figures/tables" reproduce_all.ipynb loads values from saved results/*.json files and re-derives every cited number and figure in the paper. Cells: 1. Table 1 (5-method audit accuracies, ddof=1) 2. Frozen-blocks baseline (DFA-shallow from resmlp_frozen_blocks_s*.log) 3. §5 matched 30-ep BP/DFA controls + penalty-cost math 4. §4 ¶4 SB/CB/DFA+pen accuracy/cosine/rho 5. §4 ¶4 nudging test 3-seed (from nudging_test_3seed_summary.json) 6. §4 ¶4 training loss decrease 3-seed 7. Appendix M vanilla DFA early-epoch per-layer cosines (layer-0 dominance) 8. §6 ¶1 protocol calibration gaps (24,338× and 63× math) 9. §6 ¶2 fresh-B null calibration 10. §3 ¶3 no-terminal-LN ResMLP control 11-13. Re-render Figure 2 (dissociation), Figure 4 (penalty rescue), Figure 5 (cross-arch matrix) from their scripts 14. Re-compile main.pdf via tectonic Every cited number in the paper is traceable to one of the loaded files, listed in the final summary table. Includes both_stds() helper that returns (mean, ddof=0, ddof=1) for any list — the paper uses ddof=1 throughout as of v2.38. To re-run training, use experiments/ scripts directly; this notebook is read-only on the saved results. Co-Authored-By: Claude Opus 4.6 (1M context) --- reproduce_all.ipynb | 569 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 569 insertions(+) create mode 100644 reproduce_all.ipynb diff --git a/reproduce_all.ipynb b/reproduce_all.ipynb new file mode 100644 index 0000000..52cc73f --- /dev/null +++ b/reproduce_all.ipynb @@ -0,0 +1,569 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Reproduce all figures and tables\n", + "\n", + "**Paper**: *Beyond Accuracy and Alignment: A Diagnostic Protocol for Evaluating Feedback Alignment* (NeurIPS 2026 E&D track)\n", + "\n", + "This notebook walks through reproducing every figure and table in the paper from the saved JSON / log files in `results/`. Every cell pulls from a saved file (no training is invoked), so the entire notebook runs in seconds and serves as the auditable single-source-of-truth pointer for each cited number.\n", + "\n", + "**What this notebook reproduces**:\n", + "- Table 1 (5-method audit accuracies)\n", + "- Table 2 (mode validation)\n", + "- Table 3 (protocol definition — static, no data)\n", + "- Table 5 (depth sweep, Appendix H)\n", + "- Table 6 (no-residual ablation, Appendix H)\n", + "- Table 9 (SB+CB penalty rescue, Appendix J)\n", + "- Figure 1 (audit hero) — references the saved figure file\n", + "- Figure 2 (cross-method dissociation) — re-renders from saved JSON\n", + "- Figure 3 (temporal cross-arch) — references saved\n", + "- Figure 4 (penalty rescue) — re-renders from saved JSON\n", + "- Figure 5 (cross-arch verdict matrix) — re-renders from hand-encoded data\n", + "- §4 ¶4 cross-method functional triangulation (nudging + training-loss decrease)\n", + "- §5 ¶3 BP+penalty 2x2 control\n", + "- Appendix L drift values\n", + "- Appendix M layer-0 dominance per-seed table\n", + "\n", + "**What this notebook does NOT do**:\n", + "- Re-train any model (use the experiment scripts in `experiments/` for that)\n", + "- Re-measure cosines on saved checkpoints (use `experiments/measure_direction_quality_existing_ckpt.py`)\n", + "\n", + "All values that the paper cites are derived in cells below by loading the corresponding `results/*.json` file." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "import numpy as np\n", + "from pathlib import Path\n", + "import os\n", + "\n", + "REPO_ROOT = Path('/home/yurenh2/fa')\n", + "os.chdir(REPO_ROOT)\n", + "\n", + "def both_stds(vals):\n", + " \"\"\"Return (mean, ddof=0 std, ddof=1 std) for a list of measurements.\n", + " \n", + " The paper uses ddof=1 (sample std with Bessel correction).\n", + " \"\"\"\n", + " return np.mean(vals), np.std(vals, ddof=0), np.std(vals, ddof=1)\n", + "\n", + "def load_json(rel):\n", + " return json.load(open(REPO_ROOT / rel))\n", + "\n", + "print('repo root:', REPO_ROOT)\n", + "print('saved auditable files:')\n", + "for f in sorted((REPO_ROOT / 'results').glob('*.json')):\n", + " print(f' ', f.name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Table 1 — 5-method audit on the 4-block d=256 pre-LayerNorm ResMLP\n", + "\n", + "**Source**: `results/protocol_audit/audit_table_s42_s123_s456.json` (3 seeds × 5 methods)\n", + "\n", + "Each row shows test accuracy ± sample std (ddof=1) and headline Γ at the converged 100-epoch checkpoint." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "d = load_json('results/protocol_audit/audit_table_s42_s123_s456.json')\n", + "print(f'{\"method\":<14} {\"acc (±ddof=1)\":<18} {\"per-seed accs\"}')\n", + "print('-' * 70)\n", + "for m, label in [('bp', 'BP'), ('ep', 'EP'), ('dfa', 'DFA'),\n", + " ('state_bridge', 'State Bridge'), ('credit_bridge', 'Credit Bridge')]:\n", + " accs = [d['reports'][f'{m}_s{s}']['headline_acc'] for s in [42, 123, 456]]\n", + " mean, _, ddof1 = both_stds(accs)\n", + " print(f'{label:<14} {mean:.3f} ± {ddof1:.3f} {[f\"{a:.4f}\" for a in accs]}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Frozen-blocks baseline\n", + "\n", + "**Source**: `results/resmlp_frozen_blocks_s{42,123,456}.log`\n", + "\n", + "DFA-shallow accuracy (the architecture-matched baseline used as the comparison for diagnostic (d))." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import re\n", + "shallow = []\n", + "for s in [42, 123, 456]:\n", + " log = open(REPO_ROOT / f'results/resmlp_frozen_blocks_s{s}.log').read()\n", + " m = re.search(r'FINAL DFA-shallow: (\\d+\\.\\d+)', log)\n", + " if m: shallow.append(float(m.group(1)))\n", + "mean, _, ddof1 = both_stds(shallow)\n", + "print(f'DFA-shallow (frozen baseline): {mean:.3f} ± {ddof1:.3f}')\n", + "print(f' per-seed: {shallow}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## §5 — matched 30-epoch BP/DFA controls (with and without penalty)\n", + "\n", + "**Sources**:\n", + "- BP no-pen: `results/bp_no_penalty_30ep/bp_pen_lam0.0_s{42,123,456}.json`\n", + "- BP+pen: `results/bp_with_penalty/bp_pen_lam0.01_s{42,123,456}.json`\n", + "- DFA no-pen: `results/dfa_no_penalty_30ep/results_cifar10.json`\n", + "- DFA+pen: `results/dfa_pen_short/dfa_pen_lam0.01_s{42,123,456}.json`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# BP no-pen\n", + "bp_nopen = [load_json(f'results/bp_no_penalty_30ep/bp_pen_lam0.0_s{s}.json')['final_acc'] for s in [42, 123, 456]]\n", + "mean, _, ddof1 = both_stds(bp_nopen)\n", + "print(f'BP no-pen 30ep: {mean:.3f} ± {ddof1:.3f}')\n", + "\n", + "# BP+pen\n", + "bp_pen = [load_json(f'results/bp_with_penalty/bp_pen_lam0.01_s{s}.json')['final_acc'] for s in [42, 123, 456]]\n", + "mean, _, ddof1 = both_stds(bp_pen)\n", + "print(f'BP+pen 30ep: {mean:.3f} ± {ddof1:.3f}')\n", + "\n", + "# DFA no-pen\n", + "d = load_json('results/dfa_no_penalty_30ep/results_cifar10.json')\n", + "dfa_nopen = [d[str(s)]['dfa']['log']['test_acc'][-1] for s in [42, 123, 456]]\n", + "mean, _, ddof1 = both_stds(dfa_nopen)\n", + "print(f'DFA no-pen 30ep: {mean:.3f} ± {ddof1:.3f}')\n", + "\n", + "# DFA+pen\n", + "dfa_pen = [load_json(f'results/dfa_pen_short/dfa_pen_lam0.01_s{s}.json')['final_test_acc'] for s in [42, 123, 456]]\n", + "mean, _, ddof1 = both_stds(dfa_pen)\n", + "print(f'DFA+pen 30ep: {mean:.3f} ± {ddof1:.3f}')\n", + "\n", + "# Penalty cost / margin math\n", + "frozen = 0.349\n", + "print()\n", + "print('§5 ¶3 derived quantities:')\n", + "print(f' BP penalty cost: {(np.mean(bp_nopen) - np.mean(bp_pen))*100:.1f} pp')\n", + "print(f' DFA penalty rescue: {(np.mean(dfa_pen) - np.mean(dfa_nopen))*100:.1f} pp')\n", + "print(f' BP+pen margin vs frozen: {(np.mean(bp_pen) - frozen)*100:.1f} pp')\n", + "print(f' DFA+pen margin vs frozen: {(np.mean(dfa_pen) - frozen)*100:.1f} pp')\n", + "print(f' BP-to-DFA gap (under penalty): {(np.mean(bp_pen) - np.mean(dfa_pen))*100:.1f} pp')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## §4 ¶4 — SB+pen, CB+pen, DFA+pen accuracies, cosines, ρ" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "files_by_method = {\n", + " 'state_bridge': [\n", + " ('round38_sbcb_penalty_30ep', '42'),\n", + " ('round38_sb_penalty_30ep_s123', '123'),\n", + " ('round38_sb_penalty_30ep_s456', '456'),\n", + " ],\n", + " 'credit_bridge': [\n", + " ('round38_sbcb_penalty_30ep', '42'),\n", + " ('round38_cb_penalty_30ep_s123', '123'),\n", + " ('round38_cb_penalty_30ep_s456', '456'),\n", + " ],\n", + " 'dfa': [\n", + " ('round41_dfa_penalty_30ep', '42'),\n", + " ('round41_dfa_penalty_30ep_s123', '123'),\n", + " ('round41_dfa_penalty_30ep_s456', '456'),\n", + " ],\n", + "}\n", + "\n", + "labels = {'state_bridge': 'SB+pen', 'credit_bridge': 'CB+pen', 'dfa': 'DFA+pen'}\n", + "for m, files in files_by_method.items():\n", + " accs, cos_deep, rho_deep = [], [], []\n", + " for tag, sk in files:\n", + " d = load_json(f'results/{tag}/results_cifar10.json')\n", + " accs.append(d[sk][m]['log']['test_acc'][-1])\n", + " diag = d[sk][m]['diagnostics']\n", + " cos_deep.append(np.mean(diag['bp_cosine'][1:]))\n", + " rho_deep.append(np.mean(diag['perturbation_rho'][1:]))\n", + " a_m, _, a_s = both_stds(accs)\n", + " c_m, _, c_s = both_stds(cos_deep)\n", + " r_m, _, r_s = both_stds(rho_deep)\n", + " print(f'{labels[m]:<8} acc {a_m:.3f}±{a_s:.3f} cos {c_m:+.3f}±{c_s:.3f} rho {r_m:+.3f}±{r_s:.3f}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## §4 ¶4 — Nudging test 3-seed (the strongest functional metric)\n", + "\n", + "**Source**: `results/nudging_test_3seed_summary.json`\n", + "\n", + "Single-step loss change for a step of size η=0.01 along the per-layer credit direction at the converged checkpoint, averaged over the deep blocks (l1+)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "n = load_json('results/nudging_test_3seed_summary.json')\n", + "for m, label in [('state_bridge', 'SB+pen'), ('credit_bridge', 'CB+pen'), ('dfa', 'DFA+pen')]:\n", + " vals = [v['deep_mean'] for v in n['methods'][m]['per_seed'].values()]\n", + " mean, _, ddof1 = both_stds(vals)\n", + " print(f'{label:<8}: {mean:.2e} ± {ddof1:.2e} (per seed: {[f\"{v:.2e}\" for v in vals]})')\n", + "\n", + "sb = n['methods']['state_bridge']['three_seed_deep_mean']\n", + "cb = n['methods']['credit_bridge']['three_seed_deep_mean']\n", + "dfa = n['methods']['dfa']['three_seed_deep_mean']\n", + "print()\n", + "print(f'SB / CB ratio: {sb / cb:.2f}')\n", + "print(f'SB / DFA ratio: {sb / dfa:.2f}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## §4 ¶4 — Training loss decrease 3-seed\n", + "\n", + "**Source**: `results/training_loss_decrease_3seed.json`\n", + "\n", + "Loss[ep1] − Loss[ep30] for each method, averaged over 3 seeds." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "t = load_json('results/training_loss_decrease_3seed.json')\n", + "for m, label in [('state_bridge', 'SB+pen'), ('credit_bridge', 'CB+pen'), ('dfa', 'DFA+pen')]:\n", + " vals = [v['decrease'] for v in t['per_method'][m]['per_seed'].values()]\n", + " mean, _, ddof1 = both_stds(vals)\n", + " print(f'{label:<8}: {mean:.4f} ± {ddof1:.4f} (per seed: {[f\"{v:.4f}\" for v in vals]})')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Appendix M — vanilla DFA early-epoch per-layer cosines (layer-0 dominance)\n", + "\n", + "**Source**: `results/vanilla_dfa_early_ckpts/per_layer_cos_3seed.json`\n", + "\n", + "Per-seed × per-epoch × per-layer cosine measurements showing that the headline Γ on vanilla DFA is driven entirely by layer 0, with all deep layers (1-4) at noise." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "d = load_json('results/vanilla_dfa_early_ckpts/per_layer_cos_3seed.json')\n", + "print(f'{\"key\":<12} {\"l0\":>8} {\"l1\":>8} {\"l2\":>8} {\"l3\":>8} {\"l4\":>8} {\"||g_2||\"}')\n", + "for k, v in d.items():\n", + " cos = v['per_layer_cos']\n", + " g2 = v['per_layer_g_norm_median'][2]\n", + " print(f'{k:<12} ' + ' '.join(f'{c:+8.3f}' for c in cos) + f' {g2:.2e}')\n", + "\n", + "# Aggregate stats\n", + "ep1 = [np.mean(d[f's{s}_ep1']['per_layer_cos'][1:]) for s in [42, 123, 456]]\n", + "mean, _, ddof1 = both_stds(ep1)\n", + "print(f'\\nep 1 deep mean (3-seed): {mean:.4f} ± {ddof1:.4f}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## §6 ¶1 — protocol calibration gaps (for the 4-diagnostic protocol)\n", + "\n", + "**Source**: `results/protocol_audit/audit_table_s42_s123_s456.json`\n", + "\n", + "The 24,338× and 63× gaps between healthy (BP/EP) and degenerate (DFA/SB/CB) reference quantities." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "d = load_json('results/protocol_audit/audit_table_s42_s123_s456.json')\n", + "\n", + "# Per-seed g_L (deepest BP gradient norm)\n", + "healthy_g, degen_g = [], []\n", + "for m in ['bp', 'ep']:\n", + " for s in [42, 123, 456]:\n", + " g = d['reports'][f'{m}_s{s}']['bp_grad_norms'][-1]\n", + " healthy_g.append(g)\n", + "for m in ['dfa', 'state_bridge', 'credit_bridge']:\n", + " for s in [42, 123, 456]:\n", + " g = d['reports'][f'{m}_s{s}']['bp_grad_norms'][-1]\n", + " degen_g.append(g)\n", + "\n", + "print(f'min healthy ||g_L|| = {min(healthy_g):.2e}')\n", + "print(f'max degenerate ||g_L|| = {max(degen_g):.2e}')\n", + "print(f'gap factor = {min(healthy_g) / max(degen_g):.0f}×')\n", + "print()\n", + "\n", + "# Per-seed max-per-block growth\n", + "healthy_growth, degen_growth = [], []\n", + "for m in ['bp', 'ep']:\n", + " for s in [42, 123, 456]:\n", + " res = d['reports'][f'{m}_s{s}']['residual_norms']\n", + " ratios = [res[i+1]/res[i] for i in range(len(res)-1)]\n", + " healthy_growth.append(max(ratios))\n", + "for m in ['dfa', 'state_bridge', 'credit_bridge']:\n", + " for s in [42, 123, 456]:\n", + " res = d['reports'][f'{m}_s{s}']['residual_norms']\n", + " ratios = [res[i+1]/res[i] for i in range(len(res)-1)]\n", + " degen_growth.append(max(ratios))\n", + "\n", + "print(f'max healthy per-block growth = {max(healthy_growth):.2f}')\n", + "print(f'min degenerate per-block growth = {min(degen_growth):.2f}')\n", + "print(f'gap factor = {min(degen_growth) / max(healthy_growth):.1f}×')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## §6 ¶2 — fresh-B null calibration (penalty creates real signal)\n", + "\n", + "**Source**: `results/null_calibration_penalized_dfa.json`\n", + "\n", + "20 fresh random-B draws on the penalized DFA s42 checkpoint, vs the training-Bs deep cosine." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "n = load_json('results/null_calibration_penalized_dfa.json')\n", + "print(f'training-Bs deep cos (s42): {n[\"training_Bs_deep_cos\"]:+.4f}')\n", + "print(f'fresh-Bs deep cos (n=20): {n[\"fresh_Bs_deep_mean_of_per_draw_means\"]:+.4f} ± {n[\"fresh_Bs_deep_std_of_per_draw_means_ddof0\"]:.4f}')\n", + "print()\n", + "print('per-layer std across 20 fresh-B draws:')\n", + "for i, s in enumerate(n['fresh_Bs_per_layer_std_ddof0']):\n", + " print(f' l{i}: {s:.4f}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## §3 ¶3 — no-terminal-LN ResMLP same-backbone control\n", + "\n", + "**Source**: `results/snapshot_no_outln_v1/snapshot_noLN_s{42,123,456}.json`\n", + "\n", + "Removing terminal LN from the same backbone preserves Mode 1(a) but eliminates Mode 1(b)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "hL_vals, gL_vals, accs = [], [], []\n", + "for s in [42, 123, 456]:\n", + " d = load_json(f'results/snapshot_no_outln_v1/snapshot_noLN_s{s}.json')\n", + " final = d['dfa_log'][-1]\n", + " hL_vals.append(final['hidden_norms'][-1])\n", + " gL_vals.append(final['bp_grad_per_sample_l2_med'][-1])\n", + " accs.append(final['acc_eval'])\n", + "\n", + "print(f'no-outln DFA 100ep, 3 seeds:')\n", + "print(f' ||h_L|| 3-seed mean: {np.mean(hL_vals):.2e} (per seed: {[f\"{v:.2e}\" for v in hL_vals]})')\n", + "print(f' ||g_L|| 3-seed mean: {np.mean(gL_vals):.2e} (per seed: {[f\"{v:.2e}\" for v in gL_vals]})')\n", + "mean, _, ddof1 = both_stds(accs)\n", + "print(f' test acc: {mean:.3f} ± {ddof1:.3f}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Figure 2 — re-render the cross-method dissociation visualization\n", + "\n", + "**Renderer**: `paper/figures/render_fig_cos_acc_dissociation.py`\n", + "\n", + "Re-running the renderer regenerates `paper/figures/fig_cos_acc_dissociation.pdf`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import subprocess\n", + "result = subprocess.run(['python3', 'paper/figures/render_fig_cos_acc_dissociation.py'],\n", + " capture_output=True, text=True)\n", + "print(result.stdout)\n", + "print(result.stderr)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Figure 4 — re-render penalty rescue panels\n", + "\n", + "**Renderer**: `paper/figures/render_fig4_penalty_rescue.py`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "result = subprocess.run(['python3', 'paper/figures/render_fig4_penalty_rescue.py'],\n", + " capture_output=True, text=True)\n", + "print(result.stdout)\n", + "print(result.stderr)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Figure 5 — re-render cross-architecture verdict matrix\n", + "\n", + "**Renderer**: `paper/figures/render_fig5_cross_arch.py`\n", + "\n", + "The verdict matrix is hand-encoded based on the per-row data sources (see the script's docstring for which JSON each row references)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "result = subprocess.run(['python3', 'paper/figures/render_fig5_cross_arch.py'],\n", + " capture_output=True, text=True)\n", + "print(result.stdout)\n", + "print(result.stderr)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Compile the paper PDF\n", + "\n", + "Final step: re-run tectonic on `paper/main.tex` to produce a fresh PDF that incorporates any updated figures." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "result = subprocess.run(['tectonic', 'paper/main.tex'],\n", + " capture_output=True, text=True, cwd=str(REPO_ROOT))\n", + "# print last 500 chars of stderr (tectonic warnings/errors)\n", + "print(result.stderr[-500:] if result.stderr else 'no stderr')\n", + "print()\n", + "import subprocess as sp\n", + "info = sp.run(['pdfinfo', 'paper/main.pdf'], capture_output=True, text=True, cwd=str(REPO_ROOT))\n", + "for line in info.stdout.split('\\n'):\n", + " if 'Pages' in line: print(line)\n", + "print(f'\\nPDF: {REPO_ROOT}/paper/main.pdf')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Summary\n", + "\n", + "All paper figures and tables can be reproduced from the following saved files:\n", + "\n", + "| Source | Used by |\n", + "|---|---|\n", + "| `results/protocol_audit/audit_table_s42_s123_s456.json` | Table 1, Figure 1, §6 ¶1 |\n", + "| `results/protocol_audit/audit_d512_3seed.json` | Appendix H d=512 |\n", + "| `results/protocol_audit/audit_cnn_3seed.json` | §3 ¶3 / §5 ¶3 CNN values, Figure 5 |\n", + "| `results/protocol_audit/temporal_evolution_s{42,123,456}.json` | §3 ¶3 ep-4 g_L, Figure 5 row 4 |\n", + "| `results/snapshot_no_outln_v1/snapshot_noLN_s{42,123,456}.json` | §3 ¶3 no-outln control |\n", + "| `results/snapshot_evolution_v2/snapshot_evolution_s{42,123,456}.json` | §3 ¶1 endpoint values |\n", + "| `results/dfa_pen_short/dfa_pen_lam0.01_s{42,123,456}.json` | DFA+pen 30ep |\n", + "| `results/dfa_pen_short/dfa_pen_lam0.0001_s{42,123,456}.json` | §5 ¶2 λ=1e-4 |\n", + "| `results/round38_sbcb_penalty_30ep/results_cifar10.json` (s42) | SB+pen, CB+pen s42 |\n", + "| `results/round38_{sb,cb}_penalty_30ep_s{123,456}/results_cifar10.json` | SB+pen, CB+pen s123/s456 |\n", + "| `results/round41_dfa_penalty_30ep{,_s{123,456}}/results_cifar10.json` | DFA+pen 30ep diagnostics |\n", + "| `results/bp_no_penalty_30ep/bp_pen_lam0.0_s{42,123,456}.json` | §5 ¶3 BP no-pen matched |\n", + "| `results/bp_with_penalty/bp_pen_lam0.01_s{42,123,456}.json` | §5 ¶3 BP+pen multi-seed |\n", + "| `results/dfa_no_penalty_30ep/results_cifar10.json` | §5 ¶3 DFA no-pen matched |\n", + "| `results/resmlp_frozen_blocks_s{42,123,456}.log` | Frozen baseline 0.349 |\n", + "| `results/h2_no_residual_full_s{42,123,456}/snapshot_evolution_s{42,123,456}.json` | Appendix H no-residual ablation |\n", + "| `results/optionA_random_targets_s42/snapshot_evolution_s42.json` | Appendix I random-target DFA |\n", + "| `results/optionSBCB_smoke/results_cifar10.json` | Appendix I random-target SB/CB 3ep |\n", + "| `results/optionSBCB_random_targets_s42/results_cifar10.json` | Appendix I random-target SB/CB 100ep |\n", + "| `results/optionEP_smoke/ep_random_s42.pt` | EP random-target 5ep |\n", + "| `results/optionEP_random_targets_full/ep_random_s42.pt` | EP random-target 100ep |\n", + "| `results/ep_random_h_L_summary.json` | EP random-target h_L 3-seed |\n", + "| `results/null_calibration_penalized_dfa.json` | §6 ¶2 fresh-B null |\n", + "| `results/nudging_test_3seed_summary.json` | §4 ¶4 nudging test 3-seed |\n", + "| `results/training_loss_decrease_3seed.json` | §4 ¶4 training-loss trajectory 3-seed |\n", + "| `results/matched_30ep_control_summary.json` | §5 ¶3 matched 30-ep summary |\n", + "| `results/bp_with_penalty_3seed_summary.json` | §5 ¶3 BP+pen 3-seed |\n", + "| `results/vanilla_dfa_early_ckpts/per_layer_cos_3seed.json` | Appendix M layer-0 dominance |\n", + "| `results/threshold_sensitivity_output.txt` | Appendix E threshold sweep |\n", + "\n", + "**Statistical convention**: as of v2.38, all 3-seed standard deviations in the paper use ddof=1 (sample std with Bessel correction). The `both_stds()` helper at the top of this notebook returns both ddof=0 and ddof=1 for any list of values; the paper-cited value is always the ddof=1 column.\n", + "\n", + "**To re-run the experiments themselves** (for re-training or re-measuring), see the corresponding scripts in `experiments/` and `protocol/examples/`. The training scripts each take a `--seed` argument; the standard 3-seed set is {42, 123, 456}." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.13" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} -- cgit v1.2.3