faeval.git

Age	Commit message	Author
2026-06-14	Add new experiment scripts, figures, and paper assets; untrack pyc/build artifacts	YurenHao0426
	Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-04-03	Add 5 extra seeds to synthetic cross-state distance (now 10 seeds for all methods)	YurenHao0426
	BP/DFA/SB/CB: added seeds 2048, 3000, 4000, 5000, 6000 (L=4 only, all 3 alphas)
	Total: 1290 rows (was 990)
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03	Complete EP data: 10-seed synthetic + 6-seed CIFAR persample + cross-state	YurenHao0426
	EP synthetic: 30 JSONs + 30 checkpoints (10 seeds × 3α)
	EP CIFAR persample: 6 seeds × 4 layers × 256 samples = 6144 rows added
	Synth cross-state: 150 EP rows added (990 total)
	cifar_persample_all.csv: 30720 rows (was 24576, +6144 EP)
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03	EP synthetic 10 seeds complete: 30 JSONs + 30 checkpoints + cross-state distance	YurenHao0426
	Updated synth_cross_state_distance.csv with 150 EP rows (990 total).
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03	Add EP cross-state distance for CIFAR + verify CNN summary	YurenHao0426
	EP CIFAR d_BP: L0=2.0×, L4=26.7× (much closer to BP than DFA=162×/2.5M×)
	EP synthetic: no checkpoints saved (ep_synthetic.py didn't save .pt)
	CNN summary: 20 rows confirmed correct
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03	Add EP synthetic per-seed CSV + synthetic cross-state distance	YurenHao0426
	EP synthetic: 15 rows (3α × 5 seeds)
	Synth cross-state: 840 rows (3α × 2L × 4 methods × 5 seeds × (L+1) layers)
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03	Add EP to BP gradient sparsity analysis	YurenHao0426
	EP CIFAR d=256: s(1e-6)=100%, mean_norm=1.41e-04
	EP produces networks where ALL samples have non-zero BP gradients, unlike DFA (0.4%), SB (21%), CB (3%).
	EP is closer to BP (98.7%).
	Updated clean_sparsity_summary.csv (980 rows, now includes EP).
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-03	Add cross-method hidden state distance vs BP	YurenHao0426
	Non-BP methods produce radically different representations:
	DFA L0: 162×, L4: 2.5M× relative to BP hidden norms
	SB L0: 3.2×, L4: 1.1M×
	CB L0: 59×, L4: 1.4M×
	(BP vs itself = 0)
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01	Add d=512 support sparsity: 20 JSONs + summary CSV	YurenHao0426
	BP: s(1e-6)=92.7%, norm=2.70e-04, r_inf=0.159, PR=0.300
	DFA: s(1e-6)=0.1%, norm=5.31e-08
	SB: s(1e-6)=20.3%, norm=2.33e-06
	CB: s(1e-6)=1.2%, norm=9.88e-08
	Same pattern as d=256, confirming width-independence of the sparsity gap.
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01	Add missing bp_s456.json for CIFAR d=512 (rerun after SIGTERM)	YurenHao0426
	bp s=456: acc=0.5999, rho=0.9881, nse=0.4764
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01	Add CIFAR L=4 d=512 confirmatory: 4 methods × 5 seeds with checkpoints	YurenHao0426
	BP: 60.6%±0.3%, rho=0.989
	DFA: 30.8%±0.5%, rho=0.003
	State Bridge: 21.2%±3.7%, rho=0.119
	Credit Bridge: 30.1%±0.5%, rho=0.002
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01	Add P3 protocol panel: method ranking across 5 protocol slices	YurenHao0426
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01	Add per-sample gradient stats: 24576 rows (256 samples × 4 layers × 4 methods × 6 seeds)	YurenHao0426
	Columns: method, seed, layer, sample_id, grad_norm, log10_grad_norm, r_inf, pr, hoyer, topk1, topk5
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01	Add clean_sparsity_summary.csv: 960 rows aggregated from 168 JSONs	YurenHao0426
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01	Add clean sparsity results: 168 JSONs from independent processes on GPU 1	YurenHao0426
	CIFAR: 24 JSONs (4 methods × 6 seeds), BP s(1e-6)=98% confirmed
	Synthetic: 144 JSONs (4 methods × 6 seeds × 3 alphas × 2 depths)
	All data reliable: each method+seed in separate Python process.
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01	Add clean gradient check: independent Python process per method, GPU 1	YurenHao0426
	Clean results (each method in fresh Python process):
	BP: mean_norm=2.58e-04, s(1e-6)=98% (CONFIRMED)
	DFA: layer 0 = 2.86e-07 (1.2%), layers 1-3 ≈ 2.4e-09 (0%)
	SB: layer 0 = 6.13e-06 (86%), layers 1-3 ≈ 1e-09 (0%)
	CB: layer 0 = 6.33e-07 (18%), layers 1-3 ≈ 5e-10 (0%)
	Method A (autograd.grad) and Method B (retain_grad) give identical results.
	Previous 1e-12 results were caused by Python process state pollution in combined scripts.
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01	Add element-wise gradient concentration analysis (CPU, from checkpoints)	YurenHao0426
	BP gradients are relatively uniform: top1%=7.1%, PR=0.327, eff_dim=0.632
	DFA gradients extremely concentrated: top1%=40.6%, PR=0.089, eff_dim=0.272
	SB/CB intermediate: top1%=17-21%, PR=0.14-0.17
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01	Add confirmatory supplement: T1-T4 from checkpoints (no retraining)	YurenHao0426
	WARNING: All methods (including BP) show near-zero BP hidden gradients (~1e-12 to 1e-14) when computed via manual forward with detached hidden states. This is inconsistent with the earlier first-priority analysis, which showed BP at 2.86e-04. Investigation needed.
	T1: 40 rows (4 methods × 10 seeds) - full metrics
	T2: 800 rows (support sparsity, 5 thresholds × 4 methods × 10 seeds × 4 layers)
	T3: 48 rows (gradient norm distributions, 3 seeds × 4 methods × 4 layers)
	T4: 100 rows (active-subset Gamma, 5 thresholds × 2 methods × 10 seeds)
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01	Add extended sparsity analysis: A4 per-layer, B1 snapshots, B2 active subset, C1/C2	YurenHao0426
	A4: Per-layer support - DFA/SB/CB layers 1-3 have 0% support at τ=1e-6
	Only BP has ~95% support; only SB layer 0 has 53%
	B1: Snapshot evolution - old snapshot checkpoints have near-zero grads (data issue)
	B2: Active subset - with τ=1e-6, no active samples for non-BP methods
	C1: Active vs inactive cosine - only inactive subset exists for non-BP
	C2: Energy concentration - near-zero for non-BP methods
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01	Add BP support sparsity analysis: threshold sweep + gradient histograms	YurenHao0426
	A1 Synthetic: all methods have >93% support at τ=1e-6 (gradients rarely zero)
	A2 CIFAR: massive gap - BP 98.4% vs DFA 0.4% vs SB 21% vs CB 3%
	DFA-trained CIFAR networks have near-zero BP gradients for 99.6% of samples
	This explains why Gamma is unreliable for CIFAR non-BP methods
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01	Recompute BP and DFA Gamma with near-zero gradient filtering	YurenHao0426
	BP Gamma: raw~0.99, filtered=1.000 (confirms self-cosine artifact from zero grads)
	DFA Gamma (synth): raw~0.01-0.16, filtered~0.01-0.17 (minimal filtering effect)
	DFA Gamma (CIFAR): raw=0.107, filtered=0.466 (99.7% samples have near-zero BP grad!)
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31	Update naive StateErr v3: L2 norm ratio formula, with checkpoints saved	YurenHao0426
	Formula: ||h_{L//2} - h_L||_2 / ||h_L||_2 (scalar L2 ratio)
	A1: 240 rows (3 alpha × 2 depth × 4 methods × 10 seeds)
	A2: 40 rows (4 methods including BP × 10 seeds)
	All model checkpoints saved in checkpoints_A1/ and checkpoints_A2/
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31	Add BP supplement for A2 CIFAR: 10 seeds with acc, Gamma, rho, naive_StateErr	YurenHao0426
	BP 10-seed results: acc=0.614±0.003, Gamma=1.0, rho=0.998
	Appended to A2_cifar_state_vs_credit.csv and A2_naive_state_err.csv
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31	Add naive state prediction baseline for A1 and A2	YurenHao0426
	A1: 240 rows (3 alpha × 2 depth × 4 methods × 10 seeds)
	A2: 30 rows (3 methods × 10 seeds)
	naive_StateErr = ||h_{L//2} - h_L|| / ||h_L||
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30	Add confirmatory paper experiments: A1-A4, all 10 seeds complete	YurenHao0426
	A1: Synthetic nonlinearity ladder (240 rows: 3 alpha × 2 depth × 4 methods × 10 seeds)
	A2: CIFAR state-vs-credit counterexample (30 rows: 3 methods × 10 seeds)
	A3: Frozen vs online dissociation (60 rows: 2 regimes × 3 methods × 10 seeds)
	A4: Protocol dependence panel (82 rows: assembled from existing results)
	All experiments ran on GPU 3. Total runtime: ~20 hours. CSVs in results/confirmatory/.
	Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
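
Two quantities recur in the commit messages above: the naive StateErr ratio (2026-03-31 commits) and the support fraction s(τ) (2026-04-01 commits). The sketch below is a minimal illustration, not the repository's code. The function names, the plain-list representation of hidden states, the indexing h_0..h_L, and the strict ">" comparison against τ are all assumptions made for this example.

```python
import math
from typing import Sequence


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean (L2) norm of a vector given as a sequence of floats."""
    return math.sqrt(sum(x * x for x in v))


def naive_state_err(hidden_states: Sequence[Sequence[float]]) -> float:
    """Scalar ratio ||h_{L//2} - h_L||_2 / ||h_L||_2.

    Assumption: hidden_states[i] holds h_i for i = 0..L, so the list
    has L + 1 entries and the last entry is h_L.
    """
    L = len(hidden_states) - 1
    h_mid = hidden_states[L // 2]
    h_last = hidden_states[L]
    diff = [a - b for a, b in zip(h_mid, h_last)]
    return l2_norm(diff) / l2_norm(h_last)


def support_fraction(grad_norms: Sequence[float], tau: float = 1e-6) -> float:
    """Fraction of samples whose gradient norm exceeds tau.

    Assumption: s(tau) in the log means this fraction; whether the
    original scripts use > or >= is not stated.
    """
    return sum(1 for g in grad_norms if g > tau) / len(grad_norms)


if __name__ == "__main__":
    # Hypothetical hidden states for L = 4 (five states, 2-dimensional each).
    states = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [3.0, 2.0], [3.0, 4.0]]
    print(naive_state_err(states))  # ||[0, -4]|| / ||[3, 4]|| = 4 / 5

    # Hypothetical per-sample gradient norms; two of four exceed 1e-6.
    print(support_fraction([1e-3, 1e-9, 1e-5, 1e-10]))
```

A PyTorch version would replace the list arithmetic with tensor norms; the ratio and the threshold count stay the same.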