From 66e0d8b9fd4d0f7a2231d689c055e26fdf1cf04a Mon Sep 17 00:00:00 2001
From: YurenHao0426
Date: Sat, 13 Jun 2026 12:35:36 -0500
Subject: rrm workspace: TRM/HRM/SRM code, Maze dataset, dynamical-analysis pipeline

Curated export for clone-and-run Maze training (2x A6000) + diagnostics.
trm/hrm pretrain.py carry trajectory-augmentation code (backward-compatible).
Heavy artifacts (checkpoints/wandb/npz) gitignored; see PROVENANCE.md.

Co-Authored-By: Claude Fable 5
---
 .../analysis_2x2/offline_followups/phase1_e1.md | 27 ++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)
 create mode 100644 research/flossing/analysis_2x2/offline_followups/phase1_e1.md

diff --git a/research/flossing/analysis_2x2/offline_followups/phase1_e1.md b/research/flossing/analysis_2x2/offline_followups/phase1_e1.md
new file mode 100644
index 0000000..b3e3f40
--- /dev/null
+++ b/research/flossing/analysis_2x2/offline_followups/phase1_e1.md
@@ -0,0 +1,27 @@
+# E1 offline batch: bootstrap CIs, settling robustness, TRM multi4 pair
+
+## Bootstrap / exact CIs (TRM official @58590, n=2048)
+- settled-wrong fraction: observed 0/254; exact one-sided 95% upper bound 0.0117 (1.17% of failures)
+- AUC(-lam1->correct) = 0.9935, bootstrap 95% CI (0.9908, 0.9957)
+- lam1(wrong) median 95% CI (0.1010, 0.1056)
+- lam1(correct) median 95% CI (0.0112, 0.0117)
+
+## Bootstrap CIs (HRM @26040, n=8192, strict band)
+- strict settled-wrong fraction of failures: observed 0.0054, bootstrap 95% CI (0.0033, 0.0078)
+- AUC(-lam1->correct) = 0.9841, bootstrap 95% CI (0.9815, 0.9865)
+
+## Settling-criterion robustness (B-cell counts under alternative drift definitions)
+- TRM official n=2048 | zH: B=0/A=1724 (tau=1.36) | zL: B=0/A=1728 (tau=1.42) | combined: B=0/A=1727 (tau=1.54)
+- HRM n=8192 | zH: B=63/A=4103 (tau=0.77) | zL: B=59/A=4083 (tau=1.01) | combined: B=60/A=4087 (tau=1.07)
+
+## TRM official-pipeline multi4 vs baseline (matched objective, n=512 each)
+- baseline @58590: acc=0.875; A/B/C/D=434/0/14/64; fD=0.125; lam1(D)=+0.1034; lam1(A)=+0.0111
+- multi4 @35805 (best): acc=0.900; A/B/C/D=452/0/9/51; fD=0.100; lam1(D)=+0.1019; lam1(A)=+0.0039
+- multi4 @65100 (final): acc=0.824; A/B/C/D=408/1/14/89; fD=0.174; lam1(D)=+0.0946; lam1(A)=+0.0133
+
+## hrm_multi4 provenance (E6a)
+- diag_hrm_multi4_step_{20832,23436,26040}_512.npz step grid matches HRM pretrain numbering;
+  multi4_eval_compare/logs should contain the eval invocations (checked manually below).
+- ACTION: if the hrm_multi4 run is pretrain-pipeline (ACT-streaming + perturbation), then the
+  May-28 multi4 vs righteous baseline comparison IS matched-pipeline and Sec 3.4's caveat is
+  narrower than written; step9 E-vs-F pair (queued) covers the fixed-unroll objective regardless.
\ No newline at end of file
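The note gives 0.0117 as the bound on the TRM settled-wrong fraction (0/254 failures). That value equals the exact one-sided 95% Clopper-Pearson upper bound, which for zero observed events reduces to 1 - 0.05^(1/n). The upper end of a two-sided 95% interval would be about 0.0144, and the rule of three (3/n) gives about 0.0118. The minimal sketch below reproduces the number by bisecting the binomial CDF. It assumes the one-sided Clopper-Pearson definition is the one that was used. `binom_cdf` and `clopper_pearson_upper` are illustrative names, not functions from this repo.

```python
import math


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def clopper_pearson_upper(x: int, n: int, alpha: float = 0.05) -> float:
    """Exact one-sided (1 - alpha) upper bound on p after x events in n trials.

    Solves P(X <= x; n, p) = alpha for p. The CDF decreases in p, so bisection works.
    """
    if x >= n:
        return 1.0
    # At p = x/n the CDF is >= 1/2 (> alpha for alpha < 0.5); at p = 1 it is 0.
    lo, hi = x / n, 1.0
    for _ in range(100):  # 100 halvings shrink the bracket far below float precision
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    n_fail = 254  # TRM official @58590: failures among n=2048
    print(f"one-sided 95%: {clopper_pearson_upper(0, n_fail):.4f}")
    print(f"two-sided 95% (upper end): {clopper_pearson_upper(0, n_fail, alpha=0.025):.4f}")
    print(f"closed form 1 - 0.05**(1/n): {1 - 0.05 ** (1 / n_fail):.4f}")
    print(f"rule of three 3/n: {3 / n_fail:.4f}")
```

When x = 0, the closed form is enough on its own. The bisection only matters for nonzero counts, such as the HRM B-cells.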
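The A/B/C/D counts in the TRM multi4-vs-baseline table can be cross-checked against the reported acc and fD values. All three rows reproduce both numbers to three decimals if A and C are the correct cells, B and D are the failures, and fD = D/n. That mapping is inferred from the arithmetic and is not stated in the note. The B/(B+D) ratio is likewise an assumed reading of "settled-wrong fraction of failures". `cell_summary` is a hypothetical helper.

```python
# A/B/C/D counts copied from the TRM official-pipeline table (n=512 each).
RUNS = {
    "baseline @58590": (434, 0, 14, 64),
    "multi4 @35805": (452, 0, 9, 51),
    "multi4 @65100": (408, 1, 14, 89),
}


def cell_summary(a: int, b: int, c: int, d: int) -> dict:
    """Derived rates, ASSUMING A and C are the correct cells and B and D the wrong ones."""
    n = a + b + c + d
    failures = b + d
    return {
        "n": n,
        "acc": (a + c) / n,
        "fD": d / n,
        # Assumed definition: settled-wrong share of failures = B / (B + D).
        "settled_wrong_frac": b / failures if failures else 0.0,
    }


if __name__ == "__main__":
    for name, cells in RUNS.items():
        s = cell_summary(*cells)
        print(
            f"{name}: n={s['n']} acc={s['acc']:.3f} fD={s['fD']:.3f} "
            f"B/(B+D)={s['settled_wrong_frac']:.4f}"
        )
```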
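The note reports bootstrap 95% CIs for AUC(-lam1->correct) and for the per-class median of lam1, but it does not state the resampling scheme. The sketch below assumes one common recipe, a percentile bootstrap that resamples the correct and wrong groups separately (a stratified bootstrap). AUC is computed from Mann-Whitney ranks, with -lam1 as the score and correct as the positive class. The demo data are synthetic Gaussian draws, not values from the runs in this note. `auc_mann_whitney` and `bootstrap_ci` are hypothetical helpers.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def auc_mann_whitney(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC = P(score_pos > score_neg) + 0.5 * P(tie), computed from average ranks."""
    pooled = sorted([(s, 1) for s in pos] + [(s, 0) for s in neg])
    rank_sum_pos = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1  # extend over a run of tied scores
        avg_rank = (i + j) / 2 + 1  # 1-based average rank shared by the tied run
        rank_sum_pos += avg_rank * sum(label for _, label in pooled[i : j + 1])
        i = j + 1
    n_pos, n_neg = len(pos), len(neg)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(
    stat: Callable[..., float],
    *groups: Sequence[float],
    n_boot: int = 1000,
    level: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI; each group is resampled separately (stratified)."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        resampled = [rng.choices(g, k=len(g)) for g in groups]
        reps.append(stat(*resampled))
    reps.sort()
    tail = (1 - level) / 2
    # Floor-index percentiles of the sorted replicates (no interpolation).
    return reps[int(tail * (n_boot - 1))], reps[int((1 - tail) * (n_boot - 1))]


if __name__ == "__main__":
    # SYNTHETIC lam1 values (not real diagnostics): correct near 0.011, wrong near 0.10.
    rng = random.Random(1)
    lam1_correct = [rng.gauss(0.011, 0.01) for _ in range(400)]
    lam1_wrong = [rng.gauss(0.10, 0.03) for _ in range(60)]

    # Score = -lam1, positive class = correct (the AUC(-lam1->correct) convention).
    pos = [-x for x in lam1_correct]
    neg = [-x for x in lam1_wrong]
    print("AUC:", round(auc_mann_whitney(pos, neg), 4))
    print("AUC 95% CI:", bootstrap_ci(auc_mann_whitney, pos, neg))
    print("median lam1(wrong) 95% CI:", bootstrap_ci(statistics.median, lam1_wrong))
```

Stratifying is a design choice. Failures are a minority here (254 of 2048 for TRM official). Resampling (score, label) pairs jointly would let the failure count vary between replicates, while the stratified version holds it fixed.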