rrm workspace: TRM/HRM/SRM code, Maze dataset, dynamical-analysis pipelineHEAD main

Curated export for clone-and-run Maze training (2x A6000) + diagnostics. trm/hrm pretrain.py carry trajectory-augmentation code (backward-compatible). Heavy artifacts (checkpoints/wandb/npz) gitignored; see PROVENANCE.md. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
author: YurenHao0426 <blackhao0426@gmail.com> 2026-06-13 12:35:36 -0500
committer: YurenHao0426 <blackhao0426@gmail.com> 2026-06-13 12:35:36 -0500
commit: 66e0d8b9fd4d0f7a2231d689c055e26fdf1cf04a (patch)
tree: c29cba61124018755a19b02c9d33e3ad5f2e05cc /research/flossing/analysis_2x2/results.md
1 files changed, 69 insertions, 0 deletions
diff --git a/research/flossing/analysis_2x2/results.md b/research/flossing/analysis_2x2/results.md
new file mode 100644
index 0000000..a62bba2
--- /dev/null
+++ b/research/flossing/analysis_2x2/results.md
@@ -0,0 +1,69 @@
+# 2x2 analysis (convergence x correctness) — generated 2026-06-11
+
+## hrm26040_n8192
+- npz: `/home/yurenh2/rrm/research/flossing/diag_8k.npz`, n=8192, exact_acc=0.525
+- late-drift def: mean(drift_zH[:, -4:]), Otsu tau(log10)=0.766, frac_converged=0.509
+
+| cell | n | lam1 median | lam1 IQR | token_acc med |
+|---|---|---|---|---|
+| A_conv_correct | 4103 | -0.8617 | [-0.9020, -0.8153] | 1.000 |
+| B_conv_wrong | 63 | -0.5912 | [-0.8091, -0.5061] | 0.630 |
+| C_nonconv_correct | 195 | -0.6943 | [-0.7354, -0.6466] | 1.000 |
+| D_nonconv_wrong | 3831 | -0.5998 | [-0.6475, -0.5488] | 0.630 |
+
+- mixture: wrong-that-converged = 0.016; correct-that-nonconverged = 0.045
+- dlam1(correct-wrong): overall -0.2586; within-conv -0.2705; within-nonconv -0.0945
+- dlam1(wrong: conv - nonconv) = +0.0085
+- AUC(-lam1 -> correct): overall 0.984; within-conv 0.852; within-nonconv 0.818
+- AUC(-log d_late -> correct) = 0.964; AUC(-lam1 -> converged) = 0.986
+
+## trm_singleGPU_step260410_n512
+- npz: `/home/yurenh2/rrm/research/flossing/diag_trm_singleGPU_step260410_512.npz`, n=512, exact_acc=0.770
+- late-drift def: mean(drift_zH[:, -4:]), Otsu tau(log10)=1.406, frac_converged=0.748
+
+| cell | n | lam1 median | lam1 IQR | token_acc med |
+|---|---|---|---|---|
+| A_conv_correct | 383 | +0.0047 | [+0.0007, +0.0137] | 1.000 |
+| B_conv_wrong | 0 | - | - | - |
+| C_nonconv_correct | 11 | +0.0998 | [+0.0907, +0.1035] | 1.000 |
+| D_nonconv_wrong | 118 | +0.1023 | [+0.0935, +0.1112] | 0.642 |
+
+- mixture: wrong-that-converged = 0.000; correct-that-nonconverged = 0.028
+- dlam1(correct-wrong): overall -0.0973; within-conv +nan; within-nonconv -0.0025
+- dlam1(wrong: conv - nonconv) = +nan
+- AUC(-lam1 -> correct): overall 0.989; within-conv nan; within-nonconv 0.619
+- AUC(-log d_late -> correct) = 0.975; AUC(-lam1 -> converged) = 0.996
+
+## trm_singleGPU_step130205_n512
+- npz: `/home/yurenh2/rrm/research/flossing/diag_trm_singleGPU_step130205_512.npz`, n=512, exact_acc=0.756
+- late-drift def: mean(drift_zH[:, -4:]), Otsu tau(log10)=1.213, frac_converged=0.713
+
+| cell | n | lam1 median | lam1 IQR | token_acc med |
+|---|---|---|---|---|
+| A_conv_correct | 365 | +0.0043 | [-0.0007, +0.0141] | 1.000 |
+| B_conv_wrong | 0 | - | - | - |
+| C_nonconv_correct | 22 | +0.0852 | [+0.0694, +0.0988] | 1.000 |
+| D_nonconv_wrong | 125 | +0.1067 | [+0.0995, +0.1126] | 0.630 |
+
+- mixture: wrong-that-converged = 0.000; correct-that-nonconverged = 0.057
+- dlam1(correct-wrong): overall -0.1017; within-conv +nan; within-nonconv -0.0215
+- dlam1(wrong: conv - nonconv) = +nan
+- AUC(-lam1 -> correct): overall 0.989; within-conv nan; within-nonconv 0.805
+- AUC(-log d_late -> correct) = 0.957; AUC(-lam1 -> converged) = 0.996
+
+## trm_step13020_n512
+- npz: `/home/yurenh2/rrm/research/flossing/diag_trm_step13020_512.npz`, n=512, exact_acc=0.596
+- late-drift def: mean(drift_zH[:, -4:]), Otsu tau(log10)=0.730, frac_converged=0.598
+
+| cell | n | lam1 median | lam1 IQR | token_acc med |
+|---|---|---|---|---|
+| A_conv_correct | 296 | -0.0043 | [-0.0113, +0.0052] | 1.000 |
+| B_conv_wrong | 10 | -0.0027 | [-0.0090, +0.0009] | 0.556 |
+| C_nonconv_correct | 9 | +0.0419 | [+0.0323, +0.0480] | 1.000 |
+| D_nonconv_wrong | 197 | +0.0317 | [+0.0143, +0.0462] | 0.642 |
+
+- mixture: wrong-that-converged = 0.048; correct-that-nonconverged = 0.030
+- dlam1(correct-wrong): overall -0.0339; within-conv -0.0016; within-nonconv +0.0102
+- dlam1(wrong: conv - nonconv) = -0.0344
+- AUC(-lam1 -> correct): overall 0.849; within-conv 0.515; within-nonconv 0.360
+- AUC(-log d_late -> correct) = 0.976; AUC(-lam1 -> converged) = 0.887
author	YurenHao0426 <blackhao0426@gmail.com>	2026-06-13 12:35:36 -0500
committer	YurenHao0426 <blackhao0426@gmail.com>	2026-06-13 12:35:36 -0500
commit	66e0d8b9fd4d0f7a2231d689c055e26fdf1cf04a (patch)
tree	c29cba61124018755a19b02c9d33e3ad5f2e05cc /research/flossing/analysis_2x2/results.md