rrm workspace: TRM/HRM/SRM code, Maze dataset, dynamical-analysis pipeline

Curated export for clone-and-run Maze training (2x A6000) + diagnostics. trm/hrm pretrain.py carry trajectory-augmentation code (backward-compatible). Heavy artifacts (checkpoints/wandb/npz) gitignored; see PROVENANCE.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
author: YurenHao0426 <blackhao0426@gmail.com> 2026-06-13 12:35:36 -0500
committer: YurenHao0426 <blackhao0426@gmail.com> 2026-06-13 12:35:36 -0500
commit: 66e0d8b9fd4d0f7a2231d689c055e26fdf1cf04a
tree: c29cba61124018755a19b02c9d33e3ad5f2e05cc /research/flossing/dynamics_experiment_report.md
1 file changed, 86 insertions, 0 deletions
diff --git a/research/flossing/dynamics_experiment_report.md b/research/flossing/dynamics_experiment_report.md
new file mode 100644
index 0000000..489fb91
--- /dev/null
+++ b/research/flossing/dynamics_experiment_report.md
@@ -0,0 +1,86 @@
+# Dynamics Control Experiment Report
+
+Generated: 2026-05-27T20:24:05
+
+## Summary Table
+
+| Model | Run | Status | Init | Final/Last | Delta | Best | Best Step | Vs Baseline | Evals |
+|---|---|---|---:|---:|---:|---:|---:|---:|---:|
+| HRM | HRM baseline 10k | complete | 0.5176 | 0.6504 | 0.1328 | 0.6699 | 9000 | NA | 12 |
+| HRM | HRM mixed volume-CF | complete | 0.5176 | 0.6562 | 0.1387 | 0.6660 | 7000 | 0.0059 | 12 |
+| HRM | HRM Engelken interfloss | complete | 0.5176 | 0.6465 | 0.1289 | 0.6621 | 8000 | -0.0039 | 14 |
+| HRM | HRM Engelken+KL interfloss | complete | 0.5176 | 0.6211 | 0.1035 | 0.6250 | 9000 | -0.0293 | 14 |
+| HRM | HRM conservative Engelken+KL | complete | 0.5176 | 0.6367 | 0.1191 | 0.6777 | 7000 | -0.0137 | 14 |
+| HRM | HRM late Engelken+KL | complete | 0.5176 | 0.6191 | 0.1016 | 0.6367 | 7000 | -0.0312 | 14 |
+| HRM | HRM volume-envelope+KL | complete | 0.5176 | 0.6074 | 0.0898 | 0.6211 | 8000 | -0.0430 | 14 |
+| HRM | HRM basin consistency | complete | 0.5176 | 0.6152 | 0.0977 | 0.6152 | 10000 | -0.0352 | 12 |
+| HRM | HRM single perturbed CE | complete | 0.5176 | 0.6660 | 0.1484 | 0.6660 | 5000 | 0.0156 | 12 |
+| HRM | HRM clean+multi perturbed CE | complete | 0.5176 | 0.6543 | 0.1367 | 0.6660 | 7000 | 0.0039 | 12 |
+| HRM | HRM fixed-unroll baseline 50k | complete | 0.5176 | 0.5801 | 0.0625 | 0.6328 | 5000 | -0.0703 | 22 |
+| HRM | HRM multi4 loguniform 50k | complete | 0.5176 | 0.5889 | 0.0713 | 0.6250 | 7500 | -0.0615 | 22 |
+| TRM | TRM baseline 10k | complete | 0.5977 | 0.5762 | -0.0215 | 0.6621 | 2000 | NA | 12 |
+| TRM | TRM mixed volume-CF | complete | 0.5977 | 0.5078 | -0.0898 | 0.6133 | 1000 | -0.0684 | 12 |
+| TRM | TRM Engelken interfloss | complete | 0.5977 | 0.5195 | -0.0781 | 0.5977 | 0 | -0.0566 | 14 |
+| TRM | TRM Engelken+KL interfloss | complete | 0.5977 | 0.5957 | -0.0020 | 0.5977 | 0 | 0.0195 | 14 |
+| TRM | TRM late Engelken+KL | complete | 0.5977 | 0.5449 | -0.0527 | 0.5977 | 0 | -0.0312 | 14 |
+| TRM | TRM volume-envelope+KL | complete | 0.5977 | 0.5508 | -0.0469 | 0.5977 | 0 | -0.0254 | 14 |
+| TRM | TRM basin consistency | complete | 0.5977 | 0.2422 | -0.3555 | 0.5977 | 0 | -0.3340 | 12 |
+| TRM | TRM single perturbed CE | complete | 0.5977 | 0.6191 | 0.0215 | 0.6582 | 2000 | 0.0430 | 12 |
+| TRM | TRM clean+multi perturbed CE | complete | 0.5977 | 0.5918 | -0.0059 | 0.6719 | 4000 | 0.0156 | 12 |
+| TRM | TRM fixed-unroll baseline 50k | complete | 0.5615 | 0.4971 | -0.0645 | 0.5947 | 22500 | -0.0791 | 22 |
+| TRM | TRM multi4 loguniform 50k | complete | 0.5615 | 0.5508 | -0.0107 | 0.6084 | 42500 | -0.0254 | 22 |
+
+## Floss Episode Diagnostics
+
+### HRM Engelken interfloss
+- episode 0 @ train_step 0: floss 0.030840 -> 0.000022, lyap1 -0.1476 -> 0.0047, volume -0.1693 -> 0.0013, KL mean/max 0.000000/0.000000
+- episode 1 @ train_step 500: floss 0.019141 -> 0.000014, lyap1 -0.1208 -> 0.0017, volume -0.1357 -> 0.0005, KL mean/max 0.000000/0.000000
+
+### HRM Engelken+KL interfloss
+- episode 0 @ train_step 0: floss 0.030840 -> 0.001082, lyap1 -0.1476 -> 0.0107, volume -0.1693 -> -0.0180, KL mean/max 0.645335/14.346942
+- episode 1 @ train_step 500: floss 0.007646 -> 0.000361, lyap1 -0.0588 -> 0.0122, volume -0.0812 -> -0.0068, KL mean/max 0.130172/4.959406
+
+### HRM conservative Engelken+KL
+- episode 0 @ train_step 0: floss 0.030840 -> 0.046181, lyap1 -0.1476 -> -0.2018, volume -0.1693 -> -0.2144, KL mean/max 2.819328/11.853225
+- episode 1 @ train_step 500: floss 0.016849 -> 0.048128, lyap1 -0.0850 -> -0.2127, volume -0.1248 -> -0.2180, KL mean/max 1.228630/5.586896
+
+### HRM late Engelken+KL
+- episode 0 @ train_step 0: floss 0.046401 -> 0.000793, lyap1 -0.1729 -> 0.0256, volume -0.1976 -> 0.0086, KL mean/max 0.418510/6.242658
+- episode 1 @ train_step 500: floss 0.007572 -> 0.001378, lyap1 0.0810 -> 0.0078, volume 0.0456 -> -0.0055, KL mean/max 0.132306/2.906027
+
+### HRM volume-envelope+KL
+- episode 0 @ train_step 0: floss 0.000046 -> 0.000000, lyap1 -0.1476 -> -0.2818, volume -0.1693 -> -0.3005, KL mean/max 0.436240/8.779900
+- episode 1 @ train_step 500: floss 0.000801 -> 0.000000, lyap1 -0.1108 -> -0.2791, volume -0.1299 -> -0.2994, KL mean/max 0.112411/1.281256
+
+### TRM Engelken interfloss
+- episode 0 @ train_step 0: floss 0.000883 -> 0.000047, lyap1 0.0089 -> -0.0011, volume 0.0023 -> -0.0047, KL mean/max 0.000000/0.000000
+- episode 1 @ train_step 500: floss 0.000257 -> 0.000006, lyap1 -0.0149 -> -0.0008, volume -0.0157 -> -0.0015, KL mean/max 0.000000/0.000000
+
+### TRM Engelken+KL interfloss
+- episode 0 @ train_step 0: floss 0.000883 -> 0.000242, lyap1 0.0089 -> 0.0158, volume 0.0023 -> 0.0064, KL mean/max 0.309170/7.308133
+- episode 1 @ train_step 500: floss 0.000063 -> 0.000059, lyap1 0.0031 -> 0.0088, volume -0.0017 -> 0.0018, KL mean/max 0.086242/0.927036
+
+### TRM late Engelken+KL
+- episode 0 @ train_step 0: floss 0.004981 -> 0.001447, lyap1 -0.0423 -> -0.0327, volume -0.0482 -> -0.0368, KL mean/max 0.296968/3.760708
+- episode 1 @ train_step 500: floss 0.002911 -> 0.001941, lyap1 -0.0540 -> -0.0372, volume -0.0538 -> -0.0426, KL mean/max 0.092765/3.175512
+
+### TRM volume-envelope+KL
+- episode 0 @ train_step 0: floss 0.000074 -> 0.000000, lyap1 0.0089 -> 0.0145, volume 0.0023 -> -0.0076, KL mean/max 0.286836/6.571321
+- episode 1 @ train_step 500: floss 0.000000 -> 0.000000, lyap1 0.0055 -> -0.0251, volume -0.0009 -> -0.0281, KL mean/max 0.089595/2.901087
+
+## Incomplete Runs / Process Snapshot
+
+- No monitored experiment processes are active.
+
+## Notes
+
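+- `Final/Last` is `final_acc` when present, otherwise the latest eval accuracy. `Init` is the accuracy at the first eval (step 0), `Delta` is `Final/Last` minus `Init`, `Best`/`Best Step` give the highest eval accuracy and the step at which it occurred, and `Evals` is the number of evals recorded.
+- `Vs Baseline` is `Final/Last` minus the `Final/Last` of the matching HRM/TRM 10k no-floss baseline; the 50k runs are also compared against that 10k baseline.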
+- A complete report may still show partial rows if an experiment crashed or was interrupted.
+
+## Next Experiment Questions
+
+- Close the faithful-flossing loop before claiming that flossing is ineffective. Required matrix: a from-scratch baseline, the direct flossing loss as a negative control, faithful prefloss, faithful interfloss with separate floss optimizer steps, and, optionally, volume/spectrum interfloss.
+- Treat continuation runs as screening only. Final claims about flossing should rest on from-scratch full training, because continuation runs inherit optimizer state, EMA horizon, puzzle embeddings, and data order, each of which can confound the comparison.
+- Run GRM/PTRM after, or in parallel with, faithful flossing. This addresses a different question: whether stochastic multi-rollout plus Q-selection learns a low-dimensional observer of stability or of the Lyapunov spectrum.
+- For GRM/PTRM, compare learned Q selection against selection by lambda1 (the leading Lyapunov exponent) and by top-spectrum features, and measure how strongly Q scores correlate with these stability features.
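The `lyap1` column in the floss diagnostics and the `lambda1` feature in the Next Experiment Questions presumably denote the leading Lyapunov exponent of the recurrent update; the report does not show how it is computed. A common estimator, and to our understanding the one used in Engelken's gradient-flossing work, pushes k tangent vectors through the per-step Jacobians, re-orthonormalizes them with a QR decomposition, and averages log|diag(R)|. The sketch below is a minimal NumPy version of that standard estimator on a toy tanh map, not the repository's implementation. `lyapunov_spectrum`, `floss_loss`, and the toy map are hypothetical, and treating the report's `floss` value as the sum of squared leading exponents is an assumption.

```python
import numpy as np


def lyapunov_spectrum(step, jacobian, h0, k, n_steps, n_warmup=0):
    """Estimate the k leading Lyapunov exponents of the map h -> step(h).

    Standard QR (Benettin-style) estimator: push k orthonormal tangent
    vectors through the per-step Jacobians, re-orthonormalize with QR,
    and average the log growth factors on the diagonal of R.
    """
    h = np.asarray(h0, dtype=float)
    for _ in range(n_warmup):  # let the trajectory settle before measuring
        h = step(h)
    q = np.eye(h.size)[:, :k]  # k orthonormal tangent directions
    log_growth = np.zeros(k)
    for _ in range(n_steps):
        j = jacobian(h)             # Jacobian at the current state
        h = step(h)
        q, r = np.linalg.qr(j @ q)  # reduced QR: q is (n, k), r is (k, k)
        log_growth += np.log(np.abs(np.diag(r)))
    return log_growth / n_steps


def floss_loss(exponents):
    """Hypothetical flossing objective: sum of squared leading exponents."""
    return float(np.sum(np.square(exponents)))


# Toy usage (hypothetical): a random tanh RNN h_{t+1} = tanh(W h_t).
rng = np.random.default_rng(0)
n = 64
w = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))


def tanh_step(h):
    return np.tanh(w @ h)


def tanh_jacobian(h):
    # d tanh(W h) / dh = diag(1 - tanh(W h)^2) @ W
    return (1.0 - np.tanh(w @ h) ** 2)[:, None] * w


lams = lyapunov_spectrum(tanh_step, tanh_jacobian, rng.normal(size=n),
                         k=4, n_steps=500, n_warmup=100)
print("lambda_1..4:", lams, "floss:", floss_loss(lams))
```

Re-orthonormalizing at every step keeps all k tangent vectors from collapsing onto the most expanding direction, so the later exponents remain measurable. During training the same loop would need to run in a differentiable framework so the floss gradient can reach the weights; this NumPy version only estimates the exponents.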
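The last Next Experiment Question asks whether learned Q selection tracks stability. One way to quantify this is sketched below in NumPy, under stated assumptions. For each puzzle, collect the Q score, lambda1, and correctness of every rollout as arrays of shape (n_puzzles, n_rollouts); this layout and all function names are hypothetical. The sketch then compares the accuracy of Q-selected and lambda1-selected rollouts and computes the per-puzzle Spearman correlation between Q and -lambda1. Treating "lambda1 selection" as picking the rollout with the smallest lambda1 (the most contracting one) is also an assumption, because the report does not say which direction is preferred.

```python
import numpy as np


def _ranks(x):
    """Ranks along the last axis (0 = smallest). Ties are broken by position,
    not averaged, so this equals Spearman ranking only for tie-free data."""
    return np.argsort(np.argsort(x, axis=-1), axis=-1).astype(float)


def rowwise_spearman(a, b):
    """Spearman rank correlation between a[i] and b[i] for every row i.
    Requires at least 2 rollouts per row (otherwise the result is nan)."""
    ra, rb = _ranks(a), _ranks(b)
    ra -= ra.mean(axis=-1, keepdims=True)
    rb -= rb.mean(axis=-1, keepdims=True)
    return (ra * rb).sum(-1) / np.sqrt((ra ** 2).sum(-1) * (rb ** 2).sum(-1))


def compare_selectors(q_scores, lambda1, correct):
    """All inputs have shape (n_puzzles, n_rollouts)."""
    q_scores = np.asarray(q_scores, dtype=float)
    lambda1 = np.asarray(lambda1, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    rows = np.arange(q_scores.shape[0])
    q_pick = np.argmax(q_scores, axis=1)   # learned Q selection
    lam_pick = np.argmin(lambda1, axis=1)  # assumption: most contracting rollout
    return {
        "q_select_acc": float(correct[rows, q_pick].mean()),
        "lambda1_select_acc": float(correct[rows, lam_pick].mean()),
        "oracle_acc": float(correct.any(axis=1).mean()),  # any rollout correct
        "agreement": float((q_pick == lam_pick).mean()),
        # Positive values mean Q ranks low-lambda1 (more stable) rollouts higher.
        "spearman_q_vs_neg_lambda1": float(rowwise_spearman(q_scores, -lambda1).mean()),
    }


# Toy usage with made-up numbers (2 puzzles x 3 rollouts).
print(compare_selectors([[0.9, 0.1, 0.5], [0.2, 0.8, 0.4]],
                        [[-0.3, 0.2, 0.0], [0.1, -0.2, 0.05]],
                        [[True, False, False], [False, True, False]]))
```

The correlation is computed within each puzzle rather than pooled across puzzles because selection happens among one puzzle's rollouts. A pooled correlation could be driven by puzzle difficulty alone. Top-spectrum-feature selection fits the same template: pass any per-rollout stability feature in place of `lambda1`.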