From 66e0d8b9fd4d0f7a2231d689c055e26fdf1cf04a Mon Sep 17 00:00:00 2001
From: YurenHao0426 <blackhao0426@gmail.com>
Date: Sat, 13 Jun 2026 12:35:36 -0500
Subject: rrm workspace: TRM/HRM/SRM code, Maze dataset, dynamical-analysis
 pipeline

Curated export for clone-and-run Maze training (2x A6000) + diagnostics.
trm/hrm pretrain.py carry trajectory-augmentation code (backward-compatible).
Heavy artifacts (checkpoints/wandb/npz) gitignored; see PROVENANCE.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 research/flossing/trajectory_augmentation_notes.md | 190 +++++++++++++++++++++
 1 file changed, 190 insertions(+)
 create mode 100644 research/flossing/trajectory_augmentation_notes.md

diff --git a/research/flossing/trajectory_augmentation_notes.md b/research/flossing/trajectory_augmentation_notes.md
new file mode 100644
index 0000000..0460bf7
--- /dev/null
+++ b/research/flossing/trajectory_augmentation_notes.md
@@ -0,0 +1,190 @@
+# Trajectory Augmentation Notes
+
+## Current Hypothesis
+
+Use Lyapunov-style tiny perturbations of the initial recurrent latent state as
+hidden-trajectory augmentation:
+
+```text
+x, y unchanged
+z0 -> z0 + epsilon
+loss = original supervised ACT/QA loss against y
+```
+
+This tests task-level attractor stability directly: if the dynamics are
+chaotic, a tiny perturbation of the initial state can push the trajectory out
+of the correct answer's basin; training forces perturbed trajectories for the
+same ground-truth `(x, y)` pair to still reach `y`.
+
+## Running First
+
+Initial queued runs use:
+
+```text
+sigma = 1e-3
+perturb = z_H and z_L
+single_perturbed_ce: one perturbed trajectory, no clean branch
+multi_perturbed_ce: clean plus three perturbed trajectories, averaged CE
+```
+
+No KL, no Lyapunov loss, no JVP/flossing, no data/input augmentation.
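+
+The two objectives above can be sketched as follows. This is a minimal,
+framework-free illustration, not the training script. It assumes the latent
+state is a flat list of floats and that `loss_fn` (hypothetical) rolls the
+model out from a given initial state and returns the supervised CE against
+`y`. The real runs perturb both the `z_H` and `z_L` tensors.
+
+```python
+import random
+from typing import Callable, List
+
+Latent = List[float]
+
+
+def perturb(z0: Latent, sigma: float, rng: random.Random) -> Latent:
+    # z0 + epsilon with epsilon ~ Normal(0, sigma^2 I); x and y are untouched.
+    return [v + rng.gauss(0.0, sigma) for v in z0]
+
+
+def initial_states(z0: Latent, mode: str, sigma: float,
+                   rng: random.Random, k: int = 4) -> List[Latent]:
+    # Hypothetical helper: the initial states used for one (x, y) pair.
+    if mode == "single_perturbed_ce":
+        return [perturb(z0, sigma, rng)]  # one perturbed trajectory, no clean branch
+    if mode == "multi_perturbed_ce":
+        # clean trajectory plus k - 1 perturbed ones (k = 4 gives clean + 3)
+        return [list(z0)] + [perturb(z0, sigma, rng) for _ in range(k - 1)]
+    raise ValueError(f"unknown mode: {mode}")
+
+
+def trajectory_loss(z0: Latent, mode: str, sigma: float,
+                    loss_fn: Callable[[Latent], float],
+                    rng: random.Random) -> float:
+    # Every variant is supervised against the same y; the CE is averaged.
+    states = initial_states(z0, mode, sigma, rng)
+    return sum(loss_fn(s) for s in states) / len(states)
+```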
+
+## Backup Experiments
+
+Do not draw conclusions from a single noise scale. If the current runs fail or
+are ambiguous, test a noise curriculum:
+
+```text
+clean CE warmup for N steps
+then enable trajectory augmentation
+sigma ramp: 0 -> target_sigma
+```
+
+Candidate fixed/ramp target scales:
+
+```text
+1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2
+```
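+
+The curriculum above can be expressed as a step-indexed noise schedule. This is
+a minimal sketch that assumes a linear ramp in `sigma` (the notes do not fix
+the ramp shape); `warmup_steps` and `ramp_steps` are hypothetical parameter
+names.
+
+```python
+def sigma_at(step: int, target_sigma: float,
+             warmup_steps: int, ramp_steps: int) -> float:
+    """Noise scale at a given training step (hypothetical schedule).
+
+    sigma = 0 (clean CE) for the first `warmup_steps`, then a linear ramp
+    from 0 to `target_sigma` over `ramp_steps`, then constant.
+    """
+    if step < warmup_steps:
+        return 0.0
+    if ramp_steps <= 0:
+        return target_sigma
+    frac = min(1.0, (step - warmup_steps) / ramp_steps)
+    return frac * target_sigma
+
+
+# One sweep arm per candidate target scale listed above.
+CANDIDATE_SIGMAS = [1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2]
+```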
+
+Instead of treating `sigma` as one fixed value, also test zero-mean noise
+distributions centered on the unperturbed initial state:
+
+```text
+epsilon ~ Normal(0, sigma^2 I)
+sigma sampled per trajectory from LogUniform(sigma_min, sigma_max)
+sigma ramped over training, then sampled in a band around the target
+mixture: p(clean) delta_0 + (1 - p(clean)) Normal(0, sigma^2 I)
+```
+
+The distribution should remain centered at zero so that the clean state `z0`
+is the mean of the perturbed initial states. The goal is to cover a small
+ball/shell around `z0`, not to shift the initial state by a fixed
+deterministic offset.
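+
+The centered distributions listed above can be sampled per trajectory as
+sketched below. This illustrates the stated assumptions and is not the
+script's sampler: `sample_epsilon` and its parameters are hypothetical, and the
+latent is treated as a flat list of floats.
+
+```python
+import math
+import random
+from typing import List
+
+
+def sample_sigma_loguniform(sigma_min: float, sigma_max: float,
+                            rng: random.Random) -> float:
+    # log(sigma) ~ Uniform(log sigma_min, log sigma_max)
+    return math.exp(rng.uniform(math.log(sigma_min), math.log(sigma_max)))
+
+
+def sample_epsilon(dim: int, rng: random.Random, sigma_min: float,
+                   sigma_max: float, p_clean: float = 0.0) -> List[float]:
+    """Zero-mean perturbation for one trajectory (hypothetical helper).
+
+    With probability p_clean return exactly zero (the delta_0 component);
+    otherwise draw sigma from LogUniform(sigma_min, sigma_max) and then
+    epsilon ~ Normal(0, sigma^2 I).
+    """
+    if rng.random() < p_clean:
+        return [0.0] * dim
+    sigma = sample_sigma_loguniform(sigma_min, sigma_max, rng)
+    return [rng.gauss(0.0, sigma) for _ in range(dim)]
+```
+
+Each mixture component is symmetric about zero, so `E[epsilon] = 0` for any
+`p_clean` and the perturbed initial states stay centered on `z0`.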
+
+Also compare perturbation locations:
+
+```text
+z_H only
+z_L only
+z_H and z_L
+```
+
+A successful run need not show `lambda1 < 0` across the board; more useful
+signals are higher perturbed-rollout success, fewer broad positive modes, and
+no regression in deterministic clean accuracy.
+
+## Long-Train Engineering Note
+
+`step9_trajectory_perturb_train.py` now supports two rollout implementations:
+
+```text
+serial_act:
+  old path; K trajectories are rolled out one by one through the ACT wrapper
+
+parallel_fixed:
+  B -> B*K
+  first rollout is clean for multi_perturbed_ce
+  remaining rollouts sample centered perturbations
+  run exactly halt_max_steps without ACT streaming reset
+  average supervised loss across B*K trajectories
+```
+
+`parallel_fixed` is the default for new runs. It deliberately avoids the ACT
+wrapper's halted-sample reset, because reset would make early-halted copies of
+the same `(x,y)` repeat inside one optimizer step.
+
+Preferred options for multi-trajectory rollouts:
+
+```text
+1. Fixed-unroll multiK:
+   B -> B*K
+   repeat x,y K times
+   initialize clean/noisy z0 variants
+   run exactly halt_max_steps for all trajectories
+   compute supervised loss on fixed rollout outputs
+```
+
+This is the simplest option and matches the stability question directly: every
+trajectory started in the neighborhood of `z0` should reach `y` within the same
+reasoning budget.
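+
+The fixed-unroll multiK option (the `parallel_fixed` path) can be sketched in
+plain Python. All of the following are hypothetical assumptions: copies are
+laid out example-major (the order `torch.repeat_interleave` produces),
+`step_fn(x, z)` is one reasoning step, and `loss_fn(z, y)` is the
+per-trajectory supervised loss. The real code batches these as tensors.
+
+```python
+import random
+from typing import Callable, List, Sequence, Tuple
+
+Latent = List[float]
+
+
+def expand_batch(xs: Sequence, ys: Sequence, z0s: Sequence[Latent], k: int,
+                 sigma: float, rng: random.Random,
+                 clean_first: bool = True) -> Tuple[list, list, List[Latent]]:
+    # B -> B*K: rows i*k .. i*k + k - 1 are copies of example i.
+    bx, by, bz = [], [], []
+    for x, y, z0 in zip(xs, ys, z0s):
+        for j in range(k):
+            bx.append(x)
+            by.append(y)
+            if clean_first and j == 0:
+                bz.append(list(z0))  # clean copy (multi_perturbed_ce)
+            else:
+                bz.append([v + rng.gauss(0.0, sigma) for v in z0])
+    return bx, by, bz
+
+
+def fixed_unroll_loss(bx: list, by: list, bz: List[Latent],
+                      step_fn: Callable, loss_fn: Callable,
+                      halt_max_steps: int) -> float:
+    # Every trajectory runs exactly halt_max_steps: no ACT halting and no
+    # reset, so no (x, y) copy is reloaded within the optimizer step.
+    total = 0.0
+    for x, y, z in zip(bx, by, bz):
+        for _ in range(halt_max_steps):
+            z = step_fn(x, z)
+        total += loss_fn(z, y)
+    return total / len(bx)  # average over all B*K trajectories
+```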
+
+```text
+2. ACT-mask multiK:
+   B -> B*K
+   repeat x,y K times
+   run ACT in parallel
+   maintain active_mask
+   after a trajectory halts, zero/mask later loss contributions
+   normalize per trajectory or by valid trajectory-steps
+```
+
+Do not naively concatenate `B*K` copies and run the unmodified ACT streaming
+semantics without masking. The wrapper resets halted samples and reloads data,
+which is correct for ordinary streaming training but would make early-halted
+copies of the same `(x,y)` repeat inside one optimizer step.
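+
+For the ACT-mask option, the loss normalization can be sketched as below. This
+is a minimal sketch under assumed conventions: `step_losses[t][i]` is the loss
+of trajectory `i` at ACT step `t`, and `halted_at[i]` is the step at which it
+halted; both layouts are hypothetical. Steps after a halt are masked out
+instead of being reset and reloaded.
+
+```python
+from typing import List
+
+
+def masked_act_loss(step_losses: List[List[float]], halted_at: List[int],
+                    per_trajectory: bool = True) -> float:
+    n_steps, n_traj = len(step_losses), len(halted_at)
+    # active[t][i] is True while trajectory i has not yet halted.
+    active = [[t <= halted_at[i] for i in range(n_traj)] for t in range(n_steps)]
+    if per_trajectory:
+        # Mean over each trajectory's valid steps, then mean over trajectories.
+        per_traj = []
+        for i in range(n_traj):
+            valid = [step_losses[t][i] for t in range(n_steps) if active[t][i]]
+            per_traj.append(sum(valid) / len(valid))
+        return sum(per_traj) / n_traj
+    # Otherwise normalize by the number of valid (trajectory, step) pairs.
+    num = sum(step_losses[t][i] for t in range(n_steps)
+              for i in range(n_traj) if active[t][i])
+    den = sum(active[t][i] for t in range(n_steps) for i in range(n_traj))
+    return num / den
+```
+
+Per-trajectory normalization weights every trajectory equally regardless of
+when it halts; normalizing by valid steps gives longer-running trajectories
+more weight.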
+
+## Current Long Runs
+
+Started 2026-05-27:
+
+```text
+step9_E_hrm_baseline_parallel_fixed_26040_50k
+step9_F_hrm_multi4_loguniform_ramp_26040_50k
+step9_G_trm_baseline_parallel_fixed_26041_batch4_50k
+step9_H_trm_multi4_loguniform_ramp_26041_batch4_50k
+```
+
+The perturbation runs (F and H) use:
+
+```text
+K = 4
+noise_sampling = loguniform
+sigma interval final = [3e-5, 3e-3]
+sigma ramp = 0 -> final interval over 5000 steps
+perturb = z_H and z_L
+eval_every = 2500
+eval_n = 1024
+save_every_eval + save_best + save_final
+```
+
+Runs E and G are fixed-unroll baselines. They are needed because the
+fixed-unroll objective is not identical to the old ACT-streaming baseline.
+
+## Long-Run Result Snapshot
+
+Completed 2026-05-27:
+
+```text
+HRM fixed baseline:
+  initial 0.5176, best 0.6328 @ 5000, final 0.5801
+
+HRM multi4 loguniform:
+  initial 0.5176, best 0.6250 @ 7500, final 0.5889
+
+TRM fixed baseline:
+  initial 0.5615, best 0.5947 @ 22500, final 0.4971
+
+TRM multi4 loguniform:
+  initial 0.5615, best 0.6084 @ 42500, final 0.5508
+```
+
+Interpretation: HRM did not get a best-accuracy gain from multi4, though its
+final accuracy dropped slightly less from its best (0.0361 vs 0.0527). TRM got
+both a higher best and a much smaller best-to-final drop under multi4 (0.0576
+vs 0.0976), consistent with the earlier 10k-step signal that trajectory
+augmentation may raise the ceiling or reduce late-training regression, but is
+still unstable.
+
+## Resume Support
+
+Model-only resume has always worked if `--ckpt-root` points to the original
+config directory and `--ckpt-name` is an absolute path to a saved `best.pt` or
+`final.pt`.
+
+Exact training-state resume is now supported:
+
+```text
+--save-train-state
+  writes latest_state.pt, best_state.pt, final_state.pt
+
+--resume-state path/to/latest_state.pt
+  restores model weights, optimizer state, current train_step, best_acc, and RNG
+```
+
+The launcher now enables `--save-train-state` for future long runs.
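+
+What an exact-resume checkpoint has to contain can be sketched without any
+framework. Assumptions: the state is pickled to a file and Python's `random`
+module is the only RNG. The real script writes `.pt` files, and a PyTorch run
+would typically also need the NumPy, torch and CUDA RNG states, which this
+sketch omits. `save_train_state` and `load_train_state` are hypothetical names.
+
+```python
+import pickle
+import random
+from typing import Any, Dict
+
+
+def save_train_state(path: str, model_state: Dict[str, Any],
+                     optim_state: Dict[str, Any], train_step: int,
+                     best_acc: float) -> None:
+    # Everything needed to continue the run exactly, not just the weights.
+    state = {
+        "model": model_state,
+        "optimizer": optim_state,
+        "train_step": train_step,
+        "best_acc": best_acc,
+        "rng": random.getstate(),
+    }
+    with open(path, "wb") as f:
+        pickle.dump(state, f)
+
+
+def load_train_state(path: str) -> Dict[str, Any]:
+    with open(path, "rb") as f:
+        state = pickle.load(f)
+    random.setstate(state["rng"])  # later noise draws match the original run
+    return state
+```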
-- 
cgit v1.2.3