From 66e0d8b9fd4d0f7a2231d689c055e26fdf1cf04a Mon Sep 17 00:00:00 2001
From: YurenHao0426
Date: Sat, 13 Jun 2026 12:35:36 -0500
Subject: rrm workspace: TRM/HRM/SRM code, Maze dataset, dynamical-analysis pipeline

Curated export for clone-and-run Maze training (2x A6000) + diagnostics.
trm/hrm pretrain.py carry trajectory-augmentation code (backward-compatible).
Heavy artifacts (checkpoints/wandb/npz) gitignored; see PROVENANCE.md.

Co-Authored-By: Claude Fable 5
---
 research/flossing/trajectory_augmentation_notes.md | 190 +++++++++++++++++++++
 1 file changed, 190 insertions(+)
 create mode 100644 research/flossing/trajectory_augmentation_notes.md

(limited to 'research/flossing/trajectory_augmentation_notes.md')

diff --git a/research/flossing/trajectory_augmentation_notes.md b/research/flossing/trajectory_augmentation_notes.md
new file mode 100644
index 0000000..0460bf7
--- /dev/null
+++ b/research/flossing/trajectory_augmentation_notes.md
@@ -0,0 +1,190 @@
+# Trajectory Augmentation Notes
+
+## Current Hypothesis
+
+Use Lyapunov-style tiny perturbations of the initial recurrent latent state as
+hidden-trajectory augmentation:
+
+```text
+x, y unchanged
+z0 -> z0 + epsilon
+loss = original supervised ACT/QA loss against y
+```
+
+This tests task-level attractor stability directly: if the dynamics are
+chaotic, a tiny perturbation of the initial trajectory should miss the correct
+answer basin; training forces perturbed trajectories for the same ground-truth
+pair to still reach `y`.
+
+## Running First
+
+Initial queued runs use:
+
+```text
+sigma = 1e-3
+perturb = z_H and z_L
+single_perturbed_ce: one perturbed trajectory, no clean branch
+multi_perturbed_ce: clean plus three perturbed trajectories, averaged CE
+```
+
+No KL, no Lyapunov loss, no JVP/flossing, no data/input augmentation.
+
+## Backup Experiments
+
+Do not conclude from one noise scale. If the current runs fail or are
+ambiguous, test a noise curriculum:
+
+```text
+clean CE warmup for N steps
+then enable trajectory augmentation
+sigma ramp: 0 -> target_sigma
+```
+
+Candidate fixed/ramp target scales:
+
+```text
+1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2
+```
+
+Instead of treating `sigma` as one fixed value, also test centered noise
+distributions around the no-perturbation trajectory:
+
+```text
+epsilon ~ Normal(0, sigma^2 I)
+sigma sampled per trajectory from LogUniform(sigma_min, sigma_max)
+sigma ramped over training, then sampled in a band around the target
+mixture: p(clean) delta_0 + (1 - p(clean)) Normal(0, sigma^2 I)
+```
+
+The distribution should remain centered at zero so the clean trajectory is the
+mean trajectory. The goal is to cover a small ball/shell around `z0`, not to
+move the model to a new deterministic offset.
+
+Also compare perturb locations:
+
+```text
+z_H only
+z_L only
+z_H and z_L
+```
+
+The expected success signature is not necessarily all `lambda1 < 0`; better
+signals are improved perturbed-rollout success, fewer broad positive modes,
+and deterministic clean accuracy not regressing.
+
+## Long-Train Engineering Note
+
+`step9_trajectory_perturb_train.py` now supports two rollout implementations:
+
+```text
+serial_act:
+  old path; K trajectories are rolled out one by one through the ACT wrapper
+
+parallel_fixed:
+  B -> B*K
+  first rollout is clean for multi_perturbed_ce
+  remaining rollouts sample centered perturbations
+  run exactly halt_max_steps without ACT streaming reset
+  average supervised loss across B*K trajectories
+```
+
+`parallel_fixed` is the default for new runs. It deliberately avoids the ACT
+wrapper's halted-sample reset, because reset would make early-halted copies of
+the same `(x,y)` repeat inside one optimizer step.
+
+Preferred options:
+
+```text
+1. Fixed-unroll multiK:
+   B -> B*K
+   repeat x,y K times
+   initialize clean/noisy z0 variants
+   run exactly halt_max_steps for all trajectories
+   compute supervised loss on fixed rollout outputs
+```
+
+This is simplest and matches the stability question: all initial-neighborhood
+trajectories should reach `y` after the same reasoning budget.
+
+```text
+2. ACT-mask multiK:
+   B -> B*K
+   repeat x,y K times
+   run ACT in parallel
+   maintain active_mask
+   after a trajectory halts, zero/mask later loss contributions
+   normalize per trajectory or by valid trajectory-steps
+```
+
+Do not naively concatenate `B*K` and use the unmodified ACT streaming semantics
+without masking. The wrapper resets halted samples and reloads data, which is
+correct for ordinary streaming training but would make early-halted copies of
+the same `(x,y)` repeat inside one optimizer step.
+
+## Current Long Runs
+
+Started 2026-05-27:
+
+```text
+step9_E_hrm_baseline_parallel_fixed_26040_50k
+step9_F_hrm_multi4_loguniform_ramp_26040_50k
+step9_G_trm_baseline_parallel_fixed_26041_batch4_50k
+step9_H_trm_multi4_loguniform_ramp_26041_batch4_50k
+```
+
+Perturb runs use:
+
+```text
+K = 4
+noise_sampling = loguniform
+sigma interval final = [3e-5, 3e-3]
+sigma ramp = 0 -> final interval over 5000 steps
+perturb = z_H and z_L
+eval_every = 2500
+eval_n = 1024
+save_every_eval + save_best + save_final
+```
+
+These runs include fixed-unroll baselines because the fixed-unroll objective is
+not identical to the old ACT-streaming baseline.
+
+## Long-Run Result Snapshot
+
+Completed 2026-05-27:
+
+```text
+HRM fixed baseline:
+  initial 0.5176, best 0.6328 @ 5000, final 0.5801
+
+HRM multi4 loguniform:
+  initial 0.5176, best 0.6250 @ 7500, final 0.5889
+
+TRM fixed baseline:
+  initial 0.5615, best 0.5947 @ 22500, final 0.4971
+
+TRM multi4 loguniform:
+  initial 0.5615, best 0.6084 @ 42500, final 0.5508
+```
+
+Interpretation: HRM did not get a best-accuracy gain from multi4, though final
+accuracy decayed slightly less. TRM did get both a higher best and a much less
+bad final decay under multi4, matching the earlier 10k signal that trajectory
+augmentation may raise the ceiling or reduce regression, but is still unstable.
+
+## Resume Support
+
+Model-only resume has always worked if `--ckpt-root` points to the original
+config directory and `--ckpt-name` is an absolute path to a saved `best.pt` or
+`final.pt`.
+
+Exact training-state resume is now supported:
+
+```text
+--save-train-state
+  writes latest_state.pt, best_state.pt, final_state.pt
+
+--resume-state path/to/latest_state.pt
+  restores model weights, optimizer state, current train_step, best_acc, and RNG
+```
+
+The launcher now enables `--save-train-state` for future long runs.
-- 
cgit v1.2.3
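
Reader's sketch, not part of the commit: the "Perturb runs use" settings in the notes above (K = 4, log-uniform sigma in [3e-5, 3e-3], a ramp over 5000 steps, a clean first rollout) can be illustrated roughly as follows. This is a minimal stdlib-only sketch, not the code in `step9_trajectory_perturb_train.py`. The function names `sample_sigma` and `make_z0_variants` are hypothetical, the latent state is a plain list of floats rather than a tensor, and the ramp is assumed to scale both interval endpoints linearly with the step, which the notes do not specify.

```python
import math
import random

# Values taken from the "Perturb runs use" block in the notes.
SIGMA_MIN, SIGMA_MAX = 3e-5, 3e-3
RAMP_STEPS = 5000
K = 4


def sample_sigma(step, rng):
    """Draw one sigma from a log-uniform distribution over the ramped interval.

    Assumption: the ramp multiplies both endpoints by min(1, step / RAMP_STEPS).
    """
    scale = min(1.0, step / RAMP_STEPS)
    if scale == 0.0:
        return 0.0  # ramp has not started: no perturbation
    lo, hi = SIGMA_MIN * scale, SIGMA_MAX * scale
    # Uniform in log-space is log-uniform in sigma.
    return math.exp(rng.uniform(math.log(lo), math.log(hi)))


def make_z0_variants(z0, step, rng, k=K):
    """Return k initial states: index 0 is clean, the rest are z0 + epsilon.

    Each perturbed copy draws its own sigma, and epsilon ~ Normal(0, sigma^2 I),
    so the noise stays centered on the clean state.
    """
    variants = [list(z0)]
    for _ in range(k - 1):
        sigma = sample_sigma(step, rng)
        variants.append([z + rng.gauss(0.0, sigma) for z in z0])
    return variants


rng = random.Random(0)
z0 = [0.0] * 8
variants = make_z0_variants(z0, step=2500, rng=rng)  # halfway through the ramp
```

In a batched setting the same idea would apply per sample, giving the `B -> B*K` expansion described under `parallel_fixed`, with the supervised loss averaged over all `B*K` trajectories.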