# Trajectory Augmentation Notes

## Current Hypothesis

Use Lyapunov-style tiny perturbations of the initial recurrent latent state as hidden-trajectory augmentation:

```text
x, y unchanged
z0 -> z0 + epsilon
loss = original supervised ACT/QA loss against y
```

This tests task-level attractor stability directly: if the dynamics are chaotic, a tiny perturbation of the initial trajectory should miss the correct answer basin; training forces perturbed trajectories for the same ground-truth pair to still reach `y`.

## Running First

Initial queued runs use:

```text
sigma = 1e-3
perturb = z_H and z_L
single_perturbed_ce: one perturbed trajectory, no clean branch
multi_perturbed_ce: clean plus three perturbed trajectories, averaged CE
```

No KL, no Lyapunov loss, no JVP/flossing, no data/input augmentation.

## Backup Experiments

Do not conclude from one noise scale. If the current runs fail or are ambiguous, test a noise curriculum:

```text
clean CE warmup for N steps
then enable trajectory augmentation
sigma ramp: 0 -> target_sigma
```

Candidate fixed/ramp target scales:

```text
1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2
```

Instead of treating `sigma` as one fixed value, also test centered noise distributions around the no-perturbation trajectory:

```text
epsilon ~ Normal(0, sigma^2 I)
sigma sampled per trajectory from LogUniform(sigma_min, sigma_max)
sigma ramped over training, then sampled in a band around the target
mixture: p(clean) delta_0 + (1 - p(clean)) Normal(0, sigma^2 I)
```

The distribution should remain centered at zero so the clean trajectory is the mean trajectory. The goal is to cover a small ball/shell around `z0`, not to move the model to a new deterministic offset.

Also compare perturb locations:

```text
z_H only
z_L only
z_H and z_L
```

The expected success signature is not necessarily all `lambda1 < 0`; better signals are improved perturbed-rollout success, fewer broad positive modes, and deterministic clean accuracy not regressing.
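
The centered noise options listed under Backup Experiments (per-trajectory LogUniform `sigma`, a ramp from zero, and a clean/noisy mixture) can be sketched roughly as below. This is a minimal NumPy illustration, not the code in the training script: the function names `sigma_band` and `sample_epsilon`, the linear scaling of both band endpoints during the ramp, and the default values are all assumptions made for the example.

```python
import numpy as np


def sigma_band(step, ramp_steps, sigma_min, sigma_max):
    """Scale the [sigma_min, sigma_max] band linearly from 0 to its final value.

    Assumption: both endpoints are multiplied by the same ramp fraction.
    """
    frac = 1.0 if ramp_steps <= 0 else min(max(step / ramp_steps, 0.0), 1.0)
    return frac * sigma_min, frac * sigma_max


def sample_epsilon(rng, shape, step, ramp_steps=5000,
                   sigma_min=3e-5, sigma_max=3e-3, p_clean=0.0):
    """Sample zero-centered noise for a batch of initial latent states.

    shape[0] is the number of trajectories; each one gets its own sigma.
    Returns (epsilon, sigma), where sigma has shape (shape[0],).
    """
    n_traj = shape[0]
    lo, hi = sigma_band(step, ramp_steps, sigma_min, sigma_max)
    if hi <= 0.0:
        # Ramp has not started: every trajectory is clean.
        return np.zeros(shape), np.zeros(n_traj)
    # LogUniform(lo, hi): uniform in log-space, then exponentiate.
    sigma = np.exp(rng.uniform(np.log(lo), np.log(hi), size=n_traj))
    # Mixture: with probability p_clean use the point mass at zero.
    is_clean = rng.random(n_traj) < p_clean
    sigma = np.where(is_clean, 0.0, sigma)
    # Broadcast one sigma per trajectory over the remaining dimensions.
    scale = sigma.reshape((n_traj,) + (1,) * (len(shape) - 1))
    return rng.standard_normal(shape) * scale, sigma
```

Because every component is drawn from a zero-mean normal (or is exactly zero), the perturbation stays centered on `z0` regardless of which `sigma` is drawn. A call such as `sample_epsilon(np.random.default_rng(0), (8, 16), step=2500)` would draw noise from the half-ramped band under these assumptions.
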
## Long-Train Engineering Note

`step9_trajectory_perturb_train.py` now supports two rollout implementations:

```text
serial_act:
  old path; K trajectories are rolled out one by one through the ACT wrapper

parallel_fixed:
  B -> B*K
  first rollout is clean for multi_perturbed_ce
  remaining rollouts sample centered perturbations
  run exactly halt_max_steps without ACT streaming reset
  average supervised loss across B*K trajectories
```

`parallel_fixed` is the default for new runs. It deliberately avoids the ACT wrapper's halted-sample reset, because reset would make early-halted copies of the same `(x,y)` repeat inside one optimizer step.

Preferred options:

```text
1. Fixed-unroll multiK:
   B -> B*K
   repeat x,y K times
   initialize clean/noisy z0 variants
   run exactly halt_max_steps for all trajectories
   compute supervised loss on fixed rollout outputs
```

This is simplest and matches the stability question: all initial-neighborhood trajectories should reach `y` after the same reasoning budget.

```text
2. ACT-mask multiK:
   B -> B*K
   repeat x,y K times
   run ACT in parallel
   maintain active_mask
   after a trajectory halts, zero/mask later loss contributions
   normalize per trajectory or by valid trajectory-steps
```

Do not naively concatenate `B*K` and use the unmodified ACT streaming semantics without masking. The wrapper resets halted samples and reloads data, which is correct for ordinary streaming training but would make early-halted copies of the same `(x,y)` repeat inside one optimizer step.
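
As a rough illustration of the two options above, the sketch below shows the `B -> B*K` expansion with a clean first copy, and the two loss normalizations mentioned for the ACT-mask variant. It is a NumPy toy under stated assumptions, not the script's implementation: `expand_multik` and `masked_loss` are hypothetical names, a single array `z0` stands in for the latent state (the real runs perturb `z_H` and `z_L`), and per-step losses are assumed to be available as a `(T, N)` array.

```python
import numpy as np


def expand_multik(x, y, z0, k, sigma, rng):
    """Repeat each (x, y) K times and build clean/noisy z0 variants.

    Copies of sample i occupy rows i*k .. i*k + k - 1; the first copy
    of each group keeps the clean z0, the rest get zero-centered noise.
    """
    xk = np.repeat(x, k, axis=0)
    yk = np.repeat(y, k, axis=0)
    zk = np.repeat(z0, k, axis=0).astype(float)
    noise = rng.standard_normal(zk.shape) * sigma
    is_clean = (np.arange(zk.shape[0]) % k) == 0
    noise[is_clean] = 0.0
    return xk, yk, zk + noise


def masked_loss(step_losses, halt_step, per_trajectory=True):
    """Average per-step losses while ignoring steps after a trajectory halts.

    step_losses: (T, N) loss at each step for each trajectory.
    halt_step:   (N,) index of the last active step (inclusive).
    """
    t = np.arange(step_losses.shape[0])[:, None]
    active = (t <= halt_step[None, :]).astype(float)  # the active_mask
    masked = step_losses * active
    if per_trajectory:
        # Each trajectory contributes equally, however long it ran.
        return (masked.sum(axis=0) / active.sum(axis=0)).mean()
    # Normalize by the total number of valid trajectory-steps instead.
    return masked.sum() / active.sum()
```

The two normalizations weight trajectories differently: per-trajectory averaging gives an early-halting copy the same weight as a long-running one, while normalizing by valid trajectory-steps lets long rollouts dominate. For the fixed-unroll option no mask is needed, since every trajectory runs exactly `halt_max_steps`.
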
## Current Long Runs

Started 2026-05-27:

```text
step9_E_hrm_baseline_parallel_fixed_26040_50k
step9_F_hrm_multi4_loguniform_ramp_26040_50k
step9_G_trm_baseline_parallel_fixed_26041_batch4_50k
step9_H_trm_multi4_loguniform_ramp_26041_batch4_50k
```

Perturb runs use:

```text
K = 4
noise_sampling = loguniform
sigma interval final = [3e-5, 3e-3]
sigma ramp = 0 -> final interval over 5000 steps
perturb = z_H and z_L
eval_every = 2500
eval_n = 1024
save_every_eval + save_best + save_final
```

These runs include fixed-unroll baselines because the fixed-unroll objective is not identical to the old ACT-streaming baseline.

## Long-Run Result Snapshot

Completed 2026-05-27:

```text
HRM fixed baseline:    initial 0.5176, best 0.6328 @ 5000,  final 0.5801
HRM multi4 loguniform: initial 0.5176, best 0.6250 @ 7500,  final 0.5889
TRM fixed baseline:    initial 0.5615, best 0.5947 @ 22500, final 0.4971
TRM multi4 loguniform: initial 0.5615, best 0.6084 @ 42500, final 0.5508
```

Interpretation: HRM did not get a best-accuracy gain from multi4, though its final accuracy decayed slightly less (0.0361 below best, versus 0.0527 for the baseline). TRM got both a higher best and a much smaller final decay under multi4 (0.0576 below best, versus 0.0976 for the baseline, whose final accuracy also fell below its initial value). This matches the earlier 10k signal that trajectory augmentation may raise the ceiling or reduce regression, but is still unstable.

## Resume Support

Model-only resume has always worked if `--ckpt-root` points to the original config directory and `--ckpt-name` is an absolute path to a saved `best.pt` or `final.pt`.

Exact training-state resume is now supported:

```text
--save-train-state
  writes latest_state.pt, best_state.pt, final_state.pt

--resume-state path/to/latest_state.pt
  restores model weights, optimizer state, current train_step, best_acc, and RNG
```

The launcher now enables `--save-train-state` for future long runs.
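
To illustrate why exact resume needs the RNG along with weights, optimizer state, `train_step`, and `best_acc`, here is a small framework-free sketch. It is not how the script writes its `*_state.pt` files: the helper names `pack_state`, `save_state`, and `load_state` are hypothetical, `pickle` and a NumPy generator are stand-ins chosen for the example, and the field names are assumptions.

```python
import io
import pickle

import numpy as np


def pack_state(weights, opt_state, train_step, best_acc, rng):
    """Bundle everything needed to continue training bit-for-bit."""
    return {
        "weights": weights,
        "optimizer": opt_state,
        "train_step": train_step,
        "best_acc": best_acc,
        # Snapshot of the generator so later noise draws are reproduced.
        "rng": rng.bit_generator.state,
    }


def save_state(state, fileobj):
    """Serialize the bundled state to an open binary file object."""
    pickle.dump(state, fileobj)


def load_state(fileobj):
    """Load the bundle and rebuild a generator at the saved position."""
    state = pickle.load(fileobj)
    rng = np.random.default_rng()
    rng.bit_generator.state = state["rng"]
    return state, rng


# Usage: the resumed generator continues the same noise sequence.
rng = np.random.default_rng(123)
rng.standard_normal(5)  # draws made before the checkpoint
buf = io.BytesIO()
save_state(pack_state({"w": np.ones(2)}, {"lr": 1e-4}, 2500, 0.6084, rng), buf)
expected = rng.standard_normal(3)  # what the uninterrupted run draws next
buf.seek(0)
state, resumed_rng = load_state(buf)
resumed = resumed_rng.standard_normal(3)
```

Without the saved generator state, a resumed perturbation run would sample a different sequence of `epsilon` values than the uninterrupted run, so the two would diverge even with identical weights and optimizer state.
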