# Trajectory Augmentation Notes

## Current Hypothesis

Use Lyapunov-style tiny perturbations of the initial recurrent latent state as
hidden-trajectory augmentation:

```text
x, y unchanged
z0 -> z0 + epsilon
loss = original supervised ACT/QA loss against y
```

This tests task-level attractor stability directly. If the dynamics are
chaotic, even a tiny perturbation of the initial latent state can push the
trajectory out of the correct answer basin. Training on perturbed trajectories
forces every trajectory started near `z0` for the same `(x, y)` pair to still
reach `y`.
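
The intuition can be illustrated with a toy example. The sketch below is not the HRM/TRM recurrence: it uses the scalar logistic map as a stand-in for a recurrent update, and the function name `separation_after_rollout` is hypothetical. It applies the same initial offset (`1e-3`) under a contracting parameter and under a chaotic one, and tracks how far the perturbed rollout drifts from the clean one.

```python
def separation_after_rollout(r, x0, epsilon, steps):
    """Roll out x <- r * x * (1 - x) from x0 and from x0 + epsilon.

    Returns (max_separation, final_separation) between the two rollouts.
    """
    clean, perturbed = x0, x0 + epsilon
    max_sep = abs(perturbed - clean)
    for _ in range(steps):
        clean = r * clean * (1.0 - clean)
        perturbed = r * perturbed * (1.0 - perturbed)
        max_sep = max(max_sep, abs(perturbed - clean))
    return max_sep, abs(perturbed - clean)


# r = 2.5 has a stable fixed point; r = 4.0 is the classic chaotic regime.
stable_max, stable_final = separation_after_rollout(2.5, 0.3, 1e-3, 50)
chaotic_max, chaotic_final = separation_after_rollout(4.0, 0.3, 1e-3, 50)
print(f"stable:  final separation {stable_final:.2e}")
print(f"chaotic: max separation   {chaotic_max:.2e}")
```

In the contracting case both rollouts collapse onto the same fixed point, which is the behavior the augmented loss is meant to encourage in a neighborhood of `z0`.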

## Running First

Initial queued runs use:

```text
sigma = 1e-3
perturb = z_H and z_L
single_perturbed_ce: one perturbed trajectory, no clean branch
multi_perturbed_ce: clean plus three perturbed trajectories, averaged CE
```

No KL, no Lyapunov loss, no JVP/flossing, no data/input augmentation.

## Backup Experiments

Do not draw conclusions from a single noise scale. If the current runs fail or
are ambiguous, test a noise curriculum:

```text
clean CE warmup for N steps
then enable trajectory augmentation
sigma ramp: 0 -> target_sigma
```
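
A minimal sketch of such a schedule, assuming a linear ramp (the notes do not fix the ramp shape). The names `sigma_at_step`, `warmup_steps`, and `ramp_steps` are hypothetical and are not taken from the training script.

```python
def sigma_at_step(step, warmup_steps, ramp_steps, target_sigma):
    """Noise scale for a given optimizer step.

    Steps before warmup_steps train on clean CE only (sigma = 0).
    Afterwards sigma grows linearly and saturates at target_sigma.
    """
    if step < warmup_steps:
        return 0.0
    if ramp_steps <= 0:
        return target_sigma  # no ramp requested: jump straight to the target
    fraction = min(1.0, (step - warmup_steps) / ramp_steps)
    return fraction * target_sigma


# Example: 1000 clean warmup steps, then a 5000-step ramp to 1e-3.
for step in (0, 1000, 3500, 6000, 20000):
    print(step, sigma_at_step(step, 1000, 5000, 1e-3))
```

Returning exactly `0.0` during warmup lets the caller skip noise sampling altogether, so the warmup phase stays identical to plain clean training.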

Candidate fixed/ramp target scales:

```text
1e-5, 3e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2
```

Instead of treating `sigma` as one fixed value, also test noise distributions
centered on the unperturbed trajectory:

```text
epsilon ~ Normal(0, sigma^2 I)
sigma sampled per trajectory from LogUniform(sigma_min, sigma_max)
sigma ramped over training, then sampled in a band around the target
mixture: p(clean) delta_0 + (1 - p(clean)) Normal(0, sigma^2 I)
```

The distribution should remain centered at zero so that the clean `z0` is the
mean of the perturbed initial states. The goal is to cover a small ball/shell
around `z0`, not to move the model to a new deterministic offset.
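
The sampling options above can be sketched with the standard library alone. This is an illustration under assumptions: `sample_sigma`, `sample_epsilon`, and `p_clean` are hypothetical names, and a plain list stands in for the latent tensor.

```python
import math
import random


def sample_sigma(rng, sigma_min, sigma_max):
    """Draw sigma from LogUniform(sigma_min, sigma_max)."""
    return math.exp(rng.uniform(math.log(sigma_min), math.log(sigma_max)))


def sample_epsilon(rng, dim, sigma_min, sigma_max, p_clean=0.0):
    """Draw one perturbation vector from the clean/Gaussian mixture."""
    if rng.random() < p_clean:
        return [0.0] * dim  # delta_0 component: keep the clean trajectory
    sigma = sample_sigma(rng, sigma_min, sigma_max)
    # Zero-mean isotropic Gaussian: epsilon ~ Normal(0, sigma^2 I).
    return [rng.gauss(0.0, sigma) for _ in range(dim)]


rng = random.Random(0)
eps = sample_epsilon(rng, dim=4, sigma_min=3e-5, sigma_max=3e-3, p_clean=0.25)
print(eps)
```

Sampling `sigma` uniformly in log space puts equal probability mass on each decade, so the small scales are not swamped by the largest one.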

Also compare perturb locations:

```text
z_H only
z_L only
z_H and z_L
```

The expected success signature is not necessarily `lambda1 < 0` in every case;
more informative signals are a higher perturbed-rollout success rate, fewer
broad positive modes, and no regression in deterministic clean accuracy.

## Long-Train Engineering Note

`step9_trajectory_perturb_train.py` now supports two rollout implementations:

```text
serial_act:
  old path; K trajectories are rolled out one by one through the ACT wrapper

parallel_fixed:
  B -> B*K
  first rollout is clean for multi_perturbed_ce
  remaining rollouts sample centered perturbations
  run exactly halt_max_steps without ACT streaming reset
  average supervised loss across B*K trajectories
```

`parallel_fixed` is the default for new runs. It deliberately avoids the ACT
wrapper's halted-sample reset, because reset would make early-halted copies of
the same `(x,y)` repeat inside one optimizer step.

Preferred options:

```text
1. Fixed-unroll multiK:
   B -> B*K
   repeat x,y K times
   initialize clean/noisy z0 variants
   run exactly halt_max_steps for all trajectories
   compute supervised loss on fixed rollout outputs
```

This is simplest and matches the stability question: all initial-neighborhood
trajectories should reach `y` after the same reasoning budget.
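
The `B -> B*K` expansion can be sketched as follows. This is a NumPy illustration, not the code in `step9_trajectory_perturb_train.py`: the function name `expand_for_multik`, the `(B, D)` layout of `z0`, and the choice to keep copies of one sample adjacent are all assumptions.

```python
import numpy as np


def expand_for_multik(x, y, z0, k, sigma, rng, keep_first_clean=True):
    """Repeat each (x, y, z0) row k times and perturb the repeated z0 rows.

    x, y, z0 are arrays whose first axis is the batch axis B.
    Returns arrays with first axis B * k; copies of a sample are adjacent.
    """
    x_rep = np.repeat(x, k, axis=0)
    y_rep = np.repeat(y, k, axis=0)
    z_rep = np.repeat(z0, k, axis=0)
    noise = rng.normal(0.0, sigma, size=z_rep.shape)
    if keep_first_clean:
        noise[::k] = 0.0  # copy 0 of every sample keeps the clean z0
    return x_rep, y_rep, z_rep + noise


rng = np.random.default_rng(0)
x = np.arange(6).reshape(3, 2)   # B = 3 toy inputs
y = np.array([0, 1, 2])          # B = 3 toy targets
z0 = np.zeros((3, 5))            # B = 3 initial latents, D = 5
x_rep, y_rep, z_rep = expand_for_multik(x, y, z0, k=4, sigma=1e-3, rng=rng)
print(x_rep.shape, y_rep.shape, z_rep.shape)
```

Because every copy runs the same fixed number of steps, the supervised loss is a plain mean over the `B*K` rows and needs no halting mask.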

```text
2. ACT-mask multiK:
   B -> B*K
   repeat x,y K times
   run ACT in parallel
   maintain active_mask
   after a trajectory halts, zero/mask later loss contributions
   normalize per trajectory or by valid trajectory-steps
```

Do not naively concatenate `B*K` and use the unmodified ACT streaming semantics
without masking. The wrapper resets halted samples and reloads data, which is
correct for ordinary streaming training but would make early-halted copies of
the same `(x,y)` repeat inside one optimizer step.
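
A minimal sketch of the masking and the two normalizations named in option 2, assuming per-step losses are already available as a `(T, N)` array. `masked_act_loss` and `active_steps` are hypothetical names, and the ACT wrapper itself is not modeled here.

```python
import numpy as np


def masked_act_loss(step_losses, active_steps, per_trajectory=True):
    """Average per-step losses, ignoring steps after each trajectory halts.

    step_losses:  float array, shape (T, N), loss at each step for N trajectories.
    active_steps: int array, shape (N,), steps taken before each halt (>= 1).
    """
    num_steps = step_losses.shape[0]
    # active_mask[t, n] is True while trajectory n has not yet halted at step t.
    active_mask = np.arange(num_steps)[:, None] < active_steps[None, :]
    masked = np.where(active_mask, step_losses, 0.0)
    if per_trajectory:
        # Each trajectory contributes equally, however long it ran.
        return float((masked.sum(axis=0) / active_steps).mean())
    # Each valid trajectory-step contributes equally.
    return float(masked.sum() / active_mask.sum())


step_losses = np.array([[1.0, 4.0],
                        [3.0, 9.0],
                        [9.0, 9.0]])
active_steps = np.array([2, 1])  # trajectory 0 halts after 2 steps, trajectory 1 after 1
loss_per_traj = masked_act_loss(step_losses, active_steps, per_trajectory=True)
loss_per_step = masked_act_loss(step_losses, active_steps, per_trajectory=False)
print(loss_per_traj, loss_per_step)
```

The two normalizations differ whenever halting times differ: normalizing by valid trajectory-steps weights long-running trajectories more heavily, while per-trajectory normalization does not.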

## Current Long Runs

Started 2026-05-27:

```text
step9_E_hrm_baseline_parallel_fixed_26040_50k
step9_F_hrm_multi4_loguniform_ramp_26040_50k
step9_G_trm_baseline_parallel_fixed_26041_batch4_50k
step9_H_trm_multi4_loguniform_ramp_26041_batch4_50k
```

Perturb runs use:

```text
K = 4
noise_sampling = loguniform
sigma interval final = [3e-5, 3e-3]
sigma ramp = 0 -> final interval over 5000 steps
perturb = z_H and z_L
eval_every = 2500
eval_n = 1024
save_every_eval + save_best + save_final
```

These runs include fixed-unroll baselines because the fixed-unroll objective is
not identical to the old ACT-streaming baseline.

## Long-Run Result Snapshot

Completed 2026-05-27:

```text
HRM fixed baseline:
  initial 0.5176, best 0.6328 @ 5000, final 0.5801

HRM multi4 loguniform:
  initial 0.5176, best 0.6250 @ 7500, final 0.5889

TRM fixed baseline:
  initial 0.5615, best 0.5947 @ 22500, final 0.4971

TRM multi4 loguniform:
  initial 0.5615, best 0.6084 @ 42500, final 0.5508
```

Interpretation: for HRM, multi4 did not improve best accuracy (0.6250 vs
0.6328), though the drop from best to final was slightly smaller (0.0361 vs
0.0527). For TRM, multi4 gave both a higher best (0.6084 vs 0.5947) and a much
smaller drop from best to final (0.0576 vs 0.0976). This matches the earlier
10k signal that trajectory augmentation may raise the ceiling or reduce
regression, but training remains unstable.
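
The gaps behind this reading can be recomputed from the snapshot numbers. The helper name `summarize` is only for illustration; the accuracies are copied unchanged from the block above.

```python
# (initial, best, final) accuracies copied from the result snapshot.
runs = {
    "HRM fixed baseline": (0.5176, 0.6328, 0.5801),
    "HRM multi4 loguniform": (0.5176, 0.6250, 0.5889),
    "TRM fixed baseline": (0.5615, 0.5947, 0.4971),
    "TRM multi4 loguniform": (0.5615, 0.6084, 0.5508),
}


def summarize(initial, best, final):
    """Return the best-to-final drop and the net change from the initial accuracy."""
    return {
        "best_minus_final": round(best - final, 4),
        "final_minus_initial": round(final - initial, 4),
    }


summary = {name: summarize(*accs) for name, accs in runs.items()}
for name, row in summary.items():
    print(f"{name:24s} drop={row['best_minus_final']:.4f} "
          f"net={row['final_minus_initial']:+.4f}")
```

The net column shows that both TRM runs end below their initial accuracy, while both HRM runs end above it.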

## Resume Support

Model-only resume has always worked if `--ckpt-root` points to the original
config directory and `--ckpt-name` is an absolute path to a saved `best.pt` or
`final.pt`.

Exact training-state resume is now supported:

```text
--save-train-state
  writes latest_state.pt, best_state.pt, final_state.pt

--resume-state path/to/latest_state.pt
  restores model weights, optimizer state, current train_step, best_acc, and RNG
```

The launcher now enables `--save-train-state` for future long runs.
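
RNG state matters here because the perturbations are sampled at every step: without it, a resumed run draws a different noise sequence. The sketch below shows the idea with the standard library only. It is not the script's checkpoint format; `save_train_state` and `load_train_state` are hypothetical helpers, model weights and optimizer state are omitted, and the step and accuracy values are dummies.

```python
import os
import pickle
import random
import tempfile


def save_train_state(path, train_step, best_acc, rng):
    """Write the step counter, best accuracy, and RNG state to disk."""
    state = {"train_step": train_step, "best_acc": best_acc,
             "rng_state": rng.getstate()}
    with open(path, "wb") as f:
        pickle.dump(state, f)


def load_train_state(path, rng):
    """Restore the RNG in place and return the saved bookkeeping values."""
    with open(path, "rb") as f:
        state = pickle.load(f)
    rng.setstate(state["rng_state"])
    return state


rng = random.Random(0)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "latest_state.pt")
    save_train_state(path, train_step=2500, best_acc=0.5, rng=rng)
    # Noise the uninterrupted run would draw next.
    expected_noise = [rng.gauss(0.0, 1e-3) for _ in range(3)]
    # A fresh process starts from an unrelated RNG and then restores the state.
    resumed_rng = random.Random(123)
    state = load_train_state(path, resumed_rng)
    resumed_noise = [resumed_rng.gauss(0.0, 1e-3) for _ in range(3)]

print(state["train_step"], resumed_noise == expected_noise)
```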