Diffstat (limited to 'research/flossing/dynamics_experiment_report.md')
| -rw-r--r-- | research/flossing/dynamics_experiment_report.md | 86 |
1 files changed, 86 insertions, 0 deletions
diff --git a/research/flossing/dynamics_experiment_report.md b/research/flossing/dynamics_experiment_report.md
new file mode 100644
index 0000000..489fb91
--- /dev/null
+++ b/research/flossing/dynamics_experiment_report.md
@@ -0,0 +1,86 @@
+# Dynamics Control Experiment Report
+
+Generated: 2026-05-27T20:24:05
+
+## Summary Table
+
+| Model | Run | Status | Init | Final/Last | Delta | Best | Best Step | Vs Baseline | Evals |
+|---|---|---:|---:|---:|---:|---:|---:|---:|---:|
+| HRM | HRM baseline 10k | complete | 0.5176 | 0.6504 | 0.1328 | 0.6699 | 9000 | NA | 12 |
+| HRM | HRM mixed volume-CF | complete | 0.5176 | 0.6562 | 0.1387 | 0.6660 | 7000 | 0.0059 | 12 |
+| HRM | HRM Engelken interfloss | complete | 0.5176 | 0.6465 | 0.1289 | 0.6621 | 8000 | -0.0039 | 14 |
+| HRM | HRM Engelken+KL interfloss | complete | 0.5176 | 0.6211 | 0.1035 | 0.6250 | 9000 | -0.0293 | 14 |
+| HRM | HRM conservative Engelken+KL | complete | 0.5176 | 0.6367 | 0.1191 | 0.6777 | 7000 | -0.0137 | 14 |
+| HRM | HRM late Engelken+KL | complete | 0.5176 | 0.6191 | 0.1016 | 0.6367 | 7000 | -0.0312 | 14 |
+| HRM | HRM volume-envelope+KL | complete | 0.5176 | 0.6074 | 0.0898 | 0.6211 | 8000 | -0.0430 | 14 |
+| HRM | HRM basin consistency | complete | 0.5176 | 0.6152 | 0.0977 | 0.6152 | 10000 | -0.0352 | 12 |
+| HRM | HRM single perturbed CE | complete | 0.5176 | 0.6660 | 0.1484 | 0.6660 | 5000 | 0.0156 | 12 |
+| HRM | HRM clean+multi perturbed CE | complete | 0.5176 | 0.6543 | 0.1367 | 0.6660 | 7000 | 0.0039 | 12 |
+| HRM | HRM fixed-unroll baseline 50k | complete | 0.5176 | 0.5801 | 0.0625 | 0.6328 | 5000 | -0.0703 | 22 |
+| HRM | HRM multi4 loguniform 50k | complete | 0.5176 | 0.5889 | 0.0713 | 0.6250 | 7500 | -0.0615 | 22 |
+| TRM | TRM baseline 10k | complete | 0.5977 | 0.5762 | -0.0215 | 0.6621 | 2000 | NA | 12 |
+| TRM | TRM mixed volume-CF | complete | 0.5977 | 0.5078 | -0.0898 | 0.6133 | 1000 | -0.0684 | 12 |
+| TRM | TRM Engelken interfloss | complete | 0.5977 | 0.5195 | -0.0781 | 0.5977 | 0 | -0.0566 | 14 |
+| TRM | TRM Engelken+KL interfloss | complete | 0.5977 | 0.5957 | -0.0020 | 0.5977 | 0 | 0.0195 | 14 |
+| TRM | TRM late Engelken+KL | complete | 0.5977 | 0.5449 | -0.0527 | 0.5977 | 0 | -0.0312 | 14 |
+| TRM | TRM volume-envelope+KL | complete | 0.5977 | 0.5508 | -0.0469 | 0.5977 | 0 | -0.0254 | 14 |
+| TRM | TRM basin consistency | complete | 0.5977 | 0.2422 | -0.3555 | 0.5977 | 0 | -0.3340 | 12 |
+| TRM | TRM single perturbed CE | complete | 0.5977 | 0.6191 | 0.0215 | 0.6582 | 2000 | 0.0430 | 12 |
+| TRM | TRM clean+multi perturbed CE | complete | 0.5977 | 0.5918 | -0.0059 | 0.6719 | 4000 | 0.0156 | 12 |
+| TRM | TRM fixed-unroll baseline 50k | complete | 0.5615 | 0.4971 | -0.0645 | 0.5947 | 22500 | -0.0791 | 22 |
+| TRM | TRM multi4 loguniform 50k | complete | 0.5615 | 0.5508 | -0.0107 | 0.6084 | 42500 | -0.0254 | 22 |
+
+## Floss Episode Diagnostics
+
+### HRM Engelken interfloss
+- episode 0 @ train_step 0: floss 0.030840 -> 0.000022, lyap1 -0.1476 -> 0.0047, volume -0.1693 -> 0.0013, KL mean/max 0.000000/0.000000
+- episode 1 @ train_step 500: floss 0.019141 -> 0.000014, lyap1 -0.1208 -> 0.0017, volume -0.1357 -> 0.0005, KL mean/max 0.000000/0.000000
+
+### HRM Engelken+KL interfloss
+- episode 0 @ train_step 0: floss 0.030840 -> 0.001082, lyap1 -0.1476 -> 0.0107, volume -0.1693 -> -0.0180, KL mean/max 0.645335/14.346942
+- episode 1 @ train_step 500: floss 0.007646 -> 0.000361, lyap1 -0.0588 -> 0.0122, volume -0.0812 -> -0.0068, KL mean/max 0.130172/4.959406
+
+### HRM conservative Engelken+KL
+- episode 0 @ train_step 0: floss 0.030840 -> 0.046181, lyap1 -0.1476 -> -0.2018, volume -0.1693 -> -0.2144, KL mean/max 2.819328/11.853225
+- episode 1 @ train_step 500: floss 0.016849 -> 0.048128, lyap1 -0.0850 -> -0.2127, volume -0.1248 -> -0.2180, KL mean/max 1.228630/5.586896
+
+### HRM late Engelken+KL
+- episode 0 @ train_step 0: floss 0.046401 -> 0.000793, lyap1 -0.1729 -> 0.0256, volume -0.1976 -> 0.0086, KL mean/max 0.418510/6.242658
+- episode 1 @ train_step 500: floss 0.007572 -> 0.001378, lyap1 0.0810 -> 0.0078, volume 0.0456 -> -0.0055, KL mean/max 0.132306/2.906027
+
+### HRM volume-envelope+KL
+- episode 0 @ train_step 0: floss 0.000046 -> 0.000000, lyap1 -0.1476 -> -0.2818, volume -0.1693 -> -0.3005, KL mean/max 0.436240/8.779900
+- episode 1 @ train_step 500: floss 0.000801 -> 0.000000, lyap1 -0.1108 -> -0.2791, volume -0.1299 -> -0.2994, KL mean/max 0.112411/1.281256
+
+### TRM Engelken interfloss
+- episode 0 @ train_step 0: floss 0.000883 -> 0.000047, lyap1 0.0089 -> -0.0011, volume 0.0023 -> -0.0047, KL mean/max 0.000000/0.000000
+- episode 1 @ train_step 500: floss 0.000257 -> 0.000006, lyap1 -0.0149 -> -0.0008, volume -0.0157 -> -0.0015, KL mean/max 0.000000/0.000000
+
+### TRM Engelken+KL interfloss
+- episode 0 @ train_step 0: floss 0.000883 -> 0.000242, lyap1 0.0089 -> 0.0158, volume 0.0023 -> 0.0064, KL mean/max 0.309170/7.308133
+- episode 1 @ train_step 500: floss 0.000063 -> 0.000059, lyap1 0.0031 -> 0.0088, volume -0.0017 -> 0.0018, KL mean/max 0.086242/0.927036
+
+### TRM late Engelken+KL
+- episode 0 @ train_step 0: floss 0.004981 -> 0.001447, lyap1 -0.0423 -> -0.0327, volume -0.0482 -> -0.0368, KL mean/max 0.296968/3.760708
+- episode 1 @ train_step 500: floss 0.002911 -> 0.001941, lyap1 -0.0540 -> -0.0372, volume -0.0538 -> -0.0426, KL mean/max 0.092765/3.175512
+
+### TRM volume-envelope+KL
+- episode 0 @ train_step 0: floss 0.000074 -> 0.000000, lyap1 0.0089 -> 0.0145, volume 0.0023 -> -0.0076, KL mean/max 0.286836/6.571321
+- episode 1 @ train_step 500: floss 0.000000 -> 0.000000, lyap1 0.0055 -> -0.0251, volume -0.0009 -> -0.0281, KL mean/max 0.089595/2.901087
+
+## Incomplete Runs / Process Snapshot
+
+- No monitored experiment processes are active.
+
+## Notes
+
+- `Final/Last` is `final_acc` when present, otherwise the latest eval accuracy.
+- `Vs Baseline` compares against the matching HRM/TRM 10k no-floss baseline.
+- A complete report may still show partial rows if an experiment crashed or was interrupted.
+
+## Next Experiment Questions
+
+- Close the faithful-flossing loop before making a claim that flossing is ineffective. Required matrix: from-scratch baseline, direct flossing loss as negative control, faithful prefloss, faithful interfloss with separated floss optimizer steps, and optionally volume/spectrum interfloss.
+- Treat continuation runs as screening only. Final claims about flossing should use from-scratch full training, because continuation can confound optimizer state, EMA horizon, puzzle embeddings, and data-order effects.
+- Run GRM/PTRM after or in parallel with faithful flossing. This answers a different question: whether stochastic multi-rollout plus Q-selection is learning a low-dimensional stability/Lyapunov-spectrum observer.
+- For GRM/PTRM, compare learned Q selection with lambda1 and top-spectrum-feature selection, and measure Q-score correlation with stability features.
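
The Notes in the report say what `Final/Last` and `Vs Baseline` refer to, but not how the derived columns are computed. The sketch below is one reading that matches the tabulated numbers for the rows shown: it assumes `Delta = Final/Last - Init` and `Vs Baseline = Final/Last - (baseline Final/Last)`. The diff does not confirm either formula. The tuple layout, `BASELINE_RUN`, and `derive_columns` are hypothetical names introduced only for illustration; the accuracy values are copied from the Summary Table.

```python
# Hypothetical row layout: (model, run, init, final_or_last).
# Values are copied from the Summary Table in the report.
ROWS = [
    ("HRM", "HRM baseline 10k", 0.5176, 0.6504),
    ("HRM", "HRM single perturbed CE", 0.5176, 0.6660),
    ("TRM", "TRM baseline 10k", 0.5977, 0.5762),
    ("TRM", "TRM basin consistency", 0.5977, 0.2422),
]

# Assumption: each model family is compared against its own 10k baseline row.
BASELINE_RUN = {"HRM": "HRM baseline 10k", "TRM": "TRM baseline 10k"}


def derive_columns(rows):
    """Return {run: (delta, vs_baseline)} under the assumed formulas."""
    final_by_run = {run: final for _, run, _, final in rows}
    derived = {}
    for model, run, init, final in rows:
        delta = round(final - init, 4)
        baseline = BASELINE_RUN[model]
        # The baseline row itself has no comparison (shown as NA in the table).
        vs_baseline = None if run == baseline else round(final - final_by_run[baseline], 4)
        derived[run] = (delta, vs_baseline)
    return derived


if __name__ == "__main__":
    for run, (delta, vs_baseline) in derive_columns(ROWS).items():
        print(f"{run}: delta={delta}, vs_baseline={vs_baseline}")
```

Recomputing from the rounded table values can differ from the reported column in the last digit (for example 0.0058 against the tabulated 0.0059 for `HRM mixed volume-CF`), which suggests the report derives these columns from unrounded accuracies.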
