Diffstat (limited to 'research/flossing/rainer_email_bundle_20260605/email_draft.md')
-rw-r--r-- research/flossing/rainer_email_bundle_20260605/email_draft.md | 18
1 files changed, 18 insertions, 0 deletions
diff --git a/research/flossing/rainer_email_bundle_20260605/email_draft.md b/research/flossing/rainer_email_bundle_20260605/email_draft.md
new file mode 100644
index 0000000..cd92465
--- /dev/null
+++ b/research/flossing/rainer_email_bundle_20260605/email_draft.md
@@ -0,0 +1,18 @@
+Subject: Question on gradient flossing vs forward trajectory stability in recursive reasoning models
+
+Hi Rainer,
+
+I hope you are doing well. I have been studying recursive reasoning models (RRMs): small recurrent/iterative models that solve a problem by repeatedly refining a latent reasoning state before emitting an answer, e.g. HRM/TRM-style models on Sudoku-like reasoning tasks.
+
+We found a strong dynamical signal during inference. When we measure finite-time Lyapunov exponents (FTLEs) along the model's recurrent inference trajectory, the trajectories of failed examples are much more chaotic than those of successful examples. This is already visible in the largest (first) exponent (Fig. 1). Measuring more of the spectrum suggests that the effect is less one isolated bad mode and more a broad shift of the spectrum toward expansion on failed examples (Fig. 2).
+
+This made me revisit your gradient flossing work. My current interpretation is that gradient flossing is mainly about improving the stability/conditioning of the Jacobian products that gradients must pass through when learning through recurrent dynamics, whereas our signal may be more about forward inference stability: whether the correct answer lies in a sufficiently stable attractor basin. We tried some analogues of your pre-/interflossing procedures; they reproduce the effect on toy RNNs, but so far the effect in our RRM setting is weak. In contrast, a simple forward perturbation training objective looks more promising: for the same supervised pair `(x, y)`, we run additional trajectories with small perturbations to the recurrent latent state and require them to reach the same target `y`. In our current runs this raises peak accuracy, i.e. the accuracy ceiling (Fig. 3).
+
+My question is whether this distinction between gradient-propagation stability and forward-inference attractor stability makes sense from your perspective. If so, would you expect spectrum flossing to help here only if it were local, task-conditioned, or applied only late in the trajectory? Or is there a more standard dynamical-systems/control framing (e.g. transverse stability, basin enlargement, shadowing, or stochastic stabilization) that you would consider more appropriate?
+
+I have attached a small figure pack. Fig. 4 is optional context: in a probabilistic TRM variant with multiple noisy rollouts and a learned Q-head selector, the learned Q score correlates with a finite-difference stability proxy, which suggests that the selector may be learning a low-dimensional projection of trajectory stability.
+
+I would be grateful for any pointers or a sanity check, especially if we are misusing the gradient flossing intuition.
+
+Best,
+Yuren
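The FTLE measurement mentioned in the draft (Figs. 1 and 2) is commonly done with the standard QR (Benettin-style) method: push an orthonormal tangent frame through the per-step Jacobians, re-orthonormalise at every step, and average the logs of the stretch factors. The plain-Python sketch below illustrates that method under stated assumptions; it is not the pipeline behind the figures. `make_tanh_rnn` is a hypothetical stand-in for one HRM/TRM refinement step, and `step`, `jacobian`, `n_steps` and `k` are illustrative names. For a real model the Jacobian-vector products would come from an autodiff framework rather than an explicit matrix.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: m[i][j]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product m @ v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def gram_schmidt(cols: List[Vector]) -> Tuple[List[Vector], List[float]]:
    """Modified Gram-Schmidt QR of a list of column vectors.

    Returns the orthonormal columns Q and the diagonal of R (the stretch factors).
    Assumes the columns stay linearly independent (no zero-norm guard).
    """
    q_cols: List[Vector] = []
    r_diag: List[float] = []
    for v in cols:
        w = list(v)
        for q in q_cols:
            proj = sum(a * b for a, b in zip(q, w))
            w = [a - proj * b for a, b in zip(w, q)]
        norm = math.sqrt(sum(a * a for a in w))
        r_diag.append(norm)
        q_cols.append([a / norm for a in w])
    return q_cols, r_diag


def ftle_spectrum(
    step: Callable[[Vector], Vector],
    jacobian: Callable[[Vector], Matrix],
    z0: Vector,
    n_steps: int,
    k: int,
) -> List[float]:
    """Top-k finite-time Lyapunov exponents along z_{t+1} = step(z_t)."""
    d = len(z0)
    # Start from the first k standard basis vectors as an orthonormal tangent frame.
    q_cols = [[1.0 if i == j else 0.0 for i in range(d)] for j in range(k)]
    log_sums = [0.0] * k
    z = list(z0)
    for _ in range(n_steps):
        jac = jacobian(z)                          # Jacobian of the update at z_t
        q_cols = [matvec(jac, q) for q in q_cols]  # push the tangent frame forward
        q_cols, r_diag = gram_schmidt(q_cols)      # re-orthonormalise (QR step)
        log_sums = [s + math.log(r) for s, r in zip(log_sums, r_diag)]
        z = step(z)                                # advance the state to z_{t+1}
    return [s / n_steps for s in log_sums]


def make_tanh_rnn(w: Matrix, b: Vector):
    """Hypothetical toy recurrent update z -> tanh(W z + b) and its Jacobian."""

    def pre_activation(z: Vector) -> Vector:
        return [p + bi for p, bi in zip(matvec(w, z), b)]

    def step(z: Vector) -> Vector:
        return [math.tanh(p) for p in pre_activation(z)]

    def jacobian(z: Vector) -> Matrix:
        # d tanh(Wz + b)/dz = diag(1 - tanh^2(Wz + b)) @ W
        gain = [1.0 - math.tanh(p) ** 2 for p in pre_activation(z)]
        return [[gain[i] * w[i][j] for j in range(len(z))] for i in range(len(w))]

    return step, jacobian


if __name__ == "__main__":
    step, jac = make_tanh_rnn([[1.5, -0.8], [0.9, 1.2]], [0.0, 0.1])
    print(ftle_spectrum(step, jac, [0.3, -0.2], n_steps=200, k=2))
```

Setting `k = 1` tracks only the largest exponent; a larger `k` gives a partial spectrum of the kind a per-example comparison between successful and failed trajectories would use.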
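The forward perturbation objective described in the draft (same supervised pair `(x, y)`, extra rollouts from perturbed latent states, all required to reach `y`) can be sketched as a two-term loss. This is a minimal sketch, not the authors' implementation, and the draft does not specify these details: `refine` and `readout` are hypothetical stand-ins for the recurrent update and the output head, and perturbing once at a random step with isotropic Gaussian noise of scale `sigma`, averaging the perturbed losses, and weighting them by `weight` are all assumptions. The plain-Python version only evaluates the objective; in training it would be written in an autodiff framework so that gradients flow through every rollout.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def rollout(
    refine: Callable[[Vector, Vector], Vector],
    readout: Callable[[Vector], Vector],
    x: Vector,
    z0: Vector,
    n_steps: int,
    perturb_at: Optional[int] = None,
    noise: Optional[Vector] = None,
) -> Vector:
    """Run n_steps of latent refinement z <- refine(x, z), optionally adding noise at one step."""
    z = list(z0)
    for t in range(n_steps):
        if t == perturb_at and noise is not None:
            z = [zi + ni for zi, ni in zip(z, noise)]  # kick the latent state
        z = refine(x, z)
    return readout(z)


def perturbation_objective(
    refine: Callable[[Vector, Vector], Vector],
    readout: Callable[[Vector], Vector],
    x: Vector,
    y: int,
    z0: Vector,
    n_steps: int,
    n_perturbed: int = 4,
    sigma: float = 0.05,
    weight: float = 1.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Clean loss plus the mean loss of perturbed rollouts that must reach the same target y."""
    rng = rng or random.Random(0)
    clean = cross_entropy(rollout(refine, readout, x, z0, n_steps), y)
    perturbed_losses = []
    for _ in range(n_perturbed):
        t_p = rng.randrange(n_steps)                 # assumption: perturb at a random step
        noise = [rng.gauss(0.0, sigma) for _ in z0]  # assumption: isotropic Gaussian kick
        logits = rollout(refine, readout, x, z0, n_steps, perturb_at=t_p, noise=noise)
        perturbed_losses.append(cross_entropy(logits, y))
    return clean + weight * sum(perturbed_losses) / n_perturbed
```

With `sigma = 0` the perturbed rollouts coincide with the clean one, so the objective reduces to `(1 + weight)` times the clean loss, which is a quick sanity check that the extra term only adds a robustness penalty.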
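The draft does not spell out the finite-difference stability proxy behind Fig. 4, so the following is only one plausible version, stated as an assumption: start a second trajectory a distance `eps` from the first, advance both with the same update, and average the log growth of their separation, renormalising it back to `eps` after every step so it stays small. A Pearson correlation helper is included for relating such a proxy to per-rollout Q scores. `fd_growth_rate`, `eps` and the single random starting direction are illustrative choices, not the draft's definitions.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def fd_growth_rate(
    step: Callable[[Vector], Vector],
    z0: Vector,
    n_steps: int,
    eps: float = 1e-6,
    rng: Optional[random.Random] = None,
) -> float:
    """Finite-difference estimate of the leading growth rate of a small latent perturbation."""
    rng = rng or random.Random(0)
    direction = [rng.gauss(0.0, 1.0) for _ in z0]
    norm = math.sqrt(sum(v * v for v in direction))
    za = list(z0)
    zb = [a + eps * v / norm for a, v in zip(za, direction)]
    log_growth = 0.0
    for _ in range(n_steps):
        za, zb = step(za), step(zb)
        sep = math.sqrt(sum((b - a) ** 2 for a, b in zip(za, zb)))
        log_growth += math.log(sep / eps)
        # Renormalise the separation back to eps (Benettin-style) to stay in the linear regime.
        zb = [a + eps * (b - a) / sep for a, b in zip(za, zb)]
    return log_growth / n_steps


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / math.sqrt(vx * vy)
```

In use, one would compute `fd_growth_rate` for each noisy rollout and pass the resulting list, together with the Q-head scores for the same rollouts, to `pearson`; how the Q scores are produced is model-specific and not shown here.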