From a6ec4288a2232988b130b2f00bb2565f81706966 Mon Sep 17 00:00:00 2001 From: YurenHao0426 Date: Mon, 29 Jun 2026 12:15:51 -0500 Subject: Recursive reasoning dynamics: analysis pipeline, paper drafts, toy models Failure=more-chaotic (task-general under validity labeling) reduces to convergence/completeness detection; mechanism (transient chaos vs multistability vs input-induced) under investigation. Co-Authored-By: Claude Fable 5 --- papers/txt/gram2026_generative_recursive.txt | 1532 ++++++++++++++++++++++++++ 1 file changed, 1532 insertions(+) create mode 100644 papers/txt/gram2026_generative_recursive.txt (limited to 'papers/txt/gram2026_generative_recursive.txt') diff --git a/papers/txt/gram2026_generative_recursive.txt b/papers/txt/gram2026_generative_recursive.txt new file mode 100644 index 0000000..d5f299d --- /dev/null +++ b/papers/txt/gram2026_generative_recursive.txt @@ -0,0 +1,1532 @@ + Generative Recursive Reasoning + + + Junyeob Baek1†∗ Mingyu Jo1†∗ Minsu Kim1,2 + + Mengye Ren3 Yoshua Bengio2,4 Sungjin Ahn1,3† +arXiv:2605.19376v2 [cs.AI] 20 May 2026 + + + + + 1 + KAIST 2 Mila – Québec AI Institute + 3 + New York University 4 Université de Montréal + + + + Abstract + How should future neural reasoning systems implement extended computation? + Recursive Reasoning Models (RRMs) offer a promising alternative to autoregres- + sive sequence extension by performing iterative latent-state refinement with shared + transition functions. Yet existing RRMs are largely deterministic, following a + single latent trajectory and converging to a single prediction. We introduce Gen- + erative Recursive reAsoning Models (GRAM), a framework that turns recursive + latent reasoning into probabilistic multi-trajectory computation. GRAM models + reasoning as a stochastic latent trajectory, enabling multiple hypotheses, alterna- + tive solution strategies, and inference-time scaling through both recursive depth + and parallel trajectory sampling. This yields a latent-variable generative model + supporting conditional reasoning via pθ (y | x) and, with fixed or absent inputs, + unconditional generation via pθ (x). Trained with amortized variational inference, + GRAM improves over deterministic recurrent and recursive baselines on structured + reasoning and multi-solution constraint satisfaction tasks, while demonstrating an + unconditional generation capability. https://ahn-ml.github.io/gram-website + + + 1 Introduction + A central question for future neural reasoning systems is how extended computation should be imple- + mented. Large autoregressive models typically scale reasoning by extending a sequence-generation + process, whether intermediate computation is expressed explicitly as chain-of-thought tokens or im- + plicitly in hidden or latent representations [1–6]. A complementary direction is explored by Recursive + Reasoning Models (RRMs), which use repeated computation to refine a persistent latent state rather + than to append new elements to an output or reasoning sequence [7–9]. This approach is appealing + because it decouples reasoning depth from both parameter scale and output length: a compact model + can perform many steps of internal computation by repeatedly applying shared transition functions + over time. + Recent recursive reasoning models such as HRM [8] and TRM [9] provide early evidence for the + potential of this approach in structured reasoning. Rather than producing a solution in a single + feedforward pass, they perform extended computation through iterative latent-state refinement, deep + supervision across refinement steps, and reasoning-oriented recurrent designs such as hierarchical + latent dynamics. These features make them well suited to problems requiring constraint propagation, + state tracking, iterative correction, and multi-step inference. More broadly, they build on a principle + ∗ Equal contribution + † Correspondence to: Junyeob Baek (wnsdlqjtm@kaist.ac.kr), Mingyu Jo (mingyu.jo@kaist.ac.kr), Sungjin Ahn + + (sungjin.ahn@kaist.ac.kr) + + + Preprint. + Solution 1, + + + Input Task, + + + + Solution 2, + + + + + (a) Deterministic RRMs (b) GRAM (Ours) + +Figure 1: Comparison of Latent Reasoning Trajectories. Left: N-Queens Example with two valid solutions. +Right: Given three independent runs for latent reasoning (τ1 , τ2 , τ3 ): (a) Prior RRMs (e.g. HRM, TRM) are +deterministic—all runs collapse to an identical trajectory, converging to a single solution and failing to explore +alternatives, while (b) GRAM explores diverse trajectories, producing diverse trajectories that reach multiple +valid solutions y1 and y2 , while naturally enabling parallel inference-time scaling. + +also explored in recurrent Transformer architectures such as Universal Transformers [10] and Looped +Transformers [7]: shared Transformer blocks can be repeatedly applied to increase computational +depth without increasing parameter count. Together, these models suggest that reasoning capability +can emerge not only from scaling model size or generating longer traces, but also from the organization +of computation itself. +While recurrent latent-state refinement provides an appealing mechanism for efficiently increasing +reasoning depth, depth alone is not sufficient for many reasoning problems. A capable reasoning +system should also be able to maintain uncertainty, consider alternative hypotheses, and explore +multiple possible solution strategies [11, 12]. This is especially important in settings where ambiguity +or multiple valid solutions are intrinsic, and more generally in problems where a single refinement +path may become trapped in a suboptimal reasoning trajectory. In this sense, future RRMs should be +not only deep, in the sense of repeated refinement, but also wide, in the sense of maintaining and +exploring multiple latent trajectories in parallel. +Existing RRMs [7–10], however, remain fundamentally deterministic: given the same input and +initialization, they follow a single latent trajectory and converge to a single prediction. This deter- +ministic recursion collapses the space of plausible reasoning paths into a single attractor, leaving +probabilistic multi-hypothesis latent reasoning largely unexplored within the RRM paradigm. This +motivates the central question of our work: can recursive latent computation support probabilistic, +generative, multi-hypothesis reasoning while preserving the efficiency of compact recurrent models? +In this paper, we propose Generative Recursive reAsoning Models (GRAM), a framework that turns +recursive latent reasoning into probabilistic multi-trajectory computation. GRAM treats the reasoning +process itself as a stochastic latent trajectory: at each recursion step, the model samples a transition +conditioned on the input and the current reasoning state, rather than deterministically updating to a +single next state. Repeating this process defines a distribution over possible reasoning trajectories, +allowing the model to maintain multiple hypotheses, explore alternative solution strategies, and scale +inference not only by increasing recursive depth but also by sampling trajectories in parallel. From +a probabilistic perspective, GRAM is a latent-variable generative model: it models pθ (y | x) by +marginalizing over latent reasoning trajectories, while the same recursive process can also define an +unconditional generative model pθ (x) when the input is fixed or absent. +We evaluate GRAM on controlled reasoning and generation tasks that serve as probes of the ar- +chitectural properties targeted by our formulation: recursive refinement, stochastic exploration, +multi-solution coverage, and inference-time scaling. Given this goal, our experiments focus on +comparisons with the most relevant deterministic recurrent and recursive latent reasoning baselines, +including Looped Transformers, HRM, and TRM, rather than frontier-scale general-purpose LLMs +whose training data, inference budgets, and external scaffolding are not directly comparable. Sudoku- +Extreme [8] and ARC-AGI [13, 14] test structured reasoning under hard constraints and abstract +transformations; N-Queens and Graph Coloring evaluate multi-solution recovery; and binarized +MNIST [15] probes the unconditional generative interpretation. +Our main contribution is to establish probabilistic multi-trajectory recursion as a design principle +for future recurrent and recursive reasoning architectures. Concretely, we make three contributions. + + + 2 + First, we formulate recursive reasoning as a latent-variable generative process, where solutions are +obtained by marginalizing over stochastic reasoning trajectories. Second, we introduce width-based +inference-time scaling, enabling inference to scale not only with recursive depth but also with the +number of sampled latent trajectories. Third, we provide empirical evidence that this formulation +yields the intended architectural advantages over deterministic recurrent and recursive baselines, +improving structured reasoning, multi-solution constraint satisfaction, and unconditional generation. + +2 Generative Recursive Reasoning Models +In this section, we introduce Generative Recursive reAsoning Models (GRAM), an instantiation +of probabilistic recursive reasoning. We describe the architecture in Section 2.1 and the training +procedure in Section 2.2, with an architecture schematic shown in Figure 2. + +2.1 Architecture + +Overview. GRAM models the conditional distri- 𝑦 (A) CE loss +bution pθ (y | x) by marginalizing over stochas- + Prior (B) KL Div. Posterior Decoder +tic latent reasoning trajectories. Given an input 𝑝𝜃 (⋅ |𝑢𝑡 ) 𝑞𝜙 (⋅ |𝑢𝑡 , 𝑦) ~ 𝜖𝑡 𝑓dec +x, GRAM first computes an embedding + ℎ𝑡−1 𝑓𝐻 𝑢𝑡 ℎ𝑡 + ex = fenc (x; θ), (1) + 𝐾 times +which is reused throughout the entire recursive 𝑙𝑡−1 𝑓𝐿 𝑓𝐿 𝑙𝑡 +computation. Starting from a fixed initial la- + Encoder +tent state z0 , the model evolves the latent state 𝑥 + 𝑓enc +through learned stochastic transitions. The re- +cursive computation is organized into two nested Figure 2: GRAM Architecture. A single stochastic +levels: inner and outer loops. latent transition in the hierarchical instantiation z = +At the inner level, a latent transition samples (h, l). After K low-level refinements via fL , the high- + level update fH produces a deterministic proposal ut , to +a new latent state conditioned on the previous which stochastic guidance ϵt is added: ht = ut + ϵt . +latent state and the input embedding, + zt ∼ pθ (zt | zt−1 , ex ), t = 1, . . . , T. (2) +At the end of the T transitions, the decoder produces a prediction, ŷ = arg max fdec (zT ; θ). We refer +to the sequence of T transitions from the initial state z0 to the final state zT as a supervision step. A +supervision step is the unit at which the decoder is invoked, and the training objective is applied, with +gradients computed as described in Section 2.2. +At the outer level, Nsup supervision steps are applied recursively, with the final state of one supervision +step serving as the initial state of the next, thereby forming the full recursive computation: + (1) T transitions (1) (2) T transitions T transitions (N ) + z0 −−−−−−−→ zT = z0 −−−−−−−→ · · · −−−−−−−→ zT sup , (3) + (n) (1) +where zt denotes the latent state at the t-th transition of the n-th supervision step, z0 is the +fixed initial state, and the terminal state of one supervision step serves as the initial state of the next + (n+1) (n) +(z0 := zT ). This abstract formulation can be instantiated with various recurrent Transformer +backbones, including flat designs such as Universal Transformers and Looped Transformers [10, 7], +as well as hierarchical designs such as HRM and TRM [8, 9]. +Stochastic Latent Transitions. Unlike prior recursive reasoning models (RRMs) that update the +latent state deterministically and follow a single fixed trajectory [8, 9], GRAM defines pθ (zt | +zt−1 , ex ) as a stochastic transition, so that repeated computation induces a distribution over latent +reasoning trajectories. Concretely, GRAM realizes this transition as a learned stochastic residual +perturbation around a deterministic update: at each transition, the model first computes a deterministic +update ut from zt−1 and ex , then samples a conditional perturbation from a state-dependent Gaussian, +and adds it to ut : + ϵt ∼ pθ (ϵt | ut ) := N µθ (ut ), σθ2 (ut )I , +  + (4) + zt = ut + ϵt . (5) + + + 3 + We refer to ϵt as the learnable stochastic guidance. The mean µθ (ut ) encodes a state-dependent +direction in which the trajectory is steered, while the variance σθ2 (ut ) controls the amount of ex- +ploration. This design allows GRAM to capture uncertainty, prevent convergence to local minima, +and support robust exploration of the solution space without discarding the deterministic refinement +performed by ut . +Hierarchical Instantiation. We instantiate the latent state with two interacting components, z = +(h, l). The high-level component h is updated once per latent transition and carries abstract reasoning +state, while the low-level component l is updated K times within a single transition and carries +fine-grained intermediate computation. This decomposition separates the two roles across time scales, +with h accumulating slowly across transitions and l refined rapidly within each one. +With this hierarchical multi-scale structure, a single transition zt−1 → zt is computed as follows. +The low-level component is first refined for K updates, with the high-level component held fixed: + lt,k = fL (ht−1 , lt,k−1 , ex ; θ), k = 1, . . . , K, (6) +where lt,0 := lt−1 and we write lt := lt,K for the refined low-level component. The high-level +component is then updated as a stochastic transition conditioned on the refined lt , + ut = fH (ht−1 , lt ; θ), (7) + ϵt ∼ pθ (ϵt | ut ) := N µθ (ut ), σθ2 (ut ) I +  + , (8) + ht = ut + ϵt , (9) +and we set zt = (ht , lt ). Note that stochasticity is introduced only at the high level: the low- +level refinement is fully deterministic, while the stochastic guidance signal ϵt acts on the slower, +more abstract component of the latent state, where it can steer the overall reasoning trajectory +across transitions3 . Under this instantiation, the decoder reads only the high-level component, i.e., +fdec (zT ) = fdec (hT ). Additional architectural details are provided in Appendix B.1. +Modeling Unconditional Distribution. While the description so far focuses on the conditional +setting pθ (y | x), the same recursive process can also be defined as an unconditional generative model +pθ (x) when the input is replaced with an empty conditioning embedding. We use this formulation for +generation tasks in Section 4.3. + +2.2 Training + +GRAM is trained to model the conditional distribution pθ (y | x), where each training example +consists of an input x and its corresponding target y. As a probabilistic model, GRAM adopts a +latent-variable formulation and is optimized by maximizing an evidence lower bound (ELBO) with +respect to the generative parameters θ and variational parameters ϕ. +Latent Variable Modeling. We model GRAM as a latent-variable probabilistic model pθ , where +the full latent trajectory τ = (z0 → · · · → zTTotal ) consists of a sequence of latent variables, with +TTotal = T × Nsup . The conditional likelihood is defined as + Z + pθ (y | x) = pθ (y | τ, x) pθ (τ | x) dτ, (10) + +where x denotes the input problem and y denotes the corresponding ground-truth output. +Direct maximum likelihood estimation of log pθ (y | x) is intractable due to the marginalization +over latent trajectories. We therefore introduce a variational posterior qϕ (τ | x, y) and optimize the +evidence lower bound (ELBO), jointly training θ and ϕ via variational inference: + log pθ (y | x) ≥ Eqϕ (τ |x,y) [log pθ (y | τ, x)] − KL(qϕ (τ | x, y) ∥ pθ (τ | x)) . (11) + +During training, latent trajectories are sampled from the variational posterior qϕ (· | x, y), which has +access to both the input problem x and the target output y. At inference time, where y is unavailable, +trajectories are instead generated from the learned prior pθ (· | x). + 3We also tried injecting noise into the low-level state, but found that it did not improve performance. + + + + + 4 + Both the prior and the posterior are modeled as conditional Markov processes over latent states: + TY + Total TY + Total + + pθ (τ | x) = p(z0 ) pθ (zt | zt−1 , x), qϕ (τ | x, y) = p(z0 ) qϕ (zt | zt−1 , x, y). (12) + t=1 t=1 +Here, z0 is a fixed initial state shared by the prior and posterior. Both transitions are implemented by +adding reparameterized Gaussian noise ϵt after a deterministic update ut ; the posterior uses the same +transition module as the prior, but samples from a target-conditioned noise distribution qϕ (ϵt | ut , y), +whereas the prior uses pθ (ϵt | ut ). +Since the two processes share the same Markov structure and all stochasticity is introduced through +ϵ1:TTotal , their trajectory distributions can be equivalently represented in noise space. Moreover, +since GRAM decodes the output only from the terminal latent state, the likelihood term satisfies +pθ (y | τ, x) = pθ (y | zTTotal , x). Therefore, the full trajectory-level ELBO can be written as +   TX Total h i + LELBO = Eqϕ log pθ (y | zTTotal , x) − Eqϕ (ϵ