# Revised LaTeX: Abstract

## Changes from original

1. "math-hard benchmark" → 3 task domains
2. "57.0% → 71.0%" → actual numbers (55.2% best, significance claims)
3. "halving tokens (1384→689)" → 16.9% reduction in user effort (p=0.021)
4. "correlates strongly" → significant quartile test (p=0.006)
5. "small-scale pilot" → 60 profiles × 60 sessions
6. Added dual-vector finding

---

## Revised LaTeX

```latex
\begin{abstract}
Large language models (LLMs) are increasingly deployed as conversational agents, yet most systems still treat each session as an isolated event and lack an explicit user representation that can be updated over time.
At the same time, per-user fine-tuning or RLHF is often impractical in real deployments.
We present a frozen-backbone user modeling framework that represents each user as a low-dimensional dual vector (long-term and short-term) in a global preference space, updated online from weak rewards.
The framework decouples preference extraction, user-state representation, and downstream personalization: preferences are extracted as condition--action rules, stored in a retrieval-augmented memory, and aggregated into a user vector that modulates retrieval scores.
Feedback from interaction---explicit or implicit---is mapped to a scalar reward that drives REINFORCE-style updates of the user vector, while keeping all backbone models (chat LLM, embedding, reranker) frozen.
We instantiate this framework on the \textsc{MultiSessionCollab} benchmark across three task domains (math-hard, math-500, bigcodebench) with $60$ user profiles and $60$ sessions per profile.
Our RAG+Vector agent achieves the highest task success rate ($55.2\%$) among six system modes and significantly reduces interaction friction compared to a Reflection baseline: timeout rate drops by $2.4$ percentage points ($p = 0.046$) and user effort by $6.7\%$ ($p = 0.021$), yielding the highest interaction efficiency ($2.83$ successes per $1{,}000$ user tokens).
Analysis of the learned user vectors confirms that the dual-vector design separates stable user identity from session-specific context: long-term vectors significantly associate with cross-user preference overlap ($p = 0.006$), while short-term vectors do not ($p = 0.586$).
We discuss limitations of this simulator-based study and outline directions for scaling to more users, richer preference types, and stronger feedback signals.
\end{abstract}
```
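The mechanism the abstract summarizes (a dual user vector that modulates retrieval scores and is updated REINFORCE-style from a scalar reward, with all backbone models frozen) can be sketched as follows. This is a minimal illustration only: the dimensionality, step sizes, softmax temperature, and modulation weight `alpha` are assumed values, and the rule embeddings here are random stand-ins for the frozen embedding model's output; the paper's actual parameterization is not specified in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # dimensionality of the global preference space (assumed)

# Embeddings of stored condition--action rules, as produced by the
# frozen embedding model (random unit vectors stand in here).
rules = rng.normal(size=(5, D))
rules /= np.linalg.norm(rules, axis=1, keepdims=True)

u_long = np.zeros(D)   # long-term vector: stable user identity
u_short = np.zeros(D)  # short-term vector: session-specific context


def scores(query, alpha=0.5):
    """Base retrieval similarity, modulated by the combined user vector."""
    u = u_long + u_short
    return rules @ query + alpha * (rules @ u)


def reinforce_step(query, reward, lr_long=0.01, lr_short=0.1, alpha=0.5):
    """REINFORCE-style update of the user vectors from a scalar reward.

    A rule is sampled from the softmax over modulated scores; the
    log-probability gradient with respect to the user vector is
    alpha * (e_k - E_p[e]), scaled by the reward. Backbone models
    (embeddings, reranker, LLM) are untouched -- only the user
    vectors change.
    """
    global u_long, u_short
    s = scores(query, alpha)
    p = np.exp(s - s.max())
    p /= p.sum()
    k = rng.choice(len(p), p=p)           # rule actually applied
    grad = alpha * (rules[k] - p @ rules)  # score-function gradient
    u_long += lr_long * reward * grad      # slow drift: identity
    u_short += lr_short * reward * grad    # fast drift: session context
    return k
```

The asymmetric learning rates mirror the long-term/short-term split the abstract reports: the short-term vector adapts quickly within a session (and could be reset between sessions), while the long-term vector accumulates slowly across sessions.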