# Revised LaTeX: Conclusion

## Changes from original

1. All numbers updated (54.3%→55.2%, not 57%→71%; 60 profiles not 5; 3 datasets not 1)
2. Token claim corrected: 16.9% reduction, not "halving"
3. User-vector correlation updated: modest ρ but significant quartile test
4. Scale description updated: 60 profiles from 200, 60 sessions
5. Future work updated: ablation underway, LLM judge ready, 200 profiles target

---

## Revised LaTeX

```latex
\section{Conclusion}
\label{sec:conclusion}

We have presented a frozen-backbone personalization framework that combines structured preference extraction, a retrieval-augmented preference memory, and online user vectors updated via lightweight reinforcement learning. Rather than fine-tuning the base LLM for each user, the framework keeps all backbone models (chat, embedding, reranker) fixed and introduces personalization only through an external layer that represents user preferences as condition--action rules and aggregates them into a low-dimensional user vector. A key design choice is to treat feedback as a scalar reward, making the user-vector update agnostic to the specific feedback channel and, in principle, compatible with both explicit and implicit signals.

We instantiated this framework on the \textsc{MultiSessionCollab} benchmark across three task domains (math-hard, math-500, bigcodebench) with an LLM-based user simulator and style-oriented user profiles. Under a frozen 8B backbone evaluated over $60$ profiles and $60$ sessions ($3{,}600$ sessions per method), our \textbf{RAG+Vector} method achieves the highest task success rate ($55.2\%$) among all six system modes, significantly reduces timeout rate ($26.4\%$ vs.\ $28.8\%$, $p = 0.046$) and user effort ($193.6$ vs.\ $207.5$ tokens, $p = 0.021$) compared to the Reflection baseline, and attains the highest interaction efficiency ($2.83$ successes per $1{,}000$ user tokens).
Analysis of the learned user vectors confirms that the dual-vector design separates stable user identity from session-specific context: long-term vectors $z^{(L)}$ significantly associate with preference overlap across users ($p = 0.006$), while short-term vectors $z^{(S)}$ do not ($p = 0.586$), consistent with the intended division of labor.

Our evaluation covers $60$ profiles drawn from a pool of $200$, focuses on style preferences, and compares against a prompted Reflection baseline rather than the GRPO-trained agents from the original \textsc{MultiSessionCollab} paper. Several extensions are underway or planned: ablations that isolate the contribution of each user-vector component, deployment of an LLM-based reward model to replace the current keyword heuristic, and scaling experiments to all $200$ profiles.

We view this work as a case study demonstrating that lightweight user modeling on top of RAG---without any per-user fine-tuning of the backbone---can yield measurable efficiency gains in online, multi-session personalization, and as a starting point for larger-scale investigations of long-term adaptation with frozen LLM backbones.
```
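For intuition, the scalar-reward update over the dual user vectors described in the conclusion can be sketched roughly as below. This is a minimal illustration, not the paper's actual update rule: the function name `update_user_vectors`, the exponential-moving-average form, and the learning rates are all assumptions made for the sketch.

```python
import numpy as np

def update_user_vectors(z_long, z_short, reward, ctx_embedding,
                        lr_long=0.01, lr_short=0.2):
    """Hypothetical sketch of a feedback-channel-agnostic update.

    `reward` is a single scalar distilled from any feedback signal
    (explicit rating, implicit timeout, etc.); the backbone models are
    never touched, only the external user vectors move.
    """
    # Long-term vector z^(L): small step toward the session context,
    # scaled by reward, so stable identity drifts slowly.
    z_long = z_long + lr_long * reward * (ctx_embedding - z_long)
    # Short-term vector z^(S): fast exponential moving average, so it
    # tracks session-specific context and forgets quickly.
    z_short = (1 - lr_short) * z_short + lr_short * reward * ctx_embedding
    return z_long, z_short

# Toy usage: a positive-reward session pulls both vectors toward the
# session context, the short-term one much faster than the long-term one.
z_l, z_s = np.zeros(4), np.zeros(4)
z_l, z_s = update_user_vectors(z_l, z_s, reward=1.0,
                               ctx_embedding=np.ones(4))
```

The split mirrors the paper's finding: the slow-moving long-term vector is the one that can plausibly encode persistent preference overlap across users, while the fast-moving short-term vector only reflects the current session.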