From 1956aed8bc8a72355adbe9f1d16ea678d67f214c Mon Sep 17 00:00:00 2001
From: YurenHao0426
Date: Wed, 11 Feb 2026 03:09:48 +0000
Subject: Add revised abstract LaTeX
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Updated all claims to match actual experiments:
- 3 domains, 60 profiles × 60 sessions
- 55.2% success (not 71%), significance tests
- 16.9% user effort reduction (not "halving")
- Dual-vector separation (z_long p=0.006, z_short p=0.586)

Co-Authored-By: Claude Opus 4.6
---
 docs/abstract_revised.md | 58 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 58 insertions(+)
 create mode 100644 docs/abstract_revised.md

diff --git a/docs/abstract_revised.md b/docs/abstract_revised.md
new file mode 100644
index 0000000..37bc43a
--- /dev/null
+++ b/docs/abstract_revised.md
@@ -0,0 +1,58 @@
+# Revised LaTeX: Abstract
+
+## Changes from original
+
+1. "math-hard benchmark" → 3 task domains
+2. "57.0% → 71.0%" → actual numbers (55.2% best, significance claims)
+3. "halving tokens (1384→689)" → 16.9% reduction in user effort (p=0.021)
+4. "correlates strongly" → significant quartile test (p=0.006)
+5. "small-scale pilot" → 60 profiles × 60 sessions
+6. Added dual-vector finding
+
+---
+
+## Revised LaTeX
+
+```latex
+\begin{abstract}
+Large language models (LLMs) are increasingly deployed as
+conversational agents, yet most systems still treat each session
+as an isolated event and lack an explicit user representation
+that can be updated over time.
+At the same time, per-user fine-tuning or RLHF is often
+impractical in real deployments.
+We present a frozen-backbone user modeling framework that
+represents each user as a low-dimensional dual vector
+(long-term and short-term) in a global preference space,
+updated online from weak rewards.
+The framework decouples preference extraction, user-state
+representation, and downstream personalization: preferences are
+extracted as condition--action rules, stored in a
+retrieval-augmented memory, and aggregated into a user vector
+that modulates retrieval scores.
+Feedback from interaction---explicit or implicit---is mapped to
+a scalar reward that drives REINFORCE-style updates of the user
+vector, while keeping all backbone models (chat LLM, embedding,
+reranker) frozen.
+
+We instantiate this framework on the
+\textsc{MultiSessionCollab} benchmark across three task domains
+(math-hard, math-500, bigcodebench) with $60$ user profiles and
+$60$ sessions per profile.
+Our RAG+Vector agent achieves the highest task success rate
+($55.2\%$) among six system modes and significantly reduces
+interaction friction compared to a Reflection baseline:
+timeout rate drops by $2.4$ percentage points ($p = 0.046$)
+and user effort by $6.7\%$ ($p = 0.021$), yielding the highest
+interaction efficiency ($2.83$ successes per $1{,}000$ user
+tokens).
+Analysis of the learned user vectors confirms that the
+dual-vector design separates stable user identity from
+session-specific context: long-term vectors significantly
+associate with cross-user preference overlap ($p = 0.006$),
+while short-term vectors do not ($p = 0.586$).
+We discuss limitations of this simulator-based study and
+outline directions for scaling to more users, richer preference
+types, and stronger feedback signals.
+\end{abstract}
+```
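The REINFORCE-style update the abstract describes can be sketched in a few lines. This is a minimal toy illustration, not the paper's implementation: the dimensions, learning rates, and names (`z_long`, `z_short`, `E`, `base`) are all assumptions, the "memory" is a handful of random embeddings, and the weak reward is a 0/1 signal for picking one designated entry. It shows the core mechanic only: the frozen retriever's scores are modulated by the summed dual vector, an entry is sampled, and both vectors move along the score-function gradient scaled by the reward.

```python
import numpy as np

rng = np.random.default_rng(0)

D, K = 8, 5                       # user-vector dim, memory entries (toy sizes)
z_long = np.zeros(D)              # slow "identity" vector
z_short = np.zeros(D)             # fast session vector (reset per session)
E = rng.standard_normal((K, D))   # embeddings of stored preference rules
base = rng.standard_normal(K)     # retriever's base relevance scores (frozen)

def softmax(s):
    s = s - s.max()
    p = np.exp(s)
    return p / p.sum()

def step(z_long, z_short, rewarded_item, lr_long=0.01, lr_short=0.1):
    """One REINFORCE-style update: sample a memory entry from the
    vector-modulated retrieval distribution, observe a weak scalar
    reward, and push both vectors along reward * grad log-prob."""
    z = z_long + z_short                     # dual vector, summed here
    probs = softmax(base + E @ z)            # modulated retrieval scores
    a = rng.choice(K, p=probs)               # sampled rule to apply
    r = 1.0 if a == rewarded_item else 0.0   # weak reward from feedback
    grad = E[a] - probs @ E                  # d log softmax(a) / d z
    z_long = z_long + lr_long * r * grad     # backbones stay frozen;
    z_short = z_short + lr_short * r * grad  # only the vectors move
    return z_long, z_short

p0 = softmax(base + E @ (z_long + z_short))[2]
for _ in range(500):
    z_long, z_short = step(z_long, z_short, rewarded_item=2)
p1 = softmax(base + E @ (z_long + z_short))[2]
```

After the loop, the retrieval probability of the rewarded entry (`p1`) has risen above its initial value (`p0`), even though only the two small vectors were updated, which is the point of the frozen-backbone design.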