-rw-r--r--  docs/abstract_revised.md  58
1 file changed, 58 insertions(+), 0 deletions(-)
diff --git a/docs/abstract_revised.md b/docs/abstract_revised.md
new file mode 100644
index 0000000..37bc43a
--- /dev/null
+++ b/docs/abstract_revised.md
@@ -0,0 +1,58 @@
+# Revised LaTeX: Abstract
+
+## Changes from original
+
+1. "math-hard benchmark" → 3 task domains
+2. "57.0% → 71.0%" → actual numbers (55.2% best, with significance tests)
+3. "halving tokens (1384→689)" → 6.7% reduction in user effort (p=0.021)
+4. "correlates strongly" → significant quartile test (p=0.006)
+5. "small-scale pilot" → 60 profiles × 60 sessions
+6. Added dual-vector finding
+
+---
+
+## Revised LaTeX
+
+```latex
+\begin{abstract}
+Large language models (LLMs) are increasingly deployed as
+conversational agents, yet most systems still treat each session
+as an isolated event and lack an explicit user representation
+that can be updated over time.
+At the same time, per-user fine-tuning or RLHF is often
+impractical in real deployments.
+We present a frozen-backbone user modeling framework that
+represents each user as a low-dimensional dual vector
+(long-term and short-term) in a global preference space,
+updated online from weak rewards.
+The framework decouples preference extraction, user-state
+representation, and downstream personalization: preferences are
+extracted as condition--action rules, stored in a
+retrieval-augmented memory, and aggregated into a user vector
+that modulates retrieval scores.
+Feedback from interaction---explicit or implicit---is mapped to
+a scalar reward that drives REINFORCE-style updates of the user
+vector, while keeping all backbone models (chat LLM, embedding,
+reranker) frozen.
+
+We instantiate this framework on the
+\textsc{MultiSessionCollab} benchmark across three task domains
+(math-hard, math-500, bigcodebench) with $60$ user profiles and
+$60$ sessions per profile.
+Our RAG+Vector agent achieves the highest task success rate
+($55.2\%$) among six system modes and significantly reduces
+interaction friction compared to a Reflection baseline:
+timeout rate drops by $2.4$ percentage points ($p = 0.046$)
+and user effort by $6.7\%$ ($p = 0.021$), yielding the highest
+interaction efficiency ($2.83$ successes per $1{,}000$ user
+tokens).
+Analysis of the learned user vectors confirms that the
+dual-vector design separates stable user identity from
+session-specific context: long-term vectors significantly
+associate with cross-user preference overlap ($p = 0.006$),
+while short-term vectors do not ($p = 0.586$).
+We discuss limitations of this simulator-based study and
+outline directions for scaling to more users, richer preference
+types, and stronger feedback signals.
+\end{abstract}
+```