author     YurenHao0426 <Blackhao0426@gmail.com>  2026-03-18 18:48:39 -0500
committer  YurenHao0426 <Blackhao0426@gmail.com>  2026-03-18 18:48:39 -0500
commit     f5b8fe7275fd5c0b41d1e50034424865fe906564 (patch)
tree       72504aad3ec022e5076aeef595d90296c1fde2ba
parent     2274de15fcffb11d302ddcac9fabe5fc0e26ed47 (diff)

Add paper and update README with full method description and results (HEAD, master)

-rw-r--r--  README.md        | 112
-rw-r--r--  paper/paper.pdf  | bin 0 -> 1561212 bytes
2 files changed, 84 insertions, 28 deletions
diff --git a/README.md b/README.md
index 74fd9e5..534e390 100644
--- a/README.md
+++ b/README.md
@@ -1,20 +1,61 @@
-# VARS: Vector-Augmented Retrieval System for Personalized LLM Assistants
+# VARS: Vector-Adapted Retrieval Scoring for Personalized LLM Assistants
-VARS is a personalization framework that enables LLM assistants to learn and adapt to individual user preferences over multi-session interactions. It combines **dense retrieval**, **reranking**, and **REINFORCE-based user vector learning** to deliver personalized responses without explicit user configuration.
+> **User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction**
+>
+> Yuren Hao, Shuhaib Mehri, ChengXiang Zhai, Dilek Hakkani-Tür
+>
+> University of Illinois at Urbana-Champaign
+>
+> [[Paper]](paper/paper.pdf)
+
+## Overview
+
+Large language models are increasingly used as conversational collaborators, yet most lack a persistent user model, forcing users to repeatedly restate preferences across sessions. **VARS** (Vector-Adapted Retrieval Scoring) is a pipeline-agnostic, frozen-backbone framework that represents each user with long-term and short-term vectors in a shared preference space and uses these vectors to bias retrieval scoring over structured preference memory. The vectors are updated online from weak scalar feedback, enabling personalization without per-user fine-tuning.
+
+### Key Idea
+
+At each turn, the system:
+1. **Extracts** structured (condition, action) preferences from dialogue via a lightweight finetuned model
+2. **Stores** preferences as memory cards with dense embeddings in a FAISS index
+3. **Retrieves** relevant preferences via dense search + cross-encoder reranking, biased by a user-specific vector bonus
+4. **Updates** dual user vectors (long-term + short-term) online via REINFORCE from keyword-based reward signals
+
+The effective user vector at turn *t* combines stable cross-session identity with transient within-session context:
+
+```
+z_eff = β_L · z_long + β_S · z_short
+```
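
As an illustrative sketch of the scoring step (the function name and the default β values here are assumptions, not the repo's API), the user-aware retrieval score `s(u, m; U) = s_0 + ⟨z_eff, v_m⟩` can be computed as:

```python
import numpy as np

def user_aware_score(s0, v_m, z_long, z_short, beta_l=1.0, beta_s=0.5):
    """Bias a base retrieval/reranker score s0 by the user-vector bonus.

    z_eff = beta_l * z_long + beta_s * z_short, and the bonus is <z_eff, v_m>,
    where v_m is the embedding of memory card m in the shared preference space.
    """
    z_eff = beta_l * z_long + beta_s * z_short
    return s0 + float(np.dot(z_eff, v_m))
```

With `z_long` capturing stable cross-session identity and `z_short` the current session, a card whose embedding aligns with either vector gets ranked higher than its base score alone would suggest.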
+
+## Results
+
+Evaluated on [MultiSessionCollab](https://github.com/shmehri/MultiSessionCollab) (60 profiles × 60 sessions, 3,600 sessions per method) across math and code tasks:
+
+| Method | Success (%) ↑ | Timeout (%) ↓ | User tokens ↓ |
+|--------|:---:|:---:|:---:|
+| Vanilla | 54.3 | 29.2 | 232.9 |
+| Contextual | 52.4 | 31.4 | 213.7 |
+| All-memory | 50.9 | 33.4 | 226.8 |
+| Reflection | 54.4 | 28.8 | 207.5 |
+| RAG | 52.0 | 44.3 | **188.4** |
+| **VARS** | **55.2** | **26.4** | 193.6 |
+
+VARS achieves the strongest overall performance: it matches Reflection in task success while significantly reducing the timeout rate (-2.4 pp, *p*=0.046) and user effort (-13.9 user tokens, *p*=0.021). The learned long-term vectors align with cross-user preference overlap (*p*=0.006), while short-term vectors capture session-specific adaptation.
## Architecture
```
-User Query ──► Preference Retrieval (Dense + Rerank) ──► Augmented Prompt ──► LLM Response
- ▲ │
- │ ▼
- User Vector ◄──── REINFORCE Update ◄──── Implicit Feedback Signal
- ▲
- │
- Preference Extractor ◄──── Conversation History
+User Query u_t ──► Dense Retrieval ──► Reranker ──► User-Aware Scoring ──► Top-J notes ──► LLM Response
+ │ s(u,m;U) = s_0 + ⟨z_eff, v_m⟩
+ │ ▲
+ Preference User Vector
+ Memory (FAISS) z_eff = β_L·z_L + β_S·z_S
+ ▲ ▲
+ │ REINFORCE Update
+ Preference Extractor from keyword reward r̂_t
+ M_ext (Qwen3-0.6B)
```
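
The REINFORCE update in the diagram can be sketched as below, assuming a softmax policy over the retrieved notes with logits `⟨z, v_m⟩` (a standard REINFORCE formulation; the actual update in `user_model/policy/reinforce.py` may differ in parameterization and learning-rate schedule):

```python
import numpy as np

def reinforce_update(z, note_vecs, chosen, reward, lr=0.1):
    """One REINFORCE step on a user vector z.

    Policy: softmax over retrieved notes with logits <z, v_m>.
    grad of log pi(chosen) w.r.t. z is v_chosen - E_pi[v],
    so z moves toward the chosen note's embedding when reward > 0.
    """
    logits = note_vecs @ z
    probs = np.exp(logits - logits.max())  # stable softmax
    probs /= probs.sum()
    grad_logp = note_vecs[chosen] - probs @ note_vecs
    return z + lr * reward * grad_logp
```

Applied separately to the long-term and short-term vectors, this is how weak scalar feedback `r̂_t` reshapes retrieval without touching any frozen backbone weights.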
-### Core Components
+### Core Modules
| Module | Description |
|--------|-------------|
@@ -22,30 +63,34 @@ User Query ──► Preference Retrieval (Dense + Rerank) ──► Augmented P
| `models/llm/vllm_chat.py` | vLLM HTTP client for high-throughput batched inference |
| `models/embedding/qwen3_8b.py` | Dense embedding (Qwen3-Embedding-8B) |
| `models/reranker/qwen3_reranker.py` | Cross-encoder reranking (Qwen3-Reranker-8B) |
-| `models/preference_extractor/` | Online preference extraction from conversation |
-| `retrieval/pipeline.py` | RAG retrieval pipeline with FAISS vector store |
-| `user_model/policy/reinforce.py` | REINFORCE policy for user vector optimization |
-| `feedback/` | Reward model (keyword / LLM judge) and online RL updates |
+| `models/preference_extractor/` | Lightweight preference extraction from conversation |
+| `retrieval/pipeline.py` | RAG retrieval pipeline with FAISS vector store and PCA item space |
+| `user_model/policy/reinforce.py` | REINFORCE policy for dual user-vector optimization |
+| `feedback/reward_model.py` | Keyword-based reward heuristic |
+| `feedback/handlers.py` | Retrieval-attribution gating and online RL updates |
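
As a toy illustration of the keyword-based reward heuristic (the keyword lists and thresholds here are invented; see `feedback/reward_model.py` for the actual rules):

```python
def keyword_reward(user_msg,
                   positive=("thanks", "perfect", "great"),
                   negative=("no,", "wrong", "not what")):
    """Map a user turn to a weak scalar reward via keyword matching.

    Negative cues take precedence over positive ones; unmatched turns
    yield a neutral reward of 0.0.
    """
    text = user_msg.lower()
    if any(k in text for k in negative):
        return -1.0
    if any(k in text for k in positive):
        return 1.0
    return 0.0
```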
-## Models
+## Models and Data
### Preference Extractor
-We fine-tuned a Qwen3-0.6B model for structured preference extraction from conversational context.
+A 0.6B-parameter Qwen3 model finetuned for structured preference extraction. Given a dialogue window, it outputs JSON preference tuples `{condition, action, confidence}`.
- **Model**: [blackhao0426/pref-extractor-qwen3-0.6b-full-sft](https://huggingface.co/blackhao0426/pref-extractor-qwen3-0.6b-full-sft)
-- **Training Data**: [blackhao0426/user-preference-564k](https://huggingface.co/datasets/blackhao0426/user-preference-564k) (564K examples of user preference extraction)
+- **Training Data**: [blackhao0426/user-preference-564k](https://huggingface.co/datasets/blackhao0426/user-preference-564k) — 564K examples constructed from public chat logs (LMSYS-Chat, WildChat), instruction-tuning corpora (Alpaca, SlimOrca), and GPT-5.1-labeled preference JSON
-The extractor takes conversation turns as input and outputs structured `{condition, action, confidence}` preference tuples.
+On a held-out set, the extractor achieves 99.7% JSON validity and 97.5% recall at 37.7% precision. The extractor is intentionally tuned for high recall; the downstream reranker and user-vector scoring filter out irrelevant cards.
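
For illustration, a well-formed extractor output can be parsed and schema-checked as follows (only the `{condition, action, confidence}` schema comes from the README; the concrete field values are invented):

```python
import json

# Hypothetical raw extractor output for a turn like
# "Please always show me runnable code, not pseudocode."
raw = ('{"condition": "user asks for example code", '
       '"action": "answer with runnable Python, not pseudocode", '
       '"confidence": 0.92}')

card = json.loads(raw)

# Validate the memory-card schema before embedding/indexing.
assert set(card) == {"condition", "action", "confidence"}
print(card["condition"], "->", card["action"])
```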
-### Other Models Used
+### Backbone Models
-| Role | Model |
-|------|-------|
-| Agent LLM | LLaMA-3.1-8B-Instruct (via vLLM) |
-| Dense Embedding | Qwen3-Embedding-8B |
-| Reranker | Qwen3-Reranker-8B |
-| Reward Judge | LLaMA-3.1-8B-Instruct or GPT-4o-mini |
+| Role | Model | Parameters |
+|------|-------|------------|
+| Agent LLM | Llama-3.1-8B-Instruct (via vLLM) | 8B |
+| User Simulator | Llama-3.3-70B-Instruct (via vLLM) | 70B |
+| Dense Embedding | Qwen3-Embedding-8B | 8B |
+| Reranker | Qwen3-Reranker-8B | 8B |
+| Preference Extractor | Qwen3-0.6B (finetuned) | 0.6B |
+
+All backbone components are kept frozen; online adaptation occurs only through the per-user vectors.
## Installation
@@ -59,7 +104,7 @@ pip install -e .
- PyTorch >= 2.3.0
- Transformers >= 4.44.0
-## Usage
+## Quick Start
```python
from personalization.serving import PersonalizedLLM
@@ -72,8 +117,19 @@ response = llm.chat(user_id="user_001", query="Explain quicksort")
# The system automatically:
# 1. Extracts preferences from conversation history
# 2. Retrieves relevant preferences via dense retrieval + reranking
-# 3. Augments the prompt with personalized context
-# 4. Updates user vector from implicit feedback (REINFORCE)
+# 3. Adds user-vector bonus to retrieval scores
+# 4. Augments the LLM prompt with top-ranked preference notes
+# 5. Updates user vectors from implicit feedback (REINFORCE)
+```
+
+## Citation
+
+```bibtex
+@article{hao2025vars,
+ title={User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction},
+ author={Hao, Yuren and Mehri, Shuhaib and Zhai, ChengXiang and Hakkani-T{\"u}r, Dilek},
+ year={2025}
+}
```
## License
diff --git a/paper/paper.pdf b/paper/paper.pdf
new file mode 100644
index 0000000..611e99a
--- /dev/null
+++ b/paper/paper.pdf
Binary files differ