# VARS: Vector-Augmented Retrieval System for Personalized LLM Assistants
VARS is a personalization framework that enables LLM assistants to learn and adapt to individual user preferences over multi-session interactions. It combines **dense retrieval**, **reranking**, and **REINFORCE-based user vector learning** to deliver personalized responses without explicit user configuration.
## Architecture
```
User Query ──► Preference Retrieval (Dense + Rerank) ──► Augmented Prompt ──► LLM Response
▲ │
│ ▼
User Vector ◄──── REINFORCE Update ◄──── Implicit Feedback Signal
▲
│
Preference Extractor ◄──── Conversation History
```
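The loop above can be sketched in plain Python. All function names and the in-memory preference store below are illustrative stubs, not the actual VARS API:

```python
# Illustrative sketch of the VARS inference loop.
# All names and the in-memory store below are hypothetical stubs.

def retrieve_preferences(user_id: str, query: str) -> list[str]:
    # Stand-in for dense retrieval + reranking over stored preferences.
    store = {"user_001": ["prefers concise answers", "likes Python examples"]}
    return store.get(user_id, [])

def augment_prompt(query: str, prefs: list[str]) -> str:
    # Prepend retrieved preferences as personalized context.
    context = "\n".join(f"- {p}" for p in prefs)
    return f"User preferences:\n{context}\n\nQuery: {query}"

def answer(user_id: str, query: str) -> str:
    prefs = retrieve_preferences(user_id, query)
    prompt = augment_prompt(query, prefs)
    return f"[LLM response to: {prompt!r}]"  # placeholder for the LLM call

print(answer("user_001", "Explain quicksort"))
```

In the real system, the retrieval stand-in is the dense + rerank pipeline and the placeholder is the vLLM call; the feedback/REINFORCE arrow closes the loop after the response.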
### Core Components
| Module | Description |
|--------|-------------|
| `serving/personalized_llm.py` | Main inference interface (`chat()`, `chat_prepare()`, `chat_complete()`) |
| `models/llm/vllm_chat.py` | vLLM HTTP client for high-throughput batched inference |
| `models/embedding/qwen3_8b.py` | Dense embedding (Qwen3-Embedding-8B) |
| `models/reranker/qwen3_reranker.py` | Cross-encoder reranking (Qwen3-Reranker-8B) |
| `models/preference_extractor/` | Online preference extraction from conversation |
| `retrieval/pipeline.py` | RAG retrieval pipeline with FAISS vector store |
| `user_model/policy/reinforce.py` | REINFORCE policy for user vector optimization |
| `feedback/` | Reward model (keyword / LLM judge) and online RL updates |
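To make the user-vector update concrete, here is a toy REINFORCE step under an assumed Gaussian policy. The Gaussian parameterization, `sigma`, and `lr` are assumptions for illustration, not the actual `user_model/policy/reinforce.py` implementation:

```python
import numpy as np

# Toy REINFORCE update for a user vector under an ASSUMED Gaussian
# policy; not the actual user_model/policy/reinforce.py code.

def reinforce_step(u, a, reward, baseline=0.0, sigma=0.5, lr=0.1):
    """One ascent step: u += lr * (r - b) * grad_u log N(a; u, sigma^2 I)."""
    grad_logp = (a - u) / sigma**2  # score function of the Gaussian policy
    return u + lr * (reward - baseline) * grad_logp

# A positive advantage moves the user vector toward the sampled action:
u_new = reinforce_step(np.zeros(3), np.ones(3), reward=1.0)
print(u_new)  # each coordinate moves 0.1 * 1.0 * (1 / 0.25) = 0.4 toward a
```

The key property shown here: actions that earn reward above the baseline pull the user vector toward them, while below-baseline actions push it away, which is how implicit feedback shapes personalization over time.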
## Models
### Preference Extractor
We fine-tuned a Qwen3-0.6B model for structured preference extraction from conversational context.
- **Model**: [blackhao0426/pref-extractor-qwen3-0.6b-full-sft](https://huggingface.co/blackhao0426/pref-extractor-qwen3-0.6b-full-sft)
- **Training Data**: [blackhao0426/user-preference-564k](https://huggingface.co/datasets/blackhao0426/user-preference-564k) (564K examples of user preference extraction)
The extractor takes conversation turns as input and outputs structured `{condition, action, confidence}` preference tuples.
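For illustration, one extracted tuple might look like the following. Only the `{condition, action, confidence}` schema comes from the description above; the field values are made up:

```python
# Hypothetical extractor output; only the schema {condition, action,
# confidence} is from the docs, the values are illustrative.
preference = {
    "condition": "user asks for a code explanation",
    "action": "include a short Python example with inline comments",
    "confidence": 0.87,
}
```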
### Other Models Used
| Role | Model |
|------|-------|
| Agent LLM | LLaMA-3.1-8B-Instruct (via vLLM) |
| Dense Embedding | Qwen3-Embedding-8B |
| Reranker | Qwen3-Reranker-8B |
| Reward Judge | LLaMA-3.1-8B-Instruct or GPT-4o-mini |
## Installation
```bash
pip install -e .
```
### Requirements
- Python >= 3.10
- PyTorch >= 2.3.0
- Transformers >= 4.44.0
## Usage
```python
from personalization.serving import PersonalizedLLM
llm = PersonalizedLLM.from_config("configs/local_models.yaml")
# Multi-turn personalized chat
response = llm.chat(user_id="user_001", query="Explain quicksort")
# The system automatically:
# 1. Extracts preferences from conversation history
# 2. Retrieves relevant preferences via dense retrieval + reranking
# 3. Augments the prompt with personalized context
# 4. Updates user vector from implicit feedback (REINFORCE)
```
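Step 2 above (dense retrieval + reranking) can be sketched with toy scorers: random vectors stand in for Qwen3-Embedding-8B embeddings, and token overlap stands in for the Qwen3-Reranker-8B cross-encoder. This is a minimal sketch of the two-stage idea, not the `retrieval/pipeline.py` implementation:

```python
import numpy as np

# Toy two-stage retrieval: cosine over random "embeddings" stands in for
# Qwen3-Embedding-8B; token overlap stands in for Qwen3-Reranker-8B.
rng = np.random.default_rng(0)

prefs = ["prefers concise answers", "likes code examples", "avoids heavy jargon"]
pref_vecs = rng.normal(size=(len(prefs), 16))
query = "Explain quicksort with code examples"
query_vec = pref_vecs[1] + 0.05 * rng.normal(size=16)  # toy: near pref #1

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stage 1: dense retrieval shortlists the top-k preferences.
dense = sorted(range(len(prefs)),
               key=lambda i: cosine(query_vec, pref_vecs[i]),
               reverse=True)[:2]

# Stage 2: rerank only the shortlist with a finer (here: token-overlap) scorer.
def overlap(i):
    return len(set(query.lower().split()) & set(prefs[i].split()))

best = max(dense, key=overlap)
print(prefs[best])  # → "likes code examples"
```

The design point the sketch preserves: the cheap dense stage narrows the candidate set so the expensive cross-encoder only scores a short list.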
## License
Apache-2.0