From 2274de15fcffb11d302ddcac9fabe5fc0e26ed47 Mon Sep 17 00:00:00 2001
From: YurenHao0426
Date: Wed, 18 Mar 2026 18:27:48 -0500
Subject: Add README with model and dataset links

---
 README.md | 81 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..74fd9e5
--- /dev/null
+++ b/README.md
@@ -0,0 +1,81 @@
+# VARS: Vector-Augmented Retrieval System for Personalized LLM Assistants
+
+VARS is a personalization framework that enables LLM assistants to learn and adapt to individual user preferences over multi-session interactions. It combines **dense retrieval**, **reranking**, and **REINFORCE-based user vector learning** to deliver personalized responses without explicit user configuration.
+
+## Architecture
+
+```
+User Query ──► Preference Retrieval (Dense + Rerank) ──► Augmented Prompt ──► LLM Response
+                        ▲                                                          │
+                        │                                                          ▼
+                  User Vector ◄──── REINFORCE Update ◄──── Implicit Feedback Signal
+                        ▲
+                        │
+             Preference Extractor ◄──── Conversation History
+```
+
+### Core Components
+
+| Module | Description |
+|--------|-------------|
+| `serving/personalized_llm.py` | Main inference interface (`chat()`, `chat_prepare()`, `chat_complete()`) |
+| `models/llm/vllm_chat.py` | vLLM HTTP client for high-throughput batched inference |
+| `models/embedding/qwen3_8b.py` | Dense embedding (Qwen3-Embedding-8B) |
+| `models/reranker/qwen3_reranker.py` | Cross-encoder reranking (Qwen3-Reranker-8B) |
+| `models/preference_extractor/` | Online preference extraction from conversation |
+| `retrieval/pipeline.py` | RAG retrieval pipeline with FAISS vector store |
+| `user_model/policy/reinforce.py` | REINFORCE policy for user vector optimization |
+| `feedback/` | Reward model (keyword / LLM judge) and online RL updates |
+
+## Models
+
+### Preference Extractor
+
+We fine-tuned a Qwen3-0.6B model for structured preference extraction from conversational context.
+
+- **Model**: [blackhao0426/pref-extractor-qwen3-0.6b-full-sft](https://huggingface.co/blackhao0426/pref-extractor-qwen3-0.6b-full-sft)
+- **Training Data**: [blackhao0426/user-preference-564k](https://huggingface.co/datasets/blackhao0426/user-preference-564k) (564K examples of user preference extraction)
+
+The extractor takes conversation turns as input and outputs structured `{condition, action, confidence}` preference tuples.
+
+### Other Models Used
+
+| Role | Model |
+|------|-------|
+| Agent LLM | LLaMA-3.1-8B-Instruct (via vLLM) |
+| Dense Embedding | Qwen3-Embedding-8B |
+| Reranker | Qwen3-Reranker-8B |
+| Reward Judge | LLaMA-3.1-8B-Instruct or GPT-4o-mini |
+
+## Installation
+
+```bash
+pip install -e .
+```
+
+### Requirements
+
+- Python >= 3.10
+- PyTorch >= 2.3.0
+- Transformers >= 4.44.0
+
+## Usage
+
+```python
+from personalization.serving import PersonalizedLLM
+
+llm = PersonalizedLLM.from_config("configs/local_models.yaml")
+
+# Multi-turn personalized chat
+response = llm.chat(user_id="user_001", query="Explain quicksort")
+
+# The system automatically:
+# 1. Extracts preferences from conversation history
+# 2. Retrieves relevant preferences via dense retrieval + reranking
+# 3. Augments the prompt with personalized context
+# 4. Updates user vector from implicit feedback (REINFORCE)
+```
+
+## License
+
+Apache-2.0
-- 
cgit v1.2.3