| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-03-18 18:27:48 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-03-18 18:27:48 -0500 |
| commit | 2274de15fcffb11d302ddcac9fabe5fc0e26ed47 (patch) | |
| tree | 26c31cca39f5e7b30f7bfe9f754f3ac4de95e981 | |
| parent | b6c3e4e51eeab703b40284459c6e9fff2151216c (diff) | |
Add README with model and dataset links
| -rw-r--r-- | README.md | 81 |
1 file changed, 81 insertions, 0 deletions
# VARS: Vector-Augmented Retrieval System for Personalized LLM Assistants

VARS is a personalization framework that enables LLM assistants to learn and adapt to individual user preferences over multi-session interactions. It combines **dense retrieval**, **reranking**, and **REINFORCE-based user vector learning** to deliver personalized responses without explicit user configuration.

## Architecture

```
User Query ──► Preference Retrieval (Dense + Rerank) ──► Augmented Prompt ──► LLM Response
                        ▲                                                          │
                        │                                                          ▼
                  User Vector ◄──────── REINFORCE Update ◄──── Implicit Feedback Signal
                        ▲
                        │
            Preference Extractor ◄──── Conversation History
```

### Core Components

| Module | Description |
|--------|-------------|
| `serving/personalized_llm.py` | Main inference interface (`chat()`, `chat_prepare()`, `chat_complete()`) |
| `models/llm/vllm_chat.py` | vLLM HTTP client for high-throughput batched inference |
| `models/embedding/qwen3_8b.py` | Dense embedding (Qwen3-Embedding-8B) |
| `models/reranker/qwen3_reranker.py` | Cross-encoder reranking (Qwen3-Reranker-8B) |
| `models/preference_extractor/` | Online preference extraction from conversation |
| `retrieval/pipeline.py` | RAG retrieval pipeline with FAISS vector store |
| `user_model/policy/reinforce.py` | REINFORCE policy for user vector optimization |
| `feedback/` | Reward model (keyword / LLM judge) and online RL updates |

## Models

### Preference Extractor

We fine-tuned a Qwen3-0.6B model for structured preference extraction from conversational context.
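At inference time the extractor's raw output has to be parsed and validated before the tuples can be stored and retrieved. A minimal sketch of that step (the JSON serialization and the validation rules here are assumptions for illustration, not the repository's actual format):

```python
import json


def parse_preferences(raw_output: str) -> list[dict]:
    """Parse extractor output into validated preference tuples.

    Assumes the extractor emits a JSON list of objects with
    `condition`, `action`, and `confidence` fields (the tuple schema
    this README describes); the exact serialization is hypothetical.
    """
    preferences = []
    for item in json.loads(raw_output):
        # Skip malformed tuples rather than failing the whole turn.
        if not {"condition", "action", "confidence"} <= item.keys():
            continue
        confidence = float(item["confidence"])
        if 0.0 <= confidence <= 1.0:
            preferences.append({
                "condition": item["condition"],
                "action": item["action"],
                "confidence": confidence,
            })
    return preferences


raw = '[{"condition": "user asks for code", "action": "prefer Python examples", "confidence": 0.9}]'
print(parse_preferences(raw))
```

Dropping malformed or out-of-range tuples silently is one plausible policy for an online extractor, where a single bad generation should not interrupt the chat loop.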
- **Model**: [blackhao0426/pref-extractor-qwen3-0.6b-full-sft](https://huggingface.co/blackhao0426/pref-extractor-qwen3-0.6b-full-sft)
- **Training Data**: [blackhao0426/user-preference-564k](https://huggingface.co/datasets/blackhao0426/user-preference-564k) (564K examples of user preference extraction)

The extractor takes conversation turns as input and outputs structured `{condition, action, confidence}` preference tuples.

### Other Models Used

| Role | Model |
|------|-------|
| Agent LLM | LLaMA-3.1-8B-Instruct (via vLLM) |
| Dense Embedding | Qwen3-Embedding-8B |
| Reranker | Qwen3-Reranker-8B |
| Reward Judge | LLaMA-3.1-8B-Instruct or GPT-4o-mini |

## Installation

```bash
pip install -e .
```

### Requirements

- Python >= 3.10
- PyTorch >= 2.3.0
- Transformers >= 4.44.0

## Usage

```python
from personalization.serving import PersonalizedLLM

llm = PersonalizedLLM.from_config("configs/local_models.yaml")

# Multi-turn personalized chat
response = llm.chat(user_id="user_001", query="Explain quicksort")

# The system automatically:
# 1. Extracts preferences from conversation history
# 2. Retrieves relevant preferences via dense retrieval + reranking
# 3. Augments the prompt with personalized context
# 4. Updates user vector from implicit feedback (REINFORCE)
```

## License

Apache-2.0
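The last step of the usage flow — updating the user vector with REINFORCE — can be illustrated with a self-contained toy loop. The dimensions, learning rate, and synthetic reward below are invented for the sketch; the project's real policy lives in `user_model/policy/reinforce.py` and may differ:

```python
import math
import random

random.seed(0)
DIM, LR, SIGMA = 8, 0.05, 0.1

# Hidden "true" preference direction: a stand-in for whatever the
# implicit feedback signal rewards. Normalized for a cosine-like score.
true_pref = [random.gauss(0, 1) for _ in range(DIM)]
norm = math.sqrt(sum(x * x for x in true_pref))
true_pref = [x / norm for x in true_pref]

def reward(v):
    return sum(a * b for a, b in zip(v, true_pref))

# Mean of a Gaussian policy over user vectors; REINFORCE trains this mean.
mu = [0.0] * DIM
baseline = 0.0
for _ in range(500):
    v = [m + SIGMA * random.gauss(0, 1) for m in mu]  # sample a user vector
    r = reward(v)
    adv = r - baseline  # advantage against a moving-average baseline
    # Gradient of log N(v; mu, SIGMA^2 I) w.r.t. mu is (v - mu) / SIGMA^2.
    mu = [m + LR * adv * (vi - m) / SIGMA ** 2 for m, vi in zip(mu, v)]
    baseline = 0.9 * baseline + 0.1 * r

mu_norm = math.sqrt(sum(x * x for x in mu))
cosine = reward([x / mu_norm for x in mu])
print(f"{cosine:.2f}")  # approaches 1.0 as mu aligns with true_pref
```

The moving-average baseline is the standard variance-reduction trick for REINFORCE; without it the sign of the raw reward alone drives every update and convergence is much noisier.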