From c06ec2f3b80f8968f09eb801b69237495b055ec1 Mon Sep 17 00:00:00 2001
From: YurenHao0426
Date: Tue, 27 Jan 2026 10:08:01 -0600
Subject: add CLAUDE.md

---
 CLAUDE.md | 295 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 295 insertions(+)
 create mode 100644 CLAUDE.md

diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..8819e73
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,295 @@
# CLAUDE.md

This file provides guidance to Claude Code when working with this repository.

---

## Project Overview

**Personalization User Model** is a research project for **personalized LLM assistants** that learn and adapt to individual user preferences over multi-session interactions.

### Research Goal
Build an AI assistant that:
1. **Extracts** user preferences from conversation (e.g., "I prefer bullet points", "show me step-by-step math")
2. **Stores** preferences in a retrieval-augmented memory system
3. **Retrieves** relevant preferences using dense retrieval + reranking
4. **Adapts** responses using a learned user vector (REINFORCE-based RL)
5. **Improves** over multiple sessions without explicit user configuration

### Key Innovation
Unlike static preference profiles, this system uses:
- **Online preference extraction** from natural conversation
- **Policy-based memory retrieval** with user-specific vectors
- **REINFORCE updates** from implicit user feedback (enforcement signals)

---

## Repository Structure

```
personalization-user-model/
├── src/personalization/                 # Core personalization library
│   ├── serving/                         # Main PersonalizedLLM class
│   │   └── personalized_llm.py          # Primary inference interface
│   ├── models/                          # LLM backends (vLLM, local)
│   │   └── llm/vllm_chat.py             # vLLM HTTP client
│   ├── retrieval/                       # Dense retrieval + reranking
│   ├── feedback/                        # REINFORCE reward processing
│   ├── user_model/                      # User vector learning
│   └── evaluation/                      # Metrics and analysis
│
├── collaborativeagents/                 # Experiment framework (MULTISESSIONCOLLAB)
│   ├── adapters/                        # Method adapters for experiments
│   │   ├── personalized_llm_adapter.py  # RAG methods
│   │   ├── contextual_adapter.py        # Full-context baseline
│   │   └── reflection_adapter.py        # CollaborativeAgents baseline
│   ├── agents/                          # Batch processing clients
│   │   └── batch_vllm_agent.py          # Async vLLM/OpenAI batching
│   ├── scripts/                         # Experiment runners
│   │   └── run_experiments.py           # Main experiment script
│   ├── slurm/                           # HPC job scripts
│   │   └── fullscale/                   # Full-scale experiment jobs
│   ├── data/                            # User profiles and datasets
│   │   └── complex_profiles_v2/         # 200 user profiles (43 prefs each)
│   └── results/                         # Experiment outputs
│
├── models/                              # Downloaded HuggingFace models (not in git)
│   ├── llama-3.1-8b-instruct/           # Agent LLM
│   ├── qwen3-embedding-8b/              # Dense embeddings
│   └── rerankers/                       # Qwen3 8B reranker
│
├── data/                                # Datasets and corpora (not in git)
│   ├── corpora/                         # Memory card storage
│   └── users/                           # User state persistence
│
└── LLaMA-Factory/                       # External fine-tuning toolkit
```

---

## Methods (Baselines)

The experiment compares **6 methods** for multi-session personalization:

| Method | Description | Memory Type |
|--------|-------------|-------------|
| `vanilla` | No memory, pure LLM | None |
| `contextual` | Full conversation history in context | In-context |
| `reflection` | Session-level reflection → agent_notes | Summarized notes |
| `all_memory` | All extracted preferences in context | All memories |
| `rag` | Dense retrieval + reranking (no user vector) | Retrieved top-k |
| `rag_vector` | RAG + learned user vector (proposed) | Retrieved + personalized |

---

## Setup

### 1. Environment
```bash
cd /projects/bfqt/users/yurenh2/ml-projects/personalization-user-model
source /u/yurenh2/miniforge3/etc/profile.d/conda.sh
conda activate eval

export PYTHONPATH="${PWD}/src:${PWD}/collaborativeagents:${PYTHONPATH}"
export HF_HOME=/projects/bfqt/users/yurenh2/hf_cache/huggingface
```

### 2. Environment Variables
Create a `.env` file with:
```bash
OPENAI_API_KEY=sk-...   # For GPT user simulator
HF_TOKEN=hf_...         # For gated models
```

### 3. Models Required
- **Agent LLM**: `models/llama-3.1-8b-instruct/` (local) or vLLM server
- **User Simulator**: 70B model via vLLM or OpenAI API (gpt-5-mini)
- **Embeddings**: `models/qwen3-embedding-8b/` (for RAG methods)
- **Reranker**: `models/rerankers/` or Qwen3-8B (for RAG methods)

### 4. Storage Locations
| Path | Quota | Usage |
|------|-------|-------|
| `/projects/bfqt` | 500GB (soft) | Code, models, results |
| `/work/hdd/bfqt` | 1TB | Overflow storage, large checkpoints |
| `/work/nvme/bfqt` | 500GB | Fast scratch (temporary) |

---

## Running Experiments

### Quick Test (2 profiles, 2 sessions)
```bash
cd collaborativeagents/scripts
python run_experiments.py \
  --methods vanilla \
  --datasets math-hard \
  --n-profiles 2 \
  --n-sessions 2 \
  --max-turns 8 \
  --use-vllm \
  --vllm-agent-url http://localhost:8003/v1 \
  --parallel-profiles 2 \
  --profile-path ../data/complex_profiles_v2/profiles_200.jsonl \
  --output-dir ../results/test
```

### Full-Scale Experiment
```bash
# GPU Layout (4x A100 80GB):
#   GPU 0-1: 70B user simulator (TP=2)
#   GPU 2:   8B agent
#   GPU 3:   Embedding + Reranker

cd collaborativeagents/slurm/fullscale
sbatch test_local_user.sh
```

### Key Arguments
| Argument | Description |
|----------|-------------|
| `--methods` | Comma-separated: vanilla,contextual,reflection,all_memory,rag,rag_vector |
| `--n-profiles` | Number of user profiles (max 200) |
| `--n-sessions` | Sessions per profile |
| `--max-turns` | Max turns per session |
| `--use-vllm` | Use vLLM for agent (required for batching) |
| `--use-openai-user` | Use OpenAI API for user simulator |
| `--vllm-user-url` | Local vLLM user simulator URL |
| `--parallel-profiles` | Batch size for turn-synchronous processing |
| `--reward-mode` | `keyword` (heuristic) or `llm` (GPT-5-nano judge) |

---

## Current Results

### Completed Experiments
Located in `collaborativeagents/results/`:

| Experiment | Profiles | Sessions | Methods | Status |
|------------|----------|----------|---------|--------|
| `rag_vector_v2_*` | 10 | 10 | rag_vector | Complete |
| `gpt_user_all_methods_*` | 5-10 | 2-5 | all 6 | Partial |
| `test_50parallel_*` | 50 | 1 | vanilla | Test only |

### Throughput Benchmarks
| Setup | Throughput | Notes |
|-------|------------|-------|
| OpenAI user + vLLM agent | ~60 sessions/hr | API latency bottleneck |
| Local 70B user + 8B agent | ~2000+ sessions/hr | Expected (not yet tested) |

---

## Experiments To Be Done

### 1. Full-Scale Benchmark (Priority: HIGH)
**Goal**: 200 profiles × 6 methods × 15 sessions = 18,000 sessions

**Setup**:
- User simulator: Local 70B (vLLM, TP=2) - NOT OpenAI (too slow)
- Agent: 8B LLaMA (vLLM)
- Reward: LLM judge (GPT-5-nano via API)

**GPU Layout** (4x A100 80GB):
```
GPU 0-1: 70B user simulator (AWQ INT4, TP=2)
GPU 2:   8B agent
GPU 3:   Embedding + Reranker (for RAG methods)
```

**Jobs**: Split by method and profile range (50 profiles each)
```
collaborativeagents/slurm/fullscale/
├── run_vanilla_p{0,50,100,150}.sh
├── run_contextual_p{0,50,100,150}.sh
├── run_reflection_p{0,50,100,150}.sh
├── run_all_memory_p{0,50,100,150}.sh
├── run_rag_p{0,50,100,150}.sh
├── run_rag_vector_p{0,50,100,150}.sh
└── submit_all.sh
```

### 2. Session Extension (Priority: MEDIUM)
If 15 sessions are insufficient, continue to 30 sessions using a checkpoint:
```bash
python run_experiments.py \
  --n-sessions 30 \
  --continue-from ../results/fullscale_15sess/...
```

### 3. Ablation Studies (Priority: LOW)
- RAG with BGE reranker (278M) vs Qwen3 (8B)
- Best-of-N sampling (N=3) for RAG methods
- Different embedding models

---

## Key Files Reference

### Core Personalization
| File | Purpose |
|------|---------|
| `src/personalization/serving/personalized_llm.py` | Main inference class with `chat()`, `chat_prepare()`, `chat_complete()` |
| `src/personalization/models/llm/vllm_chat.py` | vLLM HTTP client with `build_messages()`, `answer()` |
| `src/personalization/retrieval/policy.py` | Memory retrieval with user vector |
| `src/personalization/feedback/llm_reward.py` | GPT-based reward judge |

### Experiment Framework
| File | Purpose |
|------|---------|
| `collaborativeagents/scripts/run_experiments.py` | Main experiment runner with batch processing |
| `collaborativeagents/adapters/*.py` | Method-specific adapters with `prepare_prompt()`, `process_response()` |
| `collaborativeagents/agents/batch_vllm_agent.py` | `BatchVLLMClient` and `BatchOpenAIClient` for async batching |

### Data
| File | Purpose |
|------|---------|
| `collaborativeagents/data/complex_profiles_v2/profiles_200.jsonl` | 200 user profiles with 43 preferences each |
| `data/corpora/empty_store/` | Empty memory store for fresh experiments |

---

## Troubleshooting

### Quota Exceeded
```bash
# Check quota
quota -s

# Move large files to HDD storage
mv /projects/bfqt/users/yurenh2/large_dir /work/hdd/bfqt/users/yurenh2/
```

### vLLM Server Issues
```bash
# Check if the server is running
curl http://localhost:8003/health

# Kill existing servers
pkill -f "vllm.entrypoints"
```

### Out of GPU Memory
- Reduce `--gpu-memory-utilization` (default 0.90)
- Reduce `--max-model-len` (default 8192)
- Use quantized models (AWQ INT4)

### Slow Throughput
- Use a local vLLM user simulator instead of the OpenAI API
- Increase `--parallel-profiles` for better batching
- Check vLLM logs for "Running: N reqs" to verify batching
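The batch-processing contract that the Key Files Reference attributes to `collaborativeagents/adapters/*.py` can be sketched as follows. This is a hypothetical minimal adapter assuming only the `prepare_prompt()`/`process_response()` method names mentioned above; the class name, message format, and method bodies are illustrative, not the repository's actual implementation.

```python
# Hypothetical sketch of a method adapter (assumed names; not the repo's code).
# Models the `contextual` baseline: full conversation history stays in context.

class ContextualAdapterSketch:
    """One adapter instance per user profile."""

    def __init__(self, system_prompt: str = "You are a helpful assistant."):
        self.messages = [{"role": "system", "content": system_prompt}]

    def prepare_prompt(self, user_message: str) -> list:
        # Called once per turn. The batch runner collects one message list
        # per active profile and sends them to vLLM in a single async batch.
        self.messages.append({"role": "user", "content": user_message})
        return list(self.messages)

    def process_response(self, response: str) -> None:
        # Record the agent's reply so the next turn sees the full history.
        self.messages.append({"role": "assistant", "content": response})


adapter = ContextualAdapterSketch()
prompt = adapter.prepare_prompt("Please answer in bullet points.")
adapter.process_response("- Sure, noted.")
```

In the turn-synchronous loop this implies, the runner would call `prepare_prompt()` for each active profile, batch the resulting message lists through vLLM (with `--parallel-profiles` controlling batch size), then feed each reply back through `process_response()`.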

---

## Code Conventions

1. **Batch Processing**: All adapters must implement `prepare_prompt()` and `process_response()` for batched vLLM calls
2. **Device Assignment**: GPUs 0-1 for large models, GPU 2 for agent, GPU 3 for embedding/reranker
3. **Checkpoints**: Session-level tracking in `checkpoint.json` with `sessions_per_profile` dict
4. **Results**: JSON format in `results.json` with metrics per session

---

## Contact

For questions about this codebase, refer to the experiment plan at:
`/u/yurenh2/.claude/plans/effervescent-mapping-ocean.md`
-- 
cgit v1.2.3