path: root/docs
2026-02-11  Add query transformation, global preferences, and hyperparameter table  (YurenHao0426)

    Three additions to the Method/Setup sections:
      1. Query transformation: keyword-based task detection + multi-query dense
         retrieval to bridge the semantic gap (Section 3.5)
      2. Global vs. conditional preferences: universal preferences bypass
         retrieval and are always injected into the prompt (Section 3.4)
      3. Hyperparameter table with all key values (Section 4)

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
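The query-transformation step this commit describes (keyword-based task detection feeding multi-query dense retrieval) could look roughly like the sketch below. The keyword lists, query variants, and `retrieve` callback are illustrative assumptions, not the repository's actual code.

```python
# Hypothetical sketch of keyword-based task detection + multi-query retrieval.
from typing import Callable, List

# Assumed keyword table; the real task taxonomy is not in the commit message.
TASK_KEYWORDS = {
    "math": ["prove", "integral", "equation"],
    "code": ["function", "bug", "compile"],
}

def detect_task(query: str) -> str:
    """Return the first task whose keywords appear in the query."""
    q = query.lower()
    for task, words in TASK_KEYWORDS.items():
        if any(w in q for w in words):
            return task
    return "general"

def multi_query_retrieve(query: str,
                         retrieve: Callable[[str], List[str]],
                         k: int = 3) -> List[str]:
    """Issue the original query plus task-conditioned reformulations,
    then merge the retrieved results with de-duplication."""
    task = detect_task(query)
    variants = [query,
                f"{task} preferences",
                f"user preferences for {task} tasks"]
    seen, merged = set(), []
    for v in variants:
        for doc in retrieve(v):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged[:k]
```

The multi-query fan-out is what "bridges the semantic gap": a stored preference phrased differently from the user's query can still match one of the reformulated variants.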
2026-02-11  Fix z_long definition to match code (zero-init + REINFORCE, not mean)  (YurenHao0426)

    The paper incorrectly defined z_long as the mean of item vectors. The code
    initializes z_long at zero and learns it purely via REINFORCE. Also
    clarifies the reset-per-session behavior of z_short.

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
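A minimal sketch of the corrected z_long behavior this commit describes: zero initialization, learned purely through a REINFORCE-style update. The vector dimension, step size, and Gaussian-perturbation policy are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, LR = 16, 0.05  # illustrative dimension and learning rate

# z_long starts at zero and is learned only via REINFORCE;
# z_short (not shown) would be reset to zero at each new session.
z_long = np.zeros(DIM)

def reinforce_step(z: np.ndarray, reward: float,
                   noise_scale: float = 0.1) -> np.ndarray:
    """One REINFORCE update under a Gaussian policy centered at z:
    sample a perturbation, then move z along it scaled by the reward.
    (grad of the Gaussian log-prob w.r.t. the mean is eps / sigma^2.)"""
    eps = rng.normal(scale=noise_scale, size=z.shape)
    return z + LR * reward * eps / noise_scale**2

z_long = reinforce_step(z_long, reward=1.0)
```

The point of the fix is that z_long carries no prior (no item-vector mean); everything it encodes is earned through rewarded updates.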
2026-02-11  Rewrite reward section to describe keyword heuristic (matches experiments)  (YurenHao0426)

    Replaced the LLM-as-judge description with the actual keyword-based system:
      - Reward: sentiment keyword matching + topic coherence via embedding
        similarity
      - Gating: separate retrieval-attribution heuristic using memory-query
        cosine similarity (g_t = 0.9 retrieval fault, g_t = 0.2 LLM fault, etc.)
      - No additional model needed (fast, no GPU)
      - REINFORCE update unchanged

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
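The retrieval-attribution gating mentioned in this commit could be sketched as follows. The g_t = 0.9 (retrieval fault) and g_t = 0.2 (LLM fault) weights come from the commit message; the similarity threshold is an assumed value.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity with a small epsilon for numerical safety."""
    return float(np.dot(a, b) /
                 (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def attribute_fault(query_emb: np.ndarray, memory_emb: np.ndarray,
                    threshold: float = 0.5) -> float:
    """Illustrative gating heuristic: if the retrieved memory is far
    from the query, blame retrieval (g_t = 0.9); otherwise blame the
    LLM (g_t = 0.2). The 0.5 threshold is an assumption."""
    sim = cosine(query_emb, memory_emb)
    return 0.9 if sim < threshold else 0.2
```

Because this is pure embedding arithmetic, it needs no extra model call, which matches the commit's "fast, no GPU" claim.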
2026-02-11  Add revised abstract LaTeX  (YurenHao0426)

    Updated all claims to match the actual experiments:
      - 3 domains, 60 profiles × 60 sessions
      - 55.2% success (not 71%), with significance tests
      - 16.9% user-effort reduction (not "halving")
      - Dual-vector separation (z_long p=0.006, z_short p=0.586)

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11  Add revised introduction LaTeX section  (YurenHao0426)

    Key changes:
      - Domain: math-only → 3 task domains (math-hard, math-500, bigcodebench)
      - Scale: 5 profiles / 40 pool → 60 profiles / 200 pool, 60 sessions
      - "Correlates strongly" → significance-based claims (p=0.006, p=0.046)
      - Contributions rewritten: efficiency gains + dual-vector separation
      - Related-work paragraphs unchanged (still accurate)

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11  Add revised conclusion LaTeX section  (YurenHao0426)

    Updated all numbers to match the current experiments:
      - 55.2% success (not 71%), 60 profiles (not 5), 3 datasets
      - Token reduction of 16.9% (not "halving")
      - Significance results (timeout p=0.046, effort p=0.021)
      - Dual-vector separation (z_long p=0.006, z_short p=0.586)
      - Updated future work (ablation underway, LLM judge ready)

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11  Add preference-format compliance paragraph to discussion  (YurenHao0426)

    Discusses how structured JSON preferences are harder for the 8B agent to
    follow than Reflection's natural language, and notes the prompt-template
    bias toward Reflection. Reports the RAG+Rewrite improvement (+0.8 pp
    success, -1.4 pp timeout), closing ~50% of the RAG-Reflection gap.

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11  Add revised discussion & limitations LaTeX section  (YurenHao0426)

    Complete rewrite with current data (60 profiles × 60 sessions):
      - Updated all numbers and removed stale references
      - Removed a duplicate paragraph
      - Added: user-vector role analysis (RAG 44.3% → RAG+Vec 26.4% timeout)
      - Added: E/T decomposition (79% from enforcements, not negative)
      - Added: discussion of why Vanilla performs well
      - Updated: user-vector geometry (ρ=0.040, dual-vector separation)
      - Updated: limitations (keyword reward, no GRPO, 60 profiles)
      - Updated: future directions (ablation underway, LLM judge ready)

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11  Add revised results LaTeX section with actual data  (YurenHao0426)

    Key changes:
      - Fixed metadata (3 datasets, 60 profiles × 60 sessions)
      - Removed the false "three random seeds" claim
      - Replaced all placeholder/TBD text with actual analysis
      - Added a significance-tests table (paired t-test, p-values)
      - Added E/T decomposition analysis
      - Filled in the user-vector representation analysis with actual data
        (Spearman rho, quartile tests, dual-vector separation)
      - Added bug-cleaning disclosure (repetition bug)
      - Refined the failure-modes section

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11  Add revised experimental setup LaTeX section  (YurenHao0426)

    Key corrections:
      - 3 datasets (math-hard, math-500, bigcodebench), not math-hard only
      - 60 profiles × 60 sessions, not 200 profiles × 60 turns
      - User simulator: Llama-3.3-70B-Instruct (not 3.1)
      - GPU layout: agent on GPU 2, embed/reranker on GPU 3
      - Added a reward-model description
      - Fixed an incomplete sentence

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11  Add revised reward modeling LaTeX section matching code implementation  (YurenHao0426)

    Key changes from the original:
      - Input: (q_t, a_t, q_{t+1}) only; removed A_t (not used in the judge prompt)
      - A single 7-label LLM classifier replaces the abstract C_reward/C_gate
      - Gating = classifier confidence (threshold tau_c = 0.6), not memory attribution
      - Explicitly describes Llama-3.1-8B-Instruct as the judge model

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
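The confidence-based gating this commit describes (a 7-label classifier whose maximum softmax probability must clear tau_c = 0.6) can be sketched as below. The logits and label indexing are illustrative; tau_c = 0.6 is the value from the commit message.

```python
import math

TAU_C = 0.6  # confidence threshold from the commit message

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def gate_update(logits):
    """Return (apply_update, label, confidence): the REINFORCE update
    fires only when the 7-label judge's top softmax probability is at
    least tau_c. Logit values here are purely illustrative."""
    probs = softmax(logits)
    conf = max(probs)
    label = probs.index(conf)
    return conf >= TAU_C, label, conf
```

This replaces memory attribution with a single scalar test: an unconfident judge (flat label distribution) simply skips the update rather than mis-assigning blame.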
2026-02-10  Add RAG rewrite, 60-session experiment scripts, and analysis tools  (YurenHao0426)

      - RAG rewrite adapter and vector-preference pipeline in personalized_llm
      - 60-session experiment queue scripts (reflection, rag, rag_vector, rag_rewrite)
      - Vector-preference correlation analysis and visualization scripts
      - Local reward-model batch-processing improvements
      - Updated CLAUDE.md with full experiment documentation and notes

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>