Key changes:
- Domain: math-only → 3 task domains (math-hard, math-500, bigcodebench)
- Scale: 5 profiles/40 pool → 60 profiles/200 pool, 60 sessions
- "correlates strongly" → significance-based claims (p=0.006, p=0.046)
- Contributions rewritten: efficiency gains + dual-vector separation
- Related work paragraphs unchanged (still accurate)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Updated all numbers to match current experiments:
- 55.2% success (not 71%), 60 profiles (not 5), 3 datasets
- Token reduction 16.9% (not "halving")
- Significance results (timeout p=0.046, effort p=0.021)
- Dual-vector separation (z_long p=0.006, z_short p=0.586)
- Updated future work (ablation underway, LLM judge ready)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Discusses how structured JSON preferences are harder for the 8B agent
to follow than Reflection's natural language. Notes the prompt-template
bias toward Reflection. Reports the RAG+Rewrite improvement (+0.8pp
success, -1.4pp timeout), closing ~50% of the RAG-Reflection gap.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Complete rewrite with current data (60 profiles × 60 sessions):
- Updated all numbers and removed stale references
- Removed duplicate paragraph
- Added: user vector role analysis (RAG 44.3% → RAG+Vec 26.4% timeout)
- Added: E/T decomposition (79% from enforcements, not negative)
- Added: why Vanilla performs well discussion
- Updated: user-vector geometry (ρ=0.040, dual-vector separation)
- Updated: limitations (keyword reward, no GRPO, 60 profiles)
- Updated: future directions (ablation underway, LLM judge ready)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Key changes:
- Fixed metadata (3 datasets, 60 profiles × 60 sessions)
- Removed false "three random seeds" claim
- Replaced all placeholder/TBD text with actual analysis
- Added significance tests table (paired t-test, p-values)
- Added E/T decomposition analysis
- Filled in user-vector representation analysis with actual data
(Spearman rho, quartile tests, dual-vector separation)
- Added bug cleaning disclosure (repetition bug)
- Refined failure modes section
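For reference, the kind of test added to the significance table can be sketched in a few lines of stdlib Python. The per-session numbers below are hypothetical placeholders (the actual per-session scores are in the experiment data, not this message):

```python
# Hand-rolled sketch of the significance tests referenced above: a paired
# t statistic across sessions and Spearman's rank correlation.
# All numeric arrays here are made-up illustrations, not real results.
import math
from statistics import mean, stdev

def paired_t(a, b):
    """t statistic for paired samples (same sessions under two methods)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

def spearman_rho(x, y):
    """Spearman rank correlation, no-ties form: 1 - 6*sum(d^2)/(n(n^2-1))."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

baseline  = [0.50, 0.55, 0.48, 0.60, 0.52, 0.58]  # e.g. per-session success, method A
treatment = [0.56, 0.61, 0.50, 0.66, 0.57, 0.63]  # e.g. per-session success, method B
print("paired t:", round(paired_t(treatment, baseline), 3))
print("spearman:", spearman_rho([0.1, 0.3, 0.2, 0.5, 0.4, 0.6],
                                [0.52, 0.58, 0.55, 0.63, 0.60, 0.65]))
```

The p-values reported in the paper would then come from the t distribution with n-1 degrees of freedom (e.g. via scipy.stats.ttest_rel), which the paired statistic above feeds into.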
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Key corrections:
- 3 datasets (math-hard, math-500, bigcodebench), not math-hard only
- 60 profiles × 60 sessions, not 200 profiles × 60 turns
- User simulator: Llama-3.3-70B-Instruct (not 3.1)
- GPU layout: agent on GPU 2, embed/reranker on GPU 3
- Added reward model description
- Fixed incomplete sentence
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Key changes from original:
- Input: (q_t, a_t, q_{t+1}) only, removed A_t (not used in judge prompt)
- Single 7-label LLM classifier replaces abstract C_reward/C_gate
- Gating = classifier confidence (threshold tau_c=0.6), not memory attribution
- Explicitly describes Llama-3.1-8B-Instruct as judge model
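The gating rule above can be sketched as follows. The seven label names and the probabilities are hypothetical stand-ins; in the actual pipeline the distribution comes from the Llama-3.1-8B-Instruct judge, and only the tau_c=0.6 threshold is taken from this change:

```python
# Sketch of confidence-based gating: the 7-label classifier's output is
# accepted only when its top-label confidence clears tau_c = 0.6.
# Label names and scores below are illustrative, not the real label set.
TAU_C = 0.6

def gate(label_probs: dict, tau_c: float = TAU_C):
    """Return (label, confidence) if confident enough, else (None, confidence)."""
    label, conf = max(label_probs.items(), key=lambda kv: kv[1])
    return (label, conf) if conf >= tau_c else (None, conf)

# Hypothetical judge output over a 7-label scheme:
probs = {"satisfied": 0.71, "follow_up": 0.10, "clarification": 0.06,
         "repeat": 0.05, "complaint": 0.04, "topic_shift": 0.03, "other": 0.01}
print(gate(probs))                      # confident -> label kept
print(gate({k: 1 / 7 for k in probs})) # near-uniform -> gated out
```

Gating on the classifier's own confidence (rather than on memory attribution) keeps the rule local to one judge call, which is the simplification this commit describes.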
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- RAG rewrite adapter and vector preference pipeline in personalized_llm
- 60-session experiment queue scripts (reflection, rag, rag_vector, rag_rewrite)
- Vector-preference correlation analysis and visualization scripts
- Local reward model batch processing improvements
- Updated CLAUDE.md with full experiment documentation and notes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>