path: root/docs
2026-02-11  Add query transformation, global preferences, and hyperparameter table  (YurenHao0426)

    Three additions to the Method/Setup sections:
      1. Query transformation: keyword-based task detection + multi-query dense
         retrieval to bridge the semantic gap (Section 3.5)
      2. Global vs. conditional preferences: universal preferences bypass
         retrieval and are always injected into the prompt (Section 3.4)
      3. Hyperparameter table with all key values (Section 4)

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
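The query-transformation step this commit describes (keyword-based task detection feeding multi-query dense retrieval) could look roughly like the sketch below. The keyword lists, query variants, and `retrieve` callback are illustrative assumptions, not the repository's actual code.

```python
# Hypothetical sketch of keyword-based task detection + multi-query retrieval.
from typing import Callable, List

# Assumed keyword table; the real task taxonomy is not in the commit message.
TASK_KEYWORDS = {
    "math": ["prove", "integral", "equation"],
    "code": ["function", "bug", "compile"],
}

def detect_task(query: str) -> str:
    """Return the first task whose keywords appear in the query."""
    q = query.lower()
    for task, words in TASK_KEYWORDS.items():
        if any(w in q for w in words):
            return task
    return "general"

def multi_query_retrieve(query: str,
                         retrieve: Callable[[str], List[str]],
                         k: int = 3) -> List[str]:
    """Issue the original query plus task-conditioned reformulations,
    then merge the retrieved results with de-duplication."""
    task = detect_task(query)
    variants = [query,
                f"{task} preferences",
                f"user preferences for {task} tasks"]
    seen, merged = set(), []
    for v in variants:
        for doc in retrieve(v):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged[:k]
```

The multi-query fan-out is what "bridges the semantic gap": a stored preference phrased differently from the user's query can still match one of the reformulated variants.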
2026-02-11  Fix z_long definition to match code (zero-init + REINFORCE, not mean)  (YurenHao0426)

    The paper incorrectly defined z_long as the mean of item vectors. The code
    initializes z_long at zero and learns it purely via REINFORCE. Also
    clarifies the reset-per-session behavior of z_short.

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
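A minimal sketch of the corrected z_long behavior this commit describes: zero initialization, learned purely through a REINFORCE-style update. The vector dimension, step size, and Gaussian-perturbation policy are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, LR = 16, 0.05  # illustrative dimension and learning rate

# z_long starts at zero and is learned only via REINFORCE;
# z_short (not shown) would be reset to zero at each new session.
z_long = np.zeros(DIM)

def reinforce_step(z: np.ndarray, reward: float,
                   noise_scale: float = 0.1) -> np.ndarray:
    """One REINFORCE update under a Gaussian policy centered at z:
    sample a perturbation, then move z along it scaled by the reward.
    (grad of the Gaussian log-prob w.r.t. the mean is eps / sigma^2.)"""
    eps = rng.normal(scale=noise_scale, size=z.shape)
    return z + LR * reward * eps / noise_scale**2

z_long = reinforce_step(z_long, reward=1.0)
```

The point of the fix is that z_long carries no prior (no item-vector mean); everything it encodes is earned through rewarded updates.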
2026-02-11  Rewrite reward section to describe keyword heuristic (matches experiments)  (YurenHao0426)

    Replaced the LLM-as-judge description with the actual keyword-based system:
      - Reward: sentiment keyword matching + topic coherence via embedding
        similarity
      - Gating: separate retrieval-attribution heuristic using memory-query
        cosine similarity (g_t = 0.9 retrieval fault, g_t = 0.2 LLM fault, etc.)
      - No additional model needed (fast, no GPU)
      - REINFORCE update unchanged

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
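The retrieval-attribution gating mentioned in this commit could be sketched as follows. The g_t = 0.9 (retrieval fault) and g_t = 0.2 (LLM fault) weights come from the commit message; the similarity threshold is an assumed value.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity with a small epsilon for numerical safety."""
    return float(np.dot(a, b) /
                 (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def attribute_fault(query_emb: np.ndarray, memory_emb: np.ndarray,
                    threshold: float = 0.5) -> float:
    """Illustrative gating heuristic: if the retrieved memory is far
    from the query, blame retrieval (g_t = 0.9); otherwise blame the
    LLM (g_t = 0.2). The 0.5 threshold is an assumption."""
    sim = cosine(query_emb, memory_emb)
    return 0.9 if sim < threshold else 0.2
```

Because this is pure embedding arithmetic, it needs no extra model call, which matches the commit's "fast, no GPU" claim.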
2026-02-11  Add revised abstract LaTeX  (YurenHao0426)

    Updated all claims to match the actual experiments:
      - 3 domains, 60 profiles × 60 sessions
      - 55.2% success (not 71%), with significance tests
      - 16.9% user-effort reduction (not "halving")
      - Dual-vector separation (z_long p=0.006, z_short p=0.586)

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11  Add revised introduction LaTeX section  (YurenHao0426)

    Key changes:
      - Domain: math-only → 3 task domains (math-hard, math-500, bigcodebench)
      - Scale: 5 profiles / 40 pool → 60 profiles / 200 pool, 60 sessions
      - "Correlates strongly" → significance-based claims (p=0.006, p=0.046)
      - Contributions rewritten: efficiency gains + dual-vector separation
      - Related-work paragraphs unchanged (still accurate)

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11  Add revised conclusion LaTeX section  (YurenHao0426)

    Updated all numbers to match the current experiments:
      - 55.2% success (not 71%), 60 profiles (not 5), 3 datasets
      - Token reduction of 16.9% (not "halving")
      - Significance results (timeout p=0.046, effort p=0.021)
      - Dual-vector separation (z_long p=0.006, z_short p=0.586)
      - Updated future work (ablation underway, LLM judge ready)

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11  Add preference-format compliance paragraph to discussion  (YurenHao0426)

    Discusses how structured JSON preferences are harder for the 8B agent to
    follow than Reflection's natural language, and notes the prompt-template
    bias toward Reflection. Reports the RAG+Rewrite improvement (+0.8 pp
    success, -1.4 pp timeout), closing ~50% of the RAG-Reflection gap.

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11  Add revised discussion & limitations LaTeX section  (YurenHao0426)

    Complete rewrite with current data (60 profiles × 60 sessions):
      - Updated all numbers and removed stale references
      - Removed a duplicate paragraph
      - Added: user-vector role analysis (RAG 44.3% → RAG+Vec 26.4% timeout)
      - Added: E/T decomposition (79% from enforcements, not negative)
      - Added: discussion of why Vanilla performs well
      - Updated: user-vector geometry (ρ=0.040, dual-vector separation)
      - Updated: limitations (keyword reward, no GRPO, 60 profiles)
      - Updated: future directions (ablation underway, LLM judge ready)

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11  Add revised results LaTeX section with actual data  (YurenHao0426)

    Key changes:
      - Fixed metadata (3 datasets, 60 profiles × 60 sessions)
      - Removed the false "three random seeds" claim
      - Replaced all placeholder/TBD text with actual analysis
      - Added a significance-tests table (paired t-test, p-values)
      - Added E/T decomposition analysis
      - Filled in the user-vector representation analysis with actual data
        (Spearman rho, quartile tests, dual-vector separation)
      - Added bug-cleaning disclosure (repetition bug)
      - Refined the failure-modes section

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11  Add revised experimental setup LaTeX section  (YurenHao0426)

    Key corrections:
      - 3 datasets (math-hard, math-500, bigcodebench), not math-hard only
      - 60 profiles × 60 sessions, not 200 profiles × 60 turns
      - User simulator: Llama-3.3-70B-Instruct (not 3.1)
      - GPU layout: agent on GPU 2, embed/reranker on GPU 3
      - Added a reward-model description
      - Fixed an incomplete sentence

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-11  Add revised reward modeling LaTeX section matching code implementation  (YurenHao0426)

    Key changes from the original:
      - Input: (q_t, a_t, q_{t+1}) only; removed A_t (not used in the judge prompt)
      - A single 7-label LLM classifier replaces the abstract C_reward/C_gate
      - Gating = classifier confidence (threshold tau_c = 0.6), not memory attribution
      - Explicitly describes Llama-3.1-8B-Instruct as the judge model

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
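The confidence-based gating this commit describes (a 7-label classifier whose maximum softmax probability must clear tau_c = 0.6) can be sketched as below. The logits and label indexing are illustrative; tau_c = 0.6 is the value from the commit message.

```python
import math

TAU_C = 0.6  # confidence threshold from the commit message

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def gate_update(logits):
    """Return (apply_update, label, confidence): the REINFORCE update
    fires only when the 7-label judge's top softmax probability is at
    least tau_c. Logit values here are purely illustrative."""
    probs = softmax(logits)
    conf = max(probs)
    label = probs.index(conf)
    return conf >= TAU_C, label, conf
```

This replaces memory attribution with a single scalar test: an unconfident judge (flat label distribution) simply skips the update rather than mis-assigning blame.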
2026-02-10  Add RAG rewrite, 60-session experiment scripts, and analysis tools  (YurenHao0426)

      - RAG rewrite adapter and vector-preference pipeline in personalized_llm
      - 60-session experiment queue scripts (reflection, rag, rag_vector, rag_rewrite)
      - Vector-preference correlation analysis and visualization scripts
      - Local reward-model batch-processing improvements
      - Updated CLAUDE.md with full experiment documentation and notes

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>