Key corrections:
- 3 datasets (math-hard, math-500, bigcodebench), not math-hard only
- 60 profiles × 60 sessions, not 200 profiles × 60 turns
- User simulator: Llama-3.3-70B-Instruct (not 3.1)
- GPU layout: agent on GPU 2, embed/reranker on GPU 3
- Added reward model description
- Fixed incomplete sentence
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Key changes from original:
- Input: (q_t, a_t, q_{t+1}) only, removed A_t (not used in judge prompt)
- Single 7-label LLM classifier replaces abstract C_reward/C_gate
- Gating = classifier confidence (threshold tau_c=0.6), not memory attribution
- Explicitly describes Llama-3.1-8B-Instruct as judge model
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
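A minimal sketch of the confidence-gated reward described in this commit, assuming the judge returns scores over the 7 labels for a turn triple (q_t, a_t, q_{t+1}). Only the 7-way label count and the threshold tau_c = 0.6 come from the log; the label names and the label-to-reward mapping are placeholders, not the actual taxonomy.

```python
import math

TAU_C = 0.6  # confidence gate threshold from the commit (tau_c = 0.6)

# Placeholder 7-label set and label->reward mapping; the real taxonomy
# and reward values are not given in the log.
LABELS = [f"label_{i}" for i in range(7)]
REWARD = {f"label_{i}": (i - 3) / 3.0 for i in range(7)}

def softmax(scores):
    """Convert raw judge scores over the 7 labels into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def gated_reward(label_scores):
    """Map judge scores to a reward, or None when the top-label
    confidence falls below tau_c (the turn is gated out)."""
    probs = softmax(label_scores)
    conf = max(probs)
    if conf < TAU_C:
        return None  # low-confidence turn contributes no reward signal
    label = LABELS[probs.index(conf)]
    return REWARD[label]
```

With uniform scores the top-label probability is 1/7 < 0.6, so the turn is gated out; a strongly peaked score vector passes the gate and yields its label's reward.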
- RAG rewrite adapter and vector preference pipeline in personalized_llm
- 60-session experiment queue scripts (reflection, rag, rag_vector, rag_rewrite)
- Vector-preference correlation analysis and visualization scripts
- Local reward model batch processing improvements
- Updated CLAUDE.md with full experiment documentation and notes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>