- E/T difference: 79% from slightly more enforcements, 20% from fewer turns
- Neither component is individually significant
- rag_vector achieves comparable results in fewer turns and with lower user effort
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Detect agent repetition bugs (7.1% rag_vector, 3.8% reflection)
- After cleanup: timeout rate significantly lower (p=0.046)
- User effort significantly lower (p=0.021)
- Report paired t-test and Wilcoxon signed-rank results with effect sizes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
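The tests named in this commit (paired t-test, Wilcoxon signed-rank, effect sizes) can be sketched as below. The data is synthetic and purely illustrative; variable names like `rag_vector` and `reflection` mirror the variant names in the log, not the actual experiment data:

```python
# Illustrative sketch: paired significance tests with an effect size,
# comparing a per-session metric (e.g. user effort) between two agent
# variants. All numbers here are synthetic, not experiment results.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rag_vector = rng.normal(2.0, 0.5, size=30)            # hypothetical scores
reflection = rag_vector + rng.normal(0.5, 0.4, size=30)

# Paired t-test (parametric) and Wilcoxon signed-rank (non-parametric)
t_stat, t_p = stats.ttest_rel(rag_vector, reflection)
w_stat, w_p = stats.wilcoxon(rag_vector, reflection)

# Cohen's d for paired samples: mean difference / SD of the differences
diff = rag_vector - reflection
cohens_d = diff.mean() / diff.std(ddof=1)

print(f"paired t: p={t_p:.4f}  wilcoxon: p={w_p:.4f}  d={cohens_d:.2f}")
```

Reporting both tests is a common hedge: the t-test assumes roughly normal paired differences, while Wilcoxon only assumes symmetry, so agreement between them strengthens the conclusion.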
- RAG rewrite adapter and vector preference pipeline in personalized_llm
- 60-session experiment queue scripts (reflection, rag, rag_vector, rag_rewrite)
- Vector-preference correlation analysis and visualization scripts
- Local reward model batch processing improvements
- Updated CLAUDE.md with full experiment documentation and notes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add Python wrappers for:
- Qwen3/Nemotron embedding models
- BGE/Qwen3 rerankers
- vLLM/Llama/Qwen LLM backends
- GPT-4o/LLM-based preference extractors
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add collaborativeagents subproject with adapters, agents, and evaluation modules
- Update .gitignore to exclude large binary files (.whl, .tar), wandb logs, and results
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>