- z_long correlates with long-term preferences (p=0.006); z_short does not (p=0.586)
- Supports the dual-vector design: z_long = stable identity, z_short = transient context
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
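The correlation check above can be sketched as follows. The sketch assumes each user is reduced to one scalar z_long score (hypothetical, for example a projection of the vector) paired with one scalar long-term preference measure. The commit does not say which test produced p=0.006. This sketch uses a two-sided permutation test on Pearson's r, and the function names are illustrative only.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for H0: x and y are uncorrelated.

    y is shuffled to break any pairing with x; the p-value is the share of
    shuffles whose |r| is at least the observed |r| (with a +1 correction
    so the estimate is never exactly zero).
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson_r(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

A permutation test makes no normality assumption about per-user scores. With the same data, `scipy.stats.pearsonr` would give the parametric p-value instead.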
- Replace old 30-session data (r=0.09, not reproducible) with 60-session results
- z_long: bottom vs. top 25% comparison, p=0.021 (significant at 0.05)
- z_long captures long-term preference trends; z_short shows no signal
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
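Here is a minimal sketch of a bottom-vs-top-25% comparison. It assumes hypothetical per-user `scores` (derived from z_long) and `outcomes` (a long-term preference measure). Users are ranked by score, the outer quartiles are taken, and the difference in mean outcome is tested with a permutation test. The original analysis may have used a different test, such as Mann-Whitney U or Welch's t.

```python
import random


def quartile_groups(scores, outcomes):
    """Return (bottom, top): outcomes of the lowest- and highest-scoring 25%."""
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    k = len(scores) // 4
    if k == 0:
        raise ValueError("need at least 4 users to form quartiles")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bottom = [outcomes[i] for i in order[:k]]
    top = [outcomes[i] for i in order[-k:]]
    return bottom, top


def mean(xs):
    return sum(xs) / len(xs)


def diff_in_means_p_value(a, b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for H0: groups a and b share one mean."""
    rng = random.Random(seed)
    observed = abs(mean(b) - mean(a))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(mean(pb) - mean(pa)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Comparing only the outer quartiles trades sample size for contrast. It can reveal a trend that a correlation over all users dilutes, at the cost of discarding the middle 50%.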
- Efficiency: +8.4% success/token vs reflection
- Late-session performance: 54.1% vs 51.8%
- Head-to-head, quick resolution, zero-enforcement, profile improvement stats
- Summary of the overall story for the comprehensive report
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
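Success per token can be computed as in this sketch. It assumes each session records a success flag and its token count; the `SessionResult` type and both function names are hypothetical. `relative_gain` returns a fraction, so 0.084 would correspond to the +8.4% figure.

```python
from dataclasses import dataclass


@dataclass
class SessionResult:
    success: bool  # whether the session reached a successful outcome
    tokens: int    # total tokens consumed in the session


def success_per_token(sessions):
    """Successful sessions divided by total tokens across all sessions."""
    total_tokens = sum(s.tokens for s in sessions)
    return sum(s.success for s in sessions) / total_tokens


def relative_gain(method, baseline):
    """Relative improvement of method over baseline (0.084 means +8.4%)."""
    return method / baseline - 1.0
```

This uses a ratio of totals. Averaging per-session ratios instead would weight short sessions more heavily.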
- E/T (enforcements per turn) difference: 79% due to slightly more enforcements, 20% due to fewer turns
- Neither component individually significant
- rag_vector achieves results in fewer turns with lower user effort
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
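One way to attribute an E/T difference to its two components is a log decomposition. Since log(E/T) = log E - log T, the change in log E and the negated change in log T are additive shares that sum to 100%. This sketch assumes aggregate enforcement and turn counts per method. The commit's 79%/20% split may come from a different scheme (it sums to 99%), and the function name is hypothetical.

```python
import math


def et_decomposition(e_method, t_method, e_base, t_base):
    """Split the log-difference in E/T between two methods into
    (enforcement share, turn share); the two shares sum to 1."""
    d_log_e = math.log(e_method) - math.log(e_base)
    d_log_t = math.log(t_method) - math.log(t_base)
    d_log_ratio = d_log_e - d_log_t  # difference in log(E/T)
    if d_log_ratio == 0:
        raise ValueError("E/T is identical for both methods; nothing to decompose")
    # More enforcements (d_log_e > 0) and fewer turns (d_log_t < 0) both
    # raise E/T, so both shares come out positive in that case.
    return d_log_e / d_log_ratio, -d_log_t / d_log_ratio
```

Working in logs makes the split exact and order-independent. A linear split of E/T itself would leave an interaction term.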
- Detect agent repetition bugs (7.1% rag_vector, 3.8% reflection)
- After cleanup: timeout rate significantly lower (p=0.046)
- User effort significantly lower (p=0.021)
- Paired t-test and Wilcoxon results with effect sizes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
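The paired statistics can be sketched with the standard library. The sketch assumes `a` and `b` hold one metric per matched unit, for example the same user profile under reflection and under rag_vector. It computes the paired t statistic, Cohen's d_z (mean difference over the SD of differences), and the Wilcoxon signed-rank W+ with average ranks for ties. It does not compute p-values, and the function names are illustrative.

```python
import math
import statistics


def paired_t_and_dz(a, b):
    """Paired t statistic and Cohen's d_z for per-pair differences b - a."""
    diffs = [y - x for x, y in zip(a, b)]
    mean_d = statistics.fmean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_d / (sd_d / math.sqrt(len(diffs)))
    dz = mean_d / sd_d
    return t, dz


def wilcoxon_w_plus(a, b):
    """Wilcoxon signed-rank W+: sum of ranks of positive differences b - a.

    Zero differences are dropped; tied |d| values receive their average rank.
    """
    diffs = [y - x for x, y in zip(a, b) if y != x]
    ordered = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of equal |d| values starting at position i.
        while j + 1 < len(ordered) and abs(diffs[ordered[j + 1]]) == abs(diffs[ordered[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg_rank
        i = j + 1
    return sum(r for r, d in zip(ranks, diffs) if d > 0)
```

For p-values, `scipy.stats.ttest_rel(a, b)` and `scipy.stats.wilcoxon(a, b)` run the corresponding tests. SciPy takes differences as `a - b`, so the sign of t is flipped relative to this sketch.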
- RAG rewrite adapter and vector preference pipeline in personalized_llm
- 60-session experiment queue scripts (reflection, rag, rag_vector, rag_rewrite)
- Vector-preference correlation analysis and visualization scripts
- Local reward model batch processing improvements
- Update CLAUDE.md with full experiment documentation and notes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>