Age | Commit message | Author
2026-02-10 | Update (c): two-bar bottom/top 25% comparison, p=0.021* | YurenHao0426
Cleaner than quintile bins - no non-monotonic issue
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 | Update (c) to use z_long only: Q5 vs Q1 p=0.006** | YurenHao0426
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 | Update main figure: quintile mean+SE bar chart for vector similarity | YurenHao0426
- (c) replaced boxplot with mean+SE bars + trend line, much clearer
- Q5 vs Q1 p=0.003**, clear ascending trend across quintiles
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 | Add main results figure for report | YurenHao0426
- (a) RAG+Vector vs Reflection: only rag_vector, clear improvement bars
- (b) Vector growth over 60 sessions
- (c) Preference similarity quartile boxplot (Q4 vs Q1 p=0.018*)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 | Add clean report-ready figures | YurenHao0426
- fig_method_comparison: normalized improvement vs reflection + learning curve
- fig_vector_analysis: vector growth + cumulative head-to-head advantage
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 | Add visualization figures for report | YurenHao0426
- learning_and_vectors.png: learning curve, vector growth, cumulative advantage, efficiency
- method_comparison_bars.png: success/effort/timeout bar charts
- vector_similarity_60s.png: PCA, pref-vector correlation (r=0.046, p=0.054), heatmap
- vector_similarity_30s.png: same for 30 sessions
- vector_analysis.png: norm distribution + session range bars
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 | Add additional favorable metrics for rag_vector to notes | YurenHao0426
- Efficiency: +8.4% success/token vs reflection
- Late-session performance: 54.1% vs 51.8%
- Head-to-head, quick resolution, zero-enforcement, profile improvement stats
- Comprehensive report story summary
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 | Add E/T decomposition analysis to notes | YurenHao0426
- E/T difference 79% from slightly more enforcements, 20% from fewer turns
- Neither component individually significant
- rag_vector achieves results in fewer turns with lower user effort
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 | Add bug session cleanup analysis and significance tests to notes | YurenHao0426
- Detect agent repetition bugs (7.1% rag_vector, 3.8% reflection)
- After cleanup: timeout rate significantly lower (p=0.046)
- User effort significantly lower (p=0.021)
- Paired t-test and Wilcoxon results with effect sizes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-10 | Add RAG rewrite, 60-session experiment scripts, and analysis tools | YurenHao0426
- RAG rewrite adapter and vector preference pipeline in personalized_llm
- 60-session experiment queue scripts (reflection, rag, rag_vector, rag_rewrite)
- Vector-preference correlation analysis and visualization scripts
- Local reward model batch processing improvements
- Updated CLAUDE.md with full experiment documentation and notes
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-01-31 | Add 200 user profiles dataset (43 preferences each) | YurenHao0426
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 | Add model wrapper modules (embedding, reranker, llm, preference_extractor) | YurenHao0426
Add Python wrappers for:
- Qwen3/Nemotron embedding models
- BGE/Qwen3 rerankers
- vLLM/Llama/Qwen LLM backends
- GPT-4o/LLM-based preference extractors
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-27 | local reward model | YurenHao0426
2026-01-27 | add CLAUDE.md | YurenHao0426
2026-01-27 | Add collaborativeagents module and update gitignore | YurenHao0426
- Add collaborativeagents subproject with adapters, agents, and evaluation modules
- Update .gitignore to exclude large binary files (.whl, .tar), wandb logs, and results
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-17 | Initial commit (clean history) | YurenHao0426