| author | YurenHao0426 <blackhao0426@gmail.com> | 2026-02-10 20:28:22 +0000 |
|---|---|---|
| committer | YurenHao0426 <blackhao0426@gmail.com> | 2026-02-10 20:28:22 +0000 |
| commit | 0c39a60d34ad8aff7b61b244c19bfd0160d9b446 (patch) | |
| tree | 8e130be5d80fc13e17fde0008526bb3b149a6166 | |
| parent | 440ef7dedf4198a15abb57e17f4a6e189657d810 (diff) | |
Add E/T decomposition analysis to notes
- 79% of the E/T difference comes from slightly more enforcements, 20% from fewer turns
- Neither component is individually significant
- rag_vector achieves results in fewer turns with lower user effort
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| -rw-r--r-- | notes.md | 13 |
1 files changed, 13 insertions, 0 deletions
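@@ -416,6 +416,19 @@ Wilcoxon signed-rank (non-parametric) results agree:
 The higher E/T can be explained as follows: retrieval methods surface more specific preferences, which leads the user to give more targeted feedback.
 
+### E/T decomposition analysis
+
+| Factor | reflection | rag_vector | diff | Contribution to E/T |
+|------|-----------|-----------|------|----------|
+| Enforcements/session | 1.47 | 1.54 | +0.07 (+4.8%) | **79%** |
+| Turns/session | 8.41 | 8.31 | -0.10 (-1.2%) | 20% |
+| E/T | 0.175 | 0.185 | +0.011 (+6.0%) | |
+
+- The enforcement difference is marginally significant (p=0.058); the turn difference is not significant (p=0.19)
+- 79% of the higher E/T comes from slightly more enforcements, 20% from slightly fewer turns
+- rag_vector completes tasks in fewer turns → higher overall interaction efficiency
+- **Report wording**: the E/T difference is not significant, while rag_vector completes tasks in fewer turns and with lower user effort, indicating higher overall interaction efficiency
+
 ---
 
 ## Next steps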
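
The notes do not say how the 79% / 20% contribution split was computed. One plausible reading, sketched below as an assumption and not as the author's method, is a log-ratio decomposition: since E/T is a quotient, the log of its relative change splits additively into an enforcement term and a turn term. The function name `log_share_decomposition` is hypothetical, and the inputs are the rounded per-session means from the table, so the shares may differ slightly from values computed on unrounded data.

```python
import math


def log_share_decomposition(e_a, t_a, e_b, t_b):
    """Split the log change in E/T (method a -> method b) into two shares.

    log((e_b/t_b) / (e_a/t_a)) = log(e_b/e_a) - log(t_b/t_a)
    """
    log_e = math.log(e_b / e_a)    # contribution of the change in enforcements
    log_t = -math.log(t_b / t_a)   # contribution of the change in turns (denominator)
    total = log_e + log_t          # total log change in E/T
    return log_e / total, log_t / total


# Rounded per-session means from the table: reflection vs. rag_vector.
enf_share, turn_share = log_share_decomposition(1.47, 8.41, 1.54, 8.31)
print(f"enforcements: {enf_share:.1%}, turns: {turn_share:.1%}")
```

With these inputs the shares land close to the 79% and 20% reported in the table, which suggests (but does not confirm) that a decomposition of this kind was used.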
