Diffstat (limited to 'notes.md')
| -rw-r--r-- | notes.md | 18 |
1 files changed, 12 insertions, 6 deletions
@@ -432,12 +432,18 @@ Rewrite draft: "(3, π/2)"
 | Repetition bug | 254 (7.1%) | 138 (3.8%) |
 | JSON leak | ~0 | ~0 |

-### Post-cleanup metrics comparison
-
-| Method | Success | Timeout | E/T | User Tokens |
-|--------|---------|---------|-----|-------------|
-| reflection (cleaned) | 54.4% | 28.8% | 0.175 | 207.5 |
-| **rag_vector (cleaned)** | **55.2%** | **26.4%** | 0.186 | **193.6** |
+### Post-cleanup metrics comparison (all methods)
+
+| Method | Success | Timeout | E/T | User Tokens | Source |
+|--------|---------|---------|-----|-------------|--------|
+| **rag_vector** | **55.2%** | **26.4%** | 0.186 | **193.6** | Actual |
+| vanilla | ~54.3% | ~29.2% | ~0.177 | ~232.9 | Estimated |
+| reflection | 54.4% | 28.8% | 0.175 | 207.5 | Actual |
+| contextual | ~52.4% | ~31.4% | ~0.191 | ~213.7 | Estimated |
+| all_memory | ~50.9% | ~33.4% | ~0.175 | ~226.8 | Estimated |
+
+Estimation method: each method's ratio to reflection in fullscale_200p60s × the actual 60s reflection value.
+The vanilla/contextual/all_memory code has no version differences, so the ratios are stable (in fullscale, vanilla/reflection = 0.999).

 ### Significance tests (paired t-test, one-sided, N=60 profiles)

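The estimation rule stated in the added lines (a method's fullscale ratio to reflection, multiplied by the actual 60s reflection value) can be sketched as below. This is only an illustration, not code from the notes: the function and constant names are hypothetical, and it assumes the quoted vanilla/reflection ratio of 0.999 refers to the Success metric. The ratios for the other methods and metrics are not given in the diff, so they are not reproduced here.

```python
def estimate_from_ratio(ratio_to_reflection: float, reflection_actual: float) -> float:
    """Scale the actual reflection value by a method's fullscale ratio to reflection."""
    return ratio_to_reflection * reflection_actual


# Values taken from the diff; the names are illustrative only.
VANILLA_TO_REFLECTION_SUCCESS = 0.999  # fullscale vanilla/reflection (assumed to be the Success ratio)
REFLECTION_SUCCESS_60S = 54.4          # actual reflection Success (%) in the 60s run

vanilla_success_est = estimate_from_ratio(
    VANILLA_TO_REFLECTION_SUCCESS, REFLECTION_SUCCESS_60S
)
print(f"vanilla success (estimated): ~{vanilla_success_est:.1f}%")
```

Under that assumption the result rounds to the ~54.3% shown for vanilla in the table.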
