|
Table 1 (main, K=4): 11 baselines + 5 UPH variants (d=8..128) with
state size, inference tokens, ROUGE-L±std, and METEOR±std on both tasks.
Table 2 (review_k): full K=4/8/16 ROUGE-L for all 15 methods.
Table 3 (topic_k): full K=4/8/16 ROUGE-L for all 15 methods.
d sweep is now folded into each table's UPH block, replacing the
separate small ablation table.
Rewrote §3.2 prose to reflect the flat K-scaling observed universally
on LongLaMP and the low-dimensional nature of the user prior.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
|
3-4 page paper with abstract, intro, experimental setup, results, and discussion.
Uses ACM sigconf template from uph-paper.zip. Reports all metrics as
mean±std (N=200 per setting).
Key results:
- Main table (K=4): UPH vs 10 baselines (ICL + PEFT)
- K ablation (K=4,8,16) and d ablation (d=8,16,32,64,128)
- Significance tests embedded in prose (paired t-test)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|