personalization-user-model.git, branch main

Add PCA explained variance analysis for item-space dimension justification

2026-02-11T06:01:44+00:00

k=256 captures 99.9% of the variance of the 4096-dim Qwen3-8B embeddings;
reaching 99% of the variance at k=41 confirms the low intrinsic
dimensionality of the preference space.
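
Illustrative sketch (not code from this repository) of how such cut-offs
can be read off a PCA spectrum, assuming the per-component variances are
already computed. The helper name `smallest_k` and the toy spectrum are
hypothetical:

```python
from itertools import accumulate

def smallest_k(component_variances, threshold):
    """Return the smallest k whose top-k components reach `threshold` of total variance."""
    ordered = sorted(component_variances, reverse=True)
    total = sum(ordered)
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= threshold:
            return k
    return len(ordered)

# Toy geometric spectrum (hypothetical), not the Qwen3-8B one.
toy_spectrum = [0.5 ** i for i in range(20)]
k_99 = smallest_k(toy_spectrum, 0.99)
k_999 = smallest_k(toy_spectrum, 0.999)
```

Applied to the real spectrum, the same scan would be expected to yield the
k=41 and k=256 figures quoted above.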

Co-Authored-By: Claude Opus 4.6

Add query transformation, global preferences, and hyperparameter table

2026-02-11T03:33:47+00:00

Three additions to Method/Setup sections:
1. Query transformation: keyword-based task detection + multi-query
   dense retrieval to bridge semantic gap (Section 3.5)
2. Global vs conditional preferences: universal prefs bypass retrieval,
   always injected into prompt (Section 3.4)
3. Hyperparameter table with all key values (Section 4)
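
Illustrative sketch (not code from this repository) of items 1 and 2. The
keyword table, the wording of the query variants, and all function names
are assumptions; the dense retriever itself is left out:

```python
# Hypothetical keyword table; the real task types and keywords are not given in this log.
TASK_KEYWORDS = {
    "math": ("solve", "prove", "integral", "equation"),
    "code": ("implement", "function", "bug", "python"),
}

def detect_tasks(query):
    """Keyword-based task detection: return every task whose keywords appear in the query."""
    text = query.lower()
    return [task for task, words in TASK_KEYWORDS.items()
            if any(word in text for word in words)]

def expand_queries(query):
    """Multi-query expansion: the original query plus one variant per detected task."""
    variants = [query]
    for task in detect_tasks(query):
        variants.append(f"user preferences for {task} tasks")
    return variants

def assemble_preferences(global_prefs, retrieved_prefs):
    """Global preferences bypass retrieval and always come first; retrieved ones follow."""
    merged = list(global_prefs)
    for pref in retrieved_prefs:
        if pref not in merged:
            merged.append(pref)
    return merged
```

Each variant from `expand_queries` would be sent to the retriever separately
and the hits merged before `assemble_preferences` builds the prompt list.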

Co-Authored-By: Claude Opus 4.6

Fix z_long definition to match code (zero-init + REINFORCE, not mean)

2026-02-11T03:28:09+00:00

The paper incorrectly defined z_long as the mean of item vectors.
The code initializes z_long at zero and learns it purely via REINFORCE.
Also clarifies that z_short is reset at the start of each session.
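
Illustrative sketch (not code from this repository) of the corrected
behavior. The softmax policy over item vectors, the learning rate, and the
zero baseline are assumptions chosen to keep the example small; only the
zero initialization, the REINFORCE-style update, and the per-session reset
of z_short come from the description above:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(z, items, chosen, reward, baseline=0.0, lr=0.1):
    """One REINFORCE update of z for a softmax policy with scores <item, z>."""
    scores = [sum(zi * xi for zi, xi in zip(z, item)) for item in items]
    probs = softmax(scores)
    dim = len(z)
    # grad of log pi(chosen) w.r.t. z is item[chosen] minus the policy-weighted mean item
    expected = [sum(p * item[d] for p, item in zip(probs, items)) for d in range(dim)]
    grad_log_pi = [items[chosen][d] - expected[d] for d in range(dim)]
    advantage = reward - baseline
    return [zi + lr * advantage * g for zi, g in zip(z, grad_log_pi)]

dim = 2
items = [[1.0, 0.0], [0.0, 1.0]]   # toy item vectors (hypothetical)
z_long = [0.0] * dim               # zero-initialized, persists across sessions
for session in range(3):
    z_short = [0.0] * dim          # reset at the start of every session
    z_long = reinforce_step(z_long, items, chosen=0, reward=1.0)
    z_short = reinforce_step(z_short, items, chosen=0, reward=1.0)
```

After three sessions z_long has accumulated three updates while z_short only
reflects the latest session.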

Co-Authored-By: Claude Opus 4.6

Rewrite reward section to describe keyword heuristic (matches experiments)

2026-02-11T03:14:37+00:00

Replaced LLM-as-judge description with actual keyword-based system:
- Reward: sentiment keyword matching + topic coherence via embedding similarity
- Gating: separate retrieval-attribution heuristic using memory-query cosine
  similarity (g_t=0.9 retrieval fault, g_t=0.2 LLM fault, etc.)
- No additional model needed (fast, no GPU)
- REINFORCE update unchanged
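
Illustrative sketch (not code from this repository) of the sentiment and
gating parts. The keyword lists, `sim_threshold`, and the direction of the
attribution rule (low memory-query similarity read as a retrieval fault) are
assumptions; the gate for the remaining cases is not given here, so it is a
caller-supplied argument. The topic-coherence term is omitted because its
weighting is not specified:

```python
import math

# Hypothetical keyword lists; the real ones are not given in this log.
NEGATIVE = ("wrong", "not what i", "no,", "incorrect")
POSITIVE = ("thanks", "perfect", "great", "exactly")

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def keyword_sentiment(user_reply):
    """Return +1, 0, or -1 from counts of positive and negative keywords."""
    text = user_reply.lower()
    pos = sum(word in text for word in POSITIVE)
    neg = sum(word in text for word in NEGATIVE)
    return (pos > neg) - (neg > pos)

def gate(sentiment, memory_query_sim, sim_threshold, other_gate):
    """Attribute negative feedback to retrieval or to the LLM."""
    if sentiment < 0 and memory_query_sim < sim_threshold:
        return 0.9   # retrieved memory was off-topic: retrieval fault
    if sentiment < 0:
        return 0.2   # memory was relevant but the reply still failed: LLM fault
    return other_gate   # remaining cases are not specified in this log
```

`memory_query_sim` would be `cosine(memory_embedding, query_embedding)`, so
the whole path runs without an extra model or a GPU.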

Co-Authored-By: Claude Opus 4.6

Add revised abstract LaTeX

2026-02-11T03:09:48+00:00

Updated all claims to match actual experiments:
- 3 domains, 60 profiles × 60 sessions
- 55.2% success (not 71%), significance tests
- 16.9% user effort reduction (not "halving")
- Dual-vector separation (z_long p=0.006, z_short p=0.586)

Co-Authored-By: Claude Opus 4.6

Add revised introduction LaTeX section

2026-02-11T03:07:56+00:00

Key changes:
- Domain: math-only → 3 task domains (math-hard, math-500, bigcodebench)
- Scale: 5 profiles/40 pool → 60 profiles/200 pool, 60 sessions
- "correlates strongly" → significance-based claims (p=0.006, p=0.046)
- Contributions rewritten: efficiency gains + dual-vector separation
- Related work paragraphs unchanged (still accurate)

Co-Authored-By: Claude Opus 4.6

Add revised conclusion LaTeX section

2026-02-11T03:05:53+00:00

Updated all numbers to match current experiments:
- 55.2% success (not 71%), 60 profiles (not 5), 3 datasets
- Token reduction 16.9% (not "halving")
- Significance results (timeout p=0.046, effort p=0.021)
- Dual-vector separation (z_long p=0.006, z_short p=0.586)
- Updated future work (ablation underway, LLM judge ready)

Co-Authored-By: Claude Opus 4.6

Add preference format compliance paragraph to discussion

2026-02-11T03:03:53+00:00

Discusses how structured JSON preferences are harder for the 8B agent
to follow than Reflection's natural-language preferences. Notes that the
prompt template is biased toward Reflection. Reports the RAG+Rewrite
improvement (+0.8pp success, -1.4pp timeout), which closes ~50% of the
RAG-Reflection gap.

Co-Authored-By: Claude Opus 4.6

Add revised discussion & limitations LaTeX section

2026-02-11T02:59:00+00:00

Complete rewrite with current data (60 profiles × 60 sessions):
- Updated all numbers and removed stale references
- Removed duplicate paragraph
- Added: user vector role analysis (RAG 44.3% → RAG+Vec 26.4% timeout)
- Added: E/T decomposition (79% from enforcements, not negative)
- Added: discussion of why Vanilla performs well
- Updated: user-vector geometry (ρ=0.040, dual-vector separation)
- Updated: limitations (keyword reward, no GRPO, 60 profiles)
- Updated: future directions (ablation underway, LLM judge ready)

Co-Authored-By: Claude Opus 4.6

Add revised results LaTeX section with actual data

2026-02-11T02:51:36+00:00

Key changes:
- Fixed metadata (3 datasets, 60 profiles × 60 sessions)
- Removed false "three random seeds" claim
- Replaced all placeholder/TBD text with actual analysis
- Added significance tests table (paired t-test, p-values)
- Added E/T decomposition analysis
- Filled in user-vector representation analysis with actual data
  (Spearman rho, quartile tests, dual-vector separation)
- Added bug cleaning disclosure (repetition bug)
- Refined failure modes section
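
Illustrative sketch (not code from this repository) of the two statistics
named above, written with the standard library only. It computes the paired
t statistic and Spearman's rho but not their p-values, which would normally
come from a statistics package such as `scipy.stats.ttest_rel` and
`scipy.stats.spearmanr`. All function names here are hypothetical:

```python
import math
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In this setting the paired samples would be per-profile metrics under two
methods, and rho would relate a user-vector quantity to an outcome metric.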

Co-Authored-By: Claude Opus 4.6