| author | YurenHao0426 <blackhao0426@gmail.com> | 2026-02-11 03:28:09 +0000 |
|---|---|---|
| committer | YurenHao0426 <blackhao0426@gmail.com> | 2026-02-11 03:28:09 +0000 |
| commit | dcc20b1f77702e5b45e2e6c08b0f243124c4676e (patch) | |
| tree | 28a2a2c7d98202f4e93de0cdbc7412c38c9fec65 | |
| parent | 6a917d3eda85e5725c2d5ad3bf5ec9bd30262198 (diff) | |
Fix z_long definition to match code (zero-init + REINFORCE, not mean)
Paper incorrectly defined z_long as mean of item vectors.
Code initializes z_long at zero and learns purely via REINFORCE.
Also clarifies z_short reset-per-session behavior.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| -rw-r--r-- | docs/user_vector_section_fix.md | 74 |
1 files changed, 74 insertions, 0 deletions
diff --git a/docs/user_vector_section_fix.md b/docs/user_vector_section_fix.md
new file mode 100644
index 0000000..242ff6b
--- /dev/null
+++ b/docs/user_vector_section_fix.md
@@ -0,0 +1,74 @@
+# Fix: Section 3.5 User Vector Definition
+
+## Problem
+
+Paper defines $z^{(L)}_u$ as the mean of item vectors, but code initializes it at zero and learns purely via REINFORCE.
+
+## Change
+
+Replace the $z^{(L)}$ definition paragraph. Everything else in Section 3.5 stays the same.
+
+---
+
+## Original (replace this)
+
+```latex
+\paragraph{Long-term and short-term user vectors.}
+For each user $u$, let $H(u)$ be the set of memory cards
+associated with $u$.
+We define the long-term user vector as the mean of their item
+vectors:
+\[
+  z^{(L)}_u = \frac{1}{|H(u)|} \sum_{m \in H(u)} v_m \in \mathbb{R}^k.
+\]
+This vector captures stable preferences across sessions, such as
+preferred language or typical level of detail.
+
+We also maintain a short-term vector $z^{(S)}_{u,t} \in \mathbb{R}^k$
+that captures session-specific drift and recency effects.
+At the start of a session, $z^{(S)}_{u,0}$ is initialized to $0$
+and updated online from feedback (Section~\ref{sec:method-rl}).
+The effective user vector at time $t$ is a convex combination
+\[
+  z^{\text{eff}}_{u,t} = \beta_L z^{(L)}_u + \beta_S z^{(S)}_{u,t},
+\]
+where $\beta_L, \beta_S \ge 0$ control the relative weight of
+long- and short-term preferences.
+In our style instantiation, $z^{(L)}_u$ encodes long-run style
+tendencies (e.g., always short/Chinese), while $z^{(S)}_{u,t}$
+can adapt to transient changes within a session.
+```
+
+## Replacement
+
+```latex
+\paragraph{Long-term and short-term user vectors.}
+For each user $u$, we maintain two learned vectors in the
+item space $\mathbb{R}^k$.
+
+The \emph{long-term vector} $z^{(L)}_u \in \mathbb{R}^k$ is
+initialized to zero and updated across sessions via the
+REINFORCE rule in Section~\ref{sec:method-rl}.
+Because it is never reset, $z^{(L)}_u$ accumulates gradient
+information from all past interactions and captures stable
+preferences such as preferred language or typical level of
+detail.
+
+The \emph{short-term vector} $z^{(S)}_{u,t} \in \mathbb{R}^k$
+captures session-specific context and recency effects.
+It is reset to zero at the start of each session and updated
+within the session from turn-level feedback, with an
+exponential decay that down-weights older signals
+(Section~\ref{sec:method-rl}).
+
+The effective user vector at time $t$ is a weighted combination
+\[
+  z^{\text{eff}}_{u,t}
+  = \beta_L\, z^{(L)}_u + \beta_S\, z^{(S)}_{u,t},
+\]
+where $\beta_L, \beta_S \ge 0$ control the relative influence
+of cross-session and within-session preferences.
+Both vectors are learned entirely from interaction feedback;
+no preference labels, pre-computed centroids, or explicit user
+features are required.
+```
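
For readers checking the replacement text against the behavior it describes, the following is a minimal Python sketch of the two-vector bookkeeping: zero initialization, per-session reset of the short-term vector, and the weighted combination. It is not the repository's code. The class name `UserVectors`, the method names, the learning rates, and the specific decay form are all assumptions made for illustration, and the gradient passed to `update` stands in for whatever REINFORCE estimate the referenced section actually computes.

```python
import numpy as np


class UserVectors:
    """Hypothetical container for z_long and z_short (names are illustrative)."""

    def __init__(self, k, beta_long=1.0, beta_short=1.0,
                 lr_long=0.01, lr_short=0.1, decay=0.9):
        # Both vectors start at zero; z_long is NOT a mean of item vectors.
        self.z_long = np.zeros(k)
        self.z_short = np.zeros(k)
        self.beta_long = beta_long
        self.beta_short = beta_short
        self.lr_long = lr_long      # assumed step size for the long-term vector
        self.lr_short = lr_short    # assumed step size for the short-term vector
        self.decay = decay          # assumed exponential decay factor in [0, 1]

    def start_session(self):
        # z_short is reset at every session start; z_long is never reset.
        self.z_short = np.zeros_like(self.z_long)

    def effective(self):
        # z_eff = beta_L * z_long + beta_S * z_short
        return self.beta_long * self.z_long + self.beta_short * self.z_short

    def update(self, grad):
        # `grad` is a placeholder for a REINFORCE-style gradient estimate
        # computed elsewhere; its exact form is not given in this commit.
        grad = np.asarray(grad, dtype=float)
        self.z_long = self.z_long + self.lr_long * grad
        # Assumed decay form: older within-session signal is down-weighted.
        self.z_short = self.decay * self.z_short + self.lr_short * grad
```

A session would call `start_session()` once, then alternate `effective()` (to score items) and `update(grad)` (after feedback). Only `z_long` carries information into the next session.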
