# Fix: Section 3.5 User Vector Definition

## Problem

Paper defines $z^{(L)}_u$ as the mean of item vectors, but code initializes it at zero and learns purely via REINFORCE.

## Change

Replace the $z^{(L)}$ definition paragraph. Everything else in Section 3.5 stays the same.

---

## Original (replace this)

```latex
\paragraph{Long-term and short-term user vectors.}
For each user $u$, let $H(u)$ be the set of memory cards associated with $u$.
We define the long-term user vector as the mean of their item vectors:
\[
  z^{(L)}_u = \frac{1}{|H(u)|} \sum_{m \in H(u)} v_m \in \mathbb{R}^k.
\]
This vector captures stable preferences across sessions, such as preferred language or typical level of detail.

We also maintain a short-term vector $z^{(S)}_{u,t} \in \mathbb{R}^k$ that captures session-specific drift and recency effects.
At the start of a session, $z^{(S)}_{u,0}$ is initialized to $0$ and updated online from feedback (Section~\ref{sec:method-rl}).

The effective user vector at time $t$ is a convex combination
\[
  z^{\text{eff}}_{u,t} = \beta_L z^{(L)}_u + \beta_S z^{(S)}_{u,t},
\]
where $\beta_L, \beta_S \ge 0$ control the relative weight of long- and short-term preferences.
In our style instantiation, $z^{(L)}_u$ encodes long-run style tendencies (e.g., always short/Chinese), while $z^{(S)}_{u,t}$ can adapt to transient changes within a session.
```

## Replacement

```latex
\paragraph{Long-term and short-term user vectors.}
For each user $u$, we maintain two learned vectors in the item space $\mathbb{R}^k$.

The \emph{long-term vector} $z^{(L)}_u \in \mathbb{R}^k$ is initialized to zero and updated across sessions via the REINFORCE rule in Section~\ref{sec:method-rl}.
Because it is never reset, $z^{(L)}_u$ accumulates gradient information from all past interactions and captures stable preferences such as preferred language or typical level of detail.

The \emph{short-term vector} $z^{(S)}_{u,t} \in \mathbb{R}^k$ captures session-specific context and recency effects.
It is reset to zero at the start of each session and updated within the session from turn-level feedback, with an exponential decay that down-weights older signals (Section~\ref{sec:method-rl}).

The effective user vector at time $t$ is a weighted combination
\[
  z^{\text{eff}}_{u,t} = \beta_L\, z^{(L)}_u + \beta_S\, z^{(S)}_{u,t},
\]
where $\beta_L, \beta_S \ge 0$ control the relative influence of cross-session and within-session preferences.
Both vectors are learned entirely from interaction feedback; no preference labels, pre-computed centroids, or explicit user features are required.
```
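
## Sketch of the described behavior

To sanity-check the replacement wording against the code, the vector lifecycle can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the class `UserState`, its method names, the learning rates, and the decay factor are all hypothetical, and `grad` stands in for whatever update direction the REINFORCE rule in the referenced section produces (that rule is not reproduced here).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UserState:
    """Hypothetical container for one user's long- and short-term vectors in R^k."""
    k: int
    beta_L: float = 1.0  # weight of the long-term vector (assumed value)
    beta_S: float = 1.0  # weight of the short-term vector (assumed value)
    z_long: List[float] = field(init=False, default_factory=list)
    z_short: List[float] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        # Both vectors start at zero; nothing is derived from item vectors.
        self.z_long = [0.0] * self.k
        self.z_short = [0.0] * self.k

    def start_session(self) -> None:
        # Only the short-term vector is reset; the long-term vector persists.
        self.z_short = [0.0] * self.k

    def apply_feedback(self, grad: List[float], lr_long: float = 0.01,
                       lr_short: float = 0.1, decay: float = 0.9) -> None:
        # `grad` is a placeholder for the REINFORCE update direction.
        # The long-term vector accumulates updates without decay (assumed form).
        self.z_long = [zl + lr_long * g for zl, g in zip(self.z_long, grad)]
        # The short-term vector decays older signal before adding the new one
        # (assumed form of the exponential decay).
        self.z_short = [decay * zs + lr_short * g
                        for zs, g in zip(self.z_short, grad)]

    def effective(self) -> List[float]:
        # z_eff = beta_L * z_long + beta_S * z_short
        return [self.beta_L * zl + self.beta_S * zs
                for zl, zs in zip(self.z_long, self.z_short)]
```

Under these assumptions, a new user has an all-zero effective vector, feedback moves both vectors, and calling `start_session()` clears only the short-term contribution, which matches the three claims the replacement paragraph makes.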