path: root/docs/method_additions.md
# Method Additions: Query Transformation, Global Preferences, Hyperparameters

Three additions: two to the Method section (Section 3) and one to the experimental setup (Section 4).

---

## 1. Query Transformation (add to Section 3.5, after "Personalized Retrieval" paragraph)

```latex
\paragraph{Query transformation.}
A practical challenge for dense retrieval is the semantic gap
between task-oriented user queries (e.g., ``solve this
integral'') and preference descriptions (e.g., ``when math
problems, show step-by-step work'').
To bridge this gap, we apply a lightweight keyword-based
query transformation before dense retrieval.

Given a user query $q_t$, we detect the task type (math,
coding, writing, or explanation) by matching against curated
keyword lists.
If a task type is detected, we construct a supplementary query
\[
  q'_t = \texttt{"user preferences for \{task\_type\} tasks: "} \| \; q_t
\]
and perform multi-query dense retrieval: both $q_t$ and $q'_t$
are embedded, and for each memory card we take the
\emph{maximum} cosine similarity across the two query
embeddings.
The top-$k$ candidates by this max-similarity are then passed
to the reranker, which still uses only the original query
$q_t$.
This simple transformation improves recall of task-relevant
preferences without introducing an additional LLM call.
```

---

## 2. Global vs Conditional Preferences (add to Section 3.4, after "Memory cards" paragraph)

```latex
\paragraph{Global vs.\ conditional preferences.}
Not all preferences require retrieval.
Some preferences are universally applicable regardless of
task context (e.g., ``always respond in Chinese'',
``use numbered lists''), while others are
conditional on the task type (e.g., ``when coding, include
type hints'').
At extraction time, we classify each preference as
\emph{global} or \emph{conditional} based on its condition
field:
a preference is classified as global if its condition
contains universal indicators (e.g., ``general'', ``always'',
``any task'') or consists of fewer than three words with no
domain-specific terms (e.g., ``math'', ``code'').

Global preferences bypass the retrieval pipeline entirely
and are always injected into the agent prompt (up to a cap
of $10$), ensuring that universally applicable preferences
are never missed due to retrieval failure.
Only conditional preferences enter the dense retrieval and
reranking pipeline described above.
This two-tier design reduces the retrieval burden and
guarantees that high-frequency, always-applicable preferences
are consistently applied.
```
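A minimal sketch of the classification rule, assuming illustrative indicator and domain-term lists (the actual lists are not given in the text): a preference is global if its condition contains a universal indicator, or has fewer than three words and no domain-specific term.

```python
# Hypothetical indicator/domain-term sets; a sketch of the rule, not the exact implementation.
UNIVERSAL_INDICATORS = {"general", "always", "any task"}
DOMAIN_TERMS = {"math", "code", "coding", "writing", "explanation"}

def is_global(condition: str) -> bool:
    """Classify a preference as global (True) or conditional (False) from its condition field."""
    cond = condition.lower().strip()
    # Rule 1: the condition contains a universal indicator.
    if any(ind in cond for ind in UNIVERSAL_INDICATORS):
        return True
    # Rule 2: fewer than three words and no domain-specific terms.
    words = cond.split()
    if len(words) < 3 and not any(term in cond for term in DOMAIN_TERMS):
        return True
    return False
```

Preferences with `is_global(...) == True` would skip retrieval and be injected directly into the prompt (up to the cap of 10); the rest enter the dense retrieval and reranking pipeline.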

---

## 3. Hyperparameter Table (add to Section 4, after Models subsection or as a new subsection)

```latex
\subsection{Hyperparameters}
\label{sec:setup-hyperparams}

Table~\ref{tab:hyperparams} lists the key hyperparameters
used in all experiments.
These values are set heuristically and held fixed across all
methods and profiles.

\begin{table}[t]
  \centering
  \small
  \caption{Hyperparameters used in all experiments.}
  \label{tab:hyperparams}
  \begin{tabular}{llc}
    \toprule
    Component & Parameter & Value \\
    \midrule
    \multirow{4}{*}{User vector}
    & Item-space dimension $k$ & 256 \\
    & Long-term weight $\beta_L$ & 2.0 \\
    & Short-term weight $\beta_S$ & 5.0 \\
    & Softmax temperature $\tau$ & 1.0 \\
    \midrule
    \multirow{4}{*}{REINFORCE}
    & Long-term learning rate $\eta_L$ & 0.01 \\
    & Short-term learning rate $\eta_S$ & 0.05 \\
    & Short-term decay $\lambda$ & 0.1 \\
    & Baseline EMA coefficient $\alpha$ & 0.05 \\
    \midrule
    \multirow{2}{*}{Retrieval}
    & Dense retrieval top-$k$ & 64 \\
    & Reranker top-$k$ & 5 \\
    \midrule
    \multirow{2}{*}{Global prefs}
    & Max global notes in prompt & 10 \\
    & Max condition words (global) & 2 \\
    \midrule
    \multirow{2}{*}{Embedding}
    & Embedding dimension $d$ & 4096 \\
    & PCA components $k$ & 256 \\
    \midrule
    \multirow{3}{*}{Interaction}
    & Sessions per profile & 60 \\
    & Max turns per session & 10 \\
    & Max generation tokens & 512 \\
    \bottomrule
  \end{tabular}
\end{table}
```
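For readers wiring these values into code, the table can be collected into a single frozen config object. The values below are taken directly from the table; the field names and grouping are illustrative, not from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hyperparams:
    """Hyperparameters from the table above; field names are illustrative."""
    # User vector
    item_space_dim: int = 256          # item-space dimension k
    beta_long: float = 2.0             # long-term weight
    beta_short: float = 5.0            # short-term weight
    softmax_temperature: float = 1.0   # tau
    # REINFORCE
    lr_long: float = 0.01              # eta_L
    lr_short: float = 0.05             # eta_S
    short_term_decay: float = 0.1      # lambda
    baseline_ema: float = 0.05         # alpha
    # Retrieval
    dense_top_k: int = 64
    rerank_top_k: int = 5
    # Global preferences
    max_global_notes: int = 10
    max_global_condition_words: int = 2
    # Embedding
    embedding_dim: int = 4096          # d
    pca_components: int = 256          # k
    # Interaction
    sessions_per_profile: int = 60
    max_turns_per_session: int = 10
    max_generation_tokens: int = 512
```

Freezing the dataclass matches the stated protocol: values are set heuristically once and held fixed across all methods and profiles.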