| field | value | timestamp |
|---|---|---|
| author | YurenHao0426 <blackhao0426@gmail.com> | 2026-02-11 02:29:27 +0000 |
| committer | YurenHao0426 <blackhao0426@gmail.com> | 2026-02-11 02:29:27 +0000 |
| commit | f23b25dda044046ef6d21ed9c2e28df6f54e04d6 (patch) | |
| tree | e065e31ae42d0fcfcd66c7628adffdf0391df805 /src/personalization/serving/api/routes/query.py | |
| parent | 8af96d046e69fe9463ce89f000f06916cc043b31 (diff) | |
Add revised reward modeling LaTeX section matching code implementation
Key changes from original:
- Input is (q_t, a_t, q_{t+1}) only; A_t is removed because it is not used in the judge prompt
- A single 7-label LLM classifier replaces the abstract C_reward/C_gate pair
- Gating is based on classifier confidence (threshold tau_c = 0.6), not on memory attribution
- Explicitly names Llama-3.1-8B-Instruct as the judge model
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
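The gating scheme the message describes (a single judge classifier whose label is accepted only when its confidence clears tau_c = 0.6) can be sketched as below. The label names, the `JudgeOutput` type, and the `gate()` helper are illustrative assumptions for this note, not the repository's actual API; only the 7-label count and the 0.6 threshold come from the commit message.

```python
# Sketch of confidence-gated reward labeling: a judge model emits one of
# 7 labels plus a confidence score, and the label is used only when the
# confidence meets the threshold tau_c (0.6 per the commit message).
from dataclasses import dataclass
from typing import Optional

TAU_C = 0.6  # confidence threshold from the commit message

# Hypothetical 7-label taxonomy (the real label set is not in the commit).
LABELS = [
    "strong_positive", "positive", "weak_positive", "neutral",
    "weak_negative", "negative", "strong_negative",
]

@dataclass
class JudgeOutput:
    label: str         # one of LABELS, as emitted by the judge model
    confidence: float  # judge's normalized confidence in [0, 1]

def gate(output: JudgeOutput, tau_c: float = TAU_C) -> Optional[str]:
    """Return the judge's label only if its confidence clears tau_c."""
    if output.label not in LABELS:
        raise ValueError(f"unknown label: {output.label}")
    return output.label if output.confidence >= tau_c else None

# A confident judgment passes the gate; an uncertain one is discarded.
print(gate(JudgeOutput("positive", 0.82)))  # label accepted
print(gate(JudgeOutput("neutral", 0.41)))   # gated out -> None
```

This keeps the gate as a pure function of the judge output, so low-confidence examples are simply dropped from the reward signal rather than attributed to memory.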
Diffstat (limited to 'src/personalization/serving/api/routes/query.py')
0 files changed, 0 insertions, 0 deletions
