| Age | Commit message | Author | |
|---|---|---|---|
| 2026-02-11 | Add revised reward modeling LaTeX section matching code implementation | YurenHao0426 | |
| Key changes from original:<br>- Input: (q_t, a_t, q_{t+1}) only, removed A_t (not used in judge prompt)<br>- Single 7-label LLM classifier replaces abstract C_reward/C_gate<br>- Gating = classifier confidence (threshold tau_c=0.6), not memory attribution<br>- Explicitly describes Llama-3.1-8B-Instruct as judge model<br><br>Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> | |||

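The confidence gating described in the commit message can be illustrated with a minimal sketch. Only the threshold `tau_c = 0.6` comes from the commit message. The function name `gated_reward`, its signature, the use of `None` for a gated-out sample, and the choice to treat a confidence exactly equal to the threshold as passing are all assumptions made for illustration, not the repository's actual code.

```python
from typing import Optional

# Threshold stated in the commit message (tau_c = 0.6).
TAU_C = 0.6


def gated_reward(
    label_reward: float,
    confidence: float,
    tau_c: float = TAU_C,
) -> Optional[float]:
    """Hypothetical gate: keep the reward only if the classifier is confident.

    label_reward: reward value associated with the predicted label
                  (the label-to-reward mapping is not given in the commit).
    confidence:   the classifier's confidence in its predicted label, in [0, 1].
    Returns the reward when confidence >= tau_c, otherwise None.
    The >= comparison at the boundary is an assumption.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence < tau_c:
        return None  # gated out: no reward signal for this turn
    return label_reward
```

Under these assumptions, a low-confidence judgment yields no reward signal at all rather than a down-weighted one, which is one plausible reading of "Gating = classifier confidence".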