personalization-user-model.git

author	YurenHao0426 <blackhao0426@gmail.com>	2026-02-11 02:29:27 +0000
committer	YurenHao0426 <blackhao0426@gmail.com>	2026-02-11 02:29:27 +0000
commit	f23b25dda044046ef6d21ed9c2e28df6f54e04d6 (patch)
tree	e065e31ae42d0fcfcd66c7628adffdf0391df805 /collaborativeagents/scripts/quick_test_batch.sh
parent	8af96d046e69fe9463ce89f000f06916cc043b31 (diff)

Add revised reward modeling LaTeX section matching code implementation

Key changes from original:
- Input: (q_t, a_t, q_{t+1}) only, removed A_t (not used in judge prompt)
- Single 7-label LLM classifier replaces abstract C_reward/C_gate
- Gating = classifier confidence (threshold tau_c=0.6), not memory attribution
- Explicitly describes Llama-3.1-8B-Instruct as judge model

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
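
The confidence gating described in the message can be illustrated with a minimal sketch. This is not the repository's code: the names `JudgeOutput` and `gate` are hypothetical, the seven label names are not given in the commit message and are left as opaque strings, and the inclusive comparison (>=) is an assumption, since the message states only the threshold value tau_c=0.6.

```python
from dataclasses import dataclass

TAU_C = 0.6  # confidence threshold tau_c stated in the commit message


@dataclass
class JudgeOutput:
    """Hypothetical container for one judge prediction on (q_t, a_t, q_{t+1})."""
    label: str         # one of the 7 classifier labels (names not given in the source)
    confidence: float  # classifier confidence, assumed to lie in [0, 1]


def gate(judge: JudgeOutput, tau_c: float = TAU_C) -> bool:
    """Return True when the prediction is confident enough to pass the gate."""
    # Assumption: inclusive comparison; the source only gives the threshold value.
    return judge.confidence >= tau_c
```

Under these assumptions, a prediction with confidence 0.9 passes the gate and one with confidence 0.3 is discarded, regardless of which label was predicted.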

Diffstat (limited to 'collaborativeagents/scripts/quick_test_batch.sh')

0 files changed, 0 insertions, 0 deletions

