author     YurenHao0426 <blackhao0426@gmail.com> 2026-02-11 02:29:27 +0000
committer  YurenHao0426 <blackhao0426@gmail.com> 2026-02-11 02:29:27 +0000
commit     f23b25dda044046ef6d21ed9c2e28df6f54e04d6
tree       e065e31ae42d0fcfcd66c7628adffdf0391df805 /collaborativeagents/scripts/quick_test_batch.sh
parent     8af96d046e69fe9463ce89f000f06916cc043b31
Add revised reward modeling LaTeX section matching code implementation
Key changes from original:
- Input: (q_t, a_t, q_{t+1}) only, removed A_t (not used in judge prompt)
- Single 7-label LLM classifier replaces abstract C_reward/C_gate
- Gating = classifier confidence (threshold tau_c=0.6), not memory attribution
- Explicitly describes Llama-3.1-8B-Instruct as judge model

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
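The confidence gating described in the commit message can be illustrated with a minimal sketch. It assumes that the judge returns a single label together with a confidence score in [0, 1], and that a label passes the gate when its confidence is at least tau_c = 0.6. The commit states only the threshold value, so the ">=" comparison is an assumption. The names JudgeOutput and gate, as well as the label strings, are hypothetical and do not come from the repository.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold tau_c taken from the commit message; everything else is illustrative.
TAU_C = 0.6


@dataclass
class JudgeOutput:
    """Hypothetical container for one judge call on (q_t, a_t, q_{t+1})."""
    label: str          # one of the 7 classifier labels (label names are hypothetical)
    confidence: float   # classifier confidence, assumed to lie in [0, 1]


def gate(output: JudgeOutput, tau_c: float = TAU_C) -> Optional[str]:
    """Return the label if the classifier is confident enough, else None.

    Assumption: the gate uses ">=" against tau_c. The commit only states
    that gating is based on classifier confidence with tau_c = 0.6.
    """
    if output.confidence >= tau_c:
        return output.label
    return None  # low-confidence judgment: no reward signal is emitted


if __name__ == "__main__":
    print(gate(JudgeOutput(label="label_a", confidence=0.8)))  # passes the gate
    print(gate(JudgeOutput(label="label_a", confidence=0.4)))  # gated out
```

In this sketch a low-confidence judgment yields no signal instead of a down-weighted one. That reflects one reading of "gating = classifier confidence", not necessarily the repository's behavior.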
Diffstat (limited to 'collaborativeagents/scripts/quick_test_batch.sh')
0 files changed, 0 insertions, 0 deletions