personalization-user-model.git


author	YurenHao0426 <blackhao0426@gmail.com>	2026-02-11 03:14:37 +0000
committer	YurenHao0426 <blackhao0426@gmail.com>	2026-02-11 03:14:37 +0000
commit	6a917d3eda85e5725c2d5ad3bf5ec9bd30262198 (patch)
tree	5c9408962f01036119ebe29cd34b45bf951865bd /collaborativeagents/training/grpo/generate_grpo_data.py
parent	1956aed8bc8a72355adbe9f1d16ea678d67f214c (diff)

Rewrite reward section to describe keyword heuristic (matches experiments)

Replaced LLM-as-judge description with actual keyword-based system:
- Reward: sentiment keyword matching + topic coherence via embedding similarity
- Gating: separate retrieval-attribution heuristic using memory-query cosine similarity (g_t=0.9 retrieval fault, g_t=0.2 LLM fault, etc.)
- No additional model needed (fast, no GPU)
- REINFORCE update unchanged

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
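A minimal sketch of the heuristic this commit message describes, to make the reward and gating pieces concrete. Everything beyond what the message states is an assumption: the keyword lists, the equal weighting `alpha=0.5` between sentiment and coherence, the `sim_threshold=0.5` cutoff, and the rule that a low best memory-query similarity means a retrieval fault are hypothetical. Only the two gate values g_t=0.9 (retrieval fault) and g_t=0.2 (LLM fault) come from the message; the other cases it elides ("etc.") are not reproduced. The REINFORCE update is unchanged by this commit and is not sketched here.

```python
import math
from typing import Sequence

# Hypothetical keyword lists; the commit does not specify the real vocabulary.
POSITIVE_KEYWORDS = {"thanks", "great", "perfect", "helpful"}
NEGATIVE_KEYWORDS = {"wrong", "no", "not", "again", "forgot"}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def keyword_sentiment(feedback: str) -> float:
    """Score in [-1, 1]: (pos - neg) / (pos + neg), or 0.0 when no keyword matches."""
    tokens = [t.strip(".,!?").lower() for t in feedback.split()]
    pos = sum(t in POSITIVE_KEYWORDS for t in tokens)
    neg = sum(t in NEGATIVE_KEYWORDS for t in tokens)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


def reward(feedback: str, prev_turn_emb: Sequence[float],
           cur_turn_emb: Sequence[float], alpha: float = 0.5) -> float:
    """Hypothetical mix: alpha * keyword sentiment + (1 - alpha) * topic coherence."""
    coherence = cosine(prev_turn_emb, cur_turn_emb)
    return alpha * keyword_sentiment(feedback) + (1 - alpha) * coherence


def gate(memory_embs: Sequence[Sequence[float]], query_emb: Sequence[float],
         sim_threshold: float = 0.5) -> float:
    """Retrieval attribution: if no retrieved memory is close to the query
    (assumed rule), blame retrieval (g_t=0.9); otherwise blame the LLM (g_t=0.2)."""
    best = max((cosine(m, query_emb) for m in memory_embs), default=0.0)
    return 0.9 if best < sim_threshold else 0.2
```

For example, `keyword_sentiment("No, that's wrong again.")` returns -1.0, and `gate([[0.0, 1.0]], [1.0, 0.0])` returns 0.9 because the only retrieved memory is orthogonal to the query. Because it needs only keyword counts and cosine similarities over precomputed embeddings, a heuristic of this shape runs without an extra judge model, consistent with the "no GPU" note above.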

Diffstat (limited to 'collaborativeagents/training/grpo/generate_grpo_data.py')

0 files changed, 0 insertions, 0 deletions

