=== Reward Model Comparison Test ===
Local: Qwen2.5-1.5B-Instruct
API: GPT-5-nano

================================================================================
Reward Model Comparison: Qwen2.5-1.5B vs GPT-5-nano
================================================================================
Loading models/qwen2.5-1.5b-instruct...
Model loaded.
Running 12 test cases...

--- Test 1/12: neg_constraint_restate - format preference ---
Expected: neg_constraint_restate

=== Test Complete ===