author     haoyuren <13851610112@163.com> 2026-02-22 12:09:01 -0600
committer  haoyuren <13851610112@163.com> 2026-02-22 12:09:01 -0600
commit     0735c68037566ae6731ac5dd349329b1c8d44851
tree       1adc41bdce029d627dc7b318a8dd379630325ec3 /blazing_env.py
parent     800e1f1f33d93cb7a1812dff1dc0ef85289ef075
Add entropy annealing to escape greedy local minimum after warmup

After behavioral cloning warmup, the policy is very peaked on greedy actions. Start with a higher entropy coefficient (default: 5x ent_coef) and linearly decay it to the target, encouraging exploration of non-greedy strategies early in training.

New arg: --ent_start (default: 5x --ent_coef)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
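The annealing schedule described in this commit message could look roughly like the sketch below. It is not the code from blazing_env.py (the diff is not shown here); it only illustrates the stated behavior under assumptions. The default of 0.01 for --ent_coef and the decay horizon (the hypothetical --ent_anneal_updates flag and `entropy_coef` helper) are invented for illustration, since the commit does not say over how many updates the coefficient decays.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Assumed default; the commit does not state the value of --ent_coef.
    parser.add_argument("--ent_coef", type=float, default=0.01)
    # None means "derive from --ent_coef" (5x, per the commit message).
    parser.add_argument("--ent_start", type=float, default=None)
    # Hypothetical flag: number of updates over which to decay.
    parser.add_argument("--ent_anneal_updates", type=int, default=1000)
    args = parser.parse_args(argv)
    if args.ent_start is None:
        args.ent_start = 5.0 * args.ent_coef
    return args


def entropy_coef(update, ent_start, ent_coef, anneal_updates):
    """Linearly interpolate from ent_start to ent_coef, then hold at ent_coef."""
    if anneal_updates <= 0:
        return ent_coef
    # Fraction of the annealing period completed, clamped to [0, 1].
    frac = min(max(update / anneal_updates, 0.0), 1.0)
    return ent_start + frac * (ent_coef - ent_start)
```

In a PPO-style update, the returned value would replace the fixed coefficient in a term such as `loss = pg_loss + vf_coef * v_loss - coef * entropy`. This makes the entropy bonus largest right after the behavioral cloning warmup and lets it settle back to --ent_coef once the decay period ends. Using `None` as the --ent_start default lets an explicitly passed value, including 0, override the 5x rule.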
Diffstat (limited to 'blazing_env.py')
0 files changed, 0 insertions, 0 deletions