blazing8.git


author	haoyuren <13851610112@163.com>	2026-02-22 12:09:01 -0600
committer	haoyuren <13851610112@163.com>	2026-02-22 12:09:01 -0600
commit	0735c68037566ae6731ac5dd349329b1c8d44851 (patch)
tree	1adc41bdce029d627dc7b318a8dd379630325ec3 /blazing_env.py
parent	800e1f1f33d93cb7a1812dff1dc0ef85289ef075 (diff)

Add entropy annealing to escape greedy local minimum after warmup

After behavioral cloning warmup, the policy is sharply peaked on greedy actions. Start with a higher entropy coefficient (default: 5x ent_coef) and decay it linearly to the target value, encouraging exploration of non-greedy strategies early in training.

New arg: --ent_start (default: 5x --ent_coef)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
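The schedule described in the commit message can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this commit. The flag names `--ent_coef` and `--ent_start` and the 5x default come from the message. The `0.01` default for `--ent_coef`, the helper names `parse_args` and `annealed_ent_coef`, and the `anneal_steps` horizon are hypothetical, because the message does not say over how many steps the decay runs.

```python
import argparse
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Hypothetical default for the target (final) entropy coefficient.
    parser.add_argument("--ent_coef", type=float, default=0.01)
    # None means "derive from --ent_coef"; it is resolved to 5x below.
    parser.add_argument("--ent_start", type=float, default=None,
                        help="initial entropy coefficient (default: 5x --ent_coef)")
    args = parser.parse_args(argv)
    if args.ent_start is None:
        args.ent_start = 5.0 * args.ent_coef
    return args


def annealed_ent_coef(step: int, anneal_steps: int,
                      ent_start: float, ent_coef: float) -> float:
    """Linearly interpolate from ent_start (step 0) to ent_coef (step >= anneal_steps).

    anneal_steps is a hypothetical decay horizon, e.g. the total number of updates.
    """
    if anneal_steps <= 0:
        return ent_coef
    # Clamp progress to [0, 1] so the coefficient holds at ent_coef after annealing.
    frac = min(max(step / anneal_steps, 0.0), 1.0)
    return ent_start + frac * (ent_coef - ent_start)
```

With the defaults above, the coefficient starts at 0.05, passes 0.03 halfway through `anneal_steps`, and stays at 0.01 from then on. In a typical actor-critic loss, this value would replace the fixed coefficient in the `- ent_coef * entropy` term at each update.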

Diffstat (limited to 'blazing_env.py')

0 files changed, 0 insertions, 0 deletions

