blazing8.git


author	haoyuren <13851610112@163.com>	2026-02-22 12:09:01 -0600
committer	haoyuren <13851610112@163.com>	2026-02-22 12:09:01 -0600
commit	0735c68037566ae6731ac5dd349329b1c8d44851 (patch)
tree	1adc41bdce029d627dc7b318a8dd379630325ec3 /train_colab.ipynb
parent	800e1f1f33d93cb7a1812dff1dc0ef85289ef075 (diff)

Add entropy annealing to escape greedy local minimum after warmup

After the behavioral cloning warmup, the policy is sharply peaked on greedy actions. Start with a higher entropy coefficient (default: 5x ent_coef) and decay it linearly to the target value, encouraging exploration of non-greedy strategies early in training.

New arg: --ent_start (default: 5x --ent_coef)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
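The diffstat below shows no file content for this commit, so the following is only a minimal Python sketch of the schedule the message describes, not the repository's code. The flags --ent_coef and --ent_start and the "5x --ent_coef" default come from the message. The --ent_coef default of 0.01, the --anneal_updates horizon, and the function names are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the entropy flags (hypothetical subset of the real training CLI)."""
    parser = argparse.ArgumentParser()
    # Target (final) entropy coefficient; the 0.01 default is an assumption.
    parser.add_argument("--ent_coef", type=float, default=0.01)
    # Initial entropy coefficient; None means "derive it from --ent_coef".
    parser.add_argument("--ent_start", type=float, default=None)
    # Hypothetical: number of updates over which to anneal.
    parser.add_argument("--anneal_updates", type=int, default=1000)
    args = parser.parse_args(argv)
    if args.ent_start is None:
        # Default described in the commit message: 5x the target coefficient.
        args.ent_start = 5.0 * args.ent_coef
    return args


def entropy_coef_at(update, ent_start, ent_coef, anneal_updates):
    """Linearly interpolate from ent_start to ent_coef, then hold at ent_coef."""
    if anneal_updates <= 0:
        return ent_coef
    # Fraction of the annealing window completed, clamped to [0, 1].
    frac = min(max(update / anneal_updates, 0.0), 1.0)
    return ent_start + frac * (ent_coef - ent_start)
```

In a typical actor-critic loss the coefficient scales an entropy bonus that is subtracted from the loss, so a larger early value pushes the post-warmup policy away from its greedy peak before settling at --ent_coef.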

Diffstat (limited to 'train_colab.ipynb')

0 files changed, 0 insertions, 0 deletions

