| author | haoyuren <13851610112@163.com> | 2026-02-22 12:09:01 -0600 |
|---|---|---|
| committer | haoyuren <13851610112@163.com> | 2026-02-22 12:09:01 -0600 |
| commit | 0735c68037566ae6731ac5dd349329b1c8d44851 (patch) | |
| tree | 1adc41bdce029d627dc7b318a8dd379630325ec3 /train_colab.ipynb | |
| parent | 800e1f1f33d93cb7a1812dff1dc0ef85289ef075 (diff) | |
Add entropy annealing to escape greedy local minimum after warmup
After behavioral cloning warmup, the policy is sharply peaked on greedy
actions. Start with a higher entropy coefficient (default: 5x ent_coef)
and linearly decay it to the target value (ent_coef), encouraging
exploration of non-greedy strategies early in training.
New arg: --ent_start (default: 5x --ent_coef)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
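For readers who want to see the schedule described in the commit message, here is a minimal sketch of a linear entropy-coefficient decay. It is not taken from the changed notebook: the function name `entropy_coef_at`, the `step` and `anneal_steps` parameters, and the choice to hold the target value after the decay ends are all assumptions made for illustration. Only the `ent_start` and `ent_coef` names and the 5x default come from the commit message.

```python
from typing import Optional


def entropy_coef_at(
    step: int,
    anneal_steps: int,
    ent_coef: float,
    ent_start: Optional[float] = None,
) -> float:
    """Linearly decay the entropy coefficient from ent_start to ent_coef.

    Hypothetical helper: the decay horizon (anneal_steps) is an assumption,
    since the commit message does not say how long the decay lasts.
    """
    # Default from the commit message: start at 5x the target coefficient.
    if ent_start is None:
        ent_start = 5.0 * ent_coef

    # With no decay horizon, fall back to the target coefficient.
    if anneal_steps <= 0:
        return ent_coef

    # Fraction of the decay completed, clamped to [0, 1] so the coefficient
    # stays at the target once the decay is finished.
    frac = min(max(step / anneal_steps, 0.0), 1.0)
    return ent_start + frac * (ent_coef - ent_start)
```

In a policy-gradient loss of the usual form `policy_loss - coef * entropy`, the returned value would stand in for `coef` at each update, so the entropy bonus is largest right after warmup and shrinks to `ent_coef` as training proceeds.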
Diffstat (limited to 'train_colab.ipynb')
0 files changed, 0 insertions, 0 deletions
