blazing8.git, branch main

Add 2-player PPO training log (500k episodes, 60.4% vs greedy)

2026-02-22T21:32:53+00:00

Co-Authored-By: Claude Opus 4.6

Raise entropy floor to 0.02, increase eval games to 2000

2026-02-22T18:55:03+00:00

A higher entropy minimum guards against premature convergence, and
4x more evaluation games reduce the variance of the eval results.
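
A rough illustration of the variance effect, assuming eval games are
independent and the reported metric is a win rate. The win rate of 0.6
is an assumed value, and the previous count of 500 games is inferred from
"4x more" with the new count of 2000:

```python
import math


def win_rate_std_error(p: float, n_games: int) -> float:
    """Standard error of a win rate estimated from n_games independent games."""
    return math.sqrt(p * (1.0 - p) / n_games)


# Assumed values for illustration only (not measured results).
se_before = win_rate_std_error(0.6, 500)
se_after = win_rate_std_error(0.6, 2000)

# Quadrupling the number of games halves the standard error.
ratio = se_before / se_after
```

Under these assumptions the standard error drops from roughly 2.2 to
roughly 1.1 percentage points.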

Co-Authored-By: Claude Opus 4.6

Change default eval_every from 10000 to 2500

2026-02-22T18:19:57+00:00

Co-Authored-By: Claude Opus 4.6

Use auto-calibrated collect_batch in Colab notebook

2026-02-22T18:17:41+00:00

Co-Authored-By: Claude Opus 4.6

Add training curve plots to Colab notebook

2026-02-22T18:14:42+00:00

Co-Authored-By: Claude Opus 4.6

Add entropy annealing to escape greedy local minimum after warmup

2026-02-22T18:09:01+00:00

After behavioral cloning warmup, the policy is sharply peaked on greedy
actions. Start with a higher entropy coefficient (default: 5x ent_coef)
and linearly decay it to the target value, encouraging exploration of
non-greedy strategies early in training.

New arg: --ent_start (default: 5x --ent_coef)
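
A minimal sketch of such a linear schedule. The function name and the
anneal_steps horizon are hypothetical (this message does not say over how
many steps the decay runs); only the 5x default and the linear decay to
ent_coef come from the description above:

```python
from typing import Optional


def entropy_coef_at(step: int, anneal_steps: int, ent_coef: float,
                    ent_start: Optional[float] = None) -> float:
    """Linearly decay the entropy coefficient from ent_start to ent_coef.

    anneal_steps is a hypothetical parameter: the number of steps over
    which the decay runs. After that the coefficient stays at ent_coef.
    """
    if ent_start is None:
        ent_start = 5.0 * ent_coef  # default described in this commit
    if anneal_steps <= 0:
        return ent_coef
    # Fraction of the anneal completed, clamped to [0, 1].
    frac = min(max(step / anneal_steps, 0.0), 1.0)
    return ent_start + frac * (ent_coef - ent_start)
```

For example, with ent_coef=0.01 and anneal_steps=100, the coefficient
starts at 0.05, passes 0.03 at step 50, and stays at 0.01 from step 100 on.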

Co-Authored-By: Claude Opus 4.6

Auto-calibrate collect_batch when not specified

2026-02-22T18:06:23+00:00

Benchmark batch sizes [64, 128, 256, 512] and pick the smallest one
within 10% of peak throughput. Smaller batches mean more frequent
PPO updates, which should improve training quality at similar speed.
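
The selection rule can be sketched as follows. The function name is
hypothetical and the throughput numbers are made up for illustration;
the benchmarking itself is not shown:

```python
from typing import Dict


def pick_batch_size(throughput: Dict[int, float], tolerance: float = 0.10) -> int:
    """Return the smallest batch size whose throughput is within
    `tolerance` (as a fraction) of the best measured throughput."""
    peak = max(throughput.values())
    eligible = [size for size, rate in throughput.items()
                if rate >= (1.0 - tolerance) * peak]
    return min(eligible)


# Made-up measurements (e.g. games per second) for each candidate size.
measured = {64: 900.0, 128: 1450.0, 256: 1580.0, 512: 1600.0}
chosen = pick_batch_size(measured)
```

With these numbers the cutoff is 1440, so 128, 256 and 512 qualify and
128 is chosen.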

Co-Authored-By: Claude Opus 4.6

Fix total_mem → total_memory in Colab GPU check

2026-02-22T18:01:17+00:00

Co-Authored-By: Claude Opus 4.6

Fix invalid notebook cell schema (markdown with execution_count)

2026-02-22T17:59:01+00:00

Co-Authored-By: Claude Opus 4.6

Batched game collection for ~7x training speedup

2026-02-22T17:56:48+00:00

- collect_games_batch(): run N games in parallel with a single batched forward pass per step
- evaluate_vs_greedy_batch(): batched evaluation replacing sequential eval
- Add --collect_batch CLI arg for configurable parallel game count
- Use torch.inference_mode() for faster collection
- Update Colab notebook: GPU info, --collect_batch, log download cell
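
A simplified sketch of the batching idea, not the actual
collect_games_batch() code. CountdownGame, collect_batch_sketch and
policy_batch are hypothetical stand-ins; in the real trainer the policy
call would presumably be one batched network forward pass under
torch.inference_mode():

```python
from typing import Callable, List, Sequence, Tuple


class CountdownGame:
    """Toy stand-in for a real game: ends once the counter reaches zero."""

    def __init__(self, start: int) -> None:
        self.counter = start

    @property
    def done(self) -> bool:
        return self.counter <= 0

    def observe(self) -> int:
        return self.counter

    def step(self, action: int) -> None:
        self.counter -= action


def collect_batch_sketch(
    games: Sequence[CountdownGame],
    policy_batch: Callable[[List[int]], List[int]],
) -> Tuple[List[List[Tuple[int, int]]], int]:
    """Advance all unfinished games in lockstep, calling the policy once
    per step on the stacked observations instead of once per game."""
    trajectories: List[List[Tuple[int, int]]] = [[] for _ in games]
    active = [i for i, g in enumerate(games) if not g.done]
    policy_calls = 0
    while active:
        observations = [games[i].observe() for i in active]
        actions = policy_batch(observations)  # one batched call per step
        policy_calls += 1
        for i, obs, act in zip(active, observations, actions):
            trajectories[i].append((obs, act))
            games[i].step(act)
        # Finished games drop out of the batch.
        active = [i for i in active if not games[i].done]
    return trajectories, policy_calls


games = [CountdownGame(3), CountdownGame(1), CountdownGame(2)]
trajs, calls = collect_batch_sketch(games, lambda obs: [1 for _ in obs])
```

Here the three toy games need 6 moves in total but only 3 policy calls,
which is where the saving over per-game sequential collection comes from.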

Co-Authored-By: Claude Opus 4.6