Branch  Commit message                                                  Author        Age
main    Add 2-player PPO training log (500k episodes, 60.4% vs greedy)  YurenHao0426  101 min.
 
 
Age       Commit message                                                               Author
101 min.  Add 2-player PPO training log (500k episodes, 60.4% vs greedy) [HEAD, main]  YurenHao0426
4 hours   Raise entropy floor to 0.02, increase eval games to 2000                     haoyuren
5 hours   Change default eval_every from 10000 to 2500                                 haoyuren
5 hours   Use auto-calibrated collect_batch in Colab notebook                          haoyuren
5 hours   Add training curve plots to Colab notebook                                   haoyuren
5 hours   Add entropy annealing to escape greedy local minimum after warmup            haoyuren
5 hours   Auto-calibrate collect_batch when not specified                              haoyuren
5 hours   Fix total_mem → total_memory in Colab GPU check                              haoyuren
5 hours   Fix invalid notebook cell schema (markdown with execution_count)             haoyuren
5 hours   Batched game collection for ~7x training speedup                             haoyuren
[...]