Age | Commit message | Author
102 min. | Add 2-player PPO training log (500k episodes, 60.4% vs greedy) (HEAD, main) | YurenHao0426
4 hours | Raise entropy floor to 0.02, increase eval games to 2000 | haoyuren
5 hours | Change default eval_every from 10000 to 2500 | haoyuren
5 hours | Use auto-calibrated collect_batch in Colab notebook | haoyuren
5 hours | Add training curve plots to Colab notebook | haoyuren
5 hours | Add entropy annealing to escape greedy local minimum after warmup | haoyuren
5 hours | Auto-calibrate collect_batch when not specified | haoyuren
5 hours | Fix total_mem → total_memory in Colab GPU check | haoyuren
5 hours | Fix invalid notebook cell schema (markdown with execution_count) | haoyuren
5 hours | Batched game collection for ~7x training speedup | haoyuren
6 hours | Update README and Colab notebook for current rules and features | haoyuren
6 hours | Separate CPU collect / GPU train, add training CSV log | haoyuren
6 hours | Fix SWAP inheritance, stalemate logic, add greedy warmup | haoyuren
14 hours | Improve versus UI: suit colors, AI highlighting, draw tell | haoyuren
14 hours | Update rules: free draw/pass, remove Q in 2-player games | haoyuren
15 hours | Add tqdm progress bar, fix Colab username | haoyuren
15 hours | Add Colab GPU training notebook | haoyuren
15 hours | Initial commit: Blazing Eights RL agent | haoyuren