Age | Commit message | Author
9 hours | Add 2-player PPO training log (500k episodes, 60.4% vs greedy) [HEAD, main] | YurenHao0426
11 hours | Raise entropy floor to 0.02, increase eval games to 2000 | haoyuren
12 hours | Change default eval_every from 10000 to 2500 | haoyuren
12 hours | Use auto-calibrated collect_batch in Colab notebook | haoyuren
12 hours | Add training curve plots to Colab notebook | haoyuren
12 hours | Add entropy annealing to escape greedy local minimum after warmup | haoyuren
12 hours | Auto-calibrate collect_batch when not specified | haoyuren
12 hours | Fix total_mem → total_memory in Colab GPU check | haoyuren
12 hours | Fix invalid notebook cell schema (markdown with execution_count) | haoyuren
12 hours | Batched game collection for ~7x training speedup | haoyuren
13 hours | Update README and Colab notebook for current rules and features | haoyuren
13 hours | Separate CPU collect / GPU train, add training CSV log | haoyuren
13 hours | Fix SWAP inheritance, stalemate logic, add greedy warmup | haoyuren
21 hours | Improve versus UI: suit colors, AI highlighting, draw tell | haoyuren
21 hours | Update rules: free draw/pass, remove Q in 2-player games | haoyuren
22 hours | Add tqdm progress bar, fix Colab username | haoyuren
23 hours | Add Colab GPU training notebook | haoyuren
23 hours | Initial commit: Blazing Eights RL agent | haoyuren