Age        Commit message        Author
13 hours    Raise entropy floor to 0.02, increase eval games to 2000    haoyuren

    Prevents premature convergence via a higher entropy minimum and reduces eval variance by running 4x more evaluation games.

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
14 hours    Change default eval_every from 10000 to 2500    haoyuren

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
14 hours    Use auto-calibrated collect_batch in Colab notebook    haoyuren

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
14 hours    Add training curve plots to Colab notebook    haoyuren

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
14 hours    Add entropy annealing to escape greedy local minimum after warmup    haoyuren

    After behavioral cloning warmup, the policy is strongly peaked on greedy actions. Start with a higher entropy coefficient (default: 5x ent_coef) and linearly decay it to the target, encouraging exploration of non-greedy strategies early in training.

    New arg: --ent_start (default: 5x --ent_coef)

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
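The linear decay this commit describes can be sketched as a small schedule helper. This is an illustration only: the function name, the step-based interface, and the default values are assumptions, not the training script's actual API.

```python
def entropy_coef(step: int, anneal_steps: int,
                 ent_start: float = 0.05, ent_coef: float = 0.01) -> float:
    """Linearly decay the entropy coefficient from ent_start down to
    ent_coef over anneal_steps, then hold at ent_coef.

    Hypothetical helper: defaults mirror the commit's "ent_start = 5x
    ent_coef" convention but are otherwise illustrative.
    """
    frac = min(step / max(anneal_steps, 1), 1.0)  # progress in [0, 1]
    return ent_start + frac * (ent_coef - ent_start)
```

At each PPO update the current coefficient would multiply the entropy bonus in the loss, so early updates reward exploration more heavily than late ones.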
14 hours    Auto-calibrate collect_batch when not specified    haoyuren

    Benchmarks batch sizes [64, 128, 256, 512] and picks the smallest within 10% of peak throughput. Smaller batches mean more frequent PPO updates and thus better training quality at similar speed.

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
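The selection rule above (smallest batch size within 10% of peak throughput) might look like the sketch below. The benchmarking pass itself, which would time real game collection for each candidate size, is elided; the function name and dict-based interface are assumptions.

```python
def pick_collect_batch(throughput: dict, tolerance: float = 0.10) -> int:
    """Pick the smallest batch size whose measured throughput is within
    `tolerance` of the best batch size's throughput.

    `throughput` maps batch size -> games (or steps) per second, as
    measured by a benchmarking pass over e.g. [64, 128, 256, 512].
    Hypothetical helper, not the repo's actual calibration code.
    """
    peak = max(throughput.values())
    near_peak = [b for b, t in throughput.items()
                 if t >= (1.0 - tolerance) * peak]  # within 10% of peak
    return min(near_peak)
```

With throughput roughly flat across sizes, this biases toward small batches, which is exactly the "more frequent PPO updates at similar speed" trade the commit describes.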
14 hours    Fix total_mem → total_memory in Colab GPU check    haoyuren

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
14 hours    Fix invalid notebook cell schema (markdown with execution_count)    haoyuren

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
14 hours    Batched game collection for ~7x training speedup    haoyuren

    - collect_games_batch(): run N games in parallel with a single batched forward pass per step
    - evaluate_vs_greedy_batch(): batched evaluation replacing sequential eval
    - Add --collect_batch CLI arg for a configurable parallel game count
    - Use torch.inference_mode() for faster collection
    - Update Colab notebook: GPU info, --collect_batch, log download cell

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
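The batched-forward pattern behind collect_games_batch() — stepping N games in lockstep so each environment step costs one policy call instead of N — can be sketched roughly as follows. The TinyPolicy network and the collect loop's interface are stand-ins, not the repo's actual classes; the real collector would also apply actions to each live game and record transitions where the comment indicates.

```python
import torch
import torch.nn as nn

class TinyPolicy(nn.Module):
    """Stand-in policy network; the real model's architecture differs."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(),
                                 nn.Linear(32, n_actions))
    def forward(self, obs):
        return self.net(obs)

@torch.inference_mode()  # skip autograd bookkeeping during collection
def collect_batched(policy, obs_batch, done, n_steps: int):
    """Advance N games in lockstep: one batched forward pass per step.

    obs_batch: (N, obs_dim) observations, one row per parallel game.
    done:      (N,) bool mask of finished games (marked with action -1).
    Returns the actions sampled at each step (illustrative only).
    """
    actions_per_step = []
    for _ in range(n_steps):
        logits = policy(obs_batch)                     # single batched forward
        dist = torch.distributions.Categorical(logits=logits)
        actions = dist.sample()                        # (N,) sampled actions
        actions[done] = -1                             # finished games sit out
        actions_per_step.append(actions)
        # ... real code would step each live game here, updating
        # obs_batch and done, and storing (obs, action, reward) ...
    return actions_per_step
```

The speedup comes from amortizing the per-call overhead of the network: one forward over an (N, obs_dim) batch is far cheaper than N forwards over (1, obs_dim).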
14 hours    Update README and Colab notebook for current rules and features    haoyuren

    - README: document current game rules (SWAP inheritance, free draw, Q removal)
    - README: add versus.py usage and training features (warmup, CSV log, CPU/GPU)
    - Colab: update training commands, add log display, fix eval device

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
15 hours    Separate CPU collect / GPU train, add training CSV log    haoyuren

    - Game collection always runs on CPU, PPO updates run on GPU (avoids per-step transfer overhead)
    - Log avg_len, loss, and vs_greedy win rate to CSV every 10k episodes
    - Add --eval_every flag for periodic evaluation

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
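The collect-on-CPU / train-on-GPU split keeps per-step tensors on the CPU and only ships a full rollout batch across the device boundary once per update. A minimal sketch, with a hypothetical helper name and a CPU fallback when no GPU is present:

```python
import torch
import torch.nn as nn

def ppo_update_device_split(model: nn.Module, rollouts_cpu: list):
    """Move the model and one stacked batch to the training device,
    run the update there, then return the model to CPU for collection.

    Hypothetical helper: the real script's loop differs, but the idea
    is the same — per-step tensors never leave the CPU; only the full
    rollout batch crosses devices, once per PPO update.
    """
    device = "cuda" if torch.cuda.is_available() else "cpu"
    batch = torch.stack(rollouts_cpu).to(device)  # one bulk transfer
    model.to(device)
    # ... PPO loss + optimizer.step() would run here on `device` ...
    model.to("cpu")                               # back to CPU for collection
    return batch
```

This avoids the pathological pattern of a tiny host-to-device copy on every environment step, which tends to be dominated by transfer latency rather than compute.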
15 hours    Fix SWAP inheritance, stalemate logic, add greedy warmup    haoyuren

    - SWAP now inherits the previous card's suit/rank for matching
    - Observation encodes the effective top card when SWAP is on top
    - Fix stalemate: only hard passes (can't draw) count; a draw-then-pass resets the counter
    - Add behavioral cloning warmup: pre-train on the greedy policy before PPO
    - 2p win rate vs greedy random: 60.5%

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
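The behavioral-cloning warmup mentioned above — supervised pre-training of the policy on a greedy teacher's action choices before PPO takes over — reduces to a cross-entropy loop. The sketch below uses a hypothetical one-step interface; the real warmup would loop over observations collected from greedy play.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def bc_warmup_step(policy: nn.Module, obs: torch.Tensor,
                   greedy_actions: torch.Tensor,
                   optimizer: torch.optim.Optimizer) -> float:
    """One behavioral-cloning step: cross-entropy between the policy's
    logits and the greedy teacher's chosen action per observation.

    Hypothetical interface, not the repo's actual warmup code — but the
    loss is the standard BC objective.
    """
    logits = policy(obs)                        # (B, n_actions)
    loss = F.cross_entropy(logits, greedy_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After enough of these steps the policy imitates the greedy baseline, giving PPO a sensible starting point instead of a uniform-random one.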
23 hours    Improve versus UI: suit colors, AI highlighting, draw tell    haoyuren

    - Color-code suits: ♠ blue, ♥ magenta, ♦ yellow, ♣ cyan
    - Highlight AI actions in red
    - Show whether the AI has playable cards after drawing (an observable tell)
    - Fix pass prompt: show a context-specific reason (无法出牌 "unable to play" / 不出牌 "choose not to play" / 牌堆已空 "deck is empty")

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
23 hours    Update rules: free draw/pass, remove Q in 2-player games    haoyuren

    - Players can freely choose to draw even with playable cards
    - After drawing, players may pass instead of playing
    - Remove Q cards from the deck in 2-player games (reverse has no effect)
    - Use a greedy random opponent in evaluation

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
24 hours    Add tqdm progress bar, fix Colab username    haoyuren

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
24 hours    Add Colab GPU training notebook    haoyuren

    Clone → train on GPU → download or push the model back to GitHub.

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
24 hours    Initial commit: Blazing Eights RL agent    haoyuren

    - Game environment with draw-then-decide rule (no auto-play on draw)
    - PPO self-play training script
    - Interactive human vs AI game (versus.py)
    - Real-time play assistant (play.py)

    Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>