- collect_games_batch(): run N games in parallel with a single batched forward pass per step
- evaluate_vs_greedy_batch(): batched evaluation replacing sequential eval
- Add --collect_batch CLI arg for configurable parallel game count
- Use torch.inference_mode() for faster collection
- Update Colab notebook: GPU info, --collect_batch, log download cell
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

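
The batched collection described in the commit above can be sketched as follows. This is a minimal illustration, not the repo's actual code: `TinyEnv`, the observation size, and the fixed episode length are assumptions standing in for the real card-game environment.

```python
import torch
import torch.nn as nn

class TinyEnv:
    """Toy stand-in for the card-game env: fixed-length episodes, dummy reward."""
    def __init__(self, obs_dim=8, limit=5):
        self.obs_dim, self.limit, self.t = obs_dim, limit, 0

    def reset(self):
        self.t = 0
        return torch.zeros(self.obs_dim)

    def step(self, action):
        self.t += 1
        done = self.t >= self.limit
        return torch.zeros(self.obs_dim), float(action == 0), done

def collect_games_batch(policy, n_games=4, obs_dim=8):
    """Advance n_games in lock-step; one batched forward pass per step."""
    envs = [TinyEnv(obs_dim) for _ in range(n_games)]
    obs = torch.stack([e.reset() for e in envs])
    done = [False] * n_games
    trajs = [[] for _ in range(n_games)]
    with torch.inference_mode():            # no autograd bookkeeping during rollout
        while not all(done):
            logits = policy(obs)            # single batched forward pass for all games
            actions = torch.distributions.Categorical(logits=logits).sample()
            next_obs = []
            for i, env in enumerate(envs):
                if done[i]:                 # finished games keep their last obs
                    next_obs.append(obs[i])
                    continue
                o, r, d = env.step(int(actions[i]))
                trajs[i].append((int(actions[i]), r))
                next_obs.append(o)
                done[i] = d
            obs = torch.stack(next_obs)
    return trajs
```

The key point is that the per-step cost is one forward pass over a `(n_games, obs_dim)` batch instead of `n_games` separate passes.
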
- README: document current game rules (SWAP inheritance, free draw, Q removal)
- README: add versus.py usage, training features (warmup, CSV log, CPU/GPU)
- Colab: update training commands, add log display, fix eval device
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Game collection always on CPU, PPO update on GPU (avoids per-step transfer overhead)
- Log avg_len, loss, vs_greedy win rate to CSV every 10k episodes
- Add --eval_every flag for periodic evaluation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

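
A minimal sketch of the collect-on-CPU / update-on-GPU split from this commit: rollouts run on a CPU copy of the policy, then the whole batch is moved to the training device in one bulk transfer. Everything here is illustrative — the model shape, the random stand-in observations, and the cross-entropy placeholder used where the real PPO loss would go.

```python
import csv
import torch
import torch.nn as nn

def ppo_iteration(policy, device, log_path="train_log.csv"):
    # sync weights onto a CPU twin used only for game collection
    policy_cpu = nn.Linear(8, 3)
    policy_cpu.load_state_dict(policy.state_dict())

    # --- collection on CPU: no per-step host<->device transfers ---
    with torch.inference_mode():
        obs = torch.randn(64, 8)                     # stand-in for game observations
        actions = torch.distributions.Categorical(
            logits=policy_cpu(obs)).sample()

    # --- update on `device`: one bulk transfer of the whole batch ---
    # (clone() outside inference mode converts inference tensors back
    #  into tensors that autograd can use)
    obs = obs.clone().to(device)
    actions = actions.clone().to(device)
    loss = nn.functional.cross_entropy(policy(obs), actions)  # placeholder for PPO loss
    loss.backward()

    # append metrics to the CSV training log
    with open(log_path, "a", newline="") as f:
        csv.writer(f).writerow([round(float(loss), 4)])
    return float(loss)
```
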
- SWAP now inherits previous card's suit/rank for matching
- Observation encodes effective top card when SWAP is on top
- Fix stalemate detection: only hard passes (player cannot draw) count; a draw followed by a pass resets the counter
- Add behavioral cloning warmup: pre-train on greedy policy before PPO
- 2p win rate vs greedy random: 60.5%
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

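
The SWAP-inheritance rule above can be sketched as an "effective top card" lookup: when SWAPs sit on top of the pile, matching (and the observation encoding) uses the first non-SWAP card underneath. The `(rank, suit)` card tuples and the `"SWAP"` rank label are assumptions for illustration, not the repo's actual representation.

```python
def effective_top(pile):
    """Return the card that play must match: skip SWAPs down to the inherited card."""
    for rank, suit in reversed(pile):
        if rank != "SWAP":
            return rank, suit
    return None  # pile is empty or all SWAPs: nothing to inherit


def is_playable(card, pile):
    """A card is playable if it is a SWAP or matches the effective top's rank or suit."""
    top = effective_top(pile)
    if top is None:
        return True
    rank, suit = card
    return rank == "SWAP" or rank == top[0] or suit == top[1]
```
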
- Color-code suits: ♠blue ♥magenta ♦yellow ♣cyan
- AI actions highlighted in red
- Show whether AI has playable cards after drawing (observable tell)
- Fix pass prompt: show context-specific reason (无法出牌 "cannot play" / 不出牌 "chooses not to play" / 牌堆已空 "draw pile is empty")
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

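
The color scheme in this commit maps onto standard ANSI SGR escape codes (34 blue, 35 magenta, 33 yellow, 36 cyan, 31 red); the helper names below are illustrative.

```python
# Standard ANSI color codes matching the commit's scheme:
# ♠ blue, ♥ magenta, ♦ yellow, ♣ cyan; AI actions in red.
SUIT_COLOR = {"♠": "\033[34m", "♥": "\033[35m", "♦": "\033[33m", "♣": "\033[36m"}
RED, RESET = "\033[31m", "\033[0m"

def colorize_card(rank, suit):
    """Render a card with its suit's color, then reset the terminal style."""
    return f"{SUIT_COLOR[suit]}{suit}{rank}{RESET}"

def ai_action(text):
    """Highlight an AI action line in red."""
    return f"{RED}{text}{RESET}"
```
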
- Players can freely choose to draw even with playable cards
- After drawing, players may pass instead of playing
- Remove Q cards from deck in 2-player games (reverse has no effect)
- Use greedy random opponent in evaluation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Clone → train on GPU → download or push model back to GitHub.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Game environment with draw-then-decide rule (no auto-play on draw)
- PPO self-play training script
- Interactive human vs AI game (versus.py)
- Real-time play assistant (play.py)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
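
The draw-then-decide rule from this initial commit (no auto-play on draw, free draw even with playable cards, pass allowed after drawing) can be sketched as a legal-action computation. The function and action names are assumptions for illustration, not the repo's env API.

```python
def legal_actions(hand, pile_nonempty, has_drawn, playable):
    """Enumerate legal moves under the draw-then-decide rule.

    hand:          the player's cards
    pile_nonempty: whether the draw pile still has cards
    has_drawn:     whether the player already drew this turn
    playable:      indices into `hand` of cards matching the top card
    """
    actions = [("play", i) for i in playable]
    if not has_drawn and pile_nonempty:
        actions.append(("draw", None))   # free draw, even with playable cards
    if has_drawn or not playable or not pile_nonempty:
        actions.append(("pass", None))   # pass after drawing, or when stuck
    return actions
```

The drawn card simply joins the hand; whether to play it is a separate decision on the same turn.
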