- Game collection always on CPU, PPO update on GPU (avoids per-step transfer overhead)
- Log avg_len, loss, vs_greedy win rate to CSV every 10k episodes
- Add --eval_every flag for periodic evaluation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
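A minimal sketch of the CPU-rollout / GPU-update split described above. Function and key names (`collect_on_cpu`, `play_episode`, `"obs"`/`"act"`/`"adv"`) are illustrative, not from the repo:

```python
import torch

def collect_on_cpu(policy, play_episode, n_episodes):
    # Rollouts stay on CPU: per-step forward passes use tiny single-state
    # tensors, so a host<->GPU copy on every move would dominate the cost.
    policy.to("cpu")
    return [play_episode(policy) for _ in range(n_episodes)]

def update_on_gpu(policy, episodes, optimizer, ppo_loss):
    # One bulk transfer: concatenate the whole batch, move it once.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    policy.to(device)
    obs = torch.cat([e["obs"] for e in episodes]).to(device)
    act = torch.cat([e["act"] for e in episodes]).to(device)
    adv = torch.cat([e["adv"] for e in episodes]).to(device)
    loss = ppo_loss(policy, obs, act, adv)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```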
|
|
- SWAP now inherits previous card's suit/rank for matching
- Observation encodes effective top card when SWAP is on top
- Fix stalemate detection: only hard passes (player cannot draw) count toward a stalemate; a draw followed by a pass resets the counter
- Add behavioral cloning warmup: pre-train on greedy policy before PPO
- 2p win rate vs greedy random: 60.5%
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
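A sketch of the "SWAP inherits the previous card" rule: matching is done against the card under a SWAP, scanning down past consecutive SWAPs. The commit does not say which rank SWAP is, so `"J"` below is a placeholder:

```python
SWAP_RANK = "J"  # placeholder: the actual SWAP rank is not stated in the log

def effective_top(discard):
    # A card is (rank, suit). When a SWAP is on top, the next play must
    # match the card it was played on, so skip down past SWAPs.
    for rank, suit in reversed(discard):
        if rank != SWAP_RANK:
            return rank, suit
    return discard[-1]  # all SWAPs: fall back to the literal top card
```

This is also what the observation would encode as the "effective top card" when a SWAP is showing.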
|
|
- Color-code suits: ♠blue ♥magenta ♦yellow ♣cyan
- AI actions highlighted in red
- Show whether AI has playable cards after drawing (observable tell)
- Fix pass prompt: show context-specific reason (无法出牌 "cannot play" / 不出牌 "choose not to play" / 牌堆已空 "draw pile is empty")
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
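A minimal sketch of the suit colors and the red AI highlight using standard ANSI SGR escape codes; the exact card-rendering format in versus.py is an assumption:

```python
ANSI = {"blue": "\033[34m", "magenta": "\033[35m",
        "yellow": "\033[33m", "cyan": "\033[36m", "red": "\033[31m"}
RESET = "\033[0m"

# Per the commit: ♠blue ♥magenta ♦yellow ♣cyan
SUIT_COLOR = {"♠": "blue", "♥": "magenta", "♦": "yellow", "♣": "cyan"}

def colorize(card):
    # card = (rank, suit); render as colored suit symbol + rank
    rank, suit = card
    return f"{ANSI[SUIT_COLOR[suit]]}{suit}{rank}{RESET}"

def ai_line(text):
    # AI actions are highlighted in red
    return f"{ANSI['red']}{text}{RESET}"
```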
|
|
- Players can freely choose to draw even with playable cards
- After drawing, players may pass instead of playing
- Remove Q cards from deck in 2-player games (reverse has no effect)
- Use greedy random opponent in evaluation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
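A sketch of the 2-player deck rule: with only two seats, reversing turn order (Q) changes nothing, so all four Qs are dropped. A standard 52-card deck is assumed here:

```python
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["♠", "♥", "♦", "♣"]

def build_deck(n_players):
    # Q is the reverse card; with 2 players a reverse is a no-op,
    # so remove all Qs from the deck in that case.
    ranks = [r for r in RANKS if not (n_players == 2 and r == "Q")]
    return [(r, s) for s in SUITS for r in ranks]
```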
|
|
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
|
Clone → train on GPU → download or push model back to GitHub.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
|
- Game environment with draw-then-decide rule (no auto-play on draw)
- PPO self-play training script
- Interactive human vs AI game (versus.py)
- Real-time play assistant (play.py)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
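A minimal sketch of the draw-then-decide rule: drawing never auto-plays the card, and the player may still pass afterwards. The `choose_to_play` callback (returns a card from the hand, or None to pass) is a hypothetical interface, not the actual env API:

```python
def draw_phase(hand, deck, choose_to_play):
    # Draw-then-decide: the drawn card goes to the hand first; the
    # player then decides, seeing the new card, whether to play or pass.
    if deck:
        hand.append(deck.pop())
    card = choose_to_play(hand)
    if card is not None:
        hand.remove(card)
    return card  # None means the player passed after drawing
```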