From 72cf72d704ca1a3bf4e2a5e04dcbbad99dc0f98e Mon Sep 17 00:00:00 2001
From: haoyuren <13851610112@163.com>
Date: Sun, 22 Feb 2026 01:48:03 -0600
Subject: Initial commit: Blazing Eights RL agent

- Game environment with draw-then-decide rule (no auto-play on draw)
- PPO self-play training script
- Interactive human vs AI game (versus.py)
- Real-time play assistant (play.py)

Co-Authored-By: Claude Opus 4.6
---
 README.md | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)
 create mode 100644 README.md
 (limited to 'README.md')

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..711f793
--- /dev/null
+++ b/README.md
@@ -0,0 +1,67 @@

# Blazing Eights — RL Agent

Self-play PPO agent for the Blazing Eights card game.

## Setup

```bash
pip install torch numpy
```

## Files

- `blazing_env.py` — Game environment (2-5 players)
- `train.py` — PPO self-play training
- `play.py` — Real-time play assistant (input the game state, get the best move)

## Training

```bash
# Train a 2-player agent (~10 min on CPU for 100k episodes)
python train.py --num_players 2 --episodes 100000

# Train for 3 players (may need more episodes)
python train.py --num_players 3 --episodes 200000

# Custom hyperparameters
python train.py --num_players 4 --episodes 300000 --lr 1e-4 --ent_coef 0.02
```

Training saves a checkpoint every 10k episodes and a final model.

## Real-time Play Assistant

After training, use the assistant during a real game:

```bash
python play.py --model blazing_ppo_final.pt --num_players 3
```

It will prompt you for:

1. Your hand (e.g., `8h,Ks,3d,SWAP`)
2. Top discard card (e.g., `6d`)
3. Active suit if an 8 was played
4. Direction (cw/ccw)
5. Other players' hand sizes
6. Approximate deck size

It then shows ranked action recommendations with probabilities.
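The card notation in the prompts above (rank followed by a suit letter, plus `SWAP` for Swap cards) can be parsed in a few lines. This is a hypothetical sketch, not code from `play.py`: the token format shown in the examples is the only assumption, and `parse_card`/`parse_hand` are illustrative names.

```python
# Sketch of parsing the assistant's card notation, e.g. "8h", "Ks", "SWAP".
# Hypothetical helper names; play.py's internal encoding may differ.

RANKS = {"2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"}
SUITS = {"h": "hearts", "d": "diamonds", "s": "spades", "c": "clubs"}

def parse_card(token: str):
    """Parse a token like '8h' or 'SWAP' into a (rank, suit) tuple."""
    token = token.strip()
    if token.upper() == "SWAP":
        return ("SWAP", None)  # Swap cards have no rank or suit
    rank, suit = token[:-1].upper(), token[-1].lower()
    if rank not in RANKS or suit not in SUITS:
        raise ValueError(f"unrecognized card: {token!r}")
    return (rank, SUITS[suit])

def parse_hand(text: str):
    """Parse a comma-separated hand string like '8h,Ks,3d,SWAP'."""
    return [parse_card(t) for t in text.split(",")]
```

Splitting the rank as `token[:-1]` keeps two-character ranks like `10` working without a special case.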
## Game Rules

- **56 cards**: standard 52 + 4 Swap cards
- **Match**: play a card matching the suit or rank of the top card
- **8**: wild — choose the suit the next player must follow
- **K**: all other players draw 1
- **Q**: reverse direction (no effect in 2-player)
- **J**: skip the next player
- **Swap**: swap your entire hand with the next player's (playable anytime, no match needed)
- **Can't play**: draw 1 card; if it is legal, you may play it
- **Win**: first player to empty their hand

## Tips for Better Training

1. **Train per player count** — the optimal policy differs significantly between 2 and 5 players.
2. **Increase episodes for more players** — larger games have more variance and need more samples.
3. **Opponent modeling** — after self-play, fine-tune against specific opponent behaviors by replacing some players with heuristic bots that mimic your friends' tendencies.
4. **Curriculum** — start training with 2 players, then use the trained model to initialize training for 3+ players.
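The matching rule above can be sketched as a small legality check: a card is playable if it matches the top card's rank or the required suit, while 8s (wild) and Swap cards are playable anytime, and an active suit chosen via an 8 overrides the top card's own suit. This is a minimal sketch assuming a `(rank, suit)` tuple representation; `is_legal` and `legal_moves` are illustrative names, and `blazing_env.py`'s actual encoding may differ.

```python
# Sketch of the legality rule from the Game Rules section.
# Cards are (rank, suit) tuples, e.g. ("K", "h"); Swap cards are ("SWAP", None).

def is_legal(card, top_card, active_suit=None):
    """Return True if `card` may be played on `top_card`.

    `active_suit` is the suit chosen when an 8 was played; when set,
    it replaces the top card's suit as the suit to match.
    """
    rank, suit = card
    if rank in ("8", "SWAP"):  # wild 8s and Swap cards need no match
        return True
    top_rank, top_suit = top_card
    required_suit = active_suit if active_suit is not None else top_suit
    return rank == top_rank or suit == required_suit

def legal_moves(hand, top_card, active_suit=None):
    """Indices of playable cards; an empty list means you must draw 1."""
    return [i for i, c in enumerate(hand) if is_legal(c, top_card, active_suit)]
```

In the environment, a check like this would define the action mask over the hand; `legal_moves` returning an empty list corresponds to the draw-then-decide branch mentioned in the commit message.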