| field | value | date |
|---|---|---|
| author | haoyuren <13851610112@163.com> | 2026-02-22 11:28:45 -0600 |
| committer | haoyuren <13851610112@163.com> | 2026-02-22 11:28:45 -0600 |
| commit | 3887054e02e622ca2cb7878bc0dec63d28c7f223 | |
| tree | 1a341f7562abb41cfc25badde73879a4e914b1ee /.gitignore | |
| parent | 1cb5eb34ead9b4efc1032ec74c6ccc439f007c18 | |
Fix SWAP inheritance, stalemate logic, add greedy warmup
- SWAP now inherits previous card's suit/rank for matching
- Observation encodes effective top card when SWAP is on top
- Fix stalemate: only hard passes (player cannot draw) count toward stalemate; a draw followed by a pass resets the count
- Add behavioral cloning warmup: pre-train on greedy policy before PPO
- 2p win rate vs greedy random: 60.5%
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
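
The SWAP and stalemate changes described in the message above can be illustrated with a minimal sketch. It is not the code from this commit: `Card`, `SWAP`, `effective_top`, `matches`, and `StalemateTracker` are hypothetical names, and the sketch assumes that stacked SWAP cards inherit from the nearest non-SWAP card beneath them and that a stalemate is declared once every player has hard-passed in a row.

```python
from dataclasses import dataclass
from typing import List, Optional

SWAP = "SWAP"  # hypothetical rank label for the SWAP card


@dataclass(frozen=True)
class Card:
    suit: str
    rank: str


def effective_top(discard: List[Card]) -> Optional[Card]:
    """Return the card a new play must match.

    A SWAP on top inherits the suit/rank of the card beneath it, so we
    walk down the pile until we find a non-SWAP card (assumption: stacked
    SWAPs keep inheriting downward).
    """
    for card in reversed(discard):
        if card.rank != SWAP:
            return card
    return None  # empty pile or only SWAPs: nothing to inherit


def matches(card: Card, discard: List[Card]) -> bool:
    """Check an ordinary card against the effective top card."""
    top = effective_top(discard)
    if top is None:
        return True  # assumption: anything is playable on an empty pile
    return card.suit == top.suit or card.rank == top.rank


class StalemateTracker:
    """Counts consecutive hard passes (passes made when drawing is impossible)."""

    def __init__(self, num_players: int) -> None:
        self.num_players = num_players
        self.hard_passes = 0

    def record(self, hard_pass: bool) -> bool:
        """Record one turn; return True if the game is now stalemated.

        Any turn that is not a hard pass (a play, or a draw followed by a
        pass) resets the counter. The threshold of one hard pass per player
        is an assumption of this sketch.
        """
        self.hard_passes = self.hard_passes + 1 if hard_pass else 0
        return self.hard_passes >= self.num_players
```

The same `effective_top` result is what an observation encoder would use in place of the literal top card when a SWAP is showing, which keeps the matching rule and the observation consistent with each other.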
Diffstat (limited to '.gitignore')
0 files changed, 0 insertions, 0 deletions
