{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "accelerator": "GPU" }, "cells": [ { "cell_type": "markdown", "metadata": {}, "source": "# Blazing Eights - Colab GPU Training\n\nClone repo → Train PPO agent (batched collection on CPU, PPO on GPU) → Download model & logs\n\n**Game**: UNO variant with custom special cards (8=Wild, K=All draw, J=Skip, Swap=Swap hands)." }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Setup: Clone repo & install deps" ] }, { "cell_type": "code", "metadata": {}, "source": "# ====== CONFIG ======\nGITHUB_USERNAME = \"YurenHao0426\"\nREPO_NAME = \"blazing8\"\n# ====================\n\n!git clone https://github.com/{GITHUB_USERNAME}/{REPO_NAME}.git\n%cd {REPO_NAME}\n!pip install -q torch numpy tqdm", "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": {}, "source": "import torch\nprint(f\"PyTorch: {torch.__version__}\")\nprint(f\"CUDA available: {torch.cuda.is_available()}\")\nif torch.cuda.is_available():\n    print(f\"GPU: {torch.cuda.get_device_name(0)}\")\n    print(f\"Memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB\")\nelse:\n    print(\"WARNING: No GPU detected. Go to Runtime → Change runtime type → GPU\")", "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": "## 2. Train\n\nBatched collection: runs many games in parallel with a single forward pass per step.\n- `--collect_batch`: number of parallel games (higher = faster, more VRAM). Default = 64.\n- Game simulation on CPU, PPO gradient updates on GPU (auto-detected)."
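}, { "cell_type": "markdown", "metadata": {}, "source": "*Illustration only (not from `train.py`):* a minimal sketch of why batched collection helps — observations from many parallel games are stacked into one tensor, so the policy needs a single forward pass per step instead of one per game. The `Linear` policy and the dimensions below are made-up stand-ins, not the repo's actual `PolicyValueNet`." }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": "import torch\n\n# Hypothetical sizes, chosen purely for illustration\nOBS_DIM, NUM_ACTIONS, BATCH = 16, 8, 4\npolicy = torch.nn.Linear(OBS_DIM, NUM_ACTIONS)  # stand-in for the real policy net\n\nobs = torch.randn(BATCH, OBS_DIM)  # one observation per parallel game, stacked\nwith torch.no_grad():\n    logits = policy(obs)  # ONE forward pass covers all BATCH games\n    actions = torch.distributions.Categorical(logits=logits).sample()\nprint(actions.shape)  # one sampled action per game"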
}, { "cell_type": "code", "metadata": {}, "source": "# 2-player training (GPU PPO + batched collection)\n!python train.py \\\n --num_players 2 \\\n --episodes 200000 \\\n --collect_batch 128 \\\n --save_path blazing_ppo_2p", "execution_count": null, "outputs": [] }, { "cell_type": "code", "metadata": {}, "source": "# (Optional) 3-player training\n# !python train.py --num_players 3 --episodes 300000 --collect_batch 128 --save_path blazing_ppo_3p\n\n# (Optional) Larger batch for faster throughput\n# !python train.py --num_players 2 --episodes 200000 --collect_batch 256 --save_path blazing_ppo_2p\n\n# (Optional) Skip greedy warmup\n# !python train.py --num_players 2 --episodes 200000 --greedy_warmup 0 --save_path blazing_ppo_2p_no_warmup", "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": "# Show training log\nimport pandas as pd\ndf = pd.read_csv(\"blazing_ppo_2p_log.csv\")\nprint(df.to_string(index=False))", "metadata": {}, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": "## 3. Download model & logs", "metadata": {} }, { "cell_type": "code", "metadata": {}, "source": "from google.colab import files\nimport glob\n\n# Download final model(s)\nfor f in glob.glob(\"*_final.pt\"):\n print(f\"Downloading {f}...\")\n files.download(f)\n\n# Download training log(s)\nfor f in glob.glob(\"*_log.csv\"):\n print(f\"Downloading {f}...\")\n files.download(f)", "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Push model to GitHub (Option B)\n", "\n", "Push trained .pt files to a `models/` directory in the repo.\n", "\n", "You'll need a **GitHub Personal Access Token** (PAT).\n", "Create one at: https://github.com/settings/tokens → Generate new token (classic) → check `repo` scope." 
] }, { "cell_type": "code", "metadata": {}, "source": [ "from getpass import getpass\n", "import os\n", "\n", "TOKEN = getpass(\"Enter your GitHub PAT: \")\n", "\n", "# Configure git\n", "!git config user.email \"colab@training.ai\"\n", "!git config user.name \"Colab Training\"\n", "\n", "# Create models dir, move .pt files there\n", "os.makedirs(\"models\", exist_ok=True)\n", "!mv *_final.pt models/\n", "!ls -lh models/\n", "\n", "# Un-ignore models/ in .gitignore (append negation rules once; safe to rerun)\n", "with open(\".gitignore\", \"r\") as f:\n", "    content = f.read()\n", "if \"!models/\" not in content:\n", "    with open(\".gitignore\", \"a\") as f:\n", "        f.write(\"\\n# Allow models dir\\n!models/\\n!models/*.pt\\n\")\n", "\n", "!git add models/ .gitignore\n", "!git commit -m \"Add trained models from Colab GPU\"\n", "!git push https://{TOKEN}@github.com/{GITHUB_USERNAME}/{REPO_NAME}.git main" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": "## 5. Quick evaluation", "metadata": {} }, { "cell_type": "code", "metadata": {}, "source": "import glob\nimport sys\n\nimport torch\n\nsys.path.insert(0, \".\")\nfrom train import PolicyValueNet, evaluate_vs_greedy_batch\n\ndevice = \"cpu\"\nmodel = PolicyValueNet().to(device)\n\nfinal_models = glob.glob(\"*_final.pt\") + glob.glob(\"models/*_final.pt\")\nif final_models:\n    ckpt = torch.load(final_models[0], map_location=device, weights_only=True)\n    model.load_state_dict(ckpt[\"model\"])\n    model.eval()\n    print(f\"Loaded: {final_models[0]}\")\n    print(f\"Trained for {ckpt.get('episode', '?')} episodes\")\n    print()\n\n    for n in [2, 3, 4]:\n        wr = evaluate_vs_greedy_batch(model, num_players=n, num_games=2000, device=device)\n        print(f\"  {n} players: win rate = {wr:.1%} (random baseline: {1/n:.1%})\")\nelse:\n    print(\"No model found. Train first!\")", "execution_count": null, "outputs": [] } ] }