path: root/records/track_10min_16mb/2026-03-17_NaiveBaseline/README.md
author    Will DePue <williamd@openai.com>    2026-03-18 09:32:01 -0700
committer Will DePue <williamd@openai.com>    2026-03-18 09:32:01 -0700
commit    a15093adad328a650d421e53c078cbd2c45beb0e (patch)
tree      e054c4bde12b89e6d3b39d611d9caadabc7f7234 /records/track_10min_16mb/2026-03-17_NaiveBaseline/README.md
Launch snapshot
Diffstat (limited to 'records/track_10min_16mb/2026-03-17_NaiveBaseline/README.md')
-rw-r--r--  records/track_10min_16mb/2026-03-17_NaiveBaseline/README.md | 46
1 file changed, 46 insertions(+), 0 deletions(-)
diff --git a/records/track_10min_16mb/2026-03-17_NaiveBaseline/README.md b/records/track_10min_16mb/2026-03-17_NaiveBaseline/README.md
new file mode 100644
index 0000000..1eb352a
--- /dev/null
+++ b/records/track_10min_16mb/2026-03-17_NaiveBaseline/README.md
@@ -0,0 +1,46 @@
+This record captures the `Simple Baseline` run for the `track_10min_16mb` track.
+
+Trainer changes in this snapshot:
+- current repository `train_gpt.py` snapshot copied into the record folder
+- published `fineweb10B_sp1024` dataset and tokenizer loaded from the new Hugging Face export
+- 10-minute wallclock cap on `8xH100`
+- periodic validation every `200` steps on the full `fineweb_val_*` split
+
+Configuration:
+- Layout: `VOCAB_SIZE=1024 NUM_LAYERS=9 MODEL_DIM=512 NUM_HEADS=8 NUM_KV_HEADS=4 MLP_MULT=2`
+- Tied output/input embeddings: `TIE_EMBEDDINGS=1`
+- Tied embedding LR: `TIED_EMBED_LR=0.05`
+- Batching: `TRAIN_BATCH_TOKENS=524288 TRAIN_SEQ_LEN=1024`
+
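For context, a rough parameter count can be derived from the layout above. This is a sketch only, assuming a standard pre-norm transformer with grouped-query attention, a plain two-matrix MLP, and the tied embedding counted once; the exact architecture lives in `train_gpt.py` and may differ (norm parameters, biases, and any gating are ignored here).

```python
# Rough parameter count for the layout above (sketch; exact counts
# depend on train_gpt.py internals -- norms, biases, gating ignored).
VOCAB_SIZE, NUM_LAYERS, MODEL_DIM = 1024, 9, 512
NUM_HEADS, NUM_KV_HEADS, MLP_MULT = 8, 4, 2

head_dim = MODEL_DIM // NUM_HEADS                   # 64
q_proj = MODEL_DIM * MODEL_DIM                      # query projection
kv_proj = 2 * MODEL_DIM * NUM_KV_HEADS * head_dim   # shared K/V heads (GQA)
o_proj = MODEL_DIM * MODEL_DIM                      # output projection
attn = q_proj + kv_proj + o_proj

mlp = 2 * MODEL_DIM * (MLP_MULT * MODEL_DIM)        # up + down projections

embed = VOCAB_SIZE * MODEL_DIM                      # tied with the output head

total = NUM_LAYERS * (attn + mlp) + embed
print(f"~{total / 1e6:.2f}M parameters")            # → ~17.04M parameters
```

At int8 precision that is on the order of 17 MB of raw weights, which is consistent with the ~15.8 MB serialized int8+zlib size reported below.
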
+Command (track-relevant params):
+```bash
+NCCL_IB_DISABLE=1 \
+RUN_ID=hf_verify_sp1024_8gpu \
+DATA_PATH=/root/code/parameter-golf/data/datasets/fineweb10B_sp1024 \
+TOKENIZER_PATH=/root/code/parameter-golf/data/tokenizers/fineweb_1024_bpe.model \
+VOCAB_SIZE=1024 \
+MAX_WALLCLOCK_SECONDS=600 \
+TRAIN_LOG_EVERY=50 \
+VAL_LOSS_EVERY=200 \
+torchrun --standalone --nproc_per_node=8 /root/code/parameter-golf/train_gpt.py
+```
+
+Key metrics (from `train.log`):
+- Timed training stopped at `13780/20000` steps due to the wallclock cap.
+- Pre-quant eval at stop: `val_loss:2.0606`, `val_bpb:1.2172`
+- Post-quant roundtrip eval: `val_loss:2.0727`, `val_bpb:1.2244`
+- Exact printed metric: `final_int8_zlib_roundtrip_exact val_bpb:1.22436570`
+- Train time: `600038ms` (`step_avg:43.54ms`)
+- Peak memory: `10184 MiB allocated`, `10200 MiB reserved`
+- Serialized model int8+zlib: `15815847 bytes`
+- Code size: `47642 bytes`
+- Total submission size int8+zlib: `15863489 bytes`
+
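The pre-quant vs. post-quant gap above comes from an int8+zlib roundtrip. A minimal sketch of such a roundtrip, assuming a single per-tensor absmax scale (the actual scheme in `train_gpt.py` may use per-channel scales or a different layout):

```python
import zlib
import numpy as np

# Minimal sketch of an int8+zlib roundtrip of the kind described above,
# assuming one per-tensor absmax scale; the real scheme may differ.
def int8_zlib_roundtrip(weights: np.ndarray) -> tuple[int, np.ndarray]:
    scale = max(float(np.abs(weights).max()), 1e-8) / 127.0
    q = np.round(weights / scale).astype(np.int8)   # quantize to int8
    blob = zlib.compress(q.tobytes(), level=9)      # bytes counted for size
    deq = q.astype(np.float32) * scale              # what eval sees post-roundtrip
    return len(blob), deq

w = np.random.default_rng(0).standard_normal((512, 512)).astype(np.float32)
nbytes, w_deq = int8_zlib_roundtrip(w)
print(nbytes, float(np.abs(w - w_deq).max()))
```

The dequantized copy (`w_deq`) is what the post-quant eval would see, so the `val_bpb` delta (1.2172 → 1.2244) is the cost of the quantization error.
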
+Training volume:
+- Global batch: `524288` tokens/step
+- Total train tokens seen: `7224688640`
+
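The reported figures are internally consistent; a quick sketch recomputing them from the raw numbers in this README:

```python
# Consistency checks for the metrics reported above.
batch_tokens = 524288            # TRAIN_BATCH_TOKENS (tokens per step)
steps = 13780                    # steps completed at the wallclock cap
train_ms = 600038                # timed training wallclock, ms

total_tokens = batch_tokens * steps
step_avg_ms = train_ms / steps

model_bytes = 15815847           # serialized int8+zlib model
code_bytes = 47642               # code snapshot
submission_bytes = model_bytes + code_bytes

print(total_tokens, round(step_avg_ms, 2), submission_bytes)
# → 7224688640 43.54 15863489
```
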
+Included files:
+- `train_gpt.py` (code snapshot used for the run)
+- `train.log` (exact remote training log)
+- `submission.json` (leaderboard metadata)