SOTA attempt (val_bpb=1.2064) (#49)

* SOTA attempt

* Improve score on SXM

---------

Co-authored-by: spokane-way <spokane@way>
author: spokane-way <marthaludwigsdottir@gmail.com> 2026-03-19 10:25:29 -0700
committer: GitHub <noreply@github.com> 2026-03-19 10:25:29 -0700
commit: e89fcf8acf8e9fd3bf63e9809c160a4e510be61b (patch)
tree: 95e225397706a0d4685663a4fc3634e3cf680c9a /records/track_10min_16mb/2026-03-18_LongContextSeq2048/README.md
parent: 6e3e90db8696d5f8294a433b70268a1dca72aab5 (diff)
1 files changed, 65 insertions, 0 deletions
diff --git a/records/track_10min_16mb/2026-03-18_LongContextSeq2048/README.md b/records/track_10min_16mb/2026-03-18_LongContextSeq2048/README.md
new file mode 100644
index 0000000..735cc0c
--- /dev/null
+++ b/records/track_10min_16mb/2026-03-18_LongContextSeq2048/README.md
@@ -0,0 +1,65 @@
+This record submission is called `Long Context Seq2048 v2`.
+
+Configuration:
+- Layout: `VOCAB_SIZE=1024 NUM_LAYERS=9 MODEL_DIM=512 NUM_HEADS=8 NUM_KV_HEADS=4 MLP_MULT=2`
+- Tied output/input embeddings: `TIE_EMBEDDINGS=1`
+- Sequence length: `TRAIN_SEQ_LEN=2048`
+- Batching: `TRAIN_BATCH_TOKENS=524288`
+- Learning rates: `TIED_EMBED_LR=0.04 MATRIX_LR=0.032 SCALAR_LR=0.032`
+
+Command:
+```bash
+NCCL_IB_DISABLE=1 \
+RUN_ID=seq2048_sxm28_full_20260319a \
+DATA_PATH=./data/datasets/fineweb10B_sp1024 \
+TOKENIZER_PATH=./data/tokenizers/fineweb_1024_bpe.model \
+VOCAB_SIZE=1024 \
+MAX_WALLCLOCK_SECONDS=600 \
+torchrun --standalone --nproc_per_node=8 \
+  records/track_10min_16mb/2026-03-18_LongContextSeq2048/train_gpt.py
+```
+
+Verification environment:
+- `8x H100 80GB HBM3`
+- all-to-all `NV18` topology
+- `torch 2.8.0+cu128`
+
+Key metrics (from `train.log` in this folder, rerun on the target SXM-class box):
+- Timed training stopped at `11564/20000` steps due to the wallclock cap.
+- Pre-quant eval at stop: `val_loss:2.0269`, `val_bpb:1.2005`
+- Post-quant roundtrip eval: `val_loss:2.0359`, `val_bpb:1.2058`
+- Exact printed metric: `final_int8_zlib_roundtrip_exact val_bpb:1.20576485`
+- Train time: `600038ms` (`step_avg:51.89ms`)
+- Peak memory: `10247 MiB allocated`, `10488 MiB reserved`
+- Serialized model int8+zlib: `15819554 bytes`
+- Code size for this standalone record script: `47716 bytes`
+- Total submission size int8+zlib: `15867270 bytes`
+
+Additional full-run reproducibility logs included in this folder:
+- `train.log`: canonical SXM rerun, `SEED=1337`, `val_bpb=1.20576485`
+- `train_seed1338.log`: SXM rerun, `SEED=1338`, `val_bpb=1.20617460`
+- `train_seed1339.log`: SXM rerun, `SEED=1339`, `val_bpb=1.20715923`
+
+Record-track significance note:
+- The public repo state for this submission has `Naive Baseline` at `1.2243657`.
+- The challenge therefore requires a score below `1.2193657`, i.e. `0.005` under that baseline, to claim a new record.
+- All three included SXM full runs clear that threshold:
+  - `SEED=1337`: `1.20576485`
+  - `SEED=1338`: `1.20617460`
+  - `SEED=1339`: `1.20715923`
+- Sample mean across the three runs: `1.20636623`
+- Sample standard deviation: `0.00071667`
+- One-sided one-sample t-test against `1.2193657`: `t=31.42` with `df=2`, which gives `p=0.00051`
+
+Why this folder is standalone:
+- `train_gpt.py` compiles from inside this record folder and was used for the canonical rerun whose output is saved as `train.log`.
+- No extra Python source files are required for the training path.
+- The only inputs expected at runtime are the cached dataset and tokenizer paths described in the main repo README.
+
+Included files:
+- `train_gpt.py` (standalone winning recipe with defaults baked in)
+- `README.md` (this file)
+- `submission.json` (leaderboard metadata)
+- `train.log` (canonical full log from the standalone record script)
+- `train_seed1338.log`, `train_seed1339.log` (extra full reruns for reproducibility)
+- `logs/seq2048_sxm28_*` (raw per-run tee output and trainer text logs from the SXM verification box)
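
The derived figures in the README above (sample mean, sample standard deviation, t statistic, p-value, total submission size, and average step time) can be re-derived from the raw numbers it quotes. The sketch below uses only the Python standard library. The variable names are illustrative and are not taken from `train_gpt.py`, and the closed-form Student-t CDF it relies on holds only for `df=2`. It is one way to check the arithmetic, not necessarily how the author computed it.

```python
import math
import statistics

# Final post-quant val_bpb per seed, copied from the README's log list.
runs = {1337: 1.20576485, 1338: 1.20617460, 1339: 1.20715923}
threshold = 1.2193657  # record threshold quoted in the README

values = list(runs.values())
n = len(values)
mean = statistics.mean(values)
std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)

# One-sample t statistic, written as a positive margin below the threshold.
t_stat = (threshold - mean) / (std / math.sqrt(n))

# For df = 2 the Student-t CDF has the closed form
#   F(t) = 1/2 + t / (2 * sqrt(t^2 + 2)),
# so the one-sided p-value is 1 - F(t). This shortcut is valid for df = 2 only.
p_value = 0.5 - t_stat / (2.0 * math.sqrt(t_stat ** 2 + 2.0))

# Size and timing figures quoted under "Key metrics".
model_bytes = 15819554   # serialized int8+zlib model
code_bytes = 47716       # standalone record script
total_bytes = model_bytes + code_bytes
step_avg_ms = 600038 / 11564  # train time in ms / completed steps

print(f"mean={mean:.8f} std={std:.8f}")
print(f"t={t_stat:.2f} df={n - 1} p={p_value:.5f}")
print(f"total_bytes={total_bytes} step_avg_ms={step_avg_ms:.2f}")
```

If SciPy is available, `scipy.stats.ttest_1samp(values, threshold, alternative="less")` should give the same p-value. Its t statistic carries a negative sign, because the sample mean lies below the threshold.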