| author | notapplica <yadunanll@gmail.com> | 2026-03-19 14:13:10 -0700 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2026-03-19 14:13:10 -0700 |
| commit | 9fbdf8c949a909c8701857e379004fe9e11098c2 (patch) | |
| tree | e6678e7d8093720044ff2211764a8d9e4f76543d /records/track_10min_16mb/2026-03-19_WarmdownQuantization/submission.json | |
| parent | ce6cf9ac8589991ac7c4140f6f9fc09b0ae3817a (diff) | |
Record: Sliding Window + FP16 Embed + 10L + Muon WD + Overtone Init (val_bpb=1.1748) (#60)
* Add NTK Eval + Overtone Init submission (1.2160 BPB)
Train@1024 with overtone embedding init and phase-transition residual
mixing, eval@2048 with NTK-aware dynamic RoPE scaling. Mean val_bpb
1.2160 across 3 seeds (p=0.0012 for 0.0194-nat improvement over baseline).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Update submission: Muon WD + NTK Eval + Overtone Init (1.2094 BPB, p=0.0002)
* Update submission: 10-Layer + Muon WD + NTK Eval + Overtone Init (1.2029 BPB, p=0.0006)
* Update submission: FP16 Embed + 10L + Muon WD + NTK + Overtone (1.2008 BPB)
* Update submission: 1.2000 BPB — FP16 Embed + 10L + Muon WD + NTK@1408 + Overtone
* Update: 1.1748 BPB — Sliding Window + FP16 Embed + 10L + Muon WD + Overtone
---------
Co-authored-by: notapplica <notapplica@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
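
Note on "NTK-aware dynamic RoPE scaling": the training script is not part of this view, so the following is only a minimal sketch of the commonly used NTK-aware rule, not the submission's actual code. The function names, the base of 10000 and the head dimension of 64 are illustrative assumptions; only the train and eval lengths (1024 and 2048) come from the message above.

```python
import math

def ntk_scaled_base(base: float, head_dim: int, train_len: int, seq_len: int) -> float:
    """Rescale the RoPE base when seq_len exceeds the training length.

    "Dynamic" here means the scale is derived from the sequence length
    actually seen at eval time; sequences no longer than train_len are
    left untouched. (Hypothetical helper, not from the submission.)
    """
    scale = max(1.0, seq_len / train_len)
    return base * scale ** (head_dim / (head_dim - 2))

def rope_inv_freq(base: float, head_dim: int) -> list[float]:
    """Inverse frequencies for each rotary pair: base^(-2i/d)."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]

# Assumed values for illustration: base=10000, head_dim=64.
plain = rope_inv_freq(10000.0, 64)
scaled = rope_inv_freq(ntk_scaled_base(10000.0, 64, 1024, 2048), 64)

# The fastest-rotating pair is unchanged, while the slowest pair is
# slowed by exactly the length ratio (2x here).
print(scaled[0], plain[-1] / scaled[-1])
```

With this rule the high-frequency dimensions keep their trained behaviour and only the low-frequency dimensions are stretched to cover the longer context, which is the usual motivation for preferring it over uniform position interpolation.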
Diffstat (limited to 'records/track_10min_16mb/2026-03-19_WarmdownQuantization/submission.json')
0 files changed, 0 insertions, 0 deletions
