author: notapplica <yadunanll@gmail.com>, 2026-03-19 14:13:10 -0700
committer: GitHub <noreply@github.com>, 2026-03-19 14:13:10 -0700
commit: 9fbdf8c949a909c8701857e379004fe9e11098c2
tree: e6678e7d8093720044ff2211764a8d9e4f76543d /records/track_10min_16mb/2026-03-19_TrainingOptSeq4096
parent: ce6cf9ac8589991ac7c4140f6f9fc09b0ae3817a
Record: Sliding Window + FP16 Embed + 10L + Muon WD + Overtone Init (val_bpb=1.1748) (#60)

* Add NTK Eval + Overtone Init submission (1.2160 BPB)

  Train@1024 with overtone embedding init and phase-transition residual mixing, eval@2048 with NTK-aware dynamic RoPE scaling. Mean val_bpb 1.2160 across 3 seeds (p=0.0012 for 0.0194-nat improvement over baseline).

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update submission: Muon WD + NTK Eval + Overtone Init (1.2094 BPB, p=0.0002)

* Update submission: 10-Layer + Muon WD + NTK Eval + Overtone Init (1.2029 BPB, p=0.0006)

* Update submission: FP16 Embed + 10L + Muon WD + NTK + Overtone (1.2008 BPB)

* Update submission: 1.2000 BPB — FP16 Embed + 10L + Muon WD + NTK@1408 + Overtone

* Update: 1.1748 BPB — Sliding Window + FP16 Embed + 10L + Muon WD + Overtone

---------

Co-authored-by: notapplica <notapplica@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Diffstat (limited to 'records/track_10min_16mb/2026-03-19_TrainingOptSeq4096')
0 files changed, 0 insertions, 0 deletions