| author | Will DePue <williamd@openai.com> | 2026-03-18 09:32:01 -0700 |
|---|---|---|
| committer | Will DePue <williamd@openai.com> | 2026-03-18 09:32:01 -0700 |
| commit | a15093adad328a650d421e53c078cbd2c45beb0e | |
| tree | e054c4bde12b89e6d3b39d611d9caadabc7f7234 /records/track_10min_16mb/2026-03-17_NaiveBaseline/train.log | |
Launch snapshot
Diffstat (limited to 'records/track_10min_16mb/2026-03-17_NaiveBaseline/train.log')
| -rw-r--r-- | records/track_10min_16mb/2026-03-17_NaiveBaseline/train.log | 448 |
1 files changed, 448 insertions, 0 deletions
diff --git a/records/track_10min_16mb/2026-03-17_NaiveBaseline/train.log b/records/track_10min_16mb/2026-03-17_NaiveBaseline/train.log
new file mode 100644
index 0000000..69b17b6
--- /dev/null
+++ b/records/track_10min_16mb/2026-03-17_NaiveBaseline/train.log
@@ -0,0 +1,448 @@
+W0318 14:37:59.159000 871689 site-packages/torch/distributed/run.py:852]
+W0318 14:37:59.159000 871689 site-packages/torch/distributed/run.py:852] *****************************************
+W0318 14:37:59.159000 871689 site-packages/torch/distributed/run.py:852] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
+W0318 14:37:59.159000 871689 site-packages/torch/distributed/run.py:852] *****************************************
+[W318 14:38:11.514156940 Utils.hpp:137] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
+[W318 14:38:11.543417305 Utils.hpp:137] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
+[W318 14:38:11.552597211 Utils.hpp:137] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
+NCCL version 2.27.5+cuda12.9
+[W318 14:38:11.832390267 Utils.hpp:137] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
+[W318 14:38:11.842257581 Utils.hpp:137] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
+[W318 14:38:11.842253680 Utils.hpp:137] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
+[W318 14:38:11.899166383 Utils.hpp:137] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
+[W318 14:38:11.901800020 Utils.hpp:137] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
+
+[2026-03-18 14:38:12] pgut1-0:871784:871848 [5] ibvwrap.c:94 NCCL WARN Call to ibv_open_device failed
+
+[2026-03-18 14:38:12] pgut1-0:871784:871848 [5] p2p_plugin.c:565 NCCL WARN NET/IB : Unable to open device mlx5_an0
+
+[2026-03-18 14:38:12] pgut1-0:871786:871849 [7] ibvwrap.c:94 NCCL WARN Call to ibv_open_device failed
+
+[2026-03-18 14:38:12] pgut1-0:871786:871849 [7] p2p_plugin.c:565 NCCL WARN NET/IB : Unable to open device mlx5_an0
+
+[2026-03-18 14:38:12] pgut1-0:871779:871850 [0] ibvwrap.c:94 NCCL WARN Call to ibv_open_device failed
+
+[2026-03-18 14:38:12] pgut1-0:871779:871850 [0] p2p_plugin.c:565 NCCL WARN NET/IB : Unable to open device mlx5_an0
+
+[2026-03-18 14:38:12] pgut1-0:871780:871857 [1] ibvwrap.c:94 NCCL WARN Call to ibv_open_device failed
+
+[2026-03-18 14:38:12] pgut1-0:871780:871857 [1] p2p_plugin.c:565 NCCL WARN NET/IB : Unable to open device mlx5_an0
+
+[2026-03-18 14:38:12] pgut1-0:871781:871858 [2] ibvwrap.c:94 NCCL WARN Call to ibv_open_device failed
+
+[2026-03-18 14:38:12] pgut1-0:871781:871858 [2] p2p_plugin.c:565 NCCL WARN NET/IB : Unable to open device mlx5_an0
+
+[2026-03-18 14:38:12] pgut1-0:871783:871859 [4] ibvwrap.c:94 NCCL WARN Call to ibv_open_device failed
+
+[2026-03-18 14:38:12] pgut1-0:871783:871859 [4] p2p_plugin.c:565 NCCL WARN NET/IB : Unable to open device mlx5_an0
+
+[2026-03-18 14:38:12] pgut1-0:871782:871864 [3] ibvwrap.c:94 NCCL WARN Call to ibv_open_device failed
+
+[2026-03-18 14:38:12] pgut1-0:871782:871864 [3] p2p_plugin.c:565 NCCL WARN NET/IB : Unable to open device mlx5_an0
+
+[2026-03-18 14:38:12] pgut1-0:871785:871865 [6] ibvwrap.c:94 NCCL WARN Call to ibv_open_device failed
+
+[2026-03-18 14:38:12] pgut1-0:871785:871865 [6] p2p_plugin.c:565 NCCL WARN NET/IB : Unable to open device mlx5_an0
+logs/hf_verify_sp1024_8gpu.txt
+val_bpb:enabled tokenizer_kind=sentencepiece tokenizer_path=/root/code/parameter-golf/data/tokenizers/fineweb_1024_bpe.model
+train_loader:dataset:fineweb10B_sp1024 train_shards:25
+val_loader:shards pattern=/root/code/parameter-golf/data/datasets/fineweb10B_sp1024/fineweb_val_*.bin tokens:63779840
+[rank0]:[W318 14:38:18.833454927 Utils.hpp:112] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
+model_params:17059912
+world_size:8 grad_accum_steps:1
+sdp_backends:cudnn=False flash=True mem_efficient=False math=False
+attention_mode:gqa num_heads:8 num_kv_heads:4
+tie_embeddings:True embed_lr:0.05 head_lr:0.0 matrix_lr:0.04 scalar_lr:0.04
+train_batch_tokens:524288 train_seq_len:1024 iterations:20000 warmup_steps:20 max_wallclock_seconds:600.000
+seed:1337
+[rank3]:[W318 14:38:18.835915381 Utils.hpp:112] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
+[rank7]:[W318 14:38:18.835951425 Utils.hpp:112] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
+[rank6]:[W318 14:38:18.835967008 Utils.hpp:112] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
+[rank2]:[W318 14:38:18.836023454 Utils.hpp:112] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
+[rank5]:[W318 14:38:18.836119632 Utils.hpp:112] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
+[rank4]:[W318 14:38:18.836127772 Utils.hpp:112] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
+[rank1]:[W318 14:38:18.836354967 Utils.hpp:112] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
+warmup_step:1/20
+warmup_step:2/20
+warmup_step:3/20
+warmup_step:4/20
+warmup_step:5/20
+warmup_step:6/20
+warmup_step:7/20
+warmup_step:8/20
+warmup_step:9/20
+warmup_step:10/20
+warmup_step:11/20
+warmup_step:12/20
+warmup_step:13/20
+warmup_step:14/20
+warmup_step:15/20
+warmup_step:16/20
+warmup_step:17/20
+warmup_step:18/20
+warmup_step:19/20
+warmup_step:20/20
+step:0/20000 val_loss:6.9370 val_bpb:4.0978 train_time:0ms step_avg:0.01ms
+step:1/20000 train_loss:6.9408 train_time:24ms step_avg:23.99ms
+step:2/20000 train_loss:16.8763 train_time:67ms step_avg:33.39ms
+step:3/20000 train_loss:9.0044 train_time:110ms step_avg:36.62ms
+step:4/20000 train_loss:6.5686 train_time:152ms step_avg:37.99ms
+step:5/20000 train_loss:6.6665 train_time:195ms step_avg:38.97ms
+step:6/20000 train_loss:6.5027 train_time:239ms step_avg:39.81ms
+step:7/20000 train_loss:6.2808 train_time:280ms step_avg:40.05ms
+step:8/20000 train_loss:5.9951 train_time:324ms step_avg:40.52ms
+step:9/20000 train_loss:6.0187 train_time:367ms step_avg:40.77ms
+step:10/20000 train_loss:5.9718 train_time:409ms step_avg:40.93ms
+step:50/20000 train_loss:3.9508 train_time:2126ms step_avg:42.52ms
+step:100/20000 train_loss:3.3373 train_time:4267ms step_avg:42.67ms
+step:150/20000 train_loss:2.9651 train_time:6414ms step_avg:42.76ms
+step:200/20000 train_loss:2.8041 train_time:8677ms step_avg:43.38ms
+step:200/20000 val_loss:2.8397 val_bpb:1.6774 train_time:8699ms step_avg:43.49ms
+step:250/20000 train_loss:2.7379 train_time:10816ms step_avg:43.27ms
+step:300/20000 train_loss:2.6613 train_time:12958ms step_avg:43.19ms
+step:350/20000 train_loss:2.6434 train_time:15097ms step_avg:43.13ms
+step:400/20000 train_loss:2.7684 train_time:17357ms step_avg:43.39ms
+step:400/20000 val_loss:2.5687 val_bpb:1.5174 train_time:17382ms step_avg:43.45ms
+step:450/20000 train_loss:2.6035 train_time:19502ms step_avg:43.34ms
+step:500/20000 train_loss:2.5265 train_time:21643ms step_avg:43.29ms
+step:550/20000 train_loss:2.4803 train_time:23782ms step_avg:43.24ms
+step:600/20000 train_loss:2.4731 train_time:26034ms step_avg:43.39ms
+step:600/20000 val_loss:2.4456 val_bpb:1.4447 train_time:26059ms step_avg:43.43ms
+step:650/20000 train_loss:2.3204 train_time:28175ms step_avg:43.35ms
+step:700/20000 train_loss:2.5926 train_time:30315ms step_avg:43.31ms
+step:750/20000 train_loss:2.4301 train_time:32457ms step_avg:43.28ms
+step:800/20000 train_loss:2.4775 train_time:34707ms step_avg:43.38ms
+step:800/20000 val_loss:2.3868 val_bpb:1.4099 train_time:34732ms step_avg:43.42ms
+step:850/20000 train_loss:2.3941 train_time:36851ms step_avg:43.35ms
+step:900/20000 train_loss:2.3716 train_time:38990ms step_avg:43.32ms
+step:950/20000 train_loss:2.3216 train_time:41131ms step_avg:43.30ms
+step:1000/20000 train_loss:2.3030 train_time:43390ms step_avg:43.39ms
+step:1000/20000 val_loss:2.3370 val_bpb:1.3805 train_time:43415ms step_avg:43.42ms
+step:1050/20000 train_loss:2.3893 train_time:45532ms step_avg:43.36ms
+step:1100/20000 train_loss:2.4145 train_time:47675ms step_avg:43.34ms
+step:1150/20000 train_loss:2.2261 train_time:49933ms step_avg:43.42ms
+step:1200/20000 train_loss:2.2607 train_time:52072ms step_avg:43.39ms
+step:1200/20000 val_loss:2.3026 val_bpb:1.3602 train_time:52097ms step_avg:43.41ms
+step:1250/20000 train_loss:2.3312 train_time:54219ms step_avg:43.38ms
+step:1300/20000 train_loss:2.3575 train_time:56363ms step_avg:43.36ms
+step:1350/20000 train_loss:2.2774 train_time:58628ms step_avg:43.43ms
+step:1400/20000 train_loss:2.2436 train_time:60772ms step_avg:43.41ms
+step:1400/20000 val_loss:2.2812 val_bpb:1.3475 train_time:60797ms step_avg:43.43ms
+step:1450/20000 train_loss:2.3006 train_time:62917ms step_avg:43.39ms
+step:1500/20000 train_loss:2.2831 train_time:65060ms step_avg:43.37ms
+step:1550/20000 train_loss:2.2957 train_time:67324ms step_avg:43.43ms
+step:1600/20000 train_loss:2.2187 train_time:69467ms step_avg:43.42ms
+step:1600/20000 val_loss:2.2631 val_bpb:1.3368 train_time:69491ms step_avg:43.43ms
+step:1650/20000 train_loss:2.2629 train_time:71614ms step_avg:43.40ms
+step:1700/20000 train_loss:2.2619 train_time:73759ms step_avg:43.39ms
+step:1750/20000 train_loss:2.1068 train_time:76028ms step_avg:43.44ms
+step:1800/20000 train_loss:2.3312 train_time:78171ms step_avg:43.43ms
+step:1800/20000 val_loss:2.2479 val_bpb:1.3279 train_time:78197ms step_avg:43.44ms
+step:1850/20000 train_loss:2.2211 train_time:80317ms step_avg:43.41ms
+step:1900/20000 train_loss:2.2477 train_time:82462ms step_avg:43.40ms
+step:1950/20000 train_loss:2.2707 train_time:84723ms step_avg:43.45ms
+step:2000/20000 train_loss:2.2346 train_time:86867ms step_avg:43.43ms
+step:2000/20000 val_loss:2.2368 val_bpb:1.3213 train_time:86892ms step_avg:43.45ms
+step:2050/20000 train_loss:2.0689 train_time:89013ms step_avg:43.42ms
+step:2100/20000 train_loss:2.3382 train_time:91276ms step_avg:43.46ms
+step:2150/20000 train_loss:2.1161 train_time:93418ms step_avg:43.45ms
+step:2200/20000 train_loss:2.2380 train_time:95565ms step_avg:43.44ms
+step:2200/20000 val_loss:2.2251 val_bpb:1.3144 train_time:95590ms step_avg:43.45ms
+step:2250/20000 train_loss:2.2362 train_time:97711ms step_avg:43.43ms
+step:2300/20000 train_loss:2.2390 train_time:99973ms step_avg:43.47ms
+step:2350/20000 train_loss:2.1494 train_time:102118ms step_avg:43.45ms
+step:2400/20000 train_loss:2.1004 train_time:104264ms step_avg:43.44ms
+step:2400/20000 val_loss:2.2158 val_bpb:1.3089 train_time:104288ms step_avg:43.45ms
+step:2450/20000 train_loss:2.2078 train_time:106409ms step_avg:43.43ms
+step:2500/20000 train_loss:2.2990 train_time:108679ms step_avg:43.47ms
+step:2550/20000 train_loss:2.3510 train_time:110825ms step_avg:43.46ms
+step:2600/20000 train_loss:2.1989 train_time:112969ms step_avg:43.45ms
+step:2600/20000 val_loss:2.2097 val_bpb:1.3053 train_time:112994ms step_avg:43.46ms
+step:2650/20000 train_loss:2.0953 train_time:115115ms step_avg:43.44ms
+step:2700/20000 train_loss:2.2119 train_time:117382ms step_avg:43.47ms
+step:2750/20000 train_loss:2.2833 train_time:119524ms step_avg:43.46ms
+step:2800/20000 train_loss:2.2056 train_time:121673ms step_avg:43.45ms
+step:2800/20000 val_loss:2.2011 val_bpb:1.3002 train_time:121697ms step_avg:43.46ms
+step:2850/20000 train_loss:2.1613 train_time:123815ms step_avg:43.44ms
+step:2900/20000 train_loss:2.2400 train_time:126078ms step_avg:43.48ms
+step:2950/20000 train_loss:2.2531 train_time:128222ms step_avg:43.47ms
+step:3000/20000 train_loss:2.1098 train_time:130368ms step_avg:43.46ms
+step:3000/20000 val_loss:2.1953 val_bpb:1.2968 train_time:130392ms step_avg:43.46ms
+step:3050/20000 train_loss:2.4246 train_time:132514ms step_avg:43.45ms
+step:3100/20000 train_loss:2.1884 train_time:134780ms step_avg:43.48ms
+step:3150/20000 train_loss:2.2749 train_time:136926ms step_avg:43.47ms
+step:3200/20000 train_loss:2.1492 train_time:139071ms step_avg:43.46ms
+step:3200/20000 val_loss:2.1881 val_bpb:1.2925 train_time:139096ms step_avg:43.47ms
+step:3250/20000 train_loss:2.1286 train_time:141341ms step_avg:43.49ms
+step:3300/20000 train_loss:2.1058 train_time:143485ms step_avg:43.48ms
+step:3350/20000 train_loss:2.2214 train_time:145628ms step_avg:43.47ms
+step:3400/20000 train_loss:2.2454 train_time:147773ms step_avg:43.46ms
+step:3400/20000 val_loss:2.1854 val_bpb:1.2909 train_time:147798ms step_avg:43.47ms
+step:3450/20000 train_loss:2.2601 train_time:150039ms step_avg:43.49ms
+step:3500/20000 train_loss:2.1183 train_time:152184ms step_avg:43.48ms
+step:3550/20000 train_loss:2.0846 train_time:154329ms step_avg:43.47ms
+step:3600/20000 train_loss:2.2507 train_time:156472ms step_avg:43.46ms
+step:3600/20000 val_loss:2.1784 val_bpb:1.2868 train_time:156496ms step_avg:43.47ms
+step:3650/20000 train_loss:2.1383 train_time:158738ms step_avg:43.49ms
+step:3700/20000 train_loss:2.2848 train_time:160882ms step_avg:43.48ms
+step:3750/20000 train_loss:2.1982 train_time:163029ms step_avg:43.47ms
+step:3800/20000 train_loss:2.1399 train_time:165176ms step_avg:43.47ms
+step:3800/20000 val_loss:2.1767 val_bpb:1.2858 train_time:165200ms step_avg:43.47ms
+step:3850/20000 train_loss:2.3361 train_time:167438ms step_avg:43.49ms
+step:3900/20000 train_loss:2.2756 train_time:169582ms step_avg:43.48ms
+step:3950/20000 train_loss:2.1261 train_time:171729ms step_avg:43.48ms
+step:4000/20000 train_loss:2.1437 train_time:173878ms step_avg:43.47ms
+step:4000/20000 val_loss:2.1718 val_bpb:1.2829 train_time:173903ms step_avg:43.48ms
+step:4050/20000 train_loss:2.1718 train_time:176147ms step_avg:43.49ms
+step:4100/20000 train_loss:2.1899 train_time:178291ms step_avg:43.49ms
+step:4150/20000 train_loss:2.1285 train_time:180438ms step_avg:43.48ms
+step:4200/20000 train_loss:2.0498 train_time:182707ms step_avg:43.50ms
+step:4200/20000 val_loss:2.1666 val_bpb:1.2798 train_time:182731ms step_avg:43.51ms
+step:4250/20000 train_loss:2.2487 train_time:184852ms step_avg:43.49ms
+step:4300/20000 train_loss:2.1979 train_time:186996ms step_avg:43.49ms
+step:4350/20000 train_loss:2.1314 train_time:189141ms step_avg:43.48ms
+step:4400/20000 train_loss:2.1727 train_time:191402ms step_avg:43.50ms
+step:4400/20000 val_loss:2.1625 val_bpb:1.2774 train_time:191427ms step_avg:43.51ms
+step:4450/20000 train_loss:2.1882 train_time:193549ms step_avg:43.49ms
+step:4500/20000 train_loss:2.0735 train_time:195696ms step_avg:43.49ms
+step:4550/20000 train_loss:2.1347 train_time:197840ms step_avg:43.48ms
+step:4600/20000 train_loss:2.1710 train_time:200091ms step_avg:43.50ms
+step:4600/20000 val_loss:2.1597 val_bpb:1.2757 train_time:200114ms step_avg:43.50ms
+step:4650/20000 train_loss:2.2563 train_time:202236ms step_avg:43.49ms
+step:4700/20000 train_loss:2.2077 train_time:204381ms step_avg:43.49ms
+step:4750/20000 train_loss:2.1328 train_time:206643ms step_avg:43.50ms
+step:4800/20000 train_loss:2.1473 train_time:208788ms step_avg:43.50ms
+step:4800/20000 val_loss:2.1579 val_bpb:1.2747 train_time:208812ms step_avg:43.50ms
+step:4850/20000 train_loss:2.2067 train_time:210933ms step_avg:43.49ms
+step:4900/20000 train_loss:2.1119 train_time:213078ms step_avg:43.49ms
+step:4950/20000 train_loss:2.0031 train_time:215339ms step_avg:43.50ms
+step:5000/20000 train_loss:2.1104 train_time:217483ms step_avg:43.50ms
+step:5000/20000 val_loss:2.1532 val_bpb:1.2719 train_time:217508ms step_avg:43.50ms
+step:5050/20000 train_loss:2.0232 train_time:219627ms step_avg:43.49ms
+step:5100/20000 train_loss:2.1995 train_time:221774ms step_avg:43.49ms
+step:5150/20000 train_loss:2.0709 train_time:224038ms step_avg:43.50ms
+step:5200/20000 train_loss:2.0972 train_time:226182ms step_avg:43.50ms
+step:5200/20000 val_loss:2.1501 val_bpb:1.2701 train_time:226207ms step_avg:43.50ms
+step:5250/20000 train_loss:2.1395 train_time:228330ms step_avg:43.49ms
+step:5300/20000 train_loss:2.0947 train_time:230476ms step_avg:43.49ms
+step:5350/20000 train_loss:2.0819 train_time:232740ms step_avg:43.50ms
+step:5400/20000 train_loss:2.2099 train_time:234884ms step_avg:43.50ms
+step:5400/20000 val_loss:2.1475 val_bpb:1.2685 train_time:234909ms step_avg:43.50ms
+step:5450/20000 train_loss:2.1314 train_time:237031ms step_avg:43.49ms
+step:5500/20000 train_loss:2.2057 train_time:239295ms step_avg:43.51ms
+step:5550/20000 train_loss:2.0856 train_time:241437ms step_avg:43.50ms
+step:5600/20000 train_loss:2.1448 train_time:243583ms step_avg:43.50ms
+step:5600/20000 val_loss:2.1455 val_bpb:1.2674 train_time:243608ms step_avg:43.50ms
+step:5650/20000 train_loss:2.0312 train_time:245730ms step_avg:43.49ms
+step:5700/20000 train_loss:2.1392 train_time:247996ms step_avg:43.51ms
+step:5750/20000 train_loss:2.0206 train_time:250140ms step_avg:43.50ms
+step:5800/20000 train_loss:2.2107 train_time:252283ms step_avg:43.50ms
+step:5800/20000 val_loss:2.1439 val_bpb:1.2664 train_time:252308ms step_avg:43.50ms
+step:5850/20000 train_loss:2.0973 train_time:254429ms step_avg:43.49ms
+step:5900/20000 train_loss:2.1270 train_time:256697ms step_avg:43.51ms
+step:5950/20000 train_loss:2.0899 train_time:258840ms step_avg:43.50ms
+step:6000/20000 train_loss:2.2182 train_time:260985ms step_avg:43.50ms
+step:6000/20000 val_loss:2.1445 val_bpb:1.2668 train_time:261009ms step_avg:43.50ms
+step:6050/20000 train_loss:2.1230 train_time:263130ms step_avg:43.49ms
+step:6100/20000 train_loss:2.1640 train_time:265401ms step_avg:43.51ms
+step:6150/20000 train_loss:2.1960 train_time:267547ms step_avg:43.50ms
+step:6200/20000 train_loss:2.1217 train_time:269692ms step_avg:43.50ms
+step:6200/20000 val_loss:2.1416 val_bpb:1.2651 train_time:269717ms step_avg:43.50ms
+step:6250/20000 train_loss:2.1106 train_time:271837ms step_avg:43.49ms
+step:6300/20000 train_loss:2.1989 train_time:274105ms step_avg:43.51ms
+step:6350/20000 train_loss:2.1738 train_time:276249ms step_avg:43.50ms
+step:6400/20000 train_loss:2.1333 train_time:278396ms step_avg:43.50ms
+step:6400/20000 val_loss:2.1377 val_bpb:1.2628 train_time:278421ms step_avg:43.50ms
+step:6450/20000 train_loss:1.9696 train_time:280544ms step_avg:43.50ms
+step:6500/20000 train_loss:2.1279 train_time:282815ms step_avg:43.51ms
+step:6550/20000 train_loss:2.2768 train_time:284958ms step_avg:43.51ms
+step:6600/20000 train_loss:2.1060 train_time:287102ms step_avg:43.50ms
+step:6600/20000 val_loss:2.1354 val_bpb:1.2614 train_time:287126ms step_avg:43.50ms
+step:6650/20000 train_loss:2.1036 train_time:289368ms step_avg:43.51ms
+step:6700/20000 train_loss:2.1438 train_time:291511ms step_avg:43.51ms
+step:6750/20000 train_loss:1.8938 train_time:293654ms step_avg:43.50ms
+step:6800/20000 train_loss:2.1809 train_time:295799ms step_avg:43.50ms
+step:6800/20000 val_loss:2.1342 val_bpb:1.2607 train_time:295824ms step_avg:43.50ms
+step:6850/20000 train_loss:2.0978 train_time:298068ms step_avg:43.51ms
+step:6900/20000 train_loss:2.1146 train_time:300210ms step_avg:43.51ms
+step:6950/20000 train_loss:2.1328 train_time:302354ms step_avg:43.50ms
+step:7000/20000 train_loss:2.1537 train_time:304499ms step_avg:43.50ms
+step:7000/20000 val_loss:2.1326 val_bpb:1.2598 train_time:304523ms step_avg:43.50ms
+step:7050/20000 train_loss:2.1382 train_time:306765ms step_avg:43.51ms
+step:7100/20000 train_loss:2.1078 train_time:308911ms step_avg:43.51ms
+step:7150/20000 train_loss:2.1952 train_time:311056ms step_avg:43.50ms
+step:7200/20000 train_loss:2.1143 train_time:313204ms step_avg:43.50ms
+step:7200/20000 val_loss:2.1299 val_bpb:1.2582 train_time:313228ms step_avg:43.50ms
+step:7250/20000 train_loss:2.1009 train_time:315469ms step_avg:43.51ms
+step:7300/20000 train_loss:2.1529 train_time:317612ms step_avg:43.51ms
+step:7350/20000 train_loss:2.1532 train_time:319759ms step_avg:43.50ms
+step:7400/20000 train_loss:2.1137 train_time:321901ms step_avg:43.50ms
+step:7400/20000 val_loss:2.1282 val_bpb:1.2572 train_time:321927ms step_avg:43.50ms
+step:7450/20000 train_loss:2.4067 train_time:324167ms step_avg:43.51ms
+step:7500/20000 train_loss:2.0751 train_time:326311ms step_avg:43.51ms
+step:7550/20000 train_loss:2.1258 train_time:328457ms step_avg:43.50ms
+step:7600/20000 train_loss:2.1723 train_time:330730ms step_avg:43.52ms
+step:7600/20000 val_loss:2.1289 val_bpb:1.2576 train_time:330754ms step_avg:43.52ms
+step:7650/20000 train_loss:2.2193 train_time:332878ms step_avg:43.51ms
+step:7700/20000 train_loss:2.1329 train_time:335023ms step_avg:43.51ms
+step:7750/20000 train_loss:2.0562 train_time:337169ms step_avg:43.51ms
+step:7800/20000 train_loss:2.1669 train_time:339436ms step_avg:43.52ms
+step:7800/20000 val_loss:2.1252 val_bpb:1.2554 train_time:339460ms step_avg:43.52ms
+step:7850/20000 train_loss:2.0994 train_time:341583ms step_avg:43.51ms
+step:7900/20000 train_loss:2.1585 train_time:343729ms step_avg:43.51ms
+step:7950/20000 train_loss:2.1319 train_time:345873ms step_avg:43.51ms
+step:8000/20000 train_loss:2.2613 train_time:348141ms step_avg:43.52ms
+step:8000/20000 val_loss:2.1232 val_bpb:1.2542 train_time:348165ms step_avg:43.52ms
+step:8050/20000 train_loss:2.1775 train_time:350287ms step_avg:43.51ms
+step:8100/20000 train_loss:1.9587 train_time:352431ms step_avg:43.51ms
+step:8150/20000 train_loss:2.0401 train_time:354575ms step_avg:43.51ms
+step:8200/20000 train_loss:2.1076 train_time:356845ms step_avg:43.52ms
+step:8200/20000 val_loss:2.1228 val_bpb:1.2540 train_time:356869ms step_avg:43.52ms
+step:8250/20000 train_loss:2.0951 train_time:358988ms step_avg:43.51ms
+step:8300/20000 train_loss:2.2244 train_time:361133ms step_avg:43.51ms
+step:8350/20000 train_loss:2.0681 train_time:363279ms step_avg:43.51ms
+step:8400/20000 train_loss:2.1494 train_time:365552ms step_avg:43.52ms
+step:8400/20000 val_loss:2.1201 val_bpb:1.2524 train_time:365577ms step_avg:43.52ms
+step:8450/20000 train_loss:2.1278 train_time:367698ms step_avg:43.51ms
+step:8500/20000 train_loss:2.0289 train_time:369845ms step_avg:43.51ms
+step:8550/20000 train_loss:2.0465 train_time:372114ms step_avg:43.52ms
+step:8600/20000 train_loss:2.0682 train_time:374259ms step_avg:43.52ms
+step:8600/20000 val_loss:2.1206 val_bpb:1.2526 train_time:374282ms step_avg:43.52ms
+step:8650/20000 train_loss:2.2717 train_time:376403ms step_avg:43.51ms
+step:8700/20000 train_loss:2.1795 train_time:378549ms step_avg:43.51ms
+step:8750/20000 train_loss:2.0492 train_time:380817ms step_avg:43.52ms
+step:8800/20000 train_loss:2.1100 train_time:382964ms step_avg:43.52ms
+step:8800/20000 val_loss:2.1192 val_bpb:1.2518 train_time:382989ms step_avg:43.52ms
+step:8850/20000 train_loss:2.4323 train_time:385110ms step_avg:43.52ms
+step:8900/20000 train_loss:2.1016 train_time:387258ms step_avg:43.51ms
+step:8950/20000 train_loss:2.0290 train_time:389530ms step_avg:43.52ms
+step:9000/20000 train_loss:2.1119 train_time:391675ms step_avg:43.52ms
+step:9000/20000 val_loss:2.1204 val_bpb:1.2525 train_time:391698ms step_avg:43.52ms
+step:9050/20000 train_loss:2.0826 train_time:393819ms step_avg:43.52ms
+step:9100/20000 train_loss:2.0427 train_time:395963ms step_avg:43.51ms
+step:9150/20000 train_loss:2.1201 train_time:398238ms step_avg:43.52ms
+step:9200/20000 train_loss:2.1490 train_time:400385ms step_avg:43.52ms
+step:9200/20000 val_loss:2.1170 val_bpb:1.2505 train_time:400409ms step_avg:43.52ms
+step:9250/20000 train_loss:2.1221 train_time:402534ms step_avg:43.52ms
+step:9300/20000 train_loss:2.4550 train_time:404680ms step_avg:43.51ms
+step:9350/20000 train_loss:2.0384 train_time:406932ms step_avg:43.52ms
+step:9400/20000 train_loss:2.0736 train_time:409077ms step_avg:43.52ms
+step:9400/20000 val_loss:2.1139 val_bpb:1.2487 train_time:409102ms step_avg:43.52ms
+step:9450/20000 train_loss:2.1096 train_time:411223ms step_avg:43.52ms
+step:9500/20000 train_loss:2.1070 train_time:413493ms step_avg:43.53ms
+step:9550/20000 train_loss:2.0249 train_time:415641ms step_avg:43.52ms
+step:9600/20000 train_loss:2.1141 train_time:417785ms step_avg:43.52ms
+step:9600/20000 val_loss:2.1138 val_bpb:1.2486 train_time:417809ms step_avg:43.52ms
+step:9650/20000 train_loss:2.0183 train_time:419932ms step_avg:43.52ms
+step:9700/20000 train_loss:2.1482 train_time:422212ms step_avg:43.53ms
+step:9750/20000 train_loss:2.1811 train_time:424359ms step_avg:43.52ms
+step:9800/20000 train_loss:2.1011 train_time:426503ms step_avg:43.52ms
+step:9800/20000 val_loss:2.1143 val_bpb:1.2489 train_time:426528ms step_avg:43.52ms
+step:9850/20000 train_loss:2.1134 train_time:428771ms step_avg:43.53ms
+step:9900/20000 train_loss:2.0497 train_time:430915ms step_avg:43.53ms
+step:9950/20000 train_loss:2.1989 train_time:433061ms step_avg:43.52ms
+step:10000/20000 train_loss:2.1982 train_time:435207ms step_avg:43.52ms
+step:10000/20000 val_loss:2.1122 val_bpb:1.2477 train_time:435232ms step_avg:43.52ms
+step:10050/20000 train_loss:2.0940 train_time:437485ms step_avg:43.53ms
+step:10100/20000 train_loss:2.1277 train_time:439630ms step_avg:43.53ms
+step:10150/20000 train_loss:2.0896 train_time:441773ms step_avg:43.52ms
+step:10200/20000 train_loss:2.0642 train_time:443918ms step_avg:43.52ms
+step:10200/20000 val_loss:2.1112 val_bpb:1.2471 train_time:443941ms step_avg:43.52ms
+step:10250/20000 train_loss:2.0627 train_time:446192ms step_avg:43.53ms
+step:10300/20000 train_loss:2.2191 train_time:448339ms step_avg:43.53ms
+step:10350/20000 train_loss:2.1354 train_time:450485ms step_avg:43.53ms
+step:10400/20000 train_loss:2.0705 train_time:452630ms step_avg:43.52ms
+step:10400/20000 val_loss:2.1098 val_bpb:1.2463 train_time:452654ms step_avg:43.52ms
+step:10450/20000 train_loss:2.0663 train_time:454900ms step_avg:43.53ms
+step:10500/20000 train_loss:2.1334 train_time:457046ms step_avg:43.53ms
+step:10550/20000 train_loss:2.1931 train_time:459192ms step_avg:43.53ms
+step:10600/20000 train_loss:2.0978 train_time:461337ms step_avg:43.52ms
+step:10600/20000 val_loss:2.1081 val_bpb:1.2453 train_time:461361ms step_avg:43.52ms
+step:10650/20000 train_loss:2.0676 train_time:463610ms step_avg:43.53ms
+step:10700/20000 train_loss:2.2333 train_time:465754ms step_avg:43.53ms
+step:10750/20000 train_loss:2.1661 train_time:467899ms step_avg:43.53ms
+step:10800/20000 train_loss:2.0966 train_time:470044ms step_avg:43.52ms
+step:10800/20000 val_loss:2.1081 val_bpb:1.2453 train_time:470069ms step_avg:43.52ms
+step:10850/20000 train_loss:2.0708 train_time:472323ms step_avg:43.53ms
+step:10900/20000 train_loss:2.1666 train_time:474468ms step_avg:43.53ms
+step:10950/20000 train_loss:2.1079 train_time:476615ms step_avg:43.53ms
+step:11000/20000 train_loss:2.0774 train_time:478893ms step_avg:43.54ms
+step:11000/20000 val_loss:2.1069 val_bpb:1.2446 train_time:478917ms step_avg:43.54ms
+step:11050/20000 train_loss:2.1288 train_time:481038ms step_avg:43.53ms
+step:11100/20000 train_loss:2.0801 train_time:483185ms step_avg:43.53ms
+step:11150/20000 train_loss:1.8743 train_time:485331ms step_avg:43.53ms
+step:11200/20000 train_loss:2.1471 train_time:487603ms step_avg:43.54ms
+step:11200/20000 val_loss:2.1080 val_bpb:1.2452 train_time:487627ms step_avg:43.54ms
+step:11250/20000 train_loss:2.2046 train_time:489748ms step_avg:43.53ms
+step:11300/20000 train_loss:2.0957 train_time:491892ms step_avg:43.53ms
+step:11350/20000 train_loss:2.0963 train_time:494038ms step_avg:43.53ms
+step:11400/20000 train_loss:2.3223 train_time:496318ms step_avg:43.54ms
+step:11400/20000 val_loss:2.1051 val_bpb:1.2435 train_time:496342ms step_avg:43.54ms
+step:11450/20000 train_loss:2.0724 train_time:498464ms step_avg:43.53ms
+step:11500/20000 train_loss:2.1197 train_time:500609ms step_avg:43.53ms
+step:11550/20000 train_loss:2.0975 train_time:502754ms step_avg:43.53ms
+step:11600/20000 train_loss:2.1091 train_time:505029ms step_avg:43.54ms
+step:11600/20000 val_loss:2.1054 val_bpb:1.2437 train_time:505053ms step_avg:43.54ms
+step:11650/20000 train_loss:2.1235 train_time:507175ms step_avg:43.53ms
+step:11700/20000 train_loss:2.0795 train_time:509324ms step_avg:43.53ms
+step:11750/20000 train_loss:2.0662 train_time:511469ms step_avg:43.53ms
+step:11800/20000 train_loss:2.0765 train_time:513742ms step_avg:43.54ms
+step:11800/20000 val_loss:2.1048 val_bpb:1.2433 train_time:513766ms step_avg:43.54ms
+step:11850/20000 train_loss:2.1202 train_time:515888ms step_avg:43.53ms
+step:11900/20000 train_loss:2.1029 train_time:518033ms step_avg:43.53ms
+step:11950/20000 train_loss:2.1512 train_time:520308ms step_avg:43.54ms
+step:12000/20000 train_loss:2.1814 train_time:522453ms step_avg:43.54ms
+step:12000/20000 val_loss:2.1029 val_bpb:1.2422 train_time:522477ms step_avg:43.54ms
+step:12050/20000 train_loss:2.1085 train_time:524601ms step_avg:43.54ms
+step:12100/20000 train_loss:2.0347 train_time:526747ms step_avg:43.53ms
+step:12150/20000 train_loss:2.0601 train_time:529018ms step_avg:43.54ms
+step:12200/20000 train_loss:2.0387 train_time:531162ms step_avg:43.54ms
+step:12200/20000 val_loss:2.1021 val_bpb:1.2418 train_time:531186ms step_avg:43.54ms
+step:12250/20000 train_loss:2.0381 train_time:533312ms step_avg:43.54ms
+step:12300/20000 train_loss:2.1302 train_time:535458ms step_avg:43.53ms
+step:12350/20000 train_loss:2.1272 train_time:537727ms step_avg:43.54ms
+step:12400/20000 train_loss:2.1828 train_time:539873ms step_avg:43.54ms
+step:12400/20000 val_loss:2.1001 val_bpb:1.2406 train_time:539897ms step_avg:43.54ms
+step:12450/20000 train_loss:2.1003 train_time:542019ms step_avg:43.54ms
+step:12500/20000 train_loss:2.0696 train_time:544164ms step_avg:43.53ms
+step:12550/20000 train_loss:2.1302 train_time:546436ms step_avg:43.54ms
+step:12600/20000 train_loss:2.0527 train_time:548582ms step_avg:43.54ms
+step:12600/20000 val_loss:2.0998 val_bpb:1.2404 train_time:548606ms step_avg:43.54ms
+step:12650/20000 train_loss:2.1438 train_time:550728ms step_avg:43.54ms
+step:12700/20000 train_loss:2.2689 train_time:552877ms step_avg:43.53ms
+step:12750/20000 train_loss:2.1438 train_time:555147ms step_avg:43.54ms
+step:12800/20000 train_loss:2.0105 train_time:557293ms step_avg:43.54ms
+step:12800/20000 val_loss:2.0930 val_bpb:1.2364 train_time:557317ms step_avg:43.54ms
+step:12850/20000 train_loss:2.0413 train_time:559440ms step_avg:43.54ms
+step:12900/20000 train_loss:2.0630 train_time:561586ms step_avg:43.53ms
+step:12950/20000 train_loss:2.1627 train_time:563863ms step_avg:43.54ms
+step:13000/20000 train_loss:1.9579 train_time:566009ms step_avg:43.54ms
+step:13000/20000 val_loss:2.0859 val_bpb:1.2322 train_time:566032ms step_avg:43.54ms
+step:13050/20000 train_loss:2.0206 train_time:568155ms step_avg:43.54ms
+step:13100/20000 train_loss:1.9294 train_time:570432ms step_avg:43.54ms
+step:13150/20000 train_loss:2.0689 train_time:572576ms step_avg:43.54ms
+step:13200/20000 train_loss:2.0074 train_time:574722ms step_avg:43.54ms
+step:13200/20000 val_loss:2.0790 val_bpb:1.2281 train_time:574747ms step_avg:43.54ms
+step:13250/20000 train_loss:2.0596 train_time:576871ms step_avg:43.54ms
+step:13300/20000 train_loss:1.9474 train_time:579143ms step_avg:43.54ms
+step:13350/20000 train_loss:2.0459 train_time:581289ms step_avg:43.54ms
+step:13400/20000 train_loss:2.0441 train_time:583434ms step_avg:43.54ms
+step:13400/20000 val_loss:2.0718 val_bpb:1.2239 train_time:583458ms step_avg:43.54ms
+step:13450/20000 train_loss:2.1638 train_time:585582ms step_avg:43.54ms
+step:13500/20000 train_loss:2.1216 train_time:587857ms step_avg:43.54ms
+step:13550/20000 train_loss:2.1855 train_time:590003ms step_avg:43.54ms
+step:13600/20000 train_loss:2.0234 train_time:592147ms step_avg:43.54ms
+step:13600/20000 val_loss:2.0649 val_bpb:1.2197 train_time:592172ms step_avg:43.54ms
+step:13650/20000 train_loss:2.0316 train_time:594295ms step_avg:43.54ms
+step:13700/20000 train_loss:2.0323 train_time:596577ms step_avg:43.55ms
+step:13750/20000 train_loss:1.9910 train_time:598726ms step_avg:43.54ms
+step:13780/20000 val_loss:2.0606 val_bpb:1.2172 train_time:600038ms step_avg:43.54ms
+stopping_early: wallclock_cap train_time:600038ms step:13780/20000
+peak memory allocated: 10184 MiB reserved: 10200 MiB
+Serialized model: 67224983 bytes
+Code size: 47642 bytes
+Total submission size: 67272625 bytes
+Serialized model int8+zlib: 15815847 bytes (payload:17178912 raw_torch:17224025 payload_ratio:3.91x)
+Total submission size int8+zlib: 15863489 bytes
+final_int8_zlib_roundtrip val_loss:2.0727 val_bpb:1.2244 eval_time:1401ms
+final_int8_zlib_roundtrip_exact val_loss:2.07269931 val_bpb:1.22436570
