path: root/records/track_10min_16mb/2026-03-17_NaiveBaseline/train.log
blob: 69b17b6c7fcfd141c496c749522d4f04af6d1908 (plain)
W0318 14:37:59.159000 871689 site-packages/torch/distributed/run.py:852] 
W0318 14:37:59.159000 871689 site-packages/torch/distributed/run.py:852] *****************************************
W0318 14:37:59.159000 871689 site-packages/torch/distributed/run.py:852] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
W0318 14:37:59.159000 871689 site-packages/torch/distributed/run.py:852] *****************************************
[W318 14:38:11.514156940 Utils.hpp:137] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
[W318 14:38:11.543417305 Utils.hpp:137] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
[W318 14:38:11.552597211 Utils.hpp:137] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
NCCL version 2.27.5+cuda12.9
[W318 14:38:11.832390267 Utils.hpp:137] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
[W318 14:38:11.842257581 Utils.hpp:137] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
[W318 14:38:11.842253680 Utils.hpp:137] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
[W318 14:38:11.899166383 Utils.hpp:137] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
[W318 14:38:11.901800020 Utils.hpp:137] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())

[2026-03-18 14:38:12] pgut1-0:871784:871848 [5] ibvwrap.c:94 NCCL WARN Call to ibv_open_device failed

[2026-03-18 14:38:12] pgut1-0:871784:871848 [5] p2p_plugin.c:565 NCCL WARN NET/IB : Unable to open device mlx5_an0

[2026-03-18 14:38:12] pgut1-0:871786:871849 [7] ibvwrap.c:94 NCCL WARN Call to ibv_open_device failed

[2026-03-18 14:38:12] pgut1-0:871786:871849 [7] p2p_plugin.c:565 NCCL WARN NET/IB : Unable to open device mlx5_an0

[2026-03-18 14:38:12] pgut1-0:871779:871850 [0] ibvwrap.c:94 NCCL WARN Call to ibv_open_device failed

[2026-03-18 14:38:12] pgut1-0:871779:871850 [0] p2p_plugin.c:565 NCCL WARN NET/IB : Unable to open device mlx5_an0

[2026-03-18 14:38:12] pgut1-0:871780:871857 [1] ibvwrap.c:94 NCCL WARN Call to ibv_open_device failed

[2026-03-18 14:38:12] pgut1-0:871780:871857 [1] p2p_plugin.c:565 NCCL WARN NET/IB : Unable to open device mlx5_an0

[2026-03-18 14:38:12] pgut1-0:871781:871858 [2] ibvwrap.c:94 NCCL WARN Call to ibv_open_device failed

[2026-03-18 14:38:12] pgut1-0:871781:871858 [2] p2p_plugin.c:565 NCCL WARN NET/IB : Unable to open device mlx5_an0

[2026-03-18 14:38:12] pgut1-0:871783:871859 [4] ibvwrap.c:94 NCCL WARN Call to ibv_open_device failed

[2026-03-18 14:38:12] pgut1-0:871783:871859 [4] p2p_plugin.c:565 NCCL WARN NET/IB : Unable to open device mlx5_an0

[2026-03-18 14:38:12] pgut1-0:871782:871864 [3] ibvwrap.c:94 NCCL WARN Call to ibv_open_device failed

[2026-03-18 14:38:12] pgut1-0:871782:871864 [3] p2p_plugin.c:565 NCCL WARN NET/IB : Unable to open device mlx5_an0

[2026-03-18 14:38:12] pgut1-0:871785:871865 [6] ibvwrap.c:94 NCCL WARN Call to ibv_open_device failed

[2026-03-18 14:38:12] pgut1-0:871785:871865 [6] p2p_plugin.c:565 NCCL WARN NET/IB : Unable to open device mlx5_an0
logs/hf_verify_sp1024_8gpu.txt
val_bpb:enabled tokenizer_kind=sentencepiece tokenizer_path=/root/code/parameter-golf/data/tokenizers/fineweb_1024_bpe.model
train_loader:dataset:fineweb10B_sp1024 train_shards:25
val_loader:shards pattern=/root/code/parameter-golf/data/datasets/fineweb10B_sp1024/fineweb_val_*.bin tokens:63779840
[rank0]:[W318 14:38:18.833454927 Utils.hpp:112] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
model_params:17059912
world_size:8 grad_accum_steps:1
sdp_backends:cudnn=False flash=True mem_efficient=False math=False
attention_mode:gqa num_heads:8 num_kv_heads:4
tie_embeddings:True embed_lr:0.05 head_lr:0.0 matrix_lr:0.04 scalar_lr:0.04
train_batch_tokens:524288 train_seq_len:1024 iterations:20000 warmup_steps:20 max_wallclock_seconds:600.000
seed:1337
[rank3]:[W318 14:38:18.835915381 Utils.hpp:112] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
[rank7]:[W318 14:38:18.835951425 Utils.hpp:112] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
[rank6]:[W318 14:38:18.835967008 Utils.hpp:112] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
[rank2]:[W318 14:38:18.836023454 Utils.hpp:112] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
[rank5]:[W318 14:38:18.836119632 Utils.hpp:112] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
[rank4]:[W318 14:38:18.836127772 Utils.hpp:112] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
[rank1]:[W318 14:38:18.836354967 Utils.hpp:112] Warning: Environment variable NCCL_ASYNC_ERROR_HANDLING is deprecated; use TORCH_NCCL_ASYNC_ERROR_HANDLING instead (function operator())
warmup_step:1/20
warmup_step:2/20
warmup_step:3/20
warmup_step:4/20
warmup_step:5/20
warmup_step:6/20
warmup_step:7/20
warmup_step:8/20
warmup_step:9/20
warmup_step:10/20
warmup_step:11/20
warmup_step:12/20
warmup_step:13/20
warmup_step:14/20
warmup_step:15/20
warmup_step:16/20
warmup_step:17/20
warmup_step:18/20
warmup_step:19/20
warmup_step:20/20
step:0/20000 val_loss:6.9370 val_bpb:4.0978 train_time:0ms step_avg:0.01ms
step:1/20000 train_loss:6.9408 train_time:24ms step_avg:23.99ms
step:2/20000 train_loss:16.8763 train_time:67ms step_avg:33.39ms
step:3/20000 train_loss:9.0044 train_time:110ms step_avg:36.62ms
step:4/20000 train_loss:6.5686 train_time:152ms step_avg:37.99ms
step:5/20000 train_loss:6.6665 train_time:195ms step_avg:38.97ms
step:6/20000 train_loss:6.5027 train_time:239ms step_avg:39.81ms
step:7/20000 train_loss:6.2808 train_time:280ms step_avg:40.05ms
step:8/20000 train_loss:5.9951 train_time:324ms step_avg:40.52ms
step:9/20000 train_loss:6.0187 train_time:367ms step_avg:40.77ms
step:10/20000 train_loss:5.9718 train_time:409ms step_avg:40.93ms
step:50/20000 train_loss:3.9508 train_time:2126ms step_avg:42.52ms
step:100/20000 train_loss:3.3373 train_time:4267ms step_avg:42.67ms
step:150/20000 train_loss:2.9651 train_time:6414ms step_avg:42.76ms
step:200/20000 train_loss:2.8041 train_time:8677ms step_avg:43.38ms
step:200/20000 val_loss:2.8397 val_bpb:1.6774 train_time:8699ms step_avg:43.49ms
step:250/20000 train_loss:2.7379 train_time:10816ms step_avg:43.27ms
step:300/20000 train_loss:2.6613 train_time:12958ms step_avg:43.19ms
step:350/20000 train_loss:2.6434 train_time:15097ms step_avg:43.13ms
step:400/20000 train_loss:2.7684 train_time:17357ms step_avg:43.39ms
step:400/20000 val_loss:2.5687 val_bpb:1.5174 train_time:17382ms step_avg:43.45ms
step:450/20000 train_loss:2.6035 train_time:19502ms step_avg:43.34ms
step:500/20000 train_loss:2.5265 train_time:21643ms step_avg:43.29ms
step:550/20000 train_loss:2.4803 train_time:23782ms step_avg:43.24ms
step:600/20000 train_loss:2.4731 train_time:26034ms step_avg:43.39ms
step:600/20000 val_loss:2.4456 val_bpb:1.4447 train_time:26059ms step_avg:43.43ms
step:650/20000 train_loss:2.3204 train_time:28175ms step_avg:43.35ms
step:700/20000 train_loss:2.5926 train_time:30315ms step_avg:43.31ms
step:750/20000 train_loss:2.4301 train_time:32457ms step_avg:43.28ms
step:800/20000 train_loss:2.4775 train_time:34707ms step_avg:43.38ms
step:800/20000 val_loss:2.3868 val_bpb:1.4099 train_time:34732ms step_avg:43.42ms
step:850/20000 train_loss:2.3941 train_time:36851ms step_avg:43.35ms
step:900/20000 train_loss:2.3716 train_time:38990ms step_avg:43.32ms
step:950/20000 train_loss:2.3216 train_time:41131ms step_avg:43.30ms
step:1000/20000 train_loss:2.3030 train_time:43390ms step_avg:43.39ms
step:1000/20000 val_loss:2.3370 val_bpb:1.3805 train_time:43415ms step_avg:43.42ms
step:1050/20000 train_loss:2.3893 train_time:45532ms step_avg:43.36ms
step:1100/20000 train_loss:2.4145 train_time:47675ms step_avg:43.34ms
step:1150/20000 train_loss:2.2261 train_time:49933ms step_avg:43.42ms
step:1200/20000 train_loss:2.2607 train_time:52072ms step_avg:43.39ms
step:1200/20000 val_loss:2.3026 val_bpb:1.3602 train_time:52097ms step_avg:43.41ms
step:1250/20000 train_loss:2.3312 train_time:54219ms step_avg:43.38ms
step:1300/20000 train_loss:2.3575 train_time:56363ms step_avg:43.36ms
step:1350/20000 train_loss:2.2774 train_time:58628ms step_avg:43.43ms
step:1400/20000 train_loss:2.2436 train_time:60772ms step_avg:43.41ms
step:1400/20000 val_loss:2.2812 val_bpb:1.3475 train_time:60797ms step_avg:43.43ms
step:1450/20000 train_loss:2.3006 train_time:62917ms step_avg:43.39ms
step:1500/20000 train_loss:2.2831 train_time:65060ms step_avg:43.37ms
step:1550/20000 train_loss:2.2957 train_time:67324ms step_avg:43.43ms
step:1600/20000 train_loss:2.2187 train_time:69467ms step_avg:43.42ms
step:1600/20000 val_loss:2.2631 val_bpb:1.3368 train_time:69491ms step_avg:43.43ms
step:1650/20000 train_loss:2.2629 train_time:71614ms step_avg:43.40ms
step:1700/20000 train_loss:2.2619 train_time:73759ms step_avg:43.39ms
step:1750/20000 train_loss:2.1068 train_time:76028ms step_avg:43.44ms
step:1800/20000 train_loss:2.3312 train_time:78171ms step_avg:43.43ms
step:1800/20000 val_loss:2.2479 val_bpb:1.3279 train_time:78197ms step_avg:43.44ms
step:1850/20000 train_loss:2.2211 train_time:80317ms step_avg:43.41ms
step:1900/20000 train_loss:2.2477 train_time:82462ms step_avg:43.40ms
step:1950/20000 train_loss:2.2707 train_time:84723ms step_avg:43.45ms
step:2000/20000 train_loss:2.2346 train_time:86867ms step_avg:43.43ms
step:2000/20000 val_loss:2.2368 val_bpb:1.3213 train_time:86892ms step_avg:43.45ms
step:2050/20000 train_loss:2.0689 train_time:89013ms step_avg:43.42ms
step:2100/20000 train_loss:2.3382 train_time:91276ms step_avg:43.46ms
step:2150/20000 train_loss:2.1161 train_time:93418ms step_avg:43.45ms
step:2200/20000 train_loss:2.2380 train_time:95565ms step_avg:43.44ms
step:2200/20000 val_loss:2.2251 val_bpb:1.3144 train_time:95590ms step_avg:43.45ms
step:2250/20000 train_loss:2.2362 train_time:97711ms step_avg:43.43ms
step:2300/20000 train_loss:2.2390 train_time:99973ms step_avg:43.47ms
step:2350/20000 train_loss:2.1494 train_time:102118ms step_avg:43.45ms
step:2400/20000 train_loss:2.1004 train_time:104264ms step_avg:43.44ms
step:2400/20000 val_loss:2.2158 val_bpb:1.3089 train_time:104288ms step_avg:43.45ms
step:2450/20000 train_loss:2.2078 train_time:106409ms step_avg:43.43ms
step:2500/20000 train_loss:2.2990 train_time:108679ms step_avg:43.47ms
step:2550/20000 train_loss:2.3510 train_time:110825ms step_avg:43.46ms
step:2600/20000 train_loss:2.1989 train_time:112969ms step_avg:43.45ms
step:2600/20000 val_loss:2.2097 val_bpb:1.3053 train_time:112994ms step_avg:43.46ms
step:2650/20000 train_loss:2.0953 train_time:115115ms step_avg:43.44ms
step:2700/20000 train_loss:2.2119 train_time:117382ms step_avg:43.47ms
step:2750/20000 train_loss:2.2833 train_time:119524ms step_avg:43.46ms
step:2800/20000 train_loss:2.2056 train_time:121673ms step_avg:43.45ms
step:2800/20000 val_loss:2.2011 val_bpb:1.3002 train_time:121697ms step_avg:43.46ms
step:2850/20000 train_loss:2.1613 train_time:123815ms step_avg:43.44ms
step:2900/20000 train_loss:2.2400 train_time:126078ms step_avg:43.48ms
step:2950/20000 train_loss:2.2531 train_time:128222ms step_avg:43.47ms
step:3000/20000 train_loss:2.1098 train_time:130368ms step_avg:43.46ms
step:3000/20000 val_loss:2.1953 val_bpb:1.2968 train_time:130392ms step_avg:43.46ms
step:3050/20000 train_loss:2.4246 train_time:132514ms step_avg:43.45ms
step:3100/20000 train_loss:2.1884 train_time:134780ms step_avg:43.48ms
step:3150/20000 train_loss:2.2749 train_time:136926ms step_avg:43.47ms
step:3200/20000 train_loss:2.1492 train_time:139071ms step_avg:43.46ms
step:3200/20000 val_loss:2.1881 val_bpb:1.2925 train_time:139096ms step_avg:43.47ms
step:3250/20000 train_loss:2.1286 train_time:141341ms step_avg:43.49ms
step:3300/20000 train_loss:2.1058 train_time:143485ms step_avg:43.48ms
step:3350/20000 train_loss:2.2214 train_time:145628ms step_avg:43.47ms
step:3400/20000 train_loss:2.2454 train_time:147773ms step_avg:43.46ms
step:3400/20000 val_loss:2.1854 val_bpb:1.2909 train_time:147798ms step_avg:43.47ms
step:3450/20000 train_loss:2.2601 train_time:150039ms step_avg:43.49ms
step:3500/20000 train_loss:2.1183 train_time:152184ms step_avg:43.48ms
step:3550/20000 train_loss:2.0846 train_time:154329ms step_avg:43.47ms
step:3600/20000 train_loss:2.2507 train_time:156472ms step_avg:43.46ms
step:3600/20000 val_loss:2.1784 val_bpb:1.2868 train_time:156496ms step_avg:43.47ms
step:3650/20000 train_loss:2.1383 train_time:158738ms step_avg:43.49ms
step:3700/20000 train_loss:2.2848 train_time:160882ms step_avg:43.48ms
step:3750/20000 train_loss:2.1982 train_time:163029ms step_avg:43.47ms
step:3800/20000 train_loss:2.1399 train_time:165176ms step_avg:43.47ms
step:3800/20000 val_loss:2.1767 val_bpb:1.2858 train_time:165200ms step_avg:43.47ms
step:3850/20000 train_loss:2.3361 train_time:167438ms step_avg:43.49ms
step:3900/20000 train_loss:2.2756 train_time:169582ms step_avg:43.48ms
step:3950/20000 train_loss:2.1261 train_time:171729ms step_avg:43.48ms
step:4000/20000 train_loss:2.1437 train_time:173878ms step_avg:43.47ms
step:4000/20000 val_loss:2.1718 val_bpb:1.2829 train_time:173903ms step_avg:43.48ms
step:4050/20000 train_loss:2.1718 train_time:176147ms step_avg:43.49ms
step:4100/20000 train_loss:2.1899 train_time:178291ms step_avg:43.49ms
step:4150/20000 train_loss:2.1285 train_time:180438ms step_avg:43.48ms
step:4200/20000 train_loss:2.0498 train_time:182707ms step_avg:43.50ms
step:4200/20000 val_loss:2.1666 val_bpb:1.2798 train_time:182731ms step_avg:43.51ms
step:4250/20000 train_loss:2.2487 train_time:184852ms step_avg:43.49ms
step:4300/20000 train_loss:2.1979 train_time:186996ms step_avg:43.49ms
step:4350/20000 train_loss:2.1314 train_time:189141ms step_avg:43.48ms
step:4400/20000 train_loss:2.1727 train_time:191402ms step_avg:43.50ms
step:4400/20000 val_loss:2.1625 val_bpb:1.2774 train_time:191427ms step_avg:43.51ms
step:4450/20000 train_loss:2.1882 train_time:193549ms step_avg:43.49ms
step:4500/20000 train_loss:2.0735 train_time:195696ms step_avg:43.49ms
step:4550/20000 train_loss:2.1347 train_time:197840ms step_avg:43.48ms
step:4600/20000 train_loss:2.1710 train_time:200091ms step_avg:43.50ms
step:4600/20000 val_loss:2.1597 val_bpb:1.2757 train_time:200114ms step_avg:43.50ms
step:4650/20000 train_loss:2.2563 train_time:202236ms step_avg:43.49ms
step:4700/20000 train_loss:2.2077 train_time:204381ms step_avg:43.49ms
step:4750/20000 train_loss:2.1328 train_time:206643ms step_avg:43.50ms
step:4800/20000 train_loss:2.1473 train_time:208788ms step_avg:43.50ms
step:4800/20000 val_loss:2.1579 val_bpb:1.2747 train_time:208812ms step_avg:43.50ms
step:4850/20000 train_loss:2.2067 train_time:210933ms step_avg:43.49ms
step:4900/20000 train_loss:2.1119 train_time:213078ms step_avg:43.49ms
step:4950/20000 train_loss:2.0031 train_time:215339ms step_avg:43.50ms
step:5000/20000 train_loss:2.1104 train_time:217483ms step_avg:43.50ms
step:5000/20000 val_loss:2.1532 val_bpb:1.2719 train_time:217508ms step_avg:43.50ms
step:5050/20000 train_loss:2.0232 train_time:219627ms step_avg:43.49ms
step:5100/20000 train_loss:2.1995 train_time:221774ms step_avg:43.49ms
step:5150/20000 train_loss:2.0709 train_time:224038ms step_avg:43.50ms
step:5200/20000 train_loss:2.0972 train_time:226182ms step_avg:43.50ms
step:5200/20000 val_loss:2.1501 val_bpb:1.2701 train_time:226207ms step_avg:43.50ms
step:5250/20000 train_loss:2.1395 train_time:228330ms step_avg:43.49ms
step:5300/20000 train_loss:2.0947 train_time:230476ms step_avg:43.49ms
step:5350/20000 train_loss:2.0819 train_time:232740ms step_avg:43.50ms
step:5400/20000 train_loss:2.2099 train_time:234884ms step_avg:43.50ms
step:5400/20000 val_loss:2.1475 val_bpb:1.2685 train_time:234909ms step_avg:43.50ms
step:5450/20000 train_loss:2.1314 train_time:237031ms step_avg:43.49ms
step:5500/20000 train_loss:2.2057 train_time:239295ms step_avg:43.51ms
step:5550/20000 train_loss:2.0856 train_time:241437ms step_avg:43.50ms
step:5600/20000 train_loss:2.1448 train_time:243583ms step_avg:43.50ms
step:5600/20000 val_loss:2.1455 val_bpb:1.2674 train_time:243608ms step_avg:43.50ms
step:5650/20000 train_loss:2.0312 train_time:245730ms step_avg:43.49ms
step:5700/20000 train_loss:2.1392 train_time:247996ms step_avg:43.51ms
step:5750/20000 train_loss:2.0206 train_time:250140ms step_avg:43.50ms
step:5800/20000 train_loss:2.2107 train_time:252283ms step_avg:43.50ms
step:5800/20000 val_loss:2.1439 val_bpb:1.2664 train_time:252308ms step_avg:43.50ms
step:5850/20000 train_loss:2.0973 train_time:254429ms step_avg:43.49ms
step:5900/20000 train_loss:2.1270 train_time:256697ms step_avg:43.51ms
step:5950/20000 train_loss:2.0899 train_time:258840ms step_avg:43.50ms
step:6000/20000 train_loss:2.2182 train_time:260985ms step_avg:43.50ms
step:6000/20000 val_loss:2.1445 val_bpb:1.2668 train_time:261009ms step_avg:43.50ms
step:6050/20000 train_loss:2.1230 train_time:263130ms step_avg:43.49ms
step:6100/20000 train_loss:2.1640 train_time:265401ms step_avg:43.51ms
step:6150/20000 train_loss:2.1960 train_time:267547ms step_avg:43.50ms
step:6200/20000 train_loss:2.1217 train_time:269692ms step_avg:43.50ms
step:6200/20000 val_loss:2.1416 val_bpb:1.2651 train_time:269717ms step_avg:43.50ms
step:6250/20000 train_loss:2.1106 train_time:271837ms step_avg:43.49ms
step:6300/20000 train_loss:2.1989 train_time:274105ms step_avg:43.51ms
step:6350/20000 train_loss:2.1738 train_time:276249ms step_avg:43.50ms
step:6400/20000 train_loss:2.1333 train_time:278396ms step_avg:43.50ms
step:6400/20000 val_loss:2.1377 val_bpb:1.2628 train_time:278421ms step_avg:43.50ms
step:6450/20000 train_loss:1.9696 train_time:280544ms step_avg:43.50ms
step:6500/20000 train_loss:2.1279 train_time:282815ms step_avg:43.51ms
step:6550/20000 train_loss:2.2768 train_time:284958ms step_avg:43.51ms
step:6600/20000 train_loss:2.1060 train_time:287102ms step_avg:43.50ms
step:6600/20000 val_loss:2.1354 val_bpb:1.2614 train_time:287126ms step_avg:43.50ms
step:6650/20000 train_loss:2.1036 train_time:289368ms step_avg:43.51ms
step:6700/20000 train_loss:2.1438 train_time:291511ms step_avg:43.51ms
step:6750/20000 train_loss:1.8938 train_time:293654ms step_avg:43.50ms
step:6800/20000 train_loss:2.1809 train_time:295799ms step_avg:43.50ms
step:6800/20000 val_loss:2.1342 val_bpb:1.2607 train_time:295824ms step_avg:43.50ms
step:6850/20000 train_loss:2.0978 train_time:298068ms step_avg:43.51ms
step:6900/20000 train_loss:2.1146 train_time:300210ms step_avg:43.51ms
step:6950/20000 train_loss:2.1328 train_time:302354ms step_avg:43.50ms
step:7000/20000 train_loss:2.1537 train_time:304499ms step_avg:43.50ms
step:7000/20000 val_loss:2.1326 val_bpb:1.2598 train_time:304523ms step_avg:43.50ms
step:7050/20000 train_loss:2.1382 train_time:306765ms step_avg:43.51ms
step:7100/20000 train_loss:2.1078 train_time:308911ms step_avg:43.51ms
step:7150/20000 train_loss:2.1952 train_time:311056ms step_avg:43.50ms
step:7200/20000 train_loss:2.1143 train_time:313204ms step_avg:43.50ms
step:7200/20000 val_loss:2.1299 val_bpb:1.2582 train_time:313228ms step_avg:43.50ms
step:7250/20000 train_loss:2.1009 train_time:315469ms step_avg:43.51ms
step:7300/20000 train_loss:2.1529 train_time:317612ms step_avg:43.51ms
step:7350/20000 train_loss:2.1532 train_time:319759ms step_avg:43.50ms
step:7400/20000 train_loss:2.1137 train_time:321901ms step_avg:43.50ms
step:7400/20000 val_loss:2.1282 val_bpb:1.2572 train_time:321927ms step_avg:43.50ms
step:7450/20000 train_loss:2.4067 train_time:324167ms step_avg:43.51ms
step:7500/20000 train_loss:2.0751 train_time:326311ms step_avg:43.51ms
step:7550/20000 train_loss:2.1258 train_time:328457ms step_avg:43.50ms
step:7600/20000 train_loss:2.1723 train_time:330730ms step_avg:43.52ms
step:7600/20000 val_loss:2.1289 val_bpb:1.2576 train_time:330754ms step_avg:43.52ms
step:7650/20000 train_loss:2.2193 train_time:332878ms step_avg:43.51ms
step:7700/20000 train_loss:2.1329 train_time:335023ms step_avg:43.51ms
step:7750/20000 train_loss:2.0562 train_time:337169ms step_avg:43.51ms
step:7800/20000 train_loss:2.1669 train_time:339436ms step_avg:43.52ms
step:7800/20000 val_loss:2.1252 val_bpb:1.2554 train_time:339460ms step_avg:43.52ms
step:7850/20000 train_loss:2.0994 train_time:341583ms step_avg:43.51ms
step:7900/20000 train_loss:2.1585 train_time:343729ms step_avg:43.51ms
step:7950/20000 train_loss:2.1319 train_time:345873ms step_avg:43.51ms
step:8000/20000 train_loss:2.2613 train_time:348141ms step_avg:43.52ms
step:8000/20000 val_loss:2.1232 val_bpb:1.2542 train_time:348165ms step_avg:43.52ms
step:8050/20000 train_loss:2.1775 train_time:350287ms step_avg:43.51ms
step:8100/20000 train_loss:1.9587 train_time:352431ms step_avg:43.51ms
step:8150/20000 train_loss:2.0401 train_time:354575ms step_avg:43.51ms
step:8200/20000 train_loss:2.1076 train_time:356845ms step_avg:43.52ms
step:8200/20000 val_loss:2.1228 val_bpb:1.2540 train_time:356869ms step_avg:43.52ms
step:8250/20000 train_loss:2.0951 train_time:358988ms step_avg:43.51ms
step:8300/20000 train_loss:2.2244 train_time:361133ms step_avg:43.51ms
step:8350/20000 train_loss:2.0681 train_time:363279ms step_avg:43.51ms
step:8400/20000 train_loss:2.1494 train_time:365552ms step_avg:43.52ms
step:8400/20000 val_loss:2.1201 val_bpb:1.2524 train_time:365577ms step_avg:43.52ms
step:8450/20000 train_loss:2.1278 train_time:367698ms step_avg:43.51ms
step:8500/20000 train_loss:2.0289 train_time:369845ms step_avg:43.51ms
step:8550/20000 train_loss:2.0465 train_time:372114ms step_avg:43.52ms
step:8600/20000 train_loss:2.0682 train_time:374259ms step_avg:43.52ms
step:8600/20000 val_loss:2.1206 val_bpb:1.2526 train_time:374282ms step_avg:43.52ms
step:8650/20000 train_loss:2.2717 train_time:376403ms step_avg:43.51ms
step:8700/20000 train_loss:2.1795 train_time:378549ms step_avg:43.51ms
step:8750/20000 train_loss:2.0492 train_time:380817ms step_avg:43.52ms
step:8800/20000 train_loss:2.1100 train_time:382964ms step_avg:43.52ms
step:8800/20000 val_loss:2.1192 val_bpb:1.2518 train_time:382989ms step_avg:43.52ms
step:8850/20000 train_loss:2.4323 train_time:385110ms step_avg:43.52ms
step:8900/20000 train_loss:2.1016 train_time:387258ms step_avg:43.51ms
step:8950/20000 train_loss:2.0290 train_time:389530ms step_avg:43.52ms
step:9000/20000 train_loss:2.1119 train_time:391675ms step_avg:43.52ms
step:9000/20000 val_loss:2.1204 val_bpb:1.2525 train_time:391698ms step_avg:43.52ms
step:9050/20000 train_loss:2.0826 train_time:393819ms step_avg:43.52ms
step:9100/20000 train_loss:2.0427 train_time:395963ms step_avg:43.51ms
step:9150/20000 train_loss:2.1201 train_time:398238ms step_avg:43.52ms
step:9200/20000 train_loss:2.1490 train_time:400385ms step_avg:43.52ms
step:9200/20000 val_loss:2.1170 val_bpb:1.2505 train_time:400409ms step_avg:43.52ms
step:9250/20000 train_loss:2.1221 train_time:402534ms step_avg:43.52ms
step:9300/20000 train_loss:2.4550 train_time:404680ms step_avg:43.51ms
step:9350/20000 train_loss:2.0384 train_time:406932ms step_avg:43.52ms
step:9400/20000 train_loss:2.0736 train_time:409077ms step_avg:43.52ms
step:9400/20000 val_loss:2.1139 val_bpb:1.2487 train_time:409102ms step_avg:43.52ms
step:9450/20000 train_loss:2.1096 train_time:411223ms step_avg:43.52ms
step:9500/20000 train_loss:2.1070 train_time:413493ms step_avg:43.53ms
step:9550/20000 train_loss:2.0249 train_time:415641ms step_avg:43.52ms
step:9600/20000 train_loss:2.1141 train_time:417785ms step_avg:43.52ms
step:9600/20000 val_loss:2.1138 val_bpb:1.2486 train_time:417809ms step_avg:43.52ms
step:9650/20000 train_loss:2.0183 train_time:419932ms step_avg:43.52ms
step:9700/20000 train_loss:2.1482 train_time:422212ms step_avg:43.53ms
step:9750/20000 train_loss:2.1811 train_time:424359ms step_avg:43.52ms
step:9800/20000 train_loss:2.1011 train_time:426503ms step_avg:43.52ms
step:9800/20000 val_loss:2.1143 val_bpb:1.2489 train_time:426528ms step_avg:43.52ms
step:9850/20000 train_loss:2.1134 train_time:428771ms step_avg:43.53ms
step:9900/20000 train_loss:2.0497 train_time:430915ms step_avg:43.53ms
step:9950/20000 train_loss:2.1989 train_time:433061ms step_avg:43.52ms
step:10000/20000 train_loss:2.1982 train_time:435207ms step_avg:43.52ms
step:10000/20000 val_loss:2.1122 val_bpb:1.2477 train_time:435232ms step_avg:43.52ms
step:10050/20000 train_loss:2.0940 train_time:437485ms step_avg:43.53ms
step:10100/20000 train_loss:2.1277 train_time:439630ms step_avg:43.53ms
step:10150/20000 train_loss:2.0896 train_time:441773ms step_avg:43.52ms
step:10200/20000 train_loss:2.0642 train_time:443918ms step_avg:43.52ms
step:10200/20000 val_loss:2.1112 val_bpb:1.2471 train_time:443941ms step_avg:43.52ms
step:10250/20000 train_loss:2.0627 train_time:446192ms step_avg:43.53ms
step:10300/20000 train_loss:2.2191 train_time:448339ms step_avg:43.53ms
step:10350/20000 train_loss:2.1354 train_time:450485ms step_avg:43.53ms
step:10400/20000 train_loss:2.0705 train_time:452630ms step_avg:43.52ms
step:10400/20000 val_loss:2.1098 val_bpb:1.2463 train_time:452654ms step_avg:43.52ms
step:10450/20000 train_loss:2.0663 train_time:454900ms step_avg:43.53ms
step:10500/20000 train_loss:2.1334 train_time:457046ms step_avg:43.53ms
step:10550/20000 train_loss:2.1931 train_time:459192ms step_avg:43.53ms
step:10600/20000 train_loss:2.0978 train_time:461337ms step_avg:43.52ms
step:10600/20000 val_loss:2.1081 val_bpb:1.2453 train_time:461361ms step_avg:43.52ms
step:10650/20000 train_loss:2.0676 train_time:463610ms step_avg:43.53ms
step:10700/20000 train_loss:2.2333 train_time:465754ms step_avg:43.53ms
step:10750/20000 train_loss:2.1661 train_time:467899ms step_avg:43.53ms
step:10800/20000 train_loss:2.0966 train_time:470044ms step_avg:43.52ms
step:10800/20000 val_loss:2.1081 val_bpb:1.2453 train_time:470069ms step_avg:43.52ms
step:10850/20000 train_loss:2.0708 train_time:472323ms step_avg:43.53ms
step:10900/20000 train_loss:2.1666 train_time:474468ms step_avg:43.53ms
step:10950/20000 train_loss:2.1079 train_time:476615ms step_avg:43.53ms
step:11000/20000 train_loss:2.0774 train_time:478893ms step_avg:43.54ms
step:11000/20000 val_loss:2.1069 val_bpb:1.2446 train_time:478917ms step_avg:43.54ms
step:11050/20000 train_loss:2.1288 train_time:481038ms step_avg:43.53ms
step:11100/20000 train_loss:2.0801 train_time:483185ms step_avg:43.53ms
step:11150/20000 train_loss:1.8743 train_time:485331ms step_avg:43.53ms
step:11200/20000 train_loss:2.1471 train_time:487603ms step_avg:43.54ms
step:11200/20000 val_loss:2.1080 val_bpb:1.2452 train_time:487627ms step_avg:43.54ms
step:11250/20000 train_loss:2.2046 train_time:489748ms step_avg:43.53ms
step:11300/20000 train_loss:2.0957 train_time:491892ms step_avg:43.53ms
step:11350/20000 train_loss:2.0963 train_time:494038ms step_avg:43.53ms
step:11400/20000 train_loss:2.3223 train_time:496318ms step_avg:43.54ms
step:11400/20000 val_loss:2.1051 val_bpb:1.2435 train_time:496342ms step_avg:43.54ms
step:11450/20000 train_loss:2.0724 train_time:498464ms step_avg:43.53ms
step:11500/20000 train_loss:2.1197 train_time:500609ms step_avg:43.53ms
step:11550/20000 train_loss:2.0975 train_time:502754ms step_avg:43.53ms
step:11600/20000 train_loss:2.1091 train_time:505029ms step_avg:43.54ms
step:11600/20000 val_loss:2.1054 val_bpb:1.2437 train_time:505053ms step_avg:43.54ms
step:11650/20000 train_loss:2.1235 train_time:507175ms step_avg:43.53ms
step:11700/20000 train_loss:2.0795 train_time:509324ms step_avg:43.53ms
step:11750/20000 train_loss:2.0662 train_time:511469ms step_avg:43.53ms
step:11800/20000 train_loss:2.0765 train_time:513742ms step_avg:43.54ms
step:11800/20000 val_loss:2.1048 val_bpb:1.2433 train_time:513766ms step_avg:43.54ms
step:11850/20000 train_loss:2.1202 train_time:515888ms step_avg:43.53ms
step:11900/20000 train_loss:2.1029 train_time:518033ms step_avg:43.53ms
step:11950/20000 train_loss:2.1512 train_time:520308ms step_avg:43.54ms
step:12000/20000 train_loss:2.1814 train_time:522453ms step_avg:43.54ms
step:12000/20000 val_loss:2.1029 val_bpb:1.2422 train_time:522477ms step_avg:43.54ms
step:12050/20000 train_loss:2.1085 train_time:524601ms step_avg:43.54ms
step:12100/20000 train_loss:2.0347 train_time:526747ms step_avg:43.53ms
step:12150/20000 train_loss:2.0601 train_time:529018ms step_avg:43.54ms
step:12200/20000 train_loss:2.0387 train_time:531162ms step_avg:43.54ms
step:12200/20000 val_loss:2.1021 val_bpb:1.2418 train_time:531186ms step_avg:43.54ms
step:12250/20000 train_loss:2.0381 train_time:533312ms step_avg:43.54ms
step:12300/20000 train_loss:2.1302 train_time:535458ms step_avg:43.53ms
step:12350/20000 train_loss:2.1272 train_time:537727ms step_avg:43.54ms
step:12400/20000 train_loss:2.1828 train_time:539873ms step_avg:43.54ms
step:12400/20000 val_loss:2.1001 val_bpb:1.2406 train_time:539897ms step_avg:43.54ms
step:12450/20000 train_loss:2.1003 train_time:542019ms step_avg:43.54ms
step:12500/20000 train_loss:2.0696 train_time:544164ms step_avg:43.53ms
step:12550/20000 train_loss:2.1302 train_time:546436ms step_avg:43.54ms
step:12600/20000 train_loss:2.0527 train_time:548582ms step_avg:43.54ms
step:12600/20000 val_loss:2.0998 val_bpb:1.2404 train_time:548606ms step_avg:43.54ms
step:12650/20000 train_loss:2.1438 train_time:550728ms step_avg:43.54ms
step:12700/20000 train_loss:2.2689 train_time:552877ms step_avg:43.53ms
step:12750/20000 train_loss:2.1438 train_time:555147ms step_avg:43.54ms
step:12800/20000 train_loss:2.0105 train_time:557293ms step_avg:43.54ms
step:12800/20000 val_loss:2.0930 val_bpb:1.2364 train_time:557317ms step_avg:43.54ms
step:12850/20000 train_loss:2.0413 train_time:559440ms step_avg:43.54ms
step:12900/20000 train_loss:2.0630 train_time:561586ms step_avg:43.53ms
step:12950/20000 train_loss:2.1627 train_time:563863ms step_avg:43.54ms
step:13000/20000 train_loss:1.9579 train_time:566009ms step_avg:43.54ms
step:13000/20000 val_loss:2.0859 val_bpb:1.2322 train_time:566032ms step_avg:43.54ms
step:13050/20000 train_loss:2.0206 train_time:568155ms step_avg:43.54ms
step:13100/20000 train_loss:1.9294 train_time:570432ms step_avg:43.54ms
step:13150/20000 train_loss:2.0689 train_time:572576ms step_avg:43.54ms
step:13200/20000 train_loss:2.0074 train_time:574722ms step_avg:43.54ms
step:13200/20000 val_loss:2.0790 val_bpb:1.2281 train_time:574747ms step_avg:43.54ms
step:13250/20000 train_loss:2.0596 train_time:576871ms step_avg:43.54ms
step:13300/20000 train_loss:1.9474 train_time:579143ms step_avg:43.54ms
step:13350/20000 train_loss:2.0459 train_time:581289ms step_avg:43.54ms
step:13400/20000 train_loss:2.0441 train_time:583434ms step_avg:43.54ms
step:13400/20000 val_loss:2.0718 val_bpb:1.2239 train_time:583458ms step_avg:43.54ms
step:13450/20000 train_loss:2.1638 train_time:585582ms step_avg:43.54ms
step:13500/20000 train_loss:2.1216 train_time:587857ms step_avg:43.54ms
step:13550/20000 train_loss:2.1855 train_time:590003ms step_avg:43.54ms
step:13600/20000 train_loss:2.0234 train_time:592147ms step_avg:43.54ms
step:13600/20000 val_loss:2.0649 val_bpb:1.2197 train_time:592172ms step_avg:43.54ms
step:13650/20000 train_loss:2.0316 train_time:594295ms step_avg:43.54ms
step:13700/20000 train_loss:2.0323 train_time:596577ms step_avg:43.55ms
step:13750/20000 train_loss:1.9910 train_time:598726ms step_avg:43.54ms
step:13780/20000 val_loss:2.0606 val_bpb:1.2172 train_time:600038ms step_avg:43.54ms
stopping_early: wallclock_cap train_time:600038ms step:13780/20000
peak memory allocated: 10184 MiB reserved: 10200 MiB
Serialized model: 67224983 bytes
Code size: 47642 bytes
Total submission size: 67272625 bytes
Serialized model int8+zlib: 15815847 bytes (payload:17178912 raw_torch:17224025 payload_ratio:3.91x)
Total submission size int8+zlib: 15863489 bytes
final_int8_zlib_roundtrip val_loss:2.0727 val_bpb:1.2244 eval_time:1401ms
final_int8_zlib_roundtrip_exact val_loss:2.07269931 val_bpb:1.22436570