2025-12-25 09:14:36,278 - INFO - Loaded dataset: math-500
2025-12-25 09:14:36,576 - INFO - Loaded 100 profiles from ../data/complex_profiles_v2/profiles_100.jsonl
2025-12-25 09:14:36,577 - INFO - Running method: vanilla
2025-12-25 09:14:36,577 - INFO - Profile 1/20
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:06<00:18, 6.14s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:12<00:12, 6.32s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:18<00:05, 5.91s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:18<00:00, 3.76s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:18<00:00, 4.62s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:06<00:25, 6.47s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:09<00:14, 4.69s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:15<00:10, 5.16s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:18<00:04, 4.17s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:20<00:00, 3.31s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:20<00:00, 4.01s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:03<00:09, 3.29s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:06<00:06, 3.29s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:12<00:04, 4.32s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:12<00:00, 2.82s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:12<00:00, 3.16s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:10, 2.54s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:05<00:07, 2.66s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:05, 2.96s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:10<00:02, 2.60s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:11<00:00, 2.05s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:11<00:00, 2.34s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.16s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.75s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:12<00:04, 4.38s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:13<00:00, 2.86s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:13<00:00, 3.32s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:10, 2.54s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:05<00:07, 2.66s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:05, 2.99s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:11<00:02, 2.85s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.20s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.47s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:06<00:19, 6.65s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:09<00:09, 4.64s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:14<00:04, 4.85s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:15<00:00, 3.11s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:15<00:00, 3.86s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.93s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:05<00:08, 2.95s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.20s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:11<00:03, 3.08s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.37s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.62s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:14, 4.97s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:08, 4.03s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:19<00:07, 7.30s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:20<00:00, 4.66s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:20<00:00, 5.04s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.91s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:05<00:07, 2.61s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:06, 3.03s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:11<00:03, 3.04s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.34s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.54s/it]
2025-12-25 09:18:54,429 - INFO - Profile 2/20
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:03<00:11, 3.88s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:06, 3.47s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:15<00:05, 5.58s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:15<00:00, 3.56s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:15<00:00, 3.91s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.93s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:04<00:07, 2.42s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:05, 2.86s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:11<00:03, 3.26s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.49s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.62s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:13, 4.43s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:07, 3.96s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.02s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.84s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.26s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.92s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:04<00:07, 2.45s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:05, 3.00s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:12<00:03, 3.59s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.67s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.78s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.12s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.80s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:05, 5.89s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.84s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.19s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.92s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:04<00:07, 2.50s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:05, 2.98s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.69s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.80s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.87s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:13, 4.46s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:07, 3.97s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:05, 5.80s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.73s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.16s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.91s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:04<00:07, 2.51s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:06, 3.01s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.62s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.73s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.83s/it]
2025-12-25 09:22:06,817 - ERROR - Error in session: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 39.49 GiB of which 12.31 MiB is free. Including non-PyTorch memory, this process has 39.47 GiB memory in use. Of the allocated memory 38.77 GiB is allocated by PyTorch, and 210.60 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
2025-12-25 09:22:06,836 - ERROR - Full traceback:
Traceback (most recent call last):
File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/collaborativeagents/scripts/run_experiments.py", line 192, in run_single_session
agent_adapter.initialize()
File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/collaborativeagents/adapters/personalized_llm_adapter.py", line 87, in initialize
self._llm = PersonalizedLLM(
^^^^^^^^^^^^^^^^
File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/src/personalization/serving/personalized_llm.py", line 227, in __init__
self._load_models()
File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/src/personalization/serving/personalized_llm.py", line 318, in _load_models
self._extractor = get_preference_extractor("rule")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/src/personalization/config/registry.py", line 123, in get_preference_extractor
return QwenRuleExtractor(
^^^^^^^^^^^^^^^^^^
File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/src/personalization/models/preference_extractor/rule_extractor.py", line 36, in __init__
self.model = AutoModelForCausalLM.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 604, in from_pretrained
return model_class.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/modeling_utils.py", line 277, in _wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/modeling_utils.py", line 5048, in from_pretrained
) = cls._load_pretrained_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/modeling_utils.py", line 5468, in _load_pretrained_model
_error_msgs, disk_offload_index = load_shard_file(args)
^^^^^^^^^^^^^^^^^^^^^
File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/modeling_utils.py", line 843, in load_shard_file
disk_offload_index = _load_state_dict_into_meta_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/modeling_utils.py", line 770, in _load_state_dict_into_meta_model
_load_parameter_into_model(model, param_name, param.to(param_device))
^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 39.49 GiB of which 12.31 MiB is free. Including non-PyTorch memory, this process has 39.47 GiB memory in use. Of the allocated memory 38.77 GiB is allocated by PyTorch, and 210.60 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:03<00:10, 3.61s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.65s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.37s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.05s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.34s/it]
2025-12-25 09:22:24,976 - INFO - Based on the current allocation process, no modules could be assigned to the following devices due to insufficient memory:
- 0: 2484944896 bytes required
These minimum requirements are specific to this allocation attempt and may vary. Consider increasing the available memory for these devices to at least the specified minimum, or adjusting the model config.
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.91s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:04<00:07, 2.51s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:05, 2.98s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.65s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.79s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.86s/it]
2025-12-25 09:22:39,682 - INFO - Based on the current allocation process, no modules could be assigned to the following devices due to insufficient memory:
- 0: 560343040 bytes required
These minimum requirements are specific to this allocation attempt and may vary. Consider increasing the available memory for these devices to at least the specified minimum, or adjusting the model config.
2025-12-25 09:22:44,762 - ERROR - Error in session: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 39.49 GiB of which 12.31 MiB is free. Including non-PyTorch memory, this process has 39.47 GiB memory in use. Of the allocated memory 38.76 GiB is allocated by PyTorch, and 219.26 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
2025-12-25 09:22:44,764 - ERROR - Full traceback:
Traceback (most recent call last):
File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/collaborativeagents/scripts/run_experiments.py", line 192, in run_single_session
agent_adapter.initialize()
File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/collaborativeagents/adapters/personalized_llm_adapter.py", line 87, in initialize
self._llm = PersonalizedLLM(
^^^^^^^^^^^^^^^^
File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/src/personalization/serving/personalized_llm.py", line 227, in __init__
self._load_models()
File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/src/personalization/serving/personalized_llm.py", line 318, in _load_models
self._extractor = get_preference_extractor("rule")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/src/personalization/config/registry.py", line 123, in get_preference_extractor
return QwenRuleExtractor(
^^^^^^^^^^^^^^^^^^
File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/src/personalization/models/preference_extractor/rule_extractor.py", line 36, in __init__
self.model = AutoModelForCausalLM.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 604, in from_pretrained
return model_class.from_pretrained(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/modeling_utils.py", line 277, in _wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/modeling_utils.py", line 5048, in from_pretrained
) = cls._load_pretrained_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/modeling_utils.py", line 5468, in _load_pretrained_model
_error_msgs, disk_offload_index = load_shard_file(args)
^^^^^^^^^^^^^^^^^^^^^
File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/modeling_utils.py", line 843, in load_shard_file
disk_offload_index = _load_state_dict_into_meta_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/modeling_utils.py", line 770, in _load_state_dict_into_meta_model
_load_parameter_into_model(model, param_name, param.to(param_device))
^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 39.49 GiB of which 12.31 MiB is free. Including non-PyTorch memory, this process has 39.47 GiB memory in use. Of the allocated memory 38.76 GiB is allocated by PyTorch, and 219.26 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
2025-12-25 09:22:44,854 - INFO - Profile 3/20
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:03<00:11, 3.88s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.75s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.21s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.97s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.30s/it]
2025-12-25 09:23:02,714 - INFO - Based on the current allocation process, no modules could be assigned to the following devices due to insufficient memory:
- 0: 2484944896 bytes required
These minimum requirements are specific to this allocation attempt and may vary. Consider increasing the available memory for these devices to at least the specified minimum, or adjusting the model config.
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.91s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:04<00:07, 2.44s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:06, 3.02s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:12<00:03, 3.29s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.59s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.70s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.07s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.80s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.28s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.99s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.35s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.92s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:04<00:07, 2.44s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:05, 2.95s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:11<00:02, 2.86s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.26s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.44s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.26s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.85s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:15<00:05, 5.77s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.68s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.09s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.93s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:04<00:07, 2.57s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:06, 3.01s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:11<00:02, 2.86s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.35s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.51s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.33s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.78s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.06s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.85s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.24s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.91s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:04<00:07, 2.40s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:06, 3.09s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:11<00:03, 3.01s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.31s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.51s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:14, 4.67s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:07, 3.90s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:05, 6.00s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.69s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.17s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.90s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:05<00:09, 3.15s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.32s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:12<00:03, 3.04s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.41s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.67s/it]
2025-12-25 09:26:54,931 - INFO - Profile 4/20
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:13, 4.57s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:07, 3.99s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:05, 5.94s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.66s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.14s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.93s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:05<00:08, 2.82s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.34s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:11<00:02, 2.97s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.29s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.56s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:14, 4.85s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:08, 4.20s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:17<00:06, 6.13s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.77s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.29s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.91s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:05<00:08, 2.79s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:06, 3.17s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:11<00:03, 3.07s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.42s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.62s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:13, 4.35s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.83s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:05, 5.89s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.63s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.08s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.91s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:04<00:07, 2.49s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:05, 2.94s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:11<00:02, 2.91s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.26s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.45s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.14s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.83s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:05, 5.91s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.64s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.08s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.91s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:10, 3.42s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:10<00:07, 3.63s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.52s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.74s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 3.00s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.15s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:07, 3.99s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.02s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.71s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.16s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:03<00:15, 3.79s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:07<00:11, 3.92s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:11<00:07, 3.98s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:15<00:03, 3.98s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:17<00:00, 2.97s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:17<00:00, 3.40s/it]
2025-12-25 09:30:47,307 - INFO - Profile 5/20
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:03<00:11, 3.89s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.52s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:15<00:05, 5.59s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:15<00:00, 3.45s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:15<00:00, 3.84s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:10, 2.73s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.24s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.34s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.35s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.53s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.85s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:14, 4.72s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:07, 3.97s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:17<00:06, 6.26s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.86s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.33s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.84s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.27s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.03s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.67s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.56s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.88s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.23s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:08, 4.01s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:17<00:06, 6.46s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.98s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.41s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.82s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.32s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.08s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:14<00:03, 3.85s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.68s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.98s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:05<00:16, 5.54s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:09<00:08, 4.42s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:21<00:07, 7.90s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:21<00:00, 4.84s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:21<00:00, 5.34s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:10, 2.68s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.27s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.29s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:15<00:04, 4.12s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:15<00:00, 2.85s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:15<00:00, 3.13s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.05s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.68s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.26s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.85s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.24s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.82s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.24s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.07s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:14<00:03, 3.78s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.63s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.93s/it]
2025-12-25 09:34:45,922 - INFO - Profile 6/20
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:13, 4.60s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:07, 4.00s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:18<00:07, 7.07s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:19<00:00, 4.34s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:19<00:00, 4.77s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:10, 2.66s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.27s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.12s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.72s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.59s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.90s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:03<00:10, 3.50s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.56s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:15<00:05, 5.71s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:15<00:00, 3.52s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:15<00:00, 3.88s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:10, 2.59s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:10, 3.39s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.19s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:14<00:03, 3.84s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.67s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.98s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:03<00:11, 3.89s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.73s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:15<00:05, 5.84s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.60s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.01s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:10, 2.64s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:05<00:08, 3.00s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:05, 2.99s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.62s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.53s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.81s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:13, 4.56s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:08, 4.05s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:18<00:06, 6.73s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:18<00:00, 4.14s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:18<00:00, 4.58s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.83s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:05<00:08, 2.89s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:05, 2.93s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.68s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.56s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.83s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:14, 4.79s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:07, 3.99s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:17<00:06, 6.23s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.83s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.32s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.92s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:07<00:11, 3.74s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:10<00:06, 3.38s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:15<00:03, 3.99s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:15<00:00, 2.76s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:15<00:00, 3.14s/it]
2025-12-25 09:38:42,663 - INFO - Profile 7/20
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:03<00:11, 3.81s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.75s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.03s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.71s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.11s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.91s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:10, 3.41s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.14s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:14<00:03, 3.71s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.58s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.93s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.18s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:08, 4.02s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.24s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.84s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.28s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.75s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:10, 3.57s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.28s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:14<00:03, 3.78s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.63s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.99s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:13, 4.38s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:07, 3.96s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.19s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.81s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.26s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:03<00:12, 3.15s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:10, 3.43s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.20s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:14<00:03, 3.85s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:15<00:00, 2.68s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:15<00:00, 3.03s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:13, 4.46s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:08, 4.07s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:17<00:06, 6.25s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.85s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.32s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.89s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.27s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.11s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.58s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.50s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.85s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.10s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.85s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.21s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.82s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.24s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.88s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.32s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.09s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:14<00:03, 3.91s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:15<00:00, 2.72s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:15<00:00, 3.02s/it]
2025-12-25 09:42:41,214 - INFO - Profile 8/20
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:14, 4.93s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:08, 4.22s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:17<00:06, 6.45s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.97s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.48s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.81s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:10, 3.60s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.26s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:14<00:03, 3.66s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:15<00:00, 2.71s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:15<00:00, 3.03s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:05<00:15, 5.27s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:08, 4.33s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:17<00:06, 6.22s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.83s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.39s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.86s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.25s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.07s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.54s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.47s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.82s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.29s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:08, 4.32s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:18<00:06, 6.65s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:18<00:00, 4.09s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:18<00:00, 4.55s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.89s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.27s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.11s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.59s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.51s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.85s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:03<00:11, 3.95s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.73s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.32s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.89s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.28s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:10, 2.70s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.19s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.09s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.64s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.54s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.86s/it]
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:13, 4.55s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.90s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:18<00:06, 6.96s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:18<00:00, 4.28s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:18<00:00, 4.69s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.78s/it]
[2025-12-25T09:46:14.933] error: *** JOB 14355902 ON gpua065 CANCELLED AT 2025-12-25T09:46:14 DUE to SIGNAL Terminated ***
|