Diffstat (limited to 'collaborativeagents/slurm/logs/run_expts_a100_14355902.err')
| -rw-r--r-- | collaborativeagents/slurm/logs/run_expts_a100_14355902.err | 185 |
1 files changed, 185 insertions, 0 deletions
diff --git a/collaborativeagents/slurm/logs/run_expts_a100_14355902.err b/collaborativeagents/slurm/logs/run_expts_a100_14355902.err
new file mode 100644
index 0000000..c0437af
--- /dev/null
+++ b/collaborativeagents/slurm/logs/run_expts_a100_14355902.err
@@ -0,0 +1,185 @@
+2025-12-25 09:14:36,278 - INFO - Loaded dataset: math-500
+2025-12-25 09:14:36,576 - INFO - Loaded 100 profiles from ../data/complex_profiles_v2/profiles_100.jsonl
+2025-12-25 09:14:36,577 - INFO - Running method: vanilla
+2025-12-25 09:14:36,577 - INFO - Profile 1/20
+`torch_dtype` is deprecated! Use `dtype` instead!
+
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:06<00:18, 6.14s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:12<00:12, 6.32s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:18<00:05, 5.91s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:18<00:00, 3.76s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:18<00:00, 4.62s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:06<00:25, 6.47s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:09<00:14, 4.69s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:15<00:10, 5.16s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:18<00:04, 4.17s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:20<00:00, 3.31s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:20<00:00, 4.01s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:03<00:09, 3.29s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:06<00:06, 3.29s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:12<00:04, 4.32s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:12<00:00, 2.82s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:12<00:00, 3.16s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:10, 2.54s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:05<00:07, 2.66s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:05, 2.96s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:10<00:02, 2.60s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:11<00:00, 2.05s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:11<00:00, 2.34s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.16s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.75s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:12<00:04, 4.38s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:13<00:00, 2.86s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:13<00:00, 3.32s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:10, 2.54s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:05<00:07, 2.66s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:05, 2.99s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:11<00:02, 2.85s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.20s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.47s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:06<00:19, 6.65s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:09<00:09, 4.64s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:14<00:04, 4.85s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:15<00:00, 3.11s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:15<00:00, 3.86s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.93s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:05<00:08, 2.95s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.20s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:11<00:03, 3.08s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.37s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.62s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:14, 4.97s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:08, 4.03s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:19<00:07, 7.30s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:20<00:00, 4.66s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:20<00:00, 5.04s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.91s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:05<00:07, 2.61s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:06, 3.03s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:11<00:03, 3.04s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.34s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.54s/it]
+2025-12-25 09:18:54,429 - INFO - Profile 2/20
+
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:03<00:11, 3.88s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:06, 3.47s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:15<00:05, 5.58s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:15<00:00, 3.56s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:15<00:00, 3.91s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.93s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:04<00:07, 2.42s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:05, 2.86s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:11<00:03, 3.26s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.49s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.62s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:13, 4.43s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:07, 3.96s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.02s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.84s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.26s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.92s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:04<00:07, 2.45s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:05, 3.00s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:12<00:03, 3.59s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.67s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.78s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.12s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.80s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:05, 5.89s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.84s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.19s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.92s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:04<00:07, 2.50s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:05, 2.98s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.69s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.80s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.87s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:13, 4.46s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:07, 3.97s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:05, 5.80s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.73s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.16s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.91s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:04<00:07, 2.51s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:06, 3.01s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.62s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.73s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.83s/it]
+2025-12-25 09:22:06,817 - ERROR - Error in session: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 39.49 GiB of which 12.31 MiB is free. Including non-PyTorch memory, this process has 39.47 GiB memory in use. Of the allocated memory 38.77 GiB is allocated by PyTorch, and 210.60 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
+2025-12-25 09:22:06,836 - ERROR - Full traceback:
+Traceback (most recent call last):
+  File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/collaborativeagents/scripts/run_experiments.py", line 192, in run_single_session
+    agent_adapter.initialize()
+  File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/collaborativeagents/adapters/personalized_llm_adapter.py", line 87, in initialize
+    self._llm = PersonalizedLLM(
+                ^^^^^^^^^^^^^^^^
+  File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/src/personalization/serving/personalized_llm.py", line 227, in __init__
+    self._load_models()
+  File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/src/personalization/serving/personalized_llm.py", line 318, in _load_models
+    self._extractor = get_preference_extractor("rule")
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/src/personalization/config/registry.py", line 123, in get_preference_extractor
+    return QwenRuleExtractor(
+           ^^^^^^^^^^^^^^^^^^
+  File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/src/personalization/models/preference_extractor/rule_extractor.py", line 36, in __init__
+    self.model = AutoModelForCausalLM.from_pretrained(
+                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 604, in from_pretrained
+    return model_class.from_pretrained(
+           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/modeling_utils.py", line 277, in _wrapper
+    return func(*args, **kwargs)
+           ^^^^^^^^^^^^^^^^^^^^^
+  File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/modeling_utils.py", line 5048, in from_pretrained
+    ) = cls._load_pretrained_model(
+        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/modeling_utils.py", line 5468, in _load_pretrained_model
+    _error_msgs, disk_offload_index = load_shard_file(args)
+                                      ^^^^^^^^^^^^^^^^^^^^^
+  File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/modeling_utils.py", line 843, in load_shard_file
+    disk_offload_index = _load_state_dict_into_meta_model(
+                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
+    return func(*args, **kwargs)
+           ^^^^^^^^^^^^^^^^^^^^^
+  File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/modeling_utils.py", line 770, in _load_state_dict_into_meta_model
+    _load_parameter_into_model(model, param_name, param.to(param_device))
+                                                  ^^^^^^^^^^^^^^^^^^^^^^
+torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 39.49 GiB of which 12.31 MiB is free. Including non-PyTorch memory, this process has 39.47 GiB memory in use. Of the allocated memory 38.77 GiB is allocated by PyTorch, and 210.60 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
+
+
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:03<00:10, 3.61s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.65s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.37s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.05s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.34s/it]
+2025-12-25 09:22:24,976 - INFO - Based on the current allocation process, no modules could be assigned to the following devices due to insufficient memory:
+ - 0: 2484944896 bytes required
+These minimum requirements are specific to this allocation attempt and may vary. Consider increasing the available memory for these devices to at least the specified minimum, or adjusting the model config.
+
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.91s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:04<00:07, 2.51s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:05, 2.98s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.65s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.79s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.86s/it]
+2025-12-25 09:22:39,682 - INFO - Based on the current allocation process, no modules could be assigned to the following devices due to insufficient memory:
+ - 0: 560343040 bytes required
+These minimum requirements are specific to this allocation attempt and may vary. Consider increasing the available memory for these devices to at least the specified minimum, or adjusting the model config.
+2025-12-25 09:22:44,762 - ERROR - Error in session: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 39.49 GiB of which 12.31 MiB is free. Including non-PyTorch memory, this process has 39.47 GiB memory in use. Of the allocated memory 38.76 GiB is allocated by PyTorch, and 219.26 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
+2025-12-25 09:22:44,764 - ERROR - Full traceback:
+Traceback (most recent call last):
+  File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/collaborativeagents/scripts/run_experiments.py", line 192, in run_single_session
+    agent_adapter.initialize()
+  File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/collaborativeagents/adapters/personalized_llm_adapter.py", line 87, in initialize
+    self._llm = PersonalizedLLM(
+                ^^^^^^^^^^^^^^^^
+  File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/src/personalization/serving/personalized_llm.py", line 227, in __init__
+    self._load_models()
+  File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/src/personalization/serving/personalized_llm.py", line 318, in _load_models
+    self._extractor = get_preference_extractor("rule")
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/src/personalization/config/registry.py", line 123, in get_preference_extractor
+    return QwenRuleExtractor(
+           ^^^^^^^^^^^^^^^^^^
+  File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/src/personalization/models/preference_extractor/rule_extractor.py", line 36, in __init__
+    self.model = AutoModelForCausalLM.from_pretrained(
+                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 604, in from_pretrained
+    return model_class.from_pretrained(
+           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/modeling_utils.py", line 277, in _wrapper
+    return func(*args, **kwargs)
+           ^^^^^^^^^^^^^^^^^^^^^
+  File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/modeling_utils.py", line 5048, in from_pretrained
+    ) = cls._load_pretrained_model(
+        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/modeling_utils.py", line 5468, in _load_pretrained_model
+    _error_msgs, disk_offload_index = load_shard_file(args)
+                                      ^^^^^^^^^^^^^^^^^^^^^
+  File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/modeling_utils.py", line 843, in load_shard_file
+    disk_offload_index = _load_state_dict_into_meta_model(
+                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
+    return func(*args, **kwargs)
+           ^^^^^^^^^^^^^^^^^^^^^
+  File "/u/yurenh2/miniforge3/envs/eval/lib/python3.11/site-packages/transformers/modeling_utils.py", line 770, in _load_state_dict_into_meta_model
+    _load_parameter_into_model(model, param_name, param.to(param_device))
+                                                  ^^^^^^^^^^^^^^^^^^^^^^
+torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB. GPU 0 has a total capacity of 39.49 GiB of which 12.31 MiB is free. Including non-PyTorch memory, this process has 39.47 GiB memory in use. Of the allocated memory 38.76 GiB is allocated by PyTorch, and 219.26 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
+
+2025-12-25 09:22:44,854 - INFO - Profile 3/20
+
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:03<00:11, 3.88s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.75s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.21s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.97s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.30s/it]
+2025-12-25 09:23:02,714 - INFO - Based on the current allocation process, no modules could be assigned to the following devices due to insufficient memory:
+ - 0: 2484944896 bytes required
+These minimum requirements are specific to this allocation attempt and may vary. Consider increasing the available memory for these devices to at least the specified minimum, or adjusting the model config.
+
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.91s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:04<00:07, 2.44s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:06, 3.02s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:12<00:03, 3.29s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.59s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.70s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.07s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.80s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.28s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.99s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.35s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.92s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:04<00:07, 2.44s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:05, 2.95s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:11<00:02, 2.86s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.26s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.44s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.26s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.85s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:15<00:05, 5.77s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.68s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.09s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.93s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:04<00:07, 2.57s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:06, 3.01s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:11<00:02, 2.86s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.35s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.51s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.33s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.78s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.06s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.85s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.24s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.91s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:04<00:07, 2.40s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:06, 3.09s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:11<00:03, 3.01s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.31s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.51s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:14, 4.67s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:07, 3.90s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:05, 6.00s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.69s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.17s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.90s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:05<00:09, 3.15s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.32s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:12<00:03, 3.04s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.41s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.67s/it]
+2025-12-25 09:26:54,931 - INFO - Profile 4/20
+
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:13, 4.57s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:07, 3.99s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:05, 5.94s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.66s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.14s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.93s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:05<00:08, 2.82s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.34s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:11<00:02, 2.97s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.29s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.56s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:14, 4.85s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:08, 4.20s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:17<00:06, 6.13s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.77s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.29s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.91s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:05<00:08, 2.79s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:06, 3.17s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:11<00:03, 3.07s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.42s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:13<00:00, 2.62s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:13, 4.35s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.83s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:05, 5.89s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.63s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.08s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.91s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:04<00:07, 2.49s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:05, 2.94s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:11<00:02, 2.91s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.26s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:12<00:00, 2.45s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.14s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.83s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:05, 5.91s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.64s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.08s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:01<00:07, 1.91s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:10, 3.42s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:10<00:07, 3.63s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.52s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.74s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 3.00s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.15s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:07, 3.99s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.02s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.71s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.16s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:03<00:15, 3.79s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:07<00:11, 3.92s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:11<00:07, 3.98s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:15<00:03, 3.98s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:17<00:00, 2.97s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:17<00:00, 3.40s/it]
+2025-12-25 09:30:47,307 - INFO - Profile 5/20
+
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:03<00:11, 3.89s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.52s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:15<00:05, 5.59s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:15<00:00, 3.45s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:15<00:00, 3.84s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:10, 2.73s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.24s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.34s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.35s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.53s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.85s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:14, 4.72s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:07, 3.97s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:17<00:06, 6.26s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.86s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.33s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.84s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.27s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.03s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.67s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.56s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.88s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.23s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:08, 4.01s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:17<00:06, 6.46s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.98s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.41s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.82s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.32s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.08s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:14<00:03, 3.85s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.68s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.98s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:05<00:16, 5.54s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:09<00:08, 4.42s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:21<00:07, 7.90s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:21<00:00, 4.84s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:21<00:00, 5.34s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:10, 2.68s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.27s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.29s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:15<00:04, 4.12s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:15<00:00, 2.85s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:15<00:00, 3.13s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.05s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.68s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.26s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.85s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.24s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.82s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.24s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.07s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:14<00:03, 3.78s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.63s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.93s/it]
+2025-12-25 09:34:45,922 - INFO - Profile 6/20
+
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:13, 4.60s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:07, 4.00s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:18<00:07, 7.07s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:19<00:00, 4.34s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:19<00:00, 4.77s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:10, 2.66s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.27s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.12s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.72s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.59s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.90s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:03<00:10, 3.50s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.56s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:15<00:05, 5.71s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:15<00:00, 3.52s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:15<00:00, 3.88s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:10, 2.59s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:10, 3.39s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.19s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:14<00:03, 3.84s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.67s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.98s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:03<00:11, 3.89s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.73s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:15<00:05, 5.84s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.60s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.01s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:10, 2.64s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:05<00:08, 3.00s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:05, 2.99s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.62s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.53s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.81s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:13, 4.56s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:08, 4.05s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:18<00:06, 6.73s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:18<00:00, 4.14s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:18<00:00, 4.58s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.83s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:05<00:08, 2.89s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:08<00:05, 2.93s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.68s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.56s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.83s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:14, 4.79s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:07, 3.99s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:17<00:06, 6.23s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.83s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.32s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.92s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:07<00:11, 3.74s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:10<00:06, 3.38s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:15<00:03, 3.99s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:15<00:00, 2.76s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:15<00:00, 3.14s/it] +
2025-12-25 09:38:42,663 - INFO - Profile 7/20 +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:03<00:11, 3.81s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.75s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.03s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.71s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.11s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.91s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:10, 3.41s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.14s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:14<00:03, 3.71s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.58s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.93s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.18s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:08, 4.02s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.24s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.84s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.28s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.75s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:10, 3.57s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.28s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:14<00:03, 3.78s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.63s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.99s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:13, 4.38s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:07, 3.96s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.19s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.81s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.26s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:03<00:12, 3.15s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:10, 3.43s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.20s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:14<00:03, 3.85s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:15<00:00, 2.68s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:15<00:00, 3.03s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:13, 4.46s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:08, 4.07s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:17<00:06, 6.25s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.85s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.32s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.89s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.27s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.11s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.58s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.50s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.85s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.10s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.85s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.21s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 3.82s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:16<00:00, 4.24s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.88s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.32s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.09s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:14<00:03, 3.91s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:15<00:00, 2.72s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:15<00:00, 3.02s/it] +
2025-12-25 09:42:41,214 - INFO - Profile 8/20 +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:14, 4.93s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:08, 4.22s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:17<00:06, 6.45s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.97s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.48s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.81s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:10, 3.60s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.26s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:14<00:03, 3.66s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:15<00:00, 2.71s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:15<00:00, 3.03s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:05<00:15, 5.27s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:08, 4.33s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:17<00:06, 6.22s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.83s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.39s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.86s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.25s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.07s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.54s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.47s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.82s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:12, 4.29s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:08, 4.32s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:18<00:06, 6.65s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:18<00:00, 4.09s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:18<00:00, 4.55s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.89s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.27s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.11s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.59s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.51s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.85s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:03<00:11, 3.95s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.73s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:16<00:06, 6.32s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 3.89s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:17<00:00, 4.28s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:10, 2.70s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.19s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:09<00:06, 3.09s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:13<00:03, 3.64s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.54s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.86s/it] +
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:13, 4.55s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:07<00:07, 3.90s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:18<00:06, 6.96s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:18<00:00, 4.28s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:18<00:00, 4.69s/it] +
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:02<00:11, 2.78s/it]
[2025-12-25T09:46:14.933] error: *** JOB 14355902 ON gpua065 CANCELLED AT 2025-12-25T09:46:14 DUE to SIGNAL Terminated ***
