2025-12-26 02:38:46,860 - INFO - Loaded dataset: gpqa
2025-12-26 02:38:46,861 - INFO - Loaded dataset: aime
2025-12-26 02:38:46,861 - INFO - Loaded dataset: math-hard
2025-12-26 02:38:46,861 - INFO - Loaded dataset: humaneval
2025-12-26 02:38:46,872 - INFO - Loaded 100 profiles from ../data/complex_profiles_v2/profiles_100.jsonl
2025-12-26 02:38:46,872 - INFO - Running method: vanilla
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 0%| | 0/4 [00:00
    main()
  File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/collaborativeagents/scripts/run_experiments.py", line 608, in main
    analysis = runner.run_all()
               ^^^^^^^^^^^^^^^^
  File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/collaborativeagents/scripts/run_experiments.py", line 414, in run_all
    results = self.run_method(method)
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/collaborativeagents/scripts/run_experiments.py", line 367, in run_method
    samples = dataset.get_testset()
              ^^^^^^^^^^^^^^^^^^^^^
  File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/collaborativeagents/datasets_extended.py", line 71, in get_testset
    self._test_data = self._load_data("test")[:self.eval_size]
                      ^^^^^^^^^^^^^^^^^^^^^^^
  File "/projects/bfqt/users/yurenh2/ml-projects/personalization-user-model/collaborativeagents/datasets_extended.py", line 153, in _load_data
    solution=item["answer"],
             ~~~~^^^^^^^^^^
KeyError: 'answer'
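The traceback ends in `_load_data`, where `item["answer"]` assumes every raw record has an `answer` field; at least one record loaded for the test split does not, and the log does not say which dataset it came from. A minimal sketch of a more diagnosable lookup follows. The helper `get_answer` and the fallback field names `solution` and `target` are hypothetical, chosen for illustration only, and are not taken from `datasets_extended.py` or from any dataset's real schema.

```python
from typing import Any, Mapping, Sequence

# Hypothetical candidate field names; check each dataset's actual schema.
DEFAULT_ANSWER_KEYS: tuple[str, ...] = ("answer", "solution", "target")


def get_answer(item: Mapping[str, Any], keys: Sequence[str] = DEFAULT_ANSWER_KEYS) -> Any:
    """Return the value of the first key in `keys` that is present in `item`.

    If none match, raise a KeyError that lists the keys the record actually
    has, so a schema mismatch is easier to diagnose than a bare KeyError: 'answer'.
    """
    for key in keys:
        if key in item:
            return item[key]
    raise KeyError(
        f"none of {list(keys)} found in record; available keys: {sorted(item.keys())}"
    )


if __name__ == "__main__":
    print(get_answer({"answer": "42"}))        # primary field present
    print(get_answer({"solution": "x = 3"}))   # falls back to a hypothetical alternative
    try:
        # Made-up record with no recognised answer field.
        get_answer({"question": "q", "reference": "pass"})
    except KeyError as err:
        print(err)
```

Silently falling back to another field can hide a genuine schema bug, so the sketch still fails loudly when nothing matches; it only makes the error message list what the record contains.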