2025-12-25 22:08:21,888 - INFO - Loaded dataset: math-500
2025-12-25 22:08:22,033 - INFO - Loaded 100 profiles from ../data/complex_profiles_v2/profiles_100.jsonl
2025-12-25 22:08:22,034 - INFO - Running method: vanilla
`torch_dtype` is deprecated! Use `dtype` instead!
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:04<00:14, 4.85s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:08<00:08, 4.08s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:13<00:04, 4.69s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:14<00:00, 3.07s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:14<00:00, 3.60s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:03<00:15, 3.81s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:06<00:09, 3.25s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:10<00:06, 3.45s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:12<00:03, 3.07s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.49s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:14<00:00, 2.86s/it]
2025-12-25 22:08:59,678 - INFO - Profile 1/5
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:10<00:32, 10.97s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:15<00:14, 7.08s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:19<00:05, 5.68s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:20<00:00, 4.06s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:20<00:00, 5.23s/it]
2025-12-25 22:12:43,460 - WARNING - User agent failed to respond at turn 4
2025-12-25 22:14:39,792 - WARNING - User agent failed to respond at turn 3
2025-12-25 22:14:39,793 - INFO - Profile 2/5
2025-12-25 22:17:30,565 - INFO - Profile 3/5
2025-12-25 22:20:00,571 - INFO - Profile 4/5
2025-12-25 22:23:05,146 - WARNING - User agent failed to respond at turn 4
2025-12-25 22:23:35,365 - INFO - Profile 5/5
2025-12-25 22:26:59,994 - INFO - Running method: all_memory
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:08<00:24, 8.06s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:11<00:10, 5.48s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:20<00:07, 7.15s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:21<00:00, 4.51s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:21<00:00, 5.33s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:05<00:22, 5.64s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:08<00:12, 4.14s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:14<00:10, 5.08s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:17<00:03, 3.98s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:18<00:00, 2.97s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:18<00:00, 3.68s/it]
2025-12-25 22:28:23,340 - INFO - Profile 1/5
The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
2025-12-25 22:29:58,580 - WARNING - User agent failed to respond at turn 3
2025-12-25 22:30:39,359 - INFO - Profile 2/5
2025-12-25 22:34:23,370 - INFO - Profile 3/5
2025-12-25 22:35:04,289 - WARNING - User agent failed to respond at turn 2
2025-12-25 22:35:30,064 - WARNING - User agent failed to respond at turn 2
2025-12-25 22:36:33,412 - WARNING - User agent failed to respond at turn 6
2025-12-25 22:36:33,412 - INFO - Profile 4/5
2025-12-25 22:38:38,658 - WARNING - User agent failed to respond at turn 3
2025-12-25 22:39:23,955 - INFO - Profile 5/5
2025-12-25 22:42:19,402 - INFO - Running method: rag
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:06<00:20, 6.89s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:12<00:12, 6.09s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:19<00:06, 6.42s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:19<00:00, 4.16s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:19<00:00, 4.98s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:06<00:24, 6.16s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:11<00:16, 5.55s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:17<00:11, 5.90s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:22<00:05, 5.37s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:23<00:00, 4.06s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:23<00:00, 4.78s/it]
2025-12-25 22:43:31,077 - INFO - Profile 1/5
2025-12-25 22:46:19,125 - INFO - Profile 2/5
2025-12-25 22:49:18,368 - INFO - Profile 3/5
2025-12-25 22:53:00,495 - WARNING - User agent failed to respond at turn 3
2025-12-25 22:53:00,497 - INFO - Profile 4/5
2025-12-25 22:54:01,784 - WARNING - User agent failed to respond at turn 3
2025-12-25 22:58:07,157 - INFO - Profile 5/5
2025-12-25 22:58:54,351 - WARNING - User agent failed to respond at turn 3
2025-12-25 22:59:40,507 - WARNING - User agent failed to respond at turn 2
2025-12-25 23:00:10,569 - INFO - Running method: rag_vector
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:07<00:22, 7.37s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:10<00:10, 5.14s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:18<00:06, 6.24s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:18<00:00, 3.96s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:18<00:00, 4.74s/it]
Loading checkpoint shards: 0%| | 0/5 [00:00<?, ?it/s]
Loading checkpoint shards: 20%|██ | 1/5 [00:06<00:25, 6.35s/it]
Loading checkpoint shards: 40%|████ | 2/5 [00:09<00:12, 4.31s/it]
Loading checkpoint shards: 60%|██████ | 3/5 [00:15<00:10, 5.22s/it]
Loading checkpoint shards: 80%|████████ | 4/5 [00:18<00:04, 4.18s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:19<00:00, 3.16s/it]
Loading checkpoint shards: 100%|██████████| 5/5 [00:19<00:00, 3.89s/it]
2025-12-25 23:01:19,774 - INFO - Profile 1/5
2025-12-25 23:03:56,207 - INFO - Profile 2/5
2025-12-25 23:06:30,341 - WARNING - User agent failed to respond at turn 2
2025-12-25 23:06:30,342 - INFO - Profile 3/5
2025-12-25 23:09:50,352 - WARNING - User agent failed to respond at turn 7
2025-12-25 23:11:12,291 - WARNING - User agent failed to respond at turn 4
2025-12-25 23:11:12,293 - INFO - Profile 4/5
2025-12-25 23:14:00,507 - WARNING - User agent failed to respond at turn 2
2025-12-25 23:15:21,185 - INFO - Profile 5/5
2025-12-25 23:17:09,189 - WARNING - User agent failed to respond at turn 4
2025-12-25 23:17:38,489 - INFO - Running method: contextual
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:05<00:17, 5.83s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:11<00:11, 5.97s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:17<00:05, 5.85s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:19<00:00, 4.31s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:19<00:00, 4.89s/it]
2025-12-25 23:17:58,530 - INFO - Profile 1/5
2025-12-25 23:22:15,614 - WARNING - User agent failed to respond at turn 5
2025-12-25 23:23:01,495 - INFO - Profile 2/5
2025-12-25 23:26:21,325 - WARNING - User agent failed to respond at turn 12
2025-12-25 23:26:21,326 - INFO - Profile 3/5
2025-12-25 23:29:17,191 - WARNING - User agent failed to respond at turn 7
2025-12-25 23:30:41,180 - INFO - Profile 4/5
2025-12-25 23:31:24,578 - WARNING - User agent failed to respond at turn 2
2025-12-25 23:33:26,694 - WARNING - User agent failed to respond at turn 6
2025-12-25 23:35:25,025 - WARNING - User agent failed to respond at turn 6
2025-12-25 23:35:25,025 - INFO - Profile 5/5
2025-12-25 23:36:14,963 - WARNING - User agent failed to respond at turn 3
2025-12-25 23:37:33,084 - WARNING - User agent failed to respond at turn 3
2025-12-25 23:38:11,089 - INFO - Running method: reflection
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:05<00:17, 5.99s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:09<00:09, 4.61s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:13<00:04, 4.32s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:15<00:00, 3.35s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:15<00:00, 3.87s/it]
2025-12-25 23:38:27,190 - INFO - Profile 1/5
2025-12-25 23:41:28,520 - WARNING - User agent failed to respond at turn 3
2025-12-25 23:42:37,103 - INFO - Profile 2/5
2025-12-25 23:46:33,054 - WARNING - User agent failed to respond at turn 7
2025-12-25 23:46:46,658 - INFO - Profile 3/5
2025-12-25 23:49:40,906 - WARNING - User agent failed to respond at turn 4
2025-12-25 23:50:58,786 - WARNING - User agent failed to respond at turn 2
2025-12-25 23:51:12,246 - INFO - Profile 4/5
2025-12-25 23:52:14,159 - WARNING - User agent failed to respond at turn 4
2025-12-25 23:55:01,535 - WARNING - User agent failed to respond at turn 4
2025-12-25 23:56:57,317 - INFO - Profile 5/5
2025-12-25 23:58:27,891 - WARNING - User agent failed to respond at turn 2
2025-12-25 23:59:29,746 - INFO - Running method: reflection_grpo
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:07<00:21, 7.21s/it]
Loading checkpoint shards: 50%|█████ | 2/4 [00:12<00:12, 6.18s/it]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:18<00:05, 5.82s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:19<00:00, 4.28s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [00:19<00:00, 4.99s/it]
2025-12-25 23:59:50,260 - INFO - Profile 1/5
2025-12-26 00:00:31,696 - WARNING - User agent failed to respond at turn 3
2025-12-26 00:03:33,202 - INFO - Profile 2/5
2025-12-26 00:06:53,817 - INFO - Profile 3/5
2025-12-26 00:10:53,169 - WARNING - User agent failed to respond at turn 4
2025-12-26 00:12:53,034 - WARNING - User agent failed to respond at turn 4
2025-12-26 00:13:06,491 - INFO - Profile 4/5
2025-12-26 00:13:59,355 - WARNING - User agent failed to respond at turn 3
2025-12-26 00:18:16,345 - INFO - Profile 5/5
2025-12-26 00:18:53,569 - WARNING - User agent failed to respond at turn 3
2025-12-26 00:19:48,324 - WARNING - User agent failed to respond at turn 2
2025-12-26 00:20:53,392 - WARNING - User agent failed to respond at turn 3
2025-12-26 00:21:06,861 - INFO - Report saved to ../results/multiturn_test_20251225_220813/20251225_220821/report.md