# One-shot Entropy Minimization
[arXiv Paper](https://arxiv.org/abs/2505.20282) | [Model on Hugging Face](https://huggingface.co/zgao3186/qwen25math7b-one-shot-em/) | [Notion Page](https://www.notion.so/One-shot-Entropy-Minimization-202606db813b80639773f850f39246a5)
### Installation
```bash
conda create -n one-shot-em python=3.10 -y
conda activate one-shot-em
pip install -r requirements.txt
```
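To confirm the environment is usable before launching training, a quick check (this assumes PyTorch with CUDA support is pulled in by `requirements.txt`; exact versions may differ):

```python
# Environment sanity check; package versions depend on requirements.txt.
import torch
import transformers

print("torch:", torch.__version__, "| transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```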
---
### Colab Quickstart (single-GPU, no DeepSpeed)
In Colab, start with a smaller model to verify the pipeline end to end, then scale up if VRAM allows.
```bash
!git clone https://github.com/YurenHao0426/gee.git
%cd /content/gee/Group-Entropy-Equalization
!pip -q install transformers==4.44.2 accelerate==0.33.0 peft==0.12.0 bitsandbytes==0.43.3 datasets==2.21.0 wandb==0.17.7 pyarrow==17.0.0
```
Create a small training parquet if you don’t already have one:
```python
import os
import pandas as pd

os.makedirs("dataset/1shot_rlvr", exist_ok=True)

# Five toy problems, replicated to 1280 rows (matching the pi1_r1280 filename).
df = pd.DataFrame({"problem": [
    "What is 2 + 2?",
    "If x=3, compute x^2 + 2x + 1.",
    "The doctor is a ____.",
    "Factor 12.",
    "What is 7*8?",
]})
df_big = pd.concat([df] * 256, ignore_index=True).iloc[:1280]
df_big.to_parquet("dataset/1shot_rlvr/pi1_r1280.parquet", index=False)
```
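To make sure the file matches what `train.py` expects (a parquet with a `problem` column, as built above), a quick read-back check:

```python
import pandas as pd

# Read the freshly written parquet back and confirm its shape and schema.
check = pd.read_parquet("dataset/1shot_rlvr/pi1_r1280.parquet")
print(len(check), "rows, columns:", check.columns.tolist())  # expect 1280 rows, ['problem']
```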
Run training (no DeepSpeed, no AMP to avoid Colab GradScaler quirks):
```bash
!python train.py \
--model_name Qwen2.5-1.5B \
--model_path Qwen/Qwen2.5-1.5B \
--train_data dataset/1shot_rlvr/pi1_r1280.parquet \
--effective_batch 4 --micro_batch_size 1 \
--temperature 0.5 --learning_rate 2e-5 --sample_temp 0.5 \
--max_steps 10 --log_steps 1 --save_steps 10 \
--run_name colab_em10 --wandb_project one-shot-em \
--no_deepspeed --mixed_precision no
```
Checkpoints are saved under `checkpoints/<model>/<run_name>/`.
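To spot-check a trained checkpoint, a sketch like the following should work, assuming checkpoints are written in Hugging Face format; the path is illustrative, so substitute your own `<model>` and `<run_name>`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path; adjust to the run you trained above.
ckpt = "checkpoints/Qwen2.5-1.5B/colab_em10"

tok = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, device_map="auto")

inputs = tok("What is 7*8?", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tok.decode(out[0], skip_special_tokens=True))
```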
---
### Reproducing One-shot EM Training (SOTA)
```bash
accelerate launch train.py \
--model_name Qwen2.5-Math-7B \
--model_path /path/to/Qwen2.5-Math-7B \
--train_data dataset/1shot_rlvr/pi1_r1280.parquet \
--effective_batch 64 \
--micro_batch_size 2 \
--temperature 0.5 \
--learning_rate 2e-5 \
--max_steps 50 \
--log_steps 1 \
--save_steps 1 \
--run_name one_shot \
--wandb_project one-shot-em
```
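For intuition about what this run optimizes, here is a minimal sketch of a token-level entropy-minimization loss; the function name, masking scheme, and averaging are illustrative and not necessarily identical to the `train.py` implementation (which also exposes `--temperature`, as above):

```python
import torch
import torch.nn.functional as F

def entropy_loss(logits: torch.Tensor, mask: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Mean entropy of the model's next-token distribution over generated tokens.

    logits: (batch, seq_len, vocab) model outputs on sampled completions
    mask:   (batch, seq_len) 1 for generated (non-prompt, non-pad) positions
    """
    log_probs = F.log_softmax(logits / temperature, dim=-1)
    probs = log_probs.exp()
    token_entropy = -(probs * log_probs).sum(dim=-1)  # (batch, seq_len)
    return (token_entropy * mask).sum() / mask.sum().clamp(min=1)

# Minimizing this loss sharpens the model's predictions on its own samples,
# which is the core idea behind (one-shot) entropy minimization.
```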
---
### Reproducing Multi-shot EM Training
```bash
accelerate launch train.py \
--model_name Qwen2.5-Math-7B \
--model_path /path/to/Qwen2.5-Math-7B \
--train_data dataset/numina/numina_00.parquet \
--effective_batch 64 \
--micro_batch_size 2 \
--temperature 0.5 \
--learning_rate 2e-5 \
--max_steps 50 \
--log_steps 1 \
--save_steps 1 \
--run_name multi_shot \
--wandb_project one-shot-em
```
---
### Evaluation
```bash
cd Qwen2.5-Eval/evaluation
bash sh/eval_all_math.sh
```
---
### Acknowledgements
Our data and evaluation reference and build upon the following open-source contributions:
- [NuminaMath-CoT](https://huggingface.co/datasets/AI-MO/NuminaMath-CoT)
- [DeepScaler](https://github.com/agentica-project/deepscaler)
- [One-shot RLVR](https://github.com/ypwang61/One-Shot-RLVR/) – for data selection strategies
- [Qwen2.5-Eval](https://github.com/QwenLM/Qwen2.5-Math/) – for evaluation benchmarks
We sincerely thank the authors and maintainers of these projects for their excellent contributions to the research community!
---
### Citation
```bibtex
@misc{gao2025oneshotentropyminimization,
  title={One-shot Entropy Minimization},
  author={Zitian Gao and Lynx Chen and Haoming Luo and Joey Zhou and Bryan Dai},
  year={2025},
  eprint={2505.20282},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.20282},
}
```