| author | Yuren Hao <yurenh2@illinois.edu> | 2026-04-08 22:11:29 -0500 |
|---|---|---|
| committer | Yuren Hao <yurenh2@illinois.edu> | 2026-04-08 22:11:29 -0500 |
| commit | 5eb005d3724e2470f476995c3ebbcb67e8b04ca5 (patch) | |
| tree | 76590ab46200c74210fff3d5e5b5fa158e408e05 /README.md | |
| parent | 185404c245b360df5bc8398f9331c481c16f01f8 (diff) | |
Diffstat (limited to 'README.md')
| -rw-r--r-- | README.md | 2 |
1 files changed, 2 insertions, 0 deletions
@@ -2,6 +2,8 @@
 [](https://arxiv.org/abs/2508.08833)
 [](https://huggingface.co/datasets/blackhao0426/PutnamGAP)
+[](https://github.com/YurenHao0426/PutnamGAP)
+[](https://github.com/YurenHao0426/GAP)
 [](https://creativecommons.org/licenses/by/4.0/)
 **GAP** (*Generalization-and-Perturbation*) is an automatable evaluation framework for stress-testing the **robustness of LLM mathematical reasoning** under semantically equivalent transformations of advanced math problems. It partitions equivalence-preserving transformations into two qualitatively different families — **surface renaming** and **kernel parameter resampling** — and provides paired-evaluation, mechanism-sensitive analyses that prior perturbation benchmarks cannot.
