| -rw-r--r-- | README.md | 2 |
1 files changed, 2 insertions, 0 deletions
@@ -2,6 +2,8 @@

 [](https://arxiv.org/abs/2508.08833)
 [](https://huggingface.co/datasets/blackhao0426/PutnamGAP)
+[](https://github.com/YurenHao0426/PutnamGAP)
+[](https://github.com/YurenHao0426/GAP)
 [](https://creativecommons.org/licenses/by/4.0/)

 **GAP** (*Generalization-and-Perturbation*) is an automatable evaluation framework for stress-testing the **robustness of LLM mathematical reasoning** under semantically equivalent transformations of advanced math problems. It partitions equivalence-preserving transformations into two qualitatively different families — **surface renaming** and **kernel parameter resampling** — and provides paired-evaluation, mechanism-sensitive analyses that prior perturbation benchmarks cannot.
