From 5eb005d3724e2470f476995c3ebbcb67e8b04ca5 Mon Sep 17 00:00:00 2001
From: Yuren Hao
Date: Wed, 8 Apr 2026 22:11:29 -0500
Subject: README badges: add PutnamGAP mirror + GAP framework cross-link badges

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'README.md')

diff --git a/README.md b/README.md
index a64a3f1..d29250c 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,8 @@
 
 [![arXiv](https://img.shields.io/badge/arXiv-2508.08833-b31b1b.svg)](https://arxiv.org/abs/2508.08833)
 [![Hugging Face](https://img.shields.io/badge/🤗_Dataset-PutnamGAP-yellow.svg)](https://huggingface.co/datasets/blackhao0426/PutnamGAP)
+[![Dataset Mirror](https://img.shields.io/badge/GitHub-PutnamGAP_mirror-blue?logo=github)](https://github.com/YurenHao0426/PutnamGAP)
+[![GAP Code](https://img.shields.io/badge/GitHub-GAP_framework-181717?logo=github)](https://github.com/YurenHao0426/GAP)
 [![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
 
 **GAP** (*Generalization-and-Perturbation*) is an automatable evaluation framework for stress-testing the **robustness of LLM mathematical reasoning** under semantically equivalent transformations of advanced math problems. It partitions equivalence-preserving transformations into two qualitatively different families — **surface renaming** and **kernel parameter resampling** — and provides paired-evaluation, mechanism-sensitive analyses that prior perturbation benchmarks cannot.
-- 
cgit v1.2.3