author Will DePue <williamd@openai.com> 2026-03-19 14:55:57 -0700
committer GitHub <noreply@github.com> 2026-03-19 14:55:57 -0700
commit 535352463e08b52a602d33ed8cf24f1379addee7
tree 4657b11d27696fc7f81a34b98157218b888ca6f9 /README.md
parent f3897c16bb913640c2b65d2e82addab307245034
Update README.md
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 6
1 files changed, 5 insertions, 1 deletions
diff --git a/README.md b/README.md
index 98aa544..e994ca4 100644
--- a/README.md
+++ b/README.md
@@ -142,7 +142,7 @@ No external downloads, training dataset access, or network calls are allowed dur
**Are scores independently verified by OpenAI?**
-We're not automatically verifying every submission, but we will verify the top leaderboard entries over time. Any non-reproducible results can be disqualified, and issues reproducing submissions should be raised on the PR.
+We're not automatically verifying every submission, but we will verify the top leaderboard entries over time. Any non-reproducible results can be disqualified, and issues reproducing submissions should be raised on the PR. If you find an issue with a record on the leaderboard or find that a record isn't reproducible, please let us know by opening a GitHub issue describing your findings.
**What counts as 'external compute'? For example, is it fair to tune my hyperparameters offline?**
@@ -152,6 +152,10 @@ There's no perfectly clear answer here and it's hard to draw a clean line around
We won't accept submissions that take more than 10 minutes on 8xH100 to evaluate (Note: This limit is in addition to the 10 minutes of training time allowed!), but otherwise you're free to evaluate however. As with modded-nanogpt, we allow evaluation at any sequence length. And, obviously, you aren't allowed to access any training data during evaluation, unless you pay for those bits in the <16MB limit. We encourage competitors to push the bounds of evaluation methods as aggressively as with training methods.
+**What is the process for accepting new submissions?**
+
+Since all submissions are public, we're accepting record submissions chronologically by PR creation time. The leaderboard may take time to update due to verification and review of submissions, so check what the current SOTA PR is when submitting. As explained below, submissions should exceed the SOTA record with sufficient statistical significance in order to be accepted for the leaderboard. Otherwise, submissions may be accepted as 'non-record submissions' provided they are sufficiently unique or interesting.
+
## Submission Process
New SOTA records must fulfill the following criteria: