author: YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 10:24:24 -0500
committer: YurenHao0426 <Blackhao0426@gmail.com> 2026-04-08 10:24:24 -0500
commit: ebd0d1410492bfff1dc96db4cbb9bbbe97e7afe6
tree: e47997a27ca319acf0b2d5beac3587358a383780
parent: 9c5dcd36d1c53073e6b42c2c85a0c47f2d3229c2
Bib fix: correct titles for 3 E&D model papers (Paleka/O'Bray/Jordan)
Previous bibitems had paraphrased/invented titles for the 3 E&D-methodology exemplar papers cited in §1 and §7. The correct titles are:

- Paleka et al. ICLR 2026: 'Pitfalls in Evaluating Language Model Forecasters' (not 'Pitfalls in evaluating model behavior: measurement, reporting, and interpretability failures')
- O'Bray et al. ICLR 2022: 'Evaluation Metrics for Graph Generative Models: Problems, Pitfalls, and Practical Solutions' (not 'Evaluation beyond leaderboard metrics: methodology matters')
- Jordan et al. ICML 2020: 'Evaluating the Performance of Reinforcement Learning Algorithms' (not 'Evaluating machine learning: tests, cases, and expectations'). Also corrected first author 'Matt' -> 'Scott M.'

Verified against codex round 23 memory, which recorded the correct titles from the OpenReview/ICML URLs. Previous bibitems were hallucinated titles from earlier rounds and would have been a factual bug in the bibliography.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
-rw-r--r-- paper/main.pdf | bin 481570 -> 481554 bytes
-rw-r--r-- paper/main.tex | 16
2 files changed, 8 insertions, 8 deletions
diff --git a/paper/main.pdf b/paper/main.pdf
index 93aac07..7b169a6 100644
--- a/paper/main.pdf
+++ b/paper/main.pdf
Binary files differ
diff --git a/paper/main.tex b/paper/main.tex
index 4ab205e..6e824b6 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -190,20 +190,20 @@ The main lesson is to decompose the evaluation question before interpreting the
\begin{thebibliography}{10}
-\bibitem[Paleka et~al.(2026)Paleka, et~al.]{paleka2026pitfalls}
+\bibitem[Paleka et~al.(2026)]{paleka2026pitfalls}
Daniel Paleka et~al.
-\newblock Pitfalls in evaluating model behavior: measurement, reporting, and
- interpretability failures.
+\newblock Pitfalls in evaluating language model forecasters.
\newblock In {\em International Conference on Learning Representations}, 2026.
-\bibitem[O'Bray et~al.(2022)O'Bray, et~al.]{obray2022evaluation}
+\bibitem[O'Bray et~al.(2022)]{obray2022evaluation}
Leslie O'Bray et~al.
-\newblock Evaluation beyond leaderboard metrics: methodology matters.
+\newblock Evaluation metrics for graph generative models: problems, pitfalls,
+ and practical solutions.
\newblock In {\em International Conference on Learning Representations}, 2022.
-\bibitem[Jordan et~al.(2020)Jordan, et~al.]{jordan2020evaluating}
-Matt Jordan et~al.
-\newblock Evaluating machine learning: tests, cases, and expectations.
+\bibitem[Jordan et~al.(2020)]{jordan2020evaluating}
+Scott~M. Jordan et~al.
+\newblock Evaluating the performance of reinforcement learning algorithms.
\newblock In {\em International Conference on Machine Learning}, 2020.
\bibitem[Lillicrap et~al.(2016)Lillicrap, Cownden, Tweed, and