From ebd0d1410492bfff1dc96db4cbb9bbbe97e7afe6 Mon Sep 17 00:00:00 2001
From: YurenHao0426
Date: Wed, 8 Apr 2026 10:24:24 -0500
Subject: Bib fix: correct titles for 3 E&D model papers (Paleka/O'Bray/Jordan)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The previous bibitems had paraphrased or invented titles for the three
E&D-methodology exemplar papers cited in §1 and §7. The correct titles
are:

- Paleka et al. ICLR 2026: 'Pitfalls in Evaluating Language Model
  Forecasters' (not 'Pitfalls in evaluating model behavior: measurement,
  reporting, and interpretability failures')
- O'Bray et al. ICLR 2022: 'Evaluation Metrics for Graph Generative
  Models: Problems, Pitfalls, and Practical Solutions' (not 'Evaluation
  beyond leaderboard metrics: methodology matters')
- Jordan et al. ICML 2020: 'Evaluating the Performance of Reinforcement
  Learning Algorithms' (not 'Evaluating machine learning: tests, cases,
  and expectations'). Also corrected the first author from 'Matt' to
  'Scott M.'

Verified against the codex round-23 memory, which recorded the correct
titles from the OpenReview/ICML URLs. The previous titles were
hallucinated in earlier rounds and would have left factual errors in the
bibliography.

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 paper/main.tex | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/paper/main.tex b/paper/main.tex
index 4ab205e..6e824b6 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -190,20 +190,20 @@ The main lesson is to decompose the evaluation question before interpreting the
 
 \begin{thebibliography}{10}
 
-\bibitem[Paleka et~al.(2026)Paleka, et~al.]{paleka2026pitfalls}
+\bibitem[Paleka et~al.(2026)]{paleka2026pitfalls}
 Daniel Paleka et~al.
-\newblock Pitfalls in evaluating model behavior: measurement, reporting, and
-  interpretability failures.
+\newblock Pitfalls in evaluating language model forecasters.
 \newblock In {\em International Conference on Learning Representations}, 2026.
 
-\bibitem[O'Bray et~al.(2022)O'Bray, et~al.]{obray2022evaluation}
+\bibitem[O'Bray et~al.(2022)]{obray2022evaluation}
 Leslie O'Bray et~al.
-\newblock Evaluation beyond leaderboard metrics: methodology matters.
+\newblock Evaluation metrics for graph generative models: problems, pitfalls,
+  and practical solutions.
 \newblock In {\em International Conference on Learning Representations}, 2022.
 
-\bibitem[Jordan et~al.(2020)Jordan, et~al.]{jordan2020evaluating}
-Matt Jordan et~al.
-\newblock Evaluating machine learning: tests, cases, and expectations.
+\bibitem[Jordan et~al.(2020)]{jordan2020evaluating}
+Scott~M. Jordan et~al.
+\newblock Evaluating the performance of reinforcement learning algorithms.
 \newblock In {\em International Conference on Machine Learning}, 2020.
 
 \bibitem[Lillicrap et~al.(2016)Lillicrap, Cownden, Tweed, and
-- 
cgit v1.2.3