From ebd0d1410492bfff1dc96db4cbb9bbbe97e7afe6 Mon Sep 17 00:00:00 2001
From: YurenHao0426 <Blackhao0426@gmail.com>
Date: Wed, 8 Apr 2026 10:24:24 -0500
Subject: Bib fix: correct titles for 3 E&D model papers (Paleka/O'Bray/Jordan)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Previous bibitems had paraphrased/invented titles for the 3 E&D-methodology
exemplar papers cited in §1 and §7. The correct titles are:

- Paleka et al. ICLR 2026: 'Pitfalls in Evaluating Language Model Forecasters'
  (not 'Pitfalls in evaluating model behavior: measurement, reporting, and
  interpretability failures')

- O'Bray et al. ICLR 2022: 'Evaluation Metrics for Graph Generative Models:
  Problems, Pitfalls, and Practical Solutions' (not 'Evaluation beyond
  leaderboard metrics: methodology matters')

- Jordan et al. ICML 2020: 'Evaluating the Performance of Reinforcement
  Learning Algorithms' (not 'Evaluating machine learning: tests, cases, and
  expectations'). Also corrected the first author's given name from 'Matt'
  to 'Scott M.'

The \bibitem optional arguments for these three entries also drop the
placeholder 'Name, et~al.' author list after the year, e.g.
'[Paleka et~al.(2026)Paleka, et~al.]' -> '[Paleka et~al.(2026)]'.

Verified against the codex round 23 memory, which recorded the correct
titles from the OpenReview/ICML URLs. The previous bibitems carried
hallucinated titles from earlier rounds and would have been a factual error
in the bibliography.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 paper/main.tex | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/paper/main.tex b/paper/main.tex
index 4ab205e..6e824b6 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -190,20 +190,20 @@ The main lesson is to decompose the evaluation question before interpreting the
 
 \begin{thebibliography}{10}
 
-\bibitem[Paleka et~al.(2026)Paleka, et~al.]{paleka2026pitfalls}
+\bibitem[Paleka et~al.(2026)]{paleka2026pitfalls}
 Daniel Paleka et~al.
-\newblock Pitfalls in evaluating model behavior: measurement, reporting, and
-  interpretability failures.
+\newblock Pitfalls in evaluating language model forecasters.
 \newblock In {\em International Conference on Learning Representations}, 2026.
 
-\bibitem[O'Bray et~al.(2022)O'Bray, et~al.]{obray2022evaluation}
+\bibitem[O'Bray et~al.(2022)]{obray2022evaluation}
 Leslie O'Bray et~al.
-\newblock Evaluation beyond leaderboard metrics: methodology matters.
+\newblock Evaluation metrics for graph generative models: problems, pitfalls,
+  and practical solutions.
 \newblock In {\em International Conference on Learning Representations}, 2022.
 
-\bibitem[Jordan et~al.(2020)Jordan, et~al.]{jordan2020evaluating}
-Matt Jordan et~al.
-\newblock Evaluating machine learning: tests, cases, and expectations.
+\bibitem[Jordan et~al.(2020)]{jordan2020evaluating}
+Scott~M. Jordan et~al.
+\newblock Evaluating the performance of reinforcement learning algorithms.
 \newblock In {\em International Conference on Machine Learning}, 2020.
 
 \bibitem[Lillicrap et~al.(2016)Lillicrap, Cownden, Tweed, and