| author | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 10:24:24 -0500 |
|---|---|---|
| committer | YurenHao0426 <Blackhao0426@gmail.com> | 2026-04-08 10:24:24 -0500 |
| commit | ebd0d1410492bfff1dc96db4cbb9bbbe97e7afe6 (patch) | |
| tree | e47997a27ca319acf0b2d5beac3587358a383780 | |
| parent | 9c5dcd36d1c53073e6b42c2c85a0c47f2d3229c2 (diff) | |

Bib fix: correct titles for 3 E&D model papers (Paleka/O'Bray/Jordan)

Previous bibitems had paraphrased/invented titles for the 3 E&D-methodology
exemplar papers cited in §1 and §7. The correct titles are:

- Paleka et al. ICLR 2026: 'Pitfalls in Evaluating Language Model Forecasters'
  (not 'Pitfalls in evaluating model behavior: measurement, reporting, and
  interpretability failures')
- O'Bray et al. ICLR 2022: 'Evaluation Metrics for Graph Generative Models:
  Problems, Pitfalls, and Practical Solutions' (not 'Evaluation beyond
  leaderboard metrics: methodology matters')
- Jordan et al. ICML 2020: 'Evaluating the Performance of Reinforcement
  Learning Algorithms' (not 'Evaluating machine learning: tests, cases, and
  expectations'). Also corrected first author 'Matt' -> 'Scott M.'

Verified against codex round 23 memory which recorded the correct titles
from the OpenReview/ICML URLs. Previous bibitems were hallucinated titles
from earlier rounds and would have been a factual bug in the bibliography.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

| Mode | File | Changes |
|---|---|---|
| -rw-r--r-- | paper/main.pdf | bin, 481570 -> 481554 bytes |
| -rw-r--r-- | paper/main.tex | 16 |

2 files changed, 8 insertions, 8 deletions

diff --git a/paper/main.pdf b/paper/main.pdf
Binary files differ
index 93aac07..7b169a6 100644
--- a/paper/main.pdf
+++ b/paper/main.pdf
diff --git a/paper/main.tex b/paper/main.tex
index 4ab205e..6e824b6 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -190,20 +190,20 @@ The main lesson is to decompose the evaluation question before interpreting the
 
 \begin{thebibliography}{10}
 
-\bibitem[Paleka et~al.(2026)Paleka, et~al.]{paleka2026pitfalls}
+\bibitem[Paleka et~al.(2026)]{paleka2026pitfalls}
 Daniel Paleka et~al.
-\newblock Pitfalls in evaluating model behavior: measurement, reporting, and
-  interpretability failures.
+\newblock Pitfalls in evaluating language model forecasters.
 \newblock In {\em International Conference on Learning Representations}, 2026.
 
-\bibitem[O'Bray et~al.(2022)O'Bray, et~al.]{obray2022evaluation}
+\bibitem[O'Bray et~al.(2022)]{obray2022evaluation}
 Leslie O'Bray et~al.
-\newblock Evaluation beyond leaderboard metrics: methodology matters.
+\newblock Evaluation metrics for graph generative models: problems, pitfalls,
+  and practical solutions.
 \newblock In {\em International Conference on Learning Representations}, 2022.
 
-\bibitem[Jordan et~al.(2020)Jordan, et~al.]{jordan2020evaluating}
-Matt Jordan et~al.
-\newblock Evaluating machine learning: tests, cases, and expectations.
+\bibitem[Jordan et~al.(2020)]{jordan2020evaluating}
+Scott~M. Jordan et~al.
+\newblock Evaluating the performance of reinforcement learning algorithms.
 \newblock In {\em International Conference on Machine Learning}, 2020.
 
 \bibitem[Lillicrap et~al.(2016)Lillicrap, Cownden, Tweed, and
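
The title check described in the commit message could be automated so that a regression in these three entries is caught before the PDF is rebuilt. The following is a minimal sketch, not part of the repository: it assumes the bibliography keeps the hand-written `thebibliography` layout shown in the diff (the title is the first `\newblock` of each `\bibitem`, followed by at least one more `\newblock`), and the names `EXPECTED_TITLES`, `extract_titles`, and `find_mismatches` are hypothetical. The expected titles are copied from the commit message; the comparison ignores case, line wrapping, and a trailing period.

```python
import re

# Expected titles, copied from the commit message (hypothetical lookup table).
EXPECTED_TITLES = {
    "paleka2026pitfalls": "Pitfalls in Evaluating Language Model Forecasters",
    "obray2022evaluation": (
        "Evaluation Metrics for Graph Generative Models: "
        "Problems, Pitfalls, and Practical Solutions"
    ),
    "jordan2020evaluating": (
        "Evaluating the Performance of Reinforcement Learning Algorithms"
    ),
}

# Matches a bibitem command with an optional [label] and captures the {key}.
BIBITEM_RE = re.compile(r"\\bibitem(?:\[[^\]]*\])?\{([^}]+)\}")


def normalize(text):
    """Lowercase, collapse whitespace, and drop a trailing period."""
    return " ".join(text.split()).lower().rstrip(".")


def extract_titles(tex):
    """Map each bibitem key to the normalized text of its first newblock."""
    titles = {}
    matches = list(BIBITEM_RE.finditer(tex))
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(tex)
        blocks = tex[match.end():end].split("\\newblock")
        if len(blocks) > 1:
            titles[match.group(1)] = normalize(blocks[1])
    return titles


def find_mismatches(tex, expected=EXPECTED_TITLES):
    """Return {key: found_title} for every expected key whose title differs."""
    found = extract_titles(tex)
    return {
        key: found.get(key)
        for key, want in expected.items()
        if found.get(key) != normalize(want)
    }


# Sample input: the three corrected entries from the diff.
SAMPLE = r"""
\bibitem[Paleka et~al.(2026)]{paleka2026pitfalls}
Daniel Paleka et~al.
\newblock Pitfalls in evaluating language model forecasters.
\newblock In {\em International Conference on Learning Representations}, 2026.

\bibitem[O'Bray et~al.(2022)]{obray2022evaluation}
Leslie O'Bray et~al.
\newblock Evaluation metrics for graph generative models: problems, pitfalls,
  and practical solutions.
\newblock In {\em International Conference on Learning Representations}, 2022.

\bibitem[Jordan et~al.(2020)]{jordan2020evaluating}
Scott~M. Jordan et~al.
\newblock Evaluating the performance of reinforcement learning algorithms.
\newblock In {\em International Conference on Machine Learning}, 2020.
"""

if __name__ == "__main__":
    print(find_mismatches(SAMPLE))
```

In practice one would pass the contents of `paper/main.tex` instead of `SAMPLE`. An empty result means all three titles match; any key in the result points at an entry whose title is missing or differs from the expected one.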
