Diffstat (limited to 'paper/main.tex')
-rw-r--r-- paper/main.tex 16
1 files changed, 8 insertions, 8 deletions
diff --git a/paper/main.tex b/paper/main.tex
index 4ab205e..6e824b6 100644
--- a/paper/main.tex
+++ b/paper/main.tex
@@ -190,20 +190,20 @@ The main lesson is to decompose the evaluation question before interpreting the
\begin{thebibliography}{10}
-\bibitem[Paleka et~al.(2026)Paleka, et~al.]{paleka2026pitfalls}
+\bibitem[Paleka et~al.(2026)]{paleka2026pitfalls}
Daniel Paleka et~al.
-\newblock Pitfalls in evaluating model behavior: measurement, reporting, and
- interpretability failures.
+\newblock Pitfalls in evaluating language model forecasters.
\newblock In {\em International Conference on Learning Representations}, 2026.
-\bibitem[O'Bray et~al.(2022)O'Bray, et~al.]{obray2022evaluation}
+\bibitem[O'Bray et~al.(2022)]{obray2022evaluation}
Leslie O'Bray et~al.
-\newblock Evaluation beyond leaderboard metrics: methodology matters.
+\newblock Evaluation metrics for graph generative models: problems, pitfalls,
+ and practical solutions.
\newblock In {\em International Conference on Learning Representations}, 2022.
-\bibitem[Jordan et~al.(2020)Jordan, et~al.]{jordan2020evaluating}
-Matt Jordan et~al.
-\newblock Evaluating machine learning: tests, cases, and expectations.
+\bibitem[Jordan et~al.(2020)]{jordan2020evaluating}
+Scott~M. Jordan et~al.
+\newblock Evaluating the performance of reinforcement learning algorithms.
\newblock In {\em International Conference on Machine Learning}, 2020.
\bibitem[Lillicrap et~al.(2016)Lillicrap, Cownden, Tweed, and