From 8a4985fa66ec0977cf0de300aa7371fa809a24e7 Mon Sep 17 00:00:00 2001
From: Yuren Hao <97327730+YurenHao0426@users.noreply.github.com>
Date: Fri, 25 Jul 2025 05:10:20 -0700
Subject: Auto-update: Added 1 new papers on 2025-07-25

---
 README.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/README.md b/README.md
index d324a36..9ea5453 100644
--- a/README.md
+++ b/README.md
@@ -27,6 +27,24 @@
 
 
 
+
+
+
+## Papers Updated on 2025-07-25 12:10 UTC
+
+### Beyond Internal Data: Constructing Complete Datasets for Fairness Testing
+
+**Authors:** Varsha Ramineni, Hossein A. Rahmani, Emine Yilmaz et al.
+
+**Categories:** cs.LG, cs.AI, stat.ML
+
+**Published:** 2025-07-24T16:35:42Z
+
+**Abstract:** As AI becomes prevalent in high-risk domains and decision-making, it is essential to test for potential harms and biases. This urgency is reflected by the global emergence of AI regulations that emphasise fairness and adequate testing, with some mandating independent bias audits. However, procuring the necessary data for fairness testing remains a significant challenge. Particularly in industry settings, legal and privacy concerns restrict the collection of demographic data required to assess group disparities, and auditors face practical and cultural challenges in gaining access to data. Further, internal historical datasets are often insufficiently representative to identify real-world biases. This work focuses on evaluating classifier fairness when complete datasets including demographics are inaccessible. We propose leveraging separate overlapping datasets to construct complete synthetic data that includes demographic information and accurately reflects the underlying relationships between protected attributes and model features. We validate the fidelity of the synthetic data by comparing it to real data, and empirically demonstrate that fairness metrics derived from testing on such synthetic data are consistent with those obtained from real data. This work, therefore, offers a path to overcome real-world data scarcity for fairness testing, enabling independent, model-agnostic evaluation of fairness, and serving as a viable substitute where real data is limited.
+
+**Link:** [arXiv:2507.18561v1](http://arxiv.org/abs/2507.18561v1)
+
+---
 
 ## Papers Updated on 2025-07-24 12:10 UTC
 
-- 
cgit v1.2.3