Diffstat (limited to 'README.md')
-rw-r--r-- README.md 84
1 files changed, 64 insertions, 20 deletions
diff --git a/README.md b/README.md
index 775d22c..c9aa8d0 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,16 @@
-# ArXiv LLM Bias Paper Fetcher
+# ArXiv Social Good AI Paper Fetcher
-An automated system for discovering and cataloging research papers related to bias in Large Language Models (LLMs) from arXiv.org. This tool uses GPT-4o to intelligently filter papers and automatically updates a target repository with newly discovered relevant research.
+An automated system for discovering and cataloging research papers related to AI bias, fairness, and social good from arXiv.org. This tool uses GPT-4o to intelligently filter papers for social impact and automatically updates a target repository with newly discovered relevant research.
## 🎯 Features
-- **Intelligent Paper Detection**: Uses GPT-4o to analyze paper titles and abstracts for LLM bias relevance
+- **Intelligent Paper Detection**: Uses GPT-4o to analyze paper titles and abstracts for social good and fairness relevance
- **Automated Daily Updates**: Runs daily via GitHub Actions to fetch the latest papers
- **Historical Paper Collection**: Can fetch and process papers from the past 2 years
- **GitHub Integration**: Automatically updates target repository README with new findings
-- **Comprehensive Filtering**: Focuses on AI/ML categories most likely to contain relevant research
+- **Comprehensive Filtering**: Focuses on AI/ML categories most likely to contain social impact research
+- **Social Good Focus**: Identifies bias and fairness research across healthcare, education, criminal justice, and more
+- **Reverse Chronological Order**: Always maintains newest papers at the top of README for easy access
## 🔧 Setup & Configuration
@@ -78,6 +80,16 @@ Test the historical fetching functionality:
python scripts/test_historical_fetch.py
```
+Test the Social Good prompt effectiveness:
+```bash
+python scripts/test_social_good_prompt.py
+```
+
+Test the reverse chronological ordering:
+```bash
+python scripts/test_reverse_chronological.py
+```
+
### Debugging
If the system completes too quickly or you suspect no papers are being fetched, use the debug script:
@@ -91,6 +103,29 @@ This will show detailed information about:
- Number of papers fetched at each step
- Sample papers and filtering results
+### Parallel Processing
+
+The system now supports parallel processing of OpenAI requests for faster filtering:
+
+```bash
+# Test parallel vs sequential performance
+python scripts/test_parallel_processing.py
+```
+
+**Performance optimization options:**
+```bash
+# Enable parallel processing explicitly
+USE_PARALLEL=true python scripts/fetch_papers.py
+
+# Control concurrent requests (default: 16 for daily, 25 for historical)
+MAX_CONCURRENT=20 python scripts/fetch_papers.py
+
+# Disable parallel processing for debugging
+USE_PARALLEL=false python scripts/fetch_papers.py
+```
+
+**Expected speedup:** 3-10x faster processing depending on the number of papers and network conditions.
+
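As a rough illustration of how the two variables above could drive a bounded-concurrency filter, here is a minimal sketch. It is not the code in `scripts/fetch_papers.py`: the function names `read_settings`, `filter_papers`, and `keyword_stub` are hypothetical, the relevance check is a local stand-in for the GPT-4o call, and treating an unset `USE_PARALLEL` as enabled is an assumption. Only the variable names and the daily default of 16 come from the README text.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_settings(env=os.environ):
    """Read USE_PARALLEL and MAX_CONCURRENT; the parsing rules are assumptions."""
    # Assumption: parallel mode is on unless the variable is set to something else.
    use_parallel = env.get("USE_PARALLEL", "true").strip().lower() == "true"
    # 16 is the documented default for daily runs.
    max_concurrent = int(env.get("MAX_CONCURRENT", "16"))
    return use_parallel, max(1, max_concurrent)


def filter_papers(papers, is_relevant, use_parallel=True, max_concurrent=16):
    """Return the papers for which is_relevant(paper) is true, in input order."""
    if use_parallel and len(papers) > 1:
        # At most max_concurrent relevance checks run at the same time.
        with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
            verdicts = list(pool.map(is_relevant, papers))  # map keeps input order
    else:
        verdicts = [is_relevant(paper) for paper in papers]
    return [paper for paper, keep in zip(papers, verdicts) if keep]


def keyword_stub(paper):
    """Hypothetical stand-in for the GPT-4o relevance call."""
    return "fairness" in paper["title"].lower()


if __name__ == "__main__":
    sample = [
        {"title": "Fairness in hiring models"},
        {"title": "Faster matrix multiplication"},
    ]
    parallel, limit = read_settings()
    for paper in filter_papers(sample, keyword_stub, parallel, limit):
        print(paper["title"])
```

Threads suit this kind of workload because each relevance check mostly waits on a network response, and capping the pool size keeps the number of in-flight requests under the configured limit.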
## 🤖 GitHub Actions
The project includes automated GitHub Actions workflows:
@@ -123,36 +158,45 @@ The system searches these arXiv categories for relevant papers:
Papers are considered relevant if they discuss:
-- Bias in large language models, generative AI, or foundation models
-- Fairness issues in NLP models or text generation
-- Ethical concerns with language models
-- Demographic bias in AI systems
-- Alignment and safety of language models
-- Bias evaluation or mitigation in NLP
+- **Bias and fairness** in AI/ML systems with societal impact
+- **Algorithmic fairness** in healthcare, education, criminal justice, hiring, or finance
+- **Demographic bias** affecting marginalized or underrepresented groups
+- **Data bias** and its social consequences
+- **Ethical AI** and responsible AI deployment in society
+- **AI safety** and alignment with human values and social welfare
+- **Bias evaluation, auditing, or mitigation** in real-world applications
+- **Representation and inclusion** in AI systems and datasets
+- **Social implications** of AI bias (e.g., perpetuating inequality)
+- **Fairness** in recommendation systems, search engines, or content moderation
## 📁 Project Structure
```
PaperFetcher/
├── scripts/
-│ ├── fetch_papers.py # Main fetching script
-│ ├── test_daily_fetch.py # Daily fetching test
-│ ├── test_historical_fetch.py # Historical fetching test
-│ └── debug_fetch.py # Debug and troubleshooting script
+│ ├── fetch_papers.py # Main fetching script (with parallel support)
+│ ├── test_daily_fetch.py # Daily fetching test
+│ ├── test_historical_fetch.py # Historical fetching test
+│ ├── test_parallel_processing.py # Parallel processing performance test
+│ ├── test_improved_fetch.py # Improved fetching logic test
+│ ├── test_social_good_prompt.py # Social Good prompt testing
+│ ├── test_reverse_chronological.py # Reverse chronological order testing
+│ └── debug_fetch.py # Debug and troubleshooting script
├── .github/
│ └── workflows/
-│ └── daily_papers.yml # GitHub Actions workflow
-├── requirements.txt # Python dependencies
-└── README.md # This file
+│ └── daily_papers.yml # GitHub Actions workflow
+├── requirements.txt # Python dependencies
+└── README.md # This file
```
## 🔍 How It Works
1. **Paper Retrieval**: Queries arXiv API for papers in relevant CS categories
2. **Date Filtering**: Filters papers based on submission/update dates
-3. **AI Analysis**: Uses GPT-4o to analyze each paper's title and abstract
-4. **Repository Update**: Adds relevant papers to target repository's README
-5. **Version Control**: Commits changes with descriptive commit messages
+3. **AI Analysis**: Uses GPT-4o to analyze each paper's title and abstract for social good relevance
+4. **Social Impact Assessment**: Evaluates papers for bias, fairness, and societal implications
+5. **Repository Update**: Adds relevant papers to target repository's README in reverse chronological order
+6. **Version Control**: Commits changes with descriptive commit messages
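The repository update step in the list above can be sketched as a merge that keeps the newest papers first. This is only an illustration under stated assumptions: the dictionary keys (`id`, `title`, `published`), the function names, the bullet layout, and the placeholder identifiers are all hypothetical and are not taken from the project's scripts.

```python
from datetime import date


def merge_newest_first(existing, new):
    """Merge two lists of paper dicts, drop duplicate ids, sort newest first."""
    by_id = {}
    for paper in existing + new:
        by_id.setdefault(paper["id"], paper)  # first occurrence of an id wins
    return sorted(by_id.values(), key=lambda p: p["published"], reverse=True)


def render_entries(papers):
    """Render one Markdown bullet per paper; the layout is illustrative only."""
    return "\n".join(
        f"- {p['published'].isoformat()} [{p['title']}](https://arxiv.org/abs/{p['id']})"
        for p in papers
    )


if __name__ == "__main__":
    # Placeholder entries, not real papers.
    already_listed = [
        {"id": "0000.00001", "title": "Older paper", "published": date(2024, 1, 5)},
    ]
    just_fetched = [
        {"id": "0000.00002", "title": "Newer paper", "published": date(2024, 3, 1)},
        {"id": "0000.00001", "title": "Older paper", "published": date(2024, 1, 5)},
    ]
    print(render_entries(merge_newest_first(already_listed, just_fetched)))
```

Deduplicating by identifier before sorting means a paper that is fetched again on a later run does not appear twice in the list.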
## ⚙️ Configuration Options