Rare Disease Differential Diagnosis with Large Language Models at Scale: From Abdominal Actinomycosis to Wilson's Disease
Abstract
Large language models (LLMs) have demonstrated impressive capabilities in disease diagnosis. However, their effectiveness in identifying rarer diseases, which are inherently more challenging to diagnose, remains an open question. Rare disease performance is critical given the increasing use of LLMs in healthcare settings. This is especially true when a primary care physician needs to make a rarer diagnosis from only a patient conversation so that they can take the appropriate next step. To that end, several clinical decision support systems are designed to support providers in rare disease identification. Yet their utility is limited by their lack of knowledge of common disorders and their difficulty of use. In this paper, we propose RareScale to combine the knowledge of LLMs with expert systems. We jointly use an expert system and an LLM to simulate rare disease chats. This data is used to train a rare disease candidate predictor model. Candidates from this smaller model are then used as additional inputs to a black-box LLM to make the final differential diagnosis. Thus, RareScale allows for a balance between rare and common diagnoses. We present results on over 575 rare diseases, beginning with Abdominal Actinomycosis and ending with Wilson's Disease. Our approach significantly improves the baseline performance of black-box LLMs by over 17% in Top-5 accuracy. We also find that our candidate generation performance is high (e.g. 88.8% on gpt-4o generated chats).
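The final stage described in the abstract, where candidates from the smaller model are passed as additional inputs to a black-box LLM, could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the prompt wording, and the two callables (`candidate_model`, `llm`) are hypothetical stand-ins for the trained candidate predictor and the black-box LLM.

```python
import re
from typing import Callable, List, Sequence


def build_ddx_prompt(chat: str, rare_candidates: Sequence[str], top_n: int = 5) -> str:
    """Assemble a prompt that offers rare disease candidates as optional hints.

    The wording is an assumption for illustration; the paper's prompt may differ.
    """
    candidate_lines = "\n".join(f"- {name}" for name in rare_candidates)
    return (
        "You are assisting with a differential diagnosis.\n\n"
        f"Provider-patient chat:\n{chat}\n\n"
        "Rare disease candidates to consider (include only if supported by the chat):\n"
        f"{candidate_lines}\n\n"
        f"List the {top_n} most likely diagnoses, one per line, most likely first."
    )


def rarescale_style_ddx(
    chat: str,
    candidate_model: Callable[[str], List[str]],  # hypothetical: chat -> rare candidates
    llm: Callable[[str], str],                    # hypothetical: prompt -> raw text reply
    top_n: int = 5,
) -> List[str]:
    """Run candidate generation, then ask the LLM for the final ranked list."""
    candidates = candidate_model(chat)
    reply = llm(build_ddx_prompt(chat, candidates, top_n))

    diagnoses = []
    for line in reply.splitlines():
        # Strip list markers such as "1. ", "2) ", or "- " from each line.
        cleaned = re.sub(r"^\s*(?:\d+[.)]|-)\s*", "", line).strip()
        if cleaned:
            diagnoses.append(cleaned)
    return diagnoses[:top_n]
```

Because the LLM sees the candidates only as suggestions, it remains free to rank common conditions above them, which is one way to read the abstract's "balance between rare and common diagnoses".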
Community
Excited to announce our new preprint from Curai, Rare Disease Differential Diagnosis with Large Language Models at Scale: From Abdominal Actinomycosis to Wilson's Disease! We introduce RareScale, a method that enhances rare disease diagnosis by combining large language models (LLMs) with expert systems. Rare diseases affect ~5.9% of people globally, and diagnosing them is notoriously slow, often taking 4-5 years. Clinical decision support systems (CDSS) exist, but they struggle with usability and disease coverage.
LLMs show promise, but previous work focuses on information-rich case studies, which limits real-world use in primary care. Enter RareScale, which directly improves rare disease differential diagnosis from provider-patient chats on over 575 diseases. RareScale uses an expert system and simulator to create structured cases, which are in turn fed into LLMs to create free-form dialogues. These dialogues allow us to train a rare disease candidate generator, which can be used in combination with a black-box LLM.
With RareScale, we saw a significant improvement, boosting Top-5 diagnostic accuracy by up to 17%. Check out our full paper for details!
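For readers unfamiliar with the metric, Top-5 accuracy counts a case as correct when the true diagnosis appears anywhere in the first five entries of the predicted differential. A minimal sketch, assuming exact (case-insensitive) string matching between predicted and gold disease names; the paper's own matching procedure may differ, and the function name is illustrative:

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    gold: Sequence[str],
    k: int = 5,
) -> float:
    """Fraction of cases whose gold diagnosis is among the first k predictions."""
    if len(ranked_predictions) != len(gold):
        raise ValueError("ranked_predictions and gold must have the same length")
    if not gold:
        return 0.0

    hits = 0
    for predictions, truth in zip(ranked_predictions, gold):
        # Compare case-insensitively; this is an assumption, not the paper's rule.
        top_k = {name.casefold() for name in predictions[:k]}
        if truth.casefold() in top_k:
            hits += 1
    return hits / len(gold)
```

In practice, disease names often need normalization (synonyms, abbreviations) before comparison, so exact matching like this will tend to undercount correct predictions.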