health
DiagRAG: AI-Powered Rare Disease Diagnostic
Probabilistic diagnosis for rare diseases with grounded clinical reasoning.
RAG, clinical, rare-disease, probabilistic, knowledge-graph
By
Devshree Hardiksinh Jadeja
Semester
Spring 2026
Problem
Rare disease diagnosis is uniquely difficult: each disease has very few documented cases, symptoms overlap heavily, and diagnosis is usually treated as static classification, when in reality it is a sequential, uncertainty-heavy decision process.
Solution
A three-layer AI pipeline: phenotype ingestion → probabilistic inference → retrieval-augmented LLM reasoning. A Partial VAE generates a disease posterior with calibrated uncertainty, an information-gain module recommends the most informative next phenotype to check while accounting for its cost, and a RAG + LLM layer retrieves biomedical evidence to produce transparent clinical reasoning.
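The cost-aware information-gain step can be illustrated with a minimal sketch. It assumes a small discrete disease posterior and a table of P(phenotype present | disease) in place of the Partial VAE, and it scores each candidate as expected entropy reduction divided by cost. The function names, the candidate labels, the probabilities, and the gain-per-cost rule are all illustrative assumptions, not the project's actual implementation.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def posterior_update(prior, lik_present, observed_present):
    """Bayes update of the disease posterior after observing one binary phenotype."""
    lik = [l if observed_present else 1.0 - l for l in lik_present]
    joint = [p * l for p, l in zip(prior, lik)]
    z = sum(joint)
    if z == 0:
        return list(prior)  # outcome impossible under the model; leave belief unchanged
    return [j / z for j in joint]

def expected_information_gain(prior, lik_present):
    """Expected entropy reduction from asking about one phenotype."""
    p_present = sum(p * l for p, l in zip(prior, lik_present))
    h_after = 0.0
    for outcome, p_outcome in ((True, p_present), (False, 1.0 - p_present)):
        if p_outcome > 0:
            h_after += p_outcome * entropy(posterior_update(prior, lik_present, outcome))
    return entropy(prior) - h_after

def rank_next_phenotypes(prior, candidates):
    """Rank candidates by information gain per unit cost (assumed scoring rule).

    candidates maps a phenotype label to (P(present | disease) per disease, cost).
    """
    scored = [
        (name, expected_information_gain(prior, lik) / cost)
        for name, (lik, cost) in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical two-disease example with made-up numbers.
prior = [0.5, 0.5]
candidates = {
    "informative_expensive": ([1.0, 0.0], 4.0),
    "informative_cheap": ([0.9, 0.1], 1.0),
    "uninformative": ([0.5, 0.5], 1.0),
}
ranking = rank_next_phenotypes(prior, candidates)
```

In this toy setup the perfectly discriminating test loses to a slightly noisier but much cheaper one, which is the kind of trade-off a cost-aware recommender is meant to surface.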
User flow
- Clinician enters observed symptoms as HPO (Human Phenotype Ontology) terms
- Probabilistic engine generates calibrated disease probabilities with uncertainty
- System suggests the most informative and cost-aware next phenotype or test
- RAG + LLM produces a transparent, evidence-grounded clinical explanation
LLM components
- RAG-based clinical reasoning — evidence-grounded explanations for ranked diagnoses
- Uncertainty-aware reasoning — surfaces calibrated probabilities, not just top picks
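As a rough sketch of how the two LLM components could fit together, the example below retrieves the passages closest to a query vector and assembles a prompt that carries the calibrated probabilities and cites each passage by ID. It uses a plain cosine-similarity scan over toy vectors as a stand-in for a FAISS or Qdrant index, and the document IDs, vectors, passage text, disease labels, and prompt wording are all placeholders assumed for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query_vec, corpus, k=2):
    """Return the k passages most similar to the query (brute-force scan)."""
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, doc["vec"]), reverse=True)
    return ranked[:k]

def build_prompt(phenotypes, ranked_diagnoses, passages):
    """Assemble a grounded prompt: phenotypes, full ranked posterior, cited evidence."""
    lines = ["Observed phenotypes: " + ", ".join(phenotypes), "", "Ranked diagnoses:"]
    for disease, prob in ranked_diagnoses:
        lines.append(f"- {disease}: p={prob:.2f}")
    lines += ["", "Evidence (cite by ID):"]
    for doc in passages:
        lines.append(f"[{doc['id']}] {doc['text']}")
    lines += ["", "Explain the ranking using only the evidence above, citing IDs."]
    return "\n".join(lines)

# Placeholder corpus; real embeddings would come from an embedding model.
corpus = [
    {"id": "doc-a", "vec": [1.0, 0.0], "text": "Placeholder passage A."},
    {"id": "doc-b", "vec": [0.7, 0.7], "text": "Placeholder passage B."},
    {"id": "doc-c", "vec": [0.0, 1.0], "text": "Placeholder passage C."},
]
passages = retrieve([1.0, 0.1], corpus, k=2)
prompt = build_prompt(
    ["phenotype_1", "phenotype_2"],
    [("disease_X", 0.62), ("disease_Y", 0.27)],
    passages,
)
```

Passing the whole ranked list with its probabilities, rather than only the top diagnosis, is what lets the generated explanation reflect uncertainty instead of a single pick.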
Tools
- LLM: OpenAI GPT / Claude
- Embeddings & vector store: Hugging Face + FAISS / Qdrant
- ML: PyTorch
- Domain knowledge: SHEPHERD knowledge graph embeddings
- Stack: FastAPI + React
- Vibe coding: Antigravity