health
Decoding SAS: RAG-Powered Clinical Data Interpretation
Translate dense SAS clinical outputs into plain-English findings.
RAG, clinical, SAS, biomedical, data-dictionary
By Rutika Avinash Kadam
Semester: Spring 2026
Problem
Complex SAS (Statistical Analysis System) outputs, such as mixed-model procedure results (e.g., PROC MIXED) and convergence tables, are dense and hard to read. This literacy gap makes it difficult for clinicians and new researchers to translate the results into actionable medical insights quickly.
Solution
A web-based application that translates the output of complex SAS procedures into plain-English clinical insights by grounding the LLM in project-specific data dictionaries. The system retrieves clinical definitions for cryptic variable names and synthesizes statistical coefficients into a 'Key Findings' summary.
User flow
- Upload a SAS output PDF or image
- The system scans for cryptic variables and retrieves their clinical definitions from the project's data dictionary
- The LLM synthesizes statistical coefficients with retrieved context to generate a 'Key Findings' summary
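The scanning step in this flow can be sketched in Python. The sketch assumes the PDF or image has already been converted to plain text; the OCR/PDF-extraction step is not shown. It parses rows of a PROC MIXED "Solution for Fixed Effects" table laid out as effect, estimate, standard error, DF, t value and p-value, then looks each variable up in a project data dictionary. Several parts are hypothetical: the dictionary contents, the invented sample table, and the names `parse_fixed_effects` and `lookup_definitions`. An exact-key lookup stands in for the retrieval step. Real SAS tables with CLASS-variable level columns would need a more flexible parser.

```python
import re
from dataclasses import dataclass


@dataclass
class FixedEffect:
    effect: str
    estimate: float
    std_error: float
    df: float
    t_value: float
    p_value: float


# Hypothetical project data dictionary: cryptic SAS variable name -> clinical definition.
DATA_DICTIONARY = {
    "SBP_BL": "Systolic blood pressure at baseline (mmHg)",
    "TRT": "Randomized treatment arm (1 = active drug, 0 = placebo)",
    "VISITN": "Visit number (weeks since randomization)",
}

NUM = r"-?\d*\.?\d+"
# Assumed text layout of one fixed-effects row: name, estimate, SE, DF, t, p.
# Interaction terms such as TRT*VISITN keep their "*"; p-values may read "<.0001".
ROW_PATTERN = re.compile(
    rf"^(?P<effect>[A-Za-z_][\w*]*)\s+(?P<est>{NUM})\s+(?P<se>{NUM})\s+"
    rf"(?P<df>{NUM})\s+(?P<t>{NUM})\s+(?P<p><?{NUM})$"
)


def parse_fixed_effects(text: str) -> list[FixedEffect]:
    """Return one FixedEffect per table row; titles and header lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW_PATTERN.match(line.strip())
        if m:
            rows.append(FixedEffect(
                effect=m["effect"],
                estimate=float(m["est"]),
                std_error=float(m["se"]),
                df=float(m["df"]),
                t_value=float(m["t"]),
                # "<.0001" is stored as its bound, 0.0001.
                p_value=float(m["p"].lstrip("<")),
            ))
    return rows


def lookup_definitions(effects, dictionary):
    """Split interaction terms into variables; return (found definitions, undefined names)."""
    found, missing = {}, []
    for fe in effects:
        for var in fe.effect.split("*"):
            if var == "Intercept":
                continue
            if var in dictionary:
                found[var] = dictionary[var]
            elif var not in missing:
                missing.append(var)
    return found, missing


# Invented sample output, for illustration only.
SAMPLE = """\
Solution for Fixed Effects
Effect      Estimate  StdErr   DF   tValue  Pr>|t|
Intercept   118.40    2.310    96   51.26   <.0001
TRT         -6.25     1.840    96   -3.40   0.0010
VISITN      -0.42     0.150    96   -2.80   0.0062
AGE_C       0.18      0.070    96   2.57    0.0117
TRT*VISITN  -0.31     0.210    96   -1.48   0.1427
"""

if __name__ == "__main__":
    effects = parse_fixed_effects(SAMPLE)
    found, missing = lookup_definitions(effects, DATA_DICTIONARY)
    for name, definition in found.items():
        print(f"{name}: {definition}")
    print("Not in dictionary:", missing)
```

Running the script prints definitions for TRT and VISITN. It reports AGE_C as missing, which is the kind of gap the system should surface to the user rather than guess at.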
LLM components
- RAG-based interpreter: grounds reasoning in project-specific data dictionaries so that cryptic SAS variable names are interpreted from their documented clinical definitions rather than guessed
- Clinical synthesis: turns statistical coefficients into plain-English findings
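The two components can be sketched as a deterministic first pass followed by a grounded prompt.

- `plain_english` converts one coefficient into a sentence covering direction, magnitude, an approximate 95% interval from the estimate and standard error, and significance at the 0.05 level.
- `build_grounded_prompt` packages those drafts with the retrieved definitions and tells the model to stay within them.

All names here are hypothetical. The interval uses a normal approximation; PROC MIXED can report t-based limits through the MODEL statement's CL option, and those should be preferred when present. The "one-unit increase" wording assumes a linear model with an untransformed continuous outcome. The actual GPT-4o or Claude call is omitted.

```python
from dataclasses import dataclass
from statistics import NormalDist

# Normal approximation to the two-sided 95% critical value (about 1.96).
# With few degrees of freedom, a t quantile would be more appropriate.
Z_975 = NormalDist().inv_cdf(0.975)
ALPHA = 0.05


@dataclass
class Coefficient:
    variable: str
    estimate: float
    std_error: float
    p_value: float


def plain_english(coef: Coefficient, definition: str) -> str:
    """Deterministic one-sentence reading of a single fixed-effect estimate."""
    low = coef.estimate - Z_975 * coef.std_error
    high = coef.estimate + Z_975 * coef.std_error
    direction = "higher" if coef.estimate >= 0 else "lower"
    verdict = ("statistically significant" if coef.p_value < ALPHA
               else "not statistically significant")
    p_text = "p < 0.0001" if coef.p_value <= 0.0001 else f"p = {coef.p_value:.4f}"
    return (
        f"{definition}: a one-unit increase is associated with an outcome "
        f"{abs(coef.estimate):.2f} units {direction} "
        f"(approx. 95% CI {low:.2f} to {high:.2f}; {p_text}), "
        f"{verdict} at the {ALPHA} level."
    )


def build_grounded_prompt(coefs, definitions):
    """Assemble an LLM prompt that restricts interpretation to retrieved definitions."""
    context = "\n".join(f"- {name}: {text}" for name, text in definitions.items())
    drafts = "\n".join(
        f"- {plain_english(c, definitions.get(c.variable, c.variable + ' (undefined)'))}"
        for c in coefs
    )
    return (
        "You are summarizing SAS mixed-model output for clinicians.\n"
        "Use ONLY the variable definitions below; if a variable is undefined, say so.\n\n"
        f"Data dictionary excerpts:\n{context}\n\n"
        f"Draft findings:\n{drafts}\n\n"
        "Write a short 'Key Findings' summary in plain English."
    )


if __name__ == "__main__":
    definitions = {"TRT": "Randomized treatment arm (1 = active drug, 0 = placebo)"}
    coefs = [Coefficient("TRT", -6.25, 1.84, 0.0010),
             Coefficient("AGE_C", 0.18, 0.07, 0.0117)]
    print(build_grounded_prompt(coefs, definitions))
```

Computing the numbers in code and asking the LLM only to phrase them keeps the arithmetic out of the model. This is one way to reduce hallucinated statistics in the 'Key Findings' summary.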
Tools
- Frontend: Gradio or Streamlit
- Stack: Python + LangChain
- Vector DB: FAISS or Pinecone
- Embeddings: MedEmbed-small-v0.1
- LLM: GPT-4o or Claude Sonnet, with other OpenAI, Claude, or Gemini Pro models as alternatives
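In this stack, retrieval would embed data-dictionary entries with MedEmbed-small-v0.1 and search them with FAISS or Pinecone through LangChain. The dependency-free sketch below shows the same idea at toy scale. Each entry becomes a vector, and a query is matched by cosine similarity using an exact, brute-force search. `toy_embed` is a bag-of-words stand-in, not MedEmbed, so it matches shared words rather than meaning. `DictionaryIndex` and the dictionary contents are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical data-dictionary entries: variable name -> clinical definition.
DICTIONARY = {
    "SBP_BL": "Systolic blood pressure at baseline (mmHg)",
    "TRT": "Randomized treatment arm (1 = active drug, 0 = placebo)",
    "VISITN": "Visit number (weeks since randomization)",
}


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: sparse bag-of-words counts.
    It captures word overlap only, not clinical meaning."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class DictionaryIndex:
    """Exact (brute-force) nearest-neighbour search over dictionary entries."""

    def __init__(self, entries: dict, embed=toy_embed):
        self.embed = embed
        self.entries = list(entries.items())
        # Embed name and definition together so a query can match either.
        self.vectors = [embed(f"{name} {text}") for name, text in self.entries]

    def search(self, query: str, k: int = 2) -> list:
        """Return the k best (name, definition, score) triples, highest score first."""
        q = self.embed(query)
        scored = [
            (cosine(q, vec), name, text)
            for (name, text), vec in zip(self.entries, self.vectors)
        ]
        scored.sort(reverse=True)
        return [(name, text, score) for score, name, text in scored[:k]]


if __name__ == "__main__":
    index = DictionaryIndex(DICTIONARY)
    for name, text, score in index.search("baseline blood pressure", k=1):
        # prints: SBP_BL (0.61): Systolic blood pressure at baseline (mmHg)
        print(f"{name} ({score:.2f}): {text}")
```

Swapping in a real embedding model changes only the `embed` argument. Once the dictionary grows large, a FAISS or Pinecone index would replace the linear scan in `search`. LangChain provides vector-store wrappers for both.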