AMS 691.01
health

Decoding SAS: RAG-Powered Clinical Data Interpretation

Translate dense SAS clinical outputs into plain-English findings.

RAG, clinical, SAS, biomedical, data-dictionary

By

Rutika Avinash Kadam

Semester

Spring 2026

Complex SAS (Statistical Analysis System) outputs, such as Mixed Procedure results and convergence tables, create a literacy gap: the results are dense, and clinicians or new researchers struggle to translate them quickly into actionable medical insights.

This project is a web-based application that translates complex SAS procedure output into plain-English clinical insights by grounding the LLM in project-specific data dictionaries. The system retrieves clinical definitions for cryptic variables and synthesizes statistical coefficients into a 'Key Findings' summary.
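The flow from SAS output to an LLM prompt can be sketched as follows. This is a minimal illustration, not the project's implementation: the variable names, the dictionary entries, the sample table, and the helper functions `parse_fixed_effects` and `build_prompt` are all hypothetical, and the LLM call itself is left out.

```python
# Hypothetical data dictionary: variable name -> clinical definition.
DATA_DICTIONARY = {
    "TRT01P": "Planned treatment arm",
    "AVISITN": "Analysis visit number",
}

# Hypothetical fixed-effects table as it might appear in extracted SAS text.
SAS_TEXT = """\
Effect     Estimate   StdErr   DF    tValue   Pr > |t|
TRT01P     -2.3100    0.8000   98    -2.89    0.0047
AVISITN    0.4500     0.3000   98    1.50     0.1368
XQ9        0.1000     0.2000   98    0.50     0.6182
"""


def parse_fixed_effects(text):
    """Return (effect, estimate, p_value_text) for each six-column data row."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # header and blank lines do not have six fields
        try:
            estimate = float(parts[1])
        except ValueError:
            continue  # second field is not numeric, so not a data row
        rows.append((parts[0], estimate, parts[5]))
    return rows


def build_prompt(rows, dictionary):
    """Pair each coefficient with its definition and flag missing entries."""
    lines = []
    for effect, estimate, p_text in rows:
        definition = dictionary.get(effect, "NO DEFINITION FOUND")
        lines.append(
            f"- {effect} ({definition}): estimate={estimate:+.2f}, p={p_text}"
        )
    return (
        "Summarize these model coefficients as plain-English Key Findings. "
        "Do not interpret variables that have no definition.\n"
        + "\n".join(lines)
    )


rows = parse_fixed_effects(SAS_TEXT)
prompt = build_prompt(rows, DATA_DICTIONARY)
print(prompt)
```

Flagging variables that have no dictionary entry, rather than letting the model guess, is one way to keep the summary tied to the project's own definitions.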

  • Upload a SAS output PDF or image
  • The system scans for cryptic variables and retrieves their clinical definitions from the project's data dictionary
  • The LLM synthesizes statistical coefficients with retrieved context to generate a 'Key Findings' summary
  • RAG-based interpreter: grounds reasoning in project-specific data dictionaries to ensure accurate interpretation of cryptic SAS variables
  • Clinical synthesis: turns statistical coefficients into plain-English findings
  • Frontend: Gradio or Streamlit
  • Stack: Python + LangChain
  • Vector DB: FAISS or Pinecone
  • Embeddings: MedEmbed-small-v0.1
  • LLM: GPT-4o or Claude Sonnet, with other OpenAI, Claude, or Gemini Pro models as alternatives
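The retrieval step in the list above can be illustrated with a small stand-in. The sketch below deliberately swaps the proposed stack for standard-library parts: a bag-of-words count replaces the MedEmbed embedding model, and a plain dictionary scan replaces FAISS or Pinecone. The dictionary entries and the `embed`, `cosine`, and `retrieve` helpers are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical dictionary entries. In the proposed stack these would be
# embedded with a real model and stored in a vector database.
ENTRIES = {
    "SYSBP": "Systolic blood pressure in mmHg measured at the visit",
    "HBA1C": "Glycated hemoglobin percentage, a marker of glucose control",
    "EGFR": "Estimated glomerular filtration rate, a measure of kidney function",
}


def embed(text):
    """Toy embedding: lowercase token counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Index each entry on its variable name plus its definition text.
INDEX = {name: embed(f"{name} {text}") for name, text in ENTRIES.items()}


def retrieve(query, k=1):
    """Return the k entries most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda name: cosine(q, INDEX[name]), reverse=True)
    return [(name, ENTRIES[name]) for name in ranked[:k]]


print(retrieve("SYSBP"))
print(retrieve("kidney function estimate"))
```

Indexing the variable name together with its definition lets a query match either a cryptic name taken from the SAS output or a plain-language description. A real embedding model would also match synonyms, which this token-overlap stand-in cannot do.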