AMS 691.01
finance

FinSight AI

A multimodal financial assistant that turns receipts and voice into insight.

multi-agent, finance, voice, vision, LangGraph

By

Divyansh Pradhan

Semester

Spring 2026

'Where did my money actually go?' is a question users ask, yet answering it remains a manual chore. Tracking is tedious, data is scattered across texts, voice memos, and receipts, and legacy apps like Mint or Rocket only show backward-looking charts. They lack conversational advice and forward-looking simulations for planning future life events.

FinSight AI is a modular multi-agent financial assistant that turns unstructured multimodal data into actionable intelligence. Voice and text inputs log expenses instantly, vision/OCR extracts line items from receipts, a predictive engine simulates future scenarios, a FAISS vector store retrieves past financial context, and LangGraph dynamically routes requests between specialized agents.
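The routing idea can be illustrated with a minimal sketch in plain Python. It is a stand-in for LangGraph-style routing, not the project's code: the keyword rules, the handler functions, and the agent keys (`expense_tracker`, `financial_advisor`, `predictive`) are all hypothetical, and a real router would more likely ask the LLM to classify the request.

```python
from typing import Callable, Dict


# Hypothetical agent handlers; in a real system each would wrap an LLM call.
def expense_tracker(message: str) -> str:
    return f"[expense_tracker] logging: {message}"


def financial_advisor(message: str) -> str:
    return f"[financial_advisor] answering from history: {message}"


def predictive(message: str) -> str:
    return f"[predictive] simulating: {message}"


AGENTS: Dict[str, Callable[[str], str]] = {
    "expense_tracker": expense_tracker,
    "financial_advisor": financial_advisor,
    "predictive": predictive,
}

# Assumed keyword lists, chosen only for illustration.
PLANNING_HINTS = ("if i", "forecast", "how much must")
LOGGING_HINTS = ("spent", "paid", "bought", "receipt")


def route(message: str) -> str:
    """Return the name of the agent that should handle the message."""
    text = message.lower()
    if any(hint in text for hint in PLANNING_HINTS):
        return "predictive"
    if any(hint in text for hint in LOGGING_HINTS):
        return "expense_tracker"
    # Anything else is treated as a question about past spending.
    return "financial_advisor"


def handle(message: str) -> str:
    """Dispatch the message to the agent chosen by route()."""
    return AGENTS[route(message)](message)
```

Calling `handle("I spent $15 on lunch")` dispatches to the expense handler, while a planning question that starts with "If I plan to ..." goes to the predictive handler. Keeping the routing decision in one small function makes it easy to swap the keyword rules for a model-based classifier later.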

  • Open the chatbot UI after a purchase
  • Click the mic to say 'I spent $15 on lunch', or upload a photo of a receipt
  • The system extracts the date, amount, and merchant and logs the expense to the database
  • Later, ask 'If I plan to move to Waltham next June, how much must I save monthly?'
  • The Predictive Agent simulates the burn rate from the history stored in FAISS and returns conversational advice
  • Expense Tracker Agent — extracts structured parameters (date, cost, category) from text, voice, and image inputs
  • Financial Advisor Agent — RAG over FAISS vector store of past financial context
  • Predictive Agent — uses tool-calling for forecasting simulations
  • Multimodal voice and vision — extends the assistant beyond text input
  • Backend: Python + FastAPI
  • Frontend: Streamlit or React
  • Agent orchestration: LangChain + LangGraph
  • Vector DB: FAISS
  • LLM / multimodal: Gemini Pro & Vision API
  • Voice: Web Audio API
  • Data structuring: Pydantic, Dateparser
  • Vibe coding: Cursor / Gemini / ChatGPT
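The expense-logging and planning steps in the walkthrough above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's implementation: a `dataclass` and a regular expression stand in for Pydantic, Dateparser, and the LLM extraction, and the planning step is plain division rather than a burn-rate simulation. The names `Expense`, `parse_expense`, and `required_monthly_savings`, and all amounts and dates, are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Expense:
    amount: float
    category: str
    day: date


def parse_expense(utterance: str, today: date) -> Optional[Expense]:
    """Extract '$<amount> on <category>' from a short utterance, or return None."""
    match = re.search(r"\$(\d+(?:\.\d{1,2})?)\s+on\s+([A-Za-z ]+)", utterance)
    if match is None:
        return None
    return Expense(
        amount=float(match.group(1)),
        category=match.group(2).strip().lower(),
        day=today,  # assumption: an utterance without a date refers to today
    )


def months_between(start: date, end: date) -> int:
    """Count whole calendar months from start to end, ignoring the day of month."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def required_monthly_savings(goal: float, saved: float, start: date, target: date) -> float:
    """Spread the remaining amount evenly over the months left before the target."""
    months = months_between(start, target)
    if months <= 0:
        raise ValueError("target must be at least one month after start")
    return max(goal - saved, 0.0) / months


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    print(parse_expense("I spent $15 on lunch", date(2026, 3, 1)))
    print(required_monthly_savings(6000.0, 1200.0, date(2026, 3, 1), date(2027, 6, 1)))
```

With the hypothetical figures above, 4800 remaining over 15 months comes to 320 per month. A fuller version would derive the savings target and current balance from the logged expense history instead of taking them as arguments.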