FinSight AI
A multimodal financial assistant that turns receipts and voice into insight.
By
Divyansh Pradhan
Semester
Spring 2026
Problem
'Where did my money actually go?' is a question users ask constantly, yet answering it is still a manual chore. Expense data is scattered across texts, voice memos, and receipts, and legacy apps like Mint or Rocket Money only show backward-looking charts. They offer neither conversational advice nor forward-looking simulations for planning future life events.
Solution
A modular multi-agent AI financial assistant that turns unstructured multimodal input into structured expense records and spending advice. Voice and text inputs log expenses instantly, vision/OCR extracts line items from receipts, a predictive engine simulates future scenarios, FAISS retrieves past financial context, and LangGraph routes each request to the appropriate specialized agent.
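The routing idea can be illustrated with a minimal sketch in plain Python. This is not the LangGraph implementation: the handler names, the `classify` function, and its keyword rules are all hypothetical stand-ins, and in the proposed system the routing decision would presumably come from the LLM rather than from keywords.

```python
from typing import Callable, Dict

# Hypothetical handlers; each stands in for one specialized agent.
def expense_tracker(message: str) -> str:
    return f"[expense_tracker] logging: {message}"

def financial_advisor(message: str) -> str:
    return f"[financial_advisor] answering from history: {message}"

def predictive(message: str) -> str:
    return f"[predictive] simulating: {message}"

AGENTS: Dict[str, Callable[[str], str]] = {
    "expense_tracker": expense_tracker,
    "financial_advisor": financial_advisor,
    "predictive": predictive,
}

def classify(message: str, has_image: bool = False) -> str:
    """Toy keyword classifier (an assumption, not the real routing logic)."""
    text = message.lower()
    # Receipts and "I spent ..." statements are expenses to log.
    if has_image or "spent" in text or "paid" in text:
        return "expense_tracker"
    # Hypothetical "what if" questions go to the forecasting agent.
    if text.startswith("if ") or "save" in text:
        return "predictive"
    # Everything else is treated as a question about past spending.
    return "financial_advisor"

def route(message: str, has_image: bool = False) -> str:
    """Dispatch a request to the agent chosen by classify()."""
    return AGENTS[classify(message, has_image)](message)
```

Keeping the classifier separate from the handlers mirrors the graph structure: swapping the keyword rules for an LLM call would not change how agents are registered or invoked.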
User flow
- Open the chatbot UI after a purchase
- Click the mic to say 'I spent $15 on lunch', or upload a photo of a receipt
- The system extracts the date, amount, and merchant and logs them to the database
- Later, ask 'If I plan to move to Waltham next June, how much must I save monthly?'
- The Predictive Agent estimates the burn rate from past expenses retrieved via FAISS, simulates the scenario, and returns conversational advice
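The arithmetic behind the last step of the flow might look like the sketch below. The function names, the savings goal, the current balance, and the monthly totals are all hypothetical; the proposal does not specify the simulation model, so this assumes the simplest case of a fixed goal divided evenly across the remaining months.

```python
from datetime import date
from typing import List

def months_until(start: date, target: date) -> int:
    """Whole calendar months from start to target, never less than 1."""
    months = (target.year - start.year) * 12 + (target.month - start.month)
    return max(months, 1)

def monthly_burn_rate(monthly_totals: List[float]) -> float:
    """Average monthly spend over the logged history."""
    return sum(monthly_totals) / len(monthly_totals)

def required_monthly_savings(goal: float, current_savings: float,
                             start: date, target: date) -> float:
    """Even monthly contribution needed to close the gap by the target date."""
    gap = max(goal - current_savings, 0.0)
    return round(gap / months_until(start, target), 2)

# Hypothetical figures: a 6000 moving budget, 1200 already saved,
# planning from February 2026 for a June 2027 move (16 months).
needed = required_monthly_savings(6000.0, 1200.0, date(2026, 2, 1), date(2027, 6, 1))
burn = monthly_burn_rate([2100.0, 1900.0, 2000.0])
print(needed, burn)  # 300.0 2000.0
```

Comparing `needed` against monthly income minus `burn` gives the feasibility check that the conversational advice could be built on.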
LLM components
- Expense Tracker Agent — extracts structured parameters (date, cost, category) from text, voice, and image inputs
- Financial Advisor Agent — RAG over FAISS vector store of past financial context
- Predictive Agent — uses tool-calling for forecasting simulations
- Multimodal voice and vision — extends the assistant beyond text input
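As an illustration of the structured output the Expense Tracker Agent is meant to produce, here is a minimal sketch using only the standard library. A dataclass stands in for the Pydantic model, and a regular expression stands in for the LLM extraction step; the `Expense` fields follow the parameters listed above, while `parse_expense` and its pattern are hypothetical.

```python
import datetime
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class Expense:
    """Target record for one logged expense (date, cost, category)."""
    date: datetime.date
    cost: float
    category: str

# Matches phrases such as "$15 on lunch" or "$12.50 on coffee".
_PATTERN = re.compile(r"\$\s*(\d+(?:\.\d{1,2})?)\s+on\s+([A-Za-z ]+)")

def parse_expense(utterance: str, today: datetime.date) -> Optional[Expense]:
    """Extract an Expense from a transcribed utterance, or None if no match.

    Assumes the expense happened today; resolving phrases like
    "yesterday" is left to a date parser in the full system.
    """
    match = _PATTERN.search(utterance)
    if match is None:
        return None
    return Expense(
        date=today,
        cost=float(match.group(1)),
        category=match.group(2).strip().lower(),
    )

expense = parse_expense("I spent $15 on lunch", datetime.date(2026, 3, 1))
print(expense)
```

Whatever performs the extraction, validating against a fixed schema before writing to the database keeps malformed LLM output from becoming a stored record.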
Tools
- Backend: Python + FastAPI
- Frontend: Streamlit or React
- Agent orchestration: LangChain + LangGraph
- Vector DB: FAISS
- LLM / multimodal: Gemini Pro & Vision API
- Voice: Web Audio API
- Data structuring: Pydantic, Dateparser
- Vibe coding: Cursor / Gemini / ChatGPT