Tax Invation 101: AI-Powered Malaysia Tax Receipt Claim Assistant
Tax Invation 101 is designed to simplify the complex process of organizing receipts for Malaysian personal income tax claims. The application allows users to upload receipt images or PDF files, after which the system automatically performs OCR to extract crucial information such as merchant name, receipt date, total amount, item descriptions, and payment details.
The extracted receipt data is then cleaned, indexed, and stored for later retrieval. We implemented a Retrieval-Augmented Generation (RAG) system that connects the user’s uploaded receipt data with official tax-related documentation and claim guidelines. Before suggesting a tax relief category, the system retrieves the Malaysian tax claim rules most relevant to a receipt and passes them to the LLM together with the receipt data, so the suggestion is based on both.
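The retrieve-then-suggest flow described above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: it swaps in a bag-of-words cosine similarity for the vector index, and the names `RULE_SNIPPETS`, `retrieve`, and `build_prompt` are hypothetical. The snippet texts and category labels are placeholders, not real tax guidance.

```python
import math
import re
from collections import Counter

# Hypothetical placeholder snippets standing in for indexed tax documents.
# They are NOT actual tax rules; a real system would index official guidance.
RULE_SNIPPETS = {
    "medical": "medical treatment clinic hospital pharmacy medicine expenses",
    "education": "course tuition fees university education study",
    "lifestyle": "books sports equipment internet subscription computer",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, k=2):
    """Return up to k snippet names ranked by similarity to the query."""
    q = Counter(tokenize(query))
    scored = [
        (cosine(q, Counter(tokenize(text))), name)
        for name, text in RULE_SNIPPETS.items()
    ]
    # Highest score first; ties are broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:k] if score > 0]


def build_prompt(receipt_text, k=2):
    """Assemble the prompt an LLM would receive: retrieved rules plus receipt."""
    names = retrieve(receipt_text, k)
    context = "\n".join(f"[{n}] {RULE_SNIPPETS[n]}" for n in names)
    return (
        f"Rules:\n{context}\n\n"
        f"Receipt:\n{receipt_text}\n\n"
        "Suggest one category using only the rules above."
    )


if __name__ == "__main__":
    print(build_prompt("Guardian Pharmacy medicine purchase"))
```

In a production setup the similarity step would typically be replaced by embedding search against the vector database, but the shape of the flow (retrieve rules first, then ask the model) stays the same.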
The core goal of this project is not merely to store receipts digitally but to give users guidance they can act on. It helps users understand which expenses are potentially claimable under the Malaysian tax rules that apply to a specific assessment year. This is particularly valuable for individuals who accumulate numerous receipts throughout the year but struggle with the manual organization required for annual tax submissions.
The platform integrates modern web technologies, advanced OCR processing, vector indexing, and LLM-powered classification into a single, practical, and intelligent receipt management solution.
Features
- OCR receipt scanning and automatic text extraction
- Intelligent receipt indexing and structured metadata storage
- LLM-based tax claim categorization using RAG
- Tax year selection and context-aware classification
- Searchable digital receipt repository
- Suggestion of eligible tax reliefs based on local laws
- Dashboard overview and total claim amount calculation
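As an illustration of the dashboard's total claim calculation, the sketch below sums receipt totals per suggested category for one selected tax year. The `Receipt` fields and the `totals_by_category` helper are assumptions made for this example, not the project's actual schema, and the sample data is invented. `Decimal` is used instead of `float` so that currency amounts add up exactly.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Receipt:
    # Hypothetical record shape; the real metadata schema is not specified.
    merchant: str
    tax_year: int
    category: str
    total: Decimal


def totals_by_category(receipts, tax_year):
    """Sum receipt totals per category for the selected tax year."""
    totals = defaultdict(Decimal)  # Decimal() starts each category at 0
    for r in receipts:
        if r.tax_year == tax_year:
            totals[r.category] += r.total
    return dict(totals)


if __name__ == "__main__":
    sample = [
        Receipt("Clinic A", 2024, "medical", Decimal("50.00")),
        Receipt("Pharmacy B", 2024, "medical", Decimal("20.50")),
        Receipt("Bookshop C", 2024, "lifestyle", Decimal("35.90")),
        Receipt("Clinic A", 2023, "medical", Decimal("80.00")),
    ]
    per_category = totals_by_category(sample, 2024)
    print(per_category)
    print("Grand total:", sum(per_category.values(), Decimal("0")))
```

Any per-category relief limit would have to come from the official documentation for the selected year, so it is deliberately left out of this sketch.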
Challenges
- Accurately extracting structured data from receipts with highly variable layouts, inconsistent image quality, and possible physical damage.
- Correctly mapping vague receipt descriptions to specific Malaysian tax relief categories, which requires contextual reasoning beyond simple keyword matching.
- Implementing a reliable RAG pipeline so that the LLM's suggestions stay grounded in official, up-to-date tax documentation.
Solutions
We developed an automated receipt processing pipeline that cleans raw OCR text and structures it into defined fields. For classification, we used a RAG approach: official tax documents are indexed into a knowledge base, and the LLM retrieves the relevant rules before suggesting a category, which helps reduce hallucinated suggestions. For persistence, we combined a structured database for receipt metadata with a vector database for semantic search.
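The step that turns raw OCR text into defined fields might look like the sketch below. It is a simplified, regex-based illustration under stated assumptions: the merchant name is taken to be the first non-empty line, dates are assumed to be written day/month/year, and the total is assumed to sit on a line starting with "Total" or "Grand Total". The function name `parse_receipt` and the sample receipt are hypothetical; real receipts vary far more than this handles.

```python
import re
from datetime import datetime
from decimal import Decimal

# Assumption: dates look like 05/03/2024 or 5-3-2024 in day/month/year order.
DATE_RE = re.compile(r"\b(\d{1,2})[/-](\d{1,2})[/-](\d{4})\b")

# Assumption: the total is on a line that starts with "Total" or "Grand Total".
# Lines such as "Subtotal" are skipped because they do not start with "total".
TOTAL_RE = re.compile(
    r"^[ \t]*(?:grand[ \t]+)?total\b[^\d\n]*(\d+(?:,\d{3})*\.\d{2})",
    re.IGNORECASE | re.MULTILINE,
)


def parse_receipt(ocr_text):
    """Extract merchant, ISO date, and total from raw OCR text."""
    lines = [ln.strip() for ln in ocr_text.splitlines() if ln.strip()]
    merchant = lines[0] if lines else None

    date = None
    match = DATE_RE.search(ocr_text)
    if match:
        day, month, year = map(int, match.groups())
        try:
            date = datetime(year, month, day).date().isoformat()
        except ValueError:
            date = None  # OCR produced an impossible date; leave it empty

    # Use the last matching line, since the final total usually comes last.
    totals = TOTAL_RE.findall(ocr_text)
    total = Decimal(totals[-1].replace(",", "")) if totals else None

    return {"merchant": merchant, "date": date, "total": total}


if __name__ == "__main__":
    sample = (
        "KEDAI BUKU ABC\n"
        "Date: 05/03/2024\n"
        "Subtotal 40.00\n"
        "Total RM 42.40\n"
        "Cash 50.00\n"
    )
    print(parse_receipt(sample))
```

Fields that cannot be found are returned as `None` rather than guessed, so a later review step can flag the receipt for manual correction.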