[Submission] Vasishta - RAG App (Govt Scheme Assistance) (Team Synergy)
Government Schemes RAG Assistant
Project Overview
The Government Schemes RAG Assistant is a Retrieval-Augmented Generation (RAG) application built with Streamlit, LangChain, FAISS, and Hugging Face embeddings. It lets users ask questions about various government schemes and receive context-aware answers grounded in official government documents.
Features
- PDF document ingestion
- Automatic text chunking
- Semantic search using embeddings
- Retrieval-Augmented Generation (RAG)
- Interactive Streamlit interface
- FAISS vector database for fast retrieval
- Government scheme question answering
Folder Structure
RAG/
│
├── app.py
├── README.md
├── requirements.txt
├── styles.css
├── ui_components.py
├── test_backend.py
├── test_chunk.py
├── test_pdf.py
│
├── data/
│ ├── Atal Pension Yojana.pdf
│ ├── Operational-Guidelines-of-PMAY-U.pdf
│ ├── pmay.pdf
│ ├── RevisedPM-KISANOperationalGuidelines.pdf
│ └── Rythu Bandhu Scheme Official Guidelines.pdf
│
├── utils/
│ ├── pdf_loader.py
│ ├── embeddings.py
│ ├── retriever.py
│ ├── llm_response.py
│ ├── vector_store.py
│ └── text_chuncker.py
│
├── screenshot/
│ └── demo.png
│
└── venv/
What Document Did You Use and Why?
We used official government scheme documents, including:
- Atal Pension Yojana (APY)
- Pradhan Mantri Awas Yojana Urban (PMAY-U)
- PM-KISAN Scheme
- Rythu Bandhu Scheme
These documents were selected because they contain authentic information regarding eligibility criteria, benefits, registration processes, and implementation guidelines. Using official documents improves the accuracy and reliability of responses generated by the RAG system.
How Does Your Chunking Work?
The PDF documents are loaded and processed using LangChain's PDF loading utilities.
The extracted text is divided into smaller chunks using RecursiveCharacterTextSplitter.
Typical configuration:
- Chunk Size: 1000 characters
- Chunk Overlap: 200 characters
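The 200-character overlap means text near a chunk boundary appears in both neighboring chunks, so context is not lost at the cut, while the 1000-character size keeps each chunk small enough for precise semantic search.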
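To make the effect of the two parameters concrete, here is a minimal pure-Python sketch of fixed-size character chunking with overlap. It is a simplification, not LangChain's implementation: RecursiveCharacterTextSplitter first tries to break on separators such as paragraph breaks, line breaks, and spaces before falling back to raw character positions, whereas this hypothetical `chunk_text` helper cuts strictly every `chunk_size - chunk_overlap` characters.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into windows of at most chunk_size characters (hypothetical helper).

    Consecutive windows share chunk_overlap characters, so text near a
    boundary appears in both neighboring chunks.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reached the end
            break
    return chunks


# Example: a 2,500-character text yields windows starting at 0, 800 and 1600.
sample = "".join(chr(ord("a") + i % 26) for i in range(2500))
pieces = chunk_text(sample)
print([len(p) for p in pieces])  # [1000, 1000, 900]
```

Dropping the first 200 characters of every chunk after the first reconstructs the original text exactly, which is a quick way to check that no content is lost between chunks.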
Which Embedding Model Did You Use?
We used:
sentence-transformers/all-MiniLM-L6-v2
This model converts document chunks into vector embeddings that capture semantic meaning. These embeddings are stored in a FAISS vector database for similarity-based retrieval.
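As a rough illustration of what similarity-based retrieval over the stored vectors involves, the sketch below implements exhaustive nearest-neighbor search by Euclidean (L2) distance, which is the idea behind FAISS's flat `IndexFlatL2` index. The `FlatL2Index` class is hypothetical and uses toy 3-dimensional vectors; all-MiniLM-L6-v2 actually produces 384-dimensional embeddings, and the project itself uses FAISS rather than code like this.

```python
import math


class FlatL2Index:
    """Hypothetical brute-force vector index: compares the query with every stored vector."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: list[list[float]] = []

    def add(self, vectors: list[list[float]]) -> None:
        for vector in vectors:
            if len(vector) != self.dim:
                raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
            self.vectors.append(list(vector))

    def search(self, query: list[float], k: int) -> list[tuple[float, int]]:
        """Return up to k (distance, position) pairs, nearest first."""
        scored = [
            (math.dist(query, vector), position)
            for position, vector in enumerate(self.vectors)
        ]
        scored.sort()
        return scored[:k]


# Toy vectors standing in for three embedded chunks.
index = FlatL2Index(dim=3)
index.add([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]])
print([position for _, position in index.search([1.0, 0.0, 0.0], k=2)])  # [0, 2]
```

Exhaustive search is exact but scales linearly with the number of chunks; for a handful of scheme PDFs that cost is negligible, which is why a flat index is a reasonable default before considering approximate FAISS index types.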
Retrieval Workflow
- User enters a question.
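- The question is converted into an embedding with the same model used for the document chunks.
- FAISS searches the vector index for the chunks whose embeddings are most similar to the question embedding.
- The retrieved chunks are passed to the language model as context.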
- The language model generates an answer based on the retrieved information.
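Because answers are grounded in passages retrieved from the official documents rather than in the model's general training data alone, this Retrieval-Augmented Generation (RAG) approach improves factual accuracy and reduces hallucinations.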
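The workflow above can be sketched as a small pipeline. In this minimal version, which assumes nothing about the project's actual `retriever.py` or `llm_response.py`, the embedding-and-search steps and the language-model call are passed in as plain functions (`retrieve` and `generate`, both hypothetical), so only the prompt assembly is concrete. The prompt wording and the default of four retrieved chunks are illustrative choices, not the project's settings.

```python
from typing import Callable


def build_prompt(question: str, chunks: list[str]) -> str:
    """Place the retrieved chunks before the question so the model answers from them."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(
    question: str,
    retrieve: Callable[[str, int], list[str]],
    generate: Callable[[str], str],
    k: int = 4,
) -> str:
    chunks = retrieve(question, k)           # embed the question and fetch the k nearest chunks
    prompt = build_prompt(question, chunks)  # supply the chunks as context
    return generate(prompt)                  # let the language model write the answer


# Usage with stand-in functions; a real app would query FAISS and an LLM here.
def fake_retrieve(question: str, k: int) -> list[str]:
    return ["Atal Pension Yojana is a pension scheme."][:k]


def echo(prompt: str) -> str:
    return prompt  # returns the prompt unchanged so it can be inspected


print(answer("What is Atal Pension Yojana?", fake_retrieve, echo))
```

Keeping retrieval and generation behind function parameters makes the pipeline easy to test without loading the embedding model or calling an LLM.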
Technologies Used
- Python
- Streamlit
- LangChain
- FAISS
- Hugging Face Embeddings
- Sentence Transformers
- PyPDF
How to Run Locally
Clone the Repository
git clone <repository-url>
cd RAG
Create Virtual Environment
python -m venv venv
Activate Virtual Environment
Windows:
venv\Scripts\activate
macOS/Linux:
source venv/bin/activate
Install Dependencies
pip install -r requirements.txt
Run the Application
streamlit run app.py
Open in Browser
http://localhost:8501
Screenshot
A screenshot of the working Government Schemes Assistant is included in the repository.
Location:
screenshot/demo.png
The screenshot includes:
- User query input
- Retrieved answer
- Streamlit user interface
- Government Scheme knowledge retrieval
Example Questions
- What is Atal Pension Yojana?
- Who is eligible for PM-KISAN?
- How can I apply for PMAY-U?
- What are the benefits of the Rythu Bandhu Scheme?
- What documents are required for scheme registration?
What Would You Improve With More Time?
Given additional development time, the following improvements could be implemented:
- Multilingual support (English, Hindi, Telugu)
- Source citations for every response
- Conversation memory
- Support for DOCX and TXT files
- Online deployment
- Advanced reranking models
- Better UI/UX design
- Real-time government scheme updates
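For the source-citation improvement, one possible approach is to carry each chunk's origin through retrieval and print a deduplicated reference list under the answer. The sketch below assumes each retrieved chunk is a dictionary with a `metadata` entry holding `source` and `page` keys; that structure and the `format_citations` helper are hypothetical, although LangChain's `PyPDFLoader` records a similar `source` and `page` pair in each loaded document's metadata.

```python
def format_citations(chunks: list[dict]) -> str:
    """Build a numbered, deduplicated list of (source, page) references."""
    references: list[tuple[str, object]] = []
    for chunk in chunks:
        metadata = chunk.get("metadata", {})
        reference = (metadata.get("source", "unknown source"), metadata.get("page"))
        if reference not in references:  # keep first-seen order, skip repeats
            references.append(reference)

    lines = []
    for number, (source, page) in enumerate(references, start=1):
        suffix = f", page {page}" if page is not None else ""
        lines.append(f"[{number}] {source}{suffix}")
    return "\n".join(lines)


retrieved = [
    {"text": "...", "metadata": {"source": "data/pmay.pdf", "page": 3}},
    {"text": "...", "metadata": {"source": "data/pmay.pdf", "page": 3}},
    {"text": "...", "metadata": {"source": "data/Atal Pension Yojana.pdf", "page": 0}},
]
print(format_citations(retrieved))
# [1] data/pmay.pdf, page 3
# [2] data/Atal Pension Yojana.pdf, page 0
```

Page values are printed exactly as stored in the metadata, so any conversion to human-readable page numbers would depend on how the loader numbers pages.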
Team Contributions
- Data Collection & PDF Management - Laxminarayana
- Document Processing & Chunking
- Embedding Generation - Noel Paul
- Retrieval System Development - Vasishta
- Streamlit Frontend Development - Ramcharan
- Testing & Documentation - Dhruvaa
Conclusion
This project demonstrates a practical implementation of Retrieval-Augmented Generation (RAG) for government scheme assistance. By combining semantic retrieval over official government documents with a language model, the application provides context-aware answers grounded in those documents.