Vasishta Koduri requested to merge Vasishta_K/ip1-icfai:main into main May 29, 2026

Government Schemes RAG Assistant

Project Overview

The Government Schemes RAG Assistant is a Retrieval-Augmented Generation (RAG) application built using Streamlit, LangChain, FAISS, and Hugging Face Embeddings. The application allows users to ask questions about various government schemes and receive context-aware answers extracted from official government documents.

Features

PDF document ingestion
Automatic text chunking
Semantic search using embeddings
Retrieval-Augmented Generation (RAG)
Interactive Streamlit interface
FAISS vector database for fast retrieval
Government scheme question answering

Folder Structure

RAG/
│
├── app.py
├── README.md
├── requirements.txt
├── styles.css
├── ui_components.py
├── test_backend.py
├── test_chunk.py
├── test_pdf.py
│
├── data/
│   ├── Atal Pension Yojana.pdf
│   ├── Operational-Guidelines-of-PMAY-U.pdf
│   ├── pmay.pdf
│   ├── RevisedPM-KISANOperationalGuidelines.pdf
│   └── Rythu Bandhu Scheme Official Guidelines.pdf
│
├── utils/
│   ├── pdf_loader.py
│   ├── embeddings.py
│   ├── retriever.py
│   ├── llm_response.py
│   ├── vector_store.py
│   └── text_chuncker.py
│
├── screenshot/
│   └── demo.png
│
└── venv/

What Document Did You Use and Why?

We used official government scheme documents, including:

Atal Pension Yojana (APY)
Pradhan Mantri Awas Yojana Urban (PMAY-U)
PM-KISAN Scheme
Rythu Bandhu Scheme

These documents were selected because they contain authentic information regarding eligibility criteria, benefits, registration processes, and implementation guidelines. Using official documents improves the accuracy and reliability of responses generated by the RAG system.

How Does Your Chunking Work?

The PDF documents are loaded and processed using LangChain's PDF loading utilities.

The extracted text is divided into smaller chunks using RecursiveCharacterTextSplitter.

Typical configuration:

Chunk Size: 1000 characters
Chunk Overlap: 200 characters

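The 200-character overlap keeps text that straddles a chunk boundary available in both neighboring chunks, so context is preserved, while the 1000-character size keeps each chunk focused enough for efficient semantic search.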
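The effect of the size and overlap settings can be illustrated with a minimal sketch. It is a simplified stand-in, not the project's code: it cuts fixed-size character windows, whereas RecursiveCharacterTextSplitter additionally tries to break on separators such as paragraphs and sentences. The function name chunk_text and the sample text are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: the real splitter also prefers to cut
    on natural separators instead of at arbitrary character offsets.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Hypothetical 2500-character sample standing in for extracted PDF text.
sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)

for i, chunk in enumerate(chunks):
    print(i, len(chunk))
```

With these settings each window starts 800 characters after the previous one, so the last 200 characters of one chunk are repeated at the start of the next.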

Which Embedding Model Did You Use?

We used:

sentence-transformers/all-MiniLM-L6-v2

This model converts document chunks into vector embeddings that capture semantic meaning. These embeddings are stored in a FAISS vector database for similarity-based retrieval.
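Similarity-based retrieval can be sketched in plain Python as a brute-force nearest-neighbor search. This is an illustration under stated assumptions rather than the project's implementation: the toy vectors are 3-dimensional (all-MiniLM-L6-v2 produces 384-dimensional vectors), cosine similarity is chosen for the example because the README does not state which metric the FAISS index uses, and the names cosine_similarity and top_k are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return the indices of the k chunk vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


# Toy embeddings (assumed values, for illustration only).
chunk_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]
query_vector = [0.8, 0.2, 0.0]

best = top_k(query_vector, chunk_vectors, k=2)
print(best)
```

A vector index such as FAISS serves the same purpose as top_k here, but avoids scanning every vector in Python, which matters once the number of chunks grows.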

Retrieval Workflow

User enters a question.
The question is converted into an embedding with the same model used for the document chunks.
FAISS searches for the most relevant document chunks.
Retrieved chunks are supplied as context.
The language model generates an answer based on the retrieved information.

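Because the model answers from passages retrieved from the official documents rather than from memory alone, this Retrieval-Augmented Generation (RAG) approach helps improve factual accuracy and reduce hallucinations.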
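The step where retrieved chunks are supplied as context can be sketched as simple prompt assembly. This is a minimal illustration, not the prompt used in llm_response.py: the function name build_prompt, the instruction wording, and the placeholder chunk strings are all assumptions.

```python
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved chunks and the user question into one prompt string."""
    # Number each chunk so an answer could refer back to its source passage.
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder strings stand in for the chunks returned by the retriever.
prompt = build_prompt(
    "Who is eligible for PM-KISAN?",
    ["<text of first retrieved chunk>", "<text of second retrieved chunk>"],
)
print(prompt)
```

Instructing the model to rely only on the supplied context is one common way to keep answers tied to the retrieved passages; the resulting string would then be passed to whichever language model the application uses.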

Technologies Used

Python
Streamlit
LangChain
FAISS
Hugging Face Embeddings
Sentence Transformers
PyPDF

How to Run Locally

Clone the Repository

git clone <repository-url>
cd RAG

Create Virtual Environment

python -m venv venv

Activate Virtual Environment

Windows:

venv\Scripts\activate

macOS/Linux:

source venv/bin/activate

Install Dependencies

pip install -r requirements.txt

Run the Application

streamlit run app.py

Open in Browser

http://localhost:8501

Screenshot

The screenshot below demonstrates the working Government Schemes Assistant.

Location:

screenshot/demo.png

The screenshot includes:

User query input
Retrieved answer
Streamlit user interface
Government Scheme knowledge retrieval

Example Questions

What is Atal Pension Yojana?
Who is eligible for PM-KISAN?
How can I apply for PMAY-U?
What are the benefits of the Rythu Bandhu Scheme?
What documents are required for scheme registration?

What Would You Improve With More Time?

Given additional development time, the following improvements could be implemented:

Multilingual support (English, Hindi, Telugu)
Source citations for every response
Conversation memory
Support for DOCX and TXT files
Online deployment
Advanced reranking models
Better UI/UX design
Real-time government scheme updates

Team Contributions

Data Collection & PDF Management - Laxminarayana
Document Processing & Chunking
Embedding Generation - Noel Paul
Retrieval System Development - Vasishta
Streamlit Frontend Development - Ramcharan
Testing & Documentation - Dhruvaa

Conclusion

This project demonstrates a practical implementation of Retrieval-Augmented Generation (RAG) for government scheme assistance. By combining document retrieval with a language model, the application gives context-aware answers grounded in official government documents.

Edited May 29, 2026 by Vasishta Koduri

[Submission] Vasishta - RAG app (Govt Scheme assistance) (Team Synergy)