[Submission] Vasishta - RAG app (Govt Scheme Assistance) (Team Synergy)

Vasishta Koduri requested to merge Vasishta_K/ip1-icfai:main into main

Government Schemes RAG Assistant

Project Overview

The Government Schemes RAG Assistant is a Retrieval-Augmented Generation (RAG) application built using Streamlit, LangChain, FAISS, and Hugging Face Embeddings. The application allows users to ask questions about various government schemes and receive context-aware answers extracted from official government documents.


Features

  • PDF document ingestion
  • Automatic text chunking
  • Semantic search using embeddings
  • Retrieval-Augmented Generation (RAG)
  • Interactive Streamlit interface
  • FAISS vector database for fast retrieval
  • Government scheme question answering

Folder Structure

RAG/
├── app.py
├── README.md
├── requirements.txt
├── styles.css
├── ui_components.py
├── test_backend.py
├── test_chunk.py
├── test_pdf.py
├── data/
│   ├── Atal Pension Yojana.pdf
│   ├── Operational-Guidelines-of-PMAY-U.pdf
│   ├── pmay.pdf
│   ├── RevisedPM-KISANOperationalGuidelines.pdf
│   └── Rythu Bandhu Scheme Official Guidelines.pdf
├── utils/
│   ├── pdf_loader.py
│   ├── embeddings.py
│   ├── retriever.py
│   ├── llm_response.py
│   ├── vector_store.py
│   └── text_chuncker.py
├── screenshot/
│   └── demo.png
└── venv/

What Document Did You Use and Why?

We used official government scheme documents, including:

  • Atal Pension Yojana (APY)
  • Pradhan Mantri Awas Yojana Urban (PMAY-U)
  • PM-KISAN Scheme
  • Rythu Bandhu Scheme

These documents were selected because they contain authentic information regarding eligibility criteria, benefits, registration processes, and implementation guidelines. Using official documents improves the accuracy and reliability of responses generated by the RAG system.


How Does Your Chunking Work?

The PDF documents are loaded and processed using LangChain's PDF loading utilities.

The extracted text is divided into smaller chunks using RecursiveCharacterTextSplitter.

Typical configuration:

  • Chunk Size: 1000 characters
  • Chunk Overlap: 200 characters

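The 200-character overlap carries context across chunk boundaries, so a sentence split between two chunks still appears whole in at least one of them. Keeping chunks to about 1000 characters keeps each embedding focused on a single topic, which helps semantic search return relevant passages.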
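To make the size and overlap settings concrete, here is a minimal sketch of overlapping chunking in plain Python. It is a simplified stand-in, not the project's code: RecursiveCharacterTextSplitter prefers to split on separators such as paragraph and line breaks, whereas this sketch cuts fixed-width windows. The function name `chunk_text` and the sample text are hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters.

    Simplified illustration only: the real splitter prefers natural
    boundaries (paragraphs, lines, words) before falling back to characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # with the defaults, a new chunk starts every 800 characters
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Hypothetical 2500-character sample standing in for extracted PDF text.
sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)
print([len(c) for c in chunks])            # [1000, 1000, 900]
print(chunks[0][-200:] == chunks[1][:200])  # True: neighbouring chunks share 200 characters
```

The last 200 characters of each chunk reappear at the start of the next one, which is what lets a retrieved chunk carry some of the surrounding context with it.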


Which Embedding Model Did You Use?

We used:

sentence-transformers/all-MiniLM-L6-v2

This model converts document chunks into vector embeddings that capture semantic meaning. These embeddings are stored in a FAISS vector database for similarity-based retrieval.
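The idea behind similarity-based retrieval can be sketched without any external libraries. The example below is an assumption-laden illustration, not the project's code: it uses toy 3-dimensional vectors (real all-MiniLM-L6-v2 embeddings have 384 dimensions), a brute-force scan instead of a FAISS index, and cosine similarity as the metric, which may differ from the metric the project's FAISS index actually uses. The names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 2) -> list[tuple[int, float]]:
    """Return (chunk index, score) pairs for the k most similar chunk vectors."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scored[:k]


# Toy vectors standing in for chunk embeddings (assumed values, for illustration only).
chunk_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]
query_vector = [1.0, 0.0, 0.0]

results = top_k(query_vector, chunk_vectors, k=2)
print([index for index, _ in results])  # [0, 2]
```

A vector index such as FAISS serves the same purpose as `top_k` here, but is built to stay fast when the number of stored chunks grows large.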


Retrieval Workflow

  1. User enters a question.
  2. The question is converted into an embedding using the same model as the document chunks.
  3. FAISS searches for the most relevant document chunks.
  4. Retrieved chunks are supplied as context.
  5. The language model generates an answer based on the retrieved information.

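Because the language model answers from the retrieved passages rather than from memory alone, this Retrieval-Augmented Generation (RAG) approach helps ground responses in the source documents, which tends to improve factual accuracy and reduce hallucinations.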
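Steps 4 and 5 of the workflow come down to assembling a prompt from the retrieved chunks. The sketch below shows one possible way to do that; the function name `build_prompt`, the prompt wording, and the placeholder chunk strings are all assumptions for illustration and are not taken from the project's llm_response.py.

```python
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Combine retrieved chunks and the user question into a single prompt string."""
    # Number each chunk so the model (and the reader) can tell the passages apart.
    context = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder strings stand in for chunks returned by the vector search.
prompt = build_prompt(
    "Who is eligible for PM-KISAN?",
    ["<text of retrieved chunk 1>", "<text of retrieved chunk 2>"],
)
print(prompt)
```

Instructing the model to rely only on the supplied context, and to say so when the context is insufficient, is a common way to keep answers tied to the retrieved documents.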


Technologies Used

  • Python
  • Streamlit
  • LangChain
  • FAISS
  • Hugging Face Embeddings
  • Sentence Transformers
  • PyPDF

How to Run Locally

Clone the Repository

git clone <repository-url>
cd RAG

Create Virtual Environment

python -m venv venv

Activate Virtual Environment

Windows:

venv\Scripts\activate

macOS/Linux:

source venv/bin/activate

Install Dependencies

pip install -r requirements.txt

Run the Application

streamlit run app.py

Open in Browser

http://localhost:8501

Screenshot

The screenshot below demonstrates the working Government Schemes Assistant.

Location:

screenshot/demo.png

The screenshot includes:

  • User query input
  • Retrieved answer
  • Streamlit user interface
  • Government Scheme knowledge retrieval

Example Questions

  • What is Atal Pension Yojana?
  • Who is eligible for PM-KISAN?
  • How can I apply for PMAY-U?
  • What are the benefits of the Rythu Bandhu Scheme?
  • What documents are required for scheme registration?

What Would You Improve With More Time?

Given additional development time, the following improvements could be implemented:

  • Multilingual support (English, Hindi, Telugu)
  • Source citations for every response
  • Conversation memory
  • Support for DOCX and TXT files
  • Online deployment
  • Advanced reranking models
  • Better UI/UX design
  • Real-time government scheme updates

Team Contributions

  • Data Collection & PDF Management: Laxminarayana
  • Document Processing & Chunking
  • Embedding Generation: Noel Paul
  • Retrieval System Development: Vasishta
  • Streamlit Frontend Development: Ramcharan
  • Testing & Documentation: Dhruvaa

Conclusion

This project demonstrates a practical implementation of Retrieval-Augmented Generation (RAG) for government scheme assistance. By combining document retrieval with a language model, the application gives context-aware answers grounded in official government documents rather than in the model's memory alone.

Edited by Vasishta Koduri
