
[Submission] Shreyas Mogalapalli — RAG App

Shreyas Mogalapalli requested to merge ShreyasMogalapalli/ip1-icfai:main into main

🧠 RAG Chatbot — Local AI Document Q&A

A fully local, privacy-first Retrieval-Augmented Generation (RAG) chatbot built with Streamlit and Ollama. It offers three modes in one app: chat freely, upload documents, or paste a URL. All processing happens on your machine, and nothing is sent to the cloud.


📸 Screenshot

Run the app and save your screenshot as screenshots/demo.png


📄 What Document Did I Use and Why?

This app is general-purpose — it works with any PDF, TXT, or Markdown file you provide, or any webpage URL. There is no hardcoded document.

Good starter documents to test with:

  • A research paper (e.g. Attention Is All You Need) — tests dense technical Q&A
  • A company policy or HR handbook — tests factual retrieval
  • A Wikipedia article via URL — tests live web ingestion

Why no fixed document? The app is designed to be reusable across any domain — academic, legal, personal notes, or web content — so you bring your own data.


💬 Three Modes

The app has 3 tabs, each with its own chat history:

| Tab | Mode | How it works |
|---|---|---|
| 💬 Normal Chat | Plain LLM | Just type and chat; no documents needed |
| 📄 Document | RAG on file | Upload PDF/TXT/MD → index → answers from your file |
| 🌐 URL | RAG on webpage | Paste a URL → fetch & index → answers from that page |

How Does Your Chunking Work?

Chunking is handled in utils/loader.py using a character-based sliding window with sentence-boundary awareness.

Steps:

  1. Document loaded as raw text (PyMuPDF for PDFs, plain read for TXT/MD, HTTP scrape for URLs)
  2. Text split into 500-character chunks (configurable in sidebar)
  3. Each chunk overlaps the next by 50 characters, so short passages that straddle a boundary are not cut in half
  4. Before cutting, the chunker looks back up to 100 characters for a natural break: a paragraph break (\n\n), a newline (\n), or a sentence end (., !, ?)
  5. Each chunk stored with metadata: source, chunk_id, start_char

Example of smart breaking:

"...attention mechanisms are used here.\n\nThe encoder then maps..."
                                         ↑ breaks here (paragraph boundary)

| Parameter | Default | Configurable range |
|---|---|---|
| Chunk size | 500 chars | 200 – 1000 |
| Chunk overlap | 50 chars | 0 – 200 |
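
The chunking steps above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of utils/loader.py: the function name `chunk_text`, the `BREAKS` tuple, and the exact order in which break markers are tried are assumptions made for the example.

```python
from typing import Dict, List

# Break markers tried in priority order: paragraph, newline, sentence end.
# (Assumed ordering; the real loader may differ.)
BREAKS = ("\n\n", "\n", ". ", "! ", "? ")


def chunk_text(text: str, source: str, chunk_size: int = 500,
               overlap: int = 50, lookback: int = 100) -> List[Dict]:
    """Split text into overlapping chunks, preferring natural boundaries."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks: List[Dict] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Search only the last `lookback` characters of the window.
            window_start = max(start + 1, end - lookback)
            for marker in BREAKS:
                pos = text.rfind(marker, window_start, end)
                if pos != -1:
                    end = pos + len(marker)  # keep the marker in this chunk
                    break
        chunks.append({
            "text": text[start:end],
            "source": source,
            "chunk_id": len(chunks),
            "start_char": start,
        })
        if end >= len(text):
            break
        # Step back by `overlap`, but always advance at least one character.
        start = max(end - overlap, start + 1)
    return chunks
```

If no break marker is found inside the look-back window, the sketch falls back to a hard cut at the chunk size.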

🔢 Which Embedding Model Did I Use?

Model: nomic-embed-text:latest via Ollama

Why nomic-embed-text?

  • Built specifically for retrieval tasks — outperforms general-purpose models on semantic search
  • Produces 768-dimensional vectors — precise enough for similarity matching, efficient for local hardware
  • Runs fully locally via Ollama — no API key, no internet required
  • Lightweight at only 274 MB, already in your Ollama install

How it's used:

  • Index time: each text chunk → nomic-embed-text → 768-dim vector → stored in ChromaDB
  • Query time: user question → nomic-embed-text → query vector → cosine similarity → top-K chunks retrieved → passed to LLM as context
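
A rough sketch of the index-time and query-time flow described above. It assumes Ollama is listening on its default port and uses the `/api/embeddings` endpoint. In the app, ChromaDB performs the similarity search; the brute-force `top_k` here is a stand-in to show what cosine ranking does, and all function names are hypothetical.

```python
import json
import math
import urllib.request
from typing import List, Sequence, Tuple

OLLAMA_URL = "http://localhost:11434"  # Ollama's default address (assumed)


def embed(text: str, model: str = "nomic-embed-text") -> List[float]:
    """Request one embedding from a locally running Ollama server."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float],
          chunk_vecs: Sequence[Sequence[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Return (chunk index, similarity) pairs for the k most similar chunks."""
    scored = [(i, cosine_similarity(query_vec, v))
              for i, v in enumerate(chunk_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

At query time, the question would be passed through `embed` and the resulting vector compared against the stored chunk vectors; the text of the winning chunks is then placed in the LLM prompt as context.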

🚀 How to Run Locally

Prerequisites

  • Python 3.9+
  • Ollama installed

Step 1 — Start Ollama

ollama serve

Make sure the required models are installed (ollama pull downloads any that are missing):

ollama pull nomic-embed-text   # embeddings
ollama pull llama3.1:8b        # chat LLM (default)
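
To check from Python that both models are present before launching the app, something like the following could work. It is a sketch that assumes Ollama's `/api/tags` endpoint on the default port; `fetch_tags` and `missing_models` are hypothetical helper names, not part of this repository.

```python
import json
import urllib.request
from typing import Iterable, List


def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Ask the local Ollama server which models are installed."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return json.loads(resp.read())


def missing_models(tags_response: dict, required: Iterable[str]) -> List[str]:
    """Return the required model names that are not installed."""
    installed = {m["name"] for m in tags_response.get("models", [])}
    missing = []
    for name in required:
        # A name without a tag is treated as "<name>:latest".
        full_name = name if ":" in name else f"{name}:latest"
        if full_name not in installed:
            missing.append(name)
    return missing
```

Calling `missing_models(fetch_tags(), ["nomic-embed-text", "llama3.1:8b"])` should return an empty list once both pulls have completed.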

Step 2 — Install Dependencies

cd shreyas-rag-app
pip install -r requirements.txt

Step 3 — Run the App

streamlit run app.py

Open http://localhost:8501 in your browser.

Step 4 — Pick Your Mode

💬 Normal Chat tab — just start typing, no setup needed

📄 Document tab:

  1. Expand Upload & Index a Document
  2. Upload your PDF / TXT / MD
  3. Click Index Document
  4. Ask questions in the chat box below

🌐 URL tab:

  1. Expand Fetch & Index a URL
  2. Paste any webpage URL
  3. Click 🌐 Fetch & Index
  4. Ask questions about the page

🗂️ Folder Structure

shreyas-rag-app/
├── app.py                  # Main Streamlit app (3-tab UI)
├── requirements.txt        # All dependencies
├── README.md               # This file
├── data/
│   └── your_document.pdf   # Place documents here
├── utils/
│   ├── __init__.py
│   ├── loader.py           # Document loading & chunking logic
│   ├── embedder.py         # Embedding generation via Ollama
│   └── retriever.py        # ChromaDB vector store & retrieval
└── screenshots/
    └── demo.png            # App screenshot

🧩 Supported Models (from your Ollama install)

| Model | Best for |
|---|---|
| llama3.1:8b | Balanced quality & speed (default) |
| qwen2.5:7b | Strong reasoning & instruction following |
| deepseek-r1:8b | Complex multi-step reasoning |
| gemma2:2b | Fast responses, low RAM usage |
| llama3:latest | General purpose |
| nomic-embed-text | Embeddings only (not for chat) |

🔮 What Would I Improve With More Time?

1. 🔁 Semantic Chunking

Replace fixed character chunking with topic-boundary chunking — splits where the content actually changes, not just at a character count. Libraries like semantic-text-splitter would give much more coherent chunks.

2. 🔀 Hybrid Search (BM25 + Vector)

Combine dense vector search (semantic similarity) with sparse BM25 keyword search. Hybrid retrieval handles both vague conceptual questions and exact term lookups (names, dates, codes) better than either alone.
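
One possible shape for this, sketched under assumptions: a small pure-Python Okapi BM25 ranker for the keyword side, merged with the vector ranking using reciprocal rank fusion (RRF). RRF is one common way to combine rankings and is a choice made for this example; the function names and the constants `k1`, `b`, and `k` are illustrative defaults, not settings from the app.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def bm25_ranking(query: str, docs: Sequence[str],
                 k1: float = 1.5, b: float = 0.75) -> List[int]:
    """Return document indices ordered by Okapi BM25 score, best first."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / max(n_docs, 1)
    # Number of documents containing each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings: Sequence[Sequence[int]],
                           k: int = 60) -> List[int]:
    """Merge several rankings; each contributes 1 / (k + rank) per document."""
    fused: Dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)
```

The vector ranking would come from the existing retriever and the keyword ranking from `bm25_ranking`; passing both to `reciprocal_rank_fusion` yields a single ordering that rewards chunks ranked well by either method.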

3. 🗃️ Multi-Document Management

A dedicated UI to manage multiple indexed documents — view, delete, or filter retrieval to a specific file. Currently all documents in a tab are searched together.

4. 💬 Smarter Conversation Memory

Replace the current 3-turn sliding window with a summarisation-based memory — older turns get compressed into a short summary so the model retains context over long conversations without hitting the context limit.
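
A minimal sketch of what such a memory could look like. The class name `SummaryMemory` and the `summarize` callable are hypothetical; in practice the callable would probably wrap an LLM call that folds evicted turns into the running summary.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (user message, assistant reply)


class SummaryMemory:
    """Keep the last few turns verbatim and compress older ones."""

    def __init__(self, summarize: Callable[[str, List[Turn]], str],
                 keep_last: int = 3) -> None:
        self.summarize = summarize  # (old summary, evicted turns) -> new summary
        self.keep_last = keep_last
        self.summary = ""
        self.recent: List[Turn] = []

    def add_turn(self, user: str, assistant: str) -> None:
        self.recent.append((user, assistant))
        overflow = len(self.recent) - self.keep_last
        if overflow > 0:
            # Move the oldest turns out of the window and into the summary.
            evicted = self.recent[:overflow]
            self.recent = self.recent[overflow:]
            self.summary = self.summarize(self.summary, evicted)

    def build_context(self) -> str:
        """Render the summary plus recent turns as prompt text."""
        lines = []
        if self.summary:
            lines.append(f"Summary of earlier conversation: {self.summary}")
        for user, assistant in self.recent:
            lines.append(f"User: {user}")
            lines.append(f"Assistant: {assistant}")
        return "\n".join(lines)
```

With `keep_last=3` this behaves like the current sliding window for the first three turns and only starts summarising once older turns would otherwise be dropped.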

5. 📊 Retrieval Evaluation

Add a 👍 / 👎 feedback button per answer and log Hit Rate and MRR metrics to measure and improve retrieval quality over time.
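
For reference, both metrics are simple to compute once each logged query has a set of chunk IDs known to be relevant. The sketch below assumes that data shape; how relevance labels would be derived from 👍 / 👎 feedback is left open, and the function names are illustrative.

```python
from typing import Sequence, Set


def hit_rate(results: Sequence[Sequence[str]],
             relevant: Sequence[Set[str]], k: int = 5) -> float:
    """Fraction of queries with at least one relevant chunk in the top k."""
    if not results:
        return 0.0
    hits = sum(
        1 for retrieved, rel in zip(results, relevant)
        if any(chunk_id in rel for chunk_id in retrieved[:k])
    )
    return hits / len(results)


def mean_reciprocal_rank(results: Sequence[Sequence[str]],
                         relevant: Sequence[Set[str]]) -> float:
    """Average of 1 / rank of the first relevant chunk (0 if none found)."""
    if not results:
        return 0.0
    total = 0.0
    for retrieved, rel in zip(results, relevant):
        for rank, chunk_id in enumerate(retrieved, start=1):
            if chunk_id in rel:
                total += 1.0 / rank
                break
    return total / len(results)
```

Hit Rate says whether retrieval found anything useful at all, while MRR also rewards putting the useful chunk near the top of the list.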

6. 🖥️ Streaming Responses

Switch from stream: false to streaming Ollama responses so the answer appears word-by-word instead of all at once — much better UX for longer answers.
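
A sketch of how the streaming side might be consumed, assuming the app calls Ollama's `/api/generate` endpoint (if it uses `/api/chat`, the text arrives under `message.content` instead of `response`). With streaming enabled, Ollama returns newline-delimited JSON objects; the helper names below are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_tokens(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from a newline-delimited JSON stream."""
    for raw in lines:
        if not raw.strip():
            continue  # skip blank keep-alive lines
        event = json.loads(raw)
        if event.get("response"):
            yield event["response"]
        if event.get("done"):
            break  # final object marks the end of the answer


def stream_generate(prompt: str, model: str = "llama3.1:8b",
                    base_url: str = "http://localhost:11434") -> Iterator[str]:
    """Stream an answer from a local Ollama server, fragment by fragment."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": True}
    ).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The HTTP response object iterates line by line.
        yield from iter_tokens(resp)
```

Recent Streamlit versions provide `st.write_stream`, which should accept a generator like `stream_generate(...)` and render the text as it arrives.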


🔒 Privacy

Everything runs locally. Apart from fetching the page you request in URL mode, no data leaves your machine:

  • Ollama runs LLMs and embeddings entirely on your hardware
  • ChromaDB stores all vectors in ./chroma_db/ on disk
  • No telemetry, no cloud calls, no API keys required
Edited by Shreyas Mogalapalli
