Retrieval-Augmented Generation (RAG) for Scientific Q&A · Topics

Projects with this topic

Shanmukha Praneet Atmuri / SciBot Research Assistant

SciBot is a RAG-based research assistant that retrieves passages from PDF documents and answers questions about them. It pairs a Large Language Model (LLM) with a vector database so that answers are grounded in the content of the uploaded PDFs, which makes it useful for students, researchers, and professionals.

🧠 Key Features:

📄 PDF Ingestion & Parsing: Automatically reads and extracts structured text content from uploaded PDF files using tools like PyMuPDF or pdfplumber.

🧹 Text Preprocessing: Cleans the extracted text, chunks it into semantically meaningful passages, and removes irrelevant formatting or metadata.
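As a rough illustration of the preprocessing step, the sketch below cleans extracted text and packs whole sentences into chunks. It is not SciBot's actual code: the function names `clean_text` and `chunk_sentences`, the `max_chars` parameter, and the sentence-boundary strategy are all assumptions chosen for the example.

```python
import re


def clean_text(raw: str) -> str:
    """Rejoin words hyphenated across line breaks and collapse whitespace."""
    text = re.sub(r"-\n(?=\w)", "", raw)  # "sys-\ntems" -> "systems"
    text = re.sub(r"\s+", " ", text)      # newlines, tabs, runs of spaces -> one space
    return text.strip()


def chunk_sentences(text: str, max_chars: int = 500) -> list[str]:
    """Pack whole sentences into chunks, aiming for at most max_chars each.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    raw = "RAG sys-\ntems retrieve   text.\nThey then generate answers. Chunks matter."
    for chunk in chunk_sentences(clean_text(raw), max_chars=45):
        print(repr(chunk))
```

Splitting on sentence boundaries keeps each chunk readable on its own. A real pipeline might instead split on section headings or token counts, depending on the embedding model's input limit.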

🔍 Embedding Generation: Converts text chunks into numerical vector embeddings using the OpenAI Embeddings API, HuggingFace models, or Instructor embeddings.

🗂️ Vector Database Storage: Stores these embeddings in a fast vector database like ChromaDB, FAISS, or Pinecone, enabling efficient semantic retrieval.
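To make the embed-store-retrieve idea concrete, here is a toy sketch that uses only the standard library. The `toy_embed` function is a bag-of-words stand-in, not a real embedding model, and `InMemoryVectorStore` is a hypothetical class that ChromaDB, FAISS, or Pinecone would replace in practice. Only the ranking-by-cosine-similarity logic carries over.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical minimal store: keeps (chunk, vector) pairs in a list."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, chunks: list[str]) -> None:
        for chunk in chunks:
            self._items.append((chunk, toy_embed(chunk)))

    def search(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        """Return the k chunks most similar to the query as (score, chunk)."""
        q = toy_embed(query)
        scored = [(cosine(q, vec), chunk) for chunk, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.add([
        "Mitochondria produce ATP through oxidative phosphorylation.",
        "Self-attention lets transformers model token interactions.",
        "Galaxies recede faster at larger distances.",
    ])
    for score, chunk in store.search("How do mitochondria produce ATP?", k=2):
        print(f"{score:.3f}  {chunk}")
```

Word counts only match exact tokens, so this toy version misses paraphrases. Dense embeddings from a trained model exist to close that gap, and a dedicated vector database replaces the linear scan with an index.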

🤖 RAG-based Question Answering:

When a user asks a question, relevant chunks are retrieved from the vector DB.

The retrieved chunks are passed, together with the query, to an LLM (such as OpenAI's GPT-4 or GPT-3.5, or Google's Gemini).

The model then generates accurate, grounded responses using the retrieved context.
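The three steps above can be sketched as a small function that retrieves chunks, builds a prompt, and hands it to a model. The prompt wording, the names `build_prompt` and `answer`, and the `retrieve` and `generate` callables are assumptions made for illustration. In a real system `generate` would wrap a call to whichever LLM API the project uses.

```python
from typing import Callable, Sequence


def build_prompt(question: str, chunks: Sequence[str]) -> str:
    """Number the retrieved chunks so the model can cite them as [n]."""
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the numbered context passages. "
        "Cite passages as [n]. If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer(
    question: str,
    retrieve: Callable[[str, int], Sequence[str]],  # hypothetical retriever: (query, k) -> chunks
    generate: Callable[[str], str],                 # hypothetical LLM wrapper: prompt -> text
    k: int = 3,
) -> str:
    """Retrieve k chunks, build a grounded prompt, and return the model's reply."""
    chunks = retrieve(question, k)
    return generate(build_prompt(question, chunks))


if __name__ == "__main__":
    # Fake retriever and model so the sketch runs without any external service.
    fake_retrieve = lambda q, k: ["Mitochondria produce ATP."][:k]
    fake_generate = lambda prompt: "(model reply would appear here)"
    print(build_prompt("What produces ATP?", fake_retrieve("What produces ATP?", 1)))
    print(answer("What produces ATP?", fake_retrieve, fake_generate, k=1))
```

Passing the retriever and the model in as callables keeps the orchestration independent of any particular vector database or LLM provider.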

🧪 Research-Oriented UI:

Built with Streamlit for a fast, minimal interface.

Users can upload PDFs, type questions, and get natural language answers.

Answers include highlighted references or citations from the PDF.

Updated Jun 19, 2025
