[Submission] Venkata Srinivas Varun Oruganti [Team - Centurions] — RAG App
✨ Aura AI Studio | Premium Universal Assistant & Custom Knowledge Suite
Aura AI Studio is an elite, production-ready Retrieval-Augmented Generation (RAG) platform built with Streamlit, FAISS, and Hugging Face. It features a stunning glassmorphic dark-mode interface and a highly modular quad-engine architecture capable of running seamlessly in Cloud API, Local Offline PyTorch, Llama.cpp (Local GGUF), or Ollama server configurations.
📁 Repository Structure
your-name-rag-app/
├── app.py # Main Streamlit UI, chat session managers & orchestrator
├── requirements.txt # Package dependencies (version-locked)
├── README.md # Developer documentation & quickstart guide
├── data/
│ └── sample_knowledge.txt # A premium sample technical specifications document
├── utils/
│ ├── __init__.py # Package initialization
│ ├── loader.py # Document loading, PDF parsing & smart chunk splitter
│ ├── embedder.py # Embedding and LLM clients with robust error handling
│ ├── retriever.py # FAISS vector store, cosine searches & offline fallbacks
│ └── styles.py # High-end glassmorphic dark theme CSS & mock data
└── screenshots/
└── demo.png # Screenshot of the working Aura AI Studio
⚡ Key Features
- Ultra-Premium Glassmorphism Aesthetics: Dynamic animations, glowing rings, cohesive indigo-magenta gradients, and HSL custom themes tailored for Visual Excellence.
-
Dynamic Quad-Mode Execution:
- Llama.cpp (Local GGUF) [PREFERRED & RECOMMENDED]: Runs optimized GGUF quantized models locally with extreme memory efficiency, CPU/GPU acceleration, and minimal hardware footprint.
- Hugging Face Cloud API: Leverages global serverless inference GPUs (sub-second queries, zero local hardware requirements).
- Local Offline Engine: Loads lightweight models locally on PyTorch CPU/GPU for 100% private, offline RAG.
- Ollama Local Server: Integrates with a local running Ollama instance over a fast REST API.
-
Resilient Error Boundaries: Specialized interceptors catch and explain
401 Unauthorized(invalid tokens),403 Forbidden(gated models), and429(rate limits) instantly with actionable diagnostic cards. - Smart Semantic Chunking: Paragraph-aware text splitting that defaults to sentence-level slicing if the paragraph is too large, retaining a customizable boundary overlap.
- Score-Based Redirection: Uses normalized inner-product calculations for exact Cosine Similarity. If matching context drops below a 35.0% threshold, queries automatically fallback to general training, preventing hallucinations.
🚀 Quickstart Guide
1. Installation
Ensure you have Python 3.10+ installed. Clone or copy the project files, open your terminal in the project directory, and run:
pip install -r requirements.txt
[!TIP] Windows User Tip (For Llama.cpp support): To ensure the local C++ backend aligns with your CPU's exact instruction set and avoid
0xc000001d(Illegal Instruction) errors, we highly recommend installingllama-cpp-pythoncompiled from source:pip install llama-cpp-python --no-binary llama-cpp-python --force-reinstall --no-cache-dir
2. Configure Token (Optional but Recommended)
Aura handles Hugging Face tokens dynamically. You do not need to hardcode it! Configure your token (hf_...) using any of the following methods:
- Sidebar UI: Paste the token directly in the Engine & Model Settings sidebar tab while the app is running.
-
Local Environment Variable: Create a
.envfile in the root folder: -
Cloud Platform Secrets: When deploying to Streamlit Community Cloud or Hugging Face Spaces, add a Secret named
HF_TOKENin the project settings.
3. Launching the App
Run the following command to start the Streamlit server:
streamlit run app.py
The app will compile instantly and open automatically in your default browser at http://localhost:8501.
⚙ ️ Setting Up Llama.cpp (Local GGUF Mode)
For a fully private, highly optimized local experience, select Llama.cpp (Local GGUF) in the Engine & Model Settings sidebar:
-
Preferred Default Model:
-
GGUF Repo ID:
Qwen/Qwen2.5-1.5B-Instruct-GGUF -
GGUF Filename:
qwen2.5-1.5b-instruct-q4_k_m.gguf
-
GGUF Repo ID:
-
Important Note on Vision Files:
-
⚠ ️ Vision/multimodal GGUF files (likemmproj-BF16.ggufprojector files) are not standalone text LLMs and cannot be loaded by themselves. Always specify a valid language model GGUF (such as Qwen or Llama text instruct GGUFs) for standard text completions and Telugu answering.
-
🛠 ️ Diagnostics & FAISS Settings
-
Default Embedding Model:
sentence-transformers/all-MiniLM-L6-v2(384 dimensions) -
Default LLM:
Qwen/Qwen2.5-7B-Instruct(Cloud) /Qwen/Qwen2.5-1.5B-Instruct(Local GGUF) - RAG Parameters: Fully adjustable sliding controls for Chunk Size, Overlap, Retrieved Sources (Top-K), and LLM Temperature.
- Diagnostics Dashboard: Inspect PyTorch, CUDA activation status, FAISS loaded range indices, and run cosine similarity analytics directly inside the Database Analytics tab!