Venkata Srinivas Varun Oruganti requested to merge Varun_2007/aura-ai:main into main May 29, 2026

✨ Aura AI Studio | Premium Universal Assistant & Custom Knowledge Suite

Aura AI Studio is a Retrieval-Augmented Generation (RAG) platform built with Streamlit, FAISS, and Hugging Face. It features a glassmorphic dark-mode interface and a modular quad-engine architecture that can run in any of four configurations: Llama.cpp (Local GGUF), Hugging Face Cloud API, Local Offline PyTorch, or an Ollama server.

📁 Repository Structure

your-name-rag-app/
├── app.py                  # Main Streamlit UI, chat session managers & orchestrator
├── requirements.txt        # Package dependencies (version-locked)
├── README.md               # Developer documentation & quickstart guide
├── data/
│   └── sample_knowledge.txt # A premium sample technical specifications document
├── utils/
│   ├── __init__.py         # Package initialization
│   ├── loader.py           # Document loading, PDF parsing & smart chunk splitter
│   ├── embedder.py         # Embedding and LLM clients with robust error handling
│   ├── retriever.py        # FAISS vector store, cosine searches & offline fallbacks
│   └── styles.py           # High-end glassmorphic dark theme CSS & mock data
└── screenshots/
    └── demo.png            # Screenshot of the working Aura AI Studio

⚡ Key Features

Ultra-Premium Glassmorphism Aesthetics: Dynamic animations, glowing rings, cohesive indigo-magenta gradients, and HSL custom themes tailored for Visual Excellence.
Dynamic Quad-Mode Execution:
- Llama.cpp (Local GGUF) [PREFERRED & RECOMMENDED]: Runs optimized GGUF quantized models locally with extreme memory efficiency, CPU/GPU acceleration, and minimal hardware footprint.
- Hugging Face Cloud API: Leverages global serverless inference GPUs (sub-second queries, zero local hardware requirements).
- Local Offline Engine: Loads lightweight models locally on PyTorch CPU/GPU for 100% private, offline RAG.
- Ollama Local Server: Integrates with a local running Ollama instance over a fast REST API.
Resilient Error Boundaries: Specialized interceptors catch and explain 401 Unauthorized (invalid tokens), 403 Forbidden (gated models), and 429 (rate limits) instantly with actionable diagnostic cards.
Smart Semantic Chunking: Paragraph-aware text splitting that falls back to sentence-level slicing when a paragraph is too large, while retaining a customizable overlap at chunk boundaries.
Score-Based Redirection: Computes exact cosine similarity as the inner product of normalized embeddings. If the retrieved context scores below the 35.0% threshold, the query automatically falls back to the model's general knowledge instead of being answered from weak context, which reduces the risk of hallucinations.
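The chunking strategy described above can be sketched roughly as follows. This is a minimal illustration, not the code in utils/loader.py: the function name `split_into_chunks`, the character-based size limit, and the default values are all assumptions made for the example.

```python
import re


def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Hypothetical paragraph-aware splitter with a sentence-level fallback.

    Sizes are measured in characters. A chunk can exceed chunk_size when a
    single sentence is longer than the limit or when the overlap tail is added.
    """
    # Step 1: split on blank lines to get paragraphs.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    # Step 2: keep small paragraphs whole, break oversized ones into sentences.
    units: list[str] = []
    for paragraph in paragraphs:
        if len(paragraph) <= chunk_size:
            units.append(paragraph)
        else:
            units.extend(s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s)

    # Step 3: greedily pack units into chunks, carrying an overlap tail forward.
    chunks: list[str] = []
    current = ""
    for unit in units:
        candidate = f"{current} {unit}" if current else unit
        if not current or len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            tail = current[-overlap:] if overlap > 0 else ""
            current = f"{tail} {unit}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Calling `split_into_chunks(document_text, chunk_size=800, overlap=100)` would return a list of strings ready for embedding. The overlap here is a plain character tail, so it may start mid-word; a production splitter would likely align it to word or sentence boundaries.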

🚀 Quickstart Guide

1. Installation

Ensure you have Python 3.10+ installed. Clone or copy the project files, open your terminal in the project directory, and run:

pip install -r requirements.txt

[!TIP] Windows User Tip (For Llama.cpp support): To ensure the local C++ backend aligns with your CPU's exact instruction set and avoid 0xc000001d (Illegal Instruction) errors, we highly recommend installing llama-cpp-python compiled from source:
pip install llama-cpp-python --no-binary llama-cpp-python --force-reinstall --no-cache-dir

2. Configure Token (Optional but Recommended)

Aura handles Hugging Face tokens dynamically. You do not need to hardcode it! Configure your token (hf_...) using any of the following methods:

Sidebar UI: Paste the token directly in the Engine & Model Settings sidebar tab while the app is running.
Local Environment Variable: Create a .env file in the root folder that defines your token.
Cloud Platform Secrets: When deploying to Streamlit Community Cloud or Hugging Face Spaces, add a Secret named HF_TOKEN in the project settings.
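One way the three sources could be resolved in order is sketched below. The helper name `resolve_hf_token` and the precedence (sidebar first, then environment) are assumptions for illustration. The sketch also assumes the .env entry uses the same HF_TOKEN name as the cloud secret (a line such as `HF_TOKEN=hf_...`) and that the file has already been loaded into the process environment, for example by a loader such as python-dotenv.

```python
import os
from typing import Optional


def resolve_hf_token(sidebar_value: Optional[str] = None,
                     env_var: str = "HF_TOKEN") -> Optional[str]:
    """Hypothetical helper: return the first non-empty token, or None.

    Precedence (an assumption): value typed in the sidebar, then the
    environment variable, which also covers .env files and platform secrets
    once they are exposed as environment variables.
    """
    for candidate in (sidebar_value, os.environ.get(env_var)):
        if candidate and candidate.strip():
            return candidate.strip()
    return None
```

Returning `None` rather than raising keeps the token optional, which matches the "Optional but Recommended" framing of this step.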

3. Launching the App

Run the following command to start the Streamlit server:

streamlit run app.py

The app starts within a few seconds and opens automatically in your default browser at http://localhost:8501.

⚙️ Setting Up Llama.cpp (Local GGUF Mode)

For a fully private, highly optimized local experience, select Llama.cpp (Local GGUF) in the Engine & Model Settings sidebar:

Preferred Default Model:
- GGUF Repo ID: Qwen/Qwen2.5-1.5B-Instruct-GGUF
- GGUF Filename: qwen2.5-1.5b-instruct-q4_k_m.gguf
Important Note on Vision Files:
- ⚠️ Vision/multimodal GGUF files (like mmproj-BF16.gguf projector files) are not standalone text LLMs and cannot be loaded by themselves. Always specify a valid language model GGUF (such as Qwen or Llama text instruct GGUFs) for standard text completions and Telugu answering.
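The projector-file caveat above could be enforced before a download is attempted. The sketch below is an assumption about how such a guard might look, not the app's actual code: the helper names are hypothetical, and the "mmproj" filename prefix is only a naming heuristic. The loader uses `Llama.from_pretrained` from llama-cpp-python, imported lazily so the check itself works without the package installed.

```python
from pathlib import PurePosixPath


def looks_like_projector_file(filename: str) -> bool:
    """Heuristic: multimodal projector files are commonly named mmproj-*.gguf."""
    return PurePosixPath(filename).name.lower().startswith("mmproj")


def validate_gguf_choice(filename: str) -> None:
    """Raise ValueError if the file is clearly not a standalone text model."""
    name = PurePosixPath(filename).name
    if not name.lower().endswith(".gguf"):
        raise ValueError(f"{name!r} is not a .gguf file")
    if looks_like_projector_file(name):
        raise ValueError(
            f"{name!r} looks like a vision projector file; "
            "choose a language model GGUF instead"
        )


def load_text_model(repo_id: str, filename: str, n_ctx: int = 2048):
    """Download (if needed) and load a text GGUF via llama-cpp-python."""
    validate_gguf_choice(filename)
    from llama_cpp import Llama  # lazy import: requires llama-cpp-python

    return Llama.from_pretrained(
        repo_id=repo_id, filename=filename, n_ctx=n_ctx, verbose=False
    )
```

With the defaults listed above, a call would look like `load_text_model("Qwen/Qwen2.5-1.5B-Instruct-GGUF", "qwen2.5-1.5b-instruct-q4_k_m.gguf")`. The `n_ctx` value of 2048 is an arbitrary placeholder.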

🛠️ Diagnostics & FAISS Settings

Default Embedding Model: sentence-transformers/all-MiniLM-L6-v2 (384 dimensions)
Default LLM: Qwen/Qwen2.5-7B-Instruct (Cloud) / Qwen/Qwen2.5-1.5B-Instruct (Local GGUF)
RAG Parameters: Fully adjustable sliding controls for Chunk Size, Overlap, Retrieved Sources (Top-K), and LLM Temperature.
Diagnostics Dashboard: Inspect PyTorch and CUDA availability, check the range of indices loaded into FAISS, and run cosine similarity analytics directly inside the Database Analytics tab.
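The cosine scoring and the 35.0% fallback threshold mentioned under Key Features can be illustrated without FAISS. The sketch below uses plain Python lists and hypothetical function names; in a FAISS setup the same quantity is typically obtained by L2-normalizing the vectors and searching an inner-product index, but whether utils/retriever.py does exactly that is an assumption.

```python
import math

SIMILARITY_THRESHOLD = 0.35  # the 35.0% threshold from the feature list


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_scores(query, docs):
    """Inner product of unit vectors, which equals cosine similarity."""
    q = l2_normalize(query)
    return [sum(a * b for a, b in zip(q, l2_normalize(d))) for d in docs]


def route_query(query, docs, threshold=SIMILARITY_THRESHOLD):
    """Return ("rag", score) if the best match clears the threshold,
    otherwise ("general", score) to signal a fallback to general knowledge."""
    best = max(cosine_scores(query, docs), default=0.0)
    return ("rag" if best >= threshold else "general", best)
```

Real embeddings from the default model would have 384 dimensions; the two-dimensional vectors one might use to try this out are only for readability. Routing on the single best score is one possible policy; averaging the Top-K scores would be another.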

[Submission] Venkata Srinivas Varun Oruganti [Team - Centurions] — RAG App