corpus-collection

Projects with this topic

  • A Streamlit and AI-powered web application for creating Telugu memes and collecting linguistic corpus data

  • A real-time voice translator web application built with Streamlit. It captures live speech from the microphone, transcribes it with speech recognition, translates the text into a selected target language with machine translation, and plays back the result using text-to-speech synthesis. The app supports multiple languages, provides start/stop controls for continuous conversation, displays the original and translated text, allows transcriptions to be edited, and exports recordings and analysis. It aims to make multilingual communication more accessible and to contribute to language corpora for research.

  • A community-driven Streamlit application for collecting a corpus of Telugu cultural data, with a focus on food and culinary heritage.

  • Desi Proverbs & Local Lore Collector is a lightweight, mobile-friendly Streamlit app to collect Indian proverbs, folk sayings, and cultural facts across languages. It auto-detects language, stores structured metadata (language, region, tags), and exports a clean corpus for research and cultural preservation. Built for low-bandwidth users, deployable on Hugging Face Spaces, and designed with an offline-first approach.
