Bhaskar Battula requested to merge ASR-integration into develop May 29, 2026

Overview

This feature lets the system process user-uploaded or server-recorded audio files, generate transcripts through the ASR pipeline, and automatically patch each generated transcript into the existing record data.

What Does This MR Do?

Features Added

Added the ASR integration flow to the CLI project
Supports two audio sources:
- User-uploaded audio files
- Server-side recorded audio
Automatically transcribes audio content
Patches transcript data into existing records
Added logging and error handling for ASR workflows
Added automated tests for transcription and patching flows

Workflow

Audio file is uploaded or existing audio record is fetched
CLI triggers ASR processing
Audio is sent to ASR service
Transcript is generated
Record is patched with transcript data
Updated record is persisted
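
The workflow above can be sketched in Python as follows. This is a minimal illustration, not the code in this MR: the `Record` shape, the `process_audio` function, and the `transcribe`/`persist` callables are hypothetical names, and the ASR service and record store are replaced with in-memory stand-ins.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Record:
    """Hypothetical record shape; the real schema is not shown in this MR."""
    record_id: str
    data: Dict[str, str] = field(default_factory=dict)


def process_audio(
    audio: bytes,
    record: Record,
    transcribe: Callable[[bytes], str],
    persist: Callable[[Record], None],
) -> Record:
    """Send audio to ASR, patch the transcript into the record, and persist it."""
    if not audio:
        raise ValueError("audio payload is empty")
    transcript = transcribe(audio)          # audio is sent to the ASR service
    record.data["transcript"] = transcript  # record is patched with the transcript
    persist(record)                         # updated record is persisted
    return record


# In-memory stand-ins for the ASR service and the record store (assumptions).
store: Dict[str, Record] = {}


def fake_transcribe(audio: bytes) -> str:
    return f"transcript of {len(audio)} bytes"


def fake_persist(record: Record) -> None:
    store[record.record_id] = record


updated = process_audio(b"\x00\x01\x02", Record("rec-1"), fake_transcribe, fake_persist)
print(updated.data["transcript"])  # prints "transcript of 3 bytes"
```

Passing the ASR call and the persistence step in as callables keeps the orchestration testable without a running ASR service.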

Technical Changes

Added

ASR service integration
Audio processing pipeline
Record patch/update handler
Transcription orchestration logic
Error handling and logging

Commands for Execution

Run ASR Processing

corpus-client asr <audio-file>
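
For reviewers unfamiliar with the command shape, here is a hedged sketch of how an `asr` subcommand taking one positional audio path could be wired with Python's `argparse`. The CLI's actual language and argument parser are not stated in this MR, so `build_parser` and the `audio_file` argument name are assumptions for illustration only.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts `corpus-client asr <path>` (illustrative only)."""
    parser = argparse.ArgumentParser(prog="corpus-client")
    subcommands = parser.add_subparsers(dest="command", required=True)
    asr = subcommands.add_parser(
        "asr", help="transcribe an audio file and patch its record"
    )
    # Hypothetical argument name; the real CLI may call it something else.
    asr.add_argument("audio_file", type=Path, help="path to the audio file")
    return parser


args = build_parser().parse_args(["asr", "sample.wav"])
print(args.command, args.audio_file)  # prints "asr sample.wav"
```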

Environment Variables

ASR_BASE_URL=<asr-service-url>
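
As an illustration of how `ASR_BASE_URL` might be read and validated at startup, consider the sketch below. The helper name `load_asr_base_url` and its validation rules (http or https only, trailing slash removed) are assumptions, not behavior confirmed by this MR.

```python
import os
from urllib.parse import urlparse


def load_asr_base_url(env=os.environ) -> str:
    """Return ASR_BASE_URL without a trailing slash, or raise if unset or malformed."""
    raw = env.get("ASR_BASE_URL", "").strip()
    if not raw:
        raise RuntimeError("ASR_BASE_URL is not set")
    parsed = urlparse(raw)
    # Assumption: the ASR service is reached over HTTP or HTTPS.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise RuntimeError(f"ASR_BASE_URL is not a valid http(s) URL: {raw!r}")
    return raw.rstrip("/")


print(load_asr_base_url({"ASR_BASE_URL": "http://localhost:8000/"}))
# prints "http://localhost:8000"
```

Failing fast on a missing or malformed URL surfaces configuration problems before any audio is sent.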

Test Coverage

Added Test Cases

Successful audio transcription
Record patch/update flow
Uploaded audio processing
Server-recorded audio processing
Invalid audio handling
ASR service failure handling
Invalid/missing record handling
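
A few of the cases above could look like the following `unittest` sketch. It is not the test suite added in this MR: `patch_transcript` and `AsrServiceError` are hypothetical stand-ins defined inline so the example is self-contained.

```python
import unittest
from typing import Callable, Optional


class AsrServiceError(Exception):
    """Hypothetical error raised when the ASR service call fails."""


def patch_transcript(
    record: Optional[dict], audio: bytes, transcribe: Callable[[bytes], str]
) -> dict:
    """Return a copy of the record with a transcript; never mutate the input."""
    if record is None:
        raise KeyError("record not found")
    if not audio:
        raise ValueError("invalid audio: empty payload")
    transcript = transcribe(audio)  # may raise AsrServiceError
    return {**record, "transcript": transcript}


class PatchTranscriptTests(unittest.TestCase):
    def test_successful_transcription_patches_record(self):
        patched = patch_transcript({"id": 1}, b"abc", lambda _audio: "hello")
        self.assertEqual(patched, {"id": 1, "transcript": "hello"})

    def test_invalid_audio_is_rejected(self):
        with self.assertRaises(ValueError):
            patch_transcript({"id": 1}, b"", lambda _audio: "hello")

    def test_asr_failure_leaves_record_unchanged(self):
        record = {"id": 1}

        def failing(_audio: bytes) -> str:
            raise AsrServiceError("service unavailable")

        with self.assertRaises(AsrServiceError):
            patch_transcript(record, b"abc", failing)
        self.assertEqual(record, {"id": 1})

    def test_missing_record_is_rejected(self):
        with self.assertRaises(KeyError):
            patch_transcript(None, b"abc", lambda _audio: "hello")
```

Saved as a module, these tests run with `python -m unittest`.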

Expected Outcome

After execution:

Audio files are transcribed successfully
Existing records are updated with transcript data
Invalid audio, ASR service failures, and invalid or missing records are handled gracefully
Logs are generated for debugging and monitoring
Manual transcription effort is reduced

Checklist

Closes

Closes #54

Draft: feat: ASR integration for transcription