
Refactor: add validation logic matching the corpus backend for file uploads

Bhaskar Battula requested to merge file-checks into develop

Overview

This MR introduces client-side (CLI) validation to enforce fail-fast checks before initiating uploads to the FastAPI backend. The goal is to reduce unnecessary network usage, improve user feedback latency, and align CLI behavior with existing backend validation rules.

This change complements the backend’s multi-layered validation system by ensuring that invalid files and metadata are rejected early at the source.


What This MR Does

1. File Validation (Pre-Upload)

  • Enforces file size limits using os.path.getsize, aligned with backend constraints:
    • Audio: up to 500MB
    • Video: up to 2GB
    • Image: up to 50MB
    • Text: up to 10MB
  • Prevents uploads of oversized files before chunking begins.
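The size check described above could look roughly like the following. This is a minimal sketch, not the code in this MR: the names `MAX_SIZE_BYTES` and `check_file_size` are hypothetical, and it assumes the limits are expressed in binary units (1MB = 1024 * 1024 bytes), which the description does not state.

```python
import os

MB = 1024 * 1024  # assumption: binary megabytes; the backend may use 10**6

# Limits taken from the list above; the dictionary name is hypothetical.
MAX_SIZE_BYTES = {
    "audio": 500 * MB,
    "video": 2 * 1024 * MB,  # 2GB
    "image": 50 * MB,
    "text": 10 * MB,
}


def check_file_size(path, category):
    """Return an error message if the file exceeds its limit, else None."""
    limit = MAX_SIZE_BYTES.get(category)
    if limit is None:
        return f"Unknown media category: {category!r}"
    size = os.path.getsize(path)  # reads file metadata only, not the contents
    if size > limit:
        name = os.path.basename(path)
        return f"{name} is {size} bytes; the {category} limit is {limit} bytes"
    return None
```

Because `os.path.getsize` only reads file metadata, the check stays cheap even for multi-gigabyte files and can run before any chunk is read.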

2. File Type Validation

  • Validates file extensions and MIME types locally prior to upload.
  • Rejects unsupported or potentially unsafe file types early, reducing backend load.
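A local type check along these lines might be sketched as follows. The allow-list `ALLOWED_EXTENSIONS` and the function `check_file_type` are illustrative assumptions; the actual accepted extensions are defined by the backend and are not listed in this description. Note that `mimetypes.guess_type` infers the MIME type from the file name only and does not inspect file contents.

```python
import mimetypes
import os

# Hypothetical allow-list for illustration; the real list lives in the backend.
ALLOWED_EXTENSIONS = {
    "audio": {".mp3", ".wav"},
    "video": {".mp4"},
    "image": {".png", ".jpg", ".jpeg"},
    "text": {".txt"},
}


def check_file_type(path, category):
    """Return an error message if the extension or MIME type is not accepted."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS.get(category, set()):
        return f"Extension {ext!r} is not allowed for {category} uploads"
    # guess_type maps the file name to a MIME type; it never opens the file.
    mime, _encoding = mimetypes.guess_type(path)
    if mime is None or not mime.startswith(category + "/"):
        return f"MIME type {mime!r} does not match category {category!r}"
    return None
```

For example, `check_file_type("clip.mp4", "video")` returns `None`, while `check_file_type("clip.exe", "video")` returns an error message.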

3. Metadata Validation

  • Ensures CLI inputs comply with backend schema rules:
    • Title: 8–200 characters, minimum 2 meaningful words
    • Description: 32–2000 characters
  • Prevents invalid payload submission during upload finalization.
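The metadata rules above can be illustrated with a small validator. This is a sketch under stated assumptions: `validate_metadata` is a hypothetical name, surrounding whitespace is stripped before measuring length, and a "meaningful word" is taken to mean a token of at least two characters containing a letter or digit. The backend's exact definition in app/schemas/validation.py may differ.

```python
def validate_metadata(title, description):
    """Return a list of error messages; an empty list means the input is valid."""
    errors = []
    title = title.strip()  # assumption: lengths are measured after stripping
    description = description.strip()

    if not 8 <= len(title) <= 200:
        errors.append("Title must be 8-200 characters long")

    # Assumption: a "meaningful" word has at least 2 characters and contains
    # at least one letter or digit.
    words = [
        w for w in title.split()
        if len(w) >= 2 and any(c.isalnum() for c in w)
    ]
    if len(words) < 2:
        errors.append("Title must contain at least 2 meaningful words")

    if not 32 <= len(description) <= 2000:
        errors.append("Description must be 32-2000 characters long")

    return errors
```

Returning every error at once, rather than stopping at the first one, lets the user fix the title and description in a single pass.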

4. Input Sanitization & Consistency

  • Aligns CLI validation logic with backend rules defined in app/schemas/validation.py.
  • Reduces mismatch between frontend (CLI) and backend validation behavior.

Why This Change Is Needed

  • Fail-fast principle: Detect issues immediately instead of after upload attempts
  • Bandwidth optimization: Avoid uploading large files or metadata that the backend would reject
  • Improved UX: Users receive instant, actionable feedback without waiting for server responses
  • Backend efficiency: Reduces unnecessary processing, chunk handling, and validation overhead

Acceptance Criteria

  • CLI rejects files exceeding backend-defined size limits before upload begins
  • CLI validates file extensions and MIME types prior to upload
  • CLI enforces metadata constraints (title, description) consistent with backend rules
  • No upload request is made if validation fails
  • Error messages are clear and aligned with backend validation semantics
  • Validation rules remain consistent with app/schemas/validation.py
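One way to satisfy the "no upload request is made if validation fails" criterion is to gate the upload call behind all checks. The sketch below is illustrative only: `run_upload`, `checks`, and `upload` are hypothetical names, and each check is assumed to be a zero-argument callable returning an error message or `None`.

```python
import sys


def run_upload(checks, upload, out=sys.stderr):
    """Run every check; call upload() only if none of them reports an error.

    Returns a process-style exit code: 0 on success, 1 on validation failure.
    """
    errors = [msg for check in checks if (msg := check()) is not None]
    if errors:
        for msg in errors:
            print(f"error: {msg}", file=out)
        return 1  # the upload callable is never invoked on this path
    upload()
    return 0
```

With this structure the network call sits behind a single gate, so adding a new validation rule cannot accidentally bypass the fail-fast behavior.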

Impact

  • Reduces failed upload attempts reaching backend
  • Improves overall system efficiency and responsiveness
  • Provides a consistent validation experience across CLI and backend

Closes

Closes: #30 (closed)
