fix: surface backend error messages in cli uploads

Summary

This MR improves error handling in the Corpus CLI by aligning client-side failure reporting with the actual response formats returned by the local corpus-server-app.

Previously, several CLI flows showed only generic errors such as HTTP 400 or HTTP 401, or failed silently, even when the backend returned specific, actionable messages in FastAPI’s detail field or in validation error payloads.

This change introduces a shared response/error parsing utility and wires it into the currently used API flows so the CLI now surfaces backend-provided messages directly.


Problem

The CLI was not consistently parsing backend error responses.

Observed issues

  • Login failures only showed generic status codes instead of backend messages like Incorrect phone number or password
  • Record upload chunk/finalization failures often reduced backend responses to plain HTTP <status>
  • Extracted text upload errors did not properly format FastAPI validation errors
  • Category fetch failures during upload setup did not clearly explain what went wrong
  • Some failures were only logged and not clearly shown to the user at the CLI level

This was especially problematic because the backend already returns useful detail payloads for cases such as:

  • auth failures
  • validation failures
  • forbidden operations
  • missing categories / missing chunks
  • extracted text conflicts
  • record not found / permission errors

Scope of This MR

This MR updates error handling only for the endpoints the CLI is currently using:

  • POST /api/v1/auth/login
  • GET /api/v1/categories/
  • POST /api/v1/records/upload/chunk
  • POST /api/v1/records/upload
  • POST /api/v1/records/{record_id}/extracted_text

It does not introduce new endpoint coverage beyond those flows.


Changes Made

1. Added shared API error parsing utility

New file:

  • src/corpus_client_cli/api_errors.py

This utility centralizes response parsing and formatting logic for:

  • JSON error objects containing detail, error, or message
  • FastAPI / Pydantic validation error lists
  • plain-text error bodies
  • non-JSON error responses

Key behavior

  • extracts meaningful user-facing messages from backend payloads
  • formats validation lists like:
    segments -> 0 -> end: Input should be greater than 0
  • provides a fallback to HTTP <status> when no better message exists

This avoids repeating response parsing logic across CLI modules and keeps backend error handling consistent.
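
The parsing order described above can be sketched roughly as follows. This is a minimal illustration, not the contents of api_errors.py: the function names extract_error_message and format_validation_errors are hypothetical, and dropping a leading "body" element from the validation location is an assumption made to match the example output shown earlier.

```python
import json
from typing import Any, List


def format_validation_errors(items: List[Any]) -> str:
    """Render FastAPI/Pydantic-style error entries as 'path: message' text."""
    lines = []
    for item in items:
        if not isinstance(item, dict):
            lines.append(str(item))
            continue
        loc = list(item.get("loc") or [])
        # Assumption: hide the leading request-part marker ("body") for readability.
        if loc and loc[0] == "body":
            loc = loc[1:]
        path = " -> ".join(str(part) for part in loc)
        msg = str(item.get("msg") or "Invalid value")
        lines.append(f"{path}: {msg}" if path else msg)
    return "; ".join(lines)


def extract_error_message(status: int, body: str) -> str:
    """Return the most specific message found in an error body, else 'HTTP <status>'."""
    fallback = f"HTTP {status}"
    text = (body or "").strip()
    if not text:
        return fallback
    try:
        payload = json.loads(text)
    except ValueError:
        # Plain-text or other non-JSON body: show it as-is.
        return text
    if isinstance(payload, list):
        return format_validation_errors(payload) or fallback
    if isinstance(payload, dict):
        for key in ("detail", "error", "message"):
            value = payload.get(key)
            if isinstance(value, str) and value.strip():
                return value.strip()
            if isinstance(value, list) and value:
                return format_validation_errors(value) or fallback
    return fallback
```

Taking the status code and raw body as plain values keeps the sketch independent of any particular HTTP client; the real utility may instead accept a response object.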


2. Improved login error reporting

Updated:

  • src/corpus_client_cli/cli.py

Changes

  • authenticate() now reads backend response payloads instead of only checking status code
  • when login fails, the CLI prints the backend-provided message
  • if the login response is malformed or not a dict, the CLI now reports that clearly instead of failing implicitly

Example improvement

  • Before: Login failed: HTTP 401
  • After: Login failed: Incorrect phone number or password
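
A rough sketch of the login handling described above, assuming the response status and the already-decoded JSON payload are available. The names LoginError and interpret_login_response, and the access_token field, are hypothetical and used only for illustration; the actual authenticate() may be structured differently.

```python
from typing import Any


class LoginError(Exception):
    """Raised when login fails or the login response is unusable."""


def interpret_login_response(status: int, payload: Any,
                             token_field: str = "access_token") -> str:
    """Return the token from a successful login, or raise a readable LoginError."""
    if status >= 400:
        detail = payload.get("detail") if isinstance(payload, dict) else None
        # Prefer the backend-provided message; fall back to the status code.
        reason = detail if isinstance(detail, str) and detail else f"HTTP {status}"
        raise LoginError(f"Login failed: {reason}")
    if not isinstance(payload, dict):
        # Report a malformed success response explicitly instead of failing later.
        raise LoginError("Login failed: unexpected response format (expected a JSON object)")
    token = payload.get(token_field)
    if not isinstance(token, str) or not token:
        raise LoginError(f"Login failed: response is missing '{token_field}'")
    return token
```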

3. Improved record upload error handling

Updated:

  • src/corpus_client_cli/upload.py

Chunk upload flow

  • chunk upload errors now parse backend payloads and extract real failure reasons
  • errors are logged with backend detail and returned in a user-readable form

Upload finalization flow

  • finalization errors now surface backend detail instead of generic status-only messages

Examples include:

  • invalid category input
  • missing chunks
  • forbidden release-rights value (downloaded)
  • invalid language / media / release-rights values
  • backend validation or file-processing failures

Per-file reporting

  • process_file() now prints file-specific failure messages to the console
  • if chunk upload fails, the CLI summarizes chunk error messages for that file
  • if finalization fails, the exact finalization error is shown next to the filename
  • missing file / unexpected exceptions are now surfaced clearly to the user
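
One way to produce the per-file chunk summary is sketched below. The helper name summarize_chunk_errors, the message wording, and the limit of three messages are assumptions, not the behavior of process_file() itself.

```python
from typing import List


def summarize_chunk_errors(filename: str, errors: List[str], limit: int = 3) -> str:
    """Collapse repeated chunk errors into one console line for a file."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(e.strip() for e in errors if e and e.strip()))
    if not unique:
        return f"{filename}: chunk upload failed"
    shown = "; ".join(unique[:limit])
    extra = len(unique) - limit
    suffix = f" (+{extra} more)" if extra > 0 else ""
    return f"{filename}: chunk upload failed: {shown}{suffix}"
```

Deduplicating first keeps the output short when many chunks fail for the same reason, such as an expired token.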

4. Improved category fetch fallback behavior

Updated:

  • src/corpus_client_cli/upload.py

Changes

  • category fetch now parses error responses from the backend
  • if category retrieval fails, the CLI prints the actual reason and falls back to the default category
  • if the response shape is unexpected, the CLI warns and still proceeds safely

This keeps the upload flow usable while making the failure reason visible.
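
The fallback could look roughly like this. The function name resolve_categories, the DEFAULT_CATEGORY value, and the assumption that a successful response is a list of objects with a name field are all hypothetical; the real response shape may differ.

```python
from typing import Any, List, Optional, Tuple

DEFAULT_CATEGORY = "default"  # hypothetical placeholder value


def resolve_categories(status: int, payload: Any,
                       default: str = DEFAULT_CATEGORY) -> Tuple[List[str], Optional[str]]:
    """Return (category names, warning); the warning is None on success."""
    if status >= 400:
        detail = payload.get("detail") if isinstance(payload, dict) else None
        reason = detail if isinstance(detail, str) and detail else f"HTTP {status}"
        return [default], f"Could not fetch categories: {reason}. Falling back to '{default}'."
    if isinstance(payload, list):
        # Assumed shape: a list of objects that each carry a string "name".
        names = [item["name"] for item in payload
                 if isinstance(item, dict) and isinstance(item.get("name"), str)]
        if names:
            return names, None
    return [default], f"Unexpected category response shape. Falling back to '{default}'."
```

Returning the warning alongside the result lets the caller decide how to print it while the upload continues.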


5. Improved extracted-text upload error handling

Updated:

  • src/corpus_client_cli/extracted_text_upload.py

Changes

  • extracted-text POST failures now parse backend error payloads instead of embedding raw response bodies inconsistently
  • 409 Conflict responses now preserve the backend message
  • 422 validation errors are formatted into readable field-level messages
  • failed uploads now print clear console messages per stem/file
  • 409 already exists cases are surfaced as warnings rather than opaque failures

This is important because the backend for extracted text returns meaningful messages such as:

  • Record not found
  • Extracted text already exists for this record and cannot be overwritten via POST. Use PATCH to update.
  • validation errors for malformed segments or invalid payload structure
  • permission / authorization failures
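
The warning-versus-error distinction for extracted-text uploads might be expressed as below. The Severity enum, the function name, and the console prefixes are illustrative assumptions; only the rule that 409 is treated as a warning comes from this MR.

```python
from enum import Enum
from typing import Tuple


class Severity(Enum):
    WARNING = "warning"
    ERROR = "error"


def classify_extracted_text_failure(stem: str, status: int,
                                    message: str) -> Tuple[Severity, str]:
    """Map a failed extracted-text POST to a severity and a console line."""
    if status == 409:
        # Already-exists conflicts are reported as warnings, keeping the backend message.
        return Severity.WARNING, f"[warn] {stem}: {message}"
    return Severity.ERROR, f"[error] {stem}: {message}"
```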

Testing

Updated:

  • tests/test_async_logic.py

Added / updated coverage for

  • login failure using backend detail
  • upload finalization returning backend rejection text
  • extracted-text upload formatting FastAPI validation error payloads
  • response mocks supporting both JSON and plain-text bodies
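
A response mock that supports both body types could be as small as the following sketch. The class name FakeResponse and its status/json()/text() interface are assumptions modeled on a typical async HTTP client response, not the helper actually used in tests/test_async_logic.py.

```python
import asyncio
import json
from typing import Any, Optional


class FakeResponse:
    """Minimal async response stand-in with either a JSON or a plain-text body."""

    def __init__(self, status: int, json_body: Any = None,
                 text_body: Optional[str] = None) -> None:
        self.status = status
        self._json_body = json_body
        self._text_body = text_body

    async def json(self) -> Any:
        if self._json_body is None:
            # Stand-in for a client raising when the body is not JSON.
            raise ValueError("response body is not JSON")
        return self._json_body

    async def text(self) -> str:
        if self._text_body is not None:
            return self._text_body
        return json.dumps(self._json_body) if self._json_body is not None else ""


# Usage: one mock per body type.
json_resp = FakeResponse(401, json_body={"detail": "Incorrect phone number or password"})
text_resp = FakeResponse(502, text_body="Bad Gateway")
print(asyncio.run(json_resp.json())["detail"])
print(asyncio.run(text_resp.text()))
```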

Verification completed

  • focused tests passed
  • syntax compilation passed

Commands used

./.venv/bin/pytest tests/test_async_logic.py tests/test_cli_commands.py -q
./.venv/bin/python -m compileall src tests

Result

  • 23 passed

User-Facing Impact

After this MR, CLI users should see backend-authored messages instead of vague status-only errors.

Examples:

  • authentication failures now show the actual backend reason
  • upload rejection reasons are shown next to the affected filename
  • extracted-text payload issues now identify the failing field path
  • category fetch problems are visible instead of silently degrading

This should reduce debugging time when backend validation or authorization rules reject a request, because the rejection reason appears directly in the CLI output.


Why This Approach

The backend already provides structured and useful error responses.

The problem was not lack of backend messaging, but the CLI not consuming those responses correctly.

A shared parsing layer is the right fix because:

  • it removes repeated ad hoc response handling
  • it supports actual FastAPI response patterns already used by the server
  • it keeps future endpoint integrations simpler and more consistent

This also avoids hardcoding backend-specific strings throughout the CLI.


Out of Scope

This MR does not:

  • add support for new endpoints
  • redesign the CLI UX beyond error surfacing
  • change backend API contracts
  • change upload / extract business logic
  • add PATCH support for extracted text conflicts
  • introduce retry policies or recovery workflows

Files Changed

Implementation

  • src/corpus_client_cli/api_errors.py
  • src/corpus_client_cli/cli.py
  • src/corpus_client_cli/upload.py
  • src/corpus_client_cli/extracted_text_upload.py

Tests

  • tests/test_async_logic.py

Suggested Reviewer Focus

Reviewers may want to focus on:

  • whether the shared error parser covers backend response shapes expected going forward
  • whether the fallback behavior is appropriate when categories cannot be fetched
  • whether 409 extracted-text handling should remain warning-style
  • whether similar parsing should be extended to future endpoints

Risk Assessment

Risk: Low to Moderate

Reasons:

  • changes are localized to response parsing and user-visible error reporting
  • the request flow itself is unchanged
  • tests cover the new parsing paths
  • existing successful paths remain intact

Primary risk

If an endpoint returns an error shape not covered by detail, error, message, a validation list, or plain text, the CLI falls back to HTTP <status>. In that case the user sees only the status code, but the CLI does not fail outright.


Expected Outcome

Users should now be able to run the CLI and immediately understand backend rejections without inspecting server logs or guessing from status codes.

This is especially helpful for:

  • login problems
  • invalid upload metadata
  • category issues
  • extracted-text validation errors
  • record state conflicts
Edited by Lakshy Yarlagadda
