fix: surface backend error messages in cli uploads
Summary
This MR improves error handling in the Corpus CLI by aligning client-side failure reporting with the actual response formats returned by the local corpus-server-app.
Previously, several CLI flows displayed generic errors such as HTTP 400, HTTP 401, or silent/generic failures even when the backend was returning specific and actionable messages in FastAPI’s detail field or validation error payloads.
This change introduces a shared response/error parsing utility and wires it into the currently used API flows so the CLI now surfaces backend-provided messages directly.
Problem
The CLI was not consistently parsing backend error responses.
Observed issues
- Login failures only showed generic status codes instead of backend messages like "Incorrect phone number or password"
- Record upload chunk/finalization failures often reduced backend responses to plain HTTP <status>
- Extracted text upload errors did not properly format FastAPI validation errors
- Category fetch failures during upload setup did not clearly explain what went wrong
- Some failures were only logged and not clearly shown to the user at the CLI level
This was especially problematic because the backend already returns useful detail payloads for cases such as:
- auth failures
- validation failures
- forbidden operations
- missing categories / missing chunks
- extracted text conflicts
- record not found / permission errors
Scope of This MR
This MR updates error handling only for the endpoints the CLI is currently using:
- POST /api/v1/auth/login
- GET /api/v1/categories/
- POST /api/v1/records/upload/chunk
- POST /api/v1/records/upload
- POST /api/v1/records/{record_id}/extracted_text
It does not introduce new endpoint coverage beyond those flows.
Changes Made
1. Added shared API error parsing utility
New file:
src/corpus_client_cli/api_errors.py
This utility centralizes response parsing and formatting logic for:
- JSON error objects containing detail, error, or message
- FastAPI / Pydantic validation error lists
- plain-text error bodies
- non-JSON error responses
Key behavior
- extracts meaningful user-facing messages from backend payloads
- formats validation lists like: segments -> 0 -> end: Input should be greater than 0
- provides a fallback to HTTP <status> when no better message exists
This avoids repeating response parsing logic across CLI modules and keeps backend error handling consistent.
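The parsing behavior described above can be sketched roughly as follows. This is a minimal illustration, not the contents of api_errors.py: the function names extract_error_message and format_validation_errors are hypothetical, and the sketch assumes validation entries follow the usual FastAPI shape with loc and msg keys.

```python
import json
from typing import Any, List


def format_validation_errors(errors: List[Any]) -> str:
    """Format FastAPI/Pydantic-style validation entries as 'path: message'."""
    lines = []
    for item in errors:
        if isinstance(item, dict):
            loc = [str(part) for part in item.get("loc", [])]
            # FastAPI prefixes the location with the request part ("body",
            # "query", ...); drop it so the path starts at the field name.
            if loc and loc[0] in {"body", "query", "path", "header", "cookie"}:
                loc = loc[1:]
            msg = str(item.get("msg", "Invalid value"))
            lines.append(f"{' -> '.join(loc)}: {msg}" if loc else msg)
        else:
            lines.append(str(item))
    return "; ".join(lines)


def extract_error_message(status: int, body: str) -> str:
    """Return the most useful message in an error body, else 'HTTP <status>'."""
    fallback = f"HTTP {status}"
    text = body.strip()
    if not text:
        return fallback
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return text  # plain-text or other non-JSON body
    if isinstance(payload, str) and payload.strip():
        return payload.strip()
    if isinstance(payload, list):
        return format_validation_errors(payload) or fallback
    if isinstance(payload, dict):
        for key in ("detail", "error", "message"):
            value = payload.get(key)
            if isinstance(value, list):
                formatted = format_validation_errors(value)
                if formatted:
                    return formatted
            elif isinstance(value, str) and value.strip():
                return value.strip()
    return fallback
```

Keeping the fallback inside the same function means callers never have to branch on whether a message was found.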
2. Improved login error reporting
Updated:
src/corpus_client_cli/cli.py
Changes
- authenticate() now reads backend response payloads instead of only checking the status code
- when login fails, the CLI prints the backend-provided message
- if the login response is malformed or not a dict, the CLI now reports that clearly instead of failing implicitly
Example improvement
- Before: Login failed: HTTP 401
- After: Login failed: Incorrect phone number or password
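As a rough sketch of the login handling described above, the check could look like the following. The names LoginError and parse_login_response are hypothetical and are not taken from cli.py. The sketch assumes the response body has already been decoded, and that failures carry a string detail field.

```python
from typing import Any, Dict


class LoginError(Exception):
    """Raised with a user-facing message when login cannot proceed."""


def parse_login_response(status: int, payload: Any) -> Dict[str, Any]:
    # A malformed (non-dict) body is reported explicitly instead of
    # failing later with an unrelated error.
    if not isinstance(payload, dict):
        raise LoginError(f"Login failed: unexpected response format (HTTP {status})")
    if status >= 400:
        detail = payload.get("detail")
        reason = detail if isinstance(detail, str) and detail else f"HTTP {status}"
        raise LoginError(f"Login failed: {reason}")
    return payload
```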
3. Improved record upload error handling
Updated:
src/corpus_client_cli/upload.py
Chunk upload flow
- chunk upload errors now parse backend payloads and extract real failure reasons
- errors are logged with backend detail and returned in a user-readable form
Upload finalization flow
- finalization errors now surface backend detail instead of generic status-only messages
Examples include:
- invalid category input
- missing chunks
- forbidden release rights (downloaded)
- invalid language / media / release-rights values
- backend validation or file-processing failures
Per-file reporting
- process_file() now prints file-specific failure messages to the console
- if chunk upload fails, the CLI summarizes chunk error messages for that file
- if finalization fails, the exact finalization error is shown next to the filename
- missing file / unexpected exceptions are now surfaced clearly to the user
4. Improved category fetch fallback behavior
Updated:
src/corpus_client_cli/upload.py
Changes
- category fetch now parses error responses from the backend
- if category retrieval fails, the CLI prints the actual reason and falls back to the default category
- if the response shape is unexpected, the CLI warns and still proceeds safely
This keeps the upload flow usable while making the failure reason visible.
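The fallback described above might be sketched as follows. The function name resolve_categories, the shape of the category payload (a JSON list), and the warning wording are all assumptions made for illustration. The actual logic in upload.py may differ.

```python
from typing import Any, List, Optional, Tuple


def resolve_categories(
    status: int, payload: Any, default: List[Any]
) -> Tuple[List[Any], Optional[str]]:
    """Return (categories, warning); warning is None when the fetch succeeded."""
    if status >= 400:
        detail = payload.get("detail") if isinstance(payload, dict) else None
        reason = detail if isinstance(detail, str) and detail else f"HTTP {status}"
        return default, f"Could not fetch categories ({reason}); using default category"
    if not isinstance(payload, list):
        # Unexpected shape: warn, but keep the upload flow usable.
        return default, "Unexpected category response shape; using default category"
    return payload, None
```

Returning the warning alongside the result lets the caller decide how to print it while the upload continues with the default.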
5. Improved extracted-text upload error handling
Updated:
src/corpus_client_cli/extracted_text_upload.py
Changes
- extracted-text POST failures now parse backend error payloads instead of embedding raw response bodies inconsistently
- 409 Conflict responses now preserve the backend message
- 422 validation errors are formatted into readable field-level messages
- failed uploads now print clear console messages per stem/file
- 409 "already exists" cases are surfaced as warnings rather than opaque failures
This is important because the backend for extracted text returns meaningful messages such as:
- Record not found
- Extracted text already exists for this record and cannot be overwritten via POST. Use PATCH to update.
- validation errors for malformed segments or invalid payload structure
- permission / authorization failures
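One way to picture the warning-versus-error split for extracted-text uploads is the small sketch below. The function name describe_extracted_text_failure and the output format are hypothetical. It only illustrates treating 409 as a warning while keeping the backend message intact.

```python
from typing import Tuple


def describe_extracted_text_failure(stem: str, status: int, message: str) -> Tuple[str, str]:
    """Return (level, console line) for a failed extracted-text POST."""
    # 409 means extracted text already exists for the record, so it is
    # reported as a warning; every other failure status is an error.
    level = "warning" if status == 409 else "error"
    return level, f"[{level}] {stem}: {message}"
```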
Testing
Updated:
tests/test_async_logic.py
Added / updated coverage for
- login failure using backend detail
- upload finalization returning backend rejection text
- extracted-text upload formatting FastAPI validation error payloads
- response mocks supporting both JSON and plain-text bodies
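A response mock that supports both JSON and plain-text bodies could look roughly like this. FakeResponse is a hypothetical name, and the interface (a status attribute plus async json() and text() methods) is an assumption about the HTTP client. The mocks in tests/test_async_logic.py may be shaped differently.

```python
import asyncio
import json
from typing import Any, Optional


class FakeResponse:
    """Minimal async response double for JSON or plain-text error bodies."""

    def __init__(self, status: int, json_body: Optional[Any] = None, text_body: str = ""):
        self.status = status
        self._json_body = json_body
        self._text_body = text_body

    async def json(self) -> Any:
        # Mimic a client that raises when the body is not JSON.
        if self._json_body is None:
            raise ValueError("response body is not JSON")
        return self._json_body

    async def text(self) -> str:
        if self._json_body is not None:
            return json.dumps(self._json_body)
        return self._text_body


# Usage: a JSON error body and a plain-text error body.
json_resp = FakeResponse(401, json_body={"detail": "Incorrect phone number or password"})
text_resp = FakeResponse(502, text_body="Bad Gateway")
```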
Verification completed
- focused tests passed
- syntax compilation passed
Commands used
./.venv/bin/pytest tests/test_async_logic.py tests/test_cli_commands.py -q
./.venv/bin/python -m compileall src tests
Result
- 23 passed
User-Facing Impact
After this MR, CLI users should see backend-authored messages instead of vague status-only errors.
Examples:
- authentication failures now show the actual backend reason
- upload rejection reasons are shown next to the affected filename
- extracted-text payload issues now identify the failing field path
- category fetch problems are visible instead of silently degrading
This should significantly reduce debugging time when backend validation or authorization rules reject a request.
Why This Approach
The backend already provides structured and useful error responses.
The problem was not lack of backend messaging, but the CLI not consuming those responses correctly.
A shared parsing layer is the right fix because:
- it removes repeated ad hoc response handling
- it supports actual FastAPI response patterns already used by the server
- it keeps future endpoint integrations simpler and more consistent
This also avoids hardcoding backend-specific strings throughout the CLI.
Out of Scope
This MR does not:
- add support for new endpoints
- redesign the CLI UX beyond error surfacing
- change backend API contracts
- change upload / extract business logic
- add PATCH support for extracted text conflicts
- introduce retry policies or recovery workflows
Files Changed
Implementation
- src/corpus_client_cli/api_errors.py
- src/corpus_client_cli/cli.py
- src/corpus_client_cli/upload.py
- src/corpus_client_cli/extracted_text_upload.py
Tests
tests/test_async_logic.py
Suggested Reviewer Focus
Reviewers may want to focus on:
- whether the shared error parser covers backend response shapes expected going forward
- whether the fallback behavior is appropriate when categories cannot be fetched
- whether 409 extracted-text handling should remain warning-style
- whether similar parsing should be extended to future endpoints
Risk Assessment
Risk: Low to Moderate
Reasons:
- changes are localized to response parsing and user-visible error reporting
- the request flow itself is unchanged
- tests cover the new parsing paths
- existing successful paths remain intact
Primary risk
If an endpoint returns an error shape not covered by detail, error, message, a validation list, or plain text, the CLI falls back to HTTP <status>. The user then sees the old generic message for that case, but the CLI does not fail outright.
Expected Outcome
Users should now be able to run the CLI and immediately understand backend rejections without inspecting server logs or guessing from status codes.
This is especially helpful for:
- login problems
- invalid upload metadata
- category issues
- extracted-text validation errors
- record state conflicts