feat: add record listing command with pagination, filtering, and sorting (!3) · Merge requests · Corpus / Corpus Client CLI

Madavarapu Sai Harshavardhan requested to merge feat/record-listing-harsha into develop Mar 28, 2026

Overview

Adds a corpus-client list command that lets users browse their uploaded records directly from the CLI with pagination, filtering, and sorting. This implements ROADMAP item 2.1 (Record Listing) — previously users had no way to view records without switching to the web frontend.

What does this MR do and why?

feat: add record listing command with pagination, filtering, and sorting

Implements ROADMAP 2.1 - adds corpus-client list command that fetches records from GET /api/v1/records/ with support for media type, language, status, and date range filters plus sorting by date, title, or status.

Motivation: The CLI supports uploading and extracting but had no way to view uploaded records. Users had to switch to the React web app to check record status or browse contributions, breaking the CLI-only workflow.

Approach: Follows the existing architecture — thin Typer command in cli.py delegating to a new records.py module with a single run_list_records() entry point. This is the same pattern used by upload.py and extracted_text_upload.py.

Trade-offs: The API response format is handled defensively (results, items, or direct list) since the exact backend response structure hasn't been confirmed. Unsupported query params are silently ignored by the API.

Changes Made

File	Action	Purpose
`src/corpus_client_cli/records.py`	Created	API fetching, response parsing, Rich table rendering
`src/corpus_client_cli/cli.py`	Modified	Added `list` command with 9 Typer options, imported `records` module
`docs/ROADMAP.md`	Modified	Marked 2.1 items as `[x]` complete

Technical Details

Architecture:

cli.py adds a list_records() command that checks auth, creates an aiohttp.ClientSession, and delegates to records.run_list_records()
records.py contains 4 focused functions:
- _build_params() — builds query params dict, excluding None values
- _parse_response() — extracts records list and total count from flexible API response formats
- _display_records() — renders Rich table with emoji media type labels and pagination info
- run_list_records() — single async entry point that fetches, parses, displays, and handles errors

CLI options:

Option	Type	Default	Description
`--page` / `-p`	int	1	Page number
`--size` / `-s`	int	20	Records per page
`--type` / `-t`	str	None	Filter: text, audio, video, document, image
`--language` / `-l`	str	None	Filter: language code
`--status`	str	None	Filter: record status
`--sort`	str	None	Sort field: date, title, status
`--order`	str	desc	Sort order: asc, desc
`--from`	str	None	Start date (YYYY-MM-DD)
`--to`	str	None	End date (YYYY-MM-DD)

Type of Change

Related Issues / References

Related to: ROADMAP Priority 2 (Fetch/Retrieve Features)
Enables: 2.2 Record Details (depends on record IDs visible from list output)
Depends on: Backend GET /api/v1/records/ endpoint

Screenshots or Screen Recordings

corpus-client list --help output:

Usage: corpus-client list [OPTIONS]

 📋 List uploaded records with filters and pagination

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --page      -p      INTEGER  Page number [default: 1]                        │
│ --size      -s      INTEGER  Records per page [default: 20]                  │
│ --type      -t      TEXT     Filter by media type (text, audio, video,       │
│                              document, image)                                │
│ --language  -l      TEXT     Filter by language                              │
│ --status            TEXT     Filter by status                                │
│ --sort              TEXT     Sort by field (date, title, status)             │
│ --order             TEXT     Sort order (asc, desc) [default: desc]          │
│ --from              TEXT     Filter from date (YYYY-MM-DD)                   │
│ --to                TEXT     Filter to date (YYYY-MM-DD)                     │
│ --help                       Show this message and exit.                     │
╰──────────────────────────────────────────────────────────────────────────────╯

Table output (mock data):

        📋 Records (Page 1)
┌────┬─────────┬──────────────┬──────────────┬────────────┬──────────────┬──────────────┐
│ #  │ ID      │ Title        │ Type         │ Language   │ Status       │ Created      │
├────┼─────────┼──────────────┼──────────────┼────────────┼──────────────┼──────────────┤
│ 1  │ abc-123 │ Sample Audio │ 🎵 Audio     │ telugu     │ approved     │ 2025-06-15   │
│ 2  │ def-456 │ My Document  │ 📑 Document  │ hindi      │ pending      │ 2025-07-20   │
└────┴─────────┴──────────────┴──────────────┴────────────┴──────────────┴──────────────┘
Showing page 1 of 1 (2 total records)

How to Set Up and Validate Locally

Pull this branch:
```
git checkout feat/record-listing-harsha
```
Install dependencies:
```
uv sync
```
Verify command exists:
```
corpus-client list --help
```

Test auth guard (without logging in):

corpus-client list
# Expected: "Not logged in. Run: corpus-client login" + exit code 1

Test with authentication:

corpus-client login
corpus-client list
corpus-client list --type audio --language telugu
corpus-client list --page 2 --size 10
corpus-client list --sort title --order asc
corpus-client list --from 2025-01-01 --to 2025-12-31

Expected: Rich table with records or "No records found." if no matches.

Testing Done

Manual testing completed
Unit tests run (13 tests via inline test script)

Test Cases Covered:

Scenario	Expected Result	Status
List records (default pagination)	Table with up to 20 records, pagination footer	✅
Pagination (page 2, size 10)	Row numbering starts at 11, correct page count	✅
Filter by media type	Only matching records shown	✅
Filter by language	Only matching records shown	✅
Filter by status	Only matching records shown	✅
Filter by date range	date_from/date_to params sent	✅
Sort by title asc	sort_by + sort_order params sent	✅
Empty results	"No records found." in yellow	✅
Unauthenticated user	Auth error message, exit code 1	✅
Expired token (401)	"Unauthorized. Please login again."	✅
Connection error	"Connection error: ..."	✅
Generic API error (500)	"API error (HTTP 500): ..."	✅
Title truncation (>28 chars)	Truncated to 25 + "..."	✅

Code Quality Checklist

Code Standards

Code follows project conventions (async pattern, module delegation, Rich output)
No console.log() or debugger statements left in code
No unused imports, variables, or functions
No duplicate code and use of existing components for reusability
ruff check passes on records.py (0 errors)

Python / CLI Best Practices

Follows existing module pattern (run_* entry point per module)
Auth check consistent with other commands (upload_files, upload_extracted)
Rich console passed from cli.py (single instance, not recreated)
aiohttp session created in cli.py and passed to module (consistent with existing commands)
Error handling covers: 200, 401, other HTTP errors, connection exceptions

API & Data Fetching

Bearer token auth header included
Query params built cleanly (None values excluded)
Response format handled defensively (results/items/list)
HTTP error codes handled (401 specific message, generic for others)
Connection errors caught and displayed

Error Handling

Errors caught and handled gracefully
User-friendly error messages displayed with Rich formatting
Network failures handled with actionable message

Documentation

ROADMAP.md updated (2.1 items marked complete)
README.md updated — not needed, no setup changes
user-manual.md updated — should be updated in a follow-up to document the list command

Known Limitations / Technical Debt

API response format assumed: The _parse_response() function handles results, items, and direct list formats defensively. The actual backend response format should be confirmed.
Query param names assumed: media_type, language, status, sort_by, sort_order, date_from, date_to may need adjustment based on actual API contract.
No formal pytest tests: Verification done via inline test scripts (13 tests). Formal pytest tests should be added as the test infrastructure matures.
user-manual.md not updated: The list command should be documented in a follow-up MR.

Additional Notes

This is a read-only command — no state files created, no data modified.
The sort_order param is only sent when sort_by is provided (avoids meaningless param).
Pre-existing lint warnings in cli.py (unused logging import, unused glob import, ambiguous variable l) are not addressed in this MR — they're unrelated to this feature.

MR Acceptance Checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.