feat: add record listing command with pagination, filtering, and sorting

Madavarapu Sai Harshavardhan requested to merge feat/record-listing-harsha into develop

Overview

Adds a corpus-client list command that lets users browse their uploaded records directly from the CLI with pagination, filtering, and sorting. This implements ROADMAP item 2.1 (Record Listing) — previously users had no way to view records without switching to the web frontend.

What does this MR do and why?

Implements ROADMAP 2.1 - adds corpus-client list command that fetches records from GET /api/v1/records/ with support for media type, language, status, and date range filters plus sorting by date, title, or status.

Motivation: The CLI supported uploading and extraction but offered no way to view uploaded records. Users had to switch to the React web app to check record status or browse contributions, which broke the CLI-only workflow.

Approach: Follows the existing architecture — thin Typer command in cli.py delegating to a new records.py module with a single run_list_records() entry point. This is the same pattern used by upload.py and extracted_text_upload.py.

Trade-offs: The API response format is handled defensively (results, items, or direct list) since the exact backend response structure hasn't been confirmed. Unsupported query params are silently ignored by the API.
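
The defensive parsing described above could look roughly like the sketch below. It is an illustration rather than the code in records.py: the function name parse_response and the "count" / "total" keys for the total are assumptions, since the backend response structure is not yet confirmed.

```python
from __future__ import annotations

from typing import Any


def parse_response(payload: Any) -> tuple[list, int]:
    """Return (records, total) from a few plausible response shapes."""
    # Shape 1: the body is already a bare list of records.
    if isinstance(payload, list):
        return payload, len(payload)

    if isinstance(payload, dict):
        # Shapes 2 and 3: records nested under "results" or "items".
        for key in ("results", "items"):
            records = payload.get(key)
            if isinstance(records, list):
                # "count" and "total" are assumed names for the total field;
                # fall back to the number of records on this page.
                total = payload.get("count", payload.get("total", len(records)))
                return records, total

    # Unknown shape: report an empty page instead of raising.
    return [], 0
```

Returning an empty list for unrecognised shapes means the caller can fall through to the "No records found." path without a separate error branch.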

Changes Made

File                              Action    Purpose
src/corpus_client_cli/records.py  Created   API fetching, response parsing, Rich table rendering
src/corpus_client_cli/cli.py      Modified  Added list command with 9 Typer options, imported records module
docs/ROADMAP.md                   Modified  Marked 2.1 items as [x] complete

Technical Details

Architecture:

  • cli.py adds a list_records() command that checks auth, creates an aiohttp.ClientSession, and delegates to records.run_list_records()
  • records.py contains 4 focused functions:
    • _build_params() — builds query params dict, excluding None values
    • _parse_response() — extracts records list and total count from flexible API response formats
    • _display_records() — renders Rich table with emoji media type labels and pagination info
    • run_list_records() — single async entry point that fetches, parses, displays, and handles errors
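
As a rough illustration of the query-building step listed above, the sketch below drops None values and only sends sort_order when sort_by is set. The filter and sort names follow the ones this MR assumes (media_type, language, status, sort_by, sort_order, date_from, date_to); the "page" and "size" names and the function signature are hypothetical and may differ from records.py.

```python
from __future__ import annotations

from typing import Optional


def build_params(
    page: int = 1,
    size: int = 20,
    media_type: Optional[str] = None,
    language: Optional[str] = None,
    status: Optional[str] = None,
    sort_by: Optional[str] = None,
    sort_order: str = "desc",
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
) -> dict:
    """Build the query params dict, excluding anything left as None."""
    params = {
        "page": page,  # assumed param name
        "size": size,  # assumed param name
        "media_type": media_type,
        "language": language,
        "status": status,
        "sort_by": sort_by,
        # sort_order is meaningless without a sort field, so drop it then.
        "sort_order": sort_order if sort_by else None,
        "date_from": date_from,
        "date_to": date_to,
    }
    return {key: value for key, value in params.items() if value is not None}
```

For example, build_params(sort_by="title", sort_order="asc") yields page, size, sort_by and sort_order, while build_params(sort_order="asc") yields only page and size.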

CLI options:

Option           Type  Default  Description
--page / -p      int   1        Page number
--size / -s      int   20       Records per page
--type / -t      str   None     Filter: text, audio, video, document, image
--language / -l  str   None     Filter: language code
--status         str   None     Filter: record status
--sort           str   None     Sort field: date, title, status
--order          str   desc     Sort order: asc, desc
--from           str   None     Start date (YYYY-MM-DD)
--to             str   None     End date (YYYY-MM-DD)

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to change)
  • 📝 Documentation update
  • 🎨 UI/UX improvement
  • Refactor (no functional changes)
  • Performance improvement
  • 🧪 Test update
  • 🔧 Configuration change
  • 🚨 Security fix

Related Issues / References

  • Related to: ROADMAP Priority 2 (Fetch/Retrieve Features)
  • Enables: 2.2 Record Details (depends on record IDs visible from list output)
  • Depends on: Backend GET /api/v1/records/ endpoint

Screenshots or Screen Recordings

corpus-client list --help output:

Usage: corpus-client list [OPTIONS]

 📋 List uploaded records with filters and pagination

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --page      -p      INTEGER  Page number [default: 1]                        │
│ --size      -s      INTEGER  Records per page [default: 20]                  │
│ --type      -t      TEXT     Filter by media type (text, audio, video,       │
│                              document, image)                                │
│ --language  -l      TEXT     Filter by language                              │
│ --status            TEXT     Filter by status                                │
│ --sort              TEXT     Sort by field (date, title, status)             │
│ --order             TEXT     Sort order (asc, desc) [default: desc]          │
│ --from              TEXT     Filter from date (YYYY-MM-DD)                   │
│ --to                TEXT     Filter to date (YYYY-MM-DD)                     │
│ --help                       Show this message and exit.                     │
╰──────────────────────────────────────────────────────────────────────────────╯

Table output (mock data):

        📋 Records (Page 1)
┌────┬─────────┬──────────────┬──────────────┬────────────┬──────────────┬──────────────┐
│ #  │ ID      │ Title        │ Type         │ Language   │ Status       │ Created      │
├────┼─────────┼──────────────┼──────────────┼────────────┼──────────────┼──────────────┤
│ 1  │ abc-123 │ Sample Audio │ 🎵 Audio     │ telugu     │ approved     │ 2025-06-15   │
│ 2  │ def-456 │ My Document  │ 📑 Document  │ hindi      │ pending      │ 2025-07-20   │
└────┴─────────┴──────────────┴──────────────┴────────────┴──────────────┴──────────────┘
Showing page 1 of 1 (2 total records)

How to Set Up and Validate Locally

  1. Pull this branch:
    git checkout feat/record-listing-harsha
  2. Install dependencies:
    uv sync
  3. Verify command exists:
    corpus-client list --help
  4. Test auth guard (without logging in):
    corpus-client list
    # Expected: "Not logged in. Run: corpus-client login" + exit code 1
  5. Test with authentication:
    corpus-client login
    corpus-client list
    corpus-client list --type audio --language telugu
    corpus-client list --page 2 --size 10
    corpus-client list --sort title --order asc
    corpus-client list --from 2025-01-01 --to 2025-12-31
  6. Expected: Rich table with records or "No records found." if no matches.

Testing Done

  • Manual testing completed
  • Unit tests run (13 tests via inline test script)

Test Cases Covered:

Scenario                           Expected Result                                  Status
List records (default pagination)  Table with up to 20 records, pagination footer
Pagination (page 2, size 10)       Row numbering starts at 11, correct page count
Filter by media type               Only matching records shown
Filter by language                 Only matching records shown
Filter by status                   Only matching records shown
Filter by date range               date_from/date_to params sent
Sort by title asc                  sort_by + sort_order params sent
Empty results                      "No records found." in yellow
Unauthenticated user               Auth error message, exit code 1
Expired token (401)                "Unauthorized. Please login again."
Connection error                   "Connection error: ..."
Generic API error (500)            "API error (HTTP 500): ..."
Title truncation (>28 chars)       Truncated to 25 + "..."
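
The pagination and truncation rows above come down to a little arithmetic, sketched here for reviewers. The helper names are hypothetical and are not taken from records.py; the numbers (28-character limit, 25 characters plus "...", row 11 on page 2 with size 10) are the ones from the table.

```python
import math

MAX_TITLE_LEN = 28  # titles longer than this are shortened


def truncate_title(title: str, limit: int = MAX_TITLE_LEN) -> str:
    """Keep short titles as-is; otherwise keep limit - 3 chars plus '...'."""
    if len(title) <= limit:
        return title
    return title[: limit - 3] + "..."


def first_row_number(page: int, size: int) -> int:
    """Row number shown for the first record on a 1-based page."""
    return (page - 1) * size + 1


def total_pages(total: int, size: int) -> int:
    """Number of pages needed for `total` records, never less than 1."""
    return max(1, math.ceil(total / size))
```

With these definitions, page 2 at size 10 starts at row 11, and 2 records at size 20 give "page 1 of 1", matching the mock table output.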

Code Quality Checklist

Code Standards

  • Code follows project conventions (async pattern, module delegation, Rich output)
  • No console.log() or debugger statements left in code
  • No unused imports, variables, or functions
  • No duplicate code; existing components are reused where possible
  • ruff check passes on records.py (0 errors)

Python / CLI Best Practices

  • Follows existing module pattern (run_* entry point per module)
  • Auth check consistent with other commands (upload_files, upload_extracted)
  • Rich console passed from cli.py (single instance, not recreated)
  • aiohttp session created in cli.py and passed to module (consistent with existing commands)
  • Error handling covers: 200, 401, other HTTP errors, connection exceptions

API & Data Fetching

  • Bearer token auth header included
  • Query params built cleanly (None values excluded)
  • Response format handled defensively (results/items/list)
  • HTTP error codes handled (401 specific message, generic for others)
  • Connection errors caught and displayed
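
One way to picture the status handling in this checklist is a pair of small pure functions that turn an outcome into the user-facing message. This is only a sketch: the function names are hypothetical, appending the response body after "API error (HTTP N):" is an assumption, and the real module presumably does this inline around the aiohttp call.

```python
from __future__ import annotations

from typing import Optional


def describe_http_error(status: int, body: str) -> Optional[str]:
    """Return a user-facing message for a non-200 response, or None on 200."""
    if status == 200:
        return None
    if status == 401:
        # Expired or invalid token gets its own actionable message.
        return "Unauthorized. Please login again."
    # Everything else is reported generically (body text is an assumption).
    return f"API error (HTTP {status}): {body}"


def describe_connection_error(exc: Exception) -> str:
    """Return the message shown when the request never reached the API."""
    return f"Connection error: {exc}"
```

Keeping the message selection separate from the network call makes these branches easy to cover in the formal pytest tests mentioned under Known Limitations.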

Error Handling

  • Errors caught and handled gracefully
  • User-friendly error messages displayed with Rich formatting
  • Network failures handled with actionable message

Documentation

  • ROADMAP.md updated (2.1 items marked complete)
  • README.md updated — not needed, no setup changes
  • user-manual.md updated — should be updated in a follow-up to document the list command

Known Limitations / Technical Debt

  • API response format assumed: The _parse_response() function handles results, items, and direct list formats defensively. The actual backend response format should be confirmed.
  • Query param names assumed: media_type, language, status, sort_by, sort_order, date_from, date_to may need adjustment based on actual API contract.
  • No formal pytest tests: Verification done via inline test scripts (13 tests). Formal pytest tests should be added as the test infrastructure matures.
  • user-manual.md not updated: The list command should be documented in a follow-up MR.

Additional Notes

  • This is a read-only command — no state files created, no data modified.
  • The sort_order param is only sent when sort_by is provided (avoids meaningless param).
  • Pre-existing lint warnings in cli.py (unused logging import, unused glob import, ambiguous variable l) are not addressed in this MR — they're unrelated to this feature.

MR Acceptance Checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.