Draft: refactor: run backfill commands through the API instead of directly against the database

Bhaskar Battula requested to merge backfill-operations into develop

Summary

This MR refactors the actions backfill workflow so the CLI no longer depends on direct database access or an external corpus-server-app checkout for the main backfill flows. Instead, the CLI now ships bundled local backfill scripts and runs them directly from this repo.

What This MR Does

  • Reworks corpus-client actions to use bundled backfill modules by default.
  • Removes the --database-url / DATABASE_URL dependency from the CLI actions flow.
  • Uses authenticated API context from the saved CLI session, or CORPUS_API_TOKEN and CORPUS_API_BASE_URL.
  • Adds bundled local backfill implementations for:
    • actions duration
    • actions file-hash
    • actions snr
  • Keeps --server-path as an override for external action scripts when needed.
  • Updates docs and command references to reflect the new client-side bundled backfill behavior.
  • Retains actions version-0 as a server-assisted flow, because the normal record APIs in this repo do not expose a public history-write endpoint.
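
To illustrate the auth resolution described in the list above, here is a minimal sketch. Only the CORPUS_API_TOKEN and CORPUS_API_BASE_URL variable names come from this MR; the ApiContext type, the resolve_api_context helper, and the precedence (saved session first, environment second) are assumptions made for illustration and may not match the actual implementation.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ApiContext:
    """Hypothetical container for the values a backfill needs to call the API."""
    base_url: str
    token: str


def resolve_api_context(
    env: Optional[Mapping[str, str]] = None,
    saved_session: Optional[ApiContext] = None,
) -> ApiContext:
    """Return an API context from the saved session, else from the environment.

    Assumption: the saved CLI session wins over environment variables.
    """
    if saved_session is not None:
        return saved_session

    env = os.environ if env is None else env
    token = env.get("CORPUS_API_TOKEN")
    base_url = env.get("CORPUS_API_BASE_URL")
    if token and base_url:
        # Normalise the base URL so callers can append paths safely.
        return ApiContext(base_url=base_url.rstrip("/"), token=token)

    raise RuntimeError(
        "No API credentials found: log in with the CLI or set "
        "CORPUS_API_TOKEN and CORPUS_API_BASE_URL."
    )
```

Passing env and saved_session as parameters keeps the helper easy to test without touching the real environment or a session file.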

Behavior Change

Before

  • CLI launched external scripts from corpus-server-app/actions/
  • Scripts expected direct DB access via DATABASE_URL

After

  • CLI launches bundled scripts from src/corpus_client_cli/backfills
  • Scripts run locally
  • Scripts use authenticated API access for record fetch/download/update
  • No DB credentials are required on the client for the bundled backfills
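
The "After" flow (download over the API, compute locally, write back over the API) might look roughly like the sketch below for the file-hash backfill. The download and patch callables stand in for authenticated API calls, and the SHA-256 algorithm, the file_hash field name, and both function names are hypothetical; the MR does not specify them.

```python
import hashlib
import os
import tempfile
from typing import Callable, Dict


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never fully loaded in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backfill_file_hash(
    record_id: str,
    download: Callable[[str, str], None],
    patch: Callable[[str, Dict[str, str]], None],
) -> str:
    """Download one record's file, hash it locally, and patch the record.

    `download(record_id, dest_path)` and `patch(record_id, fields)` are
    placeholders for authenticated API calls; no database access is involved.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)  # Only the path is needed; the downloader reopens the file.
    try:
        download(record_id, tmp_path)
        file_hash = sha256_of_file(tmp_path)
        patch(record_id, {"file_hash": file_hash})  # Field name is assumed.
        return file_hash
    finally:
        # Always clean up the temp file, even if download or patch fails.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Injecting the two callables keeps the hashing and cleanup logic independent of any particular HTTP client.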

Test Coverage

Added and expanded coverage for:

  • action command wiring
  • bundled backfill module execution
  • API auth/env resolution
  • record discovery and pagination behavior
  • download, patch, and post flows
  • temp-file cleanup paths
  • hash, duration, and SNR helper logic
  • failure and skip scenarios
  • version-0 candidate selection flow
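
As an illustration of the record discovery and pagination behavior listed above, the following sketch walks a paginated listing until it is exhausted. It assumes page-number pagination and a hypothetical fetch_page callable; the real API may use cursors or offsets instead.

```python
from typing import Callable, Dict, Iterator, List

Record = Dict[str, object]


def iter_records(
    fetch_page: Callable[[int, int], List[Record]],
    page_size: int = 100,
) -> Iterator[Record]:
    """Yield every record from a page-numbered listing endpoint.

    `fetch_page(page, page_size)` is a placeholder for an authenticated
    API call that returns the records on the given 1-based page.
    """
    page = 1
    while True:
        items = fetch_page(page, page_size)
        if not items:
            return  # Empty page: nothing left to discover.
        yield from items
        if len(items) < page_size:
            return  # Short page: this was the last one.
        page += 1
```

A test can pass a fake fetch_page backed by an in-memory list to cover the empty, short-page, and exact-multiple cases.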

Validation completed:

  • uv run pre-commit run --all-files
  • uv run pytest -n auto --cov=src/corpus_client_cli --cov-report=term-missing --cov-fail-under=75
  • Full test suite: 404 passed
  • Coverage: 80.32%

Closes

Closes: #48
