Refactor `actions/backfill` to Run Client-Side via API Endpoints Instead of Direct DB Access
Summary
Refactor the existing actions/backfill workflow so that backfill scripts continue to run client-side, but update data through backend API endpoints instead of directly connecting to the database.
Currently, the CLI launches Python backfill scripts that perform direct DB operations using database credentials. This change will remove direct DB access from the client and move all persistence/update operations behind authenticated API endpoints.
Current Architecture
CLI -> local backfill script -> database
Current issues:
- Requires DB credentials on client machine
- Requires direct database connectivity
- Tight coupling between scripts and database layer
Target Architecture
CLI -> local uvx snippet/backfill script -> API endpoint -> server -> database
The scripts will:
- Continue running locally/client-side
- Process files/data locally as needed
- Send updates through authenticated API requests
The backend will:
- Expose/update endpoints for backfill operations
- Own all DB write/update logic
- Validate and persist incoming updates
Scope
Client-Side Changes
- Keep backfill/actions scripts executable locally
- Replace direct DB writes with API calls
- Remove database dependency from scripts
- Use authenticated HTTP requests for updates
Backend Changes
- Add new API endpoints or extend existing ones
- Support batch update operations
- Validate admin authorization
- Handle persistence and validation server-side
Requirements
- Backfill/actions scripts should continue running locally
- No direct DB connections from CLI/scripts
- No DB credentials required on client
- Updates should happen only through API endpoints
- Existing functionality should remain intact
- API requests should support batching where needed
Proposed API Usage
Example endpoints:
| Operation | Endpoint |
|---|---|
| Media duration updates | POST /api/v1/backfills/media-duration |
| File hash updates | POST /api/v1/backfills/file-hashing |
| SNR updates | POST /api/v1/backfills/snr-frequency |
| Version migrations | POST /api/v1/backfills/version-0 |
Possible payload fields:
record_iddurationhashsnrfrequencymedia_typealgorithmbatch_sizedry_run
Acceptance Criteria
- Backfill/actions scripts continue running client-side
- Direct DB access removed from scripts
- DB credentials no longer required in CLI
- Scripts update data through API endpoints only
- Existing workflows continue functioning
- Backend validates/admin-protects endpoints
- Batch updates supported where applicable
- Error handling works correctly for API failures
Test Cases
- Client-side script successfully updates records through API
- No direct DB connections exist in scripts
- Invalid API auth returns proper error
- API validation failures handled gracefully
- Batch updates process correctly
- Existing backfill workflows continue working after migration
- Scripts work without database credentials
Technical Notes
- Existing subprocess/uvx execution flow can remain
- Replace DB repositories/session usage with HTTP client calls (
aiohttp) - Existing endpoints may be reused or extended where possible
- New endpoints should support bulk update operations for efficiency
Definition of Done
- Direct DB access removed from backfill/actions scripts
- API-based update flow implemented
- Required backend endpoints added/updated
- Tests updated and passing
- Documentation updated
- Admin authorization verified