feat: S3 file upload support in csv based uploading (!43) · Merge requests · Corpus / Corpus Client CLI

Bhaskar Battula requested to merge s3-bucket-support-in-csv into develop Apr 22, 2026

Description

Currently, the CSV file used for uploading data to the corpus only supports local file paths. Every file must therefore be present on the local system before ingestion, which limits the flexibility and scalability of the data upload process.

To improve the data ingestion workflow, we should extend support to include S3 bucket URLs in the CSV file. This enhancement will allow users to directly reference files stored in S3, eliminating the need for manual downloads or local file management.

Proposed Enhancement

Enable the CSV parser to accept S3 URLs (e.g., s3://bucket-name/path/to/file or HTTPS S3 links).
Implement logic to fetch and process files directly from S3.
Ensure proper authentication/authorization mechanisms are handled (e.g., IAM roles, access keys, or pre-signed URLs).
Maintain backward compatibility with existing local file path support.
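
As a rough illustration of the first and last points, the sketch below shows one way a CSV entry could be classified as an S3 reference or a local path. The names `S3Location` and `parse_s3_reference` are hypothetical and not taken from this merge request, and the HTTPS branch assumes virtual-hosted-style URLs of the form https://<bucket>.s3.<region>.amazonaws.com/<key>.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class S3Location(NamedTuple):
    """Hypothetical container for a parsed S3 reference."""
    bucket: str
    key: str


def parse_s3_reference(value: str) -> Optional[S3Location]:
    """Return an S3Location for S3 URLs, or None for anything else.

    Returning None means "treat the value as a local path", which keeps
    existing CSV files working unchanged.
    """
    parsed = urlparse(value.strip())
    host = parsed.netloc
    if parsed.scheme == "s3":
        # s3://bucket-name/path/to/file
        bucket, key = host, parsed.path.lstrip("/")
    elif (
        parsed.scheme == "https"
        and ".s3." in host
        and host.endswith(".amazonaws.com")
    ):
        # Assumed virtual-hosted style: the bucket is the first host label.
        bucket, key = host.split(".s3.", 1)[0], parsed.path.lstrip("/")
    else:
        return None
    if not bucket or not key:
        raise ValueError(f"S3 URL must include a bucket and a key: {value!r}")
    return S3Location(bucket, key)
```

With a bucket and key in hand, the actual fetch could be delegated to an S3 client (for example boto3's `download_file`), which would also pick up credentials from IAM roles or access keys through its usual configuration. Pre-signed HTTPS URLs carry their credentials in the query string and would more likely be fetched with a plain HTTP request.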

Benefits

Streamlines data ingestion by removing dependency on local storage.
Improves scalability for large datasets.
Aligns with cloud-native workflows and storage practices.

Acceptance Criteria

CSV file accepts both local file paths and S3 URLs.
Files referenced via S3 URLs are successfully fetched and processed.
Invalid or inaccessible S3 paths are reported with clear error messages.
Documentation updated to reflect the new capability.
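
The first and third criteria could be exercised with a check along these lines. This is a minimal sketch, not the implementation in this merge request: the function name `validate_rows` and the column name `file_path` are assumptions, and only syntactic validity of S3 URLs is checked.

```python
import csv
import io
import os
from urllib.parse import urlparse


def validate_rows(csv_text: str, column: str = "file_path"):
    """Split CSV rows into usable entries and per-row error messages.

    `column` is a hypothetical header name for the path/URL field.
    """
    entries, errors = [], []
    # start=2 because line 1 of the CSV is the header row.
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):
        value = (row.get(column) or "").strip()
        parsed = urlparse(value)
        if parsed.scheme == "s3":
            # A usable S3 URL needs both a bucket and a non-empty key.
            if parsed.netloc and parsed.path.strip("/"):
                entries.append(("s3", value))
            else:
                errors.append(f"line {line_no}: malformed S3 URL {value!r}")
        elif os.path.isfile(value):
            entries.append(("local", value))
        else:
            errors.append(f"line {line_no}: local file not found {value!r}")
    return entries, errors
```

Collecting errors per row, rather than stopping at the first one, lets a user fix the whole CSV in one pass. Detecting an inaccessible (as opposed to malformed) S3 path would additionally need a request to S3, such as a `head_object` call through boto3, which is omitted here.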
