fix: migrate S3 bucket interactions to Python MinIO SDK

Ahlad Pataparla requested to merge fix/hetzner into develop

Summary

Migrate from aioboto3/botocore to the official MinIO Python SDK (minio>=7.2.5) for S3-compatible object storage operations. This change ensures compatibility with corpus-server-app and resolves issues encountered with Boto.

Changes

Dependency Migration

  • Replace aioboto3>=13.1.1 and botocore>=13.1.1 with minio>=7.2.5 in pyproject.toml
  • Update uv.lock with new dependency tree

Code Updates (upload.py)

  • Replace aioboto3.Session and botocore.config.Config with minio.Minio client
  • Update endpoint URL parsing to extract host and secure flag for MinIO client initialization
  • Replace async Boto calls with blocking MinIO SDK calls run in worker threads via asyncio.to_thread():
    • client.fget_object() for file downloads
    • client.list_objects() for object listing
  • Remove Boto-specific paginator logic in favor of MinIO's list_objects() API
  • Remove the now-unused urllib.parse.urlencode import; endpoint parsing uses urllib.parse.urlparse instead
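
For reviewers unfamiliar with the pattern, the changes above can be sketched roughly as follows. This is an illustrative sketch, not the actual contents of upload.py: the helper names split_endpoint, make_client, and download_prefix are hypothetical, and the rule that an endpoint must carry an http or https scheme is an assumption. Note that list_objects() returns a lazy iterator, so the sketch materializes it inside the worker thread to keep network I/O off the event loop.

```python
import asyncio
from pathlib import Path
from urllib.parse import urlparse


def split_endpoint(endpoint_url: str) -> tuple[str, bool]:
    """Split an endpoint URL into the (host[:port], secure) pair Minio() expects.

    Hypothetical helper; rejecting scheme-less endpoints is an assumption.
    """
    parsed = urlparse(endpoint_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"endpoint must be an http(s) URL: {endpoint_url!r}")
    return parsed.netloc, parsed.scheme == "https"


def make_client(endpoint_url: str, access_key: str, secret_key: str):
    """Build a minio.Minio client from a full endpoint URL (hypothetical helper)."""
    from minio import Minio  # imported here so the rest of the sketch runs without minio

    host, secure = split_endpoint(endpoint_url)
    return Minio(host, access_key=access_key, secret_key=secret_key, secure=secure)


async def download_prefix(client, bucket: str, prefix: str, dest_dir: str) -> list[str]:
    """Download every object under `prefix` without blocking the event loop."""
    # list_objects() is lazy: consume it inside the thread, not on the loop.
    objects = await asyncio.to_thread(
        lambda: list(client.list_objects(bucket, prefix=prefix, recursive=True))
    )
    paths = []
    for obj in objects:
        target = str(Path(dest_dir) / obj.object_name)
        # fget_object() is a blocking call, so it also runs in a worker thread.
        await asyncio.to_thread(client.fget_object, bucket, obj.object_name, target)
        paths.append(target)
    return paths
```

Because download_prefix only relies on list_objects() and fget_object(), tests can pass a stub client in place of a real Minio instance.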

Test Updates (test_s3_upload.py)

  • Update tests to reflect MinIO SDK usage

Why This Change?

  1. Compatibility: Aligns with corpus-server-app which uses the MinIO SDK
  2. Boto Issues: Avoids the problems we hit with Boto's async behavior and its configuration complexity
  3. Simpler API: MinIO SDK provides a cleaner, more direct API for S3-compatible operations
  4. Reduced Dependencies: Removes aioboto3 and botocore in favor of a single lightweight SDK
