Skip to content

feat: added retry logic for the 5xx errors

Bhaskar Battula requested to merge automatic-retry-logic into develop

Overview

This MR introduces a robust retry mechanism for handling transient API failures during upload workflows. It enhances system reliability by automatically retrying failed requests using an exponential backoff strategy, reducing the need for manual intervention in cases of temporary server or network instability.


What This MR Does

  • Implements a reusable asynchronous retry helper with exponential backoff.
  • Automatically retries transient failures, including:
    • HTTP 5xx errors (500, 502, 503, 504)
    • Network-related issues (e.g., connection failures)
    • Timeout errors
    • Client exceptions (e.g., aiohttp.ClientError)
  • Applies retry logic to:
    • File chunk upload requests
    • Upload finalization requests
    • Extracted text upload requests
  • Adds contextual retry logging for better observability (e.g., retry attempt count, affected file/chunk).
  • Introduces a configurable --max-retries CLI option.
  • Persists retry configuration across resume workflows.
  • Ensures non-transient errors (e.g., 4xx responses) are not retried incorrectly.

Behavioral Details

  • Retries are performed at the individual API request level.
  • For chunked uploads:
    • Only failed chunks are retried; successful chunks are not re-uploaded.
    • If retries are exhausted, the file is marked as failed and can be resumed later.
  • Upload finalization is retried independently after all chunks are uploaded.

Impact

  • Improves reliability and resilience of bulk and large file uploads.
  • Reduces failure rates caused by temporary infrastructure or network issues.
  • Minimizes manual intervention by automatically recovering from transient errors.
  • Enhances CLI transparency through detailed retry logs.
  • Maintains backward compatibility with existing upload and resume workflows.

Acceptance Criteria

  • Transient failures (HTTP 5xx, timeouts, connection errors) are retried automatically.
  • Retry attempts follow an exponential backoff strategy.
  • Maximum retry attempts are configurable via CLI (--max-retries).
  • Retry behavior is consistent across:
    • Chunk uploads
    • Upload finalization
    • Extracted text uploads
  • Successful chunks are not re-uploaded during retries.
  • Retry attempts and failures are clearly logged in the CLI.
  • Non-transient errors (e.g., HTTP 4xx) are not retried.
  • Retry configuration persists correctly for resumed uploads.
  • All new and existing tests pass (174 passed), including retry-specific regression tests.

Merge request reports

Loading