Feat: Integrate anyLLM for Remote LLM Inference

Summary

Add support for remote LLM inference via the anyLLM library, enabling text-based metadata extraction without requiring a local GPU. This decouples text extraction (remote HTTP) from vision/image analysis (local vLLM), allowing the system to gracefully degrade when GPU memory is insufficient.

Problem Statement

The current architecture relies exclusively on a local vLLM instance for all LLM tasks, including both text-based semantic extraction and vision-language analysis. On hardware with limited GPU memory (e.g., RTX 3050 6GB), loading a 7B-parameter VLM (Qwen2.5-VL-7B-Instruct) causes an OOM error, which blocks the entire extraction pipeline — including text-only tasks that do not require a GPU.

Proposed Solution

  1. Introduce anyLLM as an optional dependency for remote HTTP-based inference.
  2. Create a unified LLMClient abstraction with two implementations:
    • AnyLLMClient — remote inference via OpenAI-compatible endpoints (Ollama /v1, HuggingFace Inference API, etc.)
    • LocalVLLMClient — local vLLM inference (existing behavior, fallback)
  3. Decouple llm_client (text) from vision_client (image) in ExtractionPipeline.
  4. Auto-select the backend based on environment variable configuration.
  5. Gracefully skip vision analysis when local vLLM fails (OOM), while continuing text extraction via the remote endpoint.
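
A minimal sketch of points 2, 3, and 5, assuming a single generate(prompt) method on the shared interface. The method name, the two stand-in client classes, and the run() signature are hypothetical and only illustrate the decoupling; they are not the actual AnyLLMClient or LocalVLLMClient implementations, and no anyLLM or vLLM calls are made.

```python
from abc import ABC, abstractmethod
from typing import Optional


class BaseLLMClient(ABC):
    """Shared interface for text and vision backends (method name is assumed)."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        ...


class EchoRemoteClient(BaseLLMClient):
    """Hypothetical stand-in for AnyLLMClient; a real client would call a remote endpoint."""

    def generate(self, prompt: str) -> str:
        return f"remote:{prompt}"


class FailingVisionClient(BaseLLMClient):
    """Hypothetical stand-in for LocalVLLMClient that simulates a GPU OOM."""

    def generate(self, prompt: str) -> str:
        raise MemoryError("simulated CUDA out of memory")


class ExtractionPipeline:
    """Text and vision clients are injected separately so either can fail alone."""

    def __init__(
        self,
        llm_client: BaseLLMClient,
        vision_client: Optional[BaseLLMClient] = None,
    ) -> None:
        self.llm_client = llm_client
        self.vision_client = vision_client

    def run(self, text: str, image_ref: Optional[str] = None) -> dict:
        # Text extraction always runs through llm_client.
        result = {"metadata": self.llm_client.generate(text), "vision": None}
        if self.vision_client is not None and image_ref is not None:
            try:
                result["vision"] = self.vision_client.generate(image_ref)
            except (MemoryError, RuntimeError) as exc:
                # Assumption: the OOM surfaces as one of these exception types.
                # The failure is recorded and the text result is still returned.
                result["vision_skipped"] = str(exc)
        return result


pipeline = ExtractionPipeline(EchoRemoteClient(), FailingVisionClient())
outcome = pipeline.run("title page text", image_ref="cover.png")
```

Here outcome still carries the text metadata, while the vision field stays None and the reason is recorded under vision_skipped. Which exception types to catch depends on how the local backend actually reports memory failures.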

Acceptance Criteria

  • Remote LLM inference works with OpenAI-compatible endpoints (Ollama /v1, HuggingFace Inference API)
  • Local vLLM remains the default when no remote configuration is present
  • Incomplete remote configuration raises a clear error message
  • Vision analysis is skipped gracefully when local vLLM OOMs; text extraction continues
  • All existing tests pass (212 tests)
  • New tests cover AnyLLMClient, config detection, and fallback logic
  • .env.example documents the new BOOKEXTRACTOR_LLM_* variables
  • pyproject.toml includes anyllm>=0.2.4 as an optional dependency (remote-llm extra)

Technical Details

New Files

  • bookextractor/llm_clients.py — BaseLLMClient, AnyLLMClient, LocalVLLMClient, create_llm_client()
  • tests/test_llm_clients.py — 15 unit tests for LLM client abstraction

Modified Files

  • bookextractor/__init__.py — .env auto-loading bootstrap
  • bookextractor/config.py — Settings.__init__ with BOOKEXTRACTOR_LLM_* env vars
  • bookextractor/pipeline.py — decoupled llm_client / vision_client; JSON segments/content_list parsing
  • bookextractor/main.py — ensure_ascii=False for Unicode output
  • pyproject.toml — anyllm>=0.2.4 dependency + remote-llm optional extra
  • .env.example — documented remote LLM configuration examples
  • tests/conftest.py — updated fixtures for decoupled clients
  • tests/test_pipeline.py — updated mocks for llm_client
  • tests/test_json_pipeline_update.py — updated mocks for llm_client
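
The ensure_ascii=False change listed for main.py can be illustrated with the standard json module. The record below is made up for the example; only the json.dumps behavior is the point.

```python
import json

# Illustrative record; the real pipeline output has its own schema.
record = {"title": "Les Misérables"}

# Default behavior: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(record)

# With ensure_ascii=False the characters are kept as-is in the output string.
readable = json.dumps(record, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode back to the same dictionary, so the change affects readability of the output rather than its content. When the result is written to a file, opening the file with encoding="utf-8" avoids depending on the platform default encoding.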

Environment Variables

Variable                    Description                      Example
BOOKEXTRACTOR_LLM_MODEL     Model identifier                 Qwen/Qwen2.5-7B-Instruct
BOOKEXTRACTOR_LLM_BASE_URL  API base URL                     https://api-inference.huggingface.co/v1
BOOKEXTRACTOR_LLM_API_KEY   API key                          hf_...
BOOKEXTRACTOR_LLM_PROVIDER  Provider name (default: openai)  openai
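
One way the backend auto-selection could read these variables is sketched below. The function name detect_backend and the returned dictionary shape are hypothetical. The rule that MODEL and BASE_URL are required together while API_KEY is optional is an assumption, since the issue does not state which variables make a configuration complete.

```python
import os
from typing import Mapping, Optional

PREFIX = "BOOKEXTRACTOR_LLM_"


def detect_backend(env: Optional[Mapping[str, str]] = None) -> dict:
    """Choose 'local' or 'remote' from BOOKEXTRACTOR_LLM_* variables (sketch)."""
    env = os.environ if env is None else env
    model = env.get(PREFIX + "MODEL", "").strip()
    base_url = env.get(PREFIX + "BASE_URL", "").strip()
    api_key = env.get(PREFIX + "API_KEY", "").strip()
    provider = env.get(PREFIX + "PROVIDER", "").strip() or "openai"

    # No remote settings at all: keep the existing local vLLM default.
    if not (model or base_url or api_key):
        return {"backend": "local"}

    # Assumption: a remote setup needs both a model and a base URL.
    required = (("MODEL", model), ("BASE_URL", base_url))
    missing = [PREFIX + name for name, value in required if not value]
    if missing:
        raise ValueError(
            "Incomplete remote LLM configuration; missing: " + ", ".join(missing)
        )

    return {
        "backend": "remote",
        "model": model,
        "base_url": base_url,
        "api_key": api_key or None,  # optional, e.g. for a local Ollama server
        "provider": provider,
    }


config = detect_backend(
    {
        "BOOKEXTRACTOR_LLM_MODEL": "Qwen/Qwen2.5-7B-Instruct",
        "BOOKEXTRACTOR_LLM_BASE_URL": "http://localhost:11434/v1",
    }
)
```

Passing a mapping instead of reading os.environ directly keeps the function easy to exercise in unit tests. In this sketch a lone BOOKEXTRACTOR_LLM_PROVIDER is treated as no remote configuration; whether that should instead raise an error is a design decision left open here.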

Labels

type::feature, priority::high, group::backend

Related Issues/MRs

  • Related to: (link any parent epics or related issues)
  • Blocked by: (none)
  • Blocks: (none)
Edited by Praneeth Ashish