Feat: Integrate anyLLM for Remote LLM Inference

What does this MR do and why?

Integrates the anyLLM library to enable remote HTTP-based LLM inference for text extraction, decoupling it from the local vLLM vision model. This allows the system to:

  • Perform text-based metadata extraction via remote endpoints (Ollama /v1, HuggingFace Inference API, OpenAI, etc.) without requiring a local GPU.
  • Gracefully degrade when local vLLM fails due to insufficient GPU memory (OOM) — vision analysis is skipped, but text extraction continues via the remote backend.
  • Maintain backward compatibility — local vLLM remains the default when no remote configuration is present.
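
The graceful-degradation behaviour described above could look roughly like the sketch below. It is illustrative only: `try_build_vision_client` and the `factory` argument are hypothetical names, and the sketch assumes that a GPU out-of-memory failure surfaces as a `RuntimeError` or `MemoryError`, which may not match what vLLM actually raises.

```python
import logging
from typing import Callable, Optional, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


def try_build_vision_client(factory: Callable[[], T]) -> Optional[T]:
    """Build the vision client, or return None so vision analysis is skipped.

    `factory` is a hypothetical zero-argument callable that constructs the
    local vLLM client. Assumption: OOM shows up as RuntimeError/MemoryError.
    """
    try:
        return factory()
    except (RuntimeError, MemoryError) as exc:
        # Text extraction can still run through the separate llm_client.
        logger.warning("Vision client unavailable, skipping vision analysis: %s", exc)
        return None
```

With this shape, the pipeline only needs to check whether `vision_client` is `None` before running image analysis.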

Architecture

┌──────────────────────────────────────────────────────┐
│                  ExtractionPipeline                  │
│                                                      │
│  ┌──────────────────┐    ┌────────────────────────┐  │
│  │  llm_client      │    │  vision_client         │  │
│  │  (text only)     │    │  (image/vision only)   │  │
│  │                  │    │                        │  │
│  │  AnyLLMClient    │    │  LocalVLLMClient       │  │
│  │  (remote HTTP)   │    │  (local vLLM, optional)│  │
│  │       or         │    │       or               │  │
│  │  LocalVLLMClient │    │  None (skipped)        │  │
│  │  (fallback)      │    │                        │  │
│  └──────────────────┘    └────────────────────────┘  │
└──────────────────────────────────────────────────────┘

Backend selection is automatic based on environment variables:

  • If BOOKEXTRACTOR_LLM_MODEL, BOOKEXTRACTOR_LLM_BASE_URL, and BOOKEXTRACTOR_LLM_API_KEY are all set → AnyLLMClient (remote)
  • Otherwise → LocalVLLMClient (local, fallback)
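
A minimal sketch of this selection rule follows. The three variable names come from this MR; the function name `select_text_backend` and the decision to treat empty values as unset are assumptions made for illustration, not the actual implementation.

```python
import os
from typing import Mapping

# All three variables must be present for the remote backend to be chosen.
REQUIRED_VARS = (
    "BOOKEXTRACTOR_LLM_MODEL",
    "BOOKEXTRACTOR_LLM_BASE_URL",
    "BOOKEXTRACTOR_LLM_API_KEY",
)


def select_text_backend(env: Mapping[str, str] = os.environ) -> str:
    """Return the class name of the text backend implied by the environment."""
    # Assumption: an empty or whitespace-only value counts as "not set".
    if all(env.get(name, "").strip() for name in REQUIRED_VARS):
        return "AnyLLMClient"
    return "LocalVLLMClient"
```

Passing the environment as a mapping keeps the rule easy to unit-test without touching the real process environment.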

Screenshots / Screencasts

(N/A — backend-only change)

MR Acceptance Checklist

General

  • I have read the contributing guidelines
  • I have added tests for new functionality
  • I have updated the documentation (.env.example, architecture docs)
  • I have followed the project's code style (ruff, mypy)
  • All CI pipelines pass

Testing

  • Unit tests: 212 tests pass
  • New tests added: 15 tests in tests/test_llm_clients.py
  • Coverage maintained at 91%

Security & Performance

  • No secrets or API keys committed (.env is in .gitignore)
  • API keys are masked in log output
  • No regression in existing functionality

How to Set Up and Test This MR

1. Install dependencies

uv sync --extra remote-llm

2. Configure remote LLM (example: HuggingFace Inference API)

BOOKEXTRACTOR_LLM_MODEL=Qwen/Qwen2.5-7B-Instruct
BOOKEXTRACTOR_LLM_BASE_URL=https://api-inference.huggingface.co/v1
BOOKEXTRACTOR_LLM_API_KEY=hf_your-key-here
BOOKEXTRACTOR_LLM_PROVIDER=openai

3. Alternatively, configure a local Ollama endpoint (example: Ollama via its /v1 API)

BOOKEXTRACTOR_LLM_MODEL=qwen2.5:7b
BOOKEXTRACTOR_LLM_BASE_URL=http://localhost:11434/v1
BOOKEXTRACTOR_LLM_API_KEY=ollama
BOOKEXTRACTOR_LLM_PROVIDER=openai

4. Run extraction

bookextractor extract input.json output.json

5. Run tests

uv run pytest tests/ -v

Known Limitations

  • anyLLM v0.2.4 supports only four providers: openai, anthropic, ollama, llamacpp. Custom providers require the model-string prefix format, e.g. openai/your-model-name.
  • Vision/image analysis still requires local vLLM (GPU). If GPU memory is insufficient, vision analysis is skipped gracefully.
  • The provider parameter is not passed as a kwarg to anyllm.chat(). Instead, the model string is prefixed (e.g., openai/Qwen/Qwen2.5-7B-Instruct), because anyLLM's parse_model_string() extracts the provider from the segment before the first /.
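
The prefixing convention from the last limitation can be sketched as follows. Both helpers are hypothetical: `prefix_model` shows how a client might build the string, and `split_model_string` is only a stand-in for the splitting behaviour described above, not anyLLM's actual `parse_model_string()` code.

```python
def prefix_model(provider: str, model: str) -> str:
    """Build a provider-prefixed model string such as 'openai/your-model-name'."""
    return f"{provider}/{model}"


def split_model_string(model_string: str) -> tuple[str, str]:
    """Split on the first '/' only, so slashes inside the model name survive.

    Illustrative stand-in for the behaviour described above.
    """
    provider, _, model = model_string.partition("/")
    return provider, model


full_name = prefix_model("openai", "Qwen/Qwen2.5-7B-Instruct")
print(full_name)                      # openai/Qwen/Qwen2.5-7B-Instruct
print(split_model_string(full_name))  # ('openai', 'Qwen/Qwen2.5-7B-Instruct')
```

Splitting on the first slash only is what lets HuggingFace-style names such as Qwen/Qwen2.5-7B-Instruct pass through intact.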

Related Issues

Edited by Praneeth Ashish
