Feat: Integrate anyLLM for Remote LLM Inference

Summary

Add support for remote LLM inference via the anyLLM library, enabling text-based metadata extraction without requiring a local GPU. This decouples text extraction (remote HTTP) from vision/image analysis (local vLLM), allowing the system to gracefully degrade when GPU memory is insufficient.

Problem Statement

The current architecture relies exclusively on a local vLLM instance for all LLM tasks, both text-based semantic extraction and vision-language analysis. On hardware with limited GPU memory (e.g., RTX 3050 6GB), loading a 7B-parameter VLM (Qwen2.5-VL-7B-Instruct) causes an out-of-memory (OOM) error. That failure blocks the entire extraction pipeline, including text-only tasks that do not require a local GPU.

Proposed Solution

Introduce anyLLM as an optional dependency for remote HTTP-based inference.
Create a unified LLMClient abstraction with two implementations:
- AnyLLMClient — remote inference via OpenAI-compatible endpoints (Ollama /v1, HuggingFace Inference API, etc.)
- LocalVLLMClient — local vLLM inference (existing behavior, fallback)
Decouple llm_client (text) from vision_client (image) in ExtractionPipeline.
Auto-select the backend based on environment variable configuration.
Gracefully skip vision analysis when local vLLM fails (OOM), while continuing text extraction via the remote endpoint.
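
The client abstraction and backend selection described above could look roughly like the following. This is a minimal sketch, not the code in bookextractor/llm_clients.py: the class and factory names come from this issue, but the `generate` method, the `RemoteLLMSettings` container, and the factory signature are assumptions made for illustration, and the actual anyLLM and vLLM calls are left as placeholders.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RemoteLLMSettings:
    """Hypothetical container for a complete remote endpoint configuration."""
    model: str
    base_url: str
    api_key: Optional[str] = None
    provider: str = "openai"


class BaseLLMClient(ABC):
    """Common interface shared by the remote and local text backends."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Return the model output for a text prompt (method name is assumed)."""


class AnyLLMClient(BaseLLMClient):
    """Remote backend for OpenAI-compatible endpoints."""

    def __init__(self, settings: RemoteLLMSettings) -> None:
        self.settings = settings

    def generate(self, prompt: str) -> str:
        # Placeholder: the real client would send the prompt through anyLLM.
        raise NotImplementedError("remote call omitted from this sketch")


class LocalVLLMClient(BaseLLMClient):
    """Local backend that preserves the existing vLLM behavior."""

    def generate(self, prompt: str) -> str:
        # Placeholder: the real client would call the local vLLM instance.
        raise NotImplementedError("local vLLM call omitted from this sketch")


def create_llm_client(remote: Optional[RemoteLLMSettings] = None) -> BaseLLMClient:
    """Pick the remote backend when remote settings exist, else stay local."""
    if remote is not None:
        return AnyLLMClient(remote)
    return LocalVLLMClient()
```

Passing an already validated settings object keeps the factory trivial: a partially configured remote endpoint cannot be represented, so the factory only has to choose between two complete states.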

Acceptance Criteria

Remote LLM inference works with OpenAI-compatible endpoints (Ollama /v1, HuggingFace Inference API)
Local vLLM remains the default when no remote configuration is present
Incomplete remote configuration raises an error with a clear message
Vision analysis is skipped gracefully when local vLLM OOMs; text extraction continues
All existing tests pass (212 tests)
New tests cover AnyLLMClient, config detection, and fallback logic
.env.example documents the new BOOKEXTRACTOR_LLM_* variables
pyproject.toml includes anyllm>=0.2.4 as an optional dependency (remote-llm extra)
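
The graceful-degradation criterion can be illustrated with a small sketch. It assumes that an OOM during vision client startup surfaces as a `RuntimeError` or `MemoryError`. The helper names `init_vision_client` and `extract`, and the use of plain callables as clients, are hypothetical stand-ins and not the actual ExtractionPipeline API.

```python
import logging
from typing import Any, Callable, Dict, List, Optional

logger = logging.getLogger("bookextractor.pipeline")

VisionClient = Callable[[str], str]


def init_vision_client(factory: Callable[[], VisionClient]) -> Optional[VisionClient]:
    """Try to build the local vision client; return None if it cannot load."""
    try:
        return factory()
    except (MemoryError, RuntimeError) as exc:
        # Assumption: an OOM while loading the VLM raises one of these types.
        logger.warning("Vision client unavailable, skipping image analysis: %s", exc)
        return None


def extract(
    text: str,
    images: List[str],
    llm_client: Callable[[str], Dict[str, Any]],
    vision_client: Optional[VisionClient],
) -> Dict[str, Any]:
    """Always run text extraction; run vision analysis only when available."""
    result: Dict[str, Any] = {"metadata": llm_client(text), "vision": None}
    if vision_client is not None and images:
        result["vision"] = [vision_client(image) for image in images]
    return result
```

Because the vision client is optional at the call site, a failed vLLM startup changes only the `vision` field of the result and leaves the text path untouched.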

Technical Details

New Files

bookextractor/llm_clients.py — BaseLLMClient, AnyLLMClient, LocalVLLMClient, create_llm_client()
tests/test_llm_clients.py — 15 unit tests for LLM client abstraction

Modified Files

bookextractor/__init__.py — .env auto-loading bootstrap
bookextractor/config.py — Settings.__init__ with BOOKEXTRACTOR_LLM_* env vars
bookextractor/pipeline.py — decoupled llm_client / vision_client; JSON segments/content_list parsing
bookextractor/main.py — ensure_ascii=False for Unicode output
pyproject.toml — anyllm>=0.2.4 dependency + remote-llm optional extra
.env.example — documented remote LLM configuration examples
tests/conftest.py — updated fixtures for decoupled clients
tests/test_pipeline.py — updated mocks for llm_client
tests/test_json_pipeline_update.py — updated mocks for llm_client

Environment Variables

| Variable | Description | Example |
|---|---|---|
| `BOOKEXTRACTOR_LLM_MODEL` | Model identifier | `Qwen/Qwen2.5-7B-Instruct` |
| `BOOKEXTRACTOR_LLM_BASE_URL` | API base URL | `https://api-inference.huggingface.co/v1` |
| `BOOKEXTRACTOR_LLM_API_KEY` | API key | `hf_...` |
| `BOOKEXTRACTOR_LLM_PROVIDER` | Provider name (default: `openai`) | `openai` |
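
One way these variables might be read and validated is sketched below. The function name `load_remote_llm_config` is hypothetical, and the rule that `BOOKEXTRACTOR_LLM_MODEL` and `BOOKEXTRACTOR_LLM_BASE_URL` are both required while the API key is optional is an assumption; this issue does not spell out what counts as an incomplete configuration.

```python
import os
from typing import Dict, Mapping, Optional

PREFIX = "BOOKEXTRACTOR_LLM_"
FIELDS = ("MODEL", "BASE_URL", "API_KEY", "PROVIDER")
# Assumption: these two must be present for remote inference to be usable.
REQUIRED = ("MODEL", "BASE_URL")


def load_remote_llm_config(
    env: Mapping[str, str] = os.environ,
) -> Optional[Dict[str, str]]:
    """Return the remote config, or None when no remote variable is set.

    Raises ValueError naming the missing variables when the configuration
    is only partially filled in.
    """
    values = {name: env.get(PREFIX + name, "").strip() for name in FIELDS}

    # Nothing configured: the caller falls back to local vLLM.
    if not any(values.values()):
        return None

    missing = [PREFIX + name for name in REQUIRED if not values[name]]
    if missing:
        raise ValueError(
            "Incomplete remote LLM configuration; missing: " + ", ".join(missing)
        )

    # The provider defaults to "openai", as listed in the table above.
    values["PROVIDER"] = values["PROVIDER"] or "openai"
    return values
```

Accepting the environment as a parameter lets unit tests pass a plain dictionary instead of patching `os.environ`.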

Labels

type::feature, priority::high, group::backend

Related Issues/MRs

Related to: (link any parent epics or related issues)
Blocked by: (none)
Blocks: (none)

Edited May 22, 2026 by Praneeth Ashish