Feat: Integrate anyLLM for Remote LLM Inference
Summary
Add support for remote LLM inference via the anyLLM library, enabling text-based metadata extraction without requiring a local GPU. This decouples text extraction (remote HTTP) from vision/image analysis (local vLLM), allowing the system to gracefully degrade when GPU memory is insufficient.
Problem Statement
The current architecture relies exclusively on a local vLLM instance for all LLM tasks, both text-based semantic extraction and vision-language analysis. On hardware with limited GPU memory (e.g., RTX 3050 6GB), loading a 7B-parameter VLM (Qwen2.5-VL-7B-Instruct) causes an out-of-memory (OOM) error that blocks the entire extraction pipeline, including text-only tasks that could run on a remote endpoint without any local GPU.
Proposed Solution
- Introduce `anyLLM` as an optional dependency for remote HTTP-based inference.
- Create a unified `LLMClient` abstraction with two implementations:
  - `AnyLLMClient`: remote inference via OpenAI-compatible endpoints (Ollama `/v1`, HuggingFace Inference API, etc.)
  - `LocalVLLMClient`: local vLLM inference (existing behavior, fallback)
- Decouple `llm_client` (text) from `vision_client` (image) in `ExtractionPipeline`.
- Auto-select the backend based on environment variable configuration.
- Gracefully skip vision analysis when local vLLM fails (OOM), while continuing text extraction via the remote endpoint.
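
The client abstraction and backend auto-selection described above could look roughly like the sketch below. The class and factory names come from this issue, but the `generate` method, the constructor arguments, and the rule "remote if both model and base URL are set" are assumptions made for illustration, not the actual implementation. The network and vLLM calls are deliberately left out.

```python
import os
from abc import ABC, abstractmethod
from typing import Mapping, Optional


class BaseLLMClient(ABC):
    """Common interface shared by the text-generation backends."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Return the model's completion for `prompt` (hypothetical method)."""


class AnyLLMClient(BaseLLMClient):
    """Remote backend; a real version would call the endpoint over HTTP."""

    def __init__(self, model: str, base_url: str, api_key: Optional[str] = None):
        self.model = model
        self.base_url = base_url
        self.api_key = api_key

    def generate(self, prompt: str) -> str:
        raise NotImplementedError("remote call omitted in this sketch")


class LocalVLLMClient(BaseLLMClient):
    """Local backend; a real version would wrap the existing vLLM code path."""

    def generate(self, prompt: str) -> str:
        raise NotImplementedError("local vLLM call omitted in this sketch")


def create_llm_client(env: Optional[Mapping[str, str]] = None) -> BaseLLMClient:
    """Pick the remote client when it is configured, else the local one."""
    env = os.environ if env is None else env
    model = env.get("BOOKEXTRACTOR_LLM_MODEL")
    base_url = env.get("BOOKEXTRACTOR_LLM_BASE_URL")
    if model and base_url:  # assumed selection rule
        return AnyLLMClient(model, base_url, env.get("BOOKEXTRACTOR_LLM_API_KEY"))
    return LocalVLLMClient()
```

Passing the environment in as a mapping keeps the selection logic testable without mutating `os.environ`.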
Acceptance Criteria
- Remote LLM inference works with OpenAI-compatible endpoints (Ollama `/v1`, HuggingFace Inference API)
- Local vLLM remains the default when no remote configuration is present
- Incomplete remote configuration raises a clear error message
- Vision analysis is skipped gracefully when local vLLM OOMs; text extraction continues
- All existing tests pass (212 tests)
- New tests cover `AnyLLMClient`, config detection, and fallback logic
- `.env.example` documents the new `BOOKEXTRACTOR_LLM_*` variables
- `pyproject.toml` includes `anyllm>=0.2.4` as an optional dependency (`remote-llm` extra)
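
The "vision is skipped, text continues" criterion can be illustrated with a minimal sketch. The helper names `build_vision_client` and `extract`, and the use of plain callables as clients, are hypothetical and chosen only to keep the example self-contained. It assumes the OOM surfaces as a `RuntimeError` (PyTorch raises CUDA out-of-memory errors as `RuntimeError` or a subclass) or a `MemoryError`.

```python
import logging
from typing import Callable, Optional, Sequence

logger = logging.getLogger(__name__)


def build_vision_client(factory: Callable[[], Callable]) -> Optional[Callable]:
    """Try to start the local vision backend; return None if it cannot load."""
    try:
        return factory()
    except (MemoryError, RuntimeError) as exc:
        # Assumption: OOM during model load is reported as one of these types.
        logger.warning("Vision backend unavailable, skipping image analysis: %s", exc)
        return None


def extract(text: str, llm_client: Callable, vision_client: Optional[Callable] = None,
            images: Sequence[str] = ()) -> dict:
    """Run text extraction always; run vision only when a client exists."""
    result = {"metadata": llm_client(text), "vision": None}
    if vision_client is not None and images:
        result["vision"] = [vision_client(image) for image in images]
    return result
```

With this shape, a failed vision load degrades the result to `"vision": None` instead of aborting the whole run.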
Technical Details
New Files
- `bookextractor/llm_clients.py`: `BaseLLMClient`, `AnyLLMClient`, `LocalVLLMClient`, `create_llm_client()`
- `tests/test_llm_clients.py`: 15 unit tests for the LLM client abstraction
Modified Files
- `bookextractor/__init__.py`: `.env` auto-loading bootstrap (dotenv)
- `bookextractor/config.py`: `Settings.__init__` with `BOOKEXTRACTOR_LLM_*` env vars
- `bookextractor/pipeline.py`: decoupled `llm_client`/`vision_client`; JSON `segments`/`content_list` parsing
- `bookextractor/main.py`: `ensure_ascii=False` for Unicode output
- `pyproject.toml`: `anyllm>=0.2.4` dependency + `remote-llm` optional extra
- `.env.example`: documented remote LLM configuration examples
- `tests/conftest.py`: updated fixtures for decoupled clients
- `tests/test_pipeline.py`: updated mocks for `llm_client`
- `tests/test_json_pipeline_update.py`: updated mocks for `llm_client`
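
To show why the `ensure_ascii=False` change matters for Unicode output, the short example below compares the two settings of the standard-library `json.dumps`. The sample record is made up for illustration and is not taken from the project.

```python
import json

# Hypothetical extracted metadata containing a non-ASCII character.
record = {"title": "Les Misérables", "author": "Victor Hugo"}

escaped = json.dumps(record)                       # default: ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)  # keeps characters as-is

print(escaped)   # {"title": "Les Mis\u00e9rables", "author": "Victor Hugo"}
print(readable)  # {"title": "Les Misérables", "author": "Victor Hugo"}
```

Both strings decode to the same object; only the human readability of the serialized form differs. When writing the readable form to a file, opening it with `encoding="utf-8"` avoids platform-dependent encoding errors.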
Environment Variables
| Variable | Description | Example |
|---|---|---|
| `BOOKEXTRACTOR_LLM_MODEL` | Model identifier | `Qwen/Qwen2.5-7B-Instruct` |
| `BOOKEXTRACTOR_LLM_BASE_URL` | API base URL | `https://api-inference.huggingface.co/v1` |
| `BOOKEXTRACTOR_LLM_API_KEY` | API key | `hf_...` |
| `BOOKEXTRACTOR_LLM_PROVIDER` | Provider name (default: `openai`) | `openai` |
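
One possible way to read and validate these variables is sketched below. The `RemoteLLMConfig` dataclass and `load_remote_config` function are hypothetical names, and treating `MODEL` and `BASE_URL` as required while `API_KEY` stays optional is an assumption (a local Ollama endpoint, for example, typically needs no key). The actual `Settings.__init__` may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "BOOKEXTRACTOR_LLM_"
REQUIRED = ("MODEL", "BASE_URL")  # assumption: the API key is optional


@dataclass(frozen=True)
class RemoteLLMConfig:
    model: str
    base_url: str
    api_key: Optional[str] = None
    provider: str = "openai"  # default stated in the table above


def load_remote_config(env: Optional[Mapping[str, str]] = None) -> Optional[RemoteLLMConfig]:
    """Return the remote config, None if unset, or raise if incomplete."""
    env = os.environ if env is None else env
    keys = ("MODEL", "BASE_URL", "API_KEY", "PROVIDER")
    values = {key: env.get(PREFIX + key, "").strip() for key in keys}
    if not any(values.values()):
        return None  # nothing configured: caller keeps the local vLLM default
    missing = [PREFIX + key for key in REQUIRED if not values[key]]
    if missing:
        raise ValueError(
            "Incomplete remote LLM configuration; missing: " + ", ".join(missing)
        )
    return RemoteLLMConfig(
        model=values["MODEL"],
        base_url=values["BASE_URL"],
        api_key=values["API_KEY"] or None,
        provider=values["PROVIDER"] or "openai",
    )
```

Naming the missing variables in the exception message is one way to satisfy the "clear error message" criterion for partial configuration.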
Labels
type::feature, priority::high, group::backend
Related Issues/MRs
- Related to: (link any parent epics or related issues)
- Blocked by: (none)
- Blocks: (none)
Edited by Praneeth Ashish