Feat: Integrate anyLLM for Remote LLM Inference
What does this MR do and why?
Integrates the anyLLM library to enable remote HTTP-based LLM inference for text extraction, decoupling it from the local vLLM vision model. This allows the system to:
- Perform text-based metadata extraction via remote endpoints (Ollama /v1, HuggingFace Inference API, OpenAI, etc.) without requiring a local GPU.
- Gracefully degrade when local vLLM fails due to insufficient GPU memory (OOM): vision analysis is skipped, but text extraction continues via the remote backend.
- Maintain backward compatibility — local vLLM remains the default when no remote configuration is present.
Architecture
┌─────────────────────────────────────────────────────┐
│                 ExtractionPipeline                  │
│                                                     │
│  ┌──────────────────┐   ┌───────────────────────┐   │
│  │ llm_client       │   │ vision_client         │   │
│  │ (text only)      │   │ (image/vision only)   │   │
│  │                  │   │                       │   │
│  │ AnyLLMClient   ──┤   │ LocalVLLMClient       │   │
│  │ (remote HTTP)    │   │ (local vLLM, optional)│   │
│  │ or               │   │ or                    │   │
│  │ LocalVLLMClient  │   │ None (skipped)        │   │
│  │ (fallback)       │   │                       │   │
│  └──────────────────┘   └───────────────────────┘   │
└─────────────────────────────────────────────────────┘
Backend selection is automatic based on environment variables:
- If BOOKEXTRACTOR_LLM_MODEL, BOOKEXTRACTOR_LLM_BASE_URL, and BOOKEXTRACTOR_LLM_API_KEY are all set → AnyLLMClient (remote)
- Otherwise → LocalVLLMClient (local, fallback)
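The selection rule above can be sketched as a small function. The three variable names come from this MR; `select_text_backend` and `REQUIRED_REMOTE_VARS` are hypothetical names, and treating an empty string as "not set" is an assumption rather than confirmed behaviour.

```python
import os
from typing import Mapping

# Variables that must all be present for the remote backend to be chosen.
REQUIRED_REMOTE_VARS = (
    "BOOKEXTRACTOR_LLM_MODEL",
    "BOOKEXTRACTOR_LLM_BASE_URL",
    "BOOKEXTRACTOR_LLM_API_KEY",
)


def select_text_backend(env: Mapping[str, str] = os.environ) -> str:
    """Return the name of the text client class the rule would pick."""
    # Assumption: an empty value counts as unset.
    if all(env.get(name) for name in REQUIRED_REMOTE_VARS):
        return "AnyLLMClient"
    return "LocalVLLMClient"


example_env = {
    "BOOKEXTRACTOR_LLM_MODEL": "qwen2.5:7b",
    "BOOKEXTRACTOR_LLM_BASE_URL": "http://localhost:11434/v1",
    "BOOKEXTRACTOR_LLM_API_KEY": "ollama",
}
print(select_text_backend(example_env))  # AnyLLMClient
print(select_text_backend({}))           # LocalVLLMClient
```

Passing the environment as a mapping keeps the rule easy to unit test without touching the real process environment.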
Screenshots / Screencasts
(N/A — backend-only change)
MR Acceptance Checklist
General
- I have read the contributing guidelines
- I have added tests for new functionality
- I have updated the documentation (.env.example, architecture docs)
- I have followed the project's code style (ruff, mypy)
- All CI pipelines pass
Testing
- Unit tests: 212 tests pass
- New tests added: 15 tests in tests/test_llm_clients.py
- Coverage maintained at 91%
Security & Performance
- No secrets or API keys committed (.env is in .gitignore)
- API keys are masked in log output
- No regression in existing functionality
How to Set Up and Test This MR
1. Install dependencies
uv sync --extra remote-llm
2. Configure remote LLM (example: HuggingFace Inference API)
BOOKEXTRACTOR_LLM_MODEL=Qwen/Qwen2.5-7B-Instruct
BOOKEXTRACTOR_LLM_BASE_URL=https://api-inference.huggingface.co/v1
BOOKEXTRACTOR_LLM_API_KEY=hf_your-key-here
BOOKEXTRACTOR_LLM_PROVIDER=openai
3. Alternatively, configure remote LLM (example: local Ollama)
BOOKEXTRACTOR_LLM_MODEL=qwen2.5:7b
BOOKEXTRACTOR_LLM_BASE_URL=http://localhost:11434/v1
BOOKEXTRACTOR_LLM_API_KEY=ollama
BOOKEXTRACTOR_LLM_PROVIDER=openai
4. Run extraction
bookextractor extract input.json output.json
5. Run tests
uv run pytest tests/ -v
Known Limitations
- anyLLM v0.2.4 supports only 4 providers: openai, anthropic, ollama, llamacpp. Custom providers require the model string prefix format: openai/your-model-name.
- Vision/image analysis still requires local vLLM (GPU). If GPU memory is insufficient, vision analysis is skipped gracefully.
- The provider parameter is not passed as a kwarg to anyllm.chat(); instead, the model string is prefixed (e.g., openai/Qwen/Qwen2.5-7B-Instruct) because anyLLM's parse_model_string() extracts the provider from the segment before the first /.
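The prefixing described in the last item can be illustrated as below. Both helpers are hypothetical: `build_model_string` shows how the prefixed string could be assembled, and `split_provider` is a simplified stand-in for the behaviour attributed to parse_model_string() here, not anyLLM's actual implementation.

```python
from typing import Tuple


def build_model_string(provider: str, model: str) -> str:
    """Prefix the model name with the provider, e.g. 'openai/<model>'."""
    return f"{provider}/{model}"


def split_provider(model_string: str) -> Tuple[str, str]:
    """Split on the first '/' only, so slashes inside the model name survive."""
    provider, _, model = model_string.partition("/")
    return provider, model


model_string = build_model_string("openai", "Qwen/Qwen2.5-7B-Instruct")
print(model_string)                  # openai/Qwen/Qwen2.5-7B-Instruct
print(split_provider(model_string))  # ('openai', 'Qwen/Qwen2.5-7B-Instruct')
```

Splitting only on the first separator is what lets a HuggingFace-style name such as Qwen/Qwen2.5-7B-Instruct pass through intact after the provider is removed.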
Related Issues
- Closes: #10 (closed)
Edited by Praneeth Ashish