feat: cache categories and languages locally for offline read support
Summary
Add local caching for category and language discovery data so the CLI can continue to serve read-only discovery commands even when the API is temporarily unavailable or the user is offline.
Background
The roadmap includes 5.2 Cache & Offline Support, specifically:
- cache category/language lists locally
- work offline for read operations
- sync when online
The CLI already depends on category and language data for discovery and upload flows, but these reads currently appear to require live API access. This creates friction when the API is slow or unavailable, or when the user has intermittent connectivity.
Problem
Users cannot reliably access category and language discovery data without a live API connection. This affects:
- category discovery
- language discovery
- upload flows that depend on category/language selection
A lightweight local cache would improve usability and reduce repeated API calls for stable reference data.
Proposed Solution
Implement a cache layer for categories and languages:
- save category list locally after a successful API fetch
- save language list locally after a successful API fetch
- use cached data as fallback when online fetch fails
- optionally allow explicit cache refresh
- clearly indicate whether the output is from live API data or local cache
Commands / UX
Expected behavior:
- corpus-client categories
  - first tries API
  - falls back to cache if API fails
  - prints a note like "Using cached category data"
- corpus-client languages
  - first tries API
  - falls back to cache if API fails
  - prints a note like "Using cached language data"
Optional extensions:
- --refresh to force re-fetch and overwrite cache
- --offline to skip network calls and use cache only
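As a rough illustration of how the two optional flags could be wired up, here is a minimal sketch using Python's standard argparse. The CLI's actual argument-parsing framework is not stated in this issue, so build_parser and the subcommand layout are hypothetical. Treating --refresh and --offline as mutually exclusive is also an assumption, based on the fact that one forces a network call and the other forbids it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the two optional cache flags (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="corpus-client")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("categories", "languages"):
        cmd = sub.add_parser(name, help=f"list {name}")
        # Assumption: the flags contradict each other, so only one may be given.
        mode = cmd.add_mutually_exclusive_group()
        mode.add_argument("--refresh", action="store_true",
                          help="force a re-fetch and overwrite the cache")
        mode.add_argument("--offline", action="store_true",
                          help="skip network calls and use the cache only")
    return parser


args = build_parser().parse_args(["categories", "--offline"])
print(args.command, args.offline, args.refresh)
```

With this layout, passing both flags to the same command makes argparse exit with a usage error instead of silently picking one.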
Scope
In scope:
- local cache storage for categories
- local cache storage for languages
- fallback logic for offline/unavailable API
- cache metadata such as fetched_at
- clear terminal messaging about cache vs live data
- tests for cache behavior
Out of scope:
- full offline sync for uploads
- background sync
- caching mutable, user-specific resources such as records or profiles, unless separately planned
Implementation Notes
Suggested cache contents:
- categories
- languages
- fetched_at
- optional base_url to avoid mixing cache across environments
Suggested storage approach:
- store cache alongside CLI state/config files, or under a centralized state directory if that work is already in progress
- use JSON for readability and easy debugging
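To make the suggested contents and storage concrete, here is one possible shape for the cache file and its read/write helpers. This is a sketch only: save_cache, load_cache, and the single-file layout are assumptions, as is the choice to keep one fetched_at for both lists (per-list timestamps would be needed if the two lists are fetched at different times).

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_cache(path: Path, categories: list, languages: list, base_url: str) -> None:
    """Write the cache as JSON, replacing any previous file atomically."""
    payload = {
        "base_url": base_url,
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "categories": categories,
        "languages": languages,
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename it over the
    # target, so an interrupted write never leaves a half-written cache.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        json.dump(payload, tmp, indent=2)
    os.replace(tmp_name, path)


def load_cache(path: Path, base_url: str):
    """Return the cached payload, or None if it is missing, unreadable,
    or was written for a different base_url."""
    try:
        payload = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # ValueError covers invalid JSON and bad encoding
        return None
    if not isinstance(payload, dict) or payload.get("base_url") != base_url:
        return None
    return payload
```

Treating a corrupt or mismatched file as "no cache" keeps the caller's logic to a single None check.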
Suggested behavior:
- on successful fetch:
  - update cache
  - return live data
- on failed fetch:
  - if cache exists, return cached data with warning/info message
  - if cache does not exist, return a clear error
- if --offline is provided:
  - do not attempt API call
  - use cache only
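The behavior above could be sketched roughly as follows. The names read_with_fallback and CacheMissError are hypothetical, fetch stands in for whatever API call the CLI really makes, and to stay short the sketch caches only the item list, without fetched_at or base_url.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple


class CacheMissError(RuntimeError):
    """Raised when neither live data nor cached data is available."""


def read_with_fallback(
    fetch: Callable[[], List[dict]],
    cache_file: Path,
    offline: bool = False,
) -> Tuple[List[dict], str]:
    """Return (items, source), where source is "api" or "cache"."""
    if not offline:
        try:
            items = fetch()
        except Exception:  # real code should catch only the HTTP client's errors
            pass  # fall through to the cache
        else:
            cache_file.parent.mkdir(parents=True, exist_ok=True)
            cache_file.write_text(json.dumps(items), encoding="utf-8")
            return items, "api"
    try:
        items = json.loads(cache_file.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        raise CacheMissError(
            f"No usable cache at {cache_file}; connect to the API and retry."
        ) from None
    return items, "cache"


def unreachable_api() -> List[dict]:
    raise ConnectionError("API unreachable")  # simulates an outage


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "categories.json"
    cache_file.write_text(json.dumps([{"name": "news"}]), encoding="utf-8")
    items, source = read_with_fallback(unreachable_api, cache_file)
    if source == "cache":
        print("Using cached category data")
    print(items)
```

Returning the source alongside the data leaves the "live vs cached" message to the command handler, so the same helper can serve both categories and languages.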
Acceptance Criteria
- Categories are cached locally after a successful fetch
- Languages are cached locally after a successful fetch
- categories command can return cached data when API is unavailable
- languages command can return cached data when API is unavailable
- CLI clearly shows whether data is from API or cache
- Cache is scoped safely so data from one environment does not incorrectly appear for another
- Tests cover:
  - cache creation
  - cache reuse
  - API failure fallback
  - empty cache + offline error path
  - cache refresh behavior if implemented
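One way to satisfy the environment-scoping criterion is to derive the cache file name from the API base URL, so each environment gets its own file. The sketch below is an assumption, not existing code: cache_path_for, the reference-cache- file name prefix, and the normalisation rules are all hypothetical.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlsplit


def cache_path_for(state_dir: Path, base_url: str) -> Path:
    """Map an API base URL to its own cache file under state_dir."""
    parts = urlsplit(base_url.strip())
    # Normalise so trivial spelling differences share one cache file:
    # urlsplit lower-cases the scheme and hostname, and we drop a trailing slash.
    host = parts.hostname or ""
    port = f":{parts.port}" if parts.port else ""
    normalised = f"{parts.scheme}://{host}{port}{parts.path.rstrip('/')}"
    digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()[:12]
    return state_dir / f"reference-cache-{digest}.json"


state_dir = Path("state")
print(cache_path_for(state_dir, "https://api.example.test/v1/"))
print(cache_path_for(state_dir, "https://staging.example.test/v1"))
```

An alternative is a single cache file that stores base_url and is ignored on mismatch; separate files have the advantage that switching environments does not discard the other environment's cache.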
Possible File/Code Areas
Likely touched areas:
- CLI command handlers for categories and languages
- shared API fetch utilities
- state/config/cache management layer
- tests for command behavior and cache fallback
Testing Checklist
- Fetch categories online and confirm cache file is created
- Fetch languages online and confirm cache file is created
- Simulate API failure and verify cached data is shown
- Run with no network and no cache, and verify that a helpful error is shown
- Verify cache does not mix data across different API base URLs
- Verify the output clearly states whether the data came from the live API or the local cache