feat: add Grafana and Loki logging stack for persistent log monitoring

Summary

  • Added Loki (log store), Promtail (log shipper), and Grafana (dashboard) as Docker Compose services for persistent log monitoring
  • Promtail scrapes all container logs via the Docker socket using docker_sd_configs — no changes needed to existing services
  • Each log stream is labelled with service, container, logstream, and level (parsed from JSON lines) for fast Loki queries
  • Grafana auto-provisioned with Loki as the default datasource on startup
  • app/core/logging_config.py refactored to emit structured JSON lines via JsonFormatter. The output format is controlled by the LOG_FORMAT env var (default: json); set LOG_FORMAT=text for plain-text output
  • Loki configured with 31-day log retention and filesystem storage
  • Dev: monitoring services run under local/development profile (same as pgadmin/minio) — opt-in via --profile local
  • Prod (docker-compose.prod.yml): Loki and Grafana are bound to 127.0.0.1 only, so neither is reachable from outside the host
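
For reviewers who want a picture of the logging change, the following is a minimal sketch of a JSON-lines formatter switched by LOG_FORMAT. It is not the code in app/core/logging_config.py: the field names (timestamp, level, logger, message) and the build_handler helper are assumptions made for illustration. Only the JsonFormatter name, the LOG_FORMAT variable, and its json/text values come from this description.

```python
import json
import logging
import os
import sys
from typing import Optional


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        # Field names are illustrative; the real formatter may differ.
        payload = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_handler(log_format: Optional[str] = None) -> logging.Handler:
    """Hypothetical helper: pick a formatter from LOG_FORMAT (default json)."""
    fmt = (log_format or os.getenv("LOG_FORMAT", "json")).lower()
    handler = logging.StreamHandler(sys.stdout)
    if fmt == "text":
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
    else:
        handler.setFormatter(JsonFormatter())
    return handler
```

Attaching the handler with logging.getLogger().addHandler(build_handler()) would then write one JSON object per line to stdout, which is the shape a Promtail JSON pipeline stage can parse a level field from.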

New files

  • monitoring/loki/loki-config.yml — Loki single-binary, filesystem storage, 31-day retention
  • monitoring/promtail/promtail-config.yml — Docker SD scrape + JSON pipeline stage
  • monitoring/grafana/provisioning/datasources/loki.yml — auto-provision Loki datasource

Access (dev)

Service    URL
Grafana    http://localhost:3000 (admin / admin)
Loki API   http://localhost:3100

Test plan

  • docker compose --profile local up -d loki promtail grafana starts all 3 services
  • GET /ready on Loki → ready
  • Grafana health check → { "database": "ok" }
  • Loki labels API returns service, container, job, logstream
  • {job="corpus-backend"} query returns logs from all services (redis, app, celery, etc.)
  • {service="app"} query returns app container logs
  • LOG_FORMAT=text produces plain-text log output
  • docker compose --profile local stop loki promtail grafana stops cleanly without affecting main stack
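
The HTTP checks in the test plan could be scripted roughly as below. This is a sketch under assumptions, not part of the MR: the helper names are hypothetical, the URLs are the dev defaults listed under Access, and it relies on Loki's /ready, /loki/api/v1/labels and /loki/api/v1/query_range endpoints and Grafana's /api/health endpoint.

```python
import json
import urllib.parse
import urllib.request

LOKI = "http://localhost:3100"      # dev URL from the Access section
GRAFANA = "http://localhost:3000"   # dev URL from the Access section
EXPECTED_LABELS = {"service", "container", "job", "logstream"}


def query_range_url(base: str, logql: str, limit: int = 5) -> str:
    """Build a Loki query_range URL for a LogQL selector."""
    params = urllib.parse.urlencode({"query": logql, "limit": limit})
    return f"{base}/loki/api/v1/query_range?{params}"


def missing_labels(labels_response: dict) -> set:
    """Return the expected label names absent from a labels API response."""
    return EXPECTED_LABELS - set(labels_response.get("data", []))


def fetch(url: str) -> str:
    """GET a URL and return the decoded body (raises on HTTP errors)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode()


def run_checks() -> None:
    """Run the test-plan checks against a locally running stack."""
    assert fetch(f"{LOKI}/ready").strip() == "ready"
    assert json.loads(fetch(f"{GRAFANA}/api/health"))["database"] == "ok"

    labels = json.loads(fetch(f"{LOKI}/loki/api/v1/labels"))
    absent = missing_labels(labels)
    assert not absent, f"labels missing from Loki: {sorted(absent)}"

    for logql in ('{job="corpus-backend"}', '{service="app"}'):
        result = json.loads(fetch(query_range_url(LOKI, logql)))
        assert result["data"]["result"], f"no log streams for {logql}"
```

Calling run_checks() after the docker compose --profile local up -d step, and after the containers have had time to emit some logs, raises an AssertionError on the first failing check and returns silently otherwise.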

Checklist

  • Code follows project API guidelines
  • Documentation is updated (OpenAPI docs unaffected — infra-only change)
  • Code adheres to project coding standards

Closes #125
