Fix: Resolve CI pipeline failures — type checking, linting, and security scan
Summary
The fix/type-checking branch addresses multiple CI pipeline failures across the validate, lint, type-check, security, and test stages. The root cause is a combination of pyrefly type errors, ruff lint violations, bandit security scan findings, and CI jobs missing dev dependency installation.
Problem Statement
The CI pipeline on GitLab was failing across multiple stages:
| Stage | Failure |
|---|---|
| validate | pre-commit hooks ran without dev dependencies installed |
| lint | `ruff check` flagged a missing timeout on a `requests.post` call |
| type-check | pyrefly reported errors across endpoints, tasks, services, schemas, backfill scripts, and utility modules |
| security | bandit flagged 34 issues (21 Low, 13 Medium): OTP using `random` instead of `secrets`, missing request timeout, hardcoded tmp paths, subprocess usage, assert usage |
| test | pytest jobs lacked `libmagic1` and dev dependencies |
Proposed Solution
1. CI Pipeline Fixes (.gitlab-ci.yml)
- Run `uv sync --frozen --extra dev` in the `before_script` of every job that requires dev tools (validate, lint, security, format, test, pyupgrade) so dev dependencies are installed
- Install `libmagic1` in the pytest job for `python-magic` support
- Expand pyrefly scope from `app/ main.py` to `.` with `--project-excludes tests/ test_async_validation.py`
2. Security Fixes
- B311: Replace `random.randint` with `secrets.randbelow` in OTP generation (`app/services/otp_service.py`) and username suffix generation (`app/core/validators.py`) to use a cryptographically secure RNG
- B113: Add `timeout=30` to the `requests.post` call in `app/utils/create_categories.py`
- Bandit config: Add `[tool.bandit]` skips for false positives: B101 (assert), B106 (bearer token type), B108 (/tmp in Docker), B404/B603 (subprocess for ffmpeg)
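The B311 change can be sketched roughly as follows. This is a minimal illustration, assuming a fixed-width numeric OTP; `generate_otp` and its `digits` parameter are hypothetical names and not the actual code in `app/services/otp_service.py`.

```python
import secrets


def generate_otp(digits: int = 6) -> str:
    """Return a zero-padded numeric OTP drawn from the OS CSPRNG.

    Hypothetical helper for illustration only.
    """
    if digits < 1:
        raise ValueError("digits must be positive")
    # secrets.randbelow(n) returns an int in [0, n), so every code of the
    # requested width, including those with leading zeros, is equally likely.
    return f"{secrets.randbelow(10 ** digits):0{digits}d}"
```

Unlike `random.randint`, which uses a deterministic PRNG and triggers B311, `secrets.randbelow` draws from the operating system's secure randomness source.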
3. Type Checking Fixes (pyrefly)
- SQLModel `col()` usage: Replace direct column attribute access with `col(Model.column)` across all endpoint files, services, and backfill scripts for proper type inference
- Return type annotations: Add explicit return types to functions in `init_db.py`, `manage_database.py`, `manage_user_roles.py`, `setup_postgresql.py`, and the backfill scripts
- Type narrowing: Add `cast()` and `is not None` guards where the type checker cannot infer non-nullability (e.g., `Record.uid`, `RecordHistory.uid`, `Record.created_at`)
- `sa_column` vs `sa_type`: Fix `ARRAY(String)` field declarations in `record_history.py`
- `~` operator: Replace `~Model.field` with `not Model.field` and `col(Model.field).isnot(None)` for boolean column negation
- Literal types: Replace deprecated `const=True` with `Literal[...]` in validation schemas
- `max_items` → `max_length`: Fix constraint arguments deprecated in Pydantic v2
- `RoleEnum` string literals → enum members: Replace `roles=["admin"]` with `roles=[RoleEnum.admin]` across endpoints
- Async → sync: `upload_file_to_hetzner` and `HetznerStorageClient.upload_file` changed from async to sync (MinIO client is synchronous)
- Metadata type: Change `dict[str, str]` to `dict[str, Any]` for Hetzner storage metadata, with string conversion at upload time
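The type-narrowing and metadata items can be illustrated with a small standalone sketch. The `Record` dataclass below is a hypothetical stand-in for the SQLModel table, and `record_uid`, `created_year`, and `stringify_metadata` are illustrative names rather than functions from this branch.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Any, cast


@dataclass
class Record:
    # Hypothetical stand-in: both fields are optional in the type system
    # because the database populates them.
    uid: str | None = None
    created_at: datetime | None = None


def record_uid(record: Record) -> str:
    # An explicit guard narrows `str | None` to `str` for the type checker
    # and fails loudly at runtime if the value is actually missing.
    if record.uid is None:
        raise ValueError("record has no uid")
    return record.uid


def created_year(record: Record) -> int:
    # cast() only informs the type checker and performs no runtime check,
    # so it suits places where non-nullability is already guaranteed.
    return cast(datetime, record.created_at).year


def stringify_metadata(metadata: dict[str, Any]) -> dict[str, str]:
    # Accept arbitrary values, then convert them to strings at upload time.
    return {key: str(value) for key, value in metadata.items()}
```

A guard is the safer choice when a `None` is actually possible; `cast()` trades that runtime safety for brevity.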
4. Schema & Model Fixes
- `TokenResponse.username` made optional (`str | None`) to match the `app/api/auth.py` response
- `validate_age_from_birthdate` return type corrected from `date` to `str`
- `BulkHistoryRequest.max_items` → `max_length`
- Added `RoleEnum.system` to model and schema enums
5. Service & Logic Improvements
- `RecordHistoryService`: Add null guards for `record.uid`, wrap `list()` around `.exec().all()` calls, add `cast()` for `func.count()` expressions
- `StreakService`: Fix union subquery ordering, add null guards for `Record.uid` and `RecordHistory.uid`
- `StreakService.get_user_activity_timeline` → `get_daily_activity_counts`: Return richer activity data with contributions + edits counts
- `reports_backup.py`: Fix `user.uid` → `user.id`, add null guards for date comparisons
6. Backfill Module Improvements
- Typed stats dicts with `type` aliases (`Version0Stats`, `BackfillStats`, `ProgressMetadata`, `CounterMap`, etc.)
- Safe stat access with `isinstance` guards instead of direct dict mutation
- `StatisticsManager` refactored: separate `_counters`, `_errors`, `_custom_stats` for type safety
- Explicit `Session` type annotations on method parameters
- Replace `from backfill.core import BaseBackfiller` with direct imports to avoid lazy `__getattr__` issues
7. Script Fixes
- `seed_database.py`: Add `execute_delete()` helper for typed delete statements, fix `seed_initial_data` import alias
- `setup_postgresql.py`: Add `get_server_url()` guard for unset `DATABASE_URL`, add return type annotations
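One way a guard like `get_server_url()` could look is sketched below. The body is an assumption: it treats the "server URL" as `DATABASE_URL` with the database name (the path component) removed, which may differ from what `setup_postgresql.py` actually does.

```python
import os
from urllib.parse import urlsplit, urlunsplit


def get_server_url() -> str:
    """Return DATABASE_URL without its database path, or raise if unset."""
    database_url = os.environ.get("DATABASE_URL")
    if not database_url:
        # Guard: fail with a clear message instead of passing None downstream,
        # which also lets the type checker treat the value as `str` from here.
        raise RuntimeError("DATABASE_URL is not set")
    parts = urlsplit(database_url)
    # Assumption: drop the path (database name), query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "", "", ""))
```

Raising early keeps the `str | None` result of `os.environ.get` from leaking into code that expects a plain `str`.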
Files Changed
50 files changed, ~730 insertions, ~438 deletions across:
- `.gitignore`, `.gitlab-ci.yml`, `pyproject.toml`, `uv.lock`
- `app/api/v1/endpoints/` (6 files)
- `app/core/` (2 files)
- `app/models/` (6 files)
- `app/schemas/` (6 files)
- `app/services/` (3 files)
- `app/tasks/` (3 files)
- `app/utils/` (2 files)
- `backfill/` (8 files)
- Root scripts: `init_db.py`, `manage_database.py`, `manage_user_roles.py`, `setup_postgresql.py`, `scripts/seed_database.py`
Acceptance Criteria
- All CI stages pass (validate, format, lint, type-check, security, test)
- Bandit scan returns 0 issues (exit code 0)
- Ruff lint returns 0 errors
- Pyrefly type check returns 0 errors
- Pytest passes with ≥70% coverage (actual: 72.47%)
- 1716 tests pass, 254 skipped
Risks & Considerations
- `col()` migration: Changes how SQLModel column references are typed; no runtime behavior change, but worth monitoring in production queries
- `upload_file_to_hetzner` sync change: Callers that `await`ed this function will need updating (only the `records.py` endpoint calls it, and it is already updated)
- `RoleEnum.system`: New enum value; ensure no downstream consumers break
- Bandit skips: B101/B106/B108/B404/B603 are skipped via config, not `# nosec` comments; these are documented false positives for this codebase context
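To make the async-to-sync risk concrete, the sketch below shows what happens to a caller that still awaits a function that has become synchronous. The `upload_file_to_hetzner` body and signature here are placeholders, and the `asyncio.to_thread` variant is an optional alternative that is not part of this change.

```python
import asyncio


def upload_file_to_hetzner(path: str) -> str:
    # Placeholder synchronous stand-in; the real signature may differ.
    return f"uploaded:{path}"


async def stale_caller() -> str:
    # Leftover `await` from the async version: the call itself runs, then
    # awaiting the returned str raises TypeError.
    return await upload_file_to_hetzner("a.txt")


async def updated_caller() -> str:
    # Direct call after the change; this blocks the event loop while it runs.
    return upload_file_to_hetzner("a.txt")


async def offloaded_caller() -> str:
    # Optional alternative: run the blocking call in a worker thread.
    return await asyncio.to_thread(upload_file_to_hetzner, "a.txt")
```

A stale `await` fails only when the code path executes, so a type-check pass or a test that exercises the endpoint is the practical way to catch any missed caller.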