
feat: add compliance-only checks, fix detection bugs, and improve batch UI

Bikkumalla Sai Krishna requested to merge document_details into main

Overview

This MR refactors the GitLab Compliance Checker from a multi-purpose analytics tool into a focused, production-ready project compliance checker. It adds new checks, fixes broken detection, improves the UI, removes all unrelated code, and brings documentation and Docker up to date.


What Changed

New Checks Added

Documentation Files (docs_checker.py)

Checks whether each repository contains the 4 required documentation files:

  • README.md
  • CONTRIBUTING.md
  • USER_MANUAL.md
  • AGENTS.md

Features:

  • Shows a pass/fail status per file, with a warning listing any missing files
  • Missing files appear in the Suggestions section with fix instructions
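A minimal sketch of the documentation check. It assumes the root file names have already been fetched from GitLab, whereas the real docs_checker.py queries the API itself. `REQUIRED_DOCS`, `check_docs` and `missing_docs` are hypothetical names:

```python
from collections.abc import Iterable

# Hypothetical constant holding the four files named in this MR.
REQUIRED_DOCS = ("README.md", "CONTRIBUTING.md", "USER_MANUAL.md", "AGENTS.md")


def check_docs(root_files: Iterable[str]) -> dict[str, bool]:
    """Map each required doc to whether it is present at the repository root."""
    present = set(root_files)
    return {name: name in present for name in REQUIRED_DOCS}


def missing_docs(status: dict[str, bool]) -> list[str]:
    """Files to list in the warning and in the Suggestions section."""
    return [name for name, ok in status.items() if not ok]


status = check_docs(["README.md", "AGENTS.md", "app.py"])
print(missing_docs(status))  # ['CONTRIBUTING.md', 'USER_MANUAL.md']
```

In this sketch the match is exact and case-sensitive, so a file named readme.md would be reported as missing.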

Repository Health Files (docs_checker.py → check_repo_files())

Checks for 8 standard repository health files:

  • .gitignore
  • .editorconfig
  • CHANGELOG.md
  • SECURITY.md
  • CODE_OF_CONDUCT.md
  • .env.example
  • Dockerfile
  • .dockerignore

Features:

  • Shows a pass/fail status per file, with a reason for each
  • Displays a score (X/8 files present)
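The per-file status and the X/8 score could be computed along these lines. This is a sketch, not check_repo_files() itself: the function name `health_report` is hypothetical, the input is assumed to be a list of root file names, and the reason strings are illustrative placeholders because the MR does not quote the real ones.

```python
# Reason strings are illustrative placeholders; the MR does not quote the real ones.
HEALTH_FILES = {
    ".gitignore": "keeps build output and local secrets out of Git",
    ".editorconfig": "gives every editor the same indentation and line endings",
    "CHANGELOG.md": "records notable changes per release",
    "SECURITY.md": "explains how to report vulnerabilities",
    "CODE_OF_CONDUCT.md": "sets expectations for contributor behaviour",
    ".env.example": "documents required environment variables without real values",
    "Dockerfile": "defines a reproducible container build",
    ".dockerignore": "keeps unneeded files out of the Docker build context",
}


def health_report(root_files):
    """Return (file, present, reason) rows plus an 'X/8 files present' score string."""
    present = set(root_files)
    rows = [(name, name in present, reason) for name, reason in HEALTH_FILES.items()]
    found = sum(1 for _, ok, _ in rows if ok)
    return rows, f"{found}/{len(HEALTH_FILES)} files present"


rows, score = health_report([".gitignore", "Dockerfile", "README.md"])
print(score)  # 2/8 files present
```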

Pre-commit Hook Analysis (pipeline_checker.py → check_precommit())

Parses .pre-commit-config.yaml and categorises all configured hooks into 5 groups:

  • Lint
  • Format
  • Type Check
  • Security
  • Quality

Features:

  • Uses the same layout as the CI pipeline analysis
  • One column per category, listing detected hook names and any quality issues (🔴/🟡)
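A sketch of the categorisation step. check_precommit() parses .pre-commit-config.yaml first (PyYAML's yaml.safe_load is a typical choice). This sketch starts from the already-parsed dict so it runs without dependencies. The hook-id-to-category table and the name `categorise_hooks` are hypothetical:

```python
# Hypothetical hook-id -> category table; the checker's real lists are not shown in the MR.
HOOK_CATEGORIES = {
    "lint": {"ruff", "flake8", "pylint", "eslint", "shellcheck"},
    "format": {"ruff-format", "black", "isort", "prettier"},
    "type_check": {"mypy", "pyright"},
    "security": {"bandit", "detect-secrets", "gitleaks"},
    "quality": {"check-yaml", "end-of-file-fixer", "trailing-whitespace"},
}


def categorise_hooks(config: dict) -> dict[str, list[str]]:
    """Group hook ids from a parsed .pre-commit-config.yaml into the 5 categories."""
    found: dict[str, list[str]] = {category: [] for category in HOOK_CATEGORIES}
    for repo in config.get("repos", []):
        for hook in repo.get("hooks", []):
            hook_id = hook.get("id", "")
            for category, ids in HOOK_CATEGORIES.items():
                if hook_id in ids:
                    found[category].append(hook_id)
    return found


config = {
    "repos": [
        {"repo": "https://github.com/astral-sh/ruff-pre-commit",
         "hooks": [{"id": "ruff"}, {"id": "ruff-format"}]},
        {"repo": "https://github.com/pre-commit/mirrors-mypy",
         "hooks": [{"id": "mypy"}]},
    ]
}
print(categorise_hooks(config))
```

Empty categories come back as empty lists, which is the natural place for the UI to flag a missing hook type.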

Score Breakdown Expander

Collapsible section below the summary metrics showing all 8 scoring checks:

  • Each check shows:
    • Pass/fail status
    • Label
    • +12.5% or +0%
    • What the checker looks for
  • Appears in both single and batch project views
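The breakdown arithmetic is 8 checks at 12.5 percentage points each. A minimal sketch follows. `score_breakdown` is a hypothetical name, and the three "Placeholder" labels stand in for checks whose names the MR does not list.

```python
CHECK_WEIGHT = 100 / 8  # 8 checks, 12.5 percentage points each


def score_breakdown(results: dict[str, bool]) -> tuple[float, list[str]]:
    """Return the total score and one 'PASS/FAIL label: +N%' line per check."""
    if len(results) != 8:
        raise ValueError("expected exactly 8 checks")
    lines = [
        f"{'PASS' if ok else 'FAIL'} {label}: +{(CHECK_WEIGHT if ok else 0):g}%"
        for label, ok in results.items()
    ]
    return CHECK_WEIGHT * sum(results.values()), lines


results = {
    "AGPLv3 license": True,
    "Security scanning": True,
    "Coverage": False,
    "CI/CD": True,
    "Pre-commit": False,
    "Placeholder check 6": True,
    "Placeholder check 7": True,
    "Placeholder check 8": False,
}
total, lines = score_breakdown(results)
print(total)     # 62.5
print(lines[2])  # FAIL Coverage: +0%
```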

Bugs Fixed

Project type always "Unknown" → Quality & Tools showing nothing

Root cause:

_get_paginated passes params as **kwargs to gl.paginate(); when params included "path": "" it conflicted with glabflow internals and silently returned an empty list, so filenames was always [] and detect_project_type always returned "Unknown".

Fix:

Replaced the unreliable tree-listing approach with direct file probing using _get(). It first probes the repository root for:

  • pyproject.toml
  • requirements.txt
  • package.json
  • tsconfig.json

If none is found, it fetches the root tree once and probes one subdirectory level.
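The probing order can be sketched as follows. `file_exists` stands in for a _get() call that succeeds only when the file exists, and `list_root_dirs` stands in for the single root-tree fetch, so the example runs offline. `find_markers`, `classify` and the marker-to-type rules are hypothetical, not the real detect_project_type.

```python
from collections.abc import Callable, Iterable

# Probe order from this MR; the classification rules below are assumptions.
MARKERS = ("pyproject.toml", "requirements.txt", "package.json", "tsconfig.json")


def find_markers(
    file_exists: Callable[[str], bool],
    list_root_dirs: Callable[[], Iterable[str]],
) -> set[str]:
    """Probe the root first; only if nothing is found, probe each top-level directory."""
    found = {m for m in MARKERS if file_exists(m)}
    if found:
        return found  # root hit: the tree is never fetched
    for subdir in list_root_dirs():  # the root tree is fetched once, here
        found |= {m for m in MARKERS if file_exists(f"{subdir}/{m}")}
    return found


def classify(markers: set[str]) -> str:
    """Hypothetical mapping from marker files to a project type label."""
    if markers & {"pyproject.toml", "requirements.txt"}:
        return "Python"
    if "tsconfig.json" in markers:
        return "TypeScript"
    if "package.json" in markers:
        return "JavaScript"
    return "Unknown"


# Offline fake repository whose markers live one level down.
repo = {"backend/pyproject.toml", "frontend/package.json", "frontend/tsconfig.json"}
markers = find_markers(repo.__contains__, lambda: ["backend", "frontend", "docs"])
print(sorted(markers))    # ['package.json', 'pyproject.toml', 'tsconfig.json']
print(classify(markers))  # Python
```

Probing a handful of known names costs a few small requests, but it does not depend on how the client library forwards pagination parameters. That dependency is what broke the tree listing.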

Coverage not detected

Root cause:

The test job ran:

pytest -v --cov=. -n auto

The tool list had:

  • pytest --cov (exact string)
  • coverage (whole word)

Neither matched --cov=.

Fix:

  • Added "--cov" to coverage tools
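  • Changed contains_tool() to use plain substring matching for flag-style tools (those starting with -), because \b only matches between a word character and a non-word character, so it can never match before a dash that follows a space or starts the string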
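A sketch of the matching rule described above. The signature `contains_tool(text, tool)` is an assumption, since the MR names the function but not its parameters.

```python
import re


def contains_tool(text: str, tool: str) -> bool:
    """Return True if `tool` appears in the given CI script or config text."""
    if tool.startswith("-"):
        # \b needs a word character on one side; in " --cov" both the space and
        # the dash are non-word characters, so a \b-anchored pattern never matches.
        return tool in text
    # Plain tool names still use whole-word matching, so "ruff" does not match "ruffian".
    return re.search(rf"\b{re.escape(tool)}\b", text) is not None


script = "pytest -v --cov=. -n auto"
print(contains_tool(script, "--cov"))         # True
print(contains_tool(script, "coverage"))      # False
print(contains_tool(script, "pytest --cov"))  # False
print(re.search(r"\b--cov\b", script))        # None: the old word-boundary approach
```

Substring matching means "--cov" also matches "--cov-report" and "--coverage". Both are coverage flags, so that is harmless here. In this sketch, very short flags would risk false positives, so flag-style entries should stay specific.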

CI type_check stage not detected

Root cause:

The repository uses type-check (hyphen) in .gitlab-ci.yml, but EXPECTED_STAGES uses type_check (underscore), so the stage names never matched.

Fix:

Normalise all stage names extracted from the CI file by replacing - with _ (for example, type-check becomes type_check).
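A minimal sketch of the normalisation. The input is, for example, the top-level stages: list of a parsed .gitlab-ci.yml. The contents of `EXPECTED_STAGES` are an assumption modelled on the STAGE_TOOLS keys in this MR, and `missing_stages` is a hypothetical helper.

```python
# Assumed contents; only type_check is confirmed as an expected stage by this MR.
EXPECTED_STAGES = ("lint", "format", "type_check", "test", "coverage", "security")


def normalise_stage(name: str) -> str:
    """Map e.g. type-check to type_check so CI names line up with EXPECTED_STAGES."""
    return name.replace("-", "_")


def missing_stages(ci_stages: list[str]) -> list[str]:
    """Expected stages that do not appear in the CI file after normalisation."""
    present = {normalise_stage(s) for s in ci_stages}
    return [s for s in EXPECTED_STAGES if s not in present]


print(missing_stages(["lint", "type-check", "test"]))  # ['format', 'coverage', 'security']
```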

ruff not detected as a format tool

Root cause:

ruff was only in the lint tools list, even though ruff format is a valid formatter.

Fix:

Added ruff to STAGE_TOOLS["format"].
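With ruff in both lists, a single ruff command counts toward both stages. The sketch below uses a small illustrative excerpt of STAGE_TOOLS, not the real lists. It uses whole-word matching on tool names, and `stages_covered` is a hypothetical helper.

```python
import re

# Illustrative excerpt only; the real STAGE_TOOLS lists are much longer.
STAGE_TOOLS = {
    "lint": ["ruff", "flake8", "pylint"],
    "format": ["ruff", "black", "autopep8"],  # ruff is now listed here too
}


def stages_covered(script: str) -> list[str]:
    """Stages whose tool list contains a name that appears as a whole word in `script`."""
    return [
        stage
        for stage, tools in STAGE_TOOLS.items()
        if any(re.search(rf"\b{re.escape(t)}\b", script) for t in tools)
    ]


print(stages_covered("ruff format --check ."))  # ['lint', 'format']
print(stages_covered("black --check ."))        # ['format']
```

Matching is by tool name only, so in this sketch a job that runs only `ruff check .` also counts as format. Telling the two apart would require matching on the subcommand as well.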

Tool Detection Expanded (50+ tools added)

| Stage | Tools added |
| --- | --- |
| lint | pyflakes, pycodestyle, pydocstyle, prospector, semgrep, pylama, wemake, autoflake, tslint, oxlint, shellcheck, hadolint, yamllint, jsonlint, markdownlint, tflint, golangci-lint, staticcheck, sonarqube, codeclimate, megalinter |
| format | autopep8, yapf, blue, pyupgrade, shfmt, gofmt, rustfmt, ktlint, spotless, google-java-format |
| type_check | pytype, pyre, beartype, typeguard, ty, sorbet |
| test | nose, nose2, doctest, hypothesis, behave, robot, jasmine, karma, qunit, tape, testcafe, puppeteer, go test, cargo test |
| coverage | --coverage, --cov-report, coveralls, codecov, lcov, gcov, jacoco, cobertura, simplecov |
| security | detect-secrets, git-secrets, secretlint, trivy, checkov, safety, snyk, osv-scanner |

Also:

  • Python quality tools are now also detected in the CI YAML, not only in pyproject.toml and the pre-commit config
  • JavaScript tools are now also detected in the CI YAML, in addition to package.json

UI Improvements

Single project — tabbed → single scrollable page

Replaced the 7-tab layout with one scrollable page.

Section order:

  1. Metadata
  2. Documentation Files
  3. Repository Health Files
  4. Quality & Tools
  5. Security
  6. Testing
  7. Automation & CI
  8. Suggestions

Batch projects — flat dataframe → bordered cards

Each project now renders as a st.container(border=True) card with:

Row 1

  • Score colour icon:
    • 🟢 ≥75%
    • 🟡 ≥40%
    • 🔴 <40%
  • Project name
  • Branch analysed
  • Score %
  • Detected stack

Row 2

5 metric badges:

  • AGPLv3
  • Security
  • Coverage
  • CI/CD
  • Pre-commit

Rendered as st.metric columns.
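The score colour icon on each card follows directly from the thresholds above. `score_icon` is a hypothetical name for this sketch.

```python
def score_icon(score_pct: float) -> str:
    """Traffic-light icon for a project card: 🟢 ≥75%, 🟡 ≥40%, 🔴 <40%."""
    if score_pct >= 75:
        return "🟢"
    if score_pct >= 40:
        return "🟡"
    return "🔴"


# With 8 checks at 12.5% each, list the icon for every possible score.
for passed in range(9):
    print(passed, passed * 12.5, score_icon(passed * 12.5))
```

Because scores move in 12.5% steps, 6 or more passing checks (75% and up) give 🟢. Four or five passing checks (50% or 62.5%) give 🟡, and 3 or fewer (37.5% or less) give 🔴.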

Expandable Report

The 📋 View Full Compliance Report expander calls render_project_compliance_details(), so the full report is identical to the single project view.

Branch selection in batch analysis

  • Text input for branch name (default: main) shown above the Run button
  • Each project checks if the selected branch exists
  • Falls back to that project's default branch if not
  • The branch actually used is shown on each project card
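The fallback rule can be sketched as a pure function. In the app, each project's branch list and default branch come from GitLab. Here they are passed in as a mapping so the example runs offline. `resolve_branches` and the project paths are hypothetical.

```python
def resolve_branches(
    requested: str, projects: dict[str, tuple[set[str], str]]
) -> dict[str, str]:
    """Pick the branch to analyse for each project.

    `projects` maps a project path to (existing branch names, default branch).
    The requested branch is used where it exists; otherwise the default branch.
    """
    requested = requested.strip()
    return {
        path: requested if requested in branches else default
        for path, (branches, default) in projects.items()
    }


projects = {
    "group/api": ({"main", "develop"}, "main"),
    "group/legacy": ({"master"}, "master"),
}
print(resolve_branches("develop", projects))
# {'group/api': 'develop', 'group/legacy': 'master'}
```

Returning the chosen branch per project, rather than a single flag, is what lets each card show the branch actually analysed.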

Removed — Non-Compliance Code

Features removed entirely

  • User Profile Overview mode
  • Team Leaderboard mode
  • Batch MR/Issue Analytics mode (the old services/batch/ system)
  • Sidebar mode selector (app goes straight to compliance on load)

Source files deleted (26 files)

infrastructure/gitlab/

  • batch
  • commits
  • config
  • description_quality
  • files_reader
  • groups
  • issues
  • merge_requests
  • network
  • parse_uvlock
  • retry_helper
  • users

services/batch/

Entire folder (8 files)

services/issues/

Entire folder

services/profile/

Entire folder

services/compliance/

  • classification.py
  • compliance_checks.py (unused wrappers)

ui/

  • batch.py
  • issues.py
  • leaderboard.py
  • profile.py

Dead code removed from client.py (~300 lines)

Removed:

  • _ZERO_ROW
  • _ZERO_ISSUE_ROW
  • _evaluate_single_mr()
  • _fetch_user_mrs()
  • _batch_evaluate_mrs_async()
  • batch_evaluate_mrs()
  • _evaluate_single_issue()
  • _fetch_user_issues()
  • _batch_evaluate_issues_async()
  • batch_evaluate_issues()
  • _evaluate_single_mr_efficiently()
  • _evaluate_single_issue_efficiently()
  • batch_evaluate_mrs_efficiently()
  • batch_evaluate_issues_efficiently()

Unused imports removed:

  • re
  • datetime
  • timezone

Test files deleted (30 files)

All tests for removed features:

  • batch
  • MR
  • profile
  • leaderboard
  • issues
  • utility tests for deleted modules

Scripts deleted

  • diagnose_review.py
  • fix_imports_v2.py
  • generate_report.py
  • parse_uvlock.py
  • verify_batch_users.py
  • verify_contribution_fix.py
  • verify_data.py

Assets deleted

  • assets/badges/ (13 SVGs)
  • assets/*.png (9 images)

Both were referenced only by the deleted UI files.


Docker & Infrastructure

Dockerfile (updated)

  • Upgraded base image from python:3.11-slim to python:3.13-slim
  • Multi-stage build:
    • Builder stage installs deps
    • Runtime stage copies only .venv + app source
    • No build tools in final image
  • Copies only app.py and src/, instead of using COPY . .
  • Added:
    • PYTHONUNBUFFERED
    • PYTHONDONTWRITEBYTECODE
    • Streamlit env vars (headless, no telemetry)
  • Added a HEALTHCHECK that hits /_stcore/health every 30s

docker-compose.yml (new)

  • Reads credentials from .env automatically via env_file
  • restart: unless-stopped
  • Same health check as Dockerfile
  • One-command startup: docker compose up --build

Documentation

| File | Change |
| --- | --- |
| README.md | Full rewrite: added Table of Contents, How It Works flow diagram, Compliance Score section, full project structure tree, Docker Compose quick start |
| CONTRIBUTING.md | Updated tool paths, removed obsolete babel/messages.pot references, fixed vulture path to src/ |
| CHANGELOG.md | Added [2.0.0] entry documenting all changes |
| USER_MANUAL.md | Created: step-by-step user guide covering single/batch analysis, reading the report, troubleshooting |
| AGENTS.md | Created: AI agent context covering architecture, data flow, conventions for adding checks/tools, score formula, test structure, what not to add |

Test Plan

  • uv run pytest — 37 tests pass, 0 failures
  • Single project: enter a GitLab URL → fetch branches → run analysis → verify all sections render
  • Verify score breakdown shows correct pass/fail for each of the 8 checks
  • Verify type-check stage in CI yaml is correctly recognised as type_check
  • Verify --cov flag in test script is detected as coverage
  • Verify pre-commit hook analysis shows correct categories and hook names
  • Batch: enter 2+ project URLs, set branch, run → verify cards render with correct metrics
  • Batch: set a non-existent branch → verify fallback to default branch and card shows actual branch used
  • Click "View Full Compliance Report" on a batch card → verify full report renders
  • docker compose up --build → verify app loads at http://localhost:8501
