Bikkumalla Sai Krishna requested to merge document_details into main May 30, 2026

Overview

This MR refactors the GitLab Compliance Checker from a multi-purpose analytics tool into a focused, production-ready project compliance checker. It adds new checks, fixes broken detection, improves the UI, removes all unrelated code, and brings documentation and Docker up to date.

What Changed

New Checks Added

Documentation Files (`docs_checker.py`)

Checks whether each repository contains the 4 required documentation files:

README.md
CONTRIBUTING.md
USER_MANUAL.md
AGENTS.md

Features:

Shows ✅/❌ per file with a warning listing which are missing
Missing files appear in the Suggestions section with fix instructions

Repository Health Files (`docs_checker.py → check_repo_files()`)

Checks for 8 standard repository health files:

.gitignore
.editorconfig
CHANGELOG.md
SECURITY.md
CODE_OF_CONDUCT.md
.env.example
Dockerfile
.dockerignore

Features:

Shows per-file ✅/❌ with a reason for each
Displays a score (X/8 files present)
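The presence check and the X/8 score can be sketched as below. This is a minimal illustration, not the code in docs_checker.py: the function name `repo_health` and the idea of passing in a list of root-level filenames are assumptions, and the per-file reasons are omitted.

```python
# The eight health files listed in this MR.
REPO_HEALTH_FILES = (
    ".gitignore",
    ".editorconfig",
    "CHANGELOG.md",
    "SECURITY.md",
    "CODE_OF_CONDUCT.md",
    ".env.example",
    "Dockerfile",
    ".dockerignore",
)


def repo_health(root_files):
    """Return (per-file status, score string); hypothetical helper."""
    present = set(root_files)
    status = {name: name in present for name in REPO_HEALTH_FILES}
    found = sum(status.values())
    return status, f"{found}/{len(REPO_HEALTH_FILES)} files present"


status, score = repo_health([".gitignore", "Dockerfile", "README.md"])
for name, ok in status.items():
    print("✅" if ok else "❌", name)
print(score)
```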

Pre-commit Hook Analysis (`pipeline_checker.py → check_precommit()`)

Parses .pre-commit-config.yaml and categorises all configured hooks into 5 groups:

Lint
Format
Type Check
Security
Quality

Features:

Displayed in the same layout as the CI pipeline analysis
Columns per category with detected hook names and quality issues (🔴/🟡)
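A rough sketch of the grouping step follows. It assumes the YAML has already been parsed into a dict with the standard `repos` / `hooks` / `id` layout. The `HOOK_CATEGORIES` mapping and the function name `categorise_hooks` are illustrative assumptions, not the actual tables in pipeline_checker.py.

```python
# Hypothetical mapping from hook id to category; the real lists may differ.
HOOK_CATEGORIES = {
    "lint": {"ruff", "flake8", "pylint"},
    "format": {"black", "ruff-format", "isort"},
    "type_check": {"mypy", "pyright"},
    "security": {"bandit", "detect-secrets", "gitleaks"},
    "quality": {"vulture", "check-yaml", "end-of-file-fixer"},
}


def categorise_hooks(config):
    """Group hook ids from a parsed .pre-commit-config.yaml by category."""
    grouped = {category: [] for category in HOOK_CATEGORIES}
    for repo in config.get("repos", []):
        for hook in repo.get("hooks", []):
            hook_id = hook.get("id", "")
            for category, known_ids in HOOK_CATEGORIES.items():
                if hook_id in known_ids:
                    grouped[category].append(hook_id)
    return grouped


example_config = {
    "repos": [
        {"repo": "example-a", "hooks": [{"id": "ruff"}, {"id": "ruff-format"}]},
        {"repo": "example-b", "hooks": [{"id": "mypy"}, {"id": "unknown-hook"}]},
    ]
}
print(categorise_hooks(example_config))
```

Hooks that match no category are simply ignored in this sketch; how the real checker treats them is not stated here.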

Score Breakdown Expander

Collapsible section below the summary metrics showing all 8 scoring checks:

Each check shows:
- ✅/❌
- Label
- +12.5% or +0%
- What the checker looks for
Appears in both single and batch project views
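The arithmetic behind the breakdown is simple: eight checks at equal weight give 12.5% each. A minimal sketch, assuming equal weights and using placeholder labels (`check_1` ... `check_8`) because the real check names are not listed here:

```python
def score_breakdown(results):
    """Return (total score, display lines) for equally weighted checks."""
    weight = 100 / len(results)  # 12.5 when there are 8 checks
    total = 0.0
    lines = []
    for label, passed in results.items():
        earned = weight if passed else 0
        total += earned
        icon = "✅" if passed else "❌"
        lines.append(f"{icon} {label} +{earned:g}%")
    return total, lines


# Placeholder labels; six of the eight checks pass.
results = {f"check_{i}": i <= 6 for i in range(1, 9)}
total, lines = score_breakdown(results)
print(f"Score: {total:g}%")
for line in lines:
    print(line)
```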

Bugs Fixed

Project type always "Unknown" → Quality & Tools showing nothing

Root cause:

_get_paginated passes params as **kwargs to gl.paginate(). When params included "path": "", the key conflicted with glabflow internals and the call silently returned an empty list. As a result, filenames was always [] and detect_project_type always returned "Unknown".

Fix:

Replaced the unreliable tree-listing approach with direct file probing using _get():

Tries pyproject.toml
Tries requirements.txt
Tries package.json
Tries tsconfig.json directly at root
If none found, fetches root tree once and probes one subdirectory level
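The probing order can be sketched as follows. The two callables stand in for the real _get() and tree-listing calls, whose signatures are not shown in this MR. The name `probe_project_type` and the marker-to-type labels are assumptions made for illustration.

```python
# Marker files in the order they are probed; the type labels are assumed.
MARKERS = (
    ("pyproject.toml", "Python"),
    ("requirements.txt", "Python"),
    ("package.json", "JavaScript"),
    ("tsconfig.json", "TypeScript"),
)


def probe_project_type(file_exists, list_root_dirs):
    """Probe the root first, then one subdirectory level; hypothetical helper."""
    for marker, kind in MARKERS:
        if file_exists(marker):
            return kind
    # Fallback: fetch the root tree once and probe each first-level directory.
    for directory in list_root_dirs():
        for marker, kind in MARKERS:
            if file_exists(f"{directory}/{marker}"):
                return kind
    return "Unknown"


# In-memory stand-in for a repository whose app lives in a subdirectory.
repo_paths = {"README.md", "backend/pyproject.toml"}
print(probe_project_type(repo_paths.__contains__, lambda: ["backend", "docs"]))
```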

Coverage not detected

Root cause:

The test job ran:

pytest -v --cov=. -n auto

The tool list had:

pytest --cov (exact string)
coverage (whole word)

Neither matched --cov=.

Fix:

Added "--cov" to coverage tools
Changed contains_tool() to use plain substring matching for flag-style tools (those starting with -) since \b word boundaries don't work before dashes
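The matching rule can be illustrated with the sketch below. The real contains_tool() signature is not shown in this MR, so the two-argument form is an assumption. The point is that `\b` needs a word character on one side, so it never matches between a space and a dash.

```python
import re


def contains_tool(script, tool):
    """Substring match for flag-style tools, whole-word match otherwise."""
    if tool.startswith("-"):
        # \b cannot match before a dash, so flags use plain substring matching.
        return tool in script
    return re.search(rf"\b{re.escape(tool)}\b", script) is not None


script = "pytest -v --cov=. -n auto"
print(contains_tool(script, "--cov"))         # flag-style tool
print(contains_tool(script, "pytest --cov"))  # old exact-string entry
print(contains_tool(script, "coverage"))      # whole-word entry
```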

CI type_check stage not detected

Root cause:

Repo uses type-check (hyphen) in .gitlab-ci.yml but EXPECTED_STAGES uses type_check (underscore); stage names never matched.

Fix:

Normalise all stage names extracted from the CI file by replacing hyphens (-) with underscores (_).
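A minimal sketch of the normalisation, assuming stage names arrive as a list of strings. The contents of `EXPECTED_STAGES` shown here are taken from the stage names used elsewhere in this MR and may not match the real constant exactly. The helper name `normalise_stages` is hypothetical.

```python
# Assumed contents; the real EXPECTED_STAGES may differ.
EXPECTED_STAGES = {"lint", "format", "type_check", "test", "coverage", "security"}


def normalise_stages(stage_names):
    """Replace hyphens with underscores so type-check matches type_check."""
    return [name.replace("-", "_") for name in stage_names]


ci_stages = ["lint", "type-check", "test"]
detected = EXPECTED_STAGES & set(normalise_stages(ci_stages))
print(sorted(detected))
```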

ruff not detected as a format tool

Root cause:

ruff was only in the lint tools list; ruff format is a valid formatter.

Fix:

Added ruff to STAGE_TOOLS["format"].

Tool Detection Expanded (50+ tools added)

Stage	Added
lint	pyflakes, pycodestyle, pydocstyle, prospector, semgrep, pylama, wemake, autoflake, tslint, oxlint, shellcheck, hadolint, yamllint, jsonlint, markdownlint, tflint, golangci-lint, staticcheck, sonarqube, codeclimate, megalinter
format	autopep8, yapf, blue, pyupgrade, shfmt, gofmt, rustfmt, ktlint, spotless, google-java-format
type_check	pytype, pyre, beartype, typeguard, ty, sorbet
test	nose, nose2, doctest, hypothesis, behave, robot, jasmine, karma, qunit, tape, testcafe, puppeteer, go test, cargo test
coverage	--coverage, --cov-report, coveralls, codecov, lcov, gcov, jacoco, cobertura, simplecov
security	detect-secrets, git-secrets, secretlint, trivy, checkov, safety, snyk, osv-scanner

Also:

Python quality tools now also checked in CI yaml (not only in pyproject/pre-commit)
JS tools checked in CI yaml in addition to package.json

UI Improvements

Single project — tabbed → single scrollable page

Replaced the 7-tab layout with one scrollable page.

Section order:

Metadata
Documentation Files
Repository Health Files
Quality & Tools
Security
Testing
Automation & CI
Suggestions

Batch projects — flat dataframe → bordered cards

Each project now renders as a st.container(border=True) card with:

Row 1

Score colour icon:
- 🟢 ≥75%
- 🟡 ≥40%
- 🔴 <40%
Project name
Branch analysed
Score %
Detected stack
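The colour thresholds for the score icon reduce to two comparisons. A small sketch, with `score_icon` as a hypothetical helper name:

```python
def score_icon(score):
    """Map a percentage score to the card's colour icon."""
    if score >= 75:
        return "🟢"
    if score >= 40:
        return "🟡"
    return "🔴"


for value in (87.5, 75, 62.5, 40, 37.5):
    print(value, score_icon(value))
```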

Row 2

5 metric badges:

AGPLv3
Security
Coverage
CI/CD
Pre-commit

Rendered as st.metric columns.

Expandable Report

An expander labelled "📋 View Full Compliance Report" calls render_project_compliance_details(), so its output is identical to the single project view.

Branch selection in batch analysis

Text input for branch name (default: main) shown above the Run button
Each project checks if the selected branch exists
Falls back to that project's default branch if not
The branch actually used is shown on each project card
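The fallback rule can be sketched as below. How the app fetches a project's branch list and default branch is not shown here, so both are passed in as plain values and `resolve_branch` is a hypothetical name.

```python
def resolve_branch(requested, available_branches, default_branch):
    """Use the requested branch if it exists, else the project's default."""
    if requested in available_branches:
        return requested
    return default_branch


print(resolve_branch("main", {"main", "develop"}, "main"))
print(resolve_branch("main", {"master", "develop"}, "master"))
```

The returned value is what each project card would display as the branch analysed.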

Removed — Non-Compliance Code

Features removed entirely

User Profile Overview mode
Team Leaderboard mode
Batch MR/Issue Analytics mode (the old services/batch/ system)
Sidebar mode selector (app goes straight to compliance on load)

Source files deleted (26 files)

infrastructure/gitlab/

batch
commits
config
description_quality
files_reader
groups
issues
merge_requests
network
parse_uvlock
retry_helper
users

services/batch/

Entire folder (8 files)

services/issues/

Entire folder

services/profile/

Entire folder

services/compliance/

classification.py
compliance_checks.py (unused wrappers)

ui/

batch.py
issues.py
leaderboard.py
profile.py

Dead code removed from client.py (~300 lines)

Removed:

_ZERO_ROW
_ZERO_ISSUE_ROW
_evaluate_single_mr()
_fetch_user_mrs()
_batch_evaluate_mrs_async()
batch_evaluate_mrs()
_evaluate_single_issue()
_fetch_user_issues()
_batch_evaluate_issues_async()
batch_evaluate_issues()
_evaluate_single_mr_efficiently()
_evaluate_single_issue_efficiently()
batch_evaluate_mrs_efficiently()
batch_evaluate_issues_efficiently()

Unused imports removed:

re
datetime
timezone

Test files deleted (30 files)

All tests for removed features:

batch
MR
profile
leaderboard
issues
utility tests for deleted modules

Scripts deleted

diagnose_review.py
fix_imports_v2.py
generate_report.py
parse_uvlock.py
verify_batch_users.py
verify_contribution_fix.py
verify_data.py

Assets deleted

assets/badges/

13 SVGs

assets/*.png

9 images

Only referenced by deleted UI files.

Docker & Infrastructure

Dockerfile (updated)

Upgraded base image from python:3.11-slim to python:3.13-slim
Multi-stage build:
- Builder stage installs deps
- Runtime stage copies only .venv + app source
- No build tools in final image
Copies only:
- app.py
- src/
Instead of:
```
COPY . .
```
Added:
- PYTHONUNBUFFERED
- PYTHONDONTWRITEBYTECODE
- Streamlit env vars (headless, no telemetry)
Added HEALTHCHECK hitting /_stcore/health every 30s

docker-compose.yml (new)

Reads credentials from .env automatically via env_file
restart: unless-stopped
Same health check as Dockerfile
One-command startup:

docker compose up --build

Documentation

File	Change
README.md	Full rewrite — added Table of Contents, How It Works flow diagram, Compliance Score section, full project structure tree, Docker Compose quick start
CONTRIBUTING.md	Updated tool paths, removed obsolete babel/messages.pot references, fixed vulture path to src/
CHANGELOG.md	Added [2.0.0] entry documenting all changes
USER_MANUAL.md	Created — step-by-step user guide covering single/batch analysis, reading the report, troubleshooting
AGENTS.md	Created — AI agent context: architecture, data flow, conventions for adding checks/tools, score formula, test structure, what not to add

feat: add compliance-only checks, fix detection bugs, and improve batch UI
