feat(backend): add skip, skip_reason and category fields to story dataset segments

Description

This merge request adds three optional fields (skip, skip_reason, and category) to the segments model used in extracted text validations. The fields apply only when dataset = story.

New fields added to segments:

  • skip (boolean) — marks whether a segment should be skipped
  • skip_reason (enum) — "Unclear page" or "Other"
  • category (enum) — "poem", "story", "interview", "article"

Validation rule: if skip = true, at least one of skip_reason or category must be present.
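
For reviewers, the rule can be sketched roughly as follows. This is a minimal standard-library illustration, not the code in this merge request: the Segment, SkipReason, and Category names are hypothetical, the real model likely has more fields, and the dataset = story scoping is not shown.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SkipReason(str, Enum):
    UNCLEAR_PAGE = "Unclear page"
    OTHER = "Other"


class Category(str, Enum):
    POEM = "poem"
    STORY = "story"
    INTERVIEW = "interview"
    ARTICLE = "article"


@dataclass
class Segment:
    # Hypothetical stand-in for the segments model; only the new fields appear.
    skip: Optional[bool] = None
    skip_reason: Optional[SkipReason] = None
    category: Optional[Category] = None

    def __post_init__(self) -> None:
        # Coerce raw strings to enum members; unknown values raise ValueError.
        if self.skip_reason is not None:
            self.skip_reason = SkipReason(self.skip_reason)
        if self.category is not None:
            self.category = Category(self.category)
        # Rule from the description: a skipped segment needs a reason or a category.
        if self.skip is True and self.skip_reason is None and self.category is None:
            raise ValueError("skip=True requires skip_reason or category")
```

With this sketch, Segment(skip=True, category="poem") is accepted, while Segment(skip=True) raises ValueError. All three fields stay optional when skip is false or absent.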

An Alembic migration is included to add the 3 new columns to the extractedtext table in the database.


Checklist

  • The feature has been fully implemented.
  • Tests for the new feature are included and passing.
  • User documentation/guides have been updated (if applicable).
  • Impact on existing functionality has been considered.

Related Issue(s)

Closes #148
