[Feature] Add skip, skip_reason and category fields to segments when dataset type is story
🚀 Feature Request
Is your feature request related to a problem? Please describe.
When dataset = story, proofreaders have no way to mark a segment as skippable or to record what type of content it contains. As a result, they cannot flag unclear pages or categorize story content (poem, interview, article, etc.) during proofreading, which lowers overall corpus quality.
Describe the solution you'd like
Add 3 new optional fields to the segments model in extracted text validations, but ONLY when dataset = story:
- skip: boolean (optional). Marks whether this segment should be skipped.
- skip_reason: enum string (optional). Valid values:
  - "Unclear page"
  - "Other"
- category: enum string (optional). Valid values:
  - "poem"
  - "story"
  - "interview"
  - "article"
Validation Rule: If skip = true, at least one of skip_reason or category must be present. Both may be present. Example of a valid segment:
{ "skip": true, "skip_reason": "Unclear page" }
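The rule above could be sketched roughly as follows. This is a minimal, framework-free illustration, not the actual backend validator: the function name `validate_segment`, the constant names, and the choice to return a list of error strings are all assumptions. The issue also does not say what should happen if these fields appear on a non-story dataset, so the sketch simply skips the checks in that case.

```python
from typing import Any, Mapping

# Allowed values copied from the field list in this issue.
# Tuples are used so that unhashable input (e.g. a list) fails cleanly.
SKIP_REASONS = ("Unclear page", "Other")
CATEGORIES = ("poem", "story", "interview", "article")


def validate_segment(segment: Mapping[str, Any], dataset: str) -> list[str]:
    """Hypothetical helper: return error messages, empty list if valid."""
    errors: list[str] = []

    # Assumption: the new checks run only for story datasets;
    # other dataset types are left untouched by this sketch.
    if dataset != "story":
        return errors

    skip = segment.get("skip")
    skip_reason = segment.get("skip_reason")
    category = segment.get("category")

    # All three fields are optional, so only validate them when present.
    if skip is not None and not isinstance(skip, bool):
        errors.append("skip must be a boolean")
    if skip_reason is not None and skip_reason not in SKIP_REASONS:
        errors.append(f"skip_reason must be one of {SKIP_REASONS}")
    if category is not None and category not in CATEGORIES:
        errors.append(f"category must be one of {CATEGORIES}")

    # Core rule: skip = true needs skip_reason, category, or both.
    if skip is True and skip_reason is None and category is None:
        errors.append("skip = true requires skip_reason or category")

    return errors
```

Under these assumptions, a segment with none of the three fields still passes, which matches the requirement that existing records remain valid.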
Describe alternatives you've considered
- Add fields to all dataset types — rejected because skip and category are only meaningful for story datasets.
- Make both skip_reason and category required when skip=true — rejected because either one alone is sufficient to explain the reason for skipping.
Additional Context
- Changes are limited to backend extracted text validations
- All 3 fields are optional by default
- Validation constraint applies only when dataset = story
- Existing records without these fields must remain valid
- Branch: feat/story-segment-skip-and-category-fields