Optimize Record History System
Current System Analysis
Your current record history system uses 5 tables:
-
RecordHistory- Main change tracking -
RecordFieldHistory- Field-level details -
RecordVersion- Lightweight version tracking -
RecordSnapshot- Periodic snapshots -
RecordRestore- Restoration operations
Problems Identified:
- High storage overhead (5x data duplication)
- Complex query patterns across multiple tables
- Performance degradation with large datasets
- Maintenance complexity
Proposed Solution: Unified History System
Phase 1: Schema Consolidation
1.1 Merge RecordHistory + RecordFieldHistory
-- New unified RecordHistory table
CREATE TABLE record_history_v2 (
uid UUID PRIMARY KEY DEFAULT gen_random_uuid(),
record_id UUID NOT NULL REFERENCES record(uid),
version_number INTEGER NOT NULL CHECK (version_number >= 1),
change_type VARCHAR(20) NOT NULL,
change_source VARCHAR(20) NOT NULL,
changed_by UUID NOT NULL REFERENCES user(id),
-- Unified field changes (JSON)
field_changes JSONB NOT NULL DEFAULT '{}',
-- Format: {
-- "title": {"old": "Old Title", "new": "New Title", "reason": "Clarification"},
-- "description": {"old": "...", "new": "...", "reason": "Added details"}
-- }
-- Complete record snapshot (optional for major versions)
full_snapshot JSONB,
-- Metadata
change_reason VARCHAR(500),
validation_errors JSONB,
change_metadata JSONB,
-- Timestamps
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
-- Constraints
UNIQUE(record_id, version_number)
);
1.2 Simplify RecordVersion
-- Lightweight version tracking
CREATE TABLE record_version_v2 (
record_id UUID PRIMARY KEY REFERENCES record(uid),
current_version INTEGER NOT NULL DEFAULT 1,
total_changes INTEGER NOT NULL DEFAULT 0,
last_changed_by UUID REFERENCES user(id),
last_change_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
-- Statistics
user_edits INTEGER DEFAULT 0,
admin_edits INTEGER DEFAULT 0,
system_updates INTEGER DEFAULT 0,
-- Performance optimization
last_major_version INTEGER DEFAULT 1,
last_snapshot_at TIMESTAMPTZ
);
1.3 Replace RecordSnapshot with Smart Snapshots
-- Periodic snapshots (only for major versions)
CREATE TABLE record_major_snapshot (
uid UUID PRIMARY KEY DEFAULT gen_random_uuid(),
record_id UUID NOT NULL REFERENCES record(uid),
version_number INTEGER NOT NULL,
full_data JSONB NOT NULL,
snapshot_type VARCHAR(20) NOT NULL, -- 'major', 'quarterly', 'pre_review'
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(record_id, version_number)
);
Phase 2: PostgreSQL Native Versioning Integration
2.1 Implement Temporal Tables
-- Enable temporal tracking on record table
ALTER TABLE record
ADD COLUMN valid_from TIMESTAMPTZ NOT NULL DEFAULT NOW(),
ADD COLUMN valid_to TIMESTAMPTZ DEFAULT 'infinity';
-- Create temporal history view
CREATE OR REPLACE VIEW record_temporal AS
SELECT r.*,
rh.version_number,
rh.change_type,
rh.changed_by,
rh.created_at as version_created_at
FROM record r
LEFT JOIN record_history_v2 rh ON r.uid = rh.record_id
WHERE r.valid_to = 'infinity';
2.2 Use PostgreSQL Generated Columns
-- Auto-calculate version metadata
ALTER TABLE record_version_v2
ADD COLUMN change_frequency INTEGER GENERATED ALWAYS AS (
total_changes / EXTRACT(EPOCH FROM (NOW() - created_at)) / 86400
) STORED;
Phase 3: Archiving Strategy
3.1 Tiered Storage Approach
-- Archive table for old history
CREATE TABLE record_history_archive (
LIKE record_history_v2 INCLUDING ALL
) PARTITION BY RANGE (created_at);
-- Monthly partitions
CREATE TABLE record_history_archive_2024_01
PARTITION OF record_history_archive
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
3.2 Automated Archiving Process
# Archive old records (older than 2 years)
def archive_old_history():
"""Move history records older than 2 years to archive"""
cutoff_date = datetime.now() - timedelta(days=730)
# Move to archive
session.execute("""
INSERT INTO record_history_archive
SELECT * FROM record_history_v2
WHERE created_at < :cutoff
""", {'cutoff': cutoff_date})
# Delete from main table
session.execute("""
DELETE FROM record_history_v2
WHERE created_at < :cutoff
""", {'cutoff': cutoff_date})
Phase 4: Performance Optimizations
4.1 Smart Indexing Strategy
-- Composite indexes for common queries
CREATE INDEX idx_record_history_record_version
ON record_history_v2(record_id, version_number DESC);
CREATE INDEX idx_record_history_record_time
ON record_history_v2(record_id, created_at DESC);
CREATE INDEX idx_record_history_user_time
ON record_history_v2(changed_by, created_at DESC);
-- Partial indexes for recent data
CREATE INDEX idx_record_history_recent
ON record_history_v2(record_id, created_at DESC)
WHERE created_at > NOW() - INTERVAL '90 days';
4.2 Materialized Views for Analytics
-- User activity statistics
CREATE MATERIALIZED VIEW user_edit_stats AS
SELECT
changed_by,
COUNT(*) as total_edits,
COUNT(DISTINCT record_id) as unique_records,
MIN(created_at) as first_edit,
MAX(created_at) as last_edit
FROM record_history_v2
WHERE change_source = 'user_edit'
GROUP BY changed_by;
-- Refresh daily
CREATE OR REPLACE FUNCTION refresh_user_stats()
RETURNS void AS $$
BEGIN
REFRESH MATERIALIZED VIEW CONCURRENTLY user_edit_stats;
END;
$$ LANGUAGE plpgsql;
Phase 5: Migration Strategy
5.1 Gradual Migration Plan
# Step 1: Create new tables alongside existing ones
def create_new_history_tables():
"""Create new optimized history tables"""
# Create record_history_v2, record_version_v2, etc.
pass
# Step 2: Migrate data in batches
def migrate_history_data(batch_size=1000):
"""Migrate existing history data to new structure"""
offset = 0
while True:
# Get batch of old records
old_records = session.query(RecordHistory).offset(offset).limit(batch_size).all()
if not old_records:
break
# Transform and insert into new structure
for record in old_records:
transform_and_migrate(record)
offset += batch_size
session.commit()
# Step 3: Switch over applications
def switch_to_new_history_system():
"""Update application to use new history system"""
# Update models, update queries, test thoroughly
pass
# Step 4: Clean up old tables
def cleanup_old_tables():
"""Remove old history tables after validation"""
# Drop old tables, indexes, etc.
pass
Phase 6: Monitoring & Maintenance
6.1 Health Monitoring
-- Monitor history table growth
CREATE OR REPLACE VIEW history_table_stats AS
SELECT
schemaname,
tablename,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size,
pg_total_relation_size(schemaname||'.'||tablename) as size_bytes,
(SELECT COUNT(*) FROM record_history_v2) as total_records
FROM pg_tables
WHERE tablename LIKE 'record_history%';
6.2 Automated Maintenance
# Daily maintenance tasks
def daily_history_maintenance():
"""Perform daily maintenance on history system"""
# 1. Archive old records
archive_old_history()
# 2. Update materialized views
refresh_user_stats()
# 3. Vacuum and analyze
session.execute("VACUUM ANALYZE record_history_v2")
# 4. Check table sizes and alert if needed
check_table_sizes()
Expected Benefits
Storage Reduction
- Current: ~500MB per 100K records (5x duplication)
- Optimized: ~150MB per 100K records (70% reduction)
Performance Improvements
- Query Speed: 3-5x faster for common history queries
- Insert Speed: 2x faster for new history records
- Storage I/O: 50% reduction in disk operations
Maintenance Benefits
- Simplified Schema: 3 tables instead of 5
- Easier Debugging: Unified change tracking
- Better Scalability: Archiving prevents unbounded growth
Implementation Timeline
- Week 1-2: Create new schema and migration scripts
- Week 3-4: Data migration and testing
- Week 5: Application updates and switch-over
- Week 6: Performance tuning and monitoring setup
- Week 7-8: Cleanup and documentation
Risk Mitigation
Data Integrity
- Implement comprehensive validation during migration
- Create backup snapshots before major changes
- Use transactional migration to prevent partial updates
Performance Impact
- Test migration on staging environment first
- Monitor query performance during transition
- Implement gradual rollout with feature flags
Rollback Strategy
- Keep old tables until new system is validated
- Create rollback procedures for each phase
- Document all changes for future reference
Success Metrics
Technical Metrics
- Storage usage reduction by 60%+
- Query response time improvement by 3x+
- Migration completion within 8 weeks
Business Metrics
- Zero data loss during migration
- No downtime for end users
- Improved developer experience with simplified schema