Refactor: Engine Logic Abstraction
Problem Statement
The current ASR engine architecture has three key issues that make it difficult to maintain and extend:
- Circular Dependency: The SwechaGonthukaEngine (app/asr/engines/swecha_gonthuka.py) imports and uses functions from the asr_service.py module, even though asr_service.py is a higher-level module that is supposed to depend on the engines, not the other way around. This circular dependency is a significant architectural smell.
- God Object: The asr_service.py module has become a "God object" with too many responsibilities: it contains the core transcription logic for the Swecha engine as well as punctuation restoration, audio conversion, and subtitle generation. This violates the Single Responsibility Principle.
- Inconsistent Engine Implementations: The SwechaGonthukaEngine relies on asr_service.py for model loading and transcription, whereas the WhisperEngine is self-contained. This inconsistency makes the system harder to reason about and makes new engines harder to add.
Solution
The proposed refactoring will address these issues by improving the separation of concerns and making the ASR engine architecture more modular and maintainable.
- Make Engines Self-Contained: All engine-specific logic will be moved into the respective engine's module.
- Introduce a Base Engine Abstraction: A formal abstract base class will define a clear contract for all ASR engines.
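- Refactor the Service Layer: The asr_service.py module will be slimmed down to a true service layer, responsible only for cross-cutting concerns such as audio conversion, punctuation restoration, and subtitle generation.
- Decompose main.py: Job management and WebSocket streaming logic will move from main.py into dedicated service modules, leaving main.py as a thin API layer.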
Commits
A detailed implementation plan, broken down into the smallest practical commits:
feat(engine): Make SwechaGonthukaEngine self-contained
- Move the ASR pipeline loading logic (get_asr_pipeline) from asr_service.py into app/asr/engines/swecha_gonthuka.py as a private method of the SwechaGonthukaEngine class.
- Port the transcription logic (_swecha_transcribe_audio and _swecha_transcribe_pcm16) from asr_service.py into the SwechaGonthukaEngine class as its transcribe and transcribe_pcm16 methods. The original functions become unused and are deleted in the refactor(service) commit.
- Update app/asr/router.py to call the new methods on the SwechaGonthukaEngine instance.
- Remove the import of asr_service from swecha_gonthuka.py, which breaks the circular dependency.
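A minimal sketch of what the self-contained engine could look like after this commit. It assumes the pipeline is a callable returning a dict with a "text" key (the result shape of a Hugging Face transformers ASR pipeline), that transcribe takes a file path, and that transcribe_pcm16 takes little-endian 16-bit mono PCM. The pipeline_factory constructor parameter is hypothetical; it exists so the model loader can be swapped out in tests.

```python
import sys
import threading
from array import array
from typing import Any, Callable, Dict, Optional

# Hypothetical types: the factory builds the model once; the pipeline maps
# an input to a dict with a "text" key.
Pipeline = Callable[[Any], Dict[str, Any]]
PipelineFactory = Callable[[], Pipeline]


class SwechaGonthukaEngine:
    """Owns its own model loading and inference; no asr_service import."""

    def __init__(self, pipeline_factory: PipelineFactory) -> None:
        self._pipeline_factory = pipeline_factory
        self._pipeline: Optional[Pipeline] = None
        self._lock = threading.Lock()

    def _get_asr_pipeline(self) -> Pipeline:
        # Private replacement for asr_service.get_asr_pipeline: load lazily,
        # and only once even if the first requests arrive concurrently.
        if self._pipeline is None:
            with self._lock:
                if self._pipeline is None:
                    self._pipeline = self._pipeline_factory()
        return self._pipeline

    def transcribe(self, audio_path: str) -> str:
        # Ported from _swecha_transcribe_audio (signature assumed).
        return self._get_asr_pipeline()(audio_path)["text"].strip()

    def transcribe_pcm16(self, pcm: bytes, sample_rate: int = 16000) -> str:
        # Ported from _swecha_transcribe_pcm16: decode signed 16-bit
        # little-endian samples and scale them to floats in [-1.0, 1.0).
        samples = array("h")
        samples.frombytes(pcm)
        if sys.byteorder == "big":
            samples.byteswap()
        waveform = [s / 32768.0 for s in samples]
        # Real ASR pipelines usually expect a NumPy array for "raw".
        result = self._get_asr_pipeline()({"raw": waveform, "sampling_rate": sample_rate})
        return result["text"].strip()
```

Because loading is lazy and guarded by a lock, constructing the engine at startup stays cheap, and concurrent first requests still load the model only once.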
refactor(engine): Introduce AsrEngine abstract base class
- In app/asr/types.py, define an AsrEngine abstract base class using abc.ABC.
- Define the abstract methods transcribe, transcribe_pcm16, name, and supported_languages.
- Make SwechaGonthukaEngine and WhisperEngine inherit from AsrEngine and implement the abstract methods.
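One way to express the contract with abc, assuming name and supported_languages become read-only abstract properties (the plan only says "abstract methods"), that transcribe takes a file path, and that transcribe_pcm16 takes raw 16-bit PCM bytes. EchoEngine is a hypothetical engine used only to show a complete implementation.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class AsrEngine(ABC):
    """Contract every ASR engine must satisfy (sketch of app/asr/types.py)."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Stable identifier for the engine."""

    @property
    @abstractmethod
    def supported_languages(self) -> Sequence[str]:
        """Language codes this engine can transcribe."""

    @abstractmethod
    def transcribe(self, audio_path: str) -> str:
        """Transcribe an audio file and return the text."""

    @abstractmethod
    def transcribe_pcm16(self, pcm: bytes, sample_rate: int = 16000) -> str:
        """Transcribe raw signed 16-bit PCM audio and return the text."""


class EchoEngine(AsrEngine):
    """Hypothetical minimal engine that satisfies the contract."""

    @property
    def name(self) -> str:
        return "echo"

    @property
    def supported_languages(self) -> Sequence[str]:
        return ("te",)

    def transcribe(self, audio_path: str) -> str:
        return f"transcript of {audio_path}"

    def transcribe_pcm16(self, pcm: bytes, sample_rate: int = 16000) -> str:
        # 2 bytes per 16-bit sample.
        return f"{len(pcm) // 2} samples at {sample_rate} Hz"
```

A subclass that forgets any member fails at instantiation with TypeError, so a half-finished engine cannot reach the router. Under the read-only-property assumption, if WhisperEngine implements supported_languages as a property without a setter, the line in ModelRouter that assigns to self._whisper.supported_languages would raise AttributeError, so the refactor(router) change would need to land with or before this commit.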
refactor(service): Slim down asr_service.py
- Remove the now-unused _swecha_transcribe_audio and _swecha_transcribe_pcm16 functions from asr_service.py.
- Verify that asr_service.py no longer contains any engine-specific transcription logic.
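A sketch of the shape the remaining service code could take: it talks to engines only through their public interface and keeps the cross-cutting steps. The function name transcribe_file and the convert_audio and restore_punctuation parameters are hypothetical; the latter two stand in for the audio conversion and punctuation restoration code that stays in asr_service.py, injected here only so the example runs without ffmpeg or a punctuation model.

```python
from typing import Callable, Protocol


class SupportsTranscribe(Protocol):
    # Structural stand-in for AsrEngine so this sketch runs on its own.
    def transcribe(self, audio_path: str) -> str: ...


def transcribe_file(
    engine: SupportsTranscribe,
    audio_path: str,
    convert_audio: Callable[[str], str],
    restore_punctuation: Callable[[str], str],
) -> str:
    """Cross-cutting flow: normalize audio, delegate ASR, post-process.

    The service never knows which engine it is talking to; model loading
    and inference stay inside the engine.
    """
    wav_path = convert_audio(audio_path)    # e.g. convert to 16 kHz mono WAV
    raw_text = engine.transcribe(wav_path)  # engine-specific work happens here
    return restore_punctuation(raw_text)    # shared post-processing
```

After this commit, grepping asr_service.py for model or pipeline references should turn up nothing, which is one quick way to carry out the verification step.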
refactor(router): Decouple engine configuration
- Modify the WhisperEngine to accept its supported languages in the constructor.
- In app/asr/router.py, when creating the WhisperEngine instance, pass the list of supported languages from the configuration.
- Remove the line in ModelRouter that modifies self._whisper.supported_languages.
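A minimal sketch of the constructor change. The config key whisper_languages and the engine_for routing rule are hypothetical, included only to show the router passing languages in at construction time instead of patching the attribute afterwards.

```python
from typing import Any, Iterable, Mapping, Sequence


class WhisperEngine:
    """Sketch: supported languages are fixed when the engine is built."""

    def __init__(self, supported_languages: Iterable[str]) -> None:
        # Copy into a tuple so callers cannot mutate the engine's state later.
        self._supported_languages = tuple(supported_languages)

    @property
    def name(self) -> str:
        return "whisper"

    @property
    def supported_languages(self) -> Sequence[str]:
        # Read-only: no setter, so assignment raises AttributeError.
        return self._supported_languages


class ModelRouter:
    def __init__(self, config: Mapping[str, Any]) -> None:
        # The value now goes in up front instead of being overwritten on
        # self._whisper after construction. "whisper_languages" is a
        # hypothetical config key.
        self._whisper = WhisperEngine(config["whisper_languages"])

    def engine_for(self, language: str) -> WhisperEngine:
        # Hypothetical routing rule, for illustration only.
        if language in self._whisper.supported_languages:
            return self._whisper
        raise ValueError(f"no engine supports language {language!r}")
```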
feat(service): Create JobManager service
- Create a new file app/services/job_manager.py.
- Move the job management logic (creating jobs, polling status, etc.) from app/main.py into a JobManager class in the new file.
- Update app/main.py to use the JobManager service.
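A sketch of a JobManager, assuming jobs are in-process and run on a thread pool. The status names, the Job fields, and the method names create_job and get_status are illustrative rather than taken from the current main.py.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, replace
from typing import Any, Callable, Dict, Optional


@dataclass
class Job:
    id: str
    status: str = "pending"  # pending -> running -> done | failed
    result: Optional[str] = None
    error: Optional[str] = None


class JobManager:
    """Sketch of app/services/job_manager.py: owns job state instead of main.py."""

    def __init__(self, max_workers: int = 2) -> None:
        self._jobs: Dict[str, Job] = {}
        self._lock = threading.Lock()
        self._executor = ThreadPoolExecutor(max_workers=max_workers)

    def create_job(self, work: Callable[[], str]) -> str:
        """Register a job, schedule `work` in the background, return its id."""
        job = Job(id=uuid.uuid4().hex)
        with self._lock:
            self._jobs[job.id] = job
        self._executor.submit(self._run, job, work)
        return job.id

    def get_status(self, job_id: str) -> Optional[Job]:
        """Return a snapshot copy of the job, or None if the id is unknown."""
        with self._lock:
            job = self._jobs.get(job_id)
            return None if job is None else replace(job)

    def shutdown(self) -> None:
        """Wait for queued and running jobs to finish, then stop the workers."""
        self._executor.shutdown(wait=True)

    def _run(self, job: Job, work: Callable[[], str]) -> None:
        self._update(job, status="running")
        try:
            result = work()
        except Exception as exc:  # surface any failure to polling clients
            self._update(job, status="failed", error=str(exc))
        else:
            self._update(job, status="done", result=result)

    def _update(self, job: Job, **changes: Any) -> None:
        with self._lock:
            for field_name, value in changes.items():
                setattr(job, field_name, value)
```

Returning a copy from get_status means a polling handler in main.py never sees a job half-updated by a worker thread. If jobs must survive restarts, a persistent queue would replace the in-memory dict and executor.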
feat(service): Create StreamingService
- Create a new file app/services/streaming_service.py.
- Move the WebSocket streaming logic from app/main.py into a StreamingService class in the new file.
- Update app/main.py to use the StreamingService.
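A sketch of the streaming service, assuming clients send raw 16-bit mono PCM frames and an empty frame marks the end of the stream (a hypothetical protocol). The socket methods receive_bytes and send_json mirror Starlette/FastAPI's WebSocket, but that framework is an assumption; typing against a small Protocol keeps the service testable with a fake socket.

```python
import asyncio
from typing import Any, Protocol


class SocketLike(Protocol):
    # Method names mirror Starlette/FastAPI's WebSocket (an assumption).
    async def receive_bytes(self) -> bytes: ...
    async def send_json(self, data: Any) -> None: ...


class PcmTranscriber(Protocol):
    # The slice of the engine interface this service needs.
    def transcribe_pcm16(self, pcm: bytes, sample_rate: int = 16000) -> str: ...


class StreamingService:
    """Sketch of app/services/streaming_service.py."""

    def __init__(self, engine: PcmTranscriber, sample_rate: int = 16000,
                 chunk_seconds: float = 2.0) -> None:
        self._engine = engine
        self._sample_rate = sample_rate
        # 16-bit mono PCM: 2 bytes per sample.
        self._chunk_bytes = int(sample_rate * chunk_seconds) * 2

    async def handle(self, ws: SocketLike) -> None:
        """Buffer incoming PCM, transcribe full chunks, flush the rest at the end."""
        buffer = bytearray()
        while True:
            data = await ws.receive_bytes()
            if not data:  # hypothetical end-of-stream marker
                break
            buffer.extend(data)
            while len(buffer) >= self._chunk_bytes:
                chunk = bytes(buffer[: self._chunk_bytes])
                del buffer[: self._chunk_bytes]
                await self._emit(ws, chunk)
        if buffer:
            await self._emit(ws, bytes(buffer))

    async def _emit(self, ws: SocketLike, pcm: bytes) -> None:
        # Model inference blocks, so run it in a worker thread.
        text = await asyncio.to_thread(
            self._engine.transcribe_pcm16, pcm, self._sample_rate
        )
        await ws.send_json({"text": text})
```

Running inference through asyncio.to_thread (Python 3.9+) keeps the event loop responsive while a chunk is transcribed; the 2-second chunk size is an arbitrary default. With Starlette, a client disconnect surfaces as WebSocketDisconnect from receive_bytes, which the endpoint in main.py or the service itself would need to catch.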
Decision Document
- Modules to be modified:
  - app/asr/engines/swecha_gonthuka.py
  - app/asr/engines/whisper.py
  - app/asr/router.py
  - app/asr_service.py
  - app/main.py
  - app/asr/types.py
- New modules to be created:
  - app/services/job_manager.py
  - app/services/streaming_service.py
- Architectural Decisions:
  - Engines are self-contained and responsible for their own model loading and inference.
  - An explicit AsrEngine interface, defined as an abstract base class in app/asr/types.py, is the contract for all engines.
  - asr_service.py acts as a service layer for cross-cutting concerns only.
  - main.py is a thin API layer that delegates to dedicated services.
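To keep the "engines are self-contained" decision from regressing, a test could scan engine modules for imports of asr_service. A sketch using the standard ast module (ast.unparse needs Python 3.9+); the function name imports_of is hypothetical.

```python
import ast
from typing import List


def imports_of(source: str, forbidden: str = "asr_service") -> List[str]:
    """Return import statements in `source` that reference `forbidden`.

    Catches `import app.asr_service`, `from app.asr_service import x`,
    `from app import asr_service`, and relative forms such as
    `from ...asr_service import x`.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            base = node.module or ""
            names = [base] + [f"{base}.{alias.name}" for alias in node.names]
        else:
            continue
        # Match whole dotted components, not substrings.
        if any(forbidden in name.split(".") for name in names):
            hits.append(ast.unparse(node))
    return hits
```

In the test suite this would run over every file under app/asr/engines/ (for example via pathlib.Path.glob("*.py")) and assert that the result is empty.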
Testing Decisions
- What makes a good test: Tests should focus on the external behavior of the components, not the implementation details.
- Modules to be tested:
  - SwechaGonthukaEngine: Unit tests with mocked pipelines to verify the transcription logic.
  - WhisperEngine: Review the existing tests and update them where needed, in particular for the new supported-languages constructor argument.
  - ModelRouter: Unit tests to verify the routing logic.
  - JobManager: Unit tests for the job management logic.
  - StreamingService: Integration tests for the WebSocket streaming logic.
- Prior art for the tests: The existing test suite in the tests/ directory should be used as a reference.
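A sketch of a behaviour-focused unit test with a mocked pipeline, using unittest.mock. The engine class here is a trimmed stand-in with a hypothetical pipeline_factory constructor parameter; the real suite would import SwechaGonthukaEngine from app/asr/engines/swecha_gonthuka.py and follow whatever conventions the existing tests/ directory uses.

```python
import unittest
from unittest import mock


class SwechaGonthukaEngine:
    # Trimmed stand-in; the real test would import the production class.
    def __init__(self, pipeline_factory):
        self._pipeline_factory = pipeline_factory
        self._pipeline = None

    def _get_asr_pipeline(self):
        if self._pipeline is None:
            self._pipeline = self._pipeline_factory()
        return self._pipeline

    def transcribe(self, audio_path):
        return self._get_asr_pipeline()(audio_path)["text"].strip()


class SwechaGonthukaEngineTest(unittest.TestCase):
    def setUp(self):
        # The mocked pipeline replaces the real model; no weights are loaded.
        self.pipeline = mock.Mock(return_value={"text": "  namaskaram  "})
        self.factory = mock.Mock(return_value=self.pipeline)
        self.engine = SwechaGonthukaEngine(self.factory)

    def test_returns_stripped_text(self):
        # Observable behaviour: what the caller gets back.
        self.assertEqual(self.engine.transcribe("clip.wav"), "namaskaram")

    def test_passes_audio_to_pipeline(self):
        self.engine.transcribe("clip.wav")
        self.pipeline.assert_called_once_with("clip.wav")

    def test_model_loaded_once(self):
        self.engine.transcribe("a.wav")
        self.engine.transcribe("b.wav")
        self.assertEqual(self.factory.call_count, 1)
```

These tests assert on returned text and on how often the expensive load happens, not on private attributes, which keeps them valid if the engine's internals change.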
Out of Scope
- Changing the ASR models themselves.
- Modifying the API contracts.
- Changing the deployment process.