English Audio Transcription

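Wav2Vec2 could not transcribe long audio files without a GPU, so it has been replaced with the Speech2Text model. Speech2Text uses a convolutional downsampler that shortens the speech input to about one quarter of its original length (a reduction of 3/4) before it is fed into the encoder, so the encoder has far fewer time steps to process.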
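
To illustrate where the 3/4 reduction comes from, here is a minimal sketch of the length arithmetic. It assumes the downsampler is two 1-D convolutions with kernel size 5, stride 2 and padding of `kernel_size // 2`, which is believed to match the default Hugging Face Speech2Text configuration; check the config of the checkpoint you actually use. The function names are hypothetical and are not part of any library.

```python
def conv_output_length(length, kernel_size=5, stride=2):
    """Output length of one 1-D convolution with padding = kernel_size // 2."""
    padding = kernel_size // 2
    return (length + 2 * padding - kernel_size) // stride + 1


def downsampled_length(num_frames, num_layers=2, kernel_size=5, stride=2):
    """Length seen by the encoder after the stacked convolutions.

    num_layers=2, kernel_size=5 and stride=2 are assumptions, not values
    read from a real model config.
    """
    for _ in range(num_layers):
        num_frames = conv_output_length(num_frames, kernel_size, stride)
    return num_frames


if __name__ == "__main__":
    frames = 1000  # number of input feature frames (illustrative value)
    reduced = downsampled_length(frames)
    print(f"{frames} input frames -> {reduced} encoder time steps")
```

Each stride-2 layer roughly halves the sequence, so two layers leave about one quarter of the original frames. Since self-attention cost grows with the square of the sequence length, this is likely a large part of why long audio becomes more manageable on a CPU.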