Do End-to-End Speech Recognition Models Care About Context?

Corti

The two most common paradigms for end-to-end speech recognition are connectionist temporal classification (CTC) and attention-based encoder-decoder (AED) models. It has been argued that the latter is better suited for learning an implicit language model. This paper tests that hypothesis by measuring the temporal context sensitivity of both model types and by evaluating how they perform when the amount of contextual information in the audio input is constrained.
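
To make the idea of constraining contextual information in the audio input concrete, the following is a minimal sketch, not the procedure used in the paper. It assumes the audio is a plain sequence of samples and that the target segment's boundaries are already known; the function name `constrain_context` and its parameters are hypothetical and chosen only for illustration.

```python
def constrain_context(waveform, start, end, context):
    """Return the target span [start, end) plus up to `context` samples per side.

    Hypothetical helper for illustration only: `waveform` is any sliceable
    sequence of audio samples, and `start`/`end` are sample indices of the
    segment whose recognition is being evaluated.
    """
    if not 0 <= start <= end <= len(waveform):
        raise ValueError("target span must lie within the waveform")
    if context < 0:
        raise ValueError("context must be non-negative")
    # Clamp the window so it never extends past the ends of the recording.
    left = max(0, start - context)
    right = min(len(waveform), end + context)
    return waveform[left:right]


# Toy example: ten "samples", target span covering indices 4 and 5.
samples = list(range(10))
print(constrain_context(samples, 4, 6, 2))  # [2, 3, 4, 5, 6, 7]
```

Under these assumptions, sweeping `context` from zero up to the full utterance length and transcribing each cropped input would show how much a given model depends on the surrounding audio.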