Research and articles

MultiQT: Multimodal Learning for Real-Time Question Tracking in Speech

May 2020

We address the challenging and practical task of labeling questions in speech in real time during telephone calls to emergency medical services in English, embedded within a broader decision support system for emergency call-takers. We propose a novel multimodal approach to real-time sequence labeling in speech. Our model treats speech and its own textual representation as two separate modalities, or views, as it jointly learns from streamed audio and its noisy transcription into text via automatic speech recognition.

Machine learning as a supportive tool to recognize cardiac arrest in emergency calls

May 2019

Emergency medical dispatchers fail to identify approximately 25% of cases of out-of-hospital cardiac arrest, thus losing the opportunity to give the caller instructions in cardiopulmonary resuscitation. We examined whether a machine learning framework could recognize out-of-hospital cardiac arrest from audio files of calls to the emergency medical dispatch center.

BIVA: A Very Deep Hierarchy of Latent Variables for Generative Modelling

February 2019

We introduce the Bidirectional-Inference Variational Autoencoder (BIVA), characterized by a skip-connected generative model and an inference network formed by a bidirectional stochastic inference path. We show that BIVA reaches state-of-the-art test likelihoods, generates sharp and coherent natural images, and uses the hierarchy of latent variables to capture different aspects of the data distribution.

Paving the Product Highway for AI in Healthcare

November 2018

For some time now, people have been talking about the coming AI technologies that will revolutionize society and leave no line of work unaffected. Yet, so far, disappointingly few applications have actually seen the light of day, and those that have may seem rather underwhelming in scope. Where did the future go?

The Orb: How Design Breathes Life Into AI

November 2018

Even from outside of the Tom Rossau flagship store, one is mesmerized by the warm light emanating from numerous circular shapes. Clusters of sculptural lamps populate walls and ceilings like flotillas of gently glowing jellyfish, conveying a singular sense of design drawing equally on geometry, woodcarving, and origami.

On the Inductive Bias of Word-Character-Level Multi-Task Learning for Speech Recognition

November 2018

End-to-end automatic speech recognition (ASR) commonly transcribes audio signals into sequences of characters, while its performance is evaluated by measuring the word error rate (WER). This mismatch suggests that predicting sequences of words directly may be helpful instead.

Alexa, why don't you understand me?

October 2018

In January 2017, a morning show on San Diego’s CW6 News covered a story on how a little girl from Dallas, Texas, accidentally ordered a $300 doll house and four pounds of sugar cookies by asking the family’s Amazon Alexa if it wanted to play dollhouse. The purpose of the show was to discuss a new set of issues that consumers were facing, as these voice-based assistants had made their entry into our homes.

Exploiting Nontrivial Connectivity for Automatic Speech Recognition

September 2018

We tested the effectiveness of three neural network architectures commonly used in image recognition for automatic speech recognition. These architectures, Residual Networks, Highway Networks, and Densely Connected Networks, all use nontrivial connections, also known as skip connections, which allow networks with a very large number of layers to be trained without suffering from the vanishing gradient problem.
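As a minimal illustration of the idea (not the models from the article), a residual skip connection adds a block's input back to its transformed output, so information and gradients can flow through an identity path even when the transformation itself contributes little. A sketch in NumPy:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W):
    # Skip connection: add the input back to the transformed output,
    # giving the signal (and gradients) an identity path around f(x).
    return x + relu(x @ W)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)

# Even with all-zero weights (a "vanished" transformation), the block
# still passes its input through unchanged via the skip connection.
W_tiny = np.zeros((4, 4))
y = residual_block(x, W_tiny)
assert np.allclose(y, x)
```

Highway Networks gate this identity path with a learned transform gate, and Densely Connected Networks concatenate the outputs of all earlier layers instead of adding them, but the underlying principle is the same.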

Utilizing Domain Knowledge in End-to-End Audio Processing

September 2018

We performed an exploratory study into improving end-to-end audio classification models. By introducing the intermediary regression task of approximating mel-spectrograms, we were able to classify raw waveform and mel-spectrogram input with equal accuracy. In future experiments we aim to fine-tune the end-to-end classification model to outperform models trained on hand-crafted features.

Improving Pre-Hospital Care For Language Minorities Using Machine Learning

July 2018

Pre-hospital emergency care should be of the same quality for all citizens. Similarly, benefits from advances in machine learning should be spread equally across society. Unfortunately, neither is the case. Language barriers, for instance, limit the quality of pre-hospital care given to language minorities, and algorithmic bias has already led to harm of specific societal groups.

CTC Networks and Language Models: Prefix Beam Search Explained

January 2018

Automatic speech recognition (ASR) is one of the most difficult tasks in natural language processing. Traditionally it has been necessary to break down the process into a series of subtasks such as speech segmentation, acoustic modelling, and language modelling. Each of these subtasks was then solved by separate, individually trained models.
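For background on how CTC outputs map to text (a simplified sketch, not the article's full prefix beam search): a CTC network emits one label per audio frame, including a special blank symbol, and decoding collapses repeated labels before removing blanks. Prefix beam search builds on this mapping by keeping the highest-scoring prefixes at each step and rescoring them with a language model.

```python
def ctc_greedy_decode(frame_labels, blank="_"):
    # Collapse runs of repeated labels, then drop blanks: the CTC
    # mapping from per-frame labels to an output transcription.
    out = []
    prev = None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)

# Repeated frames collapse; the blank separates genuine repeats ("ll").
assert ctc_greedy_decode(list("hh_e_ll_ll_oo")) == "hello"
```

Greedy decoding like this takes only the single best label per frame; prefix beam search instead sums probability over all frame alignments that collapse to the same prefix, which is why it pairs naturally with a language model.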

Encrypt your Machine Learning

January 2018

We have a pretty good understanding of the application of machine learning, and of cryptography as a security concept, but when it comes to combining the two, things become a bit nebulous and we enter fairly untraveled wilderness.

Get started with Corti

Corti integrates seamlessly with your existing phone systems, recording calls without disrupting or interfering with the signal.