Towards general and flexible audio source separation and transcription
With the advent of deep-learning-based methods, audio source separation has seen a resurgence of interest and success. I will give an overview of techniques developed at MERL towards the goal of robustly and flexibly decomposing, analyzing, and transcribing an acoustic scene. In particular, I will describe our efforts to extend our early speech separation and enhancement methods to more challenging environments and to more general, less-supervised scenarios.
Keywords: audio source separation, deep learning, speech enhancement, total transcription, end-to-end automatic speech recognition