NESC: Robust Neural End-2-End Speech Coding with GANs.

FhG_IIS

Authors: Nicola Pia, Kishan Gupta, Srikanth Korse, Markus Multrus, Guillaume Fuchs.

Abstract: Neural networks have proven to be a formidable tool for tackling the problem of speech coding at very low bit rates. However, designing a neural coder that operates robustly under real-world conditions remains a major challenge. We therefore present the Neural End-2-End Speech Codec (NESC), a robust, scalable end-to-end neural speech codec for high-quality wideband speech coding at 3 kbps. The encoder uses a new architecture configuration that relies on our proposed DualPathConvRNN layer, while the decoder architecture is based on our previous work, SSMGAN. Our subjective listening tests show that NESC is particularly robust to unseen conditions and signal perturbations.

Preprint: arXiv, accepted at INTERSPEECH 2022


For this demo: Conditions of Use.


CMU Speakers - noisy speech

Original

NESC 3 kbit/s (ours)

SSMGAN 1.6 kbit/s

EVS 5.9 kbit/s


CMU Speakers - clean speech

Original

NESC 3 kbit/s (ours)

SSMGAN 1.6 kbit/s

EVS 5.9 kbit/s




Conditions of Use: