Audio-Visual Speech Recognition using SciPy
Helge Reikeras
Ben Herbst
Johan du Preez
Herman Engelbrecht
In audio-visual automatic speech recognition (AVASR), both the acoustic and
visual modalities of speech are used to identify what a person is saying. In
this paper we propose a basic AVASR system implemented using SciPy, an open
source Python library for scientific computing. AVASR research draws on the
fields of signal processing, computer vision, and machine learning, all of
which are areas of active development in the SciPy community. AVASR
researchers working in SciPy can therefore benefit from a wide range of
readily available tools.
The performance of the system is tested using the Clemson University
audio-visual experiments (CUAVE) database. We find that visual speech
information is not in itself sufficient for automatic speech
recognition. However, by integrating visual and acoustic speech information
we obtain better performance than is possible with audio-only ASR.
Keywords: speech recognition, machine learning, computer vision, signal processing
DOI: 10.25080/Majora-92bf1922-010