Audio-Visual Speech Recognition using SciPy
Helge Reikeras
Ben Herbst
Johan du Preez
Herman Engelbrecht
In audio-visual automatic speech recognition (AVASR), both the acoustic and
visual modalities of speech are used to identify what a person is saying. In
this paper we propose a basic AVASR system implemented using SciPy, an open
source Python library for scientific computing. AVASR research draws on the
fields of signal processing, computer vision, and machine learning, all of
which are areas of active development in the SciPy community. AVASR
researchers working in SciPy can therefore benefit from a wide range of
readily available tools.
The performance of the system is tested using the Clemson University
audio-visual experiments (CUAVE) database. We find that visual speech
information is not in itself sufficient for automatic speech
recognition. However, by integrating visual and acoustic speech information
we obtain better performance than is possible with audio-only ASR.
Keywords: speech recognition, machine learning, computer vision, signal processing
DOI: 10.25080/Majora-92bf1922-010