pyAudioProcessing: Audio Processing, Feature Extraction, and Machine Learning Modeling
pyAudioProcessing is a Python-based library for processing audio data, constructing
and extracting numerical features from audio, building and testing machine learning
models, and classifying data with existing pre-trained audio classification models or
custom user-built models. MATLAB is a popular choice for much of the
research in the audio and speech processing domain. In contrast, Python remains
the language of choice for the vast majority of machine learning research and
functionality. This library contains features built in Python that were originally
published in MATLAB. pyAudioProcessing allows the user to
compute various features from audio files, including Gammatone Frequency Cepstral
Coefficients (GFCC), Mel Frequency Cepstral Coefficients (MFCC), spectral features,
chroma features, and others such as beat-based and cepstrum-based features.
One can use these features along with one’s own classification backend or any of the
popular scikit-learn classifiers that have been integrated into pyAudioProcessing.
Cleaning functions to strip unwanted portions from the audio are another offering of the library.
It further contains integrations with other audio functionalities such as frequency and time-series
visualizations and audio format conversions. This software aims to provide
machine learning engineers, data scientists, researchers, and students with a set of baseline models
to classify audio. The library is available at https://github.com/jsingh811/pyAudioProcessing
and is released under the GPL-3.0 license.
Keywords: pyAudioProcessing, audio processing, audio data, audio classification, audio feature extraction, gfcc, mfcc, spectral features, spectrogram, chroma
DOI: 10.25080/majora-212e5952-017