
vak: a neural network framework for researchers studying animal acoustic communication

David Nicholson
Independent researcher, Baltimore, Maryland, USA

Yarden Cohen
Weizmann Institute of Science, Rehovot, Israel

Abstract

How is speech like birdsong? What do we mean when we say an animal learns its vocalizations? Questions like these are answered by studying how animals communicate with sound. As in many other fields, the study of acoustic communication is being revolutionized by deep neural network models. These models make it possible to answer questions that were previously impossible to address, in part because they automate the analysis of very large datasets. Acoustic communication researchers have developed multiple models for similar tasks, often implemented as research code with one of several libraries, such as Keras and PyTorch. This situation has created a real need for a framework that allows researchers to easily benchmark multiple models, and to test new models, with their own data. To address this need, we developed vak (https://github.com/vocalpy/vak), a neural network framework designed for acoustic communication researchers. ("vak" is pronounced like "talk" or "squawk" and was chosen for its similarity to the Latin root voc, as in "vocal".) Here we describe the design of vak, and explain how the framework makes it easy for researchers to apply neural network models to their own data. We highlight enhancements made in version 1.0 that significantly improve the user experience with the library. To give researchers without expertise in deep learning access to these models, vak can be run via a command-line interface that uses configuration files. Scientist-coders can also use vak directly in scripts. To achieve this, vak adapts design patterns and an API from other domain-specific PyTorch libraries such as torchvision, with modules representing neural network operations, models, datasets, and transformations for pre- and post-processing. vak also leverages the Lightning library as a backend, so that vak developers and users can focus on their domain rather than on engineering details. We provide proof-of-concept results showing how vak can be used to test new models and to compare existing models from multiple model families. In closing, we discuss our roadmap for development and our vision for the community of users.
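As a rough illustration of the configuration-file-driven workflow the abstract describes, the sketch below drives vak's command-line interface from a Python script. The subcommands (`vak prep`, `vak train`, `vak predict`) come from vak's documented CLI; the file names, the use of `subprocess`, and the split into separate training and prediction configs are illustrative assumptions, and the contents of the TOML configuration files (dataset paths, model choice such as TweetyNet, training options) are not shown here.

```python
"""Sketch of a vak workflow; paths and config contents are user-supplied placeholders."""
import subprocess

TRAIN_CONFIG = "train_config.toml"      # hypothetical config declaring dataset and model
PREDICT_CONFIG = "predict_config.toml"  # hypothetical config pointing at the trained checkpoint

# 1. Prepare a dataset from raw audio and annotation files.
subprocess.run(["vak", "prep", TRAIN_CONFIG], check=True)
# 2. Train the model declared in the config file.
subprocess.run(["vak", "train", TRAIN_CONFIG], check=True)
# 3. Prepare new recordings and generate predicted annotations for them.
subprocess.run(["vak", "prep", PREDICT_CONFIG], check=True)
subprocess.run(["vak", "predict", PREDICT_CONFIG], check=True)
```

The same steps can instead be run directly from a terminal, which is the interface intended for researchers without deep learning expertise; scripting them, as above, is one way scientist-coders might embed vak in a larger analysis pipeline.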

Keywords

animal acoustic communication, bioacoustics, neural networks

DOI

10.25080/gerudo-f2bc6f59-008
