
Training machine learning models faster with Dask

Joseph Holt
University of Wisconsin–Madison

Scott Sievert
University of Wisconsin–Madison

Abstract

Machine learning (ML) relies on stochastic algorithms, all of which rely on gradient approximations computed from "batch size" examples. Growing the batch size as the optimization proceeds is a simple and usable method to reduce the training time, provided that the number of workers grows with the batch size. In this work, we provide a package that trains PyTorch models on Dask clusters, and can grow the batch size if desired. Our simulations indicate that for a particular model that uses GPUs for a popular image classification task, the training time can be reduced from about 120 minutes with standard SGD to 45 minutes with a variable batch size method.
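
Below is a minimal, single-machine sketch of the growing-batch-size idea the abstract describes, written in plain PyTorch. It is an illustration only: the schedule batch_size_for_epoch, the toy model, and the synthetic data are assumptions for demonstration, and it does not use the authors' Dask-based package or a cluster of workers.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    # Hypothetical schedule: double the batch size every `period` epochs,
    # capped at `max_bs`. This illustrates the general idea of growing the
    # batch size during training, not the schedule used by the authors.
    def batch_size_for_epoch(epoch, initial_bs=32, period=10, max_bs=1024):
        return min(initial_bs * 2 ** (epoch // period), max_bs)

    # Toy synthetic data and model stand in for the image-classification
    # workload discussed in the paper.
    X = torch.randn(2048, 20)
    y = torch.randint(0, 2, (2048,))
    model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(30):
        # Rebuild the DataLoader each epoch with the (possibly larger) batch size.
        bs = batch_size_for_epoch(epoch)
        loader = DataLoader(TensorDataset(X, y), batch_size=bs, shuffle=True)
        for xb, yb in loader:
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()

In the distributed setting the abstract targets, the per-step gradient work would be spread across Dask workers, with the number of workers growing alongside the batch size.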

Keywords

machine learning, model training, distributed computation

DOI

10.25080/majora-1b6fd038-011
