Training machine learning models faster with Dask
Joesph Holt
Scott Sievert
Machine learning (ML) relies on stochastic gradient algorithms, all of which
approximate the gradient from a "batch size" number of examples. Growing the batch size
as the optimization proceeds is a simple and usable method to reduce the
training time, provided that the number of workers grows with the batch
size. In this work, we provide a package that trains PyTorch models on Dask
clusters and can grow the batch size if desired. Our simulations indicate
that, for a GPU-trained model on a popular image classification task, the
training time can be reduced from about 120 minutes with standard SGD to
45 minutes with a variable batch size method.
Keywords: machine learning, model training, distributed computation
DOI: 10.25080/majora-1b6fd038-011
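To make the variable batch size idea from the abstract concrete, the following is a minimal, self-contained PyTorch sketch, not the package's actual API: the toy data, model, and doubling schedule are placeholders chosen only for illustration. It grows the batch size after each epoch by rebuilding the DataLoader, so later epochs use larger, lower-variance gradient estimates.

    # Illustrative sketch only: grow the batch size as optimization proceeds.
    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    # Placeholder data and model standing in for an image classification task.
    X = torch.randn(1024, 20)
    y = torch.randint(0, 2, (1024,))
    dataset = TensorDataset(X, y)

    model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    batch_size = 32
    for epoch in range(8):
        # Rebuild the loader each epoch so the new batch size takes effect.
        loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
        for xb, yb in loader:
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
        batch_size *= 2  # one possible schedule: double the batch size each epoch

In a distributed setting, each larger batch would be split across more Dask workers so that per-step wall-clock time stays roughly constant while fewer optimization steps are needed.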