cesium: Open-Source Platform for Time-Series Inference

Brett Naul
University of California, Berkeley

Stéfan van der Walt
University of California, Berkeley

Arien Crellin-Quick
University of California, Berkeley

Joshua S. Bloom
Lawrence Berkeley National Laboratory
University of California, Berkeley

Fernando Pérez
Lawrence Berkeley National Laboratory
University of California, Berkeley

Video: https://youtu.be/ZgHGCfwExw0

Abstract

Inference on time-series data is a common requirement in many scientific disciplines and Internet of Things (IoT) applications, yet few resources are available to help domain scientists build such complex inference workflows easily, robustly, and repeatably: traditional statistical models of time series are often too rigid to explain complex time-domain behavior, while popular machine learning packages require inputs that have already been featurized. Moreover, the software engineering tasks required to instantiate the computational platform are daunting. cesium is an end-to-end time-series analysis framework, consisting of a Python library and a web front-end interface, that allows researchers to featurize raw data and apply modern machine learning techniques in a simple, reproducible, and extensible way. Users can apply out-of-the-box feature engineering workflows as well as save and replay their own analyses. Any steps taken in the front end can also be exported to a Jupyter notebook, so users can iterate between possible models within the front end and then fine-tune their analysis using the additional capabilities of the back-end library. The open-source packages make use of many modern Python toolkits, including xarray, dask, Celery, Flask, and scikit-learn.

Keywords

time series, machine learning, reproducible science

DOI

10.25080/Majora-629e541a-004
