The Pandata Scalable Open-Source Analysis Stack

James A. Bednar; Martin Durant

doi:10.25080/gerudo-f2bc6f59-00b

James A. Bednar
Anaconda, Inc.

Martin Durant
Anaconda, Inc.

Abstract

As the scale of scientific data analysis continues to grow, traditional domain-specific tools often struggle with data of increasing size and complexity. These tools also face sustainability challenges due to a relatively narrow user base, a limited pool of contributors, and constrained funding sources. We introduce the Pandata open-source software stack as a solution, emphasizing the use of domain-independent tools at critical stages of the data life cycle, without compromising the depth of domain-specific analyses. This set of interoperable and compositional tools, including Dask, Xarray, Numba, hvPlot, Panel, and Jupyter, provides a versatile and sustainable model for data analysis and scientific computation. Collectively, the Pandata stack covers the landscape of data access, distributed computation, and interactive visualization across any domain or scale. See github.com/panstacks/pandata to get started using this stack or to help contribute to it.

Keywords

distributed computing, data visualization, workflows
