
datreant: persistent, Pythonic trees for heterogeneous data

David L. Dotson
Arizona State University, Tempe, Arizona, USA

Sean L. Seyler
Arizona State University, Tempe, Arizona, USA

Max Linke
Max Planck Institut für Biophysik, Frankfurt, Germany

Richard J. Gowers
University of Manchester, Manchester, UK
University of Edinburgh, Edinburgh, UK

Oliver Beckstein
Arizona State University, Tempe, Arizona, USA

Video: https://youtu.be/enLHDZoch0U

Abstract

In science, the filesystem often serves as a de facto database, with directory trees being the zeroth-order scientific data structure. But working directly with the filesystem to retrieve and store heterogeneous datasets can be tedious and error-prone. datreant makes working with directory structures and files Pythonic with Treants: specially marked directories with distinguishing characteristics that can be discovered, queried, and filtered. Treants can be manipulated individually and in aggregate, with mechanisms for granular access to the directories and files in their trees. Disparate datasets stored in any format (CSV, HDF5, NetCDF, Feather, etc.) and scattered throughout a filesystem can thus be manipulated as meta-datasets of Treants. datreant is modular and extensible by design, so that specialized applications can be built on top of it; MDSynthesis, for working with molecular dynamics simulation data, is one example. http://datreant.org/
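The idea of marked directories that can be discovered and filtered can be illustrated with a minimal standard-library sketch. This is not datreant's API or on-disk format: the marker file name `treant.json` and the helpers `mark`, `discover`, `with_tag`, and `demo` are hypothetical names chosen only for illustration.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical marker file name; datreant's actual state file format is not assumed here.
MARKER = "treant.json"


def mark(directory, tags=(), categories=None):
    """Mark a directory by writing a small JSON state file into it."""
    path = Path(directory)
    path.mkdir(parents=True, exist_ok=True)
    state = {"tags": sorted(set(tags)), "categories": dict(categories or {})}
    (path / MARKER).write_text(json.dumps(state))
    return path


def discover(root):
    """Walk the tree under root and map each marked directory to its state."""
    found = {}
    for marker in sorted(Path(root).rglob(MARKER)):
        found[marker.parent] = json.loads(marker.read_text())
    return found


def with_tag(found, tag):
    """Keep only the discovered directories that carry the given tag."""
    return [path for path, state in found.items() if tag in state["tags"]]


def demo():
    """Build a throwaway tree, then discover and filter its marked directories."""
    with tempfile.TemporaryDirectory() as root:
        mark(Path(root) / "sim_a", tags=["equilibrated"], categories={"temperature": 300})
        mark(Path(root) / "nested" / "sim_b", tags=["raw"])
        (Path(root) / "unmarked").mkdir()  # plain directory, should be ignored
        found = discover(root)
        all_names = sorted(p.name for p in found)
        tagged_names = [p.name for p in with_tag(found, "equilibrated")]
        return all_names, tagged_names
```

Calling `demo()` finds the two marked directories regardless of nesting depth, skips the unmarked one, and narrows the result to the single directory tagged `equilibrated`. The data files inside each directory can be in any format, since only the marker file is read.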

Keywords

data management, science, filesystems

DOI

10.25080/Majora-629e541a-007
