
signac: A Python framework for data and workflow management

Vyas Ramasubramani
Department of Chemical Engineering, University of Michigan, Ann Arbor

Carl S. Adorf
Department of Chemical Engineering, University of Michigan, Ann Arbor

Paul M. Dodd
Department of Chemical Engineering, University of Michigan, Ann Arbor

Bradley D. Dice
Department of Physics, University of Michigan, Ann Arbor

Sharon C. Glotzer
Department of Chemical Engineering, University of Michigan, Ann Arbor
Department of Materials Science and Engineering, University of Michigan, Ann Arbor
Department of Physics, University of Michigan, Ann Arbor
Biointerfaces Institute, University of Michigan, Ann Arbor

Video: https://youtu.be/CCKQH1M2uR4

Abstract

Computational research requires versatile data and workflow management tools that can easily adapt to the highly dynamic requirements of scientific investigations. Many existing tools require strict adherence to a particular usage pattern, so researchers often fall back on less robust ad hoc solutions that they find easier to adopt. The resulting data fragmentation and methodological incompatibilities significantly impede research. Our talk showcases signac, an open-source Python framework that offers highly modular and scalable solutions to this problem. Named for the Pointillist painter Paul Signac, the framework provides powerful workflow management tools that enable users to construct and automate workflows that transition seamlessly from laptops to HPC clusters. Crucially, the underlying data model is completely independent of the workflow. The flexible, serverless, and schema-free signac database can be introduced into other workflows with essentially no overhead and without adopting the signac workflow model. Additionally, the data model's simplicity makes it easy to parse the underlying data without using signac at all. This modularity and simplicity eliminate significant barriers to consistent data management across projects, facilitating improved provenance management and data sharing with minimal overhead.
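The claim that the data can be parsed without signac can be illustrated with a minimal sketch. It assumes a layout loosely modeled on signac's: a workspace directory with one subdirectory per parameter combination (signac calls these state points), named by a hash of the parameters and holding them as plain JSON. The helper names `job_id`, `init_job`, and `find_jobs` are hypothetical and are not signac's API; the file name `signac_statepoint.json` and the MD5-of-key-sorted-JSON directory naming are assumptions made for illustration. Only the Python standard library is used, with no server and no schema.

```python
import hashlib
import json
import tempfile
from pathlib import Path

STATEPOINT_FILE = "signac_statepoint.json"  # assumed file name, for illustration


def job_id(statepoint: dict) -> str:
    """Hypothetical helper: name a job directory after its parameters.

    Assumption: hash the key-sorted JSON encoding with MD5, so the same
    parameters always map to the same directory regardless of key order.
    """
    blob = json.dumps(statepoint, sort_keys=True)
    return hashlib.md5(blob.encode()).hexdigest()


def init_job(workspace: Path, statepoint: dict) -> Path:
    """Hypothetical helper: create a job directory that stores its parameters."""
    job_dir = workspace / job_id(statepoint)
    job_dir.mkdir(parents=True, exist_ok=True)
    (job_dir / STATEPOINT_FILE).write_text(json.dumps(statepoint))
    return job_dir


def find_jobs(workspace: Path, **filters):
    """Hypothetical helper: yield (directory, parameters) for matching jobs.

    Reads only directories and JSON files; no index or database is needed.
    """
    for sp_file in sorted(workspace.glob(f"*/{STATEPOINT_FILE}")):
        sp = json.loads(sp_file.read_text())
        if all(sp.get(key) == value for key, value in filters.items()):
            yield sp_file.parent, sp


# Usage: build a small 2x2 parameter sweep, then select the jobs with T == 2.0.
with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp) / "workspace"
    for T in (1.0, 2.0):
        for N in (100, 200):
            init_job(ws, {"T": T, "N": N})
    for job_dir, sp in find_jobs(ws, T=2.0):
        print(job_dir.name[:8], sp)
```

Because each parameter set lives in plain JSON next to its data, any tool that can list directories and read JSON can recover and filter it; the hash-based names give every parameter set its own directory without a central index.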

Keywords

data management, database, data sharing, provenance, computational workflow, hpc

DOI

10.25080/Majora-4af1f417-016
