Design of a Scientific Data Analysis Support Platform

Nathan Martindale
Oak Ridge National Laboratory

Jason Hite
Oak Ridge National Laboratory

Scott Stewart
Oak Ridge National Laboratory

Mark Adams
Oak Ridge National Laboratory

Abstract

Software data analytic workflows are a critical aspect of modern scientific research and play a crucial role in testing scientific hypotheses. A typical scientific data analysis life cycle in a research project includes several steps that are not fundamental to testing the hypothesis but are essential for reproducibility. These include tasks with analogs in software engineering practice, such as versioning code, sharing code among research team members, maintaining a structured codebase, and tracking associated resources such as software environments. Tasks unique to scientific research include designing, implementing, and modifying code that tests a hypothesis. This work refers to such code as an experiment, defined as a software analog to a physical experiment.

A software experiment manager should support tracking and reproducing individual experiment runs, organizing and presenting results, and storing and reloading intermediate data for long-running computations. A software experiment manager with these features would reduce the time a researcher spends on tedious busywork and would enable more effective collaboration. This work discusses the necessary design features in more depth, some of the existing software packages that support this workflow, and a custom-developed open-source solution that addresses these needs.

Keywords

reproducible research, experiment life cycle, data analysis support

DOI

10.25080/majora-212e5952-01b
