Design of a Scientific Data Analysis Support Platform
Software-based data analysis workflows are central to modern scientific
research and play a crucial role in testing scientific hypotheses. A typical
data analysis life cycle in a research project includes several steps that
are not fundamental to testing the hypothesis but are essential for
reproducibility. These include tasks that have analogs in software
engineering practice, such as versioning code, sharing code among research
team members, maintaining a structured codebase, and tracking associated
resources such as software environments. Tasks unique to scientific research
include designing, implementing, and modifying code that tests a hypothesis.
This work refers to such code as an experiment, defined as a software analog
of a physical experiment.
A software experiment manager should support tracking and reproducing
individual experiment runs, organizing and presenting results, and storing
and reloading intermediate data from long-running computations. A software
experiment manager with these features would reduce the time a researcher
spends on tedious busywork and would enable more effective collaboration.
This work discusses the necessary design features in more depth, surveys
some existing software packages that support this workflow, and presents a
custom-developed open-source solution that addresses these needs.
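
As a rough illustration of two of the features named above (tracking a run by
its parameters, and storing and reloading intermediate data), the following
is a minimal sketch in Python. It is not the solution presented in this work;
the names run_id, run_experiment, and slow_mean, and the on-disk layout, are
hypothetical and chosen only for the example.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def run_id(params: dict) -> str:
    """Derive a stable identifier from the experiment parameters (hypothetical scheme)."""
    encoded = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


def run_experiment(params: dict, compute, cache_dir: Path):
    """Run compute(params) once; later calls with the same params reload the stored result."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    rid = run_id(params)
    result_path = cache_dir / f"{rid}.pkl"
    if result_path.exists():
        # Reload the intermediate data instead of recomputing it.
        with result_path.open("rb") as fh:
            return pickle.load(fh)
    result = compute(params)
    with result_path.open("wb") as fh:
        pickle.dump(result, fh)
    # Record the parameters next to the result so the run can be reproduced.
    (cache_dir / f"{rid}.json").write_text(json.dumps(params, sort_keys=True))
    return result


# Usage: the second call is served from disk, so the computation runs once.
calls = []


def slow_mean(params: dict) -> float:
    """Stand-in for a long-running computation (assumption for this sketch)."""
    calls.append(params)
    return sum(params["values"]) / len(params["values"])


cache = Path(tempfile.mkdtemp())
first = run_experiment({"values": [1, 2, 3]}, slow_mean, cache)
second = run_experiment({"values": [1, 2, 3]}, slow_mean, cache)
```

Keying the stored result on a hash of the parameters is only one possible
design; a fuller manager would presumably also record the code version and
software environment alongside each run.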
Keywords: reproducible research, experiment life cycle, data analysis support
DOI: 10.25080/majora-212e5952-01b