lpEdit: an editor to facilitate reproducible analysis via literate programming

Adam J Richards
Biostatistics & Bioinformatics, Duke University Medical Center, Durham, NC, 27710, USA and Station d'Ecologie Experimentale du CNRS, Moulis, 09200, France.

Andrzej S. Kosinski
Biostatistics & Bioinformatics, Duke University Medical Center, Durham, NC, 27710, USA.

Camille Bonneaud
Station d'Ecologie Experimentale du CNRS, Moulis, 09200, France and Centre for Ecology and Conservation, University of Exeter Cornwall, Penryn, UK.

Delphine Legrand
Station d'Ecologie Experimentale du CNRS, Moulis, 09200, France.

Kouros Owzar
Duke Cancer Institute, Duke University Medical Center, Durham, NC, 27710, USA.

Abstract

There is evidence to suggest that a surprising proportion of published experiments in science are difficult, if not impossible, to reproduce. Data sharing, an audit trail, and extensive documentation are fundamental to reproducible research, whether in the laboratory or in the analysis of data. In this work, we introduce a documentation tool that aims to make analyses more reproducible across the general scientific community.

The application, lpEdit, is a cross-platform editor, written with PyQt4, that enables a broad range of scientists to carry out the analytic component of their work in a reproducible manner through the use of literate programming. Literate programming mixes code and prose to produce a final report that reads like an article or book. lpEdit targets researchers who are getting started with statistics or programming, so the hurdles of setting up a proper pipeline are kept to a minimum, and the learning burden is reduced through templates and documentation. The documentation for lpEdit is centered on learning by example, and accordingly we use several increasingly involved examples to demonstrate the software’s capabilities.
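To make the idea of mixing code and prose concrete, here is a toy "weaver" written only for this page; it is not lpEdit's implementation and makes no claim about how lpEdit processes documents. It assumes noweb-style chunk delimiters (a `<<name>>=` header line and a closing `@` line, the convention used by Sweave), runs each chunk in one shared namespace so later chunks can reuse earlier results, and splices the captured output back between the prose. The function name `weave`, the `EXAMPLE` document, and the `Code:`/`Output:` labels are hypothetical.

```python
import contextlib
import io
import re

# A noweb-style chunk: a "<<...>>=" header line, the code, then a line holding only "@".
CHUNK = re.compile(r"^<<.*?>>=\n(.*?)^@[ \t]*$", re.DOTALL | re.MULTILINE)


def weave(source: str) -> str:
    """Execute each code chunk in order; return prose, code, and captured output."""
    namespace: dict = {}  # shared across chunks, like a single analysis session
    parts = []
    pos = 0
    for match in CHUNK.finditer(source):
        parts.append(source[pos:match.start()])  # prose preceding this chunk
        code = match.group(1)
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):  # capture anything the chunk prints
            exec(code, namespace)
        parts.append("Code:\n" + code)
        parts.append("Output:\n" + buffer.getvalue())
        pos = match.end()
    parts.append(source[pos:])  # prose after the last chunk
    return "".join(parts)


# Hypothetical literate document: prose interleaved with two code chunks.
EXAMPLE = """The mean of the sample is computed below.
<<mean>>=
values = [2, 4, 6]
print(sum(values) / len(values))
@
Chunks share state, so later chunks can reuse earlier results.
<<>>=
print(max(values))
@
"""

if __name__ == "__main__":
    print(weave(EXAMPLE))
```

Because the report is regenerated from the source every time, the numbers in the prose cannot drift away from the code that produced them, which is the property that makes literate programming useful for reproducible analysis.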

We first consider applications of lpEdit to analyses that mix R and Python code with the documentation system. Finally, we illustrate the use of lpEdit to conduct a reproducible functional analysis of high-throughput sequencing data, using the transcriptome of the butterfly species Pieris brassicae.

Keywords

reproducible research, text editor, RNA-seq

DOI

10.25080/Majora-8b375195-00e