pyjanitor: A Cleaner API for Cleaning Data
Eric J. Ma
Zachary Barry
Sam Zuckerman
Zachary Sailer
The pandas library has become the de facto standard
for data wrangling in the Python programming language.
However, inconsistencies in the pandas application programming interface (API),
while idiomatic due to historical use,
prevent the use of expressive,
fluent programming idioms that enable self-documenting pandas code.
Here, we introduce pyjanitor,
an open source Python package that extends the pandas API with such idioms.
We describe the design and implementation of the package,
provide usage examples from a variety of domains,
and discuss the ways that the pyjanitor project has enabled
the inclusion of first-time contributors to open source projects.
Keywords: data engineering, data science, data cleaning
DOI: 10.25080/Majora-7ddc1dd1-007
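To make the notion of a fluent, self-documenting idiom concrete, the following minimal sketch chains cleaning steps so that the code reads top to bottom as a description of the pipeline. It uses only pandas. The helper clean_names, the example column names, and the data are assumptions made for illustration; the helper is a stand-in and not pyjanitor's implementation.

```python
import pandas as pd


def clean_names(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: lowercase column names and replace spaces with underscores."""
    return df.rename(columns=lambda name: name.strip().lower().replace(" ", "_"))


# Illustrative data (made up for this sketch).
raw = pd.DataFrame(
    {"Sample ID": [1, 2, 3], "Measured Value": [0.5, None, 1.5]}
)

# Each step returns a new DataFrame, so the steps can be chained
# and each line names the action it performs.
cleaned = (
    raw
    .pipe(clean_names)                                   # standardize column names
    .dropna(subset=["measured_value"])                   # drop rows missing a measurement
    .assign(doubled=lambda d: d["measured_value"] * 2)   # add a derived column
    .reset_index(drop=True)                              # renumber the remaining rows
)

print(cleaned)
```

In plain pandas, a custom step such as clean_names has to be routed through .pipe, whereas the package described here extends the DataFrame API so that steps of this kind can be called as methods in the same chain.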