
popmon: Analysis Package for Dataset Shift Detection

Simon Brugman
ING Analytics Wholesale Banking

Tomas Sostak
Vinted

Pradyot Patil
ING Analytics Wholesale Banking

Max Baak
ING Analytics Wholesale Banking

Abstract

popmon is an open-source Python package for checking the stability of a tabular dataset. It creates histograms of features binned in time slices and compares the stability of their profiles and distributions using statistical tests, both over time and with respect to a reference dataset. It works with numerical, ordinal, and categorical features on both pandas and Spark dataframes. The histograms can be higher-dimensional, so popmon can also track correlations between sets of features. Using monitoring business rules that are either static or dynamic, popmon can automatically detect and alert on changes observed over time, such as trends, shifts, peaks, outliers, anomalies, and changing correlations. Results are presented in a self-contained report.
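
The idea of histogramming a feature per time slice and comparing each slice to a reference can be illustrated with a minimal, standard-library-only sketch. This is not popmon's API or implementation: the function names (`histograms_per_slice`, `psi`, `flag_shifts`), the toy data, the bin width, and the 0.25 threshold are all assumptions made for illustration, and the population stability index is used here only as one simple stand-in for the statistical tests mentioned above.

```python
from collections import Counter, defaultdict
import math


def histograms_per_slice(records, slice_of, bin_of):
    """Build one histogram (bin -> count) per time slice.

    records  : iterable of (timestamp, value) pairs
    slice_of : maps a timestamp to a time-slice label (hypothetical helper)
    bin_of   : maps a value to a bin label (hypothetical helper)
    """
    hists = defaultdict(Counter)
    for timestamp, value in records:
        hists[slice_of(timestamp)][bin_of(value)] += 1
    return dict(hists)


def psi(reference, current, eps=1e-6):
    """Population stability index between two histograms.

    Empty bins are floored at eps so the logarithm stays finite.
    """
    ref_total = sum(reference.values())
    cur_total = sum(current.values())
    score = 0.0
    for b in set(reference) | set(current):
        p = max(reference.get(b, 0) / ref_total, eps)
        q = max(current.get(b, 0) / cur_total, eps)
        score += (q - p) * math.log(q / p)
    return score


def flag_shifts(hists, reference_slice, threshold=0.25):
    """Static rule: flag a slice when its PSI against the reference exceeds the threshold."""
    reference = hists[reference_slice]
    return {s: psi(reference, h) > threshold for s, h in sorted(hists.items())}


# Toy data: weeks 0 and 1 share a distribution, week 2 is shifted upwards.
stable = [1, 2, 2, 3, 3, 3, 4, 4, 5]
shifted = [v + 4 for v in stable]
records = (
    [(0, v) for v in stable]
    + [(1, v) for v in stable]
    + [(2, v) for v in shifted]
)

hists = histograms_per_slice(
    records,
    slice_of=lambda week: week,  # timestamps are already week numbers
    bin_of=lambda v: v // 2,     # fixed bin width of 2 (assumed)
)
flags = flag_shifts(hists, reference_slice=0)
print(flags)
```

In this sketch the first slice serves as the reference dataset and the threshold is a static rule. A dynamic rule could instead derive the threshold from the spread of the scores observed in earlier slices.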

Keywords

dataset shift detection, population shift, covariate shift, histogramming, profiling

DOI

10.25080/majora-212e5952-01d
