Text and data mining scientific articles with allofplos

Elizabeth Seiver

M Pacer
Netflix

Sebastian Bassi
Globant

Abstract

Mining scientific articles is hard when many of them are inaccessible behind paywalls. The Public Library of Science (PLOS) is a non-profit Open Access science publisher whose articles are all freely available to read and re-use, and which publishes the single largest journal, PLOS ONE. allofplos is a Python package for maintaining a continually growing collection of PLOS's 230,000+ articles. It also efficiently parses these article files into Python data structures. This article covers how allofplos keeps your article collection up to date, and how to use it to access common article metadata and fuel your meta-research, with actual use cases from inside PLOS.
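To make "parsing article files into Python data structures" concrete, here is a minimal sketch that uses only the Python standard library rather than allofplos itself. It assumes the article files are JATS-style XML; the sample document, its placeholder DOI, and the `extract_metadata` helper are all hypothetical and are not part of the allofplos API.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed JATS-style article XML (not a real PLOS article).
SAMPLE_XML = """<article>
  <front>
    <journal-meta>
      <journal-title-group><journal-title>PLOS ONE</journal-title></journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1371/journal.pone.0000000</article-id>
      <title-group><article-title>An example title</article-title></title-group>
      <pub-date pub-type="epub"><day>5</day><month>3</month><year>2018</year></pub-date>
    </article-meta>
  </front>
</article>"""


def extract_metadata(xml_text):
    """Return a dict of common metadata fields from one article's XML.

    Hypothetical helper for illustration; allofplos exposes its own interface.
    """
    root = ET.fromstring(xml_text)
    meta = root.find("front/article-meta")
    # Select the electronic publication date by its pub-type attribute.
    pub_date = meta.find("pub-date[@pub-type='epub']")
    return {
        "doi": meta.findtext("article-id[@pub-id-type='doi']"),
        "title": meta.findtext("title-group/article-title"),
        "journal": root.findtext(
            "front/journal-meta/journal-title-group/journal-title"
        ),
        "year": int(pub_date.findtext("year")),
    }


metadata = extract_metadata(SAMPLE_XML)
print(metadata)
```

Once each article is reduced to a plain dictionary like this, a whole collection can be loaded into a list or a table for meta-research. The package described here is meant to spare you from writing such XPath lookups by hand.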

Keywords

Text and data mining, metascience, open access, science publishing, scientific articles, XML

DOI

10.25080/Majora-4af1f417-009
