Awkward Array: JSON-like data, NumPy-like idioms

Jim Pivarski
Princeton University

Ianna Osborne
Princeton University

Pratyush Das
Institute of Engineering and Management

Anish Biswas
Manipal Institute of Technology

Peter Elmer
Princeton University

Video: https://youtu.be/WlnUF3LRBj4

Abstract

NumPy simplifies and accelerates mathematical calculations in Python, but only for rectilinear arrays of numbers. Awkward Array provides a similar interface for JSON-like data: slicing, masking, broadcasting, and performing vectorized math on the attributes of objects, unequal-length nested lists (i.e. ragged/jagged arrays), and heterogeneous data types.

Awkward Arrays are columnar data structures, like (and convertible to/from) Apache Arrow, with a focus on manipulation, rather than serialization/transport. These arrays can be passed between C++ and Python, and they can be used in functions that are JIT-compiled by Numba.
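To make "columnar" concrete, here is a minimal sketch of how a ragged array can be stored as two flat buffers: a `content` array holding all nested values end to end, and an `offsets` array marking where each inner list starts and stops. It uses plain NumPy and is only an illustration of the layout idea, not Awkward Array's actual implementation; the helper `get_list` and all variable names are hypothetical.

```python
import numpy as np

nested = [[1.1, 2.2, 3.3], [], [4.4, 5.5]]

# Flatten all values into one contiguous buffer.
content = np.array([x for sub in nested for x in sub], dtype=np.float64)

# offsets[i] and offsets[i + 1] bound the i-th inner list inside `content`.
counts = np.array([len(sub) for sub in nested], dtype=np.int64)
offsets = np.concatenate(([0], np.cumsum(counts)))   # [0, 3, 3, 5]

def get_list(i):
    """Return the i-th inner list as a view into the flat buffer (hypothetical helper)."""
    return content[offsets[i]:offsets[i + 1]]

# Elementwise math touches only the flat buffer; the nesting structure is unchanged.
doubled = content * 2

# Per-list sums without a Python loop: label each value with its list index,
# then accumulate by label. Empty lists contribute a sum of zero.
parents = np.repeat(np.arange(len(counts)), counts)  # [0, 0, 0, 2, 2]
sums = np.bincount(parents, weights=content, minlength=len(counts))
```

Because the values live in flat, typed buffers rather than in per-object Python containers, the same data can in principle be handed to compiled code or to another columnar format without being rebuilt element by element.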

Development of a GPU backend is in progress, which would allow data analyses written in array-programming style to run on GPUs without modification.

Keywords

NumPy, Numba, Pandas, C++, Apache Arrow, Columnar data, AOS-to-SOA, Ragged array, Jagged array, JSON

DOI

10.25080/Majora-342d178e-00b
