Conference site ยป Proceedings

Cloudknot: A Python Library to Run your Existing Code on AWS Batch

Adam Richie-Halford
University of Washington, Seattle, WA

Ariel Rokem
University of Washington, Seattle, WA

Video: https://youtu.be/D9LPzqoZ3f8

Abstract

We introduce Cloudknot, a software library that simplifies cloud-based distributed computing by programmatically executing user-defined functions (UDFs) in AWS Batch. It takes as input a Python function, packages it as a container, creates all the necessary AWS constituent resources to submit jobs, monitors their execution and gathers the results, all from within the Python environment. Cloudknot minimizes the cognitive load of learning a new API by introducing only one new object and using the familiar map method. It overcomes limitations of previous similar libraries, such as Pywren, that runs UDFs on AWS Lambda, because most data science workloads exceed the current limits of AWS Lambda on execution time, RAM, and local storage.

Keywords

Cloud computing, Amazon Web Services, Distributed computing

DOI

10.25080/Majora-4af1f417-001

Bibtex entry

Full text PDF