Published March 18, 2023 | Version camera
Conference paper Open

Provenance-enhanced Root Cause Analysis for Jupyter Notebooks

  • 1. University of Amsterdam

Description

As Jupyter notebooks become more common in scientific research, more notebook-based use cases have become distributed. This trend makes it harder to analyze anomalies and debug notebooks. Provenance data is well suited to providing context around an anomaly and making its root cause easier to find. However, provenance is rarely investigated in the context of distributed Jupyter notebooks. In this paper, we propose a framework that integrates two types of data: provenance and performance anomalies detected from performance data. We use the combined information to show the end user, visually, the provenance at the time of the anomaly and the anomaly's root cause. We build and evaluate the framework with a notebook extended with anomaly-generating functions. The generated anomalies were detected automatically, and combining provenance with anomaly information yields a valuable subset of the provenance data around the time an anomaly occurred. In our experiments, this creates a clear and confined context for the anomaly and enables the framework to find the root cause of performance anomalies in Jupyter notebooks.
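
As a rough illustration of the idea described above, and not the paper's implementation, the following Python sketch flags anomalies in a performance series and then selects the provenance records that fall within a time window around each anomaly. The z-score detector, the `ProvRecord` layout, the `window` and `threshold` parameters, and all function names are assumptions made for this example; the description does not specify any of them.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class ProvRecord:
    """Hypothetical provenance record: which notebook cell ran, and when."""
    timestamp: float  # seconds since the start of the run
    cell_id: str
    activity: str


def detect_anomalies(samples, threshold=2.0):
    """Return timestamps of samples lying more than `threshold`
    population standard deviations from the mean.

    `samples` is a list of (timestamp, value) pairs. The z-score rule is
    a stand-in detector chosen for brevity (an assumption).
    """
    values = [v for _, v in samples]
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # constant series: nothing stands out
    return [t for t, v in samples if abs(v - mu) / sigma > threshold]


def provenance_around(records, anomaly_time, window=5.0):
    """Select provenance records within `window` seconds of the anomaly."""
    return [r for r in records if abs(r.timestamp - anomaly_time) <= window]


# Toy data (invented): CPU usage in percent, with one spike at t=20.
cpu = [(0, 10.0), (5, 12.0), (10, 11.0), (15, 10.0),
       (20, 95.0), (25, 11.0), (30, 12.0)]

provenance = [
    ProvRecord(2.0, "cell-1", "load_data"),
    ProvRecord(12.0, "cell-2", "clean_data"),
    ProvRecord(18.0, "cell-3", "train_model"),
    ProvRecord(28.0, "cell-4", "plot_results"),
]

anomaly_times = detect_anomalies(cpu)
context = {t: provenance_around(provenance, t) for t in anomaly_times}

for t, records in context.items():
    print(f"anomaly at t={t}: {[r.cell_id for r in records]}")
```

With this toy data, the spike at t=20 is the only flagged sample, and the only provenance record within five seconds of it is the one for `cell-3`. That narrowing from the full provenance log to a small, time-bounded subset is the kind of confined context the description refers to.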

Files

2002.conference.ucc.intel4.camera.pdf (524.5 kB)
md5:435cc0f4e707f1e9897b9d19fb7e8337

Additional details

Funding

Blue Cloud – Blue-Cloud: Piloting innovative services for Marine Research & the Blue Economy 862409
European Commission
ARTICONF – smART socIal media eCOsytstem in a blockchaiN Federated environment 825134
European Commission
ENVRI-FAIR – ENVironmental Research Infrastructures building Fair services Accessible for society, Innovation and Research 824068
European Commission