Published July 8, 2023 | Version 1.0.0
Conference paper · Open Access

A Predictive Model to Identify Effective Metrics for the Comprehension of Computational Notebooks

Creators

Description

We enhanced the measurement of the understandability of notebook code by leveraging user comments within a software repository. As a case study, we started with 248,761 Kaggle Jupyter notebooks introduced in previous studies, together with their relevant metadata. To identify user comments related to code comprehension within the notebooks, we utilized a fine-tuned DistilBERT transformer. We established a social-based criterion for measuring code understandability that considers the number of comments, their upvotes, and the total views and upvotes of the notebooks. This criterion proved more effective than the alternatives we evaluated, so we adopted it as the ground truth for the code comprehension of our notebook set. In addition, we collected 34 metrics for the notebooks, categorized as script-based and notebook-based, and used them as features in our dataset. Using a Random Forest classifier, our predictive model achieved 85% accuracy in predicting code comprehension levels in computational notebooks, identifying developer expertise and use of markdown features as key factors.
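
The description outlines a two-stage pipeline: a fine-tuned DistilBERT flags comprehension-related comments, and a Random Forest is trained on the 34 notebook metrics against the social-based ground truth. Below is a minimal Python sketch of that flow; the file names, label scheme, and hyperparameters are assumptions for illustration and are not taken from the record itself.

```python
# Hypothetical sketch of the two-stage pipeline described above.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from transformers import pipeline

# 1) Flag comments that discuss code comprehension.
#    The model path and label name are placeholders, not the authors' artifacts.
comment_clf = pipeline("text-classification", model="path/to/fine-tuned-distilbert")
comments = pd.read_csv("kaggle_comments.csv")  # hypothetical file
comments["is_comprehension"] = [
    pred["label"] == "COMPREHENSION"
    for pred in comment_clf(comments["text"].tolist())
]

# 2) Train a Random Forest on the script-based and notebook-based metrics;
#    the binary label comes from the social-based criterion (ground truth).
metrics = pd.read_csv("notebook_metrics.csv")  # hypothetical file, 34 feature columns
X = metrics.drop(columns=["understandable"])
y = metrics["understandable"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, rf.predict(X_test)))
```

Feature importances from the fitted forest (`rf.feature_importances_`) are one standard way to surface factors such as developer expertise and markdown usage, though the record does not specify the authors' exact analysis.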

Files (21.8 MB)

PROMISE_CODE_DATA.zip — 21.8 MB (md5:8e1bce9bcb1db1e2679a32d87737216c)