A Predictive Model to Identify Effective Metrics for the Comprehension of Computational Notebooks
Creators
Description
We enhanced the measurement of the understandability of notebook code by leveraging user comments in a software repository. As a case study, we started with 248,761 Kaggle Jupyter notebooks, introduced in previous studies, together with their relevant metadata. To identify user comments associated with code comprehension within the notebooks, we used a fine-tuned DistilBERT transformer. We then established a social-based criterion for measuring code understandability that considers the number of comprehension-related comments, their upvotes, and the total views and upvotes of each notebook. This criterion proved more effective than alternative methods, so we adopted it as the ground truth for evaluating the code comprehension of our notebook set. In addition, we collected 34 metrics for the notebooks, categorized as script-based and notebook-based, and used them as features in our dataset. Using a Random Forest classifier, our predictive model achieved 85% accuracy in predicting code comprehension levels in computational notebooks, identifying developer expertise and the use of markdown features as key factors.
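To illustrate the comment-classification step, below is a minimal sketch of applying a fine-tuned DistilBERT model to flag comprehension-related comments. It is not the authors' exact pipeline: the checkpoint path, label names, and example comments are hypothetical placeholders.

```python
# Minimal sketch: classify user comments as comprehension-related or not
# with a fine-tuned DistilBERT checkpoint. The model path and labels are
# hypothetical, not the artifacts shipped in this record.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="path/to/fine-tuned-distilbert",  # hypothetical local checkpoint
)

comments = [
    "Great explanation of the feature engineering step!",
    "I don't understand why you dropped these columns.",
]
for comment, pred in zip(comments, classifier(comments)):
    print(f"{comment!r} -> {pred['label']} ({pred['score']:.3f})")
```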
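Likewise, a minimal sketch of the prediction step, assuming a table with the 34 script-based and notebook-based metrics as feature columns and a binary comprehension label derived from the social-based criterion. The file name and column names are hypothetical; feature importances are shown because the study highlights factors such as developer expertise and markdown usage.

```python
# Minimal sketch: predict notebook comprehension level from the 34 metric
# features with a Random Forest. File and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("notebook_metrics.csv")  # hypothetical export of the dataset

X = df.drop(columns=["comprehension_label"])  # the 34 metric features
y = df["comprehension_label"]                 # social-based ground truth

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=300, random_state=42)
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Rank the metrics by importance to see which factors drive the prediction
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))
```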
Files
(21.8 MB)

Name | Size
---|---
PROMISE_CODE_DATA.zip (md5:8e1bce9bcb1db1e2679a32d87737216c) | 21.8 MB