A Predictive Model to Identify Effective Metrics for the Comprehension of Computational Notebooks
Creators
Description
We enhanced the measurement of the understandability of notebook code by leveraging user comments in a software repository. As a case study, we started with 248,761 Kaggle Jupyter notebooks, introduced in previous studies, together with their relevant metadata. To identify user comments associated with code comprehension within the notebooks, we used a fine-tuned DistilBERT transformer. We then established a social-based criterion for measuring code understandability that considers the number of comprehension-related comments, their upvotes, and the total views and upvotes of each notebook. This criterion proved more effective than alternative methods, so we adopted it as the ground truth for evaluating the code comprehension of our notebook set. In addition, we collected 34 metrics for the notebooks, categorized as script-based and notebook-based, and used them as features in our dataset. Using a Random Forest classifier, our predictive model achieved 85% accuracy in predicting code comprehension levels in computational notebooks, identifying developer expertise and the use of markdown features as key factors.
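To illustrate the comment-classification step, below is a minimal sketch of applying a fine-tuned DistilBERT model to flag comprehension-related comments. It is not the authors' exact pipeline: the checkpoint path, label names, and example comments are hypothetical placeholders.

```python
# Minimal sketch: classify user comments as comprehension-related or not
# with a fine-tuned DistilBERT checkpoint. The model path and labels are
# hypothetical, not the artifacts shipped in this record.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="path/to/fine-tuned-distilbert",  # hypothetical local checkpoint
)

comments = [
    "Great explanation of the feature engineering step!",
    "I don't understand why you dropped these columns.",
]
for comment, pred in zip(comments, classifier(comments)):
    print(f"{comment!r} -> {pred['label']} ({pred['score']:.3f})")
```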
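Likewise, a minimal sketch of the prediction step, assuming a table with the 34 script-based and notebook-based metrics as feature columns and a binary comprehension label derived from the social-based criterion. The file name and column names are hypothetical; feature importances are shown because the study highlights factors such as developer expertise and markdown usage.

```python
# Minimal sketch: predict notebook comprehension level from the 34 metric
# features with a Random Forest. File and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("notebook_metrics.csv")  # hypothetical export of the dataset

X = df.drop(columns=["comprehension_label"])  # the 34 metric features
y = df["comprehension_label"]                 # social-based ground truth

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=300, random_state=42)
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))

# Rank the metrics by importance to see which factors drive the prediction
importances = pd.Series(model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False).head(10))
```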
Files
(21.8 MB)

Name | Size
---|---
PROMISE_CODE_DATA.zip (md5:8e1bce9bcb1db1e2679a32d87737216c) | 21.8 MB