Title

Mining Learning Paths from Introductory Books to Programming in Python

Abstract

Python, one of the most prevalent programming languages today, is widely used in web development, data science, machine learning, and DevOps, among other fields. Although it is used to build complex applications, Python has also gained popularity as an introductory programming language and is now widely taught in schools and in introductory university programming courses. This growing interest has prompted the proposal of a method for assessing Python competency levels, similar to how natural language proficiency is evaluated. This paper aims to evaluate and compare the proposed competency levels with common Python learning paths, particularly those found in introductory Python programming books. To this end, we analyzed a dozen introductory Python books to determine the order in which different Python constructs are introduced. We examined which code constructs are introduced, when they are introduced, and the extent to which the proposed competency levels align with the learning paths presented in the books. Our findings reveal significant discrepancies in the order in which Python constructs are introduced. We identify these discrepancies and discuss them in detail to extract lessons learned. Initiatives like the one presented in this paper can contribute to validating competency levels and to developing more straightforward, and even individualized, learning paths for computer programming.

Description

This dataset complements the research project titled "Mining Learning Paths from Introductory Books to Programming in Python". It consists of the code constructs extracted by our proposed regular-expression extraction tool and the results for the research questions (RQs) in our methodology.
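
To illustrate the kind of regular-expression extraction described above, here is a minimal sketch. It is not the project's tool: the pattern names, the expressions themselves, and the functions first_occurrences and learning_path are all hypothetical and cover only a handful of constructs. It assumes the book text is available as a list of page strings.

```python
import re

# Hypothetical patterns for illustration only; the expressions used by the
# actual extraction tool are not listed in this description.
PATTERNS = {
    "for_loop": re.compile(r"^[ \t]*for\s+.+?\s+in\s+.+:", re.MULTILINE),
    "while_loop": re.compile(r"^[ \t]*while\s+.+:", re.MULTILINE),
    "function_def": re.compile(r"^[ \t]*def\s+\w+\s*\(", re.MULTILINE),
    "list_comprehension": re.compile(r"\[[^\[\]]+\s+for\s+\w+\s+in\s+[^\[\]]+\]"),
}


def first_occurrences(pages):
    """Map each construct to the 1-based page where it first matches."""
    first = {}
    for page_number, text in enumerate(pages, start=1):
        for name, pattern in PATTERNS.items():
            if name not in first and pattern.search(text):
                first[name] = page_number
    return first


def learning_path(pages):
    """Order constructs by the page of their first occurrence."""
    first = first_occurrences(pages)
    return sorted(first, key=first.get)
```

Calling learning_path on the pages of one book gives the order in which that book introduces the matched constructs. Real book text would need far more careful patterns, which is presumably why the tool's accuracy was evaluated manually (see RQ1-Accuracy.csv).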

Files Overview

  • Dataset.csv: all code constructs extracted from the 12 introductory Python textbooks.
  • RQ0-BookClassification.csv: details of the classification of the introductory books.
  • RQ1-Accuracy.csv: results of the manual evaluation of the regular-expression extraction tool.
  • RQ2-AmountOfAppearance.csv: number of appearances of each code construct in the 12 books.
  • RQ3-DistributionOfFO.csv: first occurrence (FO) of each code construct in each book, compared to page percentage.
  • RQ3-BookScore.csv: distance between the sequence in each textbook and the optimal sequence.
  • RQ3-SequenceOfCode.csv: sequence of code constructs found in each textbook.
  • RQ4-DisagreementOfCode.csv: disagreement score for each code construct.
  • RQ5-OutlierAgreement.csv: outlier disagreements selected for discussion.
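
The column names of these files are not documented here, so a reasonable first step is to inspect each file's header. The sketch below does this with the Python standard library. It assumes that the files are UTF-8 encoded and that the first row of each file is a header; the function names summarize_csv and summarize_folder are illustrative, not part of the dataset.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column names, number of data rows) for one CSV file.

    Assumes UTF-8 encoding and a header in the first row.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        row_count = sum(1 for _ in reader)
    return header, row_count


def summarize_folder(folder="."):
    """Summarize every .csv file directly inside the given folder."""
    return {
        path.name: summarize_csv(path)
        for path in sorted(Path(folder).glob("*.csv"))
    }
```

For example, summarize_folder("path/to/download") returns a dictionary keyed by file name, which makes it easy to check the available columns before loading a file for analysis.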