Published October 31, 2021 | Version 1.0.0
Journal article Open

Mapping Python Programs to Vectors using Recursive Neural Encodings

  • 1. The University of Sydney
  • 2. Grok Learning


Educational data mining involves the application of data mining techniques to student activity. However, in the context of computer programming, many data mining techniques cannot be applied because they require vector-shaped input, whereas computer programs have the form of syntax trees. In this paper, we present ast2vec, a neural network that maps Python syntax trees to vectors and back, thereby enabling about a hundred data mining techniques that were previously not applicable. Ast2vec has been trained on almost half a million programs written by novice programmers and is designed to be applied across learning tasks without re-training, meaning that users can apply it without any need for deep learning. We demonstrate the generality of ast2vec in three settings. First, we provide example analyses using ast2vec on a classroom-sized dataset, involving two novel techniques, namely progress-variance projection for visualization and a dynamical systems analysis for prediction. In these examples, we also explain how ast2vec can be used to inform educational decisions. Second, we consider the ability of ast2vec to recover the original syntax tree from its vector representation on the training data and two other large-scale programming datasets. Finally, we evaluate the predictive capability of a linear dynamical system on top of ast2vec, obtaining results similar to those of techniques that work directly on syntax trees while being much faster (constant- instead of linear-time processing). We hope ast2vec can augment the educational data mining toolkit by making analyses of computer programs easier, richer, and more efficient.
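
To illustrate the gap between syntax trees and vector-shaped input described above, the following minimal sketch parses a small Python program with the standard-library `ast` module and turns it into a fixed-length vector by counting node types. This is not ast2vec and not the authors' method: the sample program, the `VOCABULARY` list, and the helper names are hypothetical, chosen only for illustration. Unlike the encoding described in the abstract, a count vector of this kind cannot be mapped back to a syntax tree.

```python
import ast
from collections import Counter

# Hypothetical example program, of the kind a novice might submit.
SOURCE = "x = int(input())\nprint(x + 1)\n"

# Hypothetical, hand-picked vocabulary of node types (an assumption,
# not taken from the paper). It fixes the length of the vector.
VOCABULARY = ["Assign", "Call", "Name", "BinOp", "For", "If"]


def node_type_counts(source):
    """Parse source code into a syntax tree and count its node types."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def to_vector(counts, vocabulary):
    """Turn node-type counts into a fixed-length list of integers."""
    return [counts.get(name, 0) for name in vocabulary]


counts = node_type_counts(SOURCE)
vector = to_vector(counts, VOCABULARY)
print(vector)  # [1, 3, 5, 1, 0, 0]
```

The tree has variable size and shape, while the resulting vector always has one entry per vocabulary item, which is the form most standard data mining techniques expect. The count vector discards the tree structure, however, so it only shows the shape of the problem, not a solution to it.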



