Artifact for "Do Machine Learning Models Produce TypeScript Types that Type Check?"
Creators
- 1. Northeastern University
- 2. Northeastern University and Roblox Research
Description
Abstract
Type migration is the process of adding types to untyped code to gain assurance at compile
time. TypeScript and other gradual type systems facilitate type migration by allowing programmers
to start with imprecise types and gradually strengthen them. However, adding types is a manual
effort and several migrations on large, industry codebases have been reported to have taken several
years. In the research community, there has been significant interest in using machine learning to
automate TypeScript type migration. Existing machine learning models report a high degree of
accuracy in predicting individual TypeScript type annotations. However, in this paper we argue
that accuracy can be misleading, and we should address a different question: can an automatic type
migration tool produce code that passes the TypeScript type checker?
We present TypeWeaver, a TypeScript type migration tool that can be used with an arbitrary
type prediction model. We evaluate TypeWeaver with three models from the literature: DeepTyper,
a recurrent neural network; LambdaNet, a graph neural network; and InCoder, a general-purpose,
multi-language transformer that supports fill-in-the-middle tasks. Our tool automates several steps
that are necessary for using a type prediction model, including (1) importing types for a project’s
dependencies; (2) migrating JavaScript modules to TypeScript notation; (3) inserting predicted
type annotations into the program to produce TypeScript when needed; and (4) rejecting non-type
predictions when needed.
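Steps (3) and (4) can be illustrated with a small, hypothetical sketch. Every name in it is an assumption, including `looks_like_type`, `annotate_params` and the regular-expression heuristic, and it is not TypeWeaver's implementation. The sketch takes one raw model prediction per parameter. It rejects predictions that do not look like a TypeScript type, such as code fragments or statement keywords. It then annotates only the parameters whose prediction survives.

```python
import re
from typing import Dict, List, Optional

# Hypothetical heuristic: accept strings made of identifiers, dots, generic
# brackets, array suffixes, commas, unions/intersections and spaces.
# Anything else (e.g. "x + 1") is treated as a non-type prediction.
_TYPE_CHARS = re.compile(r"^[A-Za-z_$][\w$.<>\[\],|& ]*$")
_KEYWORDS = {"return", "function", "const", "let", "var", "if", "else"}


def looks_like_type(prediction: str) -> bool:
    """Return True if the prediction plausibly names a TypeScript type."""
    text = prediction.strip()
    if not text or not _TYPE_CHARS.match(text):
        return False
    # Reject predictions that contain statement keywords as whole words.
    words = set(re.findall(r"[A-Za-z_$][\w$]*", text))
    return not (words & _KEYWORDS)


def annotate_params(name: str, params: List[str],
                    predictions: Dict[str, Optional[str]]) -> str:
    """Build a TypeScript function signature, annotating only parameters
    whose prediction passes looks_like_type; others stay unannotated."""
    parts = []
    for param in params:
        pred = predictions.get(param)
        if pred is not None and looks_like_type(pred):
            parts.append("{}: {}".format(param, pred.strip()))
        else:
            parts.append(param)
    return "function {}({})".format(name, ", ".join(parts))
```

Consider `annotate_params("add", ["a", "b", "opts"], {"a": "number", "b": "x + 1", "opts": "Record<string, any>"})`. It yields `function add(a: number, b, opts: Record<string, any>)`. The non-type prediction for `b` is dropped instead of being inserted. This heuristic is deliberately conservative: it also rejects legitimate types such as object literals or string-literal unions, so it is only a sketch of the idea.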
We evaluate TypeWeaver on a dataset of 513 JavaScript packages, including packages that
have never been typed before. With the best type prediction model, we find that only 21% of
packages type check, but more encouragingly, 69% of files type check successfully.
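The gap between the package-level figure (21%) and the file-level figure (69%) is expected if a package counts as type checking only when every one of its files does. That definition is an assumption of this sketch. A single failing file sinks the whole package but lowers the file-level rate only slightly. The function name `pass_rates` and the data below are hypothetical.

```python
from typing import Dict, List, Tuple


def pass_rates(results: Dict[str, List[bool]]) -> Tuple[float, float]:
    """Given per-package lists of per-file type-check outcomes, return
    (package pass rate, file pass rate). Assumes a package passes only
    if all of its files pass."""
    if not results:
        raise ValueError("no packages given")
    packages = list(results.values())
    package_ok = sum(1 for files in packages if all(files))
    total_files = sum(len(files) for files in packages)
    file_ok = sum(sum(files) for files in packages)
    return package_ok / len(packages), file_ok / total_files


# Hypothetical outcomes for three packages (True = file type checks).
example = {
    "pkg-a": [True, True],
    "pkg-b": [True, False, True, True],
    "pkg-c": [False, True, True, True],
}
```

For `example`, only `pkg-a` passes, so the package rate is 1/3. Eight of the ten files pass, so the file rate is 0.8.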
Overview: What does the artifact comprise?
- Benchmarks are source code (NPM packages)
- Results are log files, source code (TypeScript), and CSV files
- Code is Python, TypeScript, and R
Artifact Requirements
- Hardware: a GPU with at least 14 GB of VRAM
- Software:
  - Linux
  - Python 3.6+ and the tqdm package
  - Podman with the NVIDIA container toolkit
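A quick pre-flight check for the software requirements above might look like the following sketch. `check_requirements` is a hypothetical helper, not part of the artifact. It only checks presence:
- It detects Linux and the Python version.
- It confirms that tqdm is importable.
- It confirms that `podman` and `nvidia-smi` are on `PATH`.

It does not verify that the NVIDIA container toolkit is configured for Podman, or that the GPU has at least 14 GB of VRAM.

```python
import importlib.util
import shutil
import sys
from typing import Dict


def check_requirements() -> Dict[str, bool]:
    """Report whether the listed prerequisites appear to be present.
    Presence checks only; no configuration or VRAM check is done."""
    return {
        "linux": sys.platform.startswith("linux"),
        "python>=3.6": sys.version_info >= (3, 6),
        "tqdm": importlib.util.find_spec("tqdm") is not None,
        "podman": shutil.which("podman") is not None,
        # nvidia-smi is used only as a proxy for an installed NVIDIA driver.
        "nvidia-smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print("{:12} {}".format(name, "ok" if ok else "MISSING"))
```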
The latest version of the software can be found on GitHub: https://github.com/nuprl/TypeWeaver
Files (666.6 MB)
Size | Checksum |
---|---|
666.6 MB | md5:415b9ed685645c24f95d500d099c4c08 |
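After downloading, the archive can be checked against the published MD5 checksum. The sketch below uses Python's standard `hashlib` and reads the file in chunks, so the 666.6 MB archive is never loaded into memory at once. The archive's file name is not given here, so the path is left to the caller. The coreutils `md5sum` command gives the same digest.

```python
import hashlib

# Published checksum of the artifact archive.
EXPECTED_MD5 = "415b9ed685645c24f95d500d099c4c08"


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str) -> bool:
    """Return True if the file at `path` matches the published checksum."""
    return md5sum(path) == EXPECTED_MD5
```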