Published September 26, 2024 | Version 1.0
Dataset
Open
CodeSCAN: ScreenCast ANalysis for Video Programming Tutorials
Creators
Description
CodeSCAN is the first large-scale and diverse dataset of coding screenshots with pixel-perfect annotations. It features:
- 24 popular programming languages (according to GitHub)
- 100 random repositories per language (with MIT, BSD-3 or WTFPL license), i.e. 2,400 repositories in total
- 5 files per repository, i.e. 12,000 files in total
- ~100 different themes and 25 different fonts
- Diverse layout changes, such as menu bar visibility, sidebar position, output window content, etc.
- Numerous realistic interactions such as searching, typing and selecting within a file, etc.
Check our project page for details. The dataset is for academic research use only.
Files
Name | Size |
---|---|
codescan.zip (md5:6cc76d8865d37c1e381cd5eb741a57f6) | 7.9 GB |
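Since the archive is large, it can be worth checking the download against the MD5 checksum listed above before unpacking it. The following is a minimal sketch, not an official tool of the dataset: the helper names `md5_of_file` and `verify_archive` are hypothetical, and the local path `codescan.zip` is an assumption about where the file was saved.

```python
import hashlib
from pathlib import Path

# Checksum as listed in the Files table for codescan.zip.
EXPECTED_MD5 = "6cc76d8865d37c1e381cd5eb741a57f6"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected checksum."""
    return md5_of_file(path) == expected.lower()


# Assumed download location; adjust to wherever the archive was saved.
archive = Path("codescan.zip")
if archive.exists():
    print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
```

Reading in chunks keeps memory use flat, which matters for a 7.9 GB file. A mismatch usually points to an incomplete or corrupted download.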
Additional details
Related works
- Is supplement to
- arXiv:2409.18556 (arXiv)
Software
- Repository URL
- https://a-nau.github.io/codescan/