Published January 25, 2026 | Version v2
Dataset Open

Public Source Code with Artistic Intent

Description

This dataset is provided by the Software Heritage Archive. We designed a heuristic to identify and extract repositories related to artistic coding practices. Through iterative discussions with experts in generative art, we defined a list of 32 signals (topics) used to detect art-related source code. The list is available online:

Art in humanity's code repository

NOTE: Some of the source repositories are no longer available on their original hosting platforms; however, their contents are preserved and remain accessible through the Software Heritage Archive.

This dataset was created in July 2025.

 

Files

generative_art_dataset.json

Size: 1.8 GB
md5:b7e6107d2da52777279a5991527c7f4d
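
Because the file is large, it can be worth checking a downloaded copy against the published MD5 checksum before using it. The following is a minimal sketch, assuming the file was saved locally under its original name, generative_art_dataset.json; the helper names md5_of_file and verify_download are illustrative and not part of the dataset or its repository.

```python
import hashlib
from pathlib import Path

# Checksum published on this record for generative_art_dataset.json.
EXPECTED_MD5 = "b7e6107d2da52777279a5991527c7f4d"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so the 1.8 GB file never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the file was downloaded.
    local = Path("generative_art_dataset.json")
    if local.exists():
        print("checksum OK" if verify_download(local) else "checksum MISMATCH")
    else:
        print(f"{local} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.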

Additional details

Software

Repository URL
https://github.com/sparkrew/art-in-swh
Programming language
Python, JavaScript