collab-uniba/KGTorrent: First release (v. 1.0.0) of KGTorrent
Description
Given their growing popularity among data scientists, computational notebooks, and Jupyter notebooks in particular, are increasingly studied by researchers worldwide. Generally, the aim is to understand how they are typically used, identify possible flaws, and inform the design of extensions and updates to the tool. To ease this kind of research, we collected and shared KGTorrent, a large dataset of 248,761 Jupyter notebooks from Kaggle.
Kaggle is a web platform hosting machine learning competitions that enables the creation and execution of Jupyter notebooks in a containerized computational environment. By leveraging Meta Kaggle, a dataset that is publicly available on the platform, we also built a companion MySQL database containing metadata on the notebooks in our dataset.
This repository hosts the Python scripts we developed to create KGTorrent. Run against the latest version of Meta Kaggle, the same scripts can also be used to refresh the collection.
For further details, please visit the full documentation of KGTorrent and the official KGTorrent GitHub repository.
Files
Name | Size | Checksum |
---|---|---|
collab-uniba/KGTorrent-v1.0.0.zip | 12.9 MB | md5:e44b057633fbcacb12a2bcec76ad0a66 |
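The MD5 checksum listed for the archive can be used to confirm that a download completed without corruption. The following is a minimal sketch using only the Python standard library; the local file name `KGTorrent-v1.0.0.zip` and the helper names `md5_of_file` and `archive_is_intact` are assumptions made for illustration and are not part of KGTorrent itself.

```python
import hashlib
from pathlib import Path

# Checksum published alongside the archive in the file listing.
EXPECTED_MD5 = "e44b057633fbcacb12a2bcec76ad0a66"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Compare the file's digest with the expected checksum (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the archive was saved.
    archive = Path("KGTorrent-v1.0.0.zip")
    if archive.exists():
        print("checksum OK" if archive_is_intact(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.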
Additional details
Related works
- Is supplement to
- https://github.com/collab-uniba/KGTorrent/tree/v1.0.0 (URL)