Published February 9, 2017 | Version v1
Dataset Open

Sets of mutually similar public GitHub repositories (October 2016)

  • 1. source{d}

Description

The data is stored as JSON: a list of lists. Each inner list is a group of very similar repositories (Weighted Jaccard Similarity threshold between 0.8 and 0.9).
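As a rough illustration of the structure described above, the following Python sketch loads a list of lists and summarizes the groups. The names `load_groups` and `summarize` are hypothetical, and the description does not say what each inner list contains, so the sketch does not depend on the element type. The `weighted_jaccard` helper shows the usual textbook definition of weighted Jaccard similarity (sum of element-wise minima divided by sum of element-wise maxima); it is included only to make the threshold concrete and is not necessarily how the groups in this dataset were computed.

```python
import json
from typing import Dict, List


def load_groups(path: str) -> List[list]:
    """Load the JSON file and check that it is a list of lists."""
    with open(path, encoding="utf-8") as fh:
        groups = json.load(fh)
    if not isinstance(groups, list) or not all(isinstance(g, list) for g in groups):
        raise ValueError("expected a JSON list of lists")
    return groups


def summarize(groups: List[list]) -> Dict[str, int]:
    """Count the groups, the total number of entries, and the largest group size."""
    sizes = [len(g) for g in groups]
    return {
        "groups": len(groups),
        "repositories": sum(sizes),
        "largest_group": max(sizes, default=0),
    }


def weighted_jaccard(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Textbook weighted Jaccard: sum of minima over sum of maxima.

    `a` and `b` map feature names to non-negative weights (an assumed
    representation; the dataset description does not specify the features).
    """
    keys = set(a) | set(b)
    numerator = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    denominator = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return numerator / denominator if denominator else 0.0
```

With the file downloaded locally, `summarize(load_groups("github_duplicates.json"))` would report how many groups the file holds. The file is 86.8 MB, so loading it in one call should fit in memory on most machines.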

Files

github_duplicates.json (86.8 MB)
md5:4f00537d941e736f8f5efd5cdbb3e946
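
The MD5 checksum listed above can be used to confirm that a download is intact. A minimal sketch, assuming the file has been saved locally under its original name; the function names `md5_of_file` and `verify` are hypothetical, and the file is read in chunks so that it does not need to be held in memory at once.

```python
import hashlib

# Checksum as published in the file listing for github_duplicates.json.
EXPECTED_MD5 = "4f00537d941e736f8f5efd5cdbb3e946"


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED_MD5) -> bool:
    """Compare the file's MD5 digest with the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify("github_duplicates.json")` should return `True` for an uncorrupted copy.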

Additional details