285293
doi
10.5281/zenodo.285293
oai:zenodo.org:285293
user-empirical-software-engineering
Programming language keyword frequencies extracted from 16,000,000 public GitHub repositories (October 2016)
Markovtsev Vadim
source{d}
url:https://data.world/vmarkovtsev/github-lng-keyword-frequencies
info:eu-repo/semantics/openAccess
Creative Commons Attribution Non Commercial 4.0 International
https://creativecommons.org/licenses/by-nc/4.0/legalcode
source code
github
open source
programming
compilers
programming language
<p>Origin</p>
<p>16,000,000 GitHub repositories, as of October 2016, were classified by language with github/linguist and parsed with Pygments. Tokens of type Token.Keyword were kept and aggregated into frequencies with MapReduce. Fuzzy duplicate repositories were discarded.</p>
<p>Some languages, e.g. Haskell, are parsed incorrectly, resulting in <strong>many</strong> keywords. They were nevertheless not removed, since we are not familiar with those languages.</p>
<p>Format</p>
<p>One triple per line: [language name]\t[keyword]\t[frequency]</p>
<p>Tabs and newlines inside keywords are escaped as \t and \n respectively.</p>
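<p>The format above can be read with a few lines of Python. This is a minimal sketch, not a reference parser. It assumes one triple per line, that the frequency is an integer count, and that only \t and \n are escaped (the description does not say how a literal backslash is handled). The function names are hypothetical.</p>

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def unescape_keyword(raw: str) -> str:
    """Turn the two-character sequences \\t and \\n back into a tab and a newline.

    Assumption: no other escape sequences are used in the file.
    """
    return raw.replace("\\t", "\t").replace("\\n", "\n")


def parse_triples(lines: Iterable[str]) -> Iterator[Tuple[str, str, int]]:
    """Yield (language, keyword, frequency) for each non-empty line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Tabs inside keywords are escaped, so a raw tab is always a separator.
        language, keyword, frequency = line.split("\t")
        # Assumption: frequencies are stored as integer counts.
        yield language, unescape_keyword(keyword), int(frequency)


def top_keywords(
    triples: Iterable[Tuple[str, str, int]], n: int = 10
) -> Dict[str, List[Tuple[str, int]]]:
    """Group triples by language and keep the n most frequent keywords of each."""
    by_language: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for language, keyword, frequency in triples:
        by_language[language].append((keyword, frequency))
    return {
        language: sorted(items, key=lambda item: item[1], reverse=True)[:n]
        for language, items in by_language.items()
    }
```

<p>For the real data, the lines could come from a local copy of keywords.tsv opened with <code>open("keywords.tsv", encoding="utf-8")</code>; the encoding is an assumption, since the record does not state it.</p>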
Zenodo
2017-02-09
info:eu-repo/semantics/other
768905
user-empirical-software-engineering
1579893926.955145
47355505
md5:0202530008708c280cc7e641f6754596
https://zenodo.org/records/285293/files/keywords.tsv
public
https://data.world/vmarkovtsev/github-lng-keyword-frequencies
Is identical to
url
isVersionOf
doi