Microsoft Academic Graph
Contributors
Others:
- CISPA Helmholtz Center for Information Security
Description
This is the Microsoft Academic Graph data from 2021-09-13. To get this, you'd normally jump through these hoops: https://docs.microsoft.com/en-us/academic-services/graph/get-started-setup-provisioning
As required by ODC-BY, I acknowledge Microsoft Academic using the URI https://aka.ms/msracad.
You can find out more about the data schema of the Microsoft Academic Graph at: https://web.archive.org/web/20220218202531/https://docs.microsoft.com/en-us/academic-services/graph/reference-data-schema
Because Microsoft's documentation is covered by different licensing terms, it cannot be provided along with the data.
The files are unchanged apart from being compressed with zstd (-T8 -19, i.e. 8 worker threads at compression level 19). This gives a smaller packed size, although the set still contains more data than the previous version.
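Because the files are plain zstd streams, they can be read without first unpacking them to disk. The sketch below is not part of the release. It assumes the third-party `zstandard` Python package and the layout described in the MAG schema documentation: UTF-8 text, one record per line, tab-separated columns, and no header row. The helper names `open_zst` and `iter_mag_rows` are hypothetical.

```python
import io
from typing import BinaryIO, Iterator


def iter_mag_rows(binary_stream: BinaryIO) -> Iterator[list[str]]:
    """Yield one list of column values per line of a MAG .txt file.

    Assumption: UTF-8, LF line endings, tab-separated, no header row.
    """
    # newline="\n" splits records only on LF, so a stray CR inside a
    # field does not start a new record.
    text = io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")
    for line in text:
        yield line.rstrip("\n").split("\t")


def open_zst(path: str) -> BinaryIO:
    """Return a binary stream over the decompressed contents of a .zst file.

    Uses the third-party `zstandard` package (pip install zstandard).
    Streaming means large files such as Papers.txt (72.0 GiB uncompressed)
    never have to be written out in full.
    """
    import zstandard  # imported lazily so iter_mag_rows works without it

    fh = open(path, "rb")
    # read_across_frames=True in case a file holds more than one zstd frame;
    # closefd defaults to True, so closing the reader also closes fh.
    return zstandard.ZstdDecompressor().stream_reader(fh, read_across_frames=True)
```

For example, `with open_zst("Affiliations.txt.zst") as s:` followed by `for row in iter_mag_rows(s): ...` walks the affiliations one record at a time. The meaning of each column comes from the schema reference linked above.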
The compressed files will expand to the following sizes (output of zstd -l):
Compressed | Uncompressed | Ratio | Filename |
---|---|---|---|
1.39 MiB | 5.30 MiB | 3.822 | Affiliations.txt.zst |
4.45 MiB | 15.7 MiB | 3.518 | AuthorExtendedAttributes.txt.zst |
4.29 GiB | 17.4 GiB | 4.052 | Authors.txt.zst |
575 KiB | 2.55 MiB | 4.530 | ConferenceInstances.txt.zst |
126 KiB | 453 KiB | 3.598 | ConferenceSeries.txt.zst |
1.56 MiB | 5.96 MiB | 3.820 | Journals.txt.zst |
12.5 GiB | 51.7 GiB | 4.137 | PaperAuthorAffiliations.txt.zst |
687 MiB | 2.76 GiB | 4.116 | PaperExtendedAttributes.txt.zst |
7.32 GiB | 40.5 GiB | 5.530 | PaperReferences.txt.zst |
1.23 MiB | 9.72 MiB | 7.894 | PaperResources.txt.zst |
18.5 GiB | 72.0 GiB | 3.898 | Papers.txt.zst |
5.59 GiB | 34.7 GiB | 6.203 | PaperUrls.txt.zst |
48.9 GiB | 219 GiB | 4.484 | Total: 12 files (XXH64 checksums) |
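The size listings on this page use two kinds of units. `zstd -l` reports binary units (KiB, MiB, GiB), while the file list further down uses decimal units (kB, MB, GB). The hypothetical helpers below are a small sketch, not something shipped with the data. They convert between the two and reproduce the Ratio column, which is the uncompressed size divided by the compressed size.

```python
# Multipliers for the binary (IEC) units printed by `zstd -l`.
IEC_UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}


def to_bytes(value: float, unit: str) -> float:
    """Convert a size such as (4.29, "GiB") into bytes."""
    return value * IEC_UNITS[unit]


def to_decimal_gb(n_bytes: float) -> float:
    """Express a byte count in decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_bytes / 10**9


def ratio(uncompressed: float, compressed: float) -> float:
    """The Ratio column: uncompressed size divided by compressed size."""
    return uncompressed / compressed
```

For instance, 48.9 GiB converts to about 52.5 GB, which matches the total shown for the download, and 4.29 GiB converts to about 4.6 GB. The printed figures are rounded, so recomputed values can differ in the last digit. For example, 219 / 48.9 gives about 4.478, while `zstd` reports 4.484 from the exact byte counts.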
This is not the complete set that is available for download. There is much more (roughly 160 GiB compressed), but the upload quota only permits this much. The additional data has been retained and is available on request. Because it is huge, be prepared to provide sftp, rsync or similar access so the files can be dropped in.
If you want to donate an update but lack the bandwidth to download and repack the set, feel free to contact me (details on my ORCID page) once you have gone through the provisioning steps. I will either fetch the set directly from the Azure storage (you might have to grant me access rights) or provide an sftp/rsync drop for you to upload the data to.
The data for version 2021-09-13 was kindly contributed by Rudolf Siegel.
Files
(52.5 GB)
Name | MD5 | Size |
---|---|---|
Affiliations.txt.zst | 659c2c819bcebabf486fd695f9bbfcc0 | 1.5 MB |
AuthorExtendedAttributes.txt.zst | f6647f03d2053a8a7bfd0c4e547c8bd8 | 4.7 MB |
Authors.txt.zst | 6bd33d23fa6cea940721ae95401193fe | 4.6 GB |
ConferenceInstances.txt.zst | 87d8035ceb2d417d761cd07f0f72ac5b | 589.3 kB |
ConferenceSeries.txt.zst | 0c0e6e4155ead01a51c4c2f92895a872 | 128.9 kB |
Journals.txt.zst | c5138797c0a81698303436529593e605 | 1.6 MB |
PaperAuthorAffiliations.txt.zst | 175417381b63b92d7f96236e88ec2d2e | 13.4 GB |
PaperExtendedAttributes.txt.zst | 7905dc391880a3b117d747c0a705055a | 720.2 MB |
PaperReferences.txt.zst | c5e43b72092fab604a511725cf2e8800 | 7.9 GB |
PaperResources.txt.zst | 3ed5237ab8b201163b04537cf954bec9 | 1.3 MB |
Papers.txt.zst | e861bb95423d8c5d36f0da7e79a1c1c1 | 19.8 GB |
PaperUrls.txt.zst | 44c86c2c602382f5688255d2d3cbf6bb | 6.0 GB |
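A download can be checked against the MD5 digests listed above. The file is hashed in chunks, so even the multi-gigabyte archives never need to fit in memory. This is a minimal, standard-library-only sketch, and `md5_of_file` and `verify` are hypothetical helper names.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks.

    Chunked reading keeps memory use flat regardless of file size.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_md5: str) -> bool:
    """True if the file's MD5 matches the listed digest (case-insensitive)."""
    return md5_of_file(path) == expected_md5.lower()
```

Pass the path of a downloaded archive together with the md5 value from its row. A `False` result means the file is incomplete or corrupted and should be downloaded again.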
Additional details
Related works
- Is compiled by: 10.1145/2740908.2742839 (DOI)