Published May 2, 2022 | Version 2021-09-13
Dataset Open

Microsoft Academic Graph

  • 1. Microsoft Academic
  • 1. CISPA Helmholtz Center for Information Security

Description

This is the Microsoft Academic Graph data from 2021-09-13. To get this, you'd normally jump through these hoops: https://docs.microsoft.com/en-us/academic-services/graph/get-started-setup-provisioning

As required by ODC-BY, I acknowledge Microsoft Academic using the URI https://aka.ms/msracad.

You can find out more about the data schema of the Microsoft Academic Graph at: https://web.archive.org/web/20220218202531/https://docs.microsoft.com/en-us/academic-services/graph/reference-data-schema
Since Microsoft docs are covered by different licensing terms, the documentation cannot be provided along with the data.

The files are unchanged apart from being compressed with zstd (-T8 -19). This yields a smaller packed size, although the release still holds more data than the previous version.
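Because the larger files expand to tens of GiB, it can be convenient to stream rows straight out of the compressed files instead of unpacking them first. The following is a minimal sketch, not an official reader. It assumes the zstd command-line tool is on the PATH, and that the files are UTF-8 text with one record per line and tab-separated fields without a header row (check this against the schema documentation linked above). The function names iter_rows and iter_zst_rows are made up for this illustration.

```python
import io
import subprocess


def iter_rows(binary_stream):
    """Yield each line of a binary stream as a list of tab-separated fields.

    ASSUMPTION: UTF-8 text, one record per line, tab-separated, no header.
    """
    # newline="\n" splits on "\n" only, so a stray "\r" stays inside its line.
    text = io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")
    for line in text:
        yield line.rstrip("\r\n").split("\t")


def iter_zst_rows(path):
    """Stream rows from a .zst file by piping it through `zstd -d -c`.

    ASSUMPTION: the `zstd` executable is installed and on the PATH.
    """
    proc = subprocess.Popen(["zstd", "-d", "-c", str(path)],
                            stdout=subprocess.PIPE)
    completed = False
    try:
        yield from iter_rows(proc.stdout)
        completed = True
    finally:
        proc.stdout.close()
        returncode = proc.wait()
    # Only report a failure if the whole file was read; stopping early
    # makes zstd exit with an error that is not interesting here.
    if completed and returncode != 0:
        raise RuntimeError(f"zstd exited with status {returncode} for {path}")


# Example (not run here): peek at the first record of a small file.
# first = next(iter_zst_rows("Affiliations.txt.zst"))
# print(len(first), "fields:", first)
```

Piping through the command-line tool keeps the sketch free of third-party Python packages; a zstd binding for Python would work just as well if one is available.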

The compressed files will expand to the following sizes (output of zstd -l):

Compressed  Uncompressed  Ratio Filename                         
  1.39 MiB      5.30 MiB  3.822 Affiliations.txt.zst             
  4.45 MiB      15.7 MiB  3.518 AuthorExtendedAttributes.txt.zst 
  4.29 GiB      17.4 GiB  4.052 Authors.txt.zst                  
   575 KiB      2.55 MiB  4.530 ConferenceInstances.txt.zst      
   126 KiB       453 KiB  3.598 ConferenceSeries.txt.zst         
  1.56 MiB      5.96 MiB  3.820 Journals.txt.zst                 
  12.5 GiB      51.7 GiB  4.137 PaperAuthorAffiliations.txt.zst  
   687 MiB      2.76 GiB  4.116 PaperExtendedAttributes.txt.zst  
  7.32 GiB      40.5 GiB  5.530 PaperReferences.txt.zst          
  1.23 MiB      9.72 MiB  7.894 PaperResources.txt.zst           
  18.5 GiB      72.0 GiB  3.898 Papers.txt.zst                   
  5.59 GiB      34.7 GiB  6.203 PaperUrls.txt.zst                
--------------------------------------------------               
  48.9 GiB       219 GiB  4.484  XXH64  12 files                 
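
To plan disk space, the listing above can be summed up programmatically. The sketch below parses the rows as printed (the summary row is left out) and converts between the binary units used by zstd (KiB, MiB, GiB) and decimal GB. The names LISTING, UNIT_BYTES and parse_listing are illustrative only.

```python
import re

# Rows copied from the `zstd -l` output above (summary row omitted).
LISTING = """\
  1.39 MiB      5.30 MiB  3.822 Affiliations.txt.zst
  4.45 MiB      15.7 MiB  3.518 AuthorExtendedAttributes.txt.zst
  4.29 GiB      17.4 GiB  4.052 Authors.txt.zst
   575 KiB      2.55 MiB  4.530 ConferenceInstances.txt.zst
   126 KiB       453 KiB  3.598 ConferenceSeries.txt.zst
  1.56 MiB      5.96 MiB  3.820 Journals.txt.zst
  12.5 GiB      51.7 GiB  4.137 PaperAuthorAffiliations.txt.zst
   687 MiB      2.76 GiB  4.116 PaperExtendedAttributes.txt.zst
  7.32 GiB      40.5 GiB  5.530 PaperReferences.txt.zst
  1.23 MiB      9.72 MiB  7.894 PaperResources.txt.zst
  18.5 GiB      72.0 GiB  3.898 Papers.txt.zst
  5.59 GiB      34.7 GiB  6.203 PaperUrls.txt.zst
"""

UNIT_BYTES = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30}
ROW = re.compile(
    r"([\d.]+) (KiB|MiB|GiB)\s+([\d.]+) (KiB|MiB|GiB)\s+[\d.]+\s+(\S+)"
)


def parse_listing(text):
    """Return {filename: (compressed_bytes, uncompressed_bytes)}."""
    sizes = {}
    for line in text.splitlines():
        match = ROW.search(line)
        if match:
            comp, comp_unit, unc, unc_unit, name = match.groups()
            sizes[name] = (float(comp) * UNIT_BYTES[comp_unit],
                           float(unc) * UNIT_BYTES[unc_unit])
    return sizes


sizes = parse_listing(LISTING)
compressed = sum(c for c, _ in sizes.values())
uncompressed = sum(u for _, u in sizes.values())

print(f"files:    {len(sizes)}")
print(f"download: {compressed / 2**30:.1f} GiB = {compressed / 1e9:.1f} GB")
print(f"unpacked: {uncompressed / 2**30:.1f} GiB = {uncompressed / 1e9:.1f} GB")
```

Since the per-file figures are rounded to three significant digits, the totals are approximate, but they should agree with the summary row (48.9 GiB compressed, 219 GiB uncompressed). Keeping the compressed files next to the unpacked ones needs the sum of both.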

This is not the complete set that you get to download: there is much more (roughly 160 GiB compressed), but the upload quota only permits this much. The additional data is retained, and you may ask for it. It is huge, so be prepared to provide SFTP, rsync, or similar access where the files can be dropped.

If you want to donate an update but lack the bandwidth to download and repack the set, feel free to contact me (details via my ORCID page) once you have gone through the provisioning steps. I'll either grab the set directly from the Azure storage (you might have to give me access rights) or provide an SFTP/rsync drop for you to dump the data in.

The data for version 2021-09-13 was kindly contributed by Rudolf Siegel.

Files

Files (52.5 GB)

Size       Checksum
1.5 MB     md5:659c2c819bcebabf486fd695f9bbfcc0
4.7 MB     md5:f6647f03d2053a8a7bfd0c4e547c8bd8
4.6 GB     md5:6bd33d23fa6cea940721ae95401193fe
589.3 kB   md5:87d8035ceb2d417d761cd07f0f72ac5b
128.9 kB   md5:0c0e6e4155ead01a51c4c2f92895a872
1.6 MB     md5:c5138797c0a81698303436529593e605
13.4 GB    md5:175417381b63b92d7f96236e88ec2d2e
720.2 MB   md5:7905dc391880a3b117d747c0a705055a
7.9 GB     md5:c5e43b72092fab604a511725cf2e8800
1.3 MB     md5:3ed5237ab8b201163b04537cf954bec9
19.8 GB    md5:e861bb95423d8c5d36f0da7e79a1c1c1
6.0 GB     md5:44c86c2c602382f5688255d2d3cbf6bb
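
The MD5 checksums can be used to verify downloads. The sketch below hashes each file in chunks so that the multi-GB archives never have to fit in memory. The file names are not shown next to the checksums here, so the mapping in EXPECTED_MD5 is an assumption: it pairs the checksums, in the order listed, with the file names from the zstd -l listing in the description. The sizes line up under that pairing (for example 1.39 MiB is about 1.5 MB, and 18.5 GiB is about 19.8 GB), but verify it against the record page before relying on it. The helper names md5_of and verify are illustrative.

```python
import hashlib
from pathlib import Path

# ASSUMPTION: checksums paired with file names by listing order (see above).
EXPECTED_MD5 = {
    "Affiliations.txt.zst": "659c2c819bcebabf486fd695f9bbfcc0",
    "AuthorExtendedAttributes.txt.zst": "f6647f03d2053a8a7bfd0c4e547c8bd8",
    "Authors.txt.zst": "6bd33d23fa6cea940721ae95401193fe",
    "ConferenceInstances.txt.zst": "87d8035ceb2d417d761cd07f0f72ac5b",
    "ConferenceSeries.txt.zst": "0c0e6e4155ead01a51c4c2f92895a872",
    "Journals.txt.zst": "c5138797c0a81698303436529593e605",
    "PaperAuthorAffiliations.txt.zst": "175417381b63b92d7f96236e88ec2d2e",
    "PaperExtendedAttributes.txt.zst": "7905dc391880a3b117d747c0a705055a",
    "PaperReferences.txt.zst": "c5e43b72092fab604a511725cf2e8800",
    "PaperResources.txt.zst": "3ed5237ab8b201163b04537cf954bec9",
    "Papers.txt.zst": "e861bb95423d8c5d36f0da7e79a1c1c1",
    "PaperUrls.txt.zst": "44c86c2c602382f5688255d2d3cbf6bb",
}


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory, expected=EXPECTED_MD5):
    """Return {filename: "ok" | "mismatch" | "missing"} for a download dir."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        else:
            results[name] = "ok" if md5_of(path) == want else "mismatch"
    return results


# Example (not run here): check the files in the current directory.
# for name, status in verify(".").items():
#     print(f"{status:9} {name}")
```

MD5 is adequate for catching truncated or corrupted transfers, which is the purpose here; it is not a defence against deliberate tampering.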

Additional details

Related works

Is compiled by
10.1145/2740908.2742839 (DOI)