Dataset Open Access
<?xml version='1.0' encoding='UTF-8'?> <record xmlns="http://www.loc.gov/MARC21/slim"> <leader>00000nmm##2200000uu#4500</leader> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">scholarly data</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">citations</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">papers</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">arXiv.org</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">digital libraries</subfield> </datafield> <datafield tag="653" ind1=" " ind2=" "> <subfield code="a">dataset</subfield> </datafield> <controlfield tag="005">20211110155722.0</controlfield> <controlfield tag="001">2609187</controlfield> <datafield tag="700" ind1=" " ind2=" "> <subfield code="u">University of Freiburg</subfield> <subfield code="0">(orcid)0000-0001-5458-8645</subfield> <subfield code="a">Färber, Michael</subfield> </datafield> <datafield tag="856" ind1="4" ind2=" "> <subfield code="s">22891008306</subfield> <subfield code="z">md5:ebdbfdcb65636a0c1d0ffae827dde8b0</subfield> <subfield code="u">https://zenodo.org/record/2609187/files/arxiv_v2.1.tar.gz</subfield> </datafield> <datafield tag="542" ind1=" " ind2=" "> <subfield code="l">open</subfield> </datafield> <datafield tag="260" ind1=" " ind2=" "> <subfield code="c">2019-02-01</subfield> </datafield> <datafield tag="909" ind1="C" ind2="O"> <subfield code="p">openaire_data</subfield> <subfield code="p">user-natural-language-processing</subfield> <subfield code="p">user-scholarly-data</subfield> <subfield code="p">user-bibliometrics</subfield> <subfield code="o">oai:zenodo.org:2609187</subfield> </datafield> <datafield tag="100" ind1=" " ind2=" "> <subfield code="u">University of Freiburg</subfield> <subfield code="a">Saier, Tarek</subfield> </datafield> <datafield tag="245" ind1=" " ind2=" "> <subfield code="a">Bibliometric-Enhanced arXiv: A Data Set 
for Paper-Based and Citation-Based Tasks</subfield> </datafield> <datafield tag="980" ind1=" " ind2=" "> <subfield code="a">user-bibliometrics</subfield> </datafield> <datafield tag="980" ind1=" " ind2=" "> <subfield code="a">user-natural-language-processing</subfield> </datafield> <datafield tag="980" ind1=" " ind2=" "> <subfield code="a">user-scholarly-data</subfield> </datafield> <datafield tag="540" ind1=" " ind2=" "> <subfield code="a">Other (Attribution)</subfield> </datafield> <datafield tag="650" ind1="1" ind2="7"> <subfield code="a">cc-by</subfield> <subfield code="2">opendefinition.org</subfield> </datafield> <datafield tag="520" ind1=" " ind2=" "> <subfield code="a"><p>We propose a new data set based on <strong>all publications from all scientific fields available on arXiv.org</strong>. Apart from providing the <strong>papers&#39; plain text</strong>, <strong>in-text citations</strong> were annotated via global identifiers. As far as possible, cited publications were linked to the <strong>Microsoft Academic Graph</strong>. Our data set consists of <strong>over one million documents</strong> and <strong>29.2 million citation contexts</strong>. 
The data set, which is made freely available for research purposes, can not only enhance the future evaluation of research paper-based and citation context-based approaches but also serve as a basis for novel ideas to analyze papers.</p> <p>More information can be found in our paper <a href="http://ceur-ws.org/Vol-2345/paper2.pdf">Bibliometric-Enhanced arXiv: A Data Set for Paper-Based and Citation-Based Tasks</a>.</p> <p>See <a href="https://github.com/IllDepence/unarXive">https://github.com/IllDepence/unarXive</a> for the source code used to create the data set.</p></subfield> </datafield> <datafield tag="773" ind1=" " ind2=" "> <subfield code="n">url</subfield> <subfield code="i">isDocumentedBy</subfield> <subfield code="a">http://ceur-ws.org/Vol-2345/paper2.pdf</subfield> </datafield> <datafield tag="773" ind1=" " ind2=" "> <subfield code="n">doi</subfield> <subfield code="i">isVersionOf</subfield> <subfield code="a">10.5281/zenodo.2553522</subfield> </datafield> <datafield tag="024" ind1=" " ind2=" "> <subfield code="a">10.5281/zenodo.2609187</subfield> <subfield code="2">doi</subfield> </datafield> <datafield tag="980" ind1=" " ind2=" "> <subfield code="a">dataset</subfield> </datafield> </record>
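
The record above lists the archive's URL, size in bytes, and MD5 checksum in its `856` datafield. As a minimal sketch of how those values could be read and used to check a downloaded copy, the Python below parses an abbreviated copy of the record with the standard library. The helper names (`subfields`, `file_info`, `file_digest`) and the `SAMPLE_RECORD` constant are illustrative assumptions, not part of Zenodo or the data set's tooling.

```python
import hashlib
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Abbreviated copy of the record above, keeping only the fields used here.
SAMPLE_RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">22891008306</subfield>
    <subfield code="z">md5:ebdbfdcb65636a0c1d0ffae827dde8b0</subfield>
    <subfield code="u">https://zenodo.org/record/2609187/files/arxiv_v2.1.tar.gz</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.2609187</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
</record>"""


def subfields(record, tag, code):
    """Return the text of every subfield `code` in datafields with `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [el.text for el in record.findall(path, MARC_NS)]


def file_info(record):
    """Collect URL, size in bytes, and checksum from the first 856 field."""
    checksum = subfields(record, "856", "z")[0]
    # The checksum is stored as "<algorithm>:<hex digest>".
    algo, _, digest = checksum.partition(":")
    return {
        "url": subfields(record, "856", "u")[0],
        "size": int(subfields(record, "856", "s")[0]),
        "algo": algo,
        "digest": digest,
    }


def file_digest(path, algo="md5", chunk_size=1 << 20):
    """Hash a local file in chunks so a multi-gigabyte archive fits in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    info = file_info(ET.fromstring(SAMPLE_RECORD))
    print(info["url"])
    print(info["size"], "bytes,", info["algo"], info["digest"])
```

After downloading the archive, comparing `file_digest("arxiv_v2.1.tar.gz", info["algo"])` with `info["digest"]` would confirm that the local copy matches the record; the local file name here is assumed to follow the URL.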
Metric | All versions | This version |
---|---|---|
Views | 3,437 | 557 |
Downloads | 24,197 | 11,642 |
Data volume | 494.3 TB | 266.5 TB |
Unique views | 2,750 | 494 |
Unique downloads | 3,067 | 902 |