2024-03-28T09:11:12Z
https://zenodo.org/oai2d
oai:zenodo.org:3835877
2020-05-25T11:54:06Z
openaire_data
user-regio
Michael Paris
Robert Jäschke
2020-05-20
<p>The dataset comprises the URLs and corresponding time stamps of the GAW snapshot harvested in 2018-06.</p>
<p>The compressed directory <em>2018-06.zip</em> contains 100 files. Each file contains two tab-delimited columns. The time stamps follow the Java date-time pattern yyyyMMddHHmmss and can be parsed with Joda-Time as:</p>
<pre>DateTime.parse(timeStamp, DateTimeFormat.forPattern("yyyyMMddHHmmss")).getMillis</pre>
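<p>For readers not working on the JVM, the same conversion can be sketched in Python. The function name <em>timestamp_to_millis</em> is hypothetical, and the sketch assumes the time stamps are in UTC, which the record does not state; the Joda-Time call above may resolve the time zone differently, for example from the JVM default.</p>

```python
import calendar
import time


def timestamp_to_millis(time_stamp: str) -> int:
    """Convert a 14-digit yyyyMMddHHmmss string to epoch milliseconds.

    Assumption: the time stamp is expressed in UTC.
    """
    # %Y%m%d%H%M%S is the strptime counterpart of the pattern yyyyMMddHHmmss.
    parsed = time.strptime(time_stamp, "%Y%m%d%H%M%S")
    # timegm interprets the parsed fields as UTC and returns whole seconds.
    return calendar.timegm(parsed) * 1000


if __name__ == "__main__":
    print(timestamp_to_millis("20180601000000"))
```

<p>A malformed time stamp raises <em>ValueError</em>, so callers that expect dirty input may want to catch it per row.</p>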
https://doi.org/10.5281/zenodo.3835877
oai:zenodo.org:3835877
Zenodo
https://doi.org/10.5281/zenodo.3842878
https://zenodo.org/communities/regio
https://doi.org/10.5281/zenodo.3835876
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
GAW URLs and time stamps 2018-06
info:eu-repo/semantics/other
oai:zenodo.org:3842939
2020-05-25T20:20:29Z
openaire_data
user-regio
Michael Paris
Robert Jäschke
2020-05-25
<p>The dataset comprises the URLs and corresponding time stamps of the GAW snapshot harvested in 2017-06.</p>
<p>The compressed directory <em>2017-06.zip</em> contains 100 files. Each file contains two tab-delimited columns. The time stamps follow the Java date-time pattern yyyyMMddHHmmss and can be parsed with Joda-Time as:</p>
<pre>DateTime.parse(timeStamp, DateTimeFormat.forPattern("yyyyMMddHHmmss")).getMillis</pre>
https://doi.org/10.5281/zenodo.3842939
oai:zenodo.org:3842939
Zenodo
https://doi.org/10.5281/zenodo.3843022
https://zenodo.org/communities/regio
https://doi.org/10.5281/zenodo.3842938
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
GAW URLs and time stamps 2017-06
info:eu-repo/semantics/other
oai:zenodo.org:3843022
2020-05-25T20:20:29Z
openaire_data
user-regio
Michael Paris
Robert Jäschke
2020-05-25
<p>The dataset comprises the URLs and corresponding time stamps of the GAW snapshot harvested in 2016-12.</p>
<p>The compressed directory <em>2016-12.zip</em> contains 100 files. Each file contains two tab-delimited columns. The time stamps follow the Java date-time pattern yyyyMMddHHmmss and can be parsed with Joda-Time as:</p>
<pre>DateTime.parse(timeStamp, DateTimeFormat.forPattern("yyyyMMddHHmmss")).getMillis</pre>
https://doi.org/10.5281/zenodo.3843022
oai:zenodo.org:3843022
Zenodo
https://doi.org/10.5281/zenodo.3843104
https://zenodo.org/communities/regio
https://doi.org/10.5281/zenodo.3843021
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
GAW URLs and time stamps 2016-12
info:eu-repo/semantics/other
oai:zenodo.org:3843296
2020-05-25T20:20:29Z
openaire_data
user-regio
Michael Paris
Robert Jäschke
2020-05-25
<p>The dataset comprises the URLs and corresponding time stamps of the GAW snapshot harvested in 2014-05.</p>
<p>The compressed directory <em>2014-05.zip</em> contains 100 files. Each file contains two tab-delimited columns. The time stamps follow the Java date-time pattern yyyyMMddHHmmss and can be parsed with Joda-Time as:</p>
<pre>DateTime.parse(timeStamp, DateTimeFormat.forPattern("yyyyMMddHHmmss")).getMillis</pre>
https://doi.org/10.5281/zenodo.3843296
oai:zenodo.org:3843296
Zenodo
https://doi.org/10.5281/zenodo.3843288
https://zenodo.org/communities/regio
https://doi.org/10.5281/zenodo.3843295
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
GAW URLs and time stamps 2014-05
info:eu-repo/semantics/other
oai:zenodo.org:3826695
2020-05-20T08:20:23Z
openaire_data
user-regio
Michael Paris
Robert Jäschke
2020-05-20
<p>The dataset comprises the URLs and corresponding time stamps of the GAW snapshot harvested in 2013-12.</p>
<p>Download the files into a new directory and reassemble them by running the following command from inside that directory:</p>
<p><em>zip -F 2013-12_part.zip --out 2013-12.zip</em></p>
<p>Now decompress the file by running:</p>
<p><em>unzip 2013-12.zip</em></p>
<p>The directory <em>2013-12</em> contains 100 files, a total of 16 GB. Each file contains two tab-delimited columns. The time stamps follow the Java date-time pattern yyyyMMddHHmmss and can be parsed with Joda-Time as:</p>
<pre>DateTime.parse(timeStamp, DateTimeFormat.forPattern("yyyyMMddHHmmss")).getMillis</pre>
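<p>Once the archive is unpacked, the two-column layout can be read line by line. The following Python sketch is illustrative only: the function name <em>read_snapshot_rows</em> is hypothetical, and it assumes the URL is the first column and the time stamp the second, an order the record does not specify. Lines that do not split into exactly two fields are skipped.</p>

```python
import io
from typing import Iterator, TextIO, Tuple


def read_snapshot_rows(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (url, time_stamp) pairs from a tab-delimited text stream.

    Assumption: column 1 is the URL and column 2 the time stamp.
    """
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        yield fields[0], fields[1]


if __name__ == "__main__":
    # In-memory stand-in for one of the 100 files; the values are made up.
    sample = io.StringIO("http://www.example.org/\t20131201120000\n")
    for url, time_stamp in read_snapshot_rows(sample):
        print(url, time_stamp)
```

<p>Because the function is a generator, a file of several hundred megabytes is streamed rather than loaded into memory at once.</p>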
https://doi.org/10.5281/zenodo.3826695
oai:zenodo.org:3826695
Zenodo
https://zenodo.org/communities/regio
https://doi.org/10.5281/zenodo.3826694
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
GAW URLs and time stamps 2013-12
info:eu-repo/semantics/other
oai:zenodo.org:3843104
2020-05-25T20:20:29Z
openaire_data
user-regio
Michael Paris
Robert Jäschke
2020-05-25
<p>The dataset comprises the URLs and corresponding time stamps of the GAW snapshot harvested in 2016-06.</p>
<p>The compressed directory <em>2016-06.zip</em> contains 100 files. Each file contains two tab-delimited columns. The time stamps follow the Java date-time pattern yyyyMMddHHmmss and can be parsed with Joda-Time as:</p>
<pre>DateTime.parse(timeStamp, DateTimeFormat.forPattern("yyyyMMddHHmmss")).getMillis</pre>
https://doi.org/10.5281/zenodo.3843104
oai:zenodo.org:3843104
Zenodo
https://doi.org/10.5281/zenodo.3843022
https://zenodo.org/communities/regio
https://doi.org/10.5281/zenodo.3843103
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
GAW URLs and time stamps 2016-06
info:eu-repo/semantics/other
oai:zenodo.org:3843288
2020-05-25T20:20:29Z
openaire_data
user-regio
Michael Paris
Robert Jäschke
2020-05-25
<p>The dataset comprises the URLs and corresponding time stamps of the GAW snapshot harvested in 2014-12.</p>
<p>The compressed directory <em>2014-12.zip</em> contains 100 files. Each file contains two tab-delimited columns. The time stamps follow the Java date-time pattern yyyyMMddHHmmss and can be parsed with Joda-Time as:</p>
<pre>DateTime.parse(timeStamp, DateTimeFormat.forPattern("yyyyMMddHHmmss")).getMillis</pre>
https://doi.org/10.5281/zenodo.3843288
oai:zenodo.org:3843288
Zenodo
https://doi.org/10.5281/zenodo.3843255
https://zenodo.org/communities/regio
https://doi.org/10.5281/zenodo.3843287
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
GAW URLs and time stamps 2014-12
info:eu-repo/semantics/other
oai:zenodo.org:3835535
2020-05-25T11:41:15Z
openaire_data
user-regio
Michael Paris
Robert Jäschke
2020-05-20
<p>The dataset comprises the URLs and corresponding time stamps of the GAW snapshot harvested in 2019-12.</p>
<p>The compressed directory <em>2019-12.zip</em> contains 100 files. Each file contains two tab-delimited columns. The time stamps follow the Java date-time pattern yyyyMMddHHmmss and can be parsed with Joda-Time as:</p>
<pre>DateTime.parse(timeStamp, DateTimeFormat.forPattern("yyyyMMddHHmmss")).getMillis</pre>
https://doi.org/10.5281/zenodo.3835535
oai:zenodo.org:3835535
Zenodo
https://doi.org/10.5281/zenodo.3835796
https://zenodo.org/communities/regio
https://doi.org/10.5281/zenodo.3835534
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
GAW URLs and time stamps 2019-12
info:eu-repo/semantics/other
oai:zenodo.org:3835796
2020-05-25T20:20:29Z
openaire_data
user-regio
Michael Paris
Robert Jäschke
2020-05-20
<p>The dataset comprises the URLs and corresponding time stamps of the GAW snapshot harvested in 2019-06.</p>
<p>The compressed directory <em>2019-06.zip</em> contains 100 files. Each file contains two tab-delimited columns. The time stamps follow the Java date-time pattern yyyyMMddHHmmss and can be parsed with Joda-Time as:</p>
<pre>DateTime.parse(timeStamp, DateTimeFormat.forPattern("yyyyMMddHHmmss")).getMillis</pre>
https://doi.org/10.5281/zenodo.3835796
oai:zenodo.org:3835796
Zenodo
https://doi.org/10.5281/zenodo.3835833
https://doi.org/10.5281/zenodo.3835535
https://zenodo.org/communities/regio
https://doi.org/10.5281/zenodo.3835795
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
GAW URLs and time stamps 2019-06
info:eu-repo/semantics/other
oai:zenodo.org:3842878
2020-05-25T20:20:29Z
openaire_data
user-regio
Michael Paris
Robert Jäschke
2020-05-25
<p>The dataset comprises the URLs and corresponding time stamps of the GAW snapshot harvested in 2017-12.</p>
<p>The compressed directory <em>2017-12.zip</em> contains 100 files. Each file contains two tab-delimited columns. The time stamps follow the Java date-time pattern yyyyMMddHHmmss and can be parsed with Joda-Time as:</p>
<pre>DateTime.parse(timeStamp, DateTimeFormat.forPattern("yyyyMMddHHmmss")).getMillis</pre>
https://doi.org/10.5281/zenodo.3842878
oai:zenodo.org:3842878
Zenodo
https://doi.org/10.5281/zenodo.3842939
https://zenodo.org/communities/regio
https://doi.org/10.5281/zenodo.3842877
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
GAW URLs and time stamps 2017-12
info:eu-repo/semantics/other
oai:zenodo.org:3795574
2020-05-13T20:20:40Z
openaire_data
user-regio
Lars Ganser
Michael Paris
Robert Jäschke
2020-05-06
<p>The dataset has four columns: name of the institution, homepage, latitude, and longitude.</p>
<p>The entries in each row are delimited by a semicolon.</p>
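<p>A minimal Python sketch of reading this layout follows. The names <em>Institution</em> and <em>read_institutions</em> are hypothetical. The sketch assumes that coordinates use a dot as the decimal separator and that a header row, if the file has one, can be recognised by its non-numeric coordinate fields; neither detail is stated in the record.</p>

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO


class Institution(NamedTuple):
    name: str
    homepage: str
    latitude: float
    longitude: float


def read_institutions(handle: TextIO) -> Iterator[Institution]:
    """Yield one Institution per well-formed semicolon-delimited row."""
    for row in csv.reader(handle, delimiter=";"):
        if len(row) != 4:
            continue  # not the documented four-column layout
        name, homepage, lat, lon = (field.strip() for field in row)
        try:
            latitude, longitude = float(lat), float(lon)
        except ValueError:
            continue  # header row or non-numeric coordinates
        yield Institution(name, homepage, latitude, longitude)


if __name__ == "__main__":
    # Made-up sample row; not taken from the dataset.
    sample = io.StringIO("Example University;https://www.example.org;52.5;13.4\n")
    for institution in read_institutions(sample):
        print(institution)
```

<p>Rows are skipped silently here; a stricter reader could collect them for inspection instead.</p>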
https://doi.org/10.5281/zenodo.3795574
oai:zenodo.org:3795574
Zenodo
https://zenodo.org/communities/regio
https://doi.org/10.5281/zenodo.3795573
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
location university academic institution
Geolocation of German Academic Institutions
info:eu-repo/semantics/other
oai:zenodo.org:3843507
2020-05-25T20:20:29Z
openaire_data
user-regio
Michael Paris
Robert Jäschke
2020-05-25
<p><br>
The dataset has been created in an effort to establish a knowledge base on the "German Academic Web" (GAW). Since 2012, semi-annual focused crawls of the web pages of universities and research institutes in Germany have been performed using Heritrix, the open-source, archival-quality web crawler of the Internet Archive.</p>
<p>Starting from a list of given seeds, Heritrix follows newly discovered hyperlinks and stores the content it sees in the standardised WARC file format.</p>
<p><br>
For each crawl, Heritrix was initialised with a conceptually invariant seed list of, on average, 150 domains covering all German academic institutions with the right to award doctorates. The seed list is extracted from the current entries on <a href="https://de.wikipedia.org/wiki/Liste_der_Hochschulen_in_Deutschland">https://de.wikipedia.org/wiki/Liste_der_Hochschulen_in_Deutschland</a>.</p>
<p>The crawler follows a breadth-first policy on each host, thereby collecting all available pages reachable by links from the homepage. The scope was limited to pages from the seed domains, and certain file types (mainly audio, video, and compressed files) were excluded using regular expressions.<br>
<br>
During the crawl, the URL queues were monitored via a web UI. Hosts that appeared to be undesirable, such as e-learning systems or repositories, were 'retired', that is, their URLs were no longer crawled. However, previously harvested URLs from retired hosts were not removed.</p>
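<p>The scoping rules described here can be illustrated with a small Python sketch. It is not the Heritrix configuration used for the crawls: the function name <em>in_scope</em>, the list of file extensions, and the example hosts are all assumptions made for illustration.</p>

```python
import re
from typing import Collection
from urllib.parse import urlsplit

# Illustrative extension list (audio, video, compressed files); the
# regular expressions actually used in the crawls are not given.
EXCLUDED_SUFFIXES = re.compile(
    r"\.(mp3|wav|mp4|avi|mov|zip|gz|tar|rar)$", re.IGNORECASE
)


def in_scope(
    url: str,
    seed_domains: Collection[str],
    retired_hosts: Collection[str] = frozenset(),
) -> bool:
    """Return True if a URL would be crawled under the sketched rules."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if not host or host in retired_hosts:
        return False  # retired hosts are no longer crawled
    # A host is in scope if it is a seed domain or one of its subdomains.
    on_seed_domain = any(
        host == domain or host.endswith("." + domain) for domain in seed_domains
    )
    return on_seed_domain and not EXCLUDED_SUFFIXES.search(parts.path)


if __name__ == "__main__":
    seeds = {"uni-example.de"}
    print(in_scope("https://www.uni-example.de/index.html", seeds))
```

<p>Retiring a host only affects later decisions, which matches the note that previously harvested URLs from retired hosts were kept.</p>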
<p><br>
Most crawls were stopped manually after roughly 100 million pages had been collected (according to Heritrix's control console), which took roughly two weeks per crawl on average.</p>
<p>This dataset provides an overview of the size of the GAW.</p>
https://doi.org/10.5281/zenodo.3843507
oai:zenodo.org:3843507
Zenodo
https://doi.org/10.5281/zenodo.3826695
https://doi.org/10.5281/zenodo.3843296
https://doi.org/10.5281/zenodo.3843288
https://doi.org/10.5281/zenodo.3843267
https://doi.org/10.5281/zenodo.3843255
https://doi.org/10.5281/zenodo.3843104
https://doi.org/10.5281/zenodo.3843022
https://doi.org/10.5281/zenodo.3842939
https://doi.org/10.5281/zenodo.3842878
https://doi.org/10.5281/zenodo.3835877
https://doi.org/10.5281/zenodo.3835833
https://doi.org/10.5281/zenodo.3835796
https://doi.org/10.5281/zenodo.3835535
https://zenodo.org/communities/regio
https://doi.org/10.5281/zenodo.3843506
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
Summary GAW
info:eu-repo/semantics/other
oai:zenodo.org:3835833
2020-05-25T11:52:47Z
openaire_data
user-regio
Michael Paris
Robert Jäschke
2020-05-20
<p>The dataset comprises the URLs and corresponding time stamps of the GAW snapshot harvested in 2018-12.</p>
<p>The compressed directory <em>2018-12.zip</em> contains 100 files. Each file contains two tab-delimited columns. The time stamps follow the Java date-time pattern yyyyMMddHHmmss and can be parsed with Joda-Time as:</p>
<pre>DateTime.parse(timeStamp, DateTimeFormat.forPattern("yyyyMMddHHmmss")).getMillis</pre>
https://doi.org/10.5281/zenodo.3835833
oai:zenodo.org:3835833
Zenodo
https://doi.org/10.5281/zenodo.3835877
https://zenodo.org/communities/regio
https://doi.org/10.5281/zenodo.3835832
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
GAW URLs and time stamps 2018-12
info:eu-repo/semantics/other
oai:zenodo.org:3843267
2020-05-25T20:20:29Z
openaire_data
user-regio
Michael Paris
Robert Jäschke
2020-05-25
<p>The dataset comprises the URLs and corresponding time stamps of the GAW snapshot harvested in 2015-05.</p>
<p>The compressed directory <em>2015-05.zip</em> contains 100 files. Each file contains two tab-delimited columns. The time stamps follow the Java date-time pattern yyyyMMddHHmmss and can be parsed with Joda-Time as:</p>
<pre>DateTime.parse(timeStamp, DateTimeFormat.forPattern("yyyyMMddHHmmss")).getMillis</pre>
https://doi.org/10.5281/zenodo.3843267
oai:zenodo.org:3843267
Zenodo
https://doi.org/10.5281/zenodo.3843255
https://zenodo.org/communities/regio
https://doi.org/10.5281/zenodo.3843266
info:eu-repo/semantics/openAccess
Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
GAW URLs and time stamps 2015-05
info:eu-repo/semantics/other