Published January 12, 2022 | Version v2
Dataset
Open
Transindex
Description
This object was created as part of the web harvesting project of the Department of Digital Humanities at Eötvös Loránd University (ELTE DH). Learn more about the workflow HERE and about the software used HERE. The aim of the project is to make online news articles and their metadata suitable for research purposes. The archiving workflow is designed to prevent modification or manipulation of the downloaded content. The current version of the curated content, with normalized formatting in standard TEI XML format and Schema.org encoded metadata, is available HERE. The raw content is described in detail below:
- The portal's archived content (from 2001-01-01 to 2021-05-22) is available in WARC format HERE (crawled: 2021-05-21T10:01:38.592950 - 2021-05-22T20:50:22.079445).
Please fill in the following form before requesting access to this dataset: ACCESS FORM
Files
Name | Size |
---|---|
README.md (md5:ef949fa712a4e184fd9d88f0e5007598) | 146 Bytes |
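The MD5 checksum listed for README.md can be used to confirm that a downloaded copy is intact. The following is a minimal sketch, assuming the file has been saved locally as `README.md`; the local path and the helper names `md5_of_file` and `matches_expected` are illustrative and not part of this record.

```python
import hashlib
from pathlib import Path

# Checksum copied from the file listing of this record.
EXPECTED_MD5 = "ef949fa712a4e184fd9d88f0e5007598"


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Compare the file's digest with the expected one, ignoring letter case."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    local_copy = Path("README.md")  # hypothetical download location
    if local_copy.exists():
        print("checksum OK" if matches_expected(local_copy) else "checksum MISMATCH")
    else:
        print(f"{local_copy} not found; download it first")
```

Reading in chunks keeps memory use flat, so the same helper should also work for much larger files such as WARC archives, provided their checksums are published.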
Additional details
Related works
- Has part
- Dataset: 10.5281/zenodo.4899469 (DOI)
- Dataset: 10.5281/zenodo.5828866 (DOI)
Dates
- Collected
- 2001-01-01/2021-05-22 (content publication date interval provided by source)
References
Subjects
- Web archiving
- https://id.loc.gov/authorities/subjects/sh2007000528.html
- Web archives
- https://id.loc.gov/authorities/subjects/sh2004001365.html
- Data curation
- https://id.loc.gov/authorities/subjects/sh2015001855.html