Published March 1, 2021 | Version v1
Dataset
Open
Sources for a reproducible IT blog corpus
Authors/Creators
Description
The dataset comprises the homepages of several hundred IT blogs and websites, hand-picked to represent discourse on questions at the intersection of technology and society in Germany and the United States.
The corresponding text collection can be reproduced by re-downloading the current contents of these sources to the user’s local machine: see https://zenodo.org/record/4552529 and https://github.com/adbar/trafilatura.
Online searches on the text corpus are also available: https://www.dwds.de/d/korpora/it_blogs
Paper "A Reproducible IT-Blog Corpus": doi.org/10.5334/johd.35
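As a rough illustration of the reproduction step described above, the following sketch reads a homepage list and extracts the main text of each page with trafilatura. It rests on assumptions not stated in this record: that the list files contain one URL per line, and that fetching only the homepages is enough for a first test (the published corpus was presumably built by crawling beyond the homepages). The helper names `read_homepages`, `extract_with_trafilatura`, and `collect_texts` are hypothetical; `trafilatura.fetch_url` and `trafilatura.extract` are the library's own functions.

```python
from typing import Callable, Dict, Iterable, List, Optional


def read_homepages(lines: Iterable[str]) -> List[str]:
    """Return unique URLs in file order, assuming one URL per line."""
    seen = set()
    urls = []
    for line in lines:
        url = line.strip()
        # Skip blank lines, comment lines, and repeated entries.
        if url and not url.startswith("#") and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def extract_with_trafilatura(url: str) -> Optional[str]:
    """Download one page and return its main text, or None on failure."""
    import trafilatura  # third-party package: pip install trafilatura

    downloaded = trafilatura.fetch_url(url)
    if downloaded is None:
        return None
    return trafilatura.extract(downloaded)


def collect_texts(
    urls: Iterable[str],
    extractor: Callable[[str], Optional[str]] = extract_with_trafilatura,
) -> Dict[str, str]:
    """Map each URL to its extracted text, skipping pages that yield nothing."""
    texts = {}
    for url in urls:
        text = extractor(url)
        if text:
            texts[url] = text
    return texts


# Example usage (performs network requests):
# with open("IT-Blogs-DE-Homepages.txt", encoding="utf-8") as handle:
#     homepages = read_homepages(handle)
# texts = collect_texts(homepages[:5])
```

Passing the extractor as a parameter keeps the download step replaceable, so the list handling can be checked offline before any pages are fetched.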
Files (17.8 kB)
IT-Blogs-DE-Homepages.txt
| Checksum | Size |
|---|---|
| md5:4295d0768dfda17d1e3d239f00d17d04 | 14.0 kB |
| md5:12f1c9032580e5d653e1b31c4b407ff6 | 3.8 kB |
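The MD5 checksums listed for the files can be used to confirm that a download is intact. The sketch below is one way to do this with Python's standard library; the function names `md5_of_file` and `matches_checksum` are hypothetical, and the local path in the usage comment is an assumption, since this record does not state which checksum belongs to which file name.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file against a checksum given as 'md5:<hex>' or plain hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex


# Example usage (the path is an assumption about where the file was saved):
# ok = matches_checksum("IT-Blogs-DE-Homepages.txt",
#                       "md5:4295d0768dfda17d1e3d239f00d17d04")
```

A mismatch only shows that the local file differs from the published one; it does not indicate which side changed.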