Published February 22, 2022 | Version v1
Dataset
Open
University of Notre Dame News: A Reading
Description
I have done a bit of analysis -- reading -- against the set of news distributed by the University of Notre Dame, and below is some of what I learned.
Notes
Methods
All Distant Reader data sets ("study carrels") are created using the same method. First, a set of narrative files, of just about any type and in any number, is saved in a folder/directory. Second, the plain text is extracted from each file and saved. Third, feature extraction is run against the plain text to create tab-delimited indexes of bibliographics, email addresses, URLs, parts-of-speech, named-entities, and computed keywords. Fourth, all of the indexes are reduced to an SQLite database file. Finally, everything (the original files, the plain text files, the indexes, and the SQLite database) is compressed into a zip file for distribution. The result is a platform- and network-independent data set that can be read and processed by any number of GUI applications, programming languages, or a Python module called the Distant Reader Toolbox.
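Because the carrel is an ordinary zip file containing an SQLite database, it can be inspected with nothing more than a standard Python installation. The sketch below is a minimal illustration, not the Distant Reader Toolbox itself: the function name `list_carrel_tables` is hypothetical, and it assumes (without confirmation from the record) that the database inside the archive can be recognized by a `.db`, `.sqlite`, or `.sqlite3` extension. It makes no assumption about table names; it simply reports whatever tables it finds and how many rows each holds.

```python
import sqlite3
import tempfile
import zipfile
from pathlib import Path


def list_carrel_tables(zip_path, suffixes=(".db", ".sqlite", ".sqlite3")):
    """Return {table_name: row_count} for the first SQLite file found in a zip.

    Hypothetical helper: the file extensions are an assumption, since the
    record does not state what the database inside the carrel is called.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Look for archive members that appear to be SQLite databases.
        candidates = [name for name in archive.namelist()
                      if name.lower().endswith(suffixes)]
        if not candidates:
            raise FileNotFoundError("no SQLite-looking file found in archive")

        with tempfile.TemporaryDirectory() as workdir:
            # Extract only the database, not the whole archive.
            db_file = Path(archive.extract(candidates[0], path=workdir))
            connection = sqlite3.connect(db_file)
            try:
                # sqlite_master is SQLite's built-in catalog of schema objects.
                names = [row[0] for row in connection.execute(
                    "SELECT name FROM sqlite_master "
                    "WHERE type = 'table' ORDER BY name")]
                counts = {}
                for name in names:
                    # Quote the identifier so unusual table names are safe.
                    quoted = '"' + name.replace('"', '""') + '"'
                    counts[name] = connection.execute(
                        f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
                return counts
            finally:
                connection.close()
```

Calling `list_carrel_tables("index.zip")` on a downloaded carrel would be a reasonable first step before writing queries, since it shows which indexes actually made it into the database.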
Files
| Name | Size | Checksum |
|---|---|---|
| index.zip | 1.2 GB | md5:6fe87e4e1d8ce9ebac879e5c006da224 |
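Since the download is large, it may be worth confirming that it arrived intact by comparing its MD5 digest with the one listed above. The following is a minimal sketch using only the Python standard library; the function name `md5_of_file` and the local path `index.zip` are illustrative assumptions.

```python
import hashlib

# Checksum as published in the file listing for index.zip.
EXPECTED_MD5 = "6fe87e4e1d8ce9ebac879e5c006da224"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so a 1.2 GB file is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_MD5):
    """Return True when the file's digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `is_intact("index.zip")` returning `False` would suggest a truncated or corrupted download.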
Additional details
Related works
- Is described by
- https://distantreader.org/ (URL)
- Is identical to
- http://carrels.distantreader.org/curated-notre_dame_news-2022/index.zip (URL)
- Is part of
- http://carrels.distantreader.org (URL)
- Is variant form of
- http://carrels.distantreader.org/curated-notre_dame_news-2022/ (URL)
Software
- Repository URL
- https://github.com/ericleasemorgan/reader-toolbox
- Programming language
- Python
- Development Status
- Active