Dataset Open Access

ChroniclItaly 3.0. A deep-learning, contextually enriched digital heritage collection of Italian immigrant newspapers published in the USA, 1898-1936.

Lorella Viola; Antonio Maria Fiscarelli

This open access collection includes the digitized front pages of 10 Italian-language newspapers published in California, Connecticut, Pennsylvania, Vermont, and West Virginia. It totals 8,653 issues and contains 21,454,455 words. The titles are L’Italia, Cronaca sovversiva, La libera parola, The patriot, La ragione, La rassegna, La sentinella del West Virginia, L’Indipendente, La Sentinella, and La Tribuna del Connecticut. The material was collected from Chronicling America, an Internet-based, searchable database of U.S. newspapers published from 1789 to 1963, made available by the Library of Congress. The corpus features mainstream (prominenti), anarchist (sovversivi), and independent newspapers, and thus offers a nuanced picture of the Italian immigrant community in the United States at the turn of the twentieth century.

To promote transparency, the collection includes two versions of ChroniclItaly 3.0: unprocessed (as it was collected from Chronicling America) and processed (with pre-processing interventions applied). Users can also find the datasets with the outputs of all the enrichment steps, including the post-intervention outputs: named entity recognition (NER), geo-coding, sentiment analysis, and network analysis. A readme file (DeXTER_readme.txt) helps users navigate the folders, and a metadata file (Metadata_ChroniclItaly 3.0.xlsx) provides information about the collection.

The code used to perform all the interventions is available in the GitHub repository https://github.com/lorellav/DeXTER-DeepTextMiner. Finally, all the enrichment outputs can be explored in the interactive app DeXTER, available at https://c2dh.shinyapps.io/dexter/.

Files (592.3 MB in total)

Name                               Size        MD5 checksum
ChroniclItaly_3.0_original.zip     96.8 MB     bd8e0beb55fbf03b0a3640fd25a53256
ChroniclItaly_3.0_processed.zip    84.9 MB     3d3499654440243344b2c18cd3ebd071
DeXTER_readme.txt                  749 bytes   71cf6c6c29ca70499fc1a7fc9840333c
geocoding.rar                      20.8 kB     8487a3ac67836b0b6ca70d7af3685a19
Metadata_ChroniclItaly 3.0.xlsx    28.1 kB     64cd7a2b1c4cdadc49fb79b943ea0248
NER_corpora_tagged.zip             228.7 MB    7b3622232b56e3fb019e8214f709eb6e
NER_dataframes.zip                 98.9 MB     7f3d00f7be56d9198e99699d6fdb6e6c
NER_dataframes_intervention.zip    57.8 MB     19bb695150854026cf802ee0986ec5a7
NER_sentiment.zip                  19.0 MB     a53af1aa95d42836cbeb3250ee66d45b
SNA.rar                            6.0 MB      cbb9b38d57926890e5450b6d88d78fae
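The MD5 checksums listed above can be used to confirm that each download is complete and uncorrupted. The following is a minimal Python sketch, assuming the files have been saved under their original names in a single local directory. The helper names md5_of_file and verify_downloads are illustrative only; they are not part of the dataset or of the DeXTER code.

```python
import hashlib
from pathlib import Path

# Checksums as listed in the file table (file name -> MD5 hex digest).
EXPECTED_MD5 = {
    "ChroniclItaly_3.0_original.zip": "bd8e0beb55fbf03b0a3640fd25a53256",
    "ChroniclItaly_3.0_processed.zip": "3d3499654440243344b2c18cd3ebd071",
    "DeXTER_readme.txt": "71cf6c6c29ca70499fc1a7fc9840333c",
    "geocoding.rar": "8487a3ac67836b0b6ca70d7af3685a19",
    "Metadata_ChroniclItaly 3.0.xlsx": "64cd7a2b1c4cdadc49fb79b943ea0248",
    "NER_corpora_tagged.zip": "7b3622232b56e3fb019e8214f709eb6e",
    "NER_dataframes.zip": "7f3d00f7be56d9198e99699d6fdb6e6c",
    "NER_dataframes_intervention.zip": "19bb695150854026cf802ee0986ec5a7",
    "NER_sentiment.zip": "a53af1aa95d42836cbeb3250ee66d45b",
    "SNA.rar": "cbb9b38d57926890e5450b6d88d78fae",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to 'ok', 'mismatch' or 'missing'."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if not path.is_file():
            results[name] = "missing"
        elif md5_of_file(path) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results


if __name__ == "__main__":
    # Hypothetical usage: run from the directory holding the downloads.
    for name, status in verify_downloads().items():
        print(f"{status:8} {name}")
```

Reading in chunks keeps memory use flat even for the larger archives (for example, NER_corpora_tagged.zip at 228.7 MB). A "mismatch" usually means the download was interrupted and should be repeated.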