LIST

Vojtěch Kaše; Petra Heřmánková; Adéla Sobotková

doi:10.5281/zenodo.10146150

Published November 17, 2023 | Version v1.1

Dataset Open

1. University of West Bohemia
2. Aarhus University

The Latin Inscriptions in Space and Time (LIST) dataset is an aggregate of two epigraphic datasets, the Epigraphic Database Heidelberg (EDH; https://edh.ub.uni-heidelberg.de/; aggregated EDH on Zenodo) and the Epigraphic Database Clauss Slaby (EDCS; http://www.manfredclauss.de/; aggregated EDCS on Zenodo). It was created by the Social Dynamics in the Ancient Mediterranean Project (SDAM), 2019-2023, funded by the Aarhus University Forskningsfond Starting grant no. AUFF-E-2018-7-2.

The LIST dataset consists of 525,870 inscriptions, enriched by 65 attributes. 77,091 inscriptions overlap between the two source datasets (i.e. EDH and EDCS); 3,316 inscriptions come exclusively from EDH; 445,463 inscriptions come exclusively from EDCS. 511,973 inscriptions have valid geospatial coordinates (the geometry attribute). This information is also used to determine the urban context of each inscription, i.e. whether it lies in the neighbourhood (within a 5000 m buffer) of a large, medium, or small city, or is rural (more than 5000 m from any type of city); see the attributes urban_context, urban_context_city, and urban_context_pop. 206,570 inscriptions have a numerical date of origin, expressed as an interval or a single year using the attributes not_before and not_after.

The dataset also employs a machine learning model to classify the inscriptions covered exclusively by EDCS in terms of the 22 categories employed by EDH; see Kaše, Heřmánková, Sobotkova 2021.
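The counts reported above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures stated in the description; the constant names and the share() helper are illustrative and not part of the dataset.

```python
# Counts as reported in the dataset description.
TOTAL = 525_870
IN_BOTH = 77_091         # inscriptions present in both EDH and EDCS
EDH_ONLY = 3_316
EDCS_ONLY = 445_463
WITH_GEOMETRY = 511_973  # inscriptions with valid coordinates
WITH_DATE = 206_570      # inscriptions with not_before / not_after


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The three source subsets partition the dataset, so they sum to the total.
assert IN_BOTH + EDH_ONLY + EDCS_ONLY == TOTAL

# Totals contributed by each source follow from the overlap.
edh_total = IN_BOTH + EDH_ONLY
edcs_total = IN_BOTH + EDCS_ONLY

print("inscriptions found in EDH: ", edh_total)
print("inscriptions found in EDCS:", edcs_total)
print("share with coordinates (%):", share(WITH_GEOMETRY, TOTAL))
print("share with a date (%):     ", share(WITH_DATE, TOTAL))
```

The derived figures show that almost all inscriptions carry coordinates, while well under half carry a numerical date, which is worth keeping in mind when combining spatial and temporal filters.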

Formats

We publish the dataset in the parquet and geojson file formats. A description of the individual attributes is available in the Metadata.csv. Using the geopandas library, you can load the data directly from Zenodo into your Python environment with the following command: LIST = gpd.read_parquet("https://zenodo.org/record/8431323/files/LIST_v1-0.parquet?download=1"). Note that the URL in this command refers to the file LIST_v1-0.parquet. In R, the sfarrow and sf libraries provide tools (st_read_parquet() and read_sf(), respectively) to load the parquet and geojson files after you have downloaded them locally. The scripts used to generate the dataset are available via GitHub: https://github.com/sdam-au/LI_ETL
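The loading command above can be wrapped in a small script. This is a minimal sketch, not the authors' own code: the function names zenodo_file_url(), load_list(), and dated_subset() are hypothetical, the URL pattern is simply copied from the command above, and the date filter assumes that missing dates are stored as null values in not_before and not_after.

```python
def zenodo_file_url(record_id: str, filename: str) -> str:
    """Build a direct-download URL following the pattern of the command above."""
    return f"https://zenodo.org/record/{record_id}/files/{filename}?download=1"


def load_list(url: str):
    """Read the parquet file at `url` into a GeoDataFrame.

    Requires geopandas with parquet support (pyarrow) and network access.
    """
    import geopandas as gpd  # imported lazily so the URL helper works without it

    return gpd.read_parquet(url)


def dated_subset(frame):
    """Keep rows that have both not_before and not_after filled in.

    Assumption: undated inscriptions have null values in these columns.
    """
    return frame.dropna(subset=["not_before", "not_after"])
```

Calling load_list(zenodo_file_url("8431323", "LIST_v1-0.parquet")) reproduces the command quoted above; passing the result to dated_subset() would then restrict it to inscriptions with a numerical date.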

The origin of existing attributes is further described in columns ‘dataset_source’, ‘source’, and ‘description’ in the attached Metadata.csv.

Further reading on the dataset creation and methodology:

Heřmánková, Petra, Vojtěch Kaše, and Adéla Sobotkova. “Inscriptions as Data: Digital Epigraphy in Macro-Historical Perspective.” Journal of Digital History 1, no. 1 (2021): 99. https://doi.org/10.1515/jdh-2021-1004.
Kaše, Vojtěch, Petra Heřmánková, and Adéla Sobotkova. “Classifying Latin Inscriptions of the Roman Empire: A Machine-Learning Approach.” Proceedings of the 2nd Workshop on Computational Humanities Research (CHR2021) 2989 (2021): 123–35.

Reading on applications of the datasets in research:

Glomb, Tomáš, Vojtěch Kaše, and Petra Heřmánková. “Popularity of the Cult of Asclepius in the Times of the Antonine Plague: Temporal Modeling of Epigraphic Evidence.” Journal of Archaeological Science: Reports 43 (2022): 103466. https://doi.org/10.1016/j.jasrep.2022.103466.
Kaše, Vojtěch, Petra Heřmánková, and Adéla Sobotková. “Division of Labor, Specialization and Diversity in the Ancient Roman Cities: A Quantitative Approach to Latin Epigraphy.” Edited by Peter F. Biehl. PLOS ONE 17, no. 6 (June 16, 2022): e0269869. https://doi.org/10.1371/journal.pone.0269869.

Notes on spatial attributes

Machine-readable spatial point geometries are provided within the geojson and parquet formats, alongside the ‘Latitude’ and ‘Longitude’ columns, which contain geospatial decimal coordinates where these are known. Additional attributes contain textual references to the original location at different scales. The most reliable attribute with textual information on the place of origin is urban_context_city. It contains the ancient toponym of the largest city within a 5 km distance of the inscription findspot, using cities from Hanson’s 2016 list. Beyond these universal attributes, the remaining columns are source-dependent and exist only for either the EDH or the EDCS subset. The ‘pleiades_id’ column, for example, cross-references the inscription findspot to a geospatial location in Pleiades, but only in the EDH subset. The ‘place’ attribute exists for data from EDCS (Ort) and contains ancient as well as modern place names referring to the findspot or region of provenance, separated by “/”. This column requires additional cleaning before computational analysis. Attributes with the _clean affix indicate that the text string has been stripped of symbols (such as ?); most of them refer to aspects of provenance in the EDH subset of inscriptions.
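As an illustration of the kind of cleaning the ‘place’ column may need, the sketch below splits a value on “/” and strips question marks. It is a minimal example under stated assumptions: the helper names split_place() and strip_uncertainty() are hypothetical, the sample value is made up for illustration rather than taken from the dataset, and the actual cleaning rules used for the _clean attributes may differ.

```python
def split_place(place):
    """Split a 'place' string on "/" into trimmed, non-empty place names."""
    if not place:
        return []
    return [part.strip() for part in place.split("/") if part.strip()]


def strip_uncertainty(text: str) -> str:
    """Remove question marks and collapse the whitespace left behind.

    Loosely imitates what the description says about the *_clean attributes.
    """
    return " ".join(text.replace("?", " ").split())


# Illustrative value only (not copied from the dataset).
example = "Mainz / Mogontiacum?"
names = [strip_uncertainty(name) for name in split_place(example)]
print(names)
```

Which of the resulting names is ancient and which is modern cannot be decided from the separator alone, so further checks would be needed before analysis.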

List of all spatial attributes:

‘geometry’ attribute contains a spatial point coordinate pair, ready for computational use in R or Python.
‘latitude’ and ‘longitude’ attributes contain geospatial coordinates.
‘urban_context_city’ attribute contains a name (ancient toponym) of the city determining the urban context, based on Hanson 2016.
‘province’ attribute contains province names as they appear in EDCS. It contains data only for inscriptions appearing in EDCS; for inscriptions appearing solely in EDH, this attribute is empty.
‘pleiades_id’ attribute provides a reference to the geographic location in Pleiades (https://pleiades.stoa.org/), as provided by EDH. For inscriptions appearing solely in EDCS, this attribute is empty.
‘province_label_clean’ attribute contains province names as they appear in EDH. It contains data only for inscriptions appearing in EDH; for inscriptions appearing solely in EDCS, this attribute is empty.
‘findspot_ancient_clean’, ‘findspot_modern_clean’, ‘country_clean’, ‘modern_region_clean’, and ‘present_location’ are additional EDH metadata, for their description see the attached Metadata file.
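The logic behind urban_context_city (the largest city within 5 km of the findspot, otherwise rural) can be sketched in plain Python. This is only an approximation under stated assumptions, not the procedure used to build the dataset: it measures great-circle distance with the haversine formula instead of a projected buffer, the city records are hypothetical placeholders rather than entries from Hanson 2016, and the size classes (large, medium, small) are left out because their thresholds are not given here.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres
BUFFER_M = 5_000            # buffer stated in the dataset description


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def urban_context_city(lat, lon, cities):
    """Return the most populous city within BUFFER_M of the point.

    `cities` is a list of (name, lat, lon, population) tuples.
    Returns None when no city lies within the buffer (rural context).
    """
    nearby = [c for c in cities if haversine_m(lat, lon, c[1], c[2]) <= BUFFER_M]
    if not nearby:
        return None
    return max(nearby, key=lambda c: c[3])[0]


# Hypothetical cities: names, coordinates, and populations are invented.
CITIES = [
    ("City A", 41.90, 12.50, 1_000),
    ("City B", 41.92, 12.50, 50_000),
    ("City C", 45.00, 9.00, 20_000),
]

near_two_cities = urban_context_city(41.91, 12.50, CITIES)
far_from_all = urban_context_city(43.00, 12.50, CITIES)
print(near_two_cities, far_from_all)
```

In the example, the first point lies roughly 1.1 km from both City A and City B, so the more populous City B is chosen; the second point is far from every city and is treated as rural.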

Disclaimer

The original data is provided by the third party indicated as the data source (see the ‘data_source’ column in the Metadata.csv). SDAM did not create the original data and does not vouch for its accuracy or guarantee that it is the most recent data available from the data provider. Much or all of the data is by its nature approximate and will contain some inaccuracies or missing values. The data may contain errors introduced by the data provider(s) and/or by SDAM. We always recommend checking the accuracy directly in the primary source, i.e. the editio princeps of the inscription in question.

Files (1.3 GB)

Name	MD5	Size
LI_metadata.csv	aaf32f58f4e908083bfa779e6a593026	19.7 kB
LIST_v1-1.geojson	d5f9ba7b7792fcf8a98759e38001e1ac	1.2 GB
LIST_v1-1.parquet	d2cb15a5ad35653b868b141a709efe7e	110.0 MB
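Since the file listing above includes MD5 checksums, a downloaded file can be verified before use. The following is a minimal sketch using only the Python standard library; the function names md5_of() and is_intact() are illustrative, and the local path passed to them is whatever location you saved the file to.

```python
import hashlib

# Checksums copied from the file listing above.
EXPECTED_MD5 = {
    "LI_metadata.csv": "aaf32f58f4e908083bfa779e6a593026",
    "LIST_v1-1.geojson": "d5f9ba7b7792fcf8a98759e38001e1ac",
    "LIST_v1-1.parquet": "d2cb15a5ad35653b868b141a709efe7e",
}


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, filename):
    """Compare a local file against the checksum listed for `filename`."""
    return md5_of(path) == EXPECTED_MD5[filename]
```

For example, is_intact("LIST_v1-1.parquet", "LIST_v1-1.parquet") would return True only if the local copy matches the listed checksum. Reading in chunks keeps memory use low for the 1.2 GB geojson file.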
