GIST
Description
The Greek Inscriptions in Space and Time (GIST) dataset represents a comprehensive collection of ancient Greek inscriptions, enriched by temporal and spatial metadata. The dataset was created by the Social Dynamics in the Ancient Mediterranean Project (SDAM), 2019-2023, funded by the Aarhus University Forskningsfond Starting grant no. AUFF-E-2018-7-2.
The GIST dataset is mainly based on Greek inscriptions from the Searchable Greek Inscriptions database of the Packard Humanities Institute ([PHI](https://inscriptions.packhum.org/)) and on the I.PHI dataset published by the Pythia Project (Sommerschield et al. 2021). The attributes were further enriched with Linked Open Data (LOD) from the Trismegistos Project and with data from Hansen and Nielsen's (2004) Inventory of Archaic and Classical Greek City-States and Hanson's (2016) Cities Database. The text of the inscriptions was lemmatised using the AGILe lemmatiser (de Graaf et al. 2022). The rights to these data are held by the respective original projects.
The GIST dataset consists of 217,863 inscriptions described by 36 attributes. The individual inscriptions have been cleaned, preprocessed and enriched with additional data, such as dates in a numeric format and geolocation. The provenance of each attribute is documented in the columns 'dataset_source', 'attribute_source', 'created_by_script' and 'description' of the attached Metadata.csv, which is also available via GitHub.
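To look up where a given attribute comes from, the provenance columns can be read directly from Metadata.csv. The sketch below uses only the Python standard library. It assumes that the column holding attribute names is called `attribute_name`; this is a hypothetical default, so check the header of your copy of Metadata.csv and adjust `key_column`. The function name `attribute_provenance` is likewise illustrative.

```python
import csv
from pathlib import Path

# Provenance columns documented in Metadata.csv.
PROVENANCE_COLUMNS = ("dataset_source", "attribute_source", "created_by_script", "description")


def attribute_provenance(metadata_path, attribute, key_column="attribute_name"):
    """Return the provenance fields recorded for one attribute.

    ASSUMPTION: `key_column` ("attribute_name" is a hypothetical default)
    is the Metadata.csv column that holds the attribute names.
    """
    # "utf-8-sig" also copes with a byte-order mark at the start of the file.
    with Path(metadata_path).open(newline="", encoding="utf-8-sig") as fh:
        for row in csv.DictReader(fh):
            if row.get(key_column) == attribute:
                # Keep only the provenance columns present in this file.
                return {col: row[col] for col in PROVENANCE_COLUMNS if col in row}
    raise KeyError(f"{attribute!r} not found in column {key_column!r}")
```

For example, `attribute_provenance("Metadata.csv", "urban_context")` returns a dictionary with whichever of the four provenance fields the file contains for that attribute.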
180,061 inscriptions have valid geospatial coordinates (the `geometry` attribute). This information is also used to determine the Roman urban context of each inscription, i.e. whether it lies within 5000 m of a large, medium or small city, or is rural (more than 5000 m from any city; see the attributes `urban_context`, `urban_context_city` and `urban_context_pop`), and to map it onto an ancient Greek polis, if there is one within the same 5000 m buffer (see the attributes `polis_context_name`, `polis_context_size` and `polis_context_fame`).
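The 5000 m rule can be illustrated with a short sketch. This is not the SDAM script (the actual processing scripts are on GitHub); it is a minimal illustration that assumes a hypothetical list of city records, each with coordinates, a population estimate and a precomputed size rank, and it measures great-circle distance with the haversine formula instead of building a projected buffer. Among the cities inside the buffer, the most populous one is kept, mirroring the rule described for `urban_context_city`; if none is in range, the findspot is classed as rural.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against tiny floating-point overshoots above 1.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def urban_context(lat, lon, cities, buffer_m=5000):
    """Return (rank, city_name, population) for the largest city within buffer_m.

    `cities` is a hypothetical list of dicts with keys "name", "lat", "lon",
    "pop" and "rank" (e.g. "small", "medium", "large"), standing in for
    Hanson's Cities Database. Findspots with no city in range are "rural".
    """
    nearby = [c for c in cities if haversine_m(lat, lon, c["lat"], c["lon"]) <= buffer_m]
    if not nearby:
        return ("rural", None, None)
    largest = max(nearby, key=lambda c: c["pop"])
    return (largest["rank"], largest["name"], largest["pop"])
```

This pairwise loop is fine for small samples; for all 180,061 located inscriptions, a spatial index or a projected buffer in a GIS library would be considerably faster.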
131,677 inscriptions have a numerical date of origin, expressed either as an interval or as a single year, in the attributes `not_before` and `not_after`.
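When selecting inscriptions for a given period, a common approach is to keep every record whose `not_before` to `not_after` interval overlaps the period of interest. The minimal sketch below assumes that years are numbers with BC dates negative and that a record with only one bound denotes a single year; both assumptions should be checked against Metadata.csv. Missing values may arrive as `None` or as pandas' `NaN`, so both are treated as missing. The function names are illustrative.

```python
import math


def _missing(value):
    """True for None or a float NaN (how pandas represents missing numbers)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def date_interval(not_before, not_after):
    """Return (start, end) for a record, or None if it is undated.

    ASSUMPTION: a record with only one bound is treated as a single year.
    """
    if _missing(not_before) and _missing(not_after):
        return None
    if _missing(not_before):
        not_before = not_after
    if _missing(not_after):
        not_after = not_before
    return (not_before, not_after)


def overlaps_period(not_before, not_after, start, end):
    """True if the record's date interval intersects the period [start, end].

    ASSUMPTION: years are numbers with BC dates negative (e.g. -50 = 50 BC).
    """
    interval = date_interval(not_before, not_after)
    if interval is None:
        return False
    lo, hi = interval
    return lo <= end and hi >= start
```

For instance, an inscription dated -100 to -50 overlaps a period from -75 to 0, so `overlaps_period(-100, -50, -75, 0)` is `True`. Keeping overlapping rather than fully contained intervals is more inclusive; filter on `lo >= start and hi <= end` instead if only securely dated records are wanted.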
The scripts used to generate the dataset and their metadata are available via GitHub.
Formats
We publish the dataset in the Parquet and GeoJSON file formats. A description of the individual attributes is available in Metadata.csv. Using the `geopandas` library (imported as `gpd`), you can load the data directly from Zenodo into your Python environment with the following command:
`GIST = gpd.read_file("https://zenodo.org/records/10127597/files/GIST_v1-0.geojson?download=1", driver="GeoJSON")`
In R, the `sfarrow` and `sf` packages provide the functions `st_read_parquet()` and `read_sf()` to load the Parquet and GeoJSON files, respectively, once you have downloaded them locally.
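If `geopandas` is not available, a downloaded copy of the GeoJSON file can also be read with Python's standard `json` module. The sketch assumes a local file saved as `GIST_v1-0.geojson` (the name used in the Zenodo link) and that inscriptions without coordinates carry a `null` geometry, as GeoJSON permits; both are assumptions about the published file. The function name `iter_located_features` is hypothetical.

```python
import json


def iter_located_features(geojson_path):
    """Yield (properties, (longitude, latitude)) for every feature with a Point geometry.

    Features whose geometry is null or not a Point are skipped.
    The whole file is parsed into memory at once.
    """
    with open(geojson_path, encoding="utf-8") as fh:
        collection = json.load(fh)
    for feature in collection.get("features", []):
        geometry = feature.get("geometry")
        if not geometry or geometry.get("type") != "Point":
            continue
        # GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
        lon, lat = geometry["coordinates"][:2]
        yield feature.get("properties") or {}, (lon, lat)
```

For example, `points = list(iter_located_features("GIST_v1-0.geojson"))` collects the attributes and coordinates of all located inscriptions. Note the coordinate order: GeoJSON puts longitude first, whereas the 'latitude' and 'longitude' columns name each value explicitly.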
Further reading:
- de Graaf, E., Stopponi, S., Bos, J., Peels-Matthey, S. & Nissim, M. (2022). AGILe: The First Lemmatizer for Ancient Greek Inscriptions. Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), Marseille, 20-25 June 2022. pp. 5334–5344. http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.571.pdf
- Hansen, M. H., & Nielsen, T. H. (Eds.) (2004). An Inventory of Archaic and Classical Poleis. Oxford University Press. Digitised data available through https://polis.stanford.edu
- Hanson, J. W. (2016). Cities Database (OXREP databases). Version 1.0. http://oxrep.classics.ox.ac.uk/databases/cities/. DOI: https://doi.org/10.5287/bodleian:eqapevAn8
- Kaše, V. & Glomb, T. (2022). The History of Graeco-Roman Religions in the Light of Cultural Evolution: A computational text analysis of ancient Greek inscriptions (submitted).
- Kaše, V. & Glomb, T. (2023). Affluence, Agricultural Productivity and the Rise of Moralizing Religion in the Ancient Mediterranean. Religion, Brain & Behavior 13/2, 202-206. https://doi.org/10.1080/2153599X.2022.2065350
- Sommerschield, T., Assael, Y., Shillingford, B., Bordbar, M., Pavlopoulos, J., Chatzipanagiotou, M., Androutsopoulos, I., Prag, J., & de Freitas, N. (2021). I.PHI dataset: Ancient Greek inscriptions. https://github.com/sommerschield/iphi
Notes on spatial attributes
Machine-readable spatial point geometries are provided within the GeoJSON and Parquet formats, together with the 'latitude' and 'longitude' columns, which contain decimal geographic coordinates where these are known. Other attributes containing spatial information have been generated from other sources. These include TMgeo_name, which provides the ID of the inscription's location as recorded in Trismegistos. Information on ancient cities within a 5 km buffer of the inscription's location is held in the polis_context_ and urban_context_ attributes. The 'polis_context_' attributes contain the name, size ranking and prominence score of an associated polis from Hansen and Nielsen's Inventory of Archaic and Classical Greek City-States (Oxford 2004), specifically the digital version of the inventory created by Josiah Ober and his team and hosted by the Stanford University library (https://polis.stanford.edu). Information on Roman-period urban contexts is held in the 'urban_context' attributes. These attributes, based on Hanson's 2016 list (http://oxrep.classics.ox.ac.uk/databases/cities/), include the rank of the associated city (the largest one within 5 km), its ancient toponym, and its population estimate.
List of all spatial attributes:
- 'geometry' - contains spatial point coordinate pair, ready for use in R or Python
- 'latitude' and 'longitude' - contain angular coordinates in decimal numeric format (EPSG:4326)
- 'TMgeo_name' - ID of the geographic location of the inscription's findspot in Trismegistos
- 'polis_context_name' - the textual component of the ancient polis identifier from the digital Greek polis inventory
- 'polis_context_size' - size ranking on a 0-5 scale (5 = largest), based on HN area estimates: 0 = no evidence for size; 1 = 0-25 km sq; 2 = 25-100 km sq; 3 = 100-200 km sq; 4 = 200-500 km sq; 5 = 500 km sq or more. Source: HN Appendix 9, with additions from Hansen 2008 and from Emily Mackil (per litt.).
- 'polis_context_fame' - number of columns of text in the HN inventory (in steps of 1/8 column), used as a proxy for the prominence of a given place. The range is 0.12-20.87. For display, the range is reduced to a 1-5 ranking: 0.12-0.37 = 1, 0.5-0.87 = 2, 1.0-2.87 = 3, 3.0-5.87 = 4, 6.0-20.87 = 5.
- 'urban_context' - specifies the rank of a Roman city within 5 km distance of an inscription (if one exists) on the basis of population estimated by Hanson 2016. The scale is: small, medium, large.
- 'urban_context_city' - contains the name (ancient toponym) of a city within 5 km distance of an inscription (if one exists). The city dataset is based on Hanson 2016. If the inscription's findspot fell within 5 km distance of multiple Roman cities, the largest was selected.
- 'urban_context_popest' - estimated population of the associated city from Hanson 2016, 2019
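The size codes and fame bins described in the list above can be turned into readable labels and 1-5 ranks with a few lines of Python. This is an illustrative sketch with hypothetical function names; the thresholds come from the bins listed for 'polis_context_fame', using each bin's lower bound as the cut-off, and missing values (`None` or `NaN`) return `None`.

```python
import bisect
import math

# Lower bounds of fame ranks 2-5, from the documented bins:
# 0.12-0.37 -> 1, 0.5-0.87 -> 2, 1.0-2.87 -> 3, 3.0-5.87 -> 4, 6.0-20.87 -> 5
FAME_BREAKS = [0.5, 1.0, 3.0, 6.0]

# Area classes behind the polis_context_size codes.
SIZE_LABELS = {
    0: "no evidence for size",
    1: "0-25 km sq",
    2: "25-100 km sq",
    3: "100-200 km sq",
    4: "200-500 km sq",
    5: "500 km sq or more",
}


def fame_rank(columns):
    """Map a polis_context_fame value (HN text columns) to a 1-5 rank; None if missing."""
    if columns is None or (isinstance(columns, float) and math.isnan(columns)):
        return None
    # bisect_right counts how many break points are <= the value.
    return bisect.bisect_right(FAME_BREAKS, columns) + 1


def size_label(code):
    """Return the area class for a polis_context_size code; None if missing or unknown."""
    # Float codes such as 3.0 match the integer keys, and NaN matches none.
    return SIZE_LABELS.get(code)
```

Using break points rather than the listed upper bounds means values falling between two bins (which should not occur, given the 1/8-column steps) are still assigned to the lower rank.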
Disclaimer
Please be aware that the records in this dataset are aggregated from pre-existing sources, and additional attributes are generated on the basis of third-party data (see data provenance in the 'data_source' column in the Metadata.csv). SDAM did not create the original data, vouch for its accuracy, or guarantee that it is the most recent data available from the original data provider. Many variables contain values that are, by nature, approximate and may contain some inaccuracies or missing values. The data may also contain errors introduced by the data provider(s) and/or by SDAM. The openness of our processing scripts should facilitate the fast discovery of any such errors or discrepancies. We highly recommend checking attribute accuracy with the primary source, i.e. the *editio princeps* of the inscription in question. For derived data (e.g. urban_context), please review the associated scripts to understand their limitations.
Please contact the authors in case of any questions!