Enki: AI for Archaeology – Datasets for Automatic Site Recognition
Description
Artificial intelligence meets archaeology with Enki, a supervised deep learning system designed to support the discovery of hidden archaeological sites. This first version of the Enki dataset, released under the CC-BY-SA 4.0 license, enables the exploration and analysis of satellite and aerial images to identify potential traces of ancient settlements.
Available Datasets
We have released three high-quality datasets for training, validating, and testing automatic recognition models:
✅ Normalized_Tell_combined_350 (Training Set)
• 21,654 images
• Resolution: 350×350 px
• Size: 1.70 GB
✅ Val_norm (Validation Set)
• 1,290 images
• Resolution: 350×350 px
• Size: 287 MB
✅ Test_2025_02_05 (Test Set)
• 1,459 images
• Resolution: 3023×3023 px
• Size: 12.05 GB
These datasets have been specifically designed to optimize the training of computer vision models specializing in the identification of tells, anthropogenic settlements, and other archaeological structures that are invisible to the naked eye.
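As a quick integrity check after downloading, the image counts listed above can be compared against the contents of an archive. The following is a minimal sketch, not part of the Enki code base. It assumes the datasets are distributed as zip archives (only Normalized_Tell_combined_350.zip appears in the file list) and that images use common extensions such as .png, .jpg, or .tif. The function names and the extension list are hypothetical.

```python
import zipfile
from pathlib import PurePosixPath

# Image counts as published in the dataset description.
EXPECTED_COUNTS = {
    "Normalized_Tell_combined_350": 21654,
    "Val_norm": 1290,
    "Test_2025_02_05": 1459,
}

# Assumption: the images use one of these common extensions.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def count_images(archive) -> int:
    """Count archive members whose file extension looks like an image.

    `archive` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(archive) as zf:
        return sum(
            1
            for info in zf.infolist()
            if not info.is_dir()
            and PurePosixPath(info.filename).suffix.lower() in IMAGE_EXTENSIONS
        )


def matches_published_count(archive, dataset_name: str) -> bool:
    """Return True if the archive holds the published number of images."""
    return count_images(archive) == EXPECTED_COUNTS[dataset_name]


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the archive was saved.
    name = "Normalized_Tell_combined_350"
    print(name, matches_published_count(f"{name}.zip", name))
```

A mismatch does not necessarily indicate a corrupted download: the archive may use an image extension not covered by the assumed list.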
Usage and Contributions
The datasets can be used to develop and test machine learning models for computational archaeology, remote sensing, and landscape analysis. The source code and pre-trained models of Enki are available on GitHub:
We encourage researchers to contribute, improve, and apply these tools to expand the potential of artificial intelligence in uncovering the past.
📌 License: CC-BY-SA 4.0
Main scientific credits: Many of the site images used to train the model are based on mappings from the Ancient Near East (ANE) Project, a comprehensive spatial database of archaeological sites in the Near East. The ANE Project, licensed under CC-BY-SA 4.0, provides an essential foundation for research in computational archaeology and remote sensing. We extend our gratitude to the creators of ANE for their valuable contribution to archaeological research. More details can be found here: ANE on Zenodo.
Fair Use Statement for Satellite Imagery
The datasets provided in this repository include image data sourced from Google Satellite and Bing Satellite for research purposes. The use of these images is justified under the principles of fair use based on the following considerations:
1. Educational and Research Purpose
• The images are utilized exclusively for scientific research, archaeological analysis, and non-commercial academic study. The primary goal is to advance machine learning applications in computational archaeology and automated site detection.
2. Transformative Use
• The satellite imagery has been processed, modified, and analyzed to create training datasets that significantly differ from the original raw images. These transformations include normalization, segmentation, labeling, and AI-driven feature extraction, making them part of a novel and distinct dataset for archaeological research.
3. Limited and Non-Commercial Use
• The datasets do not redistribute raw or unaltered satellite images. Instead, they consist of processed image datasets specifically tailored for AI-based archaeological site detection.
• No commercial benefit is derived from the use of these images. The datasets are shared under a Creative Commons CC-BY-SA 4.0 license to promote open research.
4. Attribution and Compliance with Terms of Service
• Google and Bing Satellite imagery remain the property of their respective providers.
• Users of these datasets are encouraged to adhere to the terms of service of Google Maps, Google Earth, Bing Maps, and other satellite data providers when using or referencing the original sources.
• Any publication or derivative work based on these datasets should acknowledge the original satellite data providers where applicable.
5. Public Interest and Scientific Advancement
• The research aims to enhance archaeological discovery methods using AI, contributing to the scientific community and cultural heritage preservation.
• By sharing these processed datasets openly, we enable greater transparency, reproducibility, and collaborative research in the fields of remote sensing, machine learning, and digital archaeology.
📌 For further information on satellite imagery terms of use, refer to:
Files
Normalized_Tell_combined_350.zip
Additional details
Dates
- Available: 2025-03-01 (Dataset published)
- Created: 2024-08-04 (Dataset creation)