Published April 4, 2023 | Version v1
Presentation Open

Chasing social complexity through body ornaments in the recent prehistory of Iberia. Implementation of an archaeochemical tool for prehistoric data analysis and predictive modelling

Authors/Creators

Description

1. Introduction

Since the expansion of the Neolithic across the Mediterranean, with the arrival and consolidation of village life and agriculture, long-distance exchange experienced unusual growth, and exotic items became an important means of displaying new roles and social differences within and between communities. The use of exotic items, and their association with specific individuals, increases as social complexity grows and it becomes more necessary to mark social differences and exhibit the status of the bearer. The geographical origin and the spatio-temporal distribution of exotic raw materials and their products are prominent topics in European archaeological research, since they are considered key to understanding social interaction and the mobility of individuals and goods at different scales. Variscite-like minerals are among the materials most often used for body adornments in prehistory; they carry considerable weight in the archaeological record and have a substantial research tradition in the Iberian Peninsula. Thanks to novel chemical analytical techniques that are both portable and non-destructive, different projects over the last ten years have recorded data on thousands of items of personal adornment, made of different minerals, from more than 900 archaeological sites on the Iberian Peninsula. This represents a first-rate experimental data set for the study of these subjects. Despite the existence of such data sets, extracting knowledge from this type of resource still poses both methodological and theoretical challenges, owing to the lack of comprehensive studies with a data-driven approach.
Exhaustive inventories of Iberian sources, mineralogical characterisation of items, and systematisation of scattered and unpublished data remain among the urgent tasks needed to improve or reconsider the provenance models used to explain socio-economic dynamics in late prehistory. Using Computational Archaeology techniques such as data mining and machine learning, this project explores a data-driven approach to some of the main methodological challenges in the study of socio-economic complexity in the late prehistory of the Iberian Peninsula, and develops an Open Access approach to the publication of results, driven by the need for digitalization in the humanities.

2. Methods and Materials 

A machine learning pipeline involving several data preparation techniques and algorithm experimentation routines was developed with open-source tools (Ali, 2020; pandas development team, 2020; Pedregosa et al., 2011; Pérez & Granger, 2007) to predict the mineral Group and Subgroup to which a sample analysed by p-XRF belongs. To train our model we used a real-world dataset comprising more than two thousand records of items of personal adornment made of more than 25 different mineral classes from more than 900 archaeological sites on the Iberian Peninsula. The original dataset comprised 46 numerical features and one target with 26 categories. A total of 16050 data points were reached after applying a mixed technique of case deletion, undersampling, and oversampling to create synthetic data for minority classes (Carlson, 2017; Thai-Nghe, Gantner, & Schmidt-Thieme, 2010). Different models were developed using ten different algorithms, and their performance was evaluated with a k=10 cross-validation procedure. After choosing the best model, a hyperparameter optimization routine was implemented and evaluated through different metrics (accuracy, F1 score, recall and precision). Finally, we validated our model using a real-world case and deployed it in a general-purpose web application.
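The steps described above (class balancing, 10-fold comparison of several algorithms, hyperparameter optimization, and metric-based evaluation) can be illustrated with a minimal scikit-learn sketch. It is not the project's pipeline: the data are synthetic (46 numerical features but only 4 classes), only three algorithms are compared instead of ten, `balance_classes` is a hypothetical helper, and plain random under- and oversampling stands in for the resampling technique, which the abstract does not name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_score, train_test_split)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample

# Synthetic stand-in for the p-XRF table (assumption): 46 numerical
# features and 4 imbalanced classes instead of the real 26 categories.
X, y = make_classification(n_samples=800, n_features=46, n_informative=12,
                           n_classes=4, n_clusters_per_class=1,
                           weights=[0.55, 0.25, 0.15, 0.05], random_state=0)

# Hold out a test set before any resampling so it stays untouched.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)


def balance_classes(X, y, n_per_class, seed=0):
    """Hypothetical helper: resample every class to n_per_class rows."""
    parts_X, parts_y = [], []
    for label in np.unique(y):
        rows = X[y == label]
        # Oversample (with replacement) small classes, undersample large ones.
        sampled = resample(rows, replace=len(rows) < n_per_class,
                           n_samples=n_per_class, random_state=seed)
        parts_X.append(sampled)
        parts_y.append(np.full(n_per_class, label))
    return np.vstack(parts_X), np.concatenate(parts_y)


X_bal, y_bal = balance_classes(X_train, y_train, n_per_class=150)

# Candidate algorithms and the grids used later for tuning (illustrative).
candidates = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "rf": RandomForestClassifier(n_estimators=100, random_state=0),
}
param_grids = {
    "logreg": {"logisticregression__C": [0.1, 1.0, 10.0]},
    "knn": {"kneighborsclassifier__n_neighbors": [3, 5, 9]},
    "rf": {"max_depth": [None, 10]},
}

# Compare the candidates with k=10 stratified cross-validation.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv_scores = {
    name: cross_val_score(model, X_bal, y_bal, cv=cv, scoring="f1_macro").mean()
    for name, model in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)

# Tune the best candidate, then score it on the held-out test set.
search = GridSearchCV(candidates[best_name], param_grids[best_name],
                      cv=cv, scoring="f1_macro")
search.fit(X_bal, y_bal)
y_pred = search.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro", zero_division=0)
print(best_name, search.best_params_)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

One caveat of this sketch: because the training data are balanced before the folds are drawn, duplicated minority rows can land on both sides of a fold, so the cross-validation scores are optimistic. The held-out test set is split off before resampling for that reason.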

3. Results and Discussion

We have developed a model as complex as necessary but as simple as possible. From a technical point of view, our problem is fairly simple: the way it is modelled allows us to use machine learning algorithms that perform well at low computational cost. Our model scored between 87% and 95% across different metrics (precision, recall and F1), and we have deployed it in a web app for public use. However, high scores across various training and testing routines aside, our model is an ongoing project and is re-evaluated as our data set grows and new questions arise. Using AI-derived techniques and a data-driven approach to answer archaeological questions still raises several issues, and our results leave the following questions open:

• How do AI-derived methods reconfigure archaeological research?

• How to present final results when the data continue to grow?

• How is the workflow of an archaeological research team transformed when the core of the research is an AI and data-driven approach?

4. References

Ali, M. (2020). PyCaret: An open-source, low-code machine learning library in Python. Retrieved from https://www.pycaret.org

Carlson, D. L. (2017). Quantitative Methods in Archaeology using R (1st ed.). Cambridge: Cambridge University Press. https://doi.org/10.1017/9781139628730

pandas development team. (2020). pandas-dev/pandas: Pandas. Zenodo. https://doi.org/10.5281/zenodo.3509134

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

Pérez, F., & Granger, B. E. (2007). IPython: A System for Interactive Scientific Computing. Computing in Science and Engineering, 9(3), 21–29. https://doi.org/10.1109/MCSE.2007.53

Thai-Nghe, N., Gantner, Z., & Schmidt-Thieme, L. (2010). Cost-sensitive learning methods for imbalanced data. In The 2010 International Joint Conference on Neural Networks (IJCNN) (pp. 1–8). https://doi.org/10.1109/IJCNN.2010.5596486

 

 

Files

Certificate of Attendance CAA 2023.pdf
