Defining what we study: The contribution of machine automation in archaeological research

In the 21st century, advances in computer science have impacted archaeology, most recently in the development of automated algorithms. Like most technology, these methods have been the source of ongoing debate, particularly in their utility for archaeology. Here, I focus on a contribution of automation and machine learning in archaeology that is often overlooked: the ability of computer algorithms to codify unambiguous, semantically consistent definitions. Archaeology has long struggled with establishing consistent characterizations of the phenomena it studies. As such, I argue that the procedures used for automated methods are useful for archaeologists, even outside of automated analyses, by allowing for the creation of consistent definitions which permit reproducible research designs.


Introduction
There is no denying that methods involving machine learning have made substantial waves in archaeological research in the past decade (Davis, 2019; Lambers, 2018; Traviglia and Torsello, 2017). With methodological innovation, debates often ensue, generally focusing on the newest advancements and their usefulness for certain research tasks. There is a longstanding debate regarding automated methods and their utility, accuracy, and place within archaeological inquiry (Bennett et al., 2014; Casana, 2014; Davis, 2019; Hanson, 2010; Orengo and Garcia-Molsosa, 2019; Parcak, 2009; Vershoof-van der Vart and Lambers, 2019), with some arguing that these methods are currently too problematic to be taken seriously and others believing they are some of the most important contributions to archaeology in the 21st century. However, there are significant contributions that can be made by these methods that go beyond automation itself. Specifically, computer automation's contributions extend to the formulation of archaeological definitions and metalanguage.
Here, I argue that the process of developing machine learning approaches in archaeology can be useful to researchers more broadly for developing semantically consistent metalanguage. Due to the diversity of archaeological information and its collection methods, the issue of data compatibility is well-established (Binding et al., 2008; Snow, 2006; Wise and Miller, 1997). Computer automation presents one means by which to address this long-standing problem. Yet, within the debate on computer automation in archaeology, the importance of semantic consistency is rarely considered. It is my hope that this article contributes to the ongoing debate surrounding machine learning and automation in archaeological analysis.

The issue of metalanguage
Since archaeology's inception, researchers have argued over definitions. This has led to the turning of complicated phenomena into simple phrases; and these phrases then become even more complicated than the phenomena they are supposed to define. Take the definition of "settlement" or "site" as an example. Over the course of archaeological history, there have been dozens, if not hundreds of different definitions (Dunnell, 1992; Parsons, 1972; Trigger, 1967). The notions of "settlement" or "site" (foundational concepts for archaeologists) are still very much abstruse in meaning, not unlike other terms like "culture" (e.g., Descola, 2013; Kroeber and Kluckhone, 1952; Osgood, 1951; Harris, 2018 [1970]). The ambiguity surrounding such concepts makes their study a dubious task, as each researcher will have to explain the very nature of how they define these phenomena.
This problem of metalanguage goes well beyond "settlements" or definitions of large, encompassing concepts studied by anthropologists and archaeologists. The definition of specific objects also suffers this problem. Take a ceramic sherd. We can say many things about it (i.e., what it is made of, whether or not it is decorated, how it was produced, etc.). But if asked to describe to a non-archaeologist how to identify a ceramic sherd, specifically in the field, it becomes more complicated. You must first define what a ceramic sherd is: it is a fraction of a container. Then, to identify these sherds in the field, you must also explain the variability that they can have: every material they could be made of, different properties that different sherds can exhibit, the presence or absence of defining characteristics like ceramic bases, rims, etc., and how to distinguish them from other non-archaeological materials that look similar to sherds in a given landscape. Obviously, we can come to agreed-upon definitions of what ceramics are, as demonstrated by the significant advances in the fields of petrography, chemistry, morphometry, etc. (Rice, 2015). Nonetheless, when describing ceramics macroscopically for field-based detection, definitions are often regionally specific, and identifying these nearly universal materials in different locations becomes difficult, even for experienced archaeologists. What then do automated machine approaches to archaeology bring to this problem of metalanguage? The answer lies in semantic consistency.

Semantic consistency and computer automation
In order for a computer to identify patterns in data in meaningful ways, it must first be programmed by an analyst. Specifically, the computer must learn: a) patterns that fit the target of analysis (i.e., an archaeological feature); b) patterns that do not fit with the target of analysis; and c) how to distinguish between a) and b). This may sound obvious, but as the above examples indicate this is more complicated than it seems. Yet, machine learning approaches for archaeological pattern recognition, site detection, and other identification tasks have made great progress in recent years precisely because of the creation of clear, straightforward archaeological definitions.
Computer code must be unambiguous and logical, and shortfalls in either of these aspects result in failure, either in terms of inaccurate outputs or in the failure of the entire procedure. In creating a computer algorithm for detecting a class of archaeological deposits, researchers must define the exact parameters of the features they are attempting to identify in a dataset. To return to the example of a ceramic sherd, recent developments in automated detection make a compelling case for the importance of machine learning in archaeology.
Orengo and Garcia-Molsosa (2019) developed an automated pottery sherd detection method using drone imagery. This process detected a greater number of sherds than manual analysts on the ground, and part of the reason is semantic consistency. In Orengo and Garcia-Molsosa's (2019) case, they define ceramics on the basis of specific size, texture, color, and elevation difference from surrounding ground surfaces. While the method is not perfect, its results prove cost-effective and comparably accurate to manual survey techniques. The approach also provided greater control for problematic environmental variables (e.g., vegetation cover) and faster recording rates of artifacts. Additionally, this computerized method alleviates one of the major problems with manual methods of image analysis for survey: different analysts will identify some things while overlooking others (Hawkins et al., 2003; Schon, 2000; Quintus et al., 2017). In training a computer, because of semantic consistency, the biases in identification are straightforward and identifiable, and can thus be improved upon by future research teams. In manual analysis, these biases are implicit and not always possible to remedy (let alone identify consistently).
Another example of how consistency in definitions can improve archaeology comes from the identification of mounds. Mounds are perhaps one of the most frequently studied archaeological structures, globally (e.g., Bini et al., 2018; Freeland et al., 2016; Larsen et al., 2017; Menze et al., 2006; Trier et al., 2015). They also provide key information about political organization, spirituality, and social structures (e.g., Anderson, 2004; Arnold, 2002; Boivin, 2004; Gamble, 2017; Sherratt, 1990). To identify a mound, we must begin with its defining characteristics: it is a three-dimensional topographic anomaly in a landscape. There are many types of mounds, however, some of which are not related to archaeological or historical contexts. If we use an example of burial mounds from the American Southeast, we can get more specific: a mound is a topographic anomaly which primarily contains rectangular, triangular, and trapezoidal elevation profiles. Burial mounds also display an elevation change of generally no more than 5 m in height (and this changes by subregion) (Davis et al., 2019a, 2019b). In addition to 3-D profiles (including slope) and elevation thresholds, mounds are also distinguishable by their asymmetry (Kvamme, 2013), overall size, compactness, and 2-D shape (see Table 1). Each value of these characteristics will be regionally distinct to a certain degree, making the adoption of an overarching definition difficult (if not impossible), but the characteristics themselves can be applied to identify mounded features in other regions. As shown in Table 1, attempts to automate the detection of "mounds" in different parts of the world have been increasing in accuracy in recent years.
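As an illustration only, a definition of this kind might be written down as a small set of named thresholds plus a rule that reports which criteria a candidate feature meets. The sketch below is not the procedure of any study cited here. The 5 m relief ceiling comes from the text above; every other threshold, the class and function names, and the choice of the Polsby-Popper ratio as the compactness measure are assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MoundDefinition:
    """Explicit thresholds for one regional definition of a burial mound."""
    max_relief_m: float = 5.0     # from the text: generally no more than 5 m
    min_relief_m: float = 0.3     # hypothetical lower bound
    min_area_m2: float = 50.0     # hypothetical
    max_area_m2: float = 5000.0   # hypothetical
    min_compactness: float = 0.6  # hypothetical; 1.0 is a perfect circle


@dataclass(frozen=True)
class Candidate:
    """Measurements of one topographic anomaly (hypothetical schema)."""
    relief_m: float
    area_m2: float
    perimeter_m: float


def compactness(c: Candidate) -> float:
    """Polsby-Popper compactness: 4*pi*area / perimeter**2, in (0, 1]."""
    return 4.0 * math.pi * c.area_m2 / c.perimeter_m ** 2


def evaluate(c: Candidate, d: MoundDefinition) -> dict:
    """Return each criterion's pass/fail so a rejection can be traced."""
    return {
        "relief": d.min_relief_m <= c.relief_m <= d.max_relief_m,
        "area": d.min_area_m2 <= c.area_m2 <= d.max_area_m2,
        "compactness": compactness(c) >= d.min_compactness,
    }


def is_mound(c: Candidate, d: MoundDefinition) -> bool:
    """A candidate counts as a mound only if every criterion passes."""
    return all(evaluate(c, d).values())
```

Because `evaluate` returns one result per criterion, a reader who disagrees with a classification can see which threshold produced it, which is the sense in which the biases of a coded definition are identifiable.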
Establishing semantically consistent definitions is difficult to capture by simple thresholding alone, however. For example, the issue of taphonomic and post-depositional processes means that the very nature of a mound's size or shape may change over time. The solution to this lies in what Magnini and Bettineschi (2019) term "Diachronic Semantic Models." In this framework, transformations of materials through time are incorporated into our formalized definitions. Researchers can then take these definitions and disseminate them throughout the archaeological community to provide "a common ground" in interpreting the archaeological record (Magnini and Bettineschi, 2019:13-14).
Using such approaches to define "mounds" permits researchers to detect features automatically with reasonably high accuracy and precision around the world (see Table 1). Furthermore, Magnini and Bettineschi's (2019) work demonstrates a solution to the problem of applying automation methods to contextually diverse components of the archaeological record. Yet, the purpose of this article is not to emphasize the many successes of automated techniques, nor is it to highlight the many limitations that still face these methods (for more information see Davis, 2019; Lambers, 2018; Luo et al., 2019; Opitz and Herrmann, 2018). Rather, this paper's emphasis is the role of these methods in the development of unambiguous, semantically consistent archaeological definitions.
Automated approaches have resulted in dozens of new definitions that can be used by researchers around the world, and which can produce replicable results (Fig. 1). The common argument made by opponents of automated archaeology is that manual analysis is more accurate. While this may be the case in some instances (although recent developments challenge this claim [e.g., Freeland et al., 2016; Guyot et al., 2018; Witharana et al., 2018]), manual evaluations are not precise (i.e., they are not always reproducible and may result in different conclusions based on who is manually evaluating a given dataset). This results in a "correct", but otherwise patchy analysis of archaeological features. Manual analysis nonetheless remains an important and necessary step for validating acquired results.
In contrast, because of their semantic consistency, computer algorithms can be implemented on different machines by a variety of researchers and produce the same output values. This presents an extraordinary achievement for archaeologists, and regardless of one's stance on automation itself, this should be perceived as a positive development. This stems from the fact that flaws in a researcher's definition of a particular feature are recognizable by others and can thus be remedied by additional work. In manual evaluations, especially of image data, the specifics of how a researcher (or team of researchers) generated their data are usually quite ambiguous. As such, different analysts cannot fully replicate the procedures.
The issue of semantic consistency in manual archaeological analysis can be solved by looking at the ways such definitions are methodically defined in machine learning literature (Fig. 2). In machine learning research, analysts must quantify and explicate different variables and threshold values chosen for their particular algorithm. This permits other researchers to replicate the experiment using the same or different datasets. Manual analysts should do the same, being sure to elucidate the exact steps and reasoning for how objects and ideas are defined. It is not enough to say that features were detected by experts; rather, the variables used by the experts to arrive at that identification are needed.
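The difference in reproducibility described above can be made concrete with an agreement statistic. The sketch below uses Cohen's kappa, a standard chance-corrected measure of agreement between two raters. It is not named in this article and is brought in here purely for illustration, and the label lists are invented for the example rather than drawn from any survey.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two label lists."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented labels: 1 = "feature present", 0 = "absent", for eight image tiles.
analyst_1 = [1, 1, 0, 0, 1, 0, 1, 0]
analyst_2 = [1, 0, 0, 0, 1, 1, 1, 0]
algorithm_run_1 = [1, 1, 0, 0, 1, 0, 1, 0]
algorithm_run_2 = list(algorithm_run_1)  # a deterministic rule repeats itself

print(cohens_kappa(analyst_1, analyst_2))              # 0.5
print(cohens_kappa(algorithm_run_1, algorithm_run_2))  # 1.0
```

A kappa of 1.0 for the repeated algorithmic run reflects only that a fixed rule returns the same labels each time. It says nothing about whether those labels are correct, which is why manual validation remains necessary.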

The importance of semantic consistency beyond computer automation
The necessity of clear, agreed-upon terminology demanded by automated archaeological methods is a significant and important contribution to current archaeological practice. Regardless of whether a researcher agrees with automation or machine learning in archaeology, it is not debatable, in the author's mind, that the ability to lucidly define archaeological terms and objectives in a manner that can be reproduced by others (machine or human) is extremely important. In fact, it has been argued for over 30 years that the establishment of clear and explicit language to guide scientific discovery is needed in archaeological practice (e.g., Binding et al., 2008; Dallas, 2016; Gardin, 1980). For example, the issue of semantic consistency has been persistent in the creation of archaeological databases in the form of metadata (Binding et al., 2008; Huggett, 2014; Schlader, 2002; Wise and Miller, 1997; also see Dallas, 2016). Much akin to the development of definitions and algorithms for computerized analysis, metadata effectively comprises information pertaining to attributes, descriptions, and common context between different datasets (Schlader, 2002). For databases to be compatible, metadata must be thorough (and codified) to enable researchers to locate and compare different information based on search-terms. As such, the development of ontologies, or shared concepts, within both database creation and different computer analysis techniques has permitted better sharing of information and improved performance of data analysis (Arvor et al., 2013, 2019; Binding et al., 2008; Rajbhandari et al., 2019; also see Magnini and Bettineschi, 2019).
Thus, for opponents of machine learning in archaeology, it should be noted that the argument here does not advocate such approaches as a panacea for archaeological practice. Rather, expert knowledge is crucial, as most proponents of manual analysis emphasize (e.g., Casana, 2014, 2020; Quintus et al., 2017). As Gardin (1989:19) wrote: "the study of knowledge structures in any given field … is a matter for experts in that field alone …". My argument is simply that the rigorous manner with which computer automation methods define objects of study should be extended into manual analyses. This will help to enhance the dissemination of expert knowledge in reproducible ways between researchers.
Especially in the age of "big data", wherein datasets become larger and more complicated, it is essential not just to devise efficient means to analyze this information, but also to do so systematically. Manual analyses result in implicit, interobserver biases and errors by analysts (Davis, 2019; Gnaden and Holdaway, 2000; Hawkins et al., 2003; Luo et al., 2019). This makes evaluating datasets across research teams difficult, and replication of results almost impossible in some instances. With computer learning, biases in analysis methods are clear, allowing for replication of results and modification by future researchers when definitions are viewed as insufficient. In other words, definitions are demarcated by code and thus permit others to build on those definitions (or change them completely) in ways that are readable and explicit.
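To show what building on a coded definition can look like in practice, the following minimal sketch derives a revised definition from an earlier one and lists exactly which parameters changed. The class, the parameter names, and all values are hypothetical and chosen only for illustration; they do not come from any published definition.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class FeatureDefinition:
    """A hypothetical codified definition; all values are illustrative."""
    name: str
    max_relief_m: float
    min_area_m2: float
    min_compactness: float


def changed_parameters(old, new):
    """Map each parameter that differs to its (old, new) pair of values."""
    return {
        f.name: (getattr(old, f.name), getattr(new, f.name))
        for f in fields(old)
        if getattr(old, f.name) != getattr(new, f.name)
    }


# An earlier team's definition, then a later team's regional revision.
original = FeatureDefinition(
    "mound", max_relief_m=5.0, min_area_m2=50.0, min_compactness=0.6
)
revised = replace(original, max_relief_m=3.0, min_area_m2=30.0)

print(changed_parameters(original, revised))
# {'max_relief_m': (5.0, 3.0), 'min_area_m2': (50.0, 30.0)}
```

The frozen dataclass leaves the earlier definition untouched, so both versions stay available for comparison and the disagreement between them is stated as a short list of values rather than left implicit.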

Conclusion
Here, I have argued that archaeologists can move closer to semantic consistency by engaging more closely with literature on machine learning and computer automation. Future scholars should begin explicitly defining the materials that they study in a manner consistent with computer automation (i.e., definitions with reproducible characteristics via quantified and/or specific identifiable parameters). This does not mean that all work should become automated, as this is oftentimes unnecessary or inappropriate. Rather, by establishing semantic consistency outside of purely automated archaeological research, analysis of archaeological materials, in general, will greatly improve.
If we treat the collection of archaeological data as an algorithm (i.e., a methodical and replicable process), then we can formulate a code, or formal set of definitions, to study the archaeological record. While computer automation and machine learning still have a way to go before being easily accessible for researchers and consistent in their analytical power, the utility of generating consistent definitions for archaeological research is an undeniable benefit offered by this methodological school. As such, non-computerized archaeological practice can benefit greatly by actively engaging with computer automation literature.

Fig. 1.
Diagram showing the relationship between automated and manual analysis and expert knowledge in the creation of semantic consistency and subsequently reliable conclusions. The greatest semantic consistency (signified by black portion of arrow) requires expert knowledge to be automated in some capacity, in addition to manual verification of that process. In practice, this does not require actual automation, but rather the creation of an algorithm (or codified definition) of some object being studied. For expert knowledge to be broadly disseminated, analysis must engage with both automated (i.e., standardized, unambiguous, and logical procedures) and manual methods.

Declaration of competing interest
The author has no conflicts of interest to disclose.

Fig. 2.
Illustration of how expert knowledge informs manual and automated analyses. The process of defining objects for automated procedures requires expert knowledge to be codified systematically, combined with quantified characteristics and thresholds. Manual analysis, in contrast, often relies on implicit thresholds, unsystematic incorporation of expert knowledge, and characteristics that are sometimes unclearly defined.

D.S. Davis
Digital Applications in Archaeology and Cultural Heritage 18 (2020) e00152