Sex estimation: a comparison of techniques based on binary logistic, probit and cumulative probit regression, linear and quadratic discriminant analysis, neural networks, and naïve Bayes classification using ordinal variables

The performance of seven classification methods, binary logistic (BLR), probit (PR) and cumulative probit (CPR) regression, linear (LDA) and quadratic (QDA) discriminant analysis, artificial neural networks (ANN), and naïve Bayes classification (NBC), is examined in skeletal sex estimation. These methods were tested using cranial and pelvic sexually dimorphic traits recorded on a modern documented collection, the Athens Collection. For their implementation, an R package has been written to perform cross-validated (CV) sex classification and give the discriminant function of each of the methods studied. A simple algorithm that combines two discriminant functions is also proposed. It was found that the differences in the classification performance between BLR, PR, CPR, LDA, QDA, ANN, and NBC are overall small. However, LDA is simpler and more flexible than CPR, QDA, and ANN and has a small but clear advantage over BLR, NBC, and PR. Consequently, LDA may be preferred in skeletal sex estimation. Finally, it is striking that the combination of pelvic and cranial traits via their discriminant functions, determined either by BLR or LDA, removes practically any population-specificity and yields much better predictions than the individual functions; in fact, the prediction accuracy increases above 97%.


Introduction
Sex estimation is a key aspect of any forensic anthropological analysis as sex is one of the three key parameters used in the identification of an unknown individual, along with age and stature [1][2][3]. Even though sexual dimorphism in humans is small, there are morphological and metric differences between male and female skeletons, which have been used in sex estimation. Morphological methods focus primarily on the pelvis and secondarily on the cranium, and they assess the degree of expression of specific traits. Based on the qualitative assessment of this degree, they assign the individual to the male or female sex [4][5][6]. More recently, statistical analysis has started being employed, whereby the degree of expression of selected pelvic and cranial traits, recorded in an ordinal scale, is used as input variable in sex prediction using binary logistic regression analysis and/or discriminant analysis [7,8]. This statistical approach allows not only the prediction of sex but also the estimation of the probability of an individual being male or female.
Discriminant analysis and, recently, machine learning techniques, such as artificial neural networks (ANN) and naïve Bayes classification (NBC), have also been used in metric sex estimation, whereby the variables are measurements that capture both size and shape differences between males and Electronic supplementary material The online version of this article (https://doi.org/10.1007/s00414-019-02148-4) contains supplementary material, which is available to authorized users. females [9][10][11][12][13][14][15][16][17][18][19]. A restriction of metric methods compared with morphological ones is that they are of limited applicability to fragmented partially preserved remains.
A comparison of different techniques used for sex classification based on ordinal variables was carried out by Walker [8]. Walker [8] used cranial traits and examined the performance of four multivariate techniques: k-nearest neighbor, binary logistic regression (BLR), linear (LDA), and quadratic (QDA) discriminant analysis. He found that QDA, and especially k-nearest neighbor analysis, performed poorly in terms of the sex bias criterion since they produced discriminant functions the classification error rates of which were much greater for one sex than the other. In contrast, BLR and LDA had both low misclassification and low sex bias rates. From these two techniques, Walker [8] chose BLR as the best technique, mainly because it relies on fewer assumptions than LDA. Comparisons between artificial neural networks (ANN) and the classical methods LDA, QDA, and BLR have been recently carried out and found that ANN perform better than the other methods. However, all these comparisons concern metric and not ordinal data [15][16][17].
The current paper examines the performance of the above classification methods, BLR, LDA, QDA, ANN, and NBC, as well as two alternatives of BLR, the probit (PR) and the cumulative probit (CPR) regression, in sex estimation using ordinal predictors. From these methods, the most popular is the first one, since it allows the direct development of mathematical relationships between sex traits and sex. These relationships can then be easily applied for sex estimation in different assemblages. In contrast, discriminant analyses, neural networks and, naïve Bayes classification require that the training sample be used each time they are employed in sex estimation and this is a major disadvantage. For this reason, in the present paper, apart from the evaluation of these seven methods, we develop simple mathematical relationships for each of them that can be used for sex estimation, as is already the case for binary logistic regression. In addition, we explore the performance of a new approach that combines two different discriminant functions for sex prediction.

Binary logistic and probit regression
Logistic and probit models are common statistical tools for the analysis of dichotomous data. In the context of sex prediction, binary logistic regression (BLR) presumes that the sex traits (predictors) have been properly coded, usually in an ordinal scale. Similarly, the sex of the individual may also be coded as a binary variable, say 0 for male and 1 for female. Consider the sex traits X 1 , X 2 , …, binary logistic regression establishes a relationship among these variables and the probability P(sex = 1) that an individual is female (sex = 1) based on the following expression [20]: where c 0 , c 1 , c 2 , … are (adjustable) parameters usually determined using the maximum likelihood estimation provided that a training sample with documented sex vs. X 1 , X 2 , … data is available. When c 0 , c 1 , c 2 , … have been calculated, the above equation can be used to calculate P(sex = 1). Thus, if then Therefore, if P(sex = 1) > 0.5, the individual is female; otherwise, the individual is male with a probability equal to P(sex = 0) = 1-P(sex = 1).
Tables with c 0 , c 1 , c 2 , … values or the expression y = c 0 + c 1 X 1 + c 2 X 2 + ⋯ can be found in the literature and they can be used to compute P(sex = 1) by means of Eq. (3) and, therefore, estimate sex from specific traits. Such a well-known table is the one given by Walker [8] with y equations for estimating sex from cranial traits based on American/English and Native American assemblages. In this table, as well as in all relevant tables, the percentage of correctly classified males/females is also presented. This is an important piece of information because it evaluates the performance of the regression equations when applied to a certain sample. This information is obtained by means of leave-one-out cross-validation (LOOCV). According to this technique, we remove from the dataset the first case, apply binary logistic regression to the remaining cases, and, based on the obtained regression equation, we estimate the sex of the held-out case. This procedure is repeated for the second case and so on until the last case. Then we count the correct predictions and express them as a percentage of correctly classified cases.
In probit regression (PR), the inverse of the standard normal cumulative distribution Φ −1 of the probability P(sex = 1) is modeled as a linear combination of the predictors [20,21]: where the adjustable parameters c 0 , c 1 , c 2 , … may also be determined using the maximum likelihood method as in BLR. When c 0 , c 1 , c 2 , … are known, the above equation can be used to calculate P(sex = 1) since: where Φ is the standard normal cumulative distribution function. This function may be easily calculated using for example the NORMSDIST function in Excel or the pnorm function of R. BLR and PR are not subject to strict assumptions. Thus, they do not require a linear relationship between the dependent and independent variables; the predictors do not need to be normally distributed; they can handle both categorical and continuous variables, whereas homoscedasticity is not required [20]. In addition, both methods are sensitive to outliers but this is not a serious issue when using ordinal data. The difference between logistic and probit models lies in the assumptions about the distribution of the errors in order to use maximum likelihood estimation. In PR, the error terms should be normally distributed, whereas in BLR, they should have a logistic distribution.

Cumulative probit regression
The use of binary logistic and probit models in sex assessment and in particular the validity of the estimated probabilities P(sex = 1) was criticized by Konigsberg and Hens [21] within the frames of Bayes' Theorem. These authors proposed as a better alternative the cumulative probit model/regression (CPR). Here, we applied the cumulative probit model as follows. First, consider the simple case of just one sex indicator, X 1 , which can be coded using a J-point ordinal scale. That is, it can take the integer values i = 1, 2, ..., J. If we model the ordinal variable X 1 as a function of the binary variable sex (independent variable), the fitting probit model is expressed not by just one equation but from (J − 1) equations of the form: where P ci is the cumulative probability of category i, that is, the probability of X 1 falling in category i or below. Therefore, where P 1 , P 2 , …, P i is the probability of X 1 falling exactly in category 1, 2, …, i, respectively. In fact, P 1 , P 2 , …, P i are conditional probabilities and, in particular, P i is the probability conditional on sex that an individual is in the ith category of the sex indicator X 1 . When we fit the probit model to the ordinal response X 1 using sex as an independent variable, we obtain the adjustable parameters a 1 , a 2 , …, a J − 1 , b and then the conditional probabilities may be calculated from the following system of equations: Therefore, for i > 1, we have Now, if the observed score of X 1 for an individual is i, the probability that the individual is a female may be estimated by applying Bayes' Theorem [21].
where P M is the prior probability that an individual is male and P F is the prior probability that an individual is female. These quantities may be estimated, as in discriminant analysis, from P M = n M /n, P F = n F /n where n M , n F is the number of males and females, respectively, and n is the total number of cases. Alternatively, they may be estimated via an optimization procedure that searches for the P M , P F values that yield the optimum prediction performance in the training sample.
At this point, we should clarify the following. In the original paper by Konigsberg and Hens [21], function Φ(x) is defined as the standard normal integral from x to infinity, whereas in the present paper, Φ(x) is the standard normal integral from -infinity to x. Thus, their sum is equal to 1. If we take this into account, as well as the different notation used by Konigsberg and Hens [21], Eqs. (9) and (10) are identical to Eqs. (8) and (9) proposed in [21]. Therefore, up to this point, i.e., for the simple case of just one sex indicator, the method described above is close to the original approach proposed by Konigsberg and Hens. For many sex indicators, the following variation has been adopted. When there are many sex traits, X 1 , X 2 , …, Eq. (10) is still valid but now P i = P I is the probability conditional on sex that an individual is in the state I composed of the i 1 th category of X 1 , i 2 th category of X 2 , and so on. In this case, we may calculate the conditional probabilities P 1 , P 2 , …, P J for each sex indicator X 1 , X 2 , … and then P I may be estimated under the assumption of conditional independence. This is the main assumption of this approach. Note that the assumption of conditional independence is widely used in statistical studies, like in the naïve Bayes classification as well as in the Bayesian age estimation. In particular, we should stress that the naïve Bayes classification uses also the extension of Eq. (10) for many sex traits, X 1 , X 2 , … to estimate sex probabilities (see below).
The above approach is simple and applicable using any software that fits the probit model. An alternative approach, closer to the Konigsberg-Hens study, could be the following. Instead of fitting a univariate probit model, Eq. (8), we fit a multivariate cumulative probit model. This fitting gives the adjustable parameters a 1 , a 2 , …, a J − 1 , b of all traits X 1 , X 2 , … and the probability P(sex = 1| X 1 = i 1 , X 2 = i 2 , …) is estimated after the generalization of Eqs. (9) and (10) that now contain multivariate normal integrals in place of the univariate standard normal integral Φ.

Discriminant analysis
An alternative approach for sex classification and prediction is discriminant analysis (DA). DA is based on the Mahalanobis distance between x = (X 1 , X 2 , …, X m ) and the mean vectors μ 0 and μ 1 , where x is a vector of sex traits (variables) and μ 0 and μ 1 are the mean vectors of the male and female subsamples.
There are two main variants of DA: quadratic (QDA) and linear (LDA) discriminant analysis. In QDA, the probability P(sex = 1) is estimated via the following relationships [22,23]: where k = 0 or 1 are the classes of male and female, respectively, Σ k is the covariance matrix for class k, and P k are the prior probabilities used. The last quantity is estimated from P k = n k /n, where n k is the number of cases k and n is the total number of cases. Note that the first term in Eq. (12) is the socalled Mahalanobis distance between x and μ k . It can be shown that Eqs. (11) and (12) may be expressed as Eqs. (2) and (3) where y is given by: QDA is reduced to LDA if we assume Σ k = Σ, where Σ is the pooled sample covariance matrix. In this case, Eq. (13) is simplified to Equations (13) and (14) can be used like Eq. (2) for sex estimation. That is, based on a training dataset, the probabilities P = P(sex = 1) are estimated for each case using linear or/ and quadratic discriminant analysis. Then the adjustable parameters a 0 , a i , b i , c ij may be calculated by means of linear or multilinear regression using log(P/(1 − P)) as a dependent variable and X i or X i , X 2 i , X i X j as independent variables. Sex prediction in the target sample is carried out via Eqs. (3) and (14) or/and (13).
Discriminant analysis is a parametric multivariate technique and, therefore, it is subject to several assumptions: The size of the smallest group should exceed the number of sex traits (independent variables); the variables should follow the multivariate normal distribution; the variance/ covariance matrices of variables should be homogenous across groups, whereas multi-collinearity among variables should be excluded. Note that the equality of covariance assumption is not required in QDA and this is the basic reason that LDA is a much less flexible classifier than QDA. Nonetheless, as Tabachnick and Fidell [20] point out, if classification is the primary goal, then most of the above requirements do not affect the classification performance, provided that there are no outliers. Deviation from these assumptions may distort only tests of statistical significance, just as in MANOVA, if such tests have been carried out.

Neural networks
An artificial neural network (ANN) is a system of interconnected neurons, which is used for classification via processes that mimic the way the human brain works [24,25]. ANN is one of the main tools used in machine learning and it can be used for sex classification and prediction. A simple neural network is shown in Fig. 1. When the neural network is used for sex classification, the leftmost layer of the network (input layer) consists of m input units that represent the traits X 1 , X 2 , …, X m , whereas the output layer has only one node, which is related to the sex variable. Note that this variable, i.e., the output of the network, takes values between 0 and 1 and, therefore, it is related to the probability P(sex = 1). In the example of Fig. 1, there is just one hidden layer with three nodes. It is called hidden layer because its values are not observed in the training set. The circles labeled "1" are called bias units.
In every neural network, the input values X 1 , X 2 , …, X m are transformed to an output value via proper weights, w i . Thus, a neural network with m input nodes and one hidden layer with p nodes represents the following function provided that the numbering of the weights is as in Fig. 1 [25, p., 143]: where Here, f is the activation function of the network and sex ≈ P(sex = 1), i.e., we may at least as a first approximation adopt that the ANN output is the probability P(sex = 1). In what concerns the function f, we have adopted the logistic function It is seen that if we know the number of inputs and the hidden nodes as well as the weights, the estimation of sex is straightforward via Eqs. (15) and (16). There are several algorithms to train a network and compute weights. The most interesting point is that all these algorithms do not assume any underlying pattern for the data.

Naïve Bayes classification
Naïve Bayes classification (NBC) also belongs to the field of machine learning. It is a simple probabilistic classification method based on Bayes' theorem with the "naive" assumption of conditional independence. Within the frames of the present study, NBC may be considered to be related to the cumulative probit regression method, CPR, described above, because it uses the same relationship, Eq. (10), and its extension for many sex traits, X 1 , X 2 , …, to estimate sex probabilities. However, in CPR, the conditional probabilities are computed via probit regression, whereas in NBC, they are calculated directly from the counts table of the training data.

Statistical methods
Fifteen R functions have been written to perform crossvalidated sex estimation based on the above classification methods: CVBLR, CVPR, CVCPR, CVLDA, CVQDA, CVNN, CVNBC, CPRopt, CPRpm, CPRkh, sdiscr, sdiscr2, discrNN, discrQDA, and discrCPR. The first five functions perform cross-validated binary logistic and probit regression, cumulative probit regression, and linear and quadratic discriminant analysis. The CVNN function is used to implement artificial neural networks and the CVNBC function implements naive Bayes classification. The CPRopt function searches for optimum priors that may be used for the cumulative probit regression, the CPRpm performs cumulative probit regression using predefined priors, and the CPRkh performs cumulative probit regression without cross-validation and stepwise variable reduction based on the Konigsberg-Hens [21] method. The next function, sdiscr, may be used to estimate the sex and the corresponding probability for one or more individuals using an equation of the form of Eqs. (2), (3), (5), (13), and (14), i.e., equations that may be determined from the CVBLR, CVPR, CVLDA, and CVQDA functions. The sdiscr2 function can combine two discriminant functions that arise either from the BLR or LDA methods. Finally, the discrNN function estimates the sex and the corresponding (approximate) probability by means of weighting factors obtained from the CVNN function using Eq. (15); the discrQDA function uses Eqs. (11) and (12) instead of Eq. (13) for predictions based on QDA, and the last function, discrCPR, can be used for predictions based on the CPR method. The last function may also be used to make predictions based on the NBC method.
Binary logistic and probit regression are implemented using the basic glm function in combination to the CVbinary function of the DAAG library to perform leave-one-out crossvalidation (LOOCV). In addition, the stepAIC function of the MASS library is used for backward stepwise model selection, i.e., for variable reduction, which is a crucial step for building a simpler model without losing the predictive power of the data. Linear and quadratic discriminant analyses are performed via the lda and qda functions of the MASS library. These functions can also perform LOOCV. For variable Fig. 1 Simple neural network for sex classification with two input nodes and one hidden layer with three nodes reduction, the stepclass function of the klaR library is used adopting the backward technique and the accuracy (AC) as performance measure. For the CPR method, we used the polr function of the MASS library. The CVCPR function may perform LOOCV but it does not have the option for variable reduction. Note that in the CVCPR function, the probability p(sex = 1) for many sex indicators X 1 , X 2 , … is estimated under the assumption of conditional independence. To test the alternative approach of fitting a multivariate cumulative probit model, i.e., the Konigsberg-Hens [21] method, we used the mvord function of the mvord library in the CPRkh function. In this function, the cumulative distribution function of the multivariate normal distribution was calculated using the pmvnorm function of the mvtnorm library. The basic code is similar to that proposed by Konigsberg-Frankenberg [26]. Note that, due to the mvord function, this approach may be very slow at certain datasets and this is the reason we have not included the option of cross-validation.
For the implementation of ANN, the R function nnet of the homonymous library was adopted. This function is based on the feed-forward algorithm, which is a rather fast algorithm. However, we should point out that, irrespective of the algorithm used for the estimation of the weights, different runs of an ANN algorithm may give different weights, especially if their number is large enough. For this reason, for the neural network training, we used 20 repetitions and the networks with the optimum performance were selected for sex prediction. It is evident that the ANN performance depends upon the number of the hidden nodes. This number was varied from 1 to 40. An important consideration when using ANN is overfitting issues and, for this reason, it is important to perform LOOCV. Note that nnet does not have the option to perform LOOCV, and for this reason in the present study, we have written the relevant code. In what concerns variable reduction, this may be achieved qualitatively via Garson's plot, which shows the relative importance of the input variables. However, in the present study, we instead chose to use in the input layer the variables obtained from BLR/LDA/ QDA.
The problem of variable reduction appears also in NBC and it was approached as in ANN. For the implementation of NBC, the naïve_bayes function of the naivebayes library was used.
Finally, we attempted to combine two discriminant functions based on different skeletal traits, and in particular one based on cranial sex traits and another based on pelvic traits, to test if this combination would improve the performance of the individual functions. It is known that models/discriminant functions based on cranial traits exhibit strong populationspecificity; thus, they have a rather poor prediction performance especially if the discriminant function has been obtained from a different population. In contrast, discriminant functions based on pelvic traits give much better predictions (see the "Results and discussion" section). If two such functions give the same prediction for the sex of an individual, then we can be quite confident for the estimated sex. The problem occurs when the two functions/models give different predictions. For example, consider that the discriminant function based on cranial traits predicts that the sex of an individual is male with a probability say P1(sex = 0) = 0.9, whereas the discriminant function based on pelvic traits predicts that the sex of this individual is female with a probability say P2(sex = 1) = 0.6. In this case, for sex estimation, we need a selection rule. There are several choices: We may select the sex based exclusively on the probability of the best model, i.e., the model based on pelvic traits. However, this choice excludes completely the model based on cranial traits. Alternatively, we may disregard the performance of the discriminant functions and make the selection based exclusively on the predicted probabilities. Thus, in the above example, we select the prediction of the discriminant function that uses cranial traits because simply the probability of the individual being male is greater than that of being female (0.9 > 0.6). To take into account the performance of the discriminant functions, we may multiply the above probabilities by proper weighting factors that come from the performance of the discriminant functions at the target population.
In the present study, we found that the simple selection rule that focuses exclusively on the predicted probabilities is very effective when one function is based on cranial traits and the other on pelvic traits. In addition, we found that this selection rule gives results identical to those obtained from the formula for the joint probability of the individual being a female (as suggested by an anonymous reviewer): P1(sex = 1) × P2(sex = 1)/(P1(sex = 1) × P2(sex = 1) + P1(sex = 0) × P2(sex = 0)). It is evident that the above two options to combine two discriminant functions give identical results about the classification but not about the probabilities since the first option selects the maximum probability, whereas the second option computes a (new) joint probability. Although these two selection options can be applied manually, an R script has been written and is implemented in the sdiscr2 function.
More details and the code of the R functions are given as Online Resources 1 and 2.

Materials
The Athens Collection, housed at the Department of Animal and Human Physiology at the National and Kapodistrian University of Athens, Greece, consists of 225 skeletons. Most of this material has documented age, sex, occupation, and cause of death and represents individuals who lived in the second half of the twentieth century and were buried in cemeteries in the area of Athens [27]. Of this collection, 191 skeletons (106 males and 85 females) were studied in the context of this paper, excluding subadults, individuals with insufficient documentation, and individuals with pathological lesions or taphonomic damage that could inhibit the correct recording of the sex traits under examination. Note that the Athens Collection is divided in two parts; the skeletons that formed the original core of this collection have the coding WLH, while the skeletons that were added in the collection subsequently received a code of ABH. From these data, the 132 skeletons with ABH code were used for training and the 59 skeletons with WLH code were used for prediction.
The pelvic traits recorded included the ventral arc (VA), subpubic concavity (SPC), and medial aspect of the ischiopubic ramus (MAR) and their scoring followed the 5grade scheme proposed by Klales et al. [7]. The cranial traits included the mental eminence (ME), supraorbital margin (ORB), supraorbital ridge/glabella (GL), nuchal crest (NC), and mastoid process (MA), recorded as described by Walker [8] using also a 5-grade scheme. Part of this dataset (120 cases) is provided as Online Resource 3 in order to practice using the provided R code. The entire dataset is available upon request by the authors.

Results and discussion
The results obtained are presented in Tables 1, 2 Tables 4 and 5 present selected results concerning the classification performance of the neural networks when the number p of the nodes of the hidden layer is equal to 1 and when p gets its optimum value, i.e., the value that yields optimum sex classification. Table 6 shows the discriminant functions of LDA that result in the optimum sex classification for the Athens Collection and, finally, Table 7 shows the percentage of correctly classified individuals when two discriminant functions, one using pelvic and the other cranial traits, are combined. Note that some small differences in the percentages observed between Table 7 and Tables 1 and 2 are due to the different way that the missing values are eliminated in the relevant functions. The results concerning the CPR and NBC methods are presented in Tables S1 to S4 and Fig. S1. In addition, Online Resource 4 contains the complete results obtained from the neural networks (Tables S5 to S7 and Fig. S2). Note that in all ANN results, the value 0.5 was used for the weights decay.
From Tables 1, 2, and 3, we obtain that the differences between BLR, PR, LDA, and QDA are overall small; LDA gives slightly better results than BLR in 36% of the cases given in these tables, the opposite is valid in 26% of the cases, and the two methods give identical results in the remaining 38% of the cases. However, if we examine the sex prediction on the target sample, we observe that LDA generally gives better and more balanced results between sexes. For the differences between LDA and QDA, the above figures are 24%, 18%, and 58%, respectively. It is seen that although QDA includes many more terms than LDA in its discriminant function, Eqs. (13) and (14), it does not give better results. In fact, given this large number of parameters, we readily conclude that QDA does not offer a better alternative to either LDA or BLR.  From these tables we also observe that the logit model produces results very similar to probit regression. Therefore, the choice of probit versus logit depends largely on individual preferences. The same holds for the cumulative probit model as well as for NBC (Fig. S1, Tables S1 to S4). Although there are differences between the results obtained from BLR, CPR, and NBC, these differences are rather random and do not indicate that any of these models provides better predictions. Note that the results of the cumulative probit model are not improved, at least in most of the cases examined in this study, if we optimize the prior probabilities, P M , P F (Table S2). In addition, they are not improved even if we use the original, Konigsberg-Hens method (Table S3).

CV (T) CV (M) CV (F) Prediction (T) Prediction (M) Prediction (F) CV (T) CV (M) CV (F) Prediction (T) Prediction (M) Prediction (F) Prediction (T) Prediction (M) Prediction
In what concerns the ANN method, its performance strongly depends upon the number p of the nodes in the hidden layer. Best results are usually obtained for p ranging from 10 to 40. This means that the discriminant functions expressed via Eq. (15) include a great number of terms, which makes their use difficult, especially if we have to publish such equations for use by other scholars. If, in addition, we compare the prediction performance of ANN with that of LDA using the combination of traits for which LDA gives the best results, we readily conclude that LDA should be preferred over ANN.
Therefore, from the classification methods studied, LDA has a small but clear advantage over the other approaches and it may be preferred in sex estimation. However, one critical question concerning the applicability of LDA is the fulfillment of the various assumptions that should be met for the method to give reliable results. The most important of these assumptions are the normality, equality of variances/covariances, and the lack of multi-collinearity and outliers. Note that for BLR, the only assumptions that should be taken into account are the lack of multi-collinearity of the sex traits and the lack of outliers. The use of ordinal variables minimizes the existence of outliers in all methods. In R, multi-collinearity may be tested using the vif function of the car library. The application of this function showed that there is no multicollinearity among the sex traits except for ORB when it is examined in combination with pelvic traits. Therefore, only the results concerning the eight variables in Table 3 should be treated cautiously. In what concerns normality and equality of variances/covariances across groups, both are not valid when the predictors (sex traits) are ordinal variables. The reason is that ordinal data cannot be drawn from a multivariate Gaussian distribution and the Box's M test that is usually adopted to test the homogeneity of variance-covariance matrices is very sensitive to violations of normality, leading to the rejection of this assumption. Note that in R, the Box's M test may be performed by means of the boxM function of the biotools library and its application to the datasets under study showed that, indeed, the homogeneity assumption is violated. Thus, two basic assumptions underlying LDA application are violated. However, the improved prediction performance of LDA is a strong indication that the method is fairly robust to violations of these assumptions when classification is the primary goal. This is in line with Tabachnick and Fidell [20], who also point out that if a 95% accuracy in classification is achieved, "you hardly worry about the shape of distributions" (20; p. 381). Thus, the fact that the accuracy of LDA is comparable with and often better than that of BLR shows that this method can be applied for sex classification without tests for the underlying pattern of the data, as in the BLR and ANN methods.    Table 3 Percentage   Table 6 summarizes the discriminant functions of LDA that result in optimum sex classification for the Athens Collection. As expected, pelvic traits result in higher correct sex classification percentages in relation to cranial traits. It is also interesting to observe that the combination of pelvic and cranial traits improves the predictions, especially the total sex prediction, but it gives rather unbalanced results per sex as it overestimates males and underestimates females.
The combination of different sex traits is straightforward by means of the methods under study. The combination of discriminant functions already published in the literature may be achieved by means of the algorithm described in the "Statistical methods" section. Some of the results obtained are presented in Table 7. In particular, this table shows the percentage of correctly classified individuals by means of the Klales et al. [7] BLR discriminant function, various Walker's [8] BLR discriminant functions, their combination, and the corresponding LDA discriminant functions of the present work. We observe that Walker's [8] equations have a very poor prediction performance, which in turn shows that cranial sex traits exhibit very strong population-specificity, as supported by previous studies [28,29]. The Klales et al. [7] [30]. However, the most interesting finding of Table 7 is the results obtained from the combination of discriminant functions. It is seen that in most cases, the total sex classification rate is greater than 98%, whereas the sex bias is decreased below 4% and these figures are practically the same as those obtained from the LDA That is, the combination of pelvic and cranial traits via their discriminant functions removes practically any populationspecificity and yields much better predictions than the individual functions, especially those obtained from the literature. This is very clearly shown in case 14, Table 7, whereby we combined the Klales et al. [7] equation with Walker's [8] equation using only the glabella and mental eminence. It can be seen that this combination achieved a correct sex classification of 98.9% in total, while the application of the individual equations achieved a correct classification in 93.2% for the pelvic traits and 75% for the cranial ones. At the same time, the sex bias for these two functions is − 10% and 43%, respectively, whereas the proposed method reduces the bias to 2.5%. In addition, the results of the combined function are identical to those obtained when we derive equations for sex prediction based on the Athens Collection itself (case 5, Table 7). To summarize the above results, the differences in the classification performance between BLR, PR, CPR, LDA, QDA, ANN, and NBC are overall small. However, LDA is more simple and flexible than CPR, QDA, and ANN and has a small but clear advantage over BLR, PR, and NBC. Despite being a parametric multivariate technique, LDA is fairly robust to violations of relevant assumptions, and may be preferred in sex classification problems. These results concern cases where pelvic and cranial traits are examined independently. The most striking result of the current study is that the proposed approaches to combine pelvic and cranial traits via their discriminant functions, either from LDA or BLR, yield better predictions than the individual functions (correct classification rates over 98% for pooled sexes and sex bias below 3), free from population-specificity issues.