Advances in Osteometric Sorting: Utilizing Diaphyseal CSG Properties for Lower Limb Skeletal Pair‐Matching

Pair‐matching of bilateral elements is a major component of resolving commingled remains both in forensic and bioarchaeological contexts. This study presents a new method of osteometric pair‐matching of the lower limbs which relies on 3D digital models of the femur and tibia bones. The proposed method, which is accompanied by a freely available open‐source implementation, automatically computes a number of osteometric variables including cross‐sectional geometric properties from an assemblage of left and right bone antimeres and calculates probabilistically the appropriate matching pairs as well as single elements, whose bones antimere is not present in the given assemblage. The method has been extensively tested on a skeletal sample comprising 396 femurs and 422 tibias from the Athens collection. Our results in testing commingled assemblages with no disparity show that the method’s sensitivity is 1 for sorting femurs and 0.997 for sorting tibias, whereas in assemblages with moderate disparity the sensitivity is 0.999 and 0.992 respectively. Our results further indicate that sensitivity is unaffected by the size of the commingled assemblage although the percentage of identified true matching pairs drops as the number of commingled elements increases. This means that all identified antimeres matched to an individual are still very accurately sorted despite not every individual being identified in very large assemblages. The proposed method can facilitate the sorting process of commingled remains both accurately and efficiently, while leaving a very small percentage of unsorted elements that may require further techniques for further individualization.

KEYWORDS: forensic anthropology, commingled burials, osteometric sorting, CSG properties, pair-matching, lower limbs, skeletal 3D models Resolving commingled skeletal assemblages is a major challenge in forensic anthropology or bioarchaeology. In such contexts, articular congruence between adjoining bones and pairmatching of bilateral elements are essential steps of the individualization of human remains. While spatial analysis can significantly contribute to the reassociation of skeletal remains of a commingled burial (1), sorting commingled assemblages often relies on different sorting methods due to taphonomic disturbance or lack of field data. Classically, articular congruence and pairmatching have relied on visual observation examining bone associations through intrinsic morphological similarities/dissimilarities (2), which is by definition subjective and can be time consuming especially when dealing with large assemblages. Other methods include inter-bone comparisons of DNA (3), bone weight (4), ultra-violet fluorescence (5), and osteometric sorting (6).
In general, osteometric sorting could be described as a way of quantifying bone morphology and using statistical models as a more reliable and efficient substitute for associating through visual observation. Since Byrd and Adams described their original method, osteometric sorting has attracted a lot of research interest with newer studies proposing statistical refinements of previous methods (7)(8)(9)(10)(11) or incorporating novel morphological quantification schemes (12,13). Apparently, osteometric sorting remains an active topic in forensic anthropology and available methods often segregate on whether they focus on bilateral pairmatching or different bone association.
The current work focuses on pair-matching of bilateral elements and proposes a novel sorting method for commingled assemblages of femurs and tibias. Although the proposed method is based on the original principle of osteometric pair-matching, which evaluates through size and robusticity parameters if two elements are likely to originate from the same individual (6,14,15), it takes a distinct approach outlined by three key points: • Although size is evaluated through the maximum distance, and the cross-sectional areas and perimeters, robusticity is considered through the parameters of the second moment of area for each cross-section. Furthermore, shape related to diaphyseal bending is also evaluated through the dihedral angles between cross-sectional planes.
• The bilateral pair-matching is based on the difference of the absolute values of each variable between the left and right side elements under consideration and each variable is evaluated independently according to its mean (l) and standard deviation (SD) of the differences of absolute values of the matching elements from the reference sample. The resulting cumulative z-score is utilized as a probabilistic score to identify matching and mismatching pairs.
• The population reference sample descriptive statistics are taken into account for initial per variable exclusion of definite mismatched pairs, which further enhances the accuracy of the proposed method by greatly suppressing the false positive rate of matched pairs.

Materials and Methods
The material of the present study is based on the skeletal elements of the lower limbs of 230 adult individuals (age-at-death range: 18-99 years) from the Athens Collection, which is housed at the Department of Animal and Human Physiology of the National and Kapodistrian University of Athens. More specifically, 396 femurs (197 left and 199 right) derived from 214 individuals, 119 males and 95 females, (182 individuals with matched pairs; 101 males, 81 females) and another 422 tibias (209 left and 213 right) derived from 219 individuals, 121 males and 98 females, (203 individuals with matched pairs; 109 males, 94 females) were utilized. The samples were selected according to their completeness and good preservation status. Samples with pathological conditions were also included to maximize the observed variation of the sample and only cases of extreme pathological dysplasia or prosthetic implants were excluded. All skeletal elements were digitized into 3D triangular mesh models by means of photogrammetry with the Photoscan Pro v1.4 software (Agisoft LLC, Russia). The produced 3D models have a rated accuracy of~0.2 mm along their maximum length distance (16). Each skeletal element has been processed with the long-bone-diaphyseal-CSG-Toolkit, which facilitates the computation of the cross-sectional geometric (CSG) properties directly from 3D models of femur, tibia, and humerus bones (17). The CSG-Toolkit, which comprises a set of functions for the GNU Octave programming language (18), establishes the anatomical orientation of any given sample, it extracts five crosssections along its shaft at 20%, 35%, 50%, 65%, and 80% of its maximum length, and for every cross-section it calculates the perimeter, the area, and the I X , I Y , I XY , I MIN , I MAX , and h m parameters of the second moment of area. Furthermore, it calculates the maximum length of the sample as well as the cross-sectional normal vectors, which were used to calculate the dihedral angles between each pair of adjacent slicing planes. These dihedral angles and their total sum reflect the bending of the diaphyseal shaft. Further information regarding the CSG-Toolkit's usage and operation can be found in the original publication (17) and its supplementary validation report (19). In total, 46 variables were calculated for each skeletal element (femur or tibia), which are listed in Table 1. The results of the CSG-Toolkit have been assembled in tabular format for the femur and tibia bones separately using a custom built GNU Octave script, named "inspect_CSG.m", which has been added in the latest version of the CSG-Toolkit (20).
For each bone assemblage, all possible pair matches have been evaluated as the difference of the absolute values between the left and right side elements for each variable. Subsequently, descriptive statistics were applied on the matched pairs to calculate l and SD as well as the upper and lower boundaries extended by a margin of 0.2 SD. Corresponding histograms were created for matched and mismatched pairs with a bin size of 0.1 SD and a range of AE7 SD and AE11 SD for femur and tibia assemblages respectively. The lower and upper boundaries have been utilized as a threshold for excluding definite mismatched pairs, whose number was calculated for each individual variable. The proposed sorting method, which takes into account all 46 variables extracted from the 3D models, is briefly described in the following paragraph.
Load the corresponding variable descriptives (femur or tibia) and the extracted data from every skeletal element of the commingled assemblage. Left side and right elements must be segregated in two corresponding lists. Shuffle all samples into all possible left-right combinations and eliminate the cases where any variable falls outside the range between lower and upper boundaries as definite mismatched pairs. Remove all rejected cases from the list of shuffled combinations and make a list of remaining plausible matching pairs. Compare the produced list against the initially segregated left side and right side lists and any elements absent in the remaining plausible matching pairs are regarded to belong to individuals represented by a single side element. Subsequently, the remaining plausible pairs are clustered into separate smaller subgroups by exhaustively checking between the associations of left side and right side elements in the remaining list. This process does not necessarily lead to separate subgroups unless they are clearly isolated. For all remaining plausible pairs in every subgroup (if any), a cumulative zscore is calculated as a similarity-dissimilarity measure in order to rank the plausible matching pairs. The cumulative z-score is computed as: jLside var j À jRside var j À l var SD var A true matched pair is considered that with the lowest z-score among all plausible matching pairs between any two given left and right side elements as long as the lowest z-score is below a fixed value. This process is performed iteratively with the fixed z-score threshold increasing by a unit from 20 up to 30. Whenever a true matched pair is found, all associated plausible matching pairs are eliminated from the remaining list. If by the end of this process there are still plausible matching pairs in any of the clustered subgroups, these are re-clustered again by the same principle of isolated associations and each new subgroup (with remaining plausible pairs) is scanned twice for identifying true matched pairs, which on this occasion are considered these with the lowest z-score among all plausible matching pairs between any two given left and right side elements as long as the lowest z-score is less than the second lowest z-score by 5 units. At the end of this process, any remaining plausible matching pairs are considered as unresolved. The aforementioned sorting method has been implemented as a separate function, named "sort-ing_BE.m", which is freely available at https://github.com/ pr0m1th3as/osteometric-sorting, and may be used within the GNU Octave environment.
In order to evaluate the sorting accuracy of the proposed method, a number of different commingled scenarios were tested each with varying number of individuals but also with varying numbers of true matched cases and single side elements. The description of each scenario is given in Table 2 and each scenario was tested 30 times with different skeletal elements randomly selected from the available skeletal sample. The evaluation was performed separately for the femurs and the tibias and the sorting accuracy results of every iteration were accumulated accordingly. All data analyses have been performed with built GNU Octave scripts.

Results
The generated histograms of matched and mismatched pairs of the femur and the tibia for each variable are provided as a Appendix S1 in separate frequency plots grouped per variable and with top-bottom correspondence between matched and mismatched pairs. The descriptive statistics of the matched pairs of the femurs and tibias assemblages for each variable are given in Table 3 along with the exclusion percentage of the mismatched pairs, which can be safely regarded as true mismatched cases solely based on the particular variable. The descriptives (l, SD, lower and upper boundaries) for each bone are also provided in the respective.mat text data files at https://github.com/pr0m1th3a s/osteometric-sorting repository to facilitate pair-matching with the "sorting_BE.m" function. It should be noted that the excluded mismatched pairs do not necessarily regard the same pairs across all variables. Hence the total exclusion rate is much higher when all variables are accounted for. More specifically, 38,669 pairs out of 39,021 possible combinations of mismatched pairs from the available femur assemblage are correctly excluded as true mismatched cases based on the descriptive boundaries, which makes up to 99.10% of all tested mismatched pairs. As for the tibias, 43,364 pairs are correctly excluded as true mismatched cases, which constitutes the 97.86% of the 44,314 possible combinations of mismatched pairs. Accordingly, the baseline specificity (true negative rate, TNR) of the proposed bilateral pair-matching method is 0.991 and 0.9786 for the femur and tibia bone, respectively.
Applying the proposed sorting method on randomized samples according to different commingled mixtures of skeletal elements (see Table 2) also produced interesting results. These results are summarized in Tables 4 and 5 for the femur and tibia bones, respectively. These tables list the averaged values over 30 iterations for each tested scenario (sample mix). Best performance is achieved in scenarios where all individuals are represented by both sides (i100p100, i30p30, i60p60). The sorted pairs of femurs always correspond to true matched pairs, hence resulting in true positive rate (TPR) of 1, and the numbers of identified individuals are quite close to the true number of actual pairs (~96/100,~30/30,~59/60). The sorting of tibias in the same scenarios yield similar numbers of identified individuals but with lesser accuracy (respective TPR is 0.994, 0.997, 0.998), hence the correctly identified individuals are~94/100,~29/30, and 57/60, respectively. In scenarios where commingled samples comprise both matched pairs and single elements, the sensitivity is reduced. In scenarios with small ratio of single over paired elements in the evaluated sample mix, this reduction is negligible both for femurs (i120p100: 0.999; i60p50: 0.998) and tibias (i120p100: 0.988; i60p50: 0.995), but it is quite notable in the extreme scenario of assemblages with only single elements, where the TPR fell to 0.993 and 0.992 for femurs and tibias, respectively.

Discussion
Most of the work on osteometric sorting to date has been focusing on the power of exclusion as a measure of evaluating the performance of the proposed methods (6,(8)(9)(10)14,21,22). This trend heavily relies on the fact that the process of elimination is invaluable in sorting commingled remains (14,23). Hence, most studies have adopted the convention that a "positive" outcome, when testing a given pair of bones, is the rejection of the hypothesis that the two bones belong to the same individual, whereas a "negative" outcome is the failure to reject this hypothesis (9), that is, the given pair of bones could belong to the same individual (plausible match). On the contrary, the proposed method focuses on correctly associating bilateral elements that actually belong to the same individual. As a result, the TPR and TNR reported in the present study have opposite interpretations of what is usually suggested for evaluating osteometric tests for paired elements (21). The exclusion power of the proposed method, as expressed by TNR scores, is much higher than the original pair-matching osteometric sorting model (6) and two newer variants which have been evaluated in recent study on a sample from the Forensic Data Bank, USS Oklahoma, and the reference sample from within the Defense POW/MIA Accounting Agency (22). The reported exclusion rates ranged between 88.6 and 94.3% for the femur and from 86.2% to 96% for the tibia in their study, whereas the present method yielded 99.1% and 97.86%, respectively. Of course, the different working samples between the two studies might have some influence on   merging the 46 variables of the proposed method would be inappropriate since they comprise different measurement units. Moreover, any model of global summation does not allow for individual variable statistical testing and consequently disregards the origin of variation information (11). As a result, global summation methods of left-right differences ignore idiosyncratic differences between individual pairs that would otherwise be useful for detecting true mismatched pairs. Detecting idiosyncratic differences and similarities in the morphology of bilateral elements has long been established as a key issue in sorting commingled bones (23)(24)(25). Standard measurements are not sufficient in capturing this kind of variation, which is one major limitation of the available pair-matching osteometric sorting models that rely on such measurements. The advent of 3D technologies in osteological work over the previous years has also given rise to new promising methods in osteometric sorting, which incorporate landmark-based geometric morphometric methods (GMM) (12) or mesh-to-mesh comparisons (13) providing a better way of quantifying such idiosyncratic differences between bilateral elements. The newly proposed method of the present work builds on the strengths of previous methods while bypassing their limitations. Hence, it incorporates a large number of extracted variables, which represent different attributes of morphological traits, and their bimodal utilization, which switches between independent variable evaluation for eliminating false pair matches and multivariate scoring for successfully identifying true pair matches. This multivariate bimodal approach is behind the very high accuracy achieved in sorting bilateral elements of femurs and tibia.
The rationale behind testing multiple scenarios of different mixtures of bilateral elements has been twofold: to measure the performance with respect to the sample size of a commingled assemblage; and, to evaluate the effect of disparity of an assemblage in the proposed method's performance. Particularly focused on the second objective was the inclusion of the two extreme scenarios, namely i100p0 and i100p50, whose mixtures of single-sided and paired elements correspond to virtually impossible and highly improbable real-life cases of commingled assemblages respectively. Overall, the results have shown that femurs can be more effectively reassociated rather than tibias, despite previous studies reporting the opposite (21,22). This may be attributed to the different variables being utilized, which capture much more diverse patterns of morphological variation as compared to standard measurements used in osteometric sorting. Even more important, however, is the fact that both femurs and tibias can be effectively sorted with excellent accuracy, where TPR is above 0.998 and 0.987 for femurs and tibias respectively with the exception of the extreme scenarios. Most notably, all femur assemblages without disparity (scenarios i100p100, i30p30, and i60p60) constantly yielded a TPR of 1, which translates to that all sorted pairs identified to belong to the same individual were actually correct regardless of sample size. Nevertheless, it was also shown that the larger a commingled assemblage the lower the percentage of correctly identified pair matches. Sorting tibias follows the same pattern although to some lesser accuracy being observed. These findings point out that the reduction of TPR appears to be a function of disparity rather than size of the commingled assemblage, which becomes more evident in the two extreme scenarios, but size plays a role in the percentage of the sorted individuals as the number of unsorted plausible pairs rises according to the size of the commingled assemblage being tested.
The present results show that the newly proposed sorting algorithm provides increased overall performance over existing osteometric sorting methods. There are two contributing factors to this; the first being the utilized variables, which enable quantization of morphological variation that is easy to visually observe but hard to measure with standard measurements, and the second being the multivariate probabilistic approach, which preserves the origin of variation information allowing for a better similarity/dissimilarity measure between plausible pairs. Despite its improved performance, the proposed method still poses certain constraints that may impact its application in a forensic context. The most obvious is that it requires 3D digitization of the skeletal elements prior to sorting any commingled assemblage, which apart from the need for specialized equipment it can also be more laborious than taking standard osteometric measurements depending on the available equipment but also on the number of measurements. To put this in context, the 3D digitization used in the present study required about 20 min of labor time for each bone (evenly divided between taking the necessary photographs and manual postprocessing in Photoscan) and another 30 min of computer processing time in a decent workstation. Utilizing a structured light scanning system can dramatically reduce the processing time but still requires about the same amount of labor time for a comparable quality 3D model. On the upside, this is mitigated to a certain extent by the fact that both 3D digitization and the proposed sorting method itself are highly automated procedures that allow for minimum if any user error bias. This relaxes the need for highly trained practitioners for taking accurate osteometric measurements and it may prove even more valuable when collaboration is necessary in large commingled assemblages, where inter-observer measurement error might further influence the outcome of the applied osteometric sorting. Once the skeletal digitization is performed, which is mostly a matter of logistics, osteometric sorting with the proposed algorithm and the accompanying software can be accomplished in minutes. The latest version of the CSG-Toolkit took about 2.5 sec for each model and the "sorting_BE.m" function took 4.2 sec to process the largest testing assemblage (i120p100) evaluated in this work on a moderately aged laptop (DELL Latitude E6430 with a 3rd gen Intel Core i73520M processor). Perhaps the most important limitation of the proposed sorting method, at least in its present implementation, is that it can only be applied on complete bones. This limitation, however, derives from the CSG-Toolkit's own limitation, which can only work with complete long bones to extract the currently utilized variables, rather than the sorting algorithm itself, which could easily be adopted to work with a subset of these variables or even a different set of variables provided that population reference descriptives are available. Nevertheless, in presence of complete skeletal elements, the proposed method can be initially utilized to successfully resolve even large-scale commingled assemblages to a fair extend before incorporating other biological indicators, such as sex or age, for resolving the remaining plausible matches. Future work aims to further development of the CSG-Toolkit to enable extracting measurements from incomplete bones. This will allow for greater flexibility of the introduced sorting method making it useful on a broader range of commingled cases. To this end, all software implementation is freely available under a suitable open-source license and any contribution is highly welcome.

Supporting Information
Additional Supporting Information may be found in the online version of this article: Appendix S1. Histograms of matched and mismatched pairs of femur and tibia elements per variable. Figure S1. Max diameter and perimeter distributions of |left| -|right| values from the femur sample. Figure S2. Dihedral angle and diaphyseal bending distributions of |left| -|right| values from the femur sample. Figure S3. Cross-sectional area distributions of |left| -|right| values from the femur sample. Figure S4. Cross-sectional I x of 2nd moment of area distributions of |left| -|right| values from the femur sample. Figure S5. Cross-sectional I y of 2nd moment of area distributions of |left| -|right| values from the femur sample. Figure S6. Cross-sectional I xy of 2nd moment of area distributions of |left| -|right| values from the femur sample. Figure S7. Cross-sectional I min of 2nd moment of area distributions of |left| -|right| values from the femur sample. Figure S8. Cross-sectional I max of 2nd moment of area distributions of |left| -|right| values from the femur sample. Figure S9. Cross-sectional angle h m of 2nd moment of area distributions of |left| -|right| values from the femur sample. Figure S10. Max diameter and perimeter distributions of |left| -|right| values from the tibia sample. Figure S11. Dihedral angle and diaphyseal bending distributions of |left| -|right| values from the tibia sample. Figure S12. Cross-sectional area distributions of |left| -|right| values from the tibia sample. Figure S13. Cross-sectional I x of 2nd moment of area distributions of |left| -|right| values from the tibia sample. Figure S14. Cross-sectional I y of 2nd moment of area distributions of |left| -|right| values from the tibia sample. Figure S15. Cross-sectional I xy of 2nd moment of area distributions of |left| -|right| values from the tibia sample. Figure S16. Cross-sectional I min of 2nd moment of area distributions of |left| -|right| values from the tibia sample. Figure S17. Cross-sectional I max of 2nd moment of area distributions of |left| -|right| values from the tibia sample. Figure S18. Cross-sectional angle h m of 2nd moment of area distributions of |left| -|right| values from the tibia sample.