Identifying traits that enable lizard adaptation to different habitats

Species adapt differently to contrasting environments, such as open habitats with sparse vegetation and forested habitats with dense forest cover. We investigated colonization patterns in the open and forested environments in the diagonal of open formations (DOF) and surrounding rain forests (i.e. Amazonia and Atlantic Forest) in Brazil, tested whether the diversification rates were affected by the environmental conditions and identified traits that enabled species to persist in those environments.


| INTRODUCTION
Understanding the processes that have shaped species' geographical distributions has intrigued scientists for centuries (Sclater, 1858; Wallace, 1854). Two main processes, adaptation and dispersal, can lead to a species occupying a novel habitat. Communities are formed where taxa evolve in situ and adapt to changes in their environment or where ancestors disperse into these areas from other regions.
Regardless of which process accounts for a species' presence in a community, successful occupancy of a particular habitat requires tolerance to prevailing abiotic conditions, the acquisition of resources and the avoidance of predation. Since organismal traits play a crucial role in these actions, explicit consideration of organismal traits is likely to yield clues that tell us how species colonize and persist in novel habitats.
Organismal traits mediate the potential for adaptation and dispersal, leading to colonization, which influences how far species can disperse, cross geographical barriers and persist in the new areas. Furthermore, the evolution of critical traits may enable a species to diversify in new environments (Stroud & Losos, 2016).
Diversification may also be facilitated by the absence of predators and competitors (Moore & Donoghue, 2007). Alternatively, species may adapt to a changing environment and persist through time in the same region, depending on the speed and intensity of environmental change and the standing genetic variation (Frank & Slatkin, 1992).
Intense environmental pressures can influence the development of similar traits among distantly related species. For instance, different plant families inhabiting similarly dry environments in different parts of the Earth have leaves adapted to water storage (Eggli & Nyffeler, 2009). Differentiating between in situ adaptation and colonization is difficult, particularly without abundant fossil evidence or other records showing the past geographical distribution of a lineage.
Identifying traits that enable colonization is an essential step towards understanding species diversification. Numerous studies have investigated the patterns of colonization, their direction and species recolonization in areas with different environmental characteristics (Antonelli et al., 2018; Dutech et al., 2003; Gagnon et al., 2019; Hughes et al., 2013). For many plant species, transitions between environmentally different areas were less common than expected by chance. This finding may be explained by the niche conservatism hypothesis, where such shifts are rare due to the tendency of species to retain their ancestral ecological niches (Crisp et al., 2009). Other evidence supports broad shifts in habitat over time and the persistence of species. For example, studies in the Neotropics demonstrate transitions from wet and forested ancestral habitat to drier and open new habitat for multiple species groups (Antonelli et al., 2018; Zizka et al., 2020). These studies suggest that investigation into other groups is essential to generalize these patterns. Importantly, these environments are heterogeneous and encompass patches of humid forested habitats (Mesquita et al., 2017). Although forested biomes (Figure 1) have higher precipitation and are composed chiefly of forested habitats (Franchito et al., 2009), they also include areas of open habitats (Moraes et al., 2020). Hence, it is crucial to account for habitat heterogeneity when assessing the drivers of adaptation and colonization within broad regions.
Both open and forested habitats have changed through time as a response to climate and landscape modifications. For example, Andean uplift (during the Eocene and Miocene) likely influenced the climate in South America by increasing aridity in many regions (Armijo et al., 2015) and modifying the Amazon drainage patterns (Hoorn et al., 2010). In the Middle Miocene, the Cerrado biome went through changes in vegetation composition through the expansion of C4 plants, associated with the establishment of savanna-like vegetation worldwide (Edwards et al., 2010; Graham, 2011). The marine incursions in the Middle-Late Miocene inundated the lowland parts of the Chaco region (Hulka et al., 2006) and are hypothesized to have caused the local extinction of species in these lowlands (Garda & Cannatella, 2007). Pleistocene climatic fluctuations were characterized by periods of high variation in temperature and precipitation (Hewitt, 1996). These climatic and landscape modifications likely offered an opportunity for species to disperse and colonize new areas. However, the colonization history is still understudied for many taxonomic groups in South America.
Lizards are useful for understanding how traits vary among environments given their ectothermic physiology (Deutsch et al., 2008; Sinervo et al., 2010). Lizards depend on external sources of heat to thermoregulate and achieve homeostasis (Deutsch et al., 2008; Huey et al., 2009, 2012). Brazil, where most of the targeted DOF and the surrounding forested biomes are located, has 276 known lizard species from 15 taxonomic families (Costa & Bérnils, 2018).
Presumably, extant species in this region are the product of adaptations that allowed their ancestors to survive in these drastically different environments.
Previous investigations have shown the impact of the environment on trait variation across the globe, such as clutch size (the number of eggs laid in each reproductive event: smaller clutches in Amazonian than in non-Amazonian populations and larger clutches in Caatinga than in Cerrado populations; Garda et al., 2012; Rand, 1982; Vitt & Colli, 1994), body size (which decreases with latitude and increases with temperature in some species but increases with latitude in others; Ashton & Feldman, 2003; Azócar et al., 2015; Oufiero et al., 2011) and coloration (light phenotypes in dunes and dark phenotypes in dark soil habitats; Rosenblum et al., 2010). Here, we investigate: (1) the direction of the colonization events among open and forested habitats in the Brazilian DOF and their forest counterparts (i.e. Amazonia and Atlantic Forest); (2) the differences in diversification rates between open and forested habitats; and (3) how traits could have facilitated species colonization and persistence in open and forested regions. Given the putative early origin of the forested habitats, we expect more dispersal events from forested habitats into the younger open habitats than in the opposite direction. We also expect that traits associated with microhabitat and temperature will significantly predict colonization of forested and open habitats due to differences in vegetation cover and solar radiation. Exploring the evolution of traits in a predictive framework will lead to an improved understanding of the degree to which traits facilitate the colonization of novel habitats.

| Data collection
We used a comprehensive list of Brazilian lizard species (Costa & Bérnils, 2018) to identify candidate taxa to examine the influence of traits on the colonization of open and forested habitats. For each species, we derived habitat information from our field experience with these taxa and from the website of the International Union for Conservation of Nature and Natural Resources (IUCN; https://www.iucnredlist.org/). Since our objective was to identify traits that enabled species to colonize and persist in the open and forested habitats, we selected species associated with one or both habitats in the Brazilian DOF, Amazonia and Atlantic Forest.
We compiled lizard traits from Meiri (2018). Traits consisted of life history, morphology and physiology characteristics, such as diet, body size and body temperature, respectively (Table 1). To fill in missing traits, we mainly used the Reptile Database website (Uetz et al., 2020), the Museu Virtual do Cerrado website (http://www.mvc.unb.br/pesquisa/especies/conheca-as-especies/jag) and the study by Mesquita et al. (2017). We used traits that were available for more than 50 species and were considered non-redundant (e.g. clutch size was excluded, while the smallest clutch size and largest clutch size were used). The traits not used were as follows: clutch size, smallest and largest mean clutch size, breeding age (months), youngest age at first breeding (months), oldest age at first breeding (months) and mean body temperature of active animals in the wild.

| Ancestral range estimation
To identify colonization patterns and estimate the diversification time of species in forested and open habitats, we combined species habitats and genetic data to estimate the probability of ancestral ranges. We used habitat information to create an occurrence matrix recording the presence or absence of each species in the forested habitats and the open habitats. We trimmed the consensus tree of the squamate phylogeny (Tonini et al., 2016) to contain only the species selected for this analysis using the function keep.tip from the R package 'ape' v5.4 (Paradis et al., 2004). To estimate species ancestral range, we used the R package 'BioGeoBEARS' v1.1.2 (Matzke, 2013). We tested six different models: DEC, a dispersal-extinction-cladogenesis model run using a Maximum Likelihood approach; DIVALIKE, a likelihood interpretation of the parsimony DIVA that generates dispersal-vicariance models; and BAYAREALIKE, a simplified likelihood interpretation of the Bayesian BAYAREA model; each with and without the parameter that controls the probability of founder-event speciation (+J parameter; DEC+J, DIVALIKE+J, BAYAREALIKE+J). We chose the model with the lowest Akaike information criterion (AIC) and AIC corrected for small sample size (AICc) scores as the best model.
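The model-selection step above can be illustrated with a minimal Python sketch. It is not the authors' R workflow: it only shows how AIC and AICc are computed from a log-likelihood and how the lowest score identifies the preferred model. The log-likelihoods and the sample size are hypothetical values chosen for illustration, and using the number of tips as the sample size is an assumption. The parameter counts (two free parameters for DEC, three for DEC+J) follow the usual description of these models.

```python
def aic(log_lik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_lik


def aicc(log_lik: float, k: int, n: int) -> float:
    """AIC plus the small-sample correction 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical (log-likelihood, number of free parameters) pairs;
# these are NOT results from the study.
models = {
    "DEC": (-120.0, 2),
    "DEC+J": (-115.0, 3),
}
n_tips = 100  # hypothetical sample size (assumed here to be the number of tips)

scores = {name: aicc(ll, k, n_tips) for name, (ll, k) in models.items()}
best = min(scores, key=scores.get)  # the lowest AICc is preferred
```

In this toy comparison, the extra +J parameter is retained because the gain in likelihood outweighs the penalty for the additional parameter.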

| Identifying shifts in the diversification rate associated with species distributions
We also investigated if the environmental pressure of contrasting habitats influenced species diversification rates. We estimated state-dependent speciation and extinction rates from the phylogeny using the GeoHiSSE (Hidden Geographic State Speciation and Extinction) function (Caetano et al., 2018) of the 'hisse' v1.9.19 package (Beaulieu & O'Meara, 2016). GeoHiSSE analysis uses habitats as traits, allows shared trait states to account for generalist species and includes hidden traits that allow shifts in the diversification rates to be related to unmeasured traits, not forcing the correlation between diversification rate shifts to the traits under study (Caetano et al., 2018).
The hidden trait is a subdivision of the species that were divided into habitats. In other words, within each habitat, there is another category separating the species in two or more states. For example, species within a geographical area could be separated into categories of migrants and non-migrants. While the species trait associated with the hidden trait or the biological relevance of the hidden trait is unknown, the presence of a hidden trait implies that some factor associated with this subdivision is necessary to explain the different rates in the model and allows these rates to vary within geographical areas.
We used all species that occur only in the open habitats, only in the forested habitats and in both habitats from our dataset, and the same phylogeny (Tonini et al., 2016) used in the 'BioGeoBEARS' analysis (using the consensus tree). We fitted 12 models (Caetano et al., 2018) that ranged from null models (all rates equal across states) to full models (all parameters free), included area-independent or area-dependent variation in the diversification rates (we treated the habitats as areas), and either did or did not separate extinction and extirpation (range reduction) rates for endemic lineages (+extirpation) (Table 2). We then evaluated the models using the AIC and AICc. We also used the marginal reconstruction algorithm to calculate model averages and diversification rates for each species with the MarginReconGeoSSE and GetModelAveRates functions within the 'hisse' package (Beaulieu & O'Meara, 2016).
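The rate summaries produced by such models are linked by simple identities, sketched below in Python. The sketch uses the standard definitions for a single endemic state: turnover is speciation plus extinction, extinction fraction is extinction divided by speciation, and net diversification is speciation minus extinction. The function names and numerical values are hypothetical, and the additional between-area speciation term that applies to widespread lineages in GeoHiSSE is deliberately ignored.

```python
from typing import Tuple


def to_speciation_extinction(turnover: float, ext_fraction: float) -> Tuple[float, float]:
    """Recover (speciation, extinction) from turnover and extinction fraction.

    turnover     = speciation + extinction
    ext_fraction = extinction / speciation
    """
    speciation = turnover / (1.0 + ext_fraction)
    extinction = turnover - speciation
    return speciation, extinction


def net_diversification(speciation: float, extinction: float) -> float:
    """Net diversification = speciation - extinction."""
    return speciation - extinction


# Hypothetical rates (events per lineage per million years), not study results.
lam, mu = to_speciation_extinction(turnover=0.3, ext_fraction=0.5)
net = net_diversification(lam, mu)  # approximately 0.1
```

These identities show why a lineage can combine high turnover with low net diversification: when speciation and extinction are both high but nearly equal, their sum is large while their difference is close to zero.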

| Identifying important traits to predict species occurrence in open and forested habitats
We aimed to determine if the occurrence of species in forested or open habitats could be predicted by species traits and phylogenetic relatedness using the random forest classification method. The advantage of the random forest algorithm is that it accommodates different types of characters (continuous and categorical, including binary and multiple-category characters), and thus is helpful in situations where it is unclear which combination of these characters will provide the most information relevant to the response variable (the occurrence of species in forested or open habitats). The random forest approach builds multiple decision trees to evaluate whether a particular trait can predict the response (Biau, 2012; Liaw & Wiener, 2002). It returns the mean decrease in accuracy (MDA) estimate, which shows the decrease in the accuracy of the prediction function after removing a specific trait from the analysis and thereby identifies the most important traits for the predictive function. To account for inherited traits that might be absent from our random forest model, we incorporated phylogenetic relatedness into the models as a proxy for shared but unscored traits. We trimmed the consensus tree of the squamate phylogeny (Tonini et al., 2016) to contain only the species present in our trait dataset using the function keep.tip from the 'ape' package (Paradis et al., 2004). Then we used the Phylogenetic Eigenvector Regression method (PVR; Diniz-Filho et al., 1998) to transform the phylogenetic tree into a pairwise distance matrix and extract eigenvectors from the distances using a principal coordinate analysis (PCoA) with the functions cophenetic and pcoa, respectively, from the 'ape' package (Paradis et al., 2004). We selected the first five axes, those with the largest eigenvalues, as variables in the random forest analysis.
We then used the random forest classification method through the 'randomForest' v4.6-14 package (Liaw & Wiener, 2002) to create and evaluate the models. We used the MDA estimate to measure variable importance and the out-of-bag (OOB) error rate and the classification error to evaluate the model's overall quality. In addition to the MDA estimate, we also identified the relevant variables using a wrapper algorithm implemented in the function Boruta in the package 'Boruta' v7.0.0 (Kursa & Rudnicki, 2010).
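The idea behind the MDA estimate can be sketched in a few lines of Python. This is a simplified stand-in, not the 'randomForest' implementation: it permutes one trait at a time across the whole data set and records the drop in accuracy of a fixed, hand-written rule. As we understand it, the R package instead permutes each variable in the out-of-bag samples of every tree and averages the result over the forest. The toy data, the 34 °C threshold and the `predict` rule are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(predict: Callable[[Row], str], rows: Sequence[Row], labels: Sequence[str]) -> float:
    """Fraction of rows whose predicted habitat matches the recorded label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(labels)


def permutation_importance(predict: Callable[[Row], str], rows: Sequence[Row],
                           labels: Sequence[str], n_repeats: int = 20,
                           seed: int = 0) -> List[float]:
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between trait j and the labels
            permuted = [row[:j] + [value] + row[j + 1:]
                        for row, value in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: feature 0 = body temperature (deg C), feature 1 = noise.
rows = [[38.0, 1.0], [37.5, 0.0], [39.0, 1.0], [36.8, 0.0],
        [30.0, 1.0], [29.5, 0.0], [31.0, 1.0], [28.0, 0.0]]
labels = ["open"] * 4 + ["forest"] * 4


def predict(row: Row) -> str:
    """Hypothetical fixed rule standing in for a trained classifier."""
    return "open" if row[0] > 34.0 else "forest"


importances = permutation_importance(predict, rows, labels)
```

Because the rule ignores the noise column, shuffling that column never changes a prediction and its importance is exactly zero, whereas shuffling body temperature degrades accuracy.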

| Data summary
Our dataset was composed of 244 species and represented approximately 88% of the species that occur in Brazil (Costa & Bérnils, 2018). These species represented 15 of 38 taxonomic families recognized worldwide for lizards (Uetz et al., 2020) and all the families occurring in Brazil (Costa & Bérnils, 2018).

| Ancestral range and transitions estimation
The estimation of species ancestral ranges was conducted

| Geography-dependent speciation
We found that diversification rates depended significantly on the habitats in which species occur. The best-fitting model of the GeoHiSSE analysis was model 10, which included area-dependent diversification, two hidden rate classes, and extirpation and extinction rates separated for endemic lineages (Table 2). The best model included area-dependent diversification, which emphasizes the importance of the habitats on the diversification rates (  A and B). The species from the hidden trait A (in the three habitats) presented higher turnover rates and slightly higher extinction fraction rates than those in the hidden trait B (Table S1.1).
To compare the diversification rates among the habitats while integrating the hidden traits, we averaged the models according to their AIC weights. The speciation, turnover, extinction and extinction fraction rates for the species in both habitats were higher than those in forested and open habitats. On the other hand, the net diversification rate was higher for the species that occur in forested habitats, followed by the species that occur in open habitats. We found a lower net diversification rate for species occurring in both habitats, indicating similar rates of speciation and extinction (Figure 3).
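Averaging by AIC weights can be illustrated with the following Python sketch. It shows the standard Akaike-weight calculation and a weighted mean of a rate across models. It is not the 'hisse' implementation, and the AIC scores and rate estimates are hypothetical.

```python
import math
from typing import Dict


def akaike_weights(aic_scores: Dict[str, float]) -> Dict[str, float]:
    """Convert AIC scores into weights that sum to one.

    Each weight is proportional to exp(-0.5 * (AIC_i - AIC_min)).
    """
    best = min(aic_scores.values())
    relative = {m: math.exp(-0.5 * (s - best)) for m, s in aic_scores.items()}
    total = sum(relative.values())
    return {m: value / total for m, value in relative.items()}


def model_average(estimates: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted mean of a per-model rate estimate."""
    return sum(weights[m] * estimates[m] for m in weights)


# Hypothetical AIC scores and net diversification estimates, not study results.
aic_scores = {"model_a": 100.0, "model_b": 102.0, "model_c": 110.0}
net_div = {"model_a": 0.10, "model_b": 0.06, "model_c": 0.02}

weights = akaike_weights(aic_scores)
averaged = model_average(net_div, weights)
```

Models with little support contribute almost nothing to the average, so the averaged rate stays close to the estimates of the best-supported models.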

| Random forest classification
We generated five random forest models to predict the occurrence and forested habitats: 10%). All the variables were relevant in this model (Figure 4e). Although the data are different, overall, the models without imputed data performed better than those with the imputed data.
In these models, minimum and maximum body temperature, microhabitat, female snout-vent length (SVL) and diet were the most important variables according to the MDA estimates (Figure S1.2 in Appendix 1). The phylogenetic relatedness variables were also critical classifiers for the models, indicating the importance of other lineage-specific traits not included in our dataset.
Considering the relevant traits used in the best model (model 3, which had the lowest OOB error rate), species from open habitats have higher minimum and maximum mean body temperatures (Figure 5a,b, respectively) than species from forested habitats, tend to be more saxicolous and terrestrial and have a more omnivorous and herbivorous diet. In contrast, species from the forested habitats are more semi-arboreal, semi-aquatic, arboreal and fossorial (Figure 5e), and have a more carnivorous diet (Figure 5d). Female SVL differed slightly between the habitats: species in the forested habitats varied more in female size, whereas species in the open habitats had a narrower size range (Figure 5c).

| DISCUSSION
We examined lizard colonization patterns among habitats, determined if particular habitats influence species diversification, and identified traits that facilitated these colonization events and species survival in their current ranges. Our results suggest that the forested habitats were the ancestral range for the lizard species investigated in this study, and we also found evidence that range transitions occurred more often from the forested habitats to the open habitats than in the opposite direction.
Widely distributed species had higher speciation, turnover, extinction and extinction fraction rates than species restricted to forested or open habitats, but also had the lowest net diversification rate. Our results identified five important species traits (i.e. minimum and maximum mean body temperature, microhabitat, female SVL and diet) that could have facilitated the colonization and persistence of species in their current ranges and highlighted the importance of phylogenetic relatedness in explaining the occurrence of the species.

| Historical biogeography, range transitions and influence of the habitats on the diversification rate
The open habitats in South America are younger than the forested habitats (Azevedo et al., 2020), consistent with our results showing that the forested habitats are the ancestral ranges for most lizards investigated.

| Important traits for colonization and persistence of species in different environments
Identifying traits that explain species distributions or highlighting how they fit different environments has previously been investigated for small species groups (e.g. Colli et al., 2003; Garda et al., 2012 (Figure 4). This result highlights the role of niche conservatism in Neotropical diversification, indicating that species retain their niche preferences even after speciation and tend to occur in similar environments (Wiens & Graham, 2005). Indeed, most species within some genera or even families are restricted to a single region.
While considering transitions between habitats using only extant species and their current distributions, it is essential to consider other factors. Species may have colonized the open or forested habitats from other regions or even other continents, not transitioning between these two habitats. For example, some of the species from the family Gekkonidae colonized South America directly from the African continent (Gamble et al., 2011).

| CONCLUSIONS
As reported for other groups, our results demonstrate a predominant pattern of transitions from forested to open habitats. Mean body temperature, female SVL, microhabitat and diet were important predictors of species occurrence in open or forested environments. We emphasize that our approach should be able to identify phenotypic transitions in different regions across the globe. The machine learning approach presented here corresponds to the first step of the comparative phylogenomic approaches proposed by Smith et al. (2020). Therefore, identifying phenotypic transitions among clades from a macroevolutionary perspective provides a framework to study the genetic basis of phenotypes, which is especially important for non-model organisms, where the traits responsible for adaptation are rarely known. The traits identified in this study can be used in future comparative genomic investigations to understand the genomic basis of adaptation.

ACKNOWLEDGEMENTS
The authors thank the National Science Foundation (NSF) for financial support via a grant to BCC (DEB-1831319) and FTB (DED-1831241). GRC thanks CAPES, CNPq, Fundação de Apoio à Pesquisa do Distrito Federal (FAPDF) and USAID's PEER program under cooperative agreement AID-OAA-A-11-00012 for financial support.
They thank members of the Carstens Lab for comments and suggestions on the manuscript and colleagues on the Dimensions of Biodiversity grant. No permits were necessary for this work.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT
The trait dataset, files and scripts used in the analyses are available via the Dryad archive (https://doi.org/10.5061/dryad.t1g1jwt3c).

BIOSKETCH
Flávia M. Lanna is a PhD student interested in macroecology, evolutionary biology, community phylogenetics and biogeography, with particular attention to Neotropical lizards. The authors share a common interest in phylogeography, biogeography and ecology of disparate groups of organisms such as reptiles, amphibians, mammals and plants, using genetic, morphological and environmental data combined with cutting-edge analytical techniques (e.g. machine learning). In particular, this study is part of a project that aims to understand the adaptation and diversification of organisms across dry environments in Brazil.
Author contributions: B.C.C. and F.M.L. conceived the ideas; G.R.C. and F.M.L. collected the data; all authors designed the methodology; F.M.L. analysed the data and led the writing with the assistance of all the authors.

SUPPORTING INFORMATION
Additional supporting information may be found in the online version of the article at the publisher's website.