A global atlas of the dominant bacteria found in soil

A global map of soil bacteria

Soil bacteria play key roles in regulating terrestrial carbon dynamics, nutrient cycles, and plant productivity. However, the natural histories and distributions of these organisms remain largely undocumented. Delgado-Baquerizo et al. provide a survey of the dominant bacterial taxa found around the world. In soil collections from six continents, they found that only 2% of bacterial taxa account for nearly half of the soil bacterial communities across the globe. These dominant taxa could be clustered into ecological groups of co-occurring bacteria that share habitat preferences. The findings will allow for a more predictive understanding of soil bacterial diversity and distribution.

Science, this issue p. 320

Relatively few soil bacterial taxa dominate terrestrial ecosystems worldwide, with predictable distributions and ecology.

The immense diversity of soil bacterial communities has stymied efforts to characterize individual taxa and document their global distributions. We analyzed soils from 237 locations across six continents and found that only 2% of bacterial phylotypes (~500 phylotypes) consistently accounted for almost half of the soil bacterial communities worldwide. Despite the overwhelming diversity of bacterial communities, relatively few bacterial taxa are abundant in soils globally. We clustered these dominant taxa into ecological groups to build the first global atlas of soil bacterial taxa. Our study narrows down the immense number of bacterial taxa to a “most wanted” list that will be fruitful targets for genomic and cultivation-based efforts aimed at improving our understanding of soil microbes and their contributions to ecosystem functioning.

Although soil bacteria have been studied for more than a century, most of the diversity of soil bacteria remains undescribed. This is unsurprising given that soil bacteria rank among the most abundant and diverse groups of organisms on Earth (1-4), challenging our capacity to understand their specific contributions to ecosystem processes, including nutrient and carbon cycling, plant production, and greenhouse gas emissions (1-3). Put simply, characterizing the ecological attributes (environmental preferences and functional traits) of the thousands of bacterial taxa found in soil is unfeasible. Most soil bacteria do not match those found in preexisting 16S ribosomal RNA (rRNA) gene databases (5), we have genomic information for relatively few of them (5-7), and the majority of soil bacteria have not been successfully cultivated in vitro (6,7). For these reasons, we lack a predictive understanding of the ecological attributes of most individual soil bacterial taxa, with their environmental preferences, traits, and metabolic capabilities remaining largely unknown.
Previous work has shown that only a small fraction of soil bacteria is typically shared between any pair of unique soil samples (4,8,9). However, we also know that, as with most "macrobial" communities (10), not all bacterial taxa are equally abundant in soil. There are often subsets of soil bacterial taxa that are far more abundant than others. For example, the genus Bradyrhizobium has been found to be dominant in forest soils from North America (11). Similarly, a lineage within the class Spartobacteria was found to be highly abundant in undisturbed grassland soils (12). Perhaps more important, many individual taxa that are highly abundant in individual soil samples may also be abundant across distinct soil samples, even when those soil samples are from sites located far apart (e.g., Candidatus Udaeobacter copiosus) (13). Therefore, a critical and logical next step to advance our understanding of soil bacterial communities is to identify the dominant bacterial phylotypes that are abundant and ubiquitous across soils, and determine their ecological attributes.
From the large body of literature using marker gene sequencing to characterize soil bacterial communities, we know which major phyla tend to be more abundant in soil (14) and we have a growing understanding of how various factors, including soil properties (e.g., pH) (15), climate (9,16), vegetation type (17), and nutrient availability (18), structure the composition of soil bacterial communities worldwide. What is currently missing is a detailed ecological understanding of common soil bacterial species, which we refer to as phylotypes (as bacterial species definitions can be problematic) (19). Understanding the ecological attributes of dominant phylotypes will increase our ability to successfully cultivate them in vitro and allow us to build a more predictive understanding of how soil bacterial communities vary across space, time, and in response to anthropogenic changes. For example, if we could identify those dominant phylotypes with strong preferences for a given set of environmental conditions (e.g., low or high pH), we could then use this information to predict their distributions and enrich for these dominant phylotypes in vitro. Ultimately, a better understanding of dominant soil bacterial taxa will improve our ability to actively manage soil bacterial communities to promote their functional capabilities.
Here we conducted a global analysis of the bacterial communities found in surface soils from 237 locations across six continents and 18 countries (fig. S1) to (i) identify the most dominant (i.e., most abundant and ubiquitous) soil bacterial phylotypes worldwide; (ii) determine which of these dominant phylotypes tend to co-occur and share similar environmental preferences; (iii) map the abundances of these ecological clusters of dominant soil bacteria across the globe; and (iv) assess the genomic attributes that differentiate phylotypes with distinct environmental preferences. The soils included in this study were selected to span a wide range of vegetation types, edaphic characteristics, and bioclimatic regions (arid, temperate, tropical, continental and polar) (20).
We first identified the most dominant bacterial phylotypes by 16S rRNA gene amplicon sequencing (20). Dominant phylotypes (taxa that share ≥97% sequence similarity across the amplified 16S rRNA gene region) include those that are highly abundant (top 10% most common phylotypes sorted by their percentage of 16S rRNA reads) (21) and ubiquitous (found in more than half of the 237 soil samples evaluated) (20). Not surprisingly, our global data set comprised bacterial communities that were highly variable with respect to their diversity and overall composition (fig. S2). For example, observed phylotype richness ranged from 774 to 2869 phylotypes per sample, and there was a large amount of variability in the relative abundances of major phyla across the studied sites (fig. S2). Also, as expected, only a small fraction of phylotypes was found to be shared across soil samples, and most phylotypes were relatively rare (fig. S3). Based on our criteria, only 2% of the bacterial phylotypes (511 out of 25,224 phylotypes) were dominant (Fig. 1A and table S1). However, this small number of phylotypes accounted for, on average, 41% of 16S rRNA gene sequences across all samples (Fig. 1A), although they collectively accounted for more than half of the bacterial communities in some environments (e.g., forests from arid environments; Fig. 1B). In other words, most soil bacterial phylotypes are rare and relatively few are abundant, but many of these are found across a wide range of soils.
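The two dominance criteria described above (high abundance and ubiquity) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name, the toy table, and the choice to rank phylotypes by mean relative abundance are assumptions made here for illustration only.

```python
from typing import Dict, List


def dominant_phylotypes(table: Dict[str, List[float]],
                        top_fraction: float = 0.10,
                        min_prevalence: float = 0.5) -> List[str]:
    """Return phylotypes that are both abundant and ubiquitous.

    `table` maps a phylotype ID to its relative abundance in each sample
    (every list must have the same length). Ranking by mean relative
    abundance is an assumption of this sketch.
    """
    if not table:
        return []
    n_samples = len(next(iter(table.values())))
    # Rank phylotypes from most to least abundant on average.
    mean_abund = {p: sum(v) / n_samples for p, v in table.items()}
    ranked = sorted(mean_abund, key=mean_abund.get, reverse=True)
    # Abundant: within the top `top_fraction` of the ranking (at least one).
    n_top = max(1, int(len(ranked) * top_fraction))
    abundant = set(ranked[:n_top])
    # Ubiquitous: detected in more than `min_prevalence` of the samples.
    ubiquitous = {p for p, v in table.items()
                  if sum(x > 0 for x in v) / n_samples > min_prevalence}
    return [p for p in ranked if p in abundant and p in ubiquitous]


# Hypothetical toy table: four phylotypes across four soil samples.
toy = {
    "otu_a": [0.30, 0.25, 0.20, 0.35],  # abundant and present everywhere
    "otu_b": [0.40, 0.00, 0.00, 0.50],  # abundant but in only half the samples
    "otu_c": [0.05, 0.10, 0.08, 0.00],  # widespread but not abundant
    "otu_d": [0.00, 0.01, 0.00, 0.00],  # rare
}
print(dominant_phylotypes(toy, top_fraction=0.5))
```

In this toy table only `otu_a` satisfies both criteria, which mirrors the point made in the text: a phylotype must be both highly abundant and widely distributed to count as dominant.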
Notably, 85% of the dominant phylotypes identified from our data set were also found to be dominant in the bacterial communities recovered from 123 global soils that were analyzed using a shotgun metagenomic approach (20) (table S1). This cross-validation indicates that our list of dominant phylotypes is not biased by polymerase chain reaction amplification or by our choice of primers, as most of the identified dominant phylotypes were shared between two independent sets of soils analyzed using two different approaches (amplicon versus shotgun metagenomic sequencing). In addition, we compared the results from our sample set with those soils analyzed via amplicon sequencing as part of the Earth Microbiome Project (EMP) (22). The majority of the dominant phylotypes in the EMP data set (80%), identified using the same criteria explained above, were included within our list of dominant taxa (>97% similarity) (20). Also, the top 511 phylotypes, comparable to our top 511 dominant taxa, accounted for 0.5% of all bacterial phylotypes and 41% of all 16S rRNA gene reads in the EMP data set. Despite important methodological differences between the two data sets (20), this concordance between the results from EMP and our study reinforces our conclusion that a relatively small subset of bacterial phylotypes dominates soils across the globe.
On average, the dominant bacterial phylotypes identified from our data set were highly abundant in soils across multiple continents, ecosystem types, and bioclimatic regions (Fig. 1B). The only exception was soil from tropical forests, where the dominant phylotypes accounted for only ~20% of 16S rRNA gene sequences, which is likely a product of soils from tropical forests being under-represented in our database and/or tropical forest bacterial communities being very distinct from those found in other ecosystem types (fig. S4). Together, our results suggest that soil bacterial communities, like plant communities (10), are typically dominated by a relatively small subset of phylotypes. As such, we focus all downstream analyses on the 511 phylotypes found to be the most abundant and ubiquitous in soils from across the globe.
The identified dominant phylotypes accurately predicted overall patterns in β-diversity for the "subdominant" component of the bacterial communities surveyed (98% of phylotypes; figs. S2 and S5 and Fig. 1C). That is, patterns in the distribution of the dominant bacterial phylotypes across the globe closely mirrored those observed for the remaining 98% of bacterial phylotypes. The most abundant and ubiquitous of these 511 phylotypes included Alphaproteobacteria (Bradyrhizobium sp., Sphingomonas sp., Rhodoplanes sp., Devosia sp., and Kaistobacter sp.), Betaproteobacteria (Methylibium sp. and Ramlibacter sp.), Actinobacteria (Streptomyces sp., Salinibacterium sp., and Mycobacterium sp.), Acidobacteria (Candidatus Solibacter sp. and order iii1-15), and Planctomycetes (order WD2101) (see table S1 for a complete list). Notably, less than 18% of the 511 phylotypes that we identified had a match to an available reference genome at the >97% 16S rRNA sequence similarity level, the level commonly used for delineating different bacterial species (23) (Fig. 2 and table S1). Approximately 42% of the dominant 511 phylotypes had no genome match even at the >90% 16S rRNA sequence similarity level, indicating that we do not have genomic information for taxa even within the same genus or family (Fig. 2A and table S1). Further, only 45% of the identified 511 dominant phylotypes are related to cultivated isolates and <30% of the phylotypes have representative type strains at the >97% sequence similarity level (Fig. 2B and table S1), which emphasizes the limited amount of phenotypic information that we have available for these dominant phylotypes. Not surprisingly, phylotypes closely related to previously cultivated taxa tended to come from a few well-studied taxonomic groups, mostly Proteobacteria and Actinobacteria, with only a few representatives available from other phyla (Figs. 1C and 2B and table S1), highlighting the well-known taxonomic biases of many preexisting culture collections (6).
After identifying the dominant 511 phylotypes, we used random forest modeling (24) to identify habitat preferences for each phylotype (20). Our statistical models included 15 environmental factors: climate (aridity index, minimum and maximum temperature, precipitation seasonality, and mean diurnal temperature range), ultraviolet (UV) radiation, net primary productivity, soil abiotic properties (soil texture; pH; total C, N, and P concentrations; and C:N ratio), and dominant ecosystem type (forests and grasslands) (20). We found that 53% (270) of the dominant 511 phylotypes had predictable habitat preferences [models explaining >30% of the variation; see (20) and table S1], with soil pH, climatic factors (aridity index, maximum temperature, and precipitation seasonality), and plant productivity consistently being the best predictors of their abundances across the globe (fig. S6). These findings are in line with previous research demonstrating that climatic factors and soil pH are often highly correlated with observed differences in overall soil bacterial community composition (4,8,9,15,16), but additionally, we found a strong link between microbial community composition and plant productivity (fig. S7). We were unable to identify a strong ecological preference for the remaining 241 of the 511 phylotypes, which included representatives from a wide range of phyla and subphyla (fig. S8). Our inability to predict the distributions of these 241 phylotypes could be related to the absence of key, but hard to measure, environmental predictors (e.g., soil C availability) or the fact that our models did not take into account specific associations between the bacteria and plants, fungi, or animals (e.g., pathogen-host or predator-prey interactions), which may be driving their distribution patterns. Alternatively, we may not have been able to identify the habitat preferences of these phylotypes because of low variability in their abundances across the samples (figs. S9 and S10). Indeed, the relative abundance of the group including all 241 undetermined phylotypes showed a much lower coefficient of variation than the relative abundance of those phylotypes for which we could identify their habitat preferences, as explained below (fig. S9). This result suggests that the undetermined phylotypes, those with no clearly identifiable habitat preferences, represent a "core" group of dominant phylotypes that are ubiquitous across global soils with proportional abundances that are relatively invariant.
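The comparison of variability described above relies on the coefficient of variation (standard deviation divided by the mean). A minimal Python sketch follows; the abundance values are hypothetical and the use of the population standard deviation is an assumption, since the text does not specify the estimator.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return the population standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / m


# Hypothetical summed relative abundances (%) of two groups in five soils.
core_group = [20.0, 21.0, 19.0, 20.5, 19.5]    # nearly invariant across soils
habitat_group = [5.0, 30.0, 12.0, 45.0, 8.0]   # strongly site-dependent

print(coefficient_of_variation(core_group))
print(coefficient_of_variation(habitat_group))
```

A group whose abundance barely changes from site to site yields a small coefficient of variation, which leaves little variance for any environmental model to explain.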
We then used semipartial correlations (Spearman) and clustering analyses (20) to identify groups of phylotypes with shared habitat preferences, restricting our analyses to those 270 phylotypes with predictable distribution patterns. We found that the phylotypes group into five reasonably well-defined ecological clusters sharing environmental preferences for (i) high pH; (ii) low pH; (iii) drylands; (iv) low plant productivity; and (v) dry-forest environments (Figs. 2B and 3A, fig. S11, and table S1). These five clusters of phylotypes included 200 out of the 270 phylotypes for which we could identify their habitat preferences (table S1).

Figure caption (fragment): For the few phylotypes where taxonomic assignment did not correspond to tree topology, no manual corrections were made. Betaproteo., Betaproteobacteria; Alphaproteo., Alphaproteobacteria; Deltaproteo., Deltaproteobacteria; Plancto., Planctomycetes; Firmic., Firmicutes.
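The idea of grouping phylotypes by their correlation with environmental variables can be sketched in Python as follows. This is a deliberate simplification: it uses plain Spearman rank correlation rather than the semipartial correlations and clustering analyses the study reports, and the function names, the 0.5 threshold, and the toy data are all hypothetical.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranked values."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no rank information
    return cov / (sx * sy)


def habitat_label(abundance, env, threshold=0.5):
    """Label a phylotype by its strongest correlate, or None if all are weak."""
    best_name, best_rho = None, 0.0
    for name, values in env.items():
        rho = spearman(abundance, values)
        if abs(rho) > abs(best_rho):
            best_name, best_rho = name, rho
    if best_name is None or abs(best_rho) < threshold:
        return None
    return ("high " if best_rho > 0 else "low ") + best_name


# Hypothetical measurements at five sites.
env = {"pH": [4.5, 5.0, 6.5, 7.5, 8.2],
       "aridity": [0.9, 0.2, 0.7, 0.4, 0.5]}
acid_loving = [9.0, 7.0, 3.0, 1.0, 0.5]  # declines steadily as pH rises
print(habitat_label(acid_loving, env))
```

Phylotypes that receive the same label would then be candidates for the same ecological cluster; a semipartial approach would additionally control for the other environmental variables before assigning a preference.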

RESEARCH | REPORT
The ecological clusters identified included phylotypes from multiple phyla, suggesting that habitat preferences are not linked to phylogeny at coarse levels of resolution (fig. S8). The remaining 70 phylotypes were classified into three minor clusters, including a small cluster consisting of six phylotypes (high pH-forest preference; table S1 and fig. S11) and two clusters that included phylotypes with preferences including warm-forests, sites with low seasonal variation in precipitation, mesic environments, and soils of low phosphorus content (table S1 and fig. S11). These results suggest that the dominant bacterial phylotypes can be clustered into predictable ecological groups that share similar habitat preferences. To cross-validate the ecological clusters, we used correlation network analyses (20,25) to investigate whether bacterial phylotypes sharing similar habitat and environmental preferences tend to co-occur (Fig. 3B). Indeed, our network analyses indicated that bacterial phylotypes sharing a particular habitat preference (e.g., low pH) tend to co-occur with other phylotypes belonging to the same cluster more than we would expect by chance (P < 0.001 for all clusters; Fig. 3B and fig. S12). We next sought to determine if we could identify genomic attributes that delineate bacteria assigned to the individual ecological clusters. These analyses were restricted to the relatively small subset of bacterial phylotypes for which genomic data were available (>97% 16S rRNA sequence similarity to a reference genome). An insufficient number of representative unique genomes were available from phylotypes in four of the five major clusters identified (fig. S13). However, we had genomic data for 10 unique genomes out of 25 phylotypes assigned to the "drylands" cluster, including representatives of the Proteobacteria and Actinobacteria phyla (fig. S13).
We then identified functional genes that were overrepresented in this "drylands" cluster as compared to the genomes available for the other dominant taxa. A total of 72 genomes were included in this analysis, with 10 of these genomes belonging to the dryland cluster (20). We found that the genomes within this dryland cluster had significantly higher relative abundances of 18 genes (fig. S14) compared to genomes representative of phylotypes assigned to other ecological clusters. Notably, Mnh and Mrp genes, which encode membrane transport proteins responsible for the proton-mediated efflux of monovalent cations (e.g., Na+, K+), were overrepresented in the "drylands" cluster (fig. S14). These genes have frequently been linked to increased bacterial tolerance to alkaline or saline conditions and, more generally, a greater capacity to tolerate external changes in the osmotic environment (26). These adaptations are likely to be important for bacteria living in arid soils, which are often saline, have high pH values, and experience prolonged periods of low moisture availability (27). Given the low number of reference genomes available, these findings are not conclusive and are simply a "proof of concept." Nevertheless, our results highlight that it is possible to identify genomic attributes that differentiate soil bacteria with distinct environmental preferences. They also emphasize the importance of acquiring new genomes to further understand the ecological attributes of dominant soil bacterial taxa. As such, our results pave the way for leveraging genomic data to predict the spatial distributions of soil bacterial taxa, efforts that will be improved as the collections of reference genomes from these microorganisms increase in size.
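One way to ask whether a gene is overrepresented in one set of genomes relative to another is a permutation test on the difference in mean relative abundance. The Python sketch below illustrates that idea only; the text does not state which statistical test was used, so the test itself, the function name, and the simulated abundances are assumptions. Only the group sizes (10 dryland genomes out of 72) come from the text.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """One-sided p-value for mean(group_a) exceeding mean(group_b).

    Group labels are shuffled `n_perm` times; the p-value is the share of
    shuffles whose difference in means is at least as large as the
    observed one (with the usual +1 correction).
    """
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Simulated relative abundances of one gene: 10 dryland genomes vs 62 others.
dryland = [5.0 + 0.1 * i for i in range(10)]
others = [1.0 + 0.01 * i for i in range(62)]
print(permutation_p_value(dryland, others, n_perm=2000))
```

With only 10 genomes in the group of interest, any such test has limited power, which is consistent with the authors' framing of the result as a proof of concept.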
Together, our results suggest that there are predictable clusters of co-occurring dominant bacterial phylotypes in soils from across the globe. This finding indicates that commonly available environmental information could be used to build predictive maps of the distributions of these bacterial clusters at a global scale. We did so for the four major ecological clusters (i.e., low pH, high pH, drylands, and low productivity; Fig. 4) (20) using the prediction-oriented regression model Cubist (28) and information on 12 environmental variables for which we could acquire globally distributed information (20). Our models confirm that pH, aridity levels, and net primary productivity are major drivers of the low-pH, high-pH, dryland, and low-productivity clusters observed, respectively (Appendix S1). Notably, our maps (which accounted for 36 to 64% of the spatial variation in these clusters; Fig. 4) provide estimates of the regions where we would expect the groups of dominant soil bacterial phylotypes to be most abundant. As expected, the dryland and low-productivity clusters were relatively abundant in dryland and low-productivity regions across the globe, and the low- and high-pH clusters were particularly abundant in areas known for their low- or high-pH soils, respectively.

This global inventory of dominant soil bacterial phylotypes represents a small subset of phylotypes that account for almost half of the 16S rRNA sequences recovered from soils. We show that we can predict the environmental preferences for more than half of these dominant phylotypes, making it possible to predict how future environmental change will affect the spatial distribution of these taxa. Following Grime's mass ratio hypothesis (10), we would expect that identifying the physiological attributes of these dominant taxa will be critical for improving our understanding of the microbial controls on some key soil processes, including those that regulate soil C and nutrient cycling (1-3,29). Also, given the strong links between the distribution of bacterial phylotypes and their functional attributes across the globe (8,12), and the observed associations between dominant and subdominant phylotypes (fig. S5), we expect that these dominant bacteria will be critical drivers, or indicators, of key soil processes worldwide. We also found that habitat preferences were not predictable from phylum-level identity alone, given that all of the ecological clusters included phylotypes from multiple phyla. This suggests that phylotypes from diverse taxa share some phenotypic traits (e.g., osmoregulatory capabilities) or life-history strategies (29,30) that allow them to survive under particular environmental conditions. By narrowing down the number of phylotypes to be targeted in future studies from tens of thousands to a few hundred, our study paves the way for a more predictive understanding of soil bacterial communities, which is critical for accurately forecasting the ecological consequences of ongoing global environmental change.

Figure caption (fragment): diagram with nodes (bacterial phylotypes) colored by each of the five major ecological clusters that were identified, highlighting that the phylotypes within each ecological cluster tend to co-occur more than expected by chance (statistical analyses presented in fig. S12).

Figure caption (Fig. 4): (A to D) Predicted global distribution of the relative abundances of the four major ecological clusters of bacterial phylotypes sharing habitat preferences for high pH, low pH, drylands, and low plant productivity. R² (percentage of variation explained by the models) as follows: (i) high-pH cluster, R² = 0.53, P < 0.001; (ii) low-pH cluster, R² = 0.36, P < 0.001; (iii) drylands cluster, R² = 0.64, P < 0.001; and (iv) low-productivity cluster, R² = 0.40, P < 0.001. The scale bar represents the standardized abundance (z-score) of each ecological cluster. An independent cross-validation for these maps is available in (20).
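Two quantities named above, the standardized abundance (z-score) and the share of variation explained (R²), can be computed as in the following minimal Python sketch. The exact formulas used for the maps are not given in the text, so the population standard deviation and the coefficient-of-determination form of R² are assumptions, as are the function names.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to zero mean and unit (population) variance."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("z-scores are undefined for constant data")
    return [(v - m) / s for v in values]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    m = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - m) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when observations are constant")
    return 1 - ss_res / ss_tot


# Hypothetical observed and model-predicted cluster abundances at five sites.
observed = [2.0, 4.0, 6.0, 8.0, 10.0]
predicted = [2.5, 3.5, 6.0, 8.5, 9.5]
print(z_scores(observed))
print(r_squared(observed, predicted))
```

Standardizing each cluster's abundance puts the four maps on a common scale, so a value of +1 means one standard deviation above that cluster's own global mean rather than a fixed percentage of reads.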