Supplementary files for bioRxiv preprint "Public human microbiome data dominated by highly developed countries," by Abdill et al. 2021.

* supp_table01.csv - Supplementary table 1. Samples per tag.
    * tag: The name of a single tag that appears on at least one BioSample entry.
    * samples: The number of human microbiome samples with a value for the tag.
    * coverage: The fraction of human microbiome samples with a value for the tag.
* supp_table02.csv - Supplementary table 2. Country-level data.
    * alpha2: The country's two-letter country code, as defined in ISO 3166-1.
    * country: The name of the country.
    * region: The country's region, of those defined in the United Nations Sustainable Development Goals.
    * samples: The total human microbiome samples attributed to the country.
    * LDC: Whether the country is classified as a United Nations "least developed country."
    * population: The estimated 2020 population of the country, in thousands.
    * perc_sample: The proportion of all world samples from this country.
    * perc_population: The proportion of world population in this country.
    * unscaled_diff: A ratio calculated using perc_sample and perc_population, as described in the methods section.
    * scaled_diff: The values in unscaled_diff, with positive values scaled to stretch from 0 to 100, and negative numbers to stretch from 0 to -100. (See Methods.)
* supp_table03.csv - Supplementary table 3. Samples by NCBI taxon.
    * code: The NCBI identifier for a single taxon.
    * taxname: The name of the taxon.
    * count: The total number of human microbiome samples classified within each taxon.
* country_counts.csv - Sample counts by country.
    * code: The ISO 3166-1 alpha-2 code of a country.
    * samples: The total human microbiome samples associated with that country.
* region_years.csv - Samples per region per year.
    * region: The name of a geographic region.
    * year: A single year in which samples were released.
    * samples: The count of human microbiome samples released in that year that were associated with a country or territory within the region.
    * running total: The cumulative total samples associated with the region, ending with the specified year.
* samples.csv - A list of all samples evaluated in the study.
    * srs: The unique ID assigned to this BioSample.
    * project: The ID of the BioProject in which this BioSample is filed.
    * host: The inferred host from which the sample was taken. (See Methods.)
    * srr: The ID of one of the sequencing runs associated with this BioSample.
    * library_strategy: Mirrors an attribute retrieved from NCBI regarding the sequencing run.
    * library_source: Mirrors an attribute retrieved from NCBI regarding the sequencing run.
    * taxon: The ID of the NCBI taxon in which the BioSample is classified.
    * pubdate: The date on which this sample was released.
    * geo_loc_name: The value of the "geo_loc_name" tag associated with this BioSample.
* acceptable_hosts.csv - A list of all "host" values observed in BioSample entries that were manually flagged as indicating the sample was from a human.
* figures.md - R code used to generate the figures in the manuscript, plus the SQL queries used to generate the data files used in the figures.
* biosample_data.zip - An archive containing a directory of XML files as they were exported from the BioSample website. Each file contains the search results for a single NCBI taxon; the file name indicates the taxon ID.
* code.zip - An archive containing the Python 3 scripts used to query the NCBI APIs for information related to the BioSample entries defined in the files in the biosample_data directory.
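
The column definitions above for the "running total" field of region_years.csv and the perc_sample field of supp_table02.csv can be illustrated with a short Python 3 sketch. This is not taken from code.zip or figures.md: the function names `running_totals` and `proportions`, the inline rows, and the country codes "AA" and "BB" are made up for illustration, and the sketch assumes proportions are expressed as fractions of 1 (the published files may store percentages instead).

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the layout described for region_years.csv; not real study data.
REGION_YEARS = """region,year,samples
Region A,2019,5
Region A,2018,10
Region B,2018,3
Region A,2020,20
"""


def running_totals(rows):
    """Return {(region, year): cumulative samples through that year}."""
    by_region = defaultdict(list)
    for row in rows:
        by_region[row["region"]].append((int(row["year"]), int(row["samples"])))
    totals = {}
    for region, entries in by_region.items():
        cumulative = 0
        # Accumulate in chronological order, whatever the row order in the file.
        for year, samples in sorted(entries):
            cumulative += samples
            totals[(region, year)] = cumulative
    return totals


def proportions(counts):
    """Return each key's share of the overall total, as a fraction of 1."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


rows = list(csv.DictReader(io.StringIO(REGION_YEARS)))
totals = running_totals(rows)
shares = proportions({"AA": 30, "BB": 10})  # hypothetical per-country sample counts

print(totals[("Region A", 2020)])  # 35
print(shares["AA"])  # 0.75
```

To try this against the real files, replace the inline string with an opened region_years.csv and build the counts dictionary from the code and samples columns of country_counts.csv. The exact column headers in the files should be checked first, since the names used here follow the descriptions in this list rather than the files themselves.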