Published July 21, 2025 | Version v1
Workflow Open

Genome-resolved metagenomics reveals microbiome diversity across 48 tick species

Authors/Creators

Description

Ticks are arthropod vectors capable of transmitting a wide spectrum of pathogens affecting humans and animals. However, we have relatively limited information of their genomic characteristics and the diversity of associated microbiomes. Here, we employed long- and short-read sequencing on 1,479 samples from 48 tick species across eight genera from China to determine their genome and associated pathogens and microbiome. Through de novo assembly, we reconstructed 7,783 bacterial genomes, representing 1,373 bacterial species of which, 712 genomes represented 32 potentially pathogenic species. Computational analysis found nutritional endosymbionts to be prevalent and highly specific to tick genera. The microbiome genome-wide association study (mGWAS) revealed host genetic variants linked to pathogen diversity, abundance, and key biological pathways essential to tick biology, including blood-feeding and pathogen invasion. These findings provide a resource for studying the host-microbe interactions within ticks, paving the way for strategies to control tick populations and tick-borne diseases.

Protocols readme:

1.Genome assembly and genome annotation for Nanopore sequencing data.bash
We processed Nanopore sequencing data using Porechop, NanoFilt, and Filtlong for adapter trimming and quality filtering, followed by assembly with Flye and polishing with Racon. Illumina reads were used for additional polishing via Pilon and NextPolish, and assemblies were merged using QuickMerge. Bacterial contigs were removed using Kraken2, and assembly quality was assessed with QUAST and BUSCO. Gene annotation combined transcriptome-based (Trinity, PASA), ab initio (Augustus), and homology-based (TBLASTN, GenomeThreader) predictions, integrated into a final gene set using EvidenceModeler.

2.Phylogenetic analysis and divergence time estimation for tick genome.bash
Single-copy orthologous proteins from nine arthropod species were identified using OrthoFinder, yielding 3,150 core proteins. Protein sequences from Nanopore-based genome assemblies were mapped to these cores, aligned with MUSCLE, and conserved regions were selected using Gblocks. A phylogenetic tree was constructed with IQ-TREE, and divergence times were estimated using r8s.

3.Assembly and phylogenetic tree construction for tick mitochondrial genome.bash
Raw Illumina reads were filtered with AfterQC, and mitochondrial genomes were assembled using MitoZ. For each tick species, up to 50 samples were used, and the longest assembled mitochondrial genome per species was selected. Twenty-three mitochondrial genes were aligned using MUSCLE, conserved regions were extracted with Gblocks, and phylogenetic trees were built with IQ-TREE. Mitochondrial dissimilarity was derived from the tree, and geographic distances between samples were calculated using a spherical distance formula based on latitude and longitude.

4.Genome assembly and binning for metagenomic sequencing data.bash
Quality-controlled Illumina reads were aligned to tick genomes using Bowtie2, and host-mapped pairs were removed with Samtools. Samples with over 10 Gbp of remaining data were assembled using MEGAHIT, while those with less were assembled using SPAdes with the --meta option. Binning was performed with MetaBAT2, and genome quality was assessed using CheckM, retaining MAGs with ≥50% completeness and <10% contamination. Gene prediction was conducted with Prokka.

5.MAGs clustering and phylogenetic tree construction.bash
Average Nucleotide Identity (ANI) was calculated using fastANI, and MAGs were clustered using Louvain in Python. Genomes were evaluated based on completeness, contamination, contig number, circularity, presence of 16S gene, and tRNA count, with the highest-quality genome selected as the representative. Phylogenetic trees were constructed using PhyloPhlAn.

6.Taxonomically profiling and Ecotype analysis.bash
After host sequences were removed, microbial classification was performed with Kraken2, and samples were grouped into five ecotypes based on genus-level microbial profiles using hierarchical clustering and the Elbow Method. Representative samples were identified via isometric mapping. Diversity was assessed using the Shannon index, and dimensionality reduction was done with phateR. Bray-Curtis distances were calculated to evaluate the effects of tick species, hosts, habitats, and environmental factors on microbiota, with statistical testing conducted using PERMANOVA and the A3 package to identify significant influences (p < 0.05).

7.Potentially pathogenic bacteria identification and species classification.bash:
PhyloPhlAn was used to construct a microbial phylogenetic tree, and Kraken2 provided taxonomy annotation. Genomes identified as Rickettsia, Anaplasma, Ehrlichia, Coxiella, Francisella, Borrelia, Candidatus Midichloria, and Arsenophonus were classified as tick-associated bacteria. Species classification was based on CGASI, with ≥96.8% indicating the same species. Orthologous genes were identified using OrthoFinder, revealing more unique clusters in Anaplasma and Ehrlichia than in Rickettsia. Functional annotation was done with emapper (eggNOG), and virulence and vitamin biosynthesis proteins were detected using DIAMOND against VFDB and NCBI.

8.SNP identification.bash:
Due to their large sample sizes and research relevance, samples from Hae. longicornis, R. microplus, and D. silvarum were used for mGWAS. Reference genomes and GFF files were obtained from NCBI, and exonic sequences were extracted with BEDTools and de-redunded using SeqKit. Quality-controlled Illumina reads were aligned to the exons using Bowtie2, and variants were called using GATK4, with duplicate marking, summary metrics, and SNP filtering performed. VCF files were further filtered with VCFtools and converted to HapMap format using TASSEL.

 

Files