Published May 18, 2019 | Version 1.0.0
Software Open

CCMetagen: comprehensive and accurate identification of eukaryotes and prokaryotes in metagenomic data

  • 1. The University of Sydney
  • 2. Technical University of Denmark
  • 3. The Peter Doherty Institute for Infection and Immunity


CCMetagen is software for identifying taxa in metagenomic data. This repository contains CCMetagen version 1.0.0, the version that was benchmarked against other software in the original CCMetagen publication.

High-throughput sequencing of DNA and RNA from environmental and host-associated samples (metagenomics and metatranscriptomics) is a powerful tool for assessing which organisms are present in a sample. Taxonomic identification software usually aligns individual short sequence reads to a reference database, which sometimes contains only taxa with complete genomes. This is a challenging task, given that different species can share identical sequence regions and complete genome sequences are available for only a fraction of organisms. A recently developed approach to mapping sequence reads to reference databases involves weighing all high-scoring read mappings to the database as a whole to produce better-informed alignments. We used this novel concept in read mapping to develop a highly accurate metagenomic classification pipeline named CCMetagen. Our pipeline substantially outperforms other commonly used software in identifying bacteria and fungi, and can efficiently use the entire NCBI nucleotide collection as a reference to detect species with incomplete genome data from all biological kingdoms. CCMetagen is user-friendly, and its results can be easily integrated into microbial community analysis software for streamlined and automated microbiome studies.


Files (645.4 kB)


Additional details

Related works

Is documented by
Preprint: 10.1101/641332 (DOI)