Published April 14, 2025 | Version v2
Dataset Open

16SGOSeq: A curated bacterial and archaeal 16S rRNA Gene Oral Sequences dataset

  • 1. Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS)
  • 2. Universidade de Santiago de Compostela
  • 3. Universidad de Santiago de Compostela

Description

Within a given species, genomes and 16S rRNA gene sequences, along with their intragenomic copy numbers, can vary greatly across environments. Gene copy numbers are crucial for technologies that estimate microbial abundances from gene counts, such as polymerase chain reaction (PCR) and high-throughput sequencing. With these methods, taxa with fewer gene copies may be underestimated, while those with more copies may be overestimated. It is therefore essential to have accurate gene copy number databases specific to the niche under study.
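The effect of copy number on abundance estimates can be illustrated with a minimal sketch. It assumes a simple correction in which each taxon's read count is divided by its 16S rRNA gene copy number and the results are renormalised; this is one common approach, not necessarily the one used by the dataset's authors. The taxon names, read counts, and copy numbers below are made up for illustration.

```python
def correct_abundances(read_counts, copy_numbers):
    """Divide each taxon's read count by its 16S copy number,
    then renormalise so the corrected values sum to 1."""
    corrected = {}
    for taxon, reads in read_counts.items():
        copies = copy_numbers[taxon]
        if copies <= 0:
            raise ValueError(f"copy number for {taxon} must be positive")
        corrected[taxon] = reads / copies
    total = sum(corrected.values())
    if total == 0:
        return {taxon: 0.0 for taxon in corrected}
    return {taxon: value / total for taxon, value in corrected.items()}


# Hypothetical inputs: taxon_A carries 7 gene copies, taxon_B only 1.
read_counts = {"taxon_A": 700, "taxon_B": 300}
copy_numbers = {"taxon_A": 7, "taxon_B": 1}

result = correct_abundances(read_counts, copy_numbers)
print(result)
```

In this toy case the raw reads suggest a 70/30 split in favour of taxon_A, whereas the corrected estimate is 25/75, which shows how a high-copy taxon can be overestimated when copy number is ignored.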

The 16S rRNA Gene Oral Sequences dataset (16SGOSeq) contains the number of 16S rRNA genes and their variants in the complete genomes of the bacterial and archaeal species present in the human oral cavity. It includes 3,192 complete genomes of oral bacteria and 191 complete genomes of oral archaea, from which the 16S rRNA gene sequences were extracted and their sequence variants identified. For ease of use, a Python script is provided for filtering sequences by taxonomy and calculating averages, such as the mean number of genes per taxonomic group.
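The kind of filtering and averaging described above might look like the following sketch. It is not the script shipped with the dataset: the column names (`genome`, `phylum`, `genus`, `n_16s_genes`), the placeholder taxon names, and the sample values are all assumptions made for illustration, so the real CSV headers should be checked before adapting it.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up sample standing in for one of the dataset's CSV files.
# Column names and values are hypothetical.
SAMPLE_CSV = """genome,phylum,genus,n_16s_genes
G1,phylum_X,genus_A,4
G2,phylum_X,genus_A,6
G3,phylum_X,genus_B,4
G4,phylum_Y,genus_C,3
"""


def load_rows(handle):
    """Read all rows of a CSV file-like object into a list of dicts."""
    return list(csv.DictReader(handle))


def filter_by_taxon(rows, rank, name):
    """Keep only the rows whose value at the given rank equals name."""
    return [row for row in rows if row[rank] == name]


def mean_genes_per_group(rows, rank, count_column="n_16s_genes"):
    """Average the gene count over all genomes sharing a value at rank."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[rank]].append(int(row[count_column]))
    return {group: mean(values) for group, values in groups.items()}


rows = load_rows(io.StringIO(SAMPLE_CSV))
phylum_x = filter_by_taxon(rows, "phylum", "phylum_X")
genus_means = mean_genes_per_group(phylum_x, "genus")
print(genus_means)
```

To run it on a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle, for example `open(path, newline="")`.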

The oral-specific dataset of prokaryotic organisms presented here, and the pipeline followed for its construction, can be used by clinical microbiologists, bioinformaticians, and microbial ecologists in future microbiome research.

Files

archaea_divergence.csv

Files (63.5 MB)

MD5 checksum  Size
md5:3cfcfa82919e2b1b6bd53718df48ed81  55.1 kB
md5:8fe803f8cc574c03e3fb7488cc74f343  41.6 kB
md5:ca14d223b362d23e2a4a104922b43fef  553.6 kB
md5:3ccb2d205779a78654d5675e23e39e26  423.9 kB
md5:27a3298a8abf9705cc60a9c6d01b069a  405.9 kB
md5:f929c8fa952a60bc34078c3c52ae54b8  86.8 kB
md5:64c290a9a861712cc04152ad2b587531  5.2 MB
md5:f2cdc966dd506dfa50c1d42b860447bb  2.0 MB
md5:bc30ae05ebcc680598e62e51c9834661  24.8 MB
md5:0c4f046ecac4ea9df79d804476d11d7d  14.2 MB
md5:5b205a3c8c82da20959931b748af6df4  13.4 MB
md5:3fefc8ddca52b32abef975317f460073  2.0 MB
md5:01791e5cf1d6bcb910780ff181973565  29.9 kB
md5:7ea76f49190d27b2ad38423e4d9b7e47  30.5 kB
md5:a6f1d0cb21a40ef5e82cd1b5e7d06879  514 Bytes
md5:b2014e4bcc1eb5258c2777ed47170ab8  1.8 kB
md5:7c8f52678b974b621fdb8c4afaa9dfbc  160 Bytes
md5:02687221b4951b243522a87ddbc69fc5  323 Bytes
md5:0934c80cd2508adadd395c44681cf885  160.4 kB
md5:583ac43704869a22da0f2205a98ee0a3  17.6 kB
md5:88787d1035512632582d0d6bc0ddb74c  1.7 kB
md5:b9d977a4e948b359bb9d7dbb871c59fd  61.5 kB
md5:475f1cb4d830d9ea3c39696b30c7eea2  7.8 kB
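
The MD5 checksums listed above can be used to confirm that a downloaded file is intact. The sketch below uses only Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_listing` are illustrative and are not part of the dataset's own script. Because the listing does not pair checksums with file names, the caller has to supply the checksum that belongs to the file being checked.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed_checksum):
    """Compare a file's MD5 with a listed value such as 'md5:3cfc...'."""
    # Accept the value with or without the 'md5:' prefix.
    expected = listed_checksum.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listing(local_path, "md5:...")` returns `True` only when the local copy is byte-for-byte identical to the published file.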

Additional details

Funding

Instituto de Salud Carlos III
PI24/00222

Software

Repository URL
https://gitlab.citius.gal/lara.vazquez/16sgoseq
Programming language
Python