16SGOSeq: A curated bacterial and archaeal 16S rRNA Gene Oral Sequences dataset
Creators
Description
Within a given species, genomes and 16S rRNA gene sequences, along with their intragenomic copy numbers, can vary greatly across environments. Gene copy numbers are crucial for technologies that estimate microbial abundances from gene counts, such as polymerase chain reaction and high-throughput sequencing. With these methods, taxa with fewer gene copies may be underestimated, while those with more copies may be overestimated. Accurate gene copy number databases specific to the niche under study are therefore essential.
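The bias described above can be illustrated with a small copy-number correction. This is a minimal sketch, not part of the dataset's own tooling: the function name, taxon labels, read counts and copy numbers are all hypothetical values chosen for illustration.

```python
def correct_abundances(read_counts, copy_numbers):
    """Divide each taxon's read count by its 16S copy number,
    then rescale the results so they sum to 1."""
    corrected = {
        taxon: read_counts[taxon] / copy_numbers[taxon]
        for taxon in read_counts
    }
    total = sum(corrected.values())
    return {taxon: value / total for taxon, value in corrected.items()}


# Hypothetical example: taxon_A carries 7 gene copies, taxon_B only 1.
read_counts = {"taxon_A": 700, "taxon_B": 300}
copy_numbers = {"taxon_A": 7, "taxon_B": 1}

# Uncorrected proportions, taken directly from the gene counts.
total_reads = sum(read_counts.values())
raw = {taxon: count / total_reads for taxon, count in read_counts.items()}

# Proportions after accounting for copy number.
corrected = correct_abundances(read_counts, copy_numbers)
```

With these made-up numbers, taxon_A appears to dominate in the raw counts but becomes the minority once each count is divided by its copy number, which is the over- and underestimation the paragraph refers to.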
The 16S rRNA Gene Oral Sequences dataset (16SGOSeq) contains the number of 16S rRNA genes and their variants in the complete genomes of the bacterial and archaeal species present in the human oral cavity. It includes 3,192 complete genomes of oral bacteria and 191 complete genomes of oral archaea, from which the 16S rRNA gene sequences were extracted and the sequence variants identified. For ease of use, an accompanying Python script filters sequences by taxonomy and computes summary statistics, such as the mean number of genes per taxonomic group.
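The kind of filtering and averaging mentioned above might look like the following. This is a rough sketch using only the Python standard library; it does not reproduce the provided script. The column names (`genome_id`, `genus`, `species`, `n_16s_genes`), the function names and the inline rows are assumptions made for illustration, and the actual files may be organised differently.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical table: one row per genome, with placeholder taxon names.
EXAMPLE_CSV = """genome_id,genus,species,n_16s_genes
G1,GenusA,GenusA sp1,5
G2,GenusA,GenusA sp1,7
G3,GenusA,GenusA sp2,4
G4,GenusB,GenusB sp1,2
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting gene counts to int."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["n_16s_genes"] = int(row["n_16s_genes"])
    return rows


def filter_by_taxon(rows, rank, name):
    """Keep only rows whose value at the given rank equals `name`."""
    return [row for row in rows if row[rank] == name]


def mean_genes_per_group(rows, rank):
    """Return the mean 16S gene count for each group at the given rank."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[rank]].append(row["n_16s_genes"])
    return {group: mean(counts) for group, counts in groups.items()}


rows = load_rows(EXAMPLE_CSV)
mean_by_genus = mean_genes_per_group(rows, "genus")
genus_a_rows = filter_by_taxon(rows, "genus", "GenusA")
mean_by_species = mean_genes_per_group(genus_a_rows, "species")
```

To try this on a real file from the record, the inline text would be replaced by the file's contents and the column names adjusted to match its header.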
The oral-specific dataset of prokaryotic organisms presented here and the pipeline followed for its construction can be applied by clinical microbiologists, bioinformaticians, or microbial ecologists in future microbiome research.
Files
archaea_divergence.csv
Total size: 63.5 MB

| MD5 checksum | Size |
|---|---|
| 3cfcfa82919e2b1b6bd53718df48ed81 | 55.1 kB |
| 8fe803f8cc574c03e3fb7488cc74f343 | 41.6 kB |
| ca14d223b362d23e2a4a104922b43fef | 553.6 kB |
| 3ccb2d205779a78654d5675e23e39e26 | 423.9 kB |
| 27a3298a8abf9705cc60a9c6d01b069a | 405.9 kB |
| f929c8fa952a60bc34078c3c52ae54b8 | 86.8 kB |
| 64c290a9a861712cc04152ad2b587531 | 5.2 MB |
| f2cdc966dd506dfa50c1d42b860447bb | 2.0 MB |
| bc30ae05ebcc680598e62e51c9834661 | 24.8 MB |
| 0c4f046ecac4ea9df79d804476d11d7d | 14.2 MB |
| 5b205a3c8c82da20959931b748af6df4 | 13.4 MB |
| 3fefc8ddca52b32abef975317f460073 | 2.0 MB |
| 01791e5cf1d6bcb910780ff181973565 | 29.9 kB |
| 7ea76f49190d27b2ad38423e4d9b7e47 | 30.5 kB |
| a6f1d0cb21a40ef5e82cd1b5e7d06879 | 514 Bytes |
| b2014e4bcc1eb5258c2777ed47170ab8 | 1.8 kB |
| 7c8f52678b974b621fdb8c4afaa9dfbc | 160 Bytes |
| 02687221b4951b243522a87ddbc69fc5 | 323 Bytes |
| 0934c80cd2508adadd395c44681cf885 | 160.4 kB |
| 583ac43704869a22da0f2205a98ee0a3 | 17.6 kB |
| 88787d1035512632582d0d6bc0ddb74c | 1.7 kB |
| b9d977a4e948b359bb9d7dbb871c59fd | 61.5 kB |
| 475f1cb4d830d9ea3c39696b30c7eea2 | 7.8 kB |
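
The MD5 checksums listed above can be used to confirm that a downloaded file arrived intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `verify` are illustrative and are not part of the dataset's repository.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that large files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's MD5 matches the expected checksum.
    An optional 'md5:' prefix, as shown in the listing, is ignored."""
    expected = expected_md5.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Usage (hypothetical local path; take the checksum from the listing):
# verify("downloads/some_file.csv", "md5:<checksum from the listing>")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the file again is the simplest remedy.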
Additional details
Funding
- Instituto de Salud Carlos III: PI24/00222
Software
- Repository URL: https://gitlab.citius.gal/lara.vazquez/16sgoseq
- Programming language: Python