Assessing the many aspects of protein-coding gene annotation quality with OMArk

Yannis Nevers; Victor Rossier; Christophe Dessimoz

doi:10.5281/zenodo.6462027

Published April 14, 2022 | Version v1

Dataset Open

1. Université de Lausanne

Dataset associated with the OMArk paper.

It contains four archives:

SuppTables:

The Supplementary Table files referred to in the paper.

OMAmerDB:

The OMAmer database constructed using the whole dataset of the OMA database (December 2021 Release) and used in the paper. An OMAmer database is necessary to run OMArk.

Simulation:

Proteomes with artificially introduced errors, contaminants or depleted completeness, used to assess OMArk's performance. The archive contains the generated proteomes (Simulated_Data*) and their OMArk quality assessments (OMArk_Results). It also contains the OMAmer results (OMAmer_Placements) that were used to run OMArk, and BUSCO completeness assessments (BUSCO_Results).

*Note that, for storage efficiency, only the non-redundant part of the data (added errors, added contamination, random fraction of proteomes) is stored there. The full modified proteomes can be regenerated from these data and the source proteomes.

Reference Proteomes:

The UniProt Reference Proteomes (Proteomes) (2021_04) and their proteome quality assessment results according to OMArk (OMArk_Results). The archive also contains the OMAmer results (OMAmerResults) that were used to run OMArk, and BUSCO completeness assessments (BUSCO_Results).

Files (34.2 GB)

Name	MD5	Size
OMAmerDB.tar.gz	f0415f2729554401b2a3f6ccc39c9d7d	10.3 GB
Reference_Proteomes.tar.gz	ec8033ab23994b56eba519caa70951e1	19.6 GB
Simulations.tar.gz	05f140d2683140b38936b9219a73d5fa	4.3 GB
SuppTable.tar.gz	a78c3840ca45f897d56e5170074eac78	86.5 kB
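Because the archives are large, it can be worth checking each download against the MD5 digests listed in the file table before extracting it. The following is a minimal sketch of such a check, assuming the archives were saved under their original names in a single directory. The function names (md5_of_file, verify_downloads) and the directory argument are illustrative and are not part of the dataset or of OMArk.

```python
import hashlib
from pathlib import Path

# MD5 digests as listed in the file table of this record.
EXPECTED_MD5 = {
    "OMAmerDB.tar.gz": "f0415f2729554401b2a3f6ccc39c9d7d",
    "Reference_Proteomes.tar.gz": "ec8033ab23994b56eba519caa70951e1",
    "Simulations.tar.gz": "05f140d2683140b38936b9219a73d5fa",
    "SuppTable.tar.gz": "a78c3840ca45f897d56e5170074eac78",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each archive name to True (match), False (mismatch) or None (file absent).

    `directory` is a hypothetical download location chosen by the user.
    """
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        results[name] = (md5_of_file(path) == expected) if path.is_file() else None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "not found"}
    for name, status in verify_downloads().items():
        print(f"{name}: {labels[status]}")
```

Reading in fixed-size chunks keeps memory use constant, which matters for the 10.3 GB and 19.6 GB archives. A mismatch usually indicates an incomplete or corrupted download, in which case the file should be fetched again.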

	All versions	This version
Views	1,669	298
Downloads	1,029	138
Data volume	5.8 TB	1.3 TB
