There is a newer version of the record available.

Published April 14, 2022 | Version v2
Dataset Open

Multifaceted quality assessment of gene repertoire annotation with OMArk

  • 1. Université de Lausanne
  • 2. Swiss Institute of Bioinformatics
  • 3. Université de Lausanne, Swiss Institute of Bioinformatics

Description

Dataset associated with the OMArk paper.

Contains six archives:

Supplementary_Tables:

The Supplementary Table files referred to in the paper.

OMAmerDB:

The OMAmer database built from the full OMA database (December 2021 release) and used in the paper. An OMAmer database is required to run OMArk.

Simulation:

Proteomes with artificially introduced errors, added contaminants or reduced completeness, used to assess OMArk's performance. The archive contains the generated proteomes (Simulated_Data) and their OMArk quality assessments (omark). It also contains the OMAmer results used to run OMArk (OMAmerResults) and the BUSCO completeness assessments (BUSCO).

*Note that, for storage efficiency, only the non-redundant part of the data (added errors, added contamination, random fraction of proteomes) is stored here. The full modified proteomes can be regenerated from these data and the source proteomes.
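The regeneration step can be sketched in Python. This is a minimal sketch under stated assumptions, not the procedure used by the authors: it assumes the stored modifications are FASTA records whose identifiers either replace a source protein (an introduced error) or are new (a contaminant), and that the retained random fraction is available as a set of protein identifiers. The helpers `read_fasta`, `rebuild_proteome` and `write_fasta` are hypothetical names.

```python
from typing import Dict, Iterable, List, Optional, Set


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into an ordered {identifier: sequence} dict.

    The identifier is the first whitespace-delimited token of each header.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:
                records[current_id] = "".join(chunks)
            current_id = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


def rebuild_proteome(
    source: Dict[str, str],
    added: Dict[str, str],
    kept_ids: Optional[Set[str]] = None,
) -> Dict[str, str]:
    """Hypothetical rebuild: subset the source, then overlay stored modifications."""
    # Keep every source protein, or only the retained random fraction if given.
    rebuilt = {
        pid: seq for pid, seq in source.items()
        if kept_ids is None or pid in kept_ids
    }
    # Assumption: records sharing an ID replace the source sequence (added error);
    # records with new IDs are appended (added contamination).
    rebuilt.update(added)
    return rebuilt


def write_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Serialise records as FASTA text, wrapping sequences at `width` residues."""
    out: List[str] = []
    for pid, seq in records.items():
        out.append(f">{pid}")
        for start in range(0, len(seq), width):
            out.append(seq[start:start + width])
    return "\n".join(out) + "\n"
```

For example, with a toy source proteome of three proteins, keeping `P1` and `P2`, replacing `P2` with a truncated version and adding one contaminant `C1` yields a proteome ordered `P1`, `P2`, `C1`. Overlaying by identifier keeps the sketch independent of how many error types were simulated.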

Reference Proteomes:

The UniProt Reference Proteomes (release 2021_04) and their OMArk quality assessment results. The archive contains the source proteome FASTA files (Source folder), the OMAmer results for these proteomes (omamer folder), the OMArk results (omark folder) and the BUSCO completeness assessments (BUSCO folder). It also contains a subfolder with part of the contamination detection experiment (Contamination folder).

Ensembl_Metazoa_AssemblyChange:

Contains the Ensembl Metazoa proteomes whose version changed between releases 52 and 54, as well as their quality assessment results for both versions. The archive contains the source proteome FASTA files (Source folder), a splice file per proteome that groups together all proteins encoded by the same gene (Splice folder), the OMAmer results for the proteomes (omamer folder) and the OMArk results (omark folder).
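The grouping of proteins into genes can be illustrated with a small parser. It assumes, as we understand OMArk's isoform input, one gene per line with that gene's protein identifiers separated by semicolons; check the files in the Splice folder before relying on it. `read_splice_file` and `protein_to_gene` are hypothetical helpers.

```python
from typing import Dict, Iterable, List


def read_splice_file(lines: Iterable[str]) -> List[List[str]]:
    """Return one list of protein IDs per gene (assumed format: 'id1;id2;...' per line)."""
    genes: List[List[str]] = []
    for line in lines:
        # Split on semicolons and drop empty tokens and surrounding whitespace.
        ids = [token.strip() for token in line.strip().split(";") if token.strip()]
        if ids:  # skip blank lines
            genes.append(ids)
    return genes


def protein_to_gene(genes: List[List[str]]) -> Dict[str, int]:
    """Map each protein ID to the index of the gene (line) it belongs to."""
    mapping: Dict[str, int] = {}
    for gene_index, ids in enumerate(genes):
        for pid in ids:
            mapping[pid] = gene_index
    return mapping
```

With such a mapping, a proteome can be summarised at the gene level (one entry per line) rather than counting every isoform as a separate protein.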

Notebooks:

Jupyter Notebooks used to perform the analyses described in the paper.

Files (25.8 GB)

  • 341.6 MB, md5:f2ba3979b325a5f3e20168a2aedcb7a8
  • 5.3 MB, md5:cb33d161b0a374ff9f643165f9934011
  • 10.3 GB, md5:6f77c79bfc82d6af54f8a3d1217d81ac
  • 9.7 GB, md5:b54a5770711449fa608f6fc440c4f4f9
  • 5.4 GB, md5:ed13636a155c0077799c5423a418b052
  • 188.7 kB, md5:1ea9a0d6e0e979c37f7b62fe079eef29
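Downloads can be checked against the MD5 checksums listed above. The following is a minimal sketch using Python's standard `hashlib`; `md5sum` and `verify` are hypothetical helper names, and since the archive file names are not given here, the path must be that of the file you downloaded.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # Multi-gigabyte archives are streamed rather than loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected: str) -> bool:
    """Compare a file's MD5 against a checksum given as 'md5:<hex>' or bare '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5sum(path) == expected_hex
```

For instance, `verify("path/to/downloaded_archive", "md5:f2ba3979b325a5f3e20168a2aedcb7a8")` returns `True` only if that download matches the 341.6 MB entry; the path shown is a placeholder.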