Published March 4, 2023 | Version 1.0
Dataset | Open access

Additional data and code for "You can move, but you can't hide: identification of mobile genetic elements with geNomad"

  • 1. DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA

Description

  • benchmark_data: Data used to train and evaluate the classification models.
  • giant_virus_data: Sequences and metadata of giant viruses identified in public metagenomes.
  • neural_network_training: Code used to train geNomad's neural network-based classification model.
  • provirus_data: Data used to train and evaluate the conditional random field model employed by geNomad to identify provirus regions.
  • reference_sequences: Sequences of chromosomes, plasmids, and viruses that were used to build geNomad's marker dataset and to generate the training data for the classification models.
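To check which of the components listed above a downloaded copy contains, the archive can be inspected without extracting it. The following is a minimal sketch that uses only Python's standard `zipfile` module. The helper name `top_level_entries` is hypothetical, and it is an assumption that the five names above appear as top-level folders; they may instead sit under a wrapper directory, in which case the function simply reports that wrapper.

```python
import zipfile
from collections import Counter


def top_level_entries(zip_path):
    """Count archive members under each top-level name in a zip file.

    Hypothetical helper for illustration; it makes no assumption about
    the archive layout and only reports what it finds.
    """
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            # Member names use "/" as the separator inside zip files.
            top = name.split("/", 1)[0]
            if top:
                counts[top] += 1
    return counts
```

Calling `top_level_entries("genomad_supplementary_data_code.zip")` reads only the archive's index, so it should be quick even for a file of this size.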

Files

genomad_supplementary_data_code.zip (8.6 GB)
md5:25ea62a5b626bdfd6790a36cb3b310b7
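Because the archive is large, it is worth confirming that a download completed intact by comparing its MD5 digest with the one listed for the file. The following is a minimal sketch using Python's standard `hashlib` module. The function names `md5_of_file` and `verify_archive` are hypothetical, and the script assumes the file was saved under its original name in the current directory.

```python
import hashlib
from pathlib import Path

# Checksum as listed for genomad_supplementary_data_code.zip.
EXPECTED_MD5 = "25ea62a5b626bdfd6790a36cb3b310b7"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the download was saved.
    archive = Path("genomad_supplementary_data_code.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in 1 MiB chunks keeps memory use flat regardless of file size. A mismatch usually indicates a truncated or corrupted download, in which case the file should be fetched again.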