Published December 22, 2020 | Version 1.0
Dataset | Open access

Supporting data for Novel functional sequences uncovered through a bovine multi-assembly graph

  • 1. ETH Zürich
  • 2. ETH Zurich

Description

Description of the datasets

The data are organized in a single folder, archived and compressed in tar.gz format.

Extract the archive with the command tar -xzvf data.tar.gz. This creates a folder named data_tidy, organized as follows:

  • graph.gfa: Graph in GFA format constructed from six cattle assemblies
  • nonref.fa: Non-reference sequences extracted from the graph
  • nonref.fa.masked: Version of nonref.fa with repetitive regions hard-masked
  • nonref_woflanking.fa: nonref.fa without flanking sequences
  • nonref_woflanking.fa.masked: Masked version of nonref_woflanking.fa
  • augustus_predict.gtf: Gene models predicted by Augustus on the non-reference sequences
  • augustus_prot.fa: Protein FASTA of the gene models predicted by Augustus
  • breeds_assembled.gtf: Annotation of the StringTie-assembled across-breed transcriptome
  • breeds_expressed.tsv: Expression data for the transcripts in breeds_assembled.gtf
  • de_assembled.gtf: Annotation of the StringTie-assembled, differentially expressed transcriptome on the non-reference sequences
  • de_expression.tsv: Differential expression results for de_assembled.gtf
  • variant_nonref.tsv: Variants called from the non-reference sequences, with genotypes coded as -1 (no call), 0 (homozygous reference), 1 (heterozygous), and 2 (homozygous alternative)
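The genotype coding used in variant_nonref.tsv can be decoded with a small Python sketch like the one below. The code mapping comes from the file description above; everything about the file layout is an assumption: load_genotype_columns is a hypothetical helper that presumes a tab-separated file with a header row, n_meta leading metadata columns, and one integer genotype column per sample. Check the actual header before relying on it.

```python
import csv
from collections import Counter
from typing import Dict, Iterable

# Genotype codes as documented for variant_nonref.tsv.
GENOTYPE_CODES = {
    -1: "no call",
    0: "hom ref",
    1: "het",
    2: "hom alt",
}


def decode_genotype(code: int) -> str:
    """Map an integer genotype code to its label; raise on unknown codes."""
    try:
        return GENOTYPE_CODES[code]
    except KeyError:
        raise ValueError(f"unexpected genotype code: {code}") from None


def count_genotypes(codes: Iterable[int]) -> Counter:
    """Count decoded genotype labels across a sequence of integer codes."""
    return Counter(decode_genotype(c) for c in codes)


def load_genotype_columns(path: str, n_meta: int) -> Dict[str, Counter]:
    """Return per-sample genotype label counts from a tab-separated file.

    ASSUMPTION (hypothetical layout): the first row is a header, the first
    `n_meta` columns hold variant metadata, and every remaining column holds
    integer genotype codes for one sample.
    """
    counts: Dict[str, Counter] = {}
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        header = next(reader)
        samples = header[n_meta:]
        for name in samples:
            counts[name] = Counter()
        for row in reader:
            for name, value in zip(samples, row[n_meta:]):
                counts[name][decode_genotype(int(value))] += 1
    return counts
```

For example, count_genotypes([0, 1, 1, 2, -1]) counts two heterozygous calls and one each of the other three categories. Unknown codes raise an error rather than being silently skipped, so a layout mismatch surfaces early.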

Files (783.1 MB)

  • 783.1 MB, md5:84c1ddaa260780a613b80879e879578e
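Before extracting the archive with the tar command described above, the download can be checked against the md5 checksum listed for this record. The Python sketch below streams the file through hashlib.md5 and then extracts it with the standard-library tarfile module. The local file name data.tar.gz is taken from the extraction instructions; the destination directory is an arbitrary choice, and the helper names md5sum and verify_and_extract are hypothetical.

```python
import hashlib
import tarfile
from pathlib import Path

# Checksum listed for this record.
EXPECTED_MD5 = "84c1ddaa260780a613b80879e879578e"


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive: Path, dest: Path, expected: str = EXPECTED_MD5) -> Path:
    """Check the archive's MD5, then extract it (comparable to tar -xzf).

    Returns the path of the extracted data_tidy folder.
    """
    actual = md5sum(archive)
    if actual != expected:
        raise ValueError(f"md5 mismatch: expected {expected}, got {actual}")
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        # Use the safer "data" extraction filter where this Python supports it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest / "data_tidy"
```

For example, verify_and_extract(Path("data.tar.gz"), Path(".")) returns Path("data_tidy") after a successful check. Verifying before extracting means a truncated or corrupted download is reported instead of producing a partial folder.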