Published December 22, 2020 | Version 1.0
Dataset
Open
Supporting data for Novel functional sequences uncovered through a bovine multi-assembly graph
Authors/Creators
- 1. ETH Zürich
- 2. ETH Zurich
Description
Description of the datasets
The data are organized as a single folder, compressed as a tar.gz archive.
Extract the archive with the command tar -xzvf data.tar.gz. Extraction produces a folder named data_tidy, which is organized as follows:
- graph.gfa: Graph in GFA format constructed from 6 cattle assemblies
- nonref.fa : Non-reference sequences extracted from the graph
- nonref.fa.masked: Version of nonref.fa with repetitive regions hard-masked
- nonref_woflanking.fa: nonref.fa without flanking sequences
- nonref_woflanking.fa.masked: Masked version of nonref_woflanking.fa
- augustus_predict.gtf: Gene models predicted by Augustus from non-reference sequences
- augustus_prot.fa: Protein FASTA of the gene models predicted by Augustus
- breeds_assembled.gtf: Annotation of the StringTie assembled across-breed transcriptome
- breeds_expressed.tsv: Expression data of breeds_assembled.gtf
- de_assembled.gtf: Annotation of the StringTie assembled differentially-expressed transcriptome on non-ref sequences
- de_expression.tsv: Differential expression results from de_assembled.gtf
- variant_nonref.tsv: Variants called from non-reference sequences (-1, 0, 1, and 2 indicate no call, homozygous reference, heterozygous, and homozygous alternate, respectively)
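The genotype encoding used in variant_nonref.tsv can be turned into readable labels with a small lookup. The following is a minimal sketch, not part of the dataset: the function names and the example codes are hypothetical, and because the column layout of the file is not described here, the code only handles a list of genotype codes that has already been extracted from a row.

```python
from collections import Counter

# Genotype codes as documented for variant_nonref.tsv.
GENOTYPE_LABELS = {
    -1: "no call",
    0: "hom ref",
    1: "het",
    2: "hom alt",
}


def decode_genotype(code):
    """Translate one genotype code (int or numeric string) into its label."""
    return GENOTYPE_LABELS[int(code)]


def summarize_genotypes(codes):
    """Count how often each genotype label occurs in an iterable of codes."""
    return Counter(decode_genotype(code) for code in codes)


# Hypothetical genotype codes for one variant across five samples.
example_codes = ["0", "1", "2", "-1", "1"]
summary = summarize_genotypes(example_codes)
print(dict(summary))
```

A code outside the four documented values raises a KeyError, which is deliberate in this sketch so that unexpected values are not silently miscounted.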
Files
(783.1 MB)
| Name | Size |
|---|---|
| md5:84c1ddaa260780a613b80879e879578e | 783.1 MB |
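
The md5 checksum listed for the archive can be used to confirm that a download is complete before extracting it. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the file name data.tar.gz is taken from the extraction command in the description and is assumed to match the downloaded file.

```python
import hashlib

# Checksum as listed on this record.
EXPECTED_MD5 = "84c1ddaa260780a613b80879e879578e"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so the whole archive is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file's md5 digest equals the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

Calling matches_expected("data.tar.gz") in the download directory returns True only when the local file matches the listed checksum. A False result usually means the download was interrupted and should be repeated.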