annonex2embl: automatic preparation of annotated DNA sequences for bulk submissions to ENA
Description
Motivation: The submission of annotated sequence data to public sequence databases constitutes a central pillar in biological research. The surge of novel DNA sequences awaiting database submission due to the application of next-generation sequencing has increased the need for software tools that facilitate bulk submissions. This need has not yet been met by a corresponding development of tools that automate the preparatory work preceding such submissions.
Results: I introduce annonex2embl, a Python package that automates the preparation of complete sequence flatfiles for large-scale sequence submissions to the European Nucleotide Archive. The tool converts DNA sequence alignments that are co-supplied with sequence annotations and metadata into submission-ready flatfiles. Among other features, the software automatically accounts for length differences among the input sequences while maintaining correct annotations, automatically adds metadata to each record, and is designed for easy integration into bioinformatic workflows. As proof of its utility, annonex2embl is employed in preparing a dataset of more than 1,500 fungal DNA sequences for database submission.
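The idea of accounting for length differences while keeping annotations correct can be illustrated with a minimal sketch. It is not taken from annonex2embl: the function name `degap_with_feature`, the 0-based end-exclusive coordinates, and the use of `-` as the only gap character are assumptions made for illustration. The sketch removes gap columns from one aligned sequence and shifts a feature given in alignment coordinates so that it still covers the same bases.

```python
def degap_with_feature(aligned_seq, start, end, gap_chars="-"):
    """Remove gaps from one aligned sequence and remap one feature.

    Hypothetical helper, not part of annonex2embl.
    start/end are 0-based, end-exclusive alignment columns.
    Returns (ungapped_seq, new_start, new_end); the coordinates are
    None if the feature covers only gap columns in this sequence.
    """
    ungapped = []
    new_start = None
    new_end = None
    for col, base in enumerate(aligned_seq):
        if base in gap_chars:
            continue  # gap columns do not exist in the submitted sequence
        if start <= col < end:
            if new_start is None:
                new_start = len(ungapped)  # first real base of the feature
            new_end = len(ungapped) + 1    # one past the last real base so far
        ungapped.append(base)
    return "".join(ungapped), new_start, new_end


if __name__ == "__main__":
    seq, s, e = degap_with_feature("AT--GCC-TA", 2, 7)
    print(seq, s, e, seq[s:e])
```

Because every sequence in an alignment can carry a different number of gaps, the same alignment-level feature generally maps to different coordinates in each record, which is why the remapping has to be done per sequence.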
Files
STEP1_annonex2embl-INPUT__GruenstaeudlEtAl2013__Metadata.csv
(7.3 MB total)
Checksum | Size |
---|---|
md5:dd64a2f480c9f3ff98e6da176546989b | 405.9 kB |
md5:7e0b9a3656883732d504d158dd0dc43f | 1.8 MB |
md5:b828677e3968b21c427cf1617cd13c6d | 2.4 MB |
md5:fcba233005fa727f8755621c72894696 | 133 Bytes |
md5:1153d1018d3c2b731320ae62b03f2b13 | 148 Bytes |
md5:47b929baea33b9ecb88ff54dfff572ca | 491 Bytes |
md5:1be9cee27df517c0f538adca59ce29f4 | 2.7 MB |
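The md5 values listed for the files can be used to check a download for corruption. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_listed_checksum` are hypothetical, and which local file corresponds to which listed checksum is left to the user, since the listing does not pair every checksum with a file name.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listed_checksum(local_path, "md5:...")` returns `True` only if the downloaded file is byte-for-byte identical to the one the checksum was computed from.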