There is a newer version of the record available.

Published October 23, 2019 | Version v1
Journal article Open

annonex2embl: automatic preparation of annotated DNA sequences for bulk submissions to ENA

  • 1. Freie Universität Berlin, Berlin, Germany

Description

Motivation: The submission of annotated sequence data to public sequence databases constitutes a central pillar in biological research. The surge of novel DNA sequences awaiting database submission due to the application of next-generation sequencing has increased the need for software tools that facilitate bulk submissions. This need has yet to be met by a corresponding development of tools that automate the preparatory work preceding such submissions.

Results: I introduce annonex2embl, a Python package that automates the preparation of complete sequence flatfiles for large-scale sequence submissions to the European Nucleotide Archive (ENA). The tool converts annotated DNA sequence alignments, co-supplied with sequence annotations and metadata, into submission-ready flatfiles. Among other features, the software automatically accounts for length differences among the input sequences while maintaining correct annotations, automatically adds metadata to each record, and is designed for easy integration into bioinformatic workflows. As proof of its utility, annonex2embl is employed in preparing a dataset of more than 1,500 fungal DNA sequences for database submission.
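
The handling of length differences described above can be illustrated with a small, self-contained Python sketch. It is not taken from annonex2embl and does not use its API; the function name `degap_with_annotations`, the feature tuples, and the 0-based, half-open coordinate convention are all assumptions chosen for illustration. The idea is that sequences in an alignment are padded with gap characters to a common length, so removing the gaps from one sequence requires shifting its annotation coordinates accordingly.

```python
from itertools import accumulate


def degap_with_annotations(aligned_seq, features, gap_chars="-?"):
    """Remove gap characters and remap feature coordinates (illustrative only).

    aligned_seq: one row of an alignment, padded with gap characters.
    features:    list of (name, start, end) tuples in alignment columns,
                 0-based and half-open (an assumed convention).
    Returns the ungapped sequence and the features in ungapped coordinates.
    """
    # prefix[i] = number of non-gap characters in aligned_seq[:i]
    prefix = [0] + list(
        accumulate(0 if char in gap_chars else 1 for char in aligned_seq)
    )
    ungapped = "".join(char for char in aligned_seq if char not in gap_chars)

    adjusted = []
    for name, start, end in features:
        new_start, new_end = prefix[start], prefix[end]
        # Skip features that consist only of gaps in this sequence.
        if new_end > new_start:
            adjusted.append((name, new_start, new_end))
    return ungapped, adjusted


# Hypothetical example: a 10-column alignment row with two annotated regions.
sequence, regions = degap_with_annotations(
    "AT--GCGT-A",
    [("gene_a", 0, 5), ("spacer", 5, 10)],
)
print(sequence)  # ATGCGTA
print(regions)   # [('gene_a', 0, 3), ('spacer', 3, 7)]
```

A prefix count of non-gap characters keeps the remapping a simple lookup per feature boundary, so a feature shrinks by exactly the number of gaps it spanned in that sequence.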

Notes

On PyPI: https://pypi.org/project/annonex2embl/
On GitHub: https://github.com/michaelgruenstaeudl/annonex2embl

Files

STEP1_annonex2embl-INPUT__GruenstaeudlEtAl2013__Metadata.csv