Published August 10, 2025
| Version v4
Dataset
Open
Manually Curated Library of Transposable Elements (TEs) and TE Annotations for Drosophila amaguana
Authors/Creators
- 1. Facultad de Hábitat, Infraestructura e Innovación, Pontificia Universidad Católica del Ecuador, Quito, Ecuador
- 2. Laboratorio de Genética Evolutiva, Facultad de Ciencias Exactas, Naturales y Ambientales, Pontificia Universidad Católica del Ecuador, Quito, Ecuador
- 3. Department of Computer Sciences, Universidad Autónoma de Manizales, Manizales, Colombia
- 4. Centre for Research in Agricultural Genomics, CRAG (CSIC-IRTA-UAB-UB), Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
- 5. Institut de Recherche pour le Développement, IRD, CIRAD, Université de Montpellier, Montpellier, France
- 6. Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Colombia
Description
This data collection provides a manually curated library of transposable elements (TEs) for Drosophila amaguana, including consensus sequences, genome-wide TE annotations, and individual TE copy sequences. The library was built from de novo TE predictions generated by EDTA (Extensive de novo TE Annotator) and RepeatModeler and curated with MCHelper. The curated library was then used to annotate the genome with RepeatMasker and OneCodeToFindThemAll.
Below is a description of the included files:
Dama_curated_TE_library.fasta: Contains 737 manually curated consensus TE sequences for D. amaguana. Each sequence identifier in the FASTA file encodes the element's classification and origin (a potentially new family, or similarity to a known element):
- For sequences corresponding to potentially new families: The identifier consists of a four-letter abbreviation for D. amaguana (Dama), followed by the new family identifier and the superfamily name, all separated by underscores. Example: Dama_NF_BELPAO_1.
- For consensus sequences that show similarity to TE sequences previously reported in other species: The identifier includes the abbreviation Dama, followed by the superfamily name and an abbreviation for the species in which the TE was previously reported, all separated by underscores. Example: Dama_Helitron-1_DVir.
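The naming convention above can be sketched as a small parser. This is a minimal illustration that assumes every identifier looks exactly like one of the two examples given (Dama_NF_BELPAO_1 or Dama_Helitron-1_DVir). The function and field names are hypothetical and not part of the dataset, and real headers in the FASTA file may carry extra fields that this sketch does not handle.

```python
from typing import NamedTuple, Optional


class TEIdentifier(NamedTuple):
    """Fields recovered from a library identifier (names are illustrative)."""
    species: str                   # always "Dama" in this library
    superfamily: str               # e.g. "BELPAO" or "Helitron-1", taken verbatim
    is_new_family: bool            # True when the identifier carries the NF tag
    family_index: Optional[str]    # trailing field of a new-family identifier
    source_species: Optional[str]  # e.g. "DVir" for previously reported elements


def parse_te_identifier(identifier: str) -> TEIdentifier:
    """Split an identifier on underscores, following the two documented patterns."""
    parts = identifier.split("_")
    if len(parts) < 3 or parts[0] != "Dama":
        raise ValueError(f"unexpected identifier: {identifier!r}")

    if parts[1] == "NF":
        # Assumed layout: Dama_NF_<superfamily>_<index>
        if len(parts) < 4:
            raise ValueError(f"incomplete new-family identifier: {identifier!r}")
        return TEIdentifier("Dama", "_".join(parts[2:-1]), True, parts[-1], None)

    # Assumed layout: Dama_<superfamily>_<species abbreviation>
    return TEIdentifier("Dama", "_".join(parts[1:-1]), False, None, parts[-1])
```

For example, `parse_te_identifier("Dama_Helitron-1_DVir")` yields a record whose `source_species` is `"DVir"`, which makes it straightforward to separate potentially new families from elements similar to those reported in other species.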
Dama_TE_annotations.out: Genome-wide annotation file of TE insertions produced with RepeatMasker and post-processed using OneCodeToFindThemAll to merge fragmented elements.
Dama_TE_copies.fasta: FASTA file containing the extracted sequences of all annotated TE copies from the D. amaguana genome.
TEcopies_sequences.sh: Shell script used to extract TE copy sequences from the genome using the annotation coordinates.
Dynamics_Dama.ipynb: Jupyter Notebook for the analysis of transposable element (TE) dynamics in D. amaguana.
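To illustrate what extracting TE copies from annotation coordinates involves, the following Python sketch slices a genome by coordinate and reverse-complements copies on the complementary strand. It is not the logic of TEcopies_sequences.sh, which is not reproduced here. It assumes 1-based inclusive coordinates and a strand field of "+" or "C" (as in standard RepeatMasker output, with "-" also accepted). The (sequence ID, start, end, strand) record layout and all function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout for this sketch: (sequence ID, start, end, strand)
Annotation = Tuple[str, int, int, str]

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA lines into a dict keyed by the first word of each header."""
    chunks: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {name: "".join(parts) for name, parts in chunks.items()}


def extract_copies(
    genome: Dict[str, str], annotations: Iterable[Annotation]
) -> Iterator[Tuple[str, str]]:
    """Yield (label, sequence) for each annotated copy.

    Coordinates are assumed to be 1-based and inclusive.
    """
    for seq_id, start, end, strand in annotations:
        if not 1 <= start <= end <= len(genome[seq_id]):
            raise ValueError(f"coordinates out of range: {seq_id}:{start}-{end}")
        copy = genome[seq_id][start - 1:end]
        if strand in ("C", "-"):
            copy = reverse_complement(copy)
        yield f"{seq_id}:{start}-{end}({strand})", copy
```

In practice the genome would be loaded with something like `parse_fasta(open(path))`, and the annotation tuples would be read from Dama_TE_annotations.out after checking its column layout, since post-processing with OneCodeToFindThemAll may change the columns relative to raw RepeatMasker output.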
Files
(151.7 MB)
Additional details
Funding
- Pontificia Universidad Católica del Ecuador
- Mecanismos de diversificación y adaptación de las especies andinas del género Drosophila en el Ecuador QINV0196-IINV529010100