Published August 10, 2025 | Version v4
Dataset Open

Manually Curated Library of Transposable Elements (TEs) and TE Annotations for Drosophila amaguana

  • 1. Facultad de Hábitat, Infraestructura e Innovación, Pontificia Universidad Católica del Ecuador, Quito, Ecuador
  • 2. Laboratorio de Genética Evolutiva, Facultad de Ciencias Exactas, Naturales y Ambientales, Pontificia Universidad Católica del Ecuador, Quito, Ecuador
  • 3. Department of Computer Sciences, Universidad Autónoma de Manizales, Manizales, Colombia
  • 4. Centre for Research in Agricultural Genomics, CRAG (CSIC-IRTA-UAB-UB), Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
  • 5. Institut de Recherche pour le Développement, IRD, CIRAD, Université de Montpellier, Montpellier, France
  • 6. Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Colombia

Description

This data collection provides a manually curated library of transposable elements (TEs) for Drosophila amaguana, including consensus sequences, genome-wide TE annotations, and individual TE copy sequences. The library was built from de novo TE libraries generated by EDTA (Extensive de novo TE Annotator) and RepeatModeler, curated with MCHelper, and then used for genome annotation with RepeatMasker and OneCodeToFindThemAll.

Below is a description of the included files:

  • Dama_curated_TE_library.fasta: Contains 737 manually curated consensus TE sequences for D. amaguana. Each sequence identifier in the FASTA file encodes the element's classification and origin (a potentially new family, or similarity to a known element):

    • For sequences corresponding to potentially new families: The identifier consists of a four-letter abbreviation for D. amaguana (Dama), followed by the new family identifier and the superfamily name, all separated by underscores. Example: Dama_NF_BELPAO_1.
    • For consensus sequences that show similarity to TE sequences previously reported in other species: The identifier includes the abbreviation Dama, followed by the superfamily name and an abbreviation for the species in which the TE was previously reported, all separated by underscores. Example: Dama_Helitron-1_DVir.

  • Dama_TE_annotations.out: Genome-wide annotation file of TE insertions produced with RepeatMasker and post-processed using OneCodeToFindThemAll to merge fragmented elements.
  • Dama_TE_copies.fasta: FASTA file containing the extracted sequences of all annotated TE copies from the D. amaguana genome.
  • TEcopies_sequences.sh: Shell script used to extract TE copy sequences from the genome using the annotation coordinates.
  • Dynamics_Dama.ipynb: Jupyter Notebook for the analysis of transposable element (TE) dynamics in D. amaguana.
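
The identifier convention described above for Dama_curated_TE_library.fasta can be sketched as a small parser. This is only an illustration: the two regular expressions are inferred from the two example identifiers given in the description (Dama_NF_BELPAO_1 and Dama_Helitron-1_DVir), the function name parse_te_id and the returned field names are hypothetical, and real headers in the library may deviate from these patterns or carry extra text.

```python
import re
from typing import Optional

# ASSUMPTION: both patterns are inferred from the two example identifiers
# in the dataset description; they are not an official specification.
# New family: Dama_NF_<superfamily>_<number>, e.g. Dama_NF_BELPAO_1
NEW_FAMILY = re.compile(r"^Dama_NF_(?P<superfamily>.+)_(?P<number>\d+)$")
# Known element: Dama_<element>_<species>, e.g. Dama_Helitron-1_DVir
KNOWN = re.compile(r"^Dama_(?P<element>.+)_(?P<species>[^_]+)$")


def parse_te_id(identifier: str) -> Optional[dict]:
    """Split a consensus identifier into its parts, or return None."""
    match = NEW_FAMILY.match(identifier)
    if match:
        return {
            "kind": "new_family",
            "superfamily": match.group("superfamily"),
            "number": int(match.group("number")),
        }
    match = KNOWN.match(identifier)
    if match:
        return {
            "kind": "known",
            "element": match.group("element"),
            "species": match.group("species"),
        }
    # Identifier follows neither of the two assumed patterns.
    return None
```

The new-family pattern is tried first because an identifier such as Dama_NF_BELPAO_1 would otherwise also match the more general second pattern.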

Files

Files (151.7 MB)

  • md5:7ef15cf212664a2b6f01cc8d15f6ffde (1.2 MB)
  • md5:2a74237867855d272bcc2b20e112c81a (35.4 MB)
  • md5:e339e328a5e905b9bcd6b9127dc22413 (115.0 MB)
  • md5:477eb3073bd84331636c773dd448241a (24.0 kB)
  • md5:44d2ee8b83185a055c0069bdef4f53a7 (3.2 kB)
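
The md5 checksums listed above can be used to confirm that a downloaded file is intact. The following is a minimal sketch using only the Python standard library; the function names md5_of_file and matches_checksum are hypothetical, and the listing does not state which checksum belongs to which file, so the pairing has to be taken from the record page itself.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected: str) -> bool:
    """Compare a file against a checksum given as 'md5:<hex>' or '<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

A call such as matches_checksum("Dama_TE_copies.fasta", "md5:...") would then return True only if the local copy matches the published value; the checksum string is left elided here because the file-to-checksum pairing is not given in this listing.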

Additional details

Funding

Pontificia Universidad Católica del Ecuador
Mecanismos de diversificación y adaptación de las especies andinas del género Drosophila en el Ecuador QINV0196-IINV529010100