Published April 21, 2023 | Version v1
Dataset Open

Data from: Higher evolutionary dynamics of gene copy number for Drosophila glue genes located near short repeat sequences

  • 1. Institut Jacques Monod
  • 2. Columbia University

Description

Background

During evolution, genes can experience duplications, losses, inversions and gene conversions. Why certain genes are more dynamic than others is poorly understood. Here we examine how several Sgs genes have evolved in Drosophila species. These genes encode glue proteins, which make up a bioadhesive that sticks the animal to a substrate during metamorphosis.

Results

We examined high-quality genome assemblies of 24 Drosophila species to study the evolutionary dynamics of four glue genes that are present in D. melanogaster and are part of the same gene family (Sgs1, Sgs3, Sgs7 and Sgs8) across approximately 30 million years. We annotated a total of 102 Sgs genes and grouped them into 4 subfamilies. We present here a new nomenclature for these Sgs genes based on protein sequence conservation, genomic location and presence/absence of internal repeats. Two types of glue genes were uncovered. The first category (Sgs1, Sgs3x, Sgs3e) showed a few gene losses but no duplication, no local inversion and no gene conversion. The second category (Sgs3b, Sgs7, Sgs8) exhibited multiple events of gene loss, gene duplication, local inversion and gene conversion. Our data suggest that the presence of short "new glue" genes near the genes of the second category may have accelerated their dynamics.

Conclusions

Our comparative analysis suggests that the evolutionary dynamics of glue genes is influenced by genomic context. Our molecular, phylogenetic and comparative analysis of the four glue genes Sgs1, Sgs3, Sgs7 and Sgs8 provides the foundation for investigating the role of the various glue genes during Drosophila life.

Notes

Supplementary Files

File S1. Compressed zip file of the gene annotations (GenBank .gb files, inputs for Easyfig) of large genomic regions containing all the Sgs genes and their neighboring genes in the 24 studied species.

File S2. Fasta file of all the Sgs amino acid sequences used to create Figure 1B and Figure S1.

File S3. Compressed zip file of reference and corrected nucleotide sequences used to create Figure S2.

File S4. Compressed zip file of Sgs protein alignments (.fasta files) used to compute phylogenetic trees and make Weblogo figures.

File S5. Sgs coding sequence length in bp for species having an Sgs3x copy (.csv file, input for R script sgs_size.R).

File S6. Sgs coding sequence length in bp for species not having an Sgs3x copy (.csv file, input for R script sgs_size.R).

File S7. Compressed zip file of comparisons between pairs of large genomic regions (.out files obtained as outputs from Easyfig).

File S8. Table of pairwise percentage of identity between several Sgs1 and Sgs3 amino-acid sequences (.csv).

File S9. Compressed zip file of the repeat annotations (.csv files) obtained with FindRepeat in Geneious on large genomic regions for D. melanogaster Sgs1, Sgs3/7/8 and Sgs3x, D. teissieri Sgs3/7/8, D. subobscura Sgs3 and D. eugracilis Sgs3.

File S10. Compressed zip file of new glue protein alignments (.fasta files) used to make Fig. S9.

File S11. Fasta file of all the Sgs nucleotide sequences studied here.

File S12. Fasta file of the 154 new glue (ng) nucleotide sequences found at loci 68C11 and 68C13.

File S13. Fasta file of the 41 ng nucleotide sequences found at loci 3C11-12, 28E6-28E7, 87A1 and 88C3-4.

File S14. Compressed zip file of all the R scripts (.R files) used to create the figures.

File S15. Bam file of raw reads mapped to D. rhopaloa Sgs1 corrected nucleotide sequence, used to create Figure S2A.

File S16. Bam file of raw reads mapped to D. ficusphila Sgs1 reference nucleotide sequence, used to create Figure S2B.

File S17. Bam file of raw reads mapped to D. biarmipes Sgs3x corrected nucleotide sequence, used to create Figure S2C.
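
Several of the files above (S2, S11, S12 and S13) are described as Fasta files. As a minimal sketch of how such a file could be loaded and summarized, the Python snippet below reads FASTA-formatted lines into a dictionary and reports sequence lengths. The function names `parse_fasta` and `sequence_lengths` are illustrative helpers, not part of the dataset or its R scripts, and the header layout inside the actual files is not documented here, so each header line is kept verbatim.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {header: sequence} from FASTA-formatted lines.

    Hypothetical helper: headers are kept verbatim (minus the leading '>'),
    and multi-line sequences are concatenated.
    """
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: store the previous one first.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def sequence_lengths(records: Dict[str, str]) -> Dict[str, int]:
    """Map each header to the length of its sequence."""
    return {name: len(seq) for name, seq in records.items()}
```

To apply it to a downloaded file, one could open the file in text mode and pass the file handle to `parse_fasta`, for example `parse_fasta(open("File_S11.fasta"))`, where the file name is an assumption about how the download is saved locally.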

Funding provided by: Ministère de l'Education Nationale, de la Recherche et de la Technologie (MENRT)
Crossref Funder Registry ID:
Award Number: PhD fellowship

Funding provided by: European Research Council
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100000781
Award Number: FP7/2007-2013 Grant Agreement no. 337579

Funding provided by: Centre National de la Recherche Scientifique
Crossref Funder Registry ID: http://dx.doi.org/10.13039/501100004794
Award Number: MITI "Défi Adaptation du vivant à son environnement"

Files

Files (23.9 GB)

MD5 checksum                            Size
md5:1ca3f9c46a99f6c663d985926741bd90    3.3 MB
md5:0557bd4877ae72cf168eb1d9fadeb8b4    5.5 kB
md5:ca52ddae89eb6eff43c6f8af612d1ee8    15.9 kB
md5:3dab2e12e5c6fe90fab9b0f2c2fa7fdc    14.0 GB
md5:0722d573700845a1d1906031c1ca5de6    8.1 GB
md5:64c72bd07804235935adbe5c35429692    1.8 GB
md5:b50b6d3851bda67e28b4587d90959b41    5.4 kB
md5:a4c26a811d97a92bcc7b8773942ab915    17.2 kB
md5:7f98351bd41b599e7f9363234bda5429    552.0 kB
md5:28382be19835a90d4284d9b8bc6d20e0    4.2 kB
md5:06166702b3094e5b19d3263afcb2b856    132.6 kB
md5:799a0a210aa843395d52c6f066ce3d2a    127.4 kB
md5:6df56881e28d8cb690ad99afd23a29f6    53.5 kB
md5:62ed9a6a75c760cf425378aba08147ce    19.9 kB
md5:32f3a07087fc0aef790f34dc7b001781    62.7 kB
md5:7199ec4a8d7f7cd20fe3a07184dd2ae9    2.0 kB
md5:d8c35f5d34958f058e53d72aae774577    1.8 kB
md5:f95c8ba48d11fa46d0e88e51dea12f9a    4.5 kB
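
Because the largest files in this record are several gigabytes, it can be worth checking a download against the listed `md5:` value before use. The following is a minimal sketch in Python using only the standard library; `md5_of_file` and `matches_listing` are hypothetical helper names, and the local file path is whatever name the download was saved under.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that multi-gigabyte files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a listed value such as 'md5:1ca3f9...'.

    The optional 'md5:' prefix is dropped before comparison.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(Path(path)) == expected
```

For example, `matches_listing("File_S1.zip", "md5:1ca3f9c46a99f6c663d985926741bd90")` would return `True` only if the local file has that checksum; which checksum belongs to which file is an assumption to confirm on the record page itself.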

Additional details