Adh and Adh-dup sequences of Drosophila lebanonensis and D. immigrans:

We have cloned and sequenced the Adh genomic region of Drosophila lebanonensis (subgenus Scaptodrosophila) and D. immigrans (subgenus Drosophila). This region, which contains Adh, encoding the alcohol dehydrogenase enzyme, and Adh-dup (duplicate of Adh), has been compared with the same fragment from D. subobscura (subgenus Sophophora). Even though the flanking regions and introns of both genes have been affected by high substitution rates, the consensus sequences have been clearly identified. Although the overall homology of the coding regions was 76-78% among the species compared, there were differences in the exon distribution of the nucleotide substitutions when Adh or Adh-dup were compared, thus showing that these two genes differ in their evolutionary pattern.


INTRODUCTION
The ADH system has been extensively analyzed. Because of its particular features and the amount of information gathered at different levels, it constitutes a good model with which to study gene structure and regulation and assess phylogenetic relationships in the Drosophila genus (Sullivan et al., 1990).
In D. melanogaster, and in other Drosophila species, Adh consists of three exons separated by two introns. The gene is controlled by two developmentally regulated promoters, adult and larval, and the gene product is tissue-specific. Other species of the genus show a different genomic organization for this region: species of the repleta group have two Adh genes, each with its own promoter, and one pseudogene. Recently, we found other Adh pseudogenes, which originated from a retrotranscriptional event, in some species of the obscura group (Marfany and Gonzalez-Duarte, 1992b).
Moreover, other rearrangements have taken place in the Adh genomic region. A highly conserved sequence was previously described in the 3' flanking region of Adh (Schaeffer and Aquadro, 1987). This sequence, referred to as Adh-dup, contained an ORF and shared high homology with Adh. However, Adh-dup has been reported only in the Sophophora subgenus. We now show that it can also be found in the Scaptodrosophila and Drosophila subgenera. Its presence in species originating among the early radiations of the genus suggests that Adh-dup probably constitutes a common element in most Drosophila species.
Previous studies in our laboratory showed that ADH activity was present in D. immigrans and D. lebanonensis. The analysis of these species revealed a direct correlation between ADH activity and alcohol tolerance: D. lebanonensis, which was found abundantly in wine cellars, showed a high level of enzyme activity, whereas D. immigrans, with low activity, was practically absent in alcohol-rich environments. Differences in ADH activity were due to remarkable variations in the quantity of enzyme, while the specific activity was comparable (Vilageliu and Gonzalez-Duarte, 1984).
The aim of the present study was to determine the nt sequence of a 3828-bp segment of the Adh region of D. lebanonensis (subgenus Scaptodrosophila) and a 3142-bp segment of D. immigrans (subgenus Drosophila). In order to analyze the molecular evolution, the sequences were compared with each other and with the same region of D. subobscura (subgenus Sophophora) (Marfany and Gonzalez-Duarte, 1992a). It is interesting that two related genes, adjacently located and contained in such a short DNA segment, appear to show different evolutionary patterns.

(a) Clone characterization
Screening of the λCharon35 genomic libraries of D. lebanonensis and D. immigrans yielded several phage clones homologous to the D. melanogaster sAC1 probe (Goldberg, 1980). The Adh region was characterized (Visa et al., 1991) and then sequenced for a total of 3828 bp for D. lebanonensis and 3142 bp for D. immigrans. Both sequences were compared with the same region of D. subobscura. The codon positions aligned exactly among all species. Nucleotides in flanking regions and introns appeared to be subject to higher substitution rates, rendering comparisons difficult due to alignment ambiguities. In these cases, it was possible to identify only the consensus sequences.

(b) Analysis of Adh nt and deduced aa sequences
By analogy with other species, the Adh promoters and transcription start points of D. lebanonensis and D. immigrans were identified in the 5'-flanking sequences. In D. immigrans the sequences of the larval and the adult TATA box were identical to those of all other Drosophila species reported to date (Fig. 1). However, the distance between the two promoters was shorter than in any other species, only 559 nt. In D. lebanonensis the two TATA boxes were 1114 nt apart, and the larval TATA box (CATAAATA) started with an unusual nt due to a T-to-C substitution. In addition, the boxA sequence, previously characterized in D. mulleri as equivalent to the D. melanogaster p0 binding site, was also identified in both species, and clear blocks of sequence conservation were shown. This boxA seems essential for the correct expression of the proximal promoter and for tissue specificity. Of the other regulatory sequences defined in D. melanogaster as p1, p2 or d1, none was obvious in the sequence comparisons of D. immigrans and D. lebanonensis. However, assays with Schneider cell cultures transfected with different D. lebanonensis and D. immigrans genomic constructs obtained from the regions that we have characterized were successful (data not shown). Therefore, the basic elements for the expression of Adh must be present in these genomic fragments.
Within the coding regions, we compared 765 positions and found 171 nt substitutions between D. immigrans and D. lebanonensis, 174 between D. immigrans and D. subobscura and 170 between D. lebanonensis and D. subobscura (Table I). Thus, 22-23% of the sites compared were different among these three species. Substitutions were more frequent in synonymous positions: while only 25% of nt sites were silent, changes in these positions amounted to 63% of the overall figure. Since the third codon position was frequently silent, changes appeared to accumulate there rather than being equally distributed. The rest of the nt sites were nonsynonymous (75%) and represented only 36% of the total changes (about 62).
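The 22-23% figure follows directly from the counts given above. The short check below is only an illustration of that arithmetic (uncorrected proportion of differing sites); the function and variable names are ours and do not come from the original analysis.

```python
# Pairwise nucleotide differences in the Adh coding region (counts from the text).
COMPARED_SITES = 765

differences = {
    ("D. immigrans", "D. lebanonensis"): 171,
    ("D. immigrans", "D. subobscura"): 174,
    ("D. lebanonensis", "D. subobscura"): 170,
}


def percent_different(n_diff, n_sites=COMPARED_SITES):
    """Uncorrected proportion of differing sites, expressed as a percentage."""
    return 100.0 * n_diff / n_sites


percentages = {pair: percent_different(n) for pair, n in differences.items()}

for (species_a, species_b), pct in percentages.items():
    print(f"{species_a} vs {species_b}: {pct:.1f}%")
```

All three values fall between 22% and 23%, in line with the range quoted in the text.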
Silent substitutions were randomly distributed between exons when comparing D. lebanonensis versus D. immigrans, D. lebanonensis versus D. subobscura and D. immigrans versus D. subobscura (χ² = 0.196, χ² = 0.162 and χ² = 0.915, d.f. = 2, P > 0.05). However, nonsynonymous substitutions did not appear to be randomly distributed. When D. immigrans was compared with D. lebanonensis or D. subobscura, a specific pattern of distribution of substitutions was found (χ² = 10.76 and χ² = 7.64, d.f. = 2, P < 0.05): the second exon showed fewer changes than expected, whereas there was an excess in the first exon. Biochemical data seem to support that the second exon contains some essential information for the catalytic domain of the ADH enzyme (Albalat et al., 1992) and that it could also be relevant for the correct folding of the protein. This region is therefore likely to be under high selective constraints.
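The test of random distribution among exons can be sketched as a χ² goodness-of-fit test with 2 d.f. (three exons). The sketch below assumes that the expected counts are proportional to exon length (93, 405 and 267 nt for Adh); the text does not state how the expected values were derived, and the observed counts used here are hypothetical. For 2 d.f. the upper-tail probability has the closed form exp(-χ²/2), so a statistic below about 5.99 is not significant at the 5% level.

```python
import math

# Adh exon lengths in nt, as given in the text.
ADH_EXON_LENGTHS = [93, 405, 267]


def chi_square_gof(observed, weights):
    """Chi-square goodness-of-fit statistic.

    Expected counts are the observed total split in proportion to `weights`
    (here exon lengths; this choice is an assumption, not the authors' stated method).
    """
    total = sum(observed)
    weight_sum = sum(weights)
    expected = [total * w / weight_sum for w in weights]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected


def p_value_df2(statistic):
    """Upper-tail probability of a chi-square variable with exactly 2 d.f."""
    return math.exp(-statistic / 2.0)


# Hypothetical replacement counts per exon, for illustration only.
hypothetical_observed = [15, 20, 27]
stat, expected = chi_square_gof(hypothetical_observed, ADH_EXON_LENGTHS)
print(f"chi2 = {stat:.2f}, P = {p_value_df2(stat):.4f}")

# P values implied by the statistics quoted in the text (d.f. = 2).
for reported in (0.196, 0.162, 0.915, 10.76, 7.64):
    print(f"chi2 = {reported}: P = {p_value_df2(reported):.4f}")
```

With these formulas the three statistics for silent substitutions give P well above 0.05, whereas 10.76 and 7.64 give P below 0.05.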
More than 50% of aa replacements between D. lebanonensis, D. immigrans and D. subobscura were conservative, which is significantly greater than what could be expected by random changes (15%). D. immigrans has six aa   (conservative), F33, N56 and Ilo (nonconservative). Nevertheless, the homology of the hydrophilicity profiles of the three ADHs (data not shown) and the biochemical features of these enzymes seem to suggest that they all share common features at the level of the tertiary structure.

(c) Adh-dup expression; PCR analysis
The high level of nt sequence conservation downstream from Adh was explained under the hypothesis of a protein-coding gene. Moreover, the homology between Adh and this putative gene suggested that it may represent an ancestral duplication of Adh; thus, it was named Adh-dup. In order to determine whether it was transcribed, we purified total RNA from D. lebanonensis (Jowett, 1986). The reverse transcriptase reaction was performed with 1.5-2.0 µg of total RNA using the MoMuLV enzyme (BRL) and the Z2 oligo (Oligo Etc., Wilsonville, OR) (Fig. 1). The Z1 oligo (Fig. 1) and Taq polymerase (Perkin Elmer or Boehringer Mannheim) were added for the PCR amplification (Kawasaki, 1990). The sequences of Z1 and Z2 were included in exons 2 and 3, respectively. Therefore, the product amplified from processed RNA (mRNA), which lacks the intron, would be shorter than the product from the corresponding DNA. The PCR product was subcloned into the pBluescriptSK plasmid and sequenced. The sequence obtained agreed with the proposed exon-intron structure (Schaeffer and Aquadro, 1987). The assay was performed with larval, pupal and adult RNA, and in all cases transcriptional products were detected.
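The size argument behind this assay can be made explicit with a small calculation. Because the two primers lie in exons 2 and 3, a product amplified from genomic DNA spans the second intron, whereas a product from mRNA does not. The sketch below uses the 60-nt second intron reported for D. lebanonensis Adh-dup; the primer-to-primer span on the cDNA is a hypothetical value, since the primer coordinates are not given in the text.

```python
def expected_products(cdna_span_nt, intron_lengths):
    """Expected amplicon sizes from cDNA and from genomic DNA.

    cdna_span_nt   -- primer-to-primer distance on the spliced transcript
    intron_lengths -- lengths of the introns lying between the two primers
    """
    return {
        "cDNA": cdna_span_nt,
        "genomic": cdna_span_nt + sum(intron_lengths),
    }


# Second intron of Adh-dup in D. lebanonensis: 60 nt (from the text).
INTRON_2_NT = 60
# Hypothetical primer-to-primer span on the cDNA; not stated in the text.
HYPOTHETICAL_CDNA_SPAN_NT = 300

sizes = expected_products(HYPOTHETICAL_CDNA_SPAN_NT, [INTRON_2_NT])
print(sizes)
print("size difference (nt):", sizes["genomic"] - sizes["cDNA"])
```

Whatever the actual primer positions, the two products should differ by the length of the intron between them.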

(d) Analysis of Adh-dup nt and deduced aa sequences
The Adh-dup nt sequences of D. lebanonensis, D. immigrans and D. subobscura were also compared. As was shown with Adh, this gene consisted of three exons separated by two introns. In all the species, the first and second exons were 96 and 405 nt long, respectively. The length of the third exon varied with the species: 309 nt in D. lebanonensis, 321 nt in D. immigrans and 339 nt in D. subobscura. The exon sizes were similar to those of Adh (93, 405 and 267 nt, respectively). The first intron of Adh-dup appeared to be much longer than that of Adh for all the species except D. immigrans. The intron-1 length of Adh-dup was 258 nt for D. lebanonensis and 285 nt for D. subobscura, but only 59 nt for D. immigrans. Since the first intron of Adh has approximately 60 nt, this may correspond to the length of the ancestral segment, which acquired its present size after several insertion events. The second intron had 60, 59 and 62 nt for D. lebanonensis, D. immigrans and D. subobscura, respectively, and showed a similar length to the second intron of Adh.
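The exon lengths above determine the coding length of each gene, which can be verified with a few lines of arithmetic. This is an illustrative check only; the names are ours, and the text does not say whether the stop codon is counted within the third exon.

```python
# Exon lengths in nt, as given in the text.
ADH_EXONS = (93, 405, 267)
ADH_DUP_EXONS = {
    "D. lebanonensis": (96, 405, 309),
    "D. immigrans": (96, 405, 321),
    "D. subobscura": (96, 405, 339),
}


def coding_length(exons):
    """Total coding length in nt (sum of exon lengths)."""
    return sum(exons)


adh_length = coding_length(ADH_EXONS)
adh_dup_lengths = {sp: coding_length(ex) for sp, ex in ADH_DUP_EXONS.items()}

print(f"Adh: {adh_length} nt, {adh_length // 3} codons")
for species, length in adh_dup_lengths.items():
    print(f"Adh-dup, {species}: {length} nt, {length // 3} codons")
```

The Adh total, 765 nt, matches the 765 positions compared for Adh, and the Adh-dup totals are 810, 822 and 840 nt, each a multiple of three.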
Some regulatory consensus sequences were identified in the 5'-flanking sequence of Adh-dup. A putative TATA box, TAATTAAA, was present in all reported species. We also found a CAT box consensus sequence overlapping the 3'-flanking sequence of Adh (Fig. 1). However, as with Adh, neither the 5'- or 3'-flanking sequences nor the introns could be properly aligned. Therefore, only the coding nt positions of Adh-dup were compared among the three Drosophila species (Table I). We found 195 nt changes (24% of total positions) between D. immigrans and D. lebanonensis, 187 nt changes (23%) between D. immigrans and D. subobscura and 194 nt changes (24%) between D. lebanonensis and D. subobscura. Again, silent substitutions were more frequent than nonsynonymous changes and appeared concentrated mainly at the third codon position. Silent substitutions were randomly distributed among the three Adh-dup exons (χ² = 0.03, D. immigrans-D. lebanonensis; χ² = 0.49, D. (χ² = 17.84, χ² = 8.33, respectively, d.f. = 2, P < 0.05). A random distribution of replacements was found between D. lebanonensis and D. subobscura (χ² = 2.55, d.f. = 2, P > 0.05). In D. immigrans, nonsynonymous substitutions in the first or second exon were lower than expected, whereas the third exon showed an accumulation of nt changes. This trend was also present in the D. lebanonensis-D. subobscura pair, although it was not significant. Given the coding sequence homology between Adh and Adh-dup, the fact that the replacement substitutions in the first Adh-dup exon were very few (none, D. lebanonensis-D. immigrans; three, D. immigrans-D. subobscura and four, D. lebanonensis-D. subobscura) is particularly striking and may reflect differential selection pressure (Marfany and Gonzalez-Duarte, 1991).
Between 30% and 39% of all aa replacements of Adh-dup among D. lebanonensis, D. immigrans and D. subobscura were conservative. A high degree of similarity in the hydrophilicity profiles was also observed for ADH-DUP in the three species. The aa differences were concentrated mainly at the C-terminal end, where replacements and length variations accumulated. In contrast, experiments of proteolysis and chemical modification have shown that a segment close to the C terminus of ADH is required for enzyme activity and that any alteration in this region seriously impairs the function of the protein (Krook et al., 1992).

(e) Conclusions
(1) In the sequence comparisons, the numbers of changes found among the three species under study, D. immigrans, D. lebanonensis and D. subobscura, were very similar in all cases. This feature is in agreement with the phylogenetic trees obtained from the distance matrix values (Table II), which show that the three subgenera, Scaptodrosophila, Drosophila and Sophophora (D. lebanonensis, D. immigrans and D. subobscura, respectively) diverged at nearly the same time in the evolutionary history of the genus.
[Table II: divergence among several species of the Drosophila genus and D. immigrans and D. lebanonensis, calculated from the Adh-dup exons. KS (top right) and Ka (bottom left) have been determined according to Li et al. (1985).]
(2) The Adh and Adh-dup exons have been used to calculate the divergence among several species of the Drosophila genus and D. immigrans and D. lebanonensis. The KS and Ka values between each pair of species have been determined according to Li et al. (1985). Very similar results were obtained with the method of Perler and Efstratiadis (1980) (data not shown). Then, phylogenetic trees based on the KS and Ka matrices were constructed according to Fitch and Margoliash (1967), using the FITCH program of PHYLIP (Fig. 2). Species positions on the trees appeared to be similar with either KS or Ka values except for D. immigrans. This species became closer to D. affinidisjuncta when KS estimations were utilized for the tree construction. It has been reported that substitution rates in D. affinidisjuncta are higher than in most Drosophila species of the Sophophora and Drosophila subgenera (Sullivan et al., 1990), and this could well account for the tree differences. We also found that D. lebanonensis is closer to the Sophophora or to the Drosophila cluster depending on whether the tree is based on KS or Ka values, respectively.
Evolutionary rates were estimated according to Kimura (1980), and the neighbour-joining method (Saitou and Nei, 1987) was used to reconstruct the phylogenetic tree. When this tree was compared to those based on the KS and Ka matrices, a clear similarity was observed, particularly with the one obtained with Ka, in which D. affinidisjuncta appeared to be closer to D. mulleri than to D. immigrans.
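For orientation, the distance of Kimura (1980) can be written as K = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions, respectively. The sketch below is a minimal implementation of that two-parameter formula on a toy pair of aligned sequences; it is not the program used in the study, the sequences are invented for illustration, and skipping gapped or ambiguous sites is our own simplification.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned nucleotide sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (a simplification made for this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1   # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # math.log raises ValueError if the sequences are too divergent
    # for the formula to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Toy alignment: 20 sites, 2 transitions and 1 transversion (P = 0.10, Q = 0.05).
toy_a = "ACGTACGTACGTACGTACGT"
toy_b = "GTTTACGTACGTACGTACGT"
toy_distance = k2p_distance(toy_a, toy_b)
print(f"K = {toy_distance:.4f}")
```

A matrix of such pairwise distances is the input that a neighbour-joining procedure would then use to build the tree.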
The Adh-dup sequence has been described in only a few species of the Sophophora subgenus. Moreover, before the present work it had not been characterized in any species of the Drosophila subgenus. Evolutionary inferences drawn from the Adh-dup KS and Ka matrices therefore have a more limited scope than those concerning Adh. Information about Adh-dup genes from additional Drosophila subgenera is clearly needed to understand the evolution of this genomic region.