Construction of a transcription map surrounding the BRCA1 locus of human chromosome 17

We have used a combination of methods (exon amplification, direct selection, direct screening, evolutionary conservation, island rescue-PCR, and direct sequence analysis) to survey approximately 600 kb of genomic DNA surrounding the BRCA1 gene for transcribed sequences. We have cloned a set of fragments representing at least 26 genes. The DNA sequence of these clones reveals that 5 are previously cloned genes; the precise chromosomal location of 2 was previously unknown, and 3 have been cloned and mapped by others to this interval. Three other genes, including BRCA1 itself, have recently been mapped independently to this region. Sequences from 11 genes are similar but not identical matches to known genes; 5 of these appear to be the human homologues of genes cloned from other species. Another 7 genes have no similarity with known genes. In addition, 39 putative exons and 14 expressed sequence tags have been identified and mapped to individual cosmids. This transcript map provides a detailed description of gene organization for this region of the genome.


INTRODUCTION
The generation of transcript maps is an important goal of the human genome project (Collins and Galas, 1993). In addition to detailed genetic and physical maps, a map of transcribed sequences is necessary for understanding genome structure and will aid greatly in the isolation of the genes for inherited traits. As part of our effort to isolate the breast cancer susceptibility gene, BRCA1, we have constructed a transcript map covering approximately 1 Mb of chromosome 17q21. The mapping of the BRCA1 gene to chromosome 17q (Hall et al., 1990; Narod et al., 1991) and further refinement of its location by genetic recombination allowed a minimal candidate interval to be defined (Bowcock et al., 1993; Easton et al., 1993; Chamberlain et al., 1993; Simard et al., 1993; Smith et al., 1994). This interval spans approximately 1 cM of chromosome 17. Since on average 1 cM corresponds to approximately 1 Mb, this amount of DNA should be amenable to positional cloning efforts (Collins, 1992). The first step of such an effort is construction of a physical contig of DNA included in the interval. We and others have constructed physical maps of this candidate interval, and the accompanying paper (Couch et al., 1995) presents a yeast artificial chromosome (YAC)-, P1-, and cosmid-based contig of this region. Although by its nature a positional cloning approach is targeted toward the isolation of a single gene, a successful outcome in many cases depends upon the isolation of most of the genes contained in the candidate interval.
¹ Contributed equally to this work.
² To whom correspondence should be addressed at The National Center for Human Genome Research, National Institutes of Health, Building 49, Room 3A18, 9000 Rockville Pike, Bethesda, Maryland 20892. Telephone: (301) 402-4929.
There are several methods available for isolating transcribed sequences from large genomic regions. Exon amplification/exon trapping (Duyk et al., 1990; Buckler et al., 1991; Krizman and Berget, 1993), direct selection (Parimoo et al., 1991; Lovett et al., 1991), direct screening (Wallace et al., 1990), evolutionary conservation, genomic sequencing (Sulston et al., 1992; Oliver et al., 1992), and CpG island-based (Patel et al., 1991; Valdes et al., 1994) methods have all been used to identify the transcribed portions of genomic DNA. The advantages and disadvantages of these methods have been described (e.g., Collins, 1992). No single method has been shown to be capable of isolating 100% of the transcripts from a genomic interval of this size. For this reason our collaborative multi-institutional effort employed six methods of transcript identification in parallel. Exon amplification, direct selection, and direct screening were performed using cosmids representing the majority of our contig. Transcript isolation by island rescue PCR and evolutionary conservation was carried out on a subset of these. In the accompanying papers, Osborne-Lawrence et al. (1995) report on extensive direct selection experiments and YAC-based direct cDNA library screening, respectively.
These studies used the same collection of genomic clones as this study (Couch et al., 1995), so comparing their results with ours allows some conclusions to be drawn about the specificity and sensitivity of these methods.
While recognizing that these methods can be applied to either YACs or cosmids, we derived the majority of the data described here from cosmids because of their relative ease of manipulation. We present here a transcript map of the BRCA1 region. In the interval bounded by the genes EDH17B2 (17β-hydroxysteroid dehydrogenase II) and PPY (pancreatic polypeptide), we have isolated 26 genes. Of these, 19 have similarities or identities to known genes, while 7 appear to be novel genes. In addition, 53 unique exons and cDNA fragments (39 exon sequence tags and 14 expressed sequence tags, ESTs) have been isolated and mapped to specific cosmids in the contig.

MATERIALS AND METHODS

Molecular biology techniques.
Except as noted, all molecular manipulations (cloning, phage library screening, PCR) were performed using standard methods (Sambrook et al., 1989).

Cosmids and YACs.
The cloning and construction of the BRCA1-region cosmid, P1, and YAC contig are described by Couch et al. (1995). Well locations of the cosmids presented here have been maintained from the original library provided by L. Deaven (Los Alamos National Laboratory).

cDNA libraries.
A human breast cDNA library (λgt11 vector, oligo(dT)-primed) was constructed at the University of Michigan Genome Center (Swaroop and Xu, 1993). Human fetal brain (Stratagene, Lambda ZAP vector, random and oligo(dT)-primed), normal human ovary (Stratagene, Lambda Uni-ZAP XR, oligo(dT)-primed), and normal human breast (Clontech, λgt10, random-primed) cDNA libraries were purchased. A human endothelial cell cDNA library (λgt11, random and oligo(dT)-primed) was a gift from D. Ginsburg. For library screening, 30,000 to 50,000 plaques were plated per 150-mm dish and transferred to Hybond-N (Amersham) membranes. Three to five hundred thousand plaque-forming units were screened with each probe.
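The plating figures above fix how many dishes each probe required. A small sketch of that arithmetic, assuming every plated plaque is transferred and screened once (the function name is illustrative only):

```python
import math

def plates_needed(total_pfu: int, plaques_per_plate: int) -> int:
    """Number of 150-mm dishes needed to plate total_pfu at the given density."""
    return math.ceil(total_pfu / plaques_per_plate)

# Range implied by the text: 3-5 x 10^5 pfu per probe, 30,000-50,000 plaques per dish.
fewest = plates_needed(300_000, 50_000)  # smallest screen at the highest density
most = plates_needed(500_000, 30_000)    # largest screen at the lowest density
print(f"{fewest} to {most} dishes per probe")  # 6 to 17 dishes per probe
```

Rounding up reflects that a partial dish still has to be plated.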

Chromosome mapping panels.
The chromosomal location of all clones was verified by at least one of two methods. The first was Southern blotting to a somatic cell hybrid mapping panel containing the following human-rodent somatic cell hybrid lines: 7AE4 (containing an intact human chromosome 17; Leach et al., 1989), 12.3B5 (17p-q12, GM10659), MH41 (17q23.2-qter, GM10502), and L(17n)C (chromosome 17q; Leach et al., 1989). An appropriate hybridization pattern allowed assignment to human 17q21-q23. The second method employed a screening Southern blot containing two samples (plus a molecular weight standard). Lane 1 contained total human genomic DNA, and lane 2 contained DNA from a pool of 400 cosmids from the region. The cosmid sample was "normalized" for copy number to the genomic sample (10 μg/lane) by loading less cosmid pool DNA (10 ng/lane). These samples were restricted with PstI, and multiple "two-lane" blots were produced. The cosmid lane served to represent the genomic interval of interest. Clones were said to map to the region if equivalent hybridization bands were observed in the cosmid and human genomic lanes. Exons and cDNA clones were ...

Exon amplification.
Exon amplification was performed essentially as previously described (Buckler et al., 1991) using the modified cloning vector pSPL3 (Church et al., 1994). For a detailed description and discussion of the exon amplification methods, cosmids employed, and characterization of isolated exons, see Abel et al. (1994). Briefly, pools of 6-10 cosmids were used for each exon amplification reaction. These pools included many redundant cosmids not shown in the minimal set (Couch et al., 1995, Fig. 1). Plasmid clones containing putative exons derived from these experiments were arrayed in microtiter dishes for further characterization. Filter replicas of these dishes were then hybridized sequentially with human Cot-1 DNA, the amplification vector, and PCR-amplified inserts of individual clones. These hybridizations allowed the identification of clones containing human reiterated sequences, cloning artifacts, and redundant exons, respectively. The resulting set of unique clones is shown in Fig. 1.
Clones containing trapped exons were mapped as described above and used as probes to screen cDNA libraries.

Direct selection.
Direct selection was carried out using the magnetic bead capture protocol as described (Tagle et al., 1993; Couch et al., 1994). Cosmid DNAs spanning the region between cosmids 96H6 and 119D4 (Fig. 1) were placed in two pools (seven cosmids per pool) and digested with Sau3AI or AluI. After linker addition and amplification with biotinylated primers, the cosmid DNA was hybridized in solution to an excess of amplified breast cDNA library inserts. Specifically bound material was eluted at high temperature and cloned into plasmid vectors. Inserts from these clones were then analyzed as outlined above.

Direct screening.
Cosmids were digested with the restriction enzyme NotI, and inserts were isolated from low-melting-temperature agarose (SeaPlaque, FMC) and random-prime labeled individually using standard methods (Feinberg and Vogelstein, 1983). The labeled products of up to five cosmids were then pooled. Pooled probes were mixed with 150 μg of sheared human placental DNA and 300 μg of sheared salmon sperm DNA in TE (500 μl). After boiling (10 min), the DNAs were preannealed at 65°C for 1 h. Prehybridization and hybridization of cDNA library filters were carried out in 7% SDS, 0.25 M Na2HPO4, 1 mM EDTA, and 1% bovine serum albumin. Filters were washed in 2× SSC, 1% SDS at 65°C (twice, 30 min each) before exposure to X-ray film. After a 24-h exposure, primary plaque signal intensity varied from just over background to several-fold over background.

Evolutionary conservation and island rescue PCR.
Cosmid fragments containing single-copy DNA were isolated by subcloning restriction fragments that failed to hybridize with labeled total human genomic DNA. These subclones were used to probe Southern blots containing human, gorilla, rodent (mouse, hamster, and rat), and Torpedo californica (Pacific electric ray) DNA. Probes that hybridized to the DNA of both human and other species were sequenced.

Sequence analysis.
Double-stranded DNA sequencing was performed on an automated sequencer (Applied Biosystems or Pharmacia) using fluorescence methodology as recommended by the manufacturer. The BLASTN (Altschul et al., 1990) and BLASTX (Gish and States, 1993) programs were used to search GenBank (National Center for Biotechnology Information) with sequence from putative transcribed clones. The Gene Recognition and Analysis Internet Link program (GRAIL, version 1.2; Uberbacher and Mural, 1991) was accessed through the XGRAIL client (Shah et al., 1994) on a Sun Sparc10 workstation. GRAIL 2 analysis was performed on the 21 kb of genomic sequence from the EDH17B2 locus, which was deposited in GenBank by Peltoketo et al. (1988) (Accession No. M84472). The sequences of clones reported in Table 1 have been deposited with GenBank.

RESULTS AND DISCUSSION
A map of transcripts is shown in Fig. 1; its features are described below, first in terms of isolation methods and then by sequence information.
FIG. 1. Schematic diagram of the transcript map in the BRCA1 region between markers HSD-A3T (Friedman et al., 1993) and the PPY gene. The minimal cosmid contig is represented by thin horizontal bars (Couch et al., 1995) from the centromeric end (top left) to the telomeric end (bottom right). Cloned transcripts are represented by thick bars. Arrows at the ends of transcripts indicate the direction of transcription, if known. Previously unidentified genes are named with the prefix "T" (transcript) and a Greek letter. Human homologues of genes cloned from other species are named with the prefix "h" (human). Downward vertical thin arrows indicate the map location of trapped exons, and vertical bars with squares indicate direct-selected cDNA clones. The RNU2 region is represented by a gray box (see text). Cosmids, genes, and exons are not drawn to scale. Genes are drawn to represent accurately the hybridization of cDNA clones to specific cosmids. The overlapping cosmid and P1 contig contains two gaps, one at the break of the top right edge and one between cosmids 102E12 and 52C6 on the bottom.

Exon amplification.
Cosmid DNAs representing 39 of the 45 cosmids in our contig were used as templates for exon amplification. Two exon amplification experiments were performed: the first using unordered cosmid pools and the second with cosmids pooled by their contig location (Abel et al., 1994). Over 2000 primary clones were isolated and arrayed in microtiter dishes. From this starting material a nonredundant collection of exons was derived. This set contained a total of 76 exons that mapped to the region of interest (Fig. 1). Exon amplification was applied more extensively than any other method presented here and proved highly effective: putative exons were isolated from almost every cosmid tested. Trapped exons were isolated as plasmid clones that can be readily sequenced, allowing rapid comparison to nucleotide databases. Because they are directionally cloned, the orientation of transcription can be determined from this sequence. An added advantage of this method is the ease of determining intron-exon borders by comparing the sequence of trapped exons to cDNA sequences. In several cases, multiple unique exon clones were shown to be contained in a single cDNA (E1A-F, glucose-6-phosphatase, human Enhancer of zeste, and others). In this study, the major disadvantage of transcript isolation by exon amplification was the small size of some of the trapped exons. Although the average trapped exon was 200 bp, the distribution was clearly skewed toward smaller exons (median = 130 bp; Abel et al., 1994). In the case of the BRCA1 gene, we isolated clones corresponding to exons 20 (84 bp) and 22 (74 bp) (Miki et al., 1994). Clones of this size are difficult to use as hybridization probes for screening cDNA libraries by conventional methods.
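The reduction of more than 2000 primary clones to a nonredundant exon set rests on the three sequential filter hybridizations described under Materials and Methods (Cot-1 DNA, amplification vector, then individual inserts). The sketch below models that triage as a decision sequence; the record fields, clone names, and group labels are hypothetical, and real filter signals are of course graded rather than strictly yes/no.

```python
from dataclasses import dataclass

@dataclass
class ExonClone:
    name: str              # hypothetical clone identifier
    cot1_positive: bool    # signal with human Cot-1 DNA -> reiterated sequence
    vector_positive: bool  # signal with the amplification vector -> cloning artifact
    insert_group: str      # hypothetical label shared by clones whose inserts cross-hybridize

def triage(clones):
    """Apply the three filter hybridizations in order; keep one clone per insert group."""
    result = {"unique": [], "repeat": [], "artifact": [], "redundant": []}
    seen = set()
    for clone in clones:
        if clone.cot1_positive:
            result["repeat"].append(clone.name)
        elif clone.vector_positive:
            result["artifact"].append(clone.name)
        elif clone.insert_group in seen:
            result["redundant"].append(clone.name)
        else:
            seen.add(clone.insert_group)
            result["unique"].append(clone.name)
    return result

clones = [
    ExonClone("c1", True, False, "g1"),
    ExonClone("c2", False, True, "g2"),
    ExonClone("c3", False, False, "g3"),
    ExonClone("c4", False, False, "g3"),
    ExonClone("c5", False, False, "g4"),
]
print(triage(clones))
# {'unique': ['c3', 'c5'], 'repeat': ['c1'], 'artifact': ['c2'], 'redundant': ['c4']}
```

Checking for repeats before artifacts mirrors the order in which the filters were probed; a clone positive in both would be discarded at the first step.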

Direct selection.
A total of 14 cosmids were digested with Sau3AI and AluI and hybridized to PCR-amplified inserts from a breast cDNA library for the direct selection of clones. Of 32 clones analyzed from one experiment, 23 (72%) mapped to the region of interest of chromosome 17, although more than half (14) of these contained Alu sequences in addition to single-copy sequence (Couch et al., 1995). Because such clones are difficult to use in hybridization experiments, they were not characterized further. This set of clones identified 2 genes that were not isolated by other methods (Table 1).
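The direct-selection yield can be restated as two fractions: the share of analyzed clones that mapped to the region, and the share of those on-target clones that also carried Alu sequence. A quick check of the numbers given in the text (the helper name is illustrative):

```python
def percent(part: int, whole: int) -> float:
    """Percentage of whole represented by part."""
    return 100.0 * part / whole

analyzed = 32   # direct-selected clones analyzed from one experiment
on_target = 23  # clones mapping to the chromosome 17 region of interest
with_alu = 14   # on-target clones that also carried Alu sequence

print(round(percent(on_target, analyzed)))  # 72
print(round(percent(with_alu, on_target)))  # 61
print(on_target - with_alu)                 # 9 on-target clones without Alu
```

The 61% figure is what "more than half" refers to.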

Direct screening.
The inserts of 25 cosmids represented in the minimal contig were used to probe the oligo(dT)-primed, normal breast cDNA library. These screens resulted in the isolation of four genes. Direct hybridization methods are often complicated by the isolation of transcribed repeats. In this study one cDNA clone, MTO-14, was found to contain a repetitive sequence and adjacent unique sequences. These unique sequences hybridized specifically to the 93312 and 41A8 cosmids. In general, direct library screening yielded few genes per pooled cosmid probe.
The major disadvantage of this method was its low efficiency of transcript isolation. Its advantages over the methods described above are its simplicity and the fact that it isolates intact cDNA clones.
TABLE 1 (GenBank accession numbers T27163-T27205 and T27223)
Note. Rows containing map features are in centromere to telomere order as in Fig. 1. Clones marked with an asterisk contain repetitive and single-copy sequences (see text). A plus sign is used to denote exon clones that have been trapped from the minus strand of known genes (see text). Initial isolation methods are as follows: A, exon amplification; B, direct selection; C, direct screening; D, evolutionary conservation; E, GRAIL; F, island rescue PCR. Accession numbers are not given when the sequence of clones presented here was identical to those in GenBank. Likewise, a name does not appear in the gene name column if the nucleotide sequence of a cloned fragment was identical to the sequence of the database entry. Blank entries in the database homology column indicate that no significant matches were found. The accession number of the best GenBank match is shown. In many cases additional GenBank entries scored nearly as high as the best match. mRNA size is shown if the isolated clones were used to probe Northern blots. Blank entries indicate that a Northern blot was not performed.

Island rescue and evolutionary conservation.
Although these two methods were applied to only three cosmids and one YAC, they yielded four genes. Island rescue PCR was performed on the 26F3 YAC, resulting in the isolation of two genes from the human endothelial cell cDNA library (Table 1).
Clones containing evolutionarily conserved sequences were isolated as a by-product of experiments aimed at cloning "single-copy" DNA fragments from the region. In the process of mapping these clones on the chromosome 17 somatic cell hybrid panel, signals were noted in lanes containing the DNA of other vertebrates (Fig. 2A). Subsequent sequence analysis revealed that two of the three clones had significant similarity to sequences in GenBank. A subclone from cosmid 29F5 contains a portion of the glucose-6-phosphatase gene (Fig. 3), and the subclone spanning cosmids 134A3, 140E9, and 86G11 contains a portion of the MOX1 gene (data not shown).

Sequence-based exon prediction.
GRAIL analysis of the published 21,788-bp region of genomic DNA containing the EDH17B1 and EDH17B2 genes predicted the presence of a previously undescribed transcription unit proximal to the EDH17B1 gene (Fig. 4). Three putative exons were amplified from genomic DNA and used to screen a breast cDNA library, resulting in the isolation of several cDNA clones containing the predicted exons spliced together.

Efficiency of transcript isolation.
The goal of this study was to isolate the maximum number of transcripts, and transcript identification was carried out as a dynamic process: regions of the contig in which transcripts were efficiently isolated by the first method applied were not characterized as intensively by additional methods. Products of each method were cross-hybridized to each other, and in several cases fragments of the same transcript were isolated by independent methods (Table 1). In the case of glucose-6-phosphatase (Fig. 3), different segments of the same gene were isolated by different methods. Clones isolated in subsequent experiments that were completely contained within existing clones (regardless of the isolation method) were not characterized further and do not appear in Table 1. Therefore, these results cannot be used to compare transcript isolation methods rigorously. In most cases, the partial cDNAs were then used to screen cDNA libraries to extend the initial clones.
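Why the results cannot rank the methods becomes clearer if the "contained within existing clones" rule is written out. The sketch below assumes, hypothetically, that every clone can be placed as an interval on one shared reference coordinate (for example a full-length cDNA); clone names and coordinates are invented for illustration.

```python
def filter_contained(clones):
    """Keep each clone unless it lies entirely within a clone kept earlier.

    clones: (name, start, end) tuples in order of isolation, with coordinates on a
    shared reference such as a full-length cDNA (a hypothetical simplification).
    """
    kept = []
    for name, start, end in clones:
        if any(k_start <= start and end <= k_end for _, k_start, k_end in kept):
            continue  # contained in an existing clone: not characterized further
        kept.append((name, start, end))
    return [name for name, _, _ in kept]

isolated = [
    ("exonA", 100, 250),   # trapped exon, isolated first
    ("cdnaB", 1, 900),     # cDNA isolated later; contains exonA, but exonA is already kept
    ("exonC", 300, 420),   # lies within cdnaB -> dropped
    ("exonD", 880, 1000),  # extends past cdnaB -> kept
]
print(filter_contained(isolated))  # ['exonA', 'cdnaB', 'exonD']
```

Reordering the same clones so that cdnaB comes first drops exonA as well, so the credit each method receives depends on which experiment happened to run first.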
We have failed to clone fragments of at least two genes known to be in this interval. The VHR phosphatase gene has been independently isolated and localized to this region by Kamb et al. (1994) and Friedman et al. (1995). A 4.3-kb cDNA encoding the putative MTO-135 I CA125 antigen, a B-box protein, has also been mapped to this interval by Campbell et al. (1994) and Osborne-Lawrence et al. (1995). Both of these genes are known to be represented in our cosmid collection, highlighting the fact that current applications of transcript isolation methodologies are less than 100% effective.

Characterization of Isolated Transcripts
Placement of features on the map.
Borrowing a term from the cartographer's lexicon, we refer to the transcript clones as "features." For classification purposes we divided the products of our transcript search into three categories: genes, expressed sequence tags (ESTs), and exon sequence tags (ExSTs) (Table 1). These features are distributed across the entire contig (Fig. 1). The chromosomal location of clones was first verified by hybridization to somatic cell hybrid panels as described under Materials and Methods (Fig. 2). Higher resolution mapping was achieved by hybridizing individual clones to gridded arrays of the complete cosmid contig. At least one feature was mapped to each cosmid used for transcript searching. The cosmids 120D8, 95G7, 102E12, and 52C6 were not used in our transcript isolation effort. The transcript sizes of several genes have been determined by Northern blot analysis (Table 1).
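The assignment to 17q21-q23 follows from the chromosome 17 content of the four hybrids listed under Materials and Methods: a probe in that interval should light up the whole-chromosome and 17q hybrids but neither the 17p-q12 nor the 17q23.2-qter hybrid. This simplified sketch encodes that logic; it assumes each hybrid retains exactly the segment listed and ignores bands near breakpoints and rodent cross-hybridization.

```python
# Chromosome 17 content of each hybrid line, as listed under Materials and Methods.
PANEL = {
    "7AE4": "intact chromosome 17",
    "12.3B5": "17p-q12",
    "MH41": "17q23.2-qter",
    "L(17n)C": "17q",
}

def localize(hits: dict) -> str:
    """Simplified interval assignment from human-specific bands in each hybrid.

    hits maps each hybrid name in PANEL to True if the probe detects a human band.
    """
    missing = set(PANEL) - set(hits)
    if missing:
        raise ValueError(f"no result for hybrids: {sorted(missing)}")
    if not hits["7AE4"]:
        return "not detected on chromosome 17"
    if not hits["L(17n)C"]:
        return "17p"
    if hits["12.3B5"]:
        return "17q11-q12"   # on 17q and within the 17p-q12 segment
    if hits["MH41"]:
        return "17q23.2-qter"
    return "17q21-q23"       # on 17q, distal to q12 and proximal to q23.2

print(localize({"7AE4": True, "12.3B5": False, "MH41": False, "L(17n)C": True}))
# 17q21-q23
```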
It should be noted that the exact physical distance spanned by these cosmids is not known. Our cosmid contig contained three gaps (Couch et al., 1995), one of which is now known to contain the majority of the BRCA1 gene (Miki et al., 1994; Futreal et al., 1994b). The 40 cosmids used for transcript isolation overlap each other to varying extents (Couch et al., 1995). Assuming conservatively that each cosmid represents only 15 kb of unique genomic DNA (i.e., not found in the adjacent cosmid), we have surveyed approximately 600 kb of genomic material. This estimate does not include the region containing the RNU2 gene cluster, which is known to contain approximately 20 copies of this gene spanning 120 kb of genomic DNA (Westin et al., 1984).
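The 600-kb figure is a simple product of the number of cosmids and the conservative per-cosmid contribution. A sketch of that estimate; the 20-kb case is a hypothetical comparison showing how sensitive the total is to the assumed overlap.

```python
def surveyed_kb(n_cosmids: int, unique_kb_per_cosmid: float) -> float:
    """Genomic DNA surveyed if each cosmid adds this much sequence not in its neighbor."""
    return n_cosmids * unique_kb_per_cosmid

print(surveyed_kb(40, 15))  # 600, the conservative estimate used in the text
print(surveyed_kb(40, 20))  # 800, if each cosmid contributed 20 kb (hypothetical)
```

The RNU2 cluster is excluded from either figure, as stated above.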

Genes.
There are two overlapping criteria by which cloned material was placed into the "gene" category.
(1) Sequence of the cloned fragment revealed significant homology to a known transcript. The vacuolar proton pump 1 (VPP1), E1A-F, glucose-6-phosphatase, and MOX1 genes are examples of cases in which the isolation of a single fragment provided sufficient information to infer that a clone was part of a bona fide gene.
(2) The fragment is part of a cDNA clone. Intact cDNA library clones were isolated by the island rescue and direct screening methods; these clones are therefore considered "genes" for mapping purposes. DNA fragments cloned by exon amplification or direct selection were placed in the gene category only when they met criterion one or were successfully used as probes to isolate clones from cDNA libraries (criterion two).
BRODY ET AL.
FIG. 3. ... Lei et al. (1993). The sequence of clone MTO-135 reveals an exact match to nucleotides 134-245 of exon I, which spans nucleotides 1 to 309 of the G6PT cDNA. Because exon-amplified fragments are directionally cloned, it can be determined that MTO-135 was originally trapped from the DNA strand opposite to that of G6PT exon I. The clone MTO-108 was isolated as an evolutionarily conserved fragment (a 1.2-kb subclone) of cosmid 29F5. This genomic clone spans part of G6PT intron IV and exon V.
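The Fig. 3 legend infers the trapped strand of MTO-135 from directional cloning. The underlying check can be sketched as follows, assuming that a directionally cloned exon is read in the orientation fixed by the vector's splice sites, so a match to the reverse complement of the cDNA sense strand means the exon was trapped from the opposite strand. The sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def trapped_strand(exon_seq: str, cdna_sense: str) -> str:
    """Orientation of a directionally cloned exon relative to a cDNA's sense strand."""
    if exon_seq in cdna_sense:
        return "same strand"
    if reverse_complement(exon_seq) in cdna_sense:
        return "opposite strand"
    return "no exact match"

cdna = "ATGGCCTTAGCAGGTTCA"               # hypothetical sense-strand sequence
print(trapped_strand("GCCTTAGC", cdna))  # same strand
print(trapped_strand("CCTGCTAA", cdna))  # opposite strand
```

Exact substring matching stands in for the alignment a real analysis would use; palindromic sequences would match both strands and are reported as "same strand" here.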
In total, 26 genes were placed on the map. Fragments of 2 genes already known to map to this interval, EDH17B2 and PPY, were isolated. In addition, a previously unrecognized transcription unit was identified in the genomic sequence of the EDH17B region (see above).
We isolated fragments of two previously cloned genes: glucose-6-phosphatase (G6PT), the gene responsible for glycogen storage disease type 1a (Lei et al., 1993), and E1A-F, an ets-oncogene-related transcription factor (Higashino et al., 1993). The precise chromosomal location of these genes was previously unknown, although the G6PT gene had been localized to chromosome 17 (Lei et al., 1994). Clones corresponding to three additional genes, the Ki antigen, MOX1 (Futreal et al., 1994a), and BRCA1 (Miki et al., 1994), were also independently isolated and mapped to this interval by others during the course of this work. Of the remaining 21 genes, 11 detect similar sequences in GenBank by BLAST analysis. Of these, 3 genes, HMG-17 (Landsman et al., 1986), γ-tubulin (Zheng et al., 1991), and ribosomal protein L27 (Gallagher et al., 1994), are members of gene families. A fourth, the FLT4 tyrosine kinase (Aprelikova et al., 1992), hybridizes to specific cosmids from this interval and to additional sites in the genome. Sequence obtained to date from this cDNA clone is restricted to the 3' untranslated region of the FLT4 gene (data not shown). In the absence of genomic sequence from this region, we are unable to determine whether the cDNAs isolated for these genes were transcribed from these exact loci on chromosome 17 or from the loci of other family members. Five additional genes appear to be human homologues of genes cloned from other species. As mentioned above, the human MOX1 clones isolated represent the human equivalent of this mouse gene (Futreal et al., 1994a).
The human vacuolar proton pump 1 cDNA shares 97% amino acid identity over 29 residues with the 116-kDa subunit of the bovine vacuolar proton pump (Peng et al., 1994). A rat gene probe was previously used to localize the human VPP1 homologue to 17q21-qter in somatic cell hybrids (Ozcelik et al., 1991). The human Enhancer of zeste (E(z)) gene shares 69% amino acid similarity with the Drosophila E(z) gene, a negative regulator of the Antennapedia and Bithorax gene complexes (Jones and Gelbart, 1993). The complete cDNA sequence of the human E(z) gene will be reported elsewhere (Abel et al., manuscript in preparation). Partial sequence of the human copper amine oxidase cDNA reveals 77% amino acid identity over a 193-amino-acid segment with bovine serum copper amine oxidase (Mu et al., 1994). Determination of the complete sequence of this gene is in progress. We have isolated a clone identical to a portion of EST-HFBCV14 (Adams et al., 1991), which we infer to be part of the human vesicle amine transporter 1 gene (VAT-1) by virtue of the homology between the EST and the Torpedo californica (Pacific electric ray) VAT-1 gene.
The GenBank BLAST alignments of four genes revealed limited but significant similarity (smallest sum probability < 1 × 10⁻⁵) to sequences in GenBank. Partial sequence of clone MTO-61 reveals limited homology (46% amino acid identity in a 48-amino-acid overlap) to the rat and bovine neurexins (Ushkaryov and Sudhof, 1993). Exon clone MTO-134 shares 44% amino acid identity (82% similarity over a 29-amino-acid overlap) with the human and chicken 23-kDa component (p23) of the progesterone receptor complex (Johnson et al., 1994; Smith and Toft, 1992). The exon clone MTO-173 is contained within cDNAs isolated with MTO-134 (data not shown). This exon shares no homology with the p23 gene, suggesting that the homology between these two genes is limited. Sequence of the exon MTO-192 reveals 70% amino acid similarity with the Caenorhabditis elegans and Saccharomyces cerevisiae pre-mRNA splicing factors 22 and 16 (Company et al., 1991; Sulston et al., 1992). Additional cDNA-derived sequences demonstrate that this homology is restricted to the carboxyl half of these genes (Ostermeyer et al., in preparation). The sequence of trapped exon clone MTO-122 is 40% identical and 60% similar (over a 60-amino-acid overlap) to the Xenopus int-2 (FGF-3) protein (Tannahill et al., 1992). This region is also common to mouse keratinocyte growth factor Fgf-7 (Dickson and Mason, 1993) and the DNA binding protein PRDII-BF1 (Fan and Maniatis, 1990).
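The paired "identity" and "similarity" percentages above are both fractions of the aligned overlap. The sketch below shows how such figures are computed from two aligned rows. It assumes a simple conservative-substitution grouping chosen for illustration; BLAST itself counts a position as similar when its substitution-matrix score is positive, so the grouping here will not reproduce BLAST values exactly.

```python
# Illustrative conservative-substitution groups (an assumption; BLAST instead counts
# positions with a positive substitution-matrix score as "similar").
GROUPS = ["ILMV", "FWY", "KRH", "DE", "NQ", "ST", "AG"]
GROUP_OF = {aa: i for i, group in enumerate(GROUPS) for aa in group}

def identity_similarity(a: str, b: str) -> tuple:
    """Percent identity and similarity over a pairwise alignment.

    Both strings are aligned rows of equal length; '-' marks a gap, and every column
    (gaps included) counts toward the overlap length used as the denominator.
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    identical = similar = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        if x == y:
            identical += 1
            similar += 1
        elif x in GROUP_OF and GROUP_OF[x] == GROUP_OF.get(y):
            similar += 1
    n = len(a)
    return 100 * identical / n, 100 * similar / n

print(identity_similarity("MKVLD", "MRILE"))  # (40.0, 100.0)
print(round(0.46 * 48))  # 22: approximate identical residues behind "46% over 48 aa"
```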
Partial sequence from the seven remaining genes reveals no significant nucleotide or amino acid similarities to the entries in GenBank. It will be necessary to obtain full-length cDNA sequence of these genes before it can be determined whether they represent previously described transcription units.
ExSTs. Our second map feature category consists of products of exon amplification experiments that map to specific cosmids but have not detected cDNA clones in breast or fetal brain cDNA libraries after screening of at least 5 × 10⁵ recombinant phage. The complete sequence of these trapped exons has been determined and searched against GenBank; no similar sequences were found. In several cases these 39 unique clones appear to be clustered in the contig. Most notably, the 81C6 and 134A3 cosmids (Fig. 1) contain 10 ExSTs. At least two other, smaller clusters are found in the contig (e.g., the distal end of cosmid 90E4 and cosmid 32B5). Even though these exons are distributed across these areas in a pattern resembling a gene, we have been unable to demonstrate, either by sequence similarity to known genes or by the isolation of cDNAs, that they belong to a contiguous transcription unit. A likely explanation is that an insufficient number of tissues or developmental stages have been surveyed for expression. It is also possible that a more sensitive test such as RT-PCR will be required to detect low-level transcripts from these areas. Further studies will be needed to determine whether these ExSTs are constituents of true transcription units or artifacts of the exon amplification process.
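The clusters described above were identified by inspecting the map. One way to make "clustered" explicit is to scan adjacent cosmids in contig order and flag windows whose combined ExST count reaches a threshold. In this sketch the cosmid names, counts, window size, and threshold are all hypothetical, and physical distance is ignored because cosmid overlaps are not precisely known.

```python
def dense_windows(counts, window=2, threshold=5):
    """Find runs of adjacent cosmids whose combined ExST count reaches threshold.

    counts: (cosmid, n_exst) pairs in centromere-to-telomere contig order.
    """
    hits = []
    for i in range(len(counts) - window + 1):
        block = counts[i:i + window]
        total = sum(n for _, n in block)
        if total >= threshold:
            hits.append(([name for name, _ in block], total))
    return hits

contig = [("cosA", 1), ("cosB", 6), ("cosC", 4), ("cosD", 0), ("cosE", 2)]  # hypothetical
print(dense_windows(contig))
# [(['cosA', 'cosB'], 7), (['cosB', 'cosC'], 10)]
```

Overlapping windows can flag the same dense cosmid twice, so adjacent hits would be merged before being reported as one cluster.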
ESTs. The third category is ESTs. These clones are the products of direct selection experiments. Because they are amplified from a cDNA library, they represent fragments of transcribed sequences, but the evidence is insufficient to classify them as genes by the criteria outlined above. As is the case for the ExSTs, the sequences of these fragments are not similar to sequences in the public databases. Like many randomly generated ESTs (Adams et al., 1991), these fragments are of unknown function; unlike the majority of ESTs, however, their precise chromosomal location is now known.
In conclusion, we have isolated transcripts from approximately 600 kb of the human genome. It is likely that this interval of the genome contains at least 26 unique genes. Although we are unable to calculate the exact gene density of this region, our finding of approximately 1 gene every 20 kb is close to the value expected if the human genome contains 100,000 randomly distributed genes (i.e., 1 gene per 30 kb). This transcript map and further refinements of it will add to our knowledge of genome structure, gene organization, and expression.
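The gene-density comparison can be checked directly. The sketch divides the surveyed length by the number of genes found and compares it with the spacing expected for 100,000 genes; it assumes a haploid genome of about 3 × 10⁹ bp, which is the size implied by the 30-kb figure in the text.

```python
surveyed_kb = 600        # conservative estimate of DNA surveyed (see above)
genes_found = 26         # minimum number of genes placed on the map
genome_kb = 3_000_000    # assumed haploid human genome size, ~3 x 10^9 bp
genes_in_genome = 100_000

observed_kb_per_gene = surveyed_kb / genes_found
expected_kb_per_gene = genome_kb / genes_in_genome
print(f"observed: {observed_kb_per_gene:.1f} kb/gene")  # observed: 23.1 kb/gene
print(f"expected: {expected_kb_per_gene:.1f} kb/gene")  # expected: 30.0 kb/gene
```

Both inputs to the observed value are approximate, so the comparison supports only the qualitative statement that the region is not unusually gene-poor.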