Recombination‐activating gene proteins: more regulation, please

Summary: Developing B and T cells assemble gene segments in order to create the variable regions of immunoglobulin and T‐cell receptors required by our adaptive immune response. The chemistry of this recombination pathway requires a specific nuclease and a more general repair pathway for double‐strand breaks. A complex of the recombination‐activating gene 1 (RAG1) and RAG2 proteins provides the nuclease activity. In fact, RAG1 and RAG2 probably coordinate many steps involving the coding and signaling DNA sequences. Studies using deletion and truncation mutants of the RAG proteins demonstrate that each of them contains a functional core region, representing about two‐thirds of the polypeptide. While the core regions are sufficient to catalyze recombination in test systems, the full‐length proteins seem to show more complicated behaviors in vivo. A plausible explanation is that regions outside the core help in the proper regulation of recombination. The non‐core region of RAG1 has been found to contain a ubiquitin ligase. Regulatory functions may contribute to autoregulation of the proteins involved, fidelity of the reaction, protection of the cell from translocations, coordination of recombination with the cell cycle, and possibly modification of the chromatin structure of target DNA.


Introduction
Site-specific recombination is used by prokaryotes and eukaryotes to control DNA in a tightly regulated and heritable manner (1). In vertebrates, the only examples known are V(D)J recombination and isotype switching (reviewed in 2). Certain features distinguish V(D)J recombination from other examples. Although the process is quite precise in positioning recombination events at the heptamer border of the recombination signal sequences (RSSs), it is deliberately sloppy in the outcome of recombination at the coding ends, thus introducing short insertions or deletions in the coding regions of the V, D, or J elements. The resulting junctional diversity raises the variability of coding sequence precisely in the antigen-binding portion of the translated protein products, and it raises the number of potential products by several orders of magnitude over the pure mathematical diversity determined by the number of recombining segments. A second rare property is the higher degree of specificity required in order to make the process more efficient in yielding sensible products. If all RSS elements generated recombinants indiscriminately, most recombination events would generate useless products (e.g. recombination occurring between two V regions). To escape this fate, the so-called 12/23 rule (3) evolved, which specifies that two different RSS classes exist, containing a spacer sequence of 12 or 23 nucleotides. Recombination occurs almost entirely at pairs of RSS elements formed by one of each type. By arranging that all V regions use RSSs of the same length while potential partners (perhaps a D element) use the other, only V-to-D events will occur. Recombination is through breakage and rejoining, and cleavage of DNA substrates can be observed in reconstituted systems using only a few proteins.
Recombination-activating gene 1 (RAG1) and RAG2 (4) in the form of a multi-subunit complex (5,6) are capable of cutting individual DNA molecules, while coordinated cleavage at pairs of RSS elements can be obtained with RAG1, RAG2, and either of the DNA-bending high-mobility group (HMG) proteins, HMG1 (7) or HMG2 (8). Our understanding of the stoichiometry and structure of the cleavage complex is still evolving. An important issue is precisely how a complex of RAG1 and RAG2 can bind to both types of RSSs and how it favors outcomes with the desired pairing over the alternatives. This subject has been reviewed (9), and recent contributions toward an appreciation of the structure of the protein-DNA complex favor a single tetramer composed of two molecules of each RAG protein (10,11), although alternative stoichiometry has also been reported (12). Further regulation at the level of the RAG complex is addressed later.
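The pairing logic of the 12/23 rule described above can be made concrete with a short sketch. It is an idealization offered for illustration only: the text says recombination occurs "almost entirely" between unlike RSS classes, whereas the code treats the rule as absolute. The function names and the toy locus (segment names and their spacer assignments) are hypothetical and are not taken from any real locus map.

```python
# Spacer lengths (in nucleotides) of the two RSS classes named in the text.
RSS_SPACERS = (12, 23)


def obeys_12_23_rule(spacer_a: int, spacer_b: int) -> bool:
    """Return True when the two RSSs belong to different spacer classes."""
    if spacer_a not in RSS_SPACERS or spacer_b not in RSS_SPACERS:
        raise ValueError("spacer length must be 12 or 23")
    return spacer_a != spacer_b


def permitted_pairs(segments: dict) -> list:
    """List every pair of segments whose RSSs form a 12/23 pair.

    `segments` maps a segment name to the spacer length of its RSS.
    """
    names = sorted(segments)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if obeys_12_23_rule(segments[a], segments[b])
    ]


# Hypothetical toy locus: two V segments sharing one RSS class and a
# single D segment carrying the other. The assignments are illustrative.
toy_segments = {"V1": 23, "V2": 23, "D1": 12}

if __name__ == "__main__":
    for pair in permitted_pairs(toy_segments):
        print(pair)
```

In this toy locus the V1/V2 pair is excluded because both segments carry the same RSS class, which is the unproductive V-to-V outcome the rule is said to prevent; only the two V-to-D combinations remain.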
Recombination is known to be regulated at several levels, and there is a need to identify additional mechanisms to account for the complexity of chromosomal rearrangements. Obviously, the regulation of RAG protein quantity helps limit recombination to specific cell lineages and developmental states. Regulation of synthesis is one side of this coin, and protein turnover is the other. In addition, important regulation may occur through protein post-translational modification.
Chromatin accessibility is also a key regulator of the use of specific target sequences. Analysis of chromatin structure at sites of recombination reveals a significant association with certain histone modifications. Recent reports primarily address histone acetylation and occasionally methylation. Perhaps the list of modifiers deserves to be expanded.
A major unknown aspect of V(D)J recombination is the manner in which it is coordinated with DNA repair. In the largest sense, this can include the fundamental aspects of cell growth, including DNA replication and cell cycle control. These topics are addressed below.
A new enzymatic activity has recently been found in the N-terminal non-core region of RAG1 (13). The RING finger structure has been shown to be capable of acting as a ubiquitin ligase in vitro. The implications of this observation are explored in the following discussion.

RAG protein domains
Chopping away parts of a protein to see what happens is a crude device, but it can reveal the existence of separable protein functions, provided that the whole is not destroyed by the manipulation. An enzymatic core was defined in this manner for RAG1 (14-16) and RAG2 (17,18). The core regions are competent to mediate complete V(D)J recombination on artificial substrates, but they show some significant differences from the complete protein when tested under ideal physiologic conditions. Specifically, with respect to RAG1, gene replacement of full-length RAG1 by the core alone produced mice with reduced numbers of circulating B and T lymphocytes (19). These circulating cells were essentially normal in identity and in the properties of their antigen receptors. Some might consider it remarkable that recombination proceeds as normally as it does in the absence of approximately 40% of the RAG1 peptide. Analysis of different stages of development suggests that the efficiency of recombination is reduced, but this inefficiency may be partially compensated by clonal expansion of the cells that complete the process correctly. An open question is whether the inefficiency is a pure property of the core region acting as a poorer nuclease, or whether the inefficiency reflects the absence of a helpful activity located in the N-terminal region of RAG1 (20-22). It should be observed that a human patient with a homozygous mutation in the N-terminal portion of RAG1 (leading to translation initiation at an internal methionine) showed a disproportionate reduction in B-cell development compared to T cells. The stimulating possibility is that this finding reflects a distinct role for RAG1 in each of the two lineages (23). In the case of RAG2, there is a different twist. It had already been shown in cell lines that V(D)J recombination at the immunoglobulin heavy-chain (IgH) locus was peculiarly dependent on full-length RAG2 to complete the second round of recombination.
This step typically connects a V region to the DJ segment made in the first round (24). Two reports (25,26) show that the same general behavior applies to mice expressing only the core RAG2 gene and lacking the C-terminal 37%. In addition, a reduction in the equivalent step of T-cell receptor β (TCRβ) was observed, while a reduction in total TCRδ rearrangement was reported by the second group. Most remarkably, Liang et al. (25) find that the non-core portion of RAG2, as a separate molecule, is able to complement the defect in IgH rearrangement, when transiently expressed in Abelson murine leukemia virus-transformed pre-B cell lines derived from the core RAG2-expressing mice. One intriguing possibility is that the non-core portion of RAG2 plays a separate role, possibly independent of the nuclease complex. The alternative is that it is able to associate with and complete a RAG complex despite the non-covalent linkage.
The C-terminal portion of RAG2 may connect in yet another way to the general wellbeing of cells undergoing DNA recombination. It has been a major goal to understand whether aberrant V(D)J recombination contributes to chromosomal translocations. These products are frequently observed in leukemias and lymphomas. While direct RAG-mediated transposition has been demonstrated in vitro (reviewed in 27), it comes as some surprise that transposition has been relatively uncommon in cells, though detected at low levels (28). This does not seem to be a consequence of an intrinsic high stringency of the RAG proteins in recognizing their substrates. Cryptic sites occur frequently (29) and are able to function in test substrates. Rather, full-length RAG2 seems especially capable of inhibiting the reaction that leads to at least one cause of translocation. Experiments using core RAG proteins show a particular fondness for transposition of signal ends into distorted DNA helices (30), but three groups now report that the presence of the C-terminal portion of RAG2 (only tested as part of the full-length protein) reduces the frequency of RAG-mediated signal-end transposition (31-33). It would be interesting if this reflects an error-correcting behavior that somehow recognizes undesirable products, as suggested previously, and reverses the reaction that created them (34).
There is much left to learn about the architecture of the core region of the two proteins. At the heart of the recombination mechanism is the question of how synapsis is achieved between DNA sequences containing the two different length RSS elements, and how the RAG complex contributes to the processing and directed joining of the intermediate four DNA ends. A conformational change in the complex that would alter the position or orientation of the two coding ends with respect to their old RSS partners is expected (9). This would disfavor the reverse reaction that would reconnect the original sequences ('open and shut') (35), and drive the DNA ends toward the preferred products. A second persisting fundamental question is why RAG2 is needed for nuclease activity. All of the acidic residues believed to contribute to metal binding at the catalytic active site reside on RAG1 (36,37). It appears, though, that the RAG1 core itself contains two structural domains (38,39) and that the active site is divided between them. The idea remains attractive that RAG2 is needed to assemble the active site through a conformational change of RAG1. This role would also reconcile the observations that the presence of RAG2 enlarges the footprint on DNA (40,41), but direct DNA crosslinking studies primarily show contacts to RAG1 (42).
The N-terminal non-core domain of RAG1 consists of 383 residues, which is extensive enough to be a considerable protein by itself. The remainder of this review speculates on the role of the RING motif in V(D)J recombination and regulation of cell physiology.

RING domains and post-translational protein modification
It is important to remember that everything is connected. This thought may be disturbing to one who prefers to associate individual activities with proteins. This statement is made because the RING structure found near the N-terminus of RAG1 is likely to have many effects. The original determination of the RAG1 protein sequence included the recognition of a series of cysteine and histidine residues reminiscent of zinc-binding domains (43). The short region containing this sequence was expressed and crystallized (44), confirming the metal-binding behavior. Subsequently, this structure was recognized to belong to a subclass of zinc-binding domains described as the RING motif (45). The RING is a conserved structure that forms an essential interaction surface for a group of enzymes known as E3 ligases. These proteins form part of an enzymatic cascade that results in the covalent attachment of small modifying peptides to other target proteins. There are about 300 RING family members in our genome as well as a second large group of E3 ligases known as the homologous to the E6-AP carboxyl terminus (HECT) family. The complexity of the network of proteins that performs peptide addition may equal that of protein kinases (46). It is plausible that this portion of RAG1 belongs to the same family, and we have shown that it has this activity in vitro. Covalent post-translational modification of proteins by the addition of other peptides is emerging as a widespread general method of regulation. At first, only one such peptide modifier was known, a 76-residue protein named ubiquitin, which is conserved through evolution from single-cell eukaryotes through all plants and animals. More recently, a family of modifiers, termed ubiquitin-like modifiers (UBLs), has been discovered, each member with its own specific supporting enzymes (reviewed in 47).
These modifiers are coupled to free amino groups of lysine residues in the body of the target protein, or, more rarely, to the amino terminus itself. To add to the beautiful complexity of this subject, the modification can take more than one form and can have multiple consequences. In the case of ubiquitin, for which the most is known, single ubiquitin peptides or chains of these can be formed. Furthermore, chains have differing significance based upon which specific lysine, internal to the ubiquitin, is chosen for the polymeric linkage. Polyubiquitylation (with chains of length greater than six) through ubiquitin lys48 is the most common use of this modifier, and it signals proteasome-mediated degradation (reviewed in 48). However, polyubiquitylation through lys63 is catalyzed by RING-containing DNA repair proteins (49), and the BRCA1/BARD1 heterodimer of RING proteins has just been shown to catalyze unusual polyubiquitin chains through the ubiquitin lys6 (50). Many ubiquitin (or UBL) modifications halt after the addition of single peptides. These do not result in degradation of the target but rather have many regulatory consequences for cell proteins. Modified proteins are often transported to new cellular compartments and also can change their association with other binding partners. An additional significance of peptide modifiers arises from the capacity of the same lysine target residues to be used by other mechanisms, such as acetylation or methylation. Competition can be established between alternative signaling pathways. Finally, modification by ubiquitin, and probably many of the UBLs, is a reversible process, owing to the existence of specific proteases that can reverse the linkage to the target. As an E3 ligase, RAG1 could have many roles in the cell. Specific examples of the use of peptide modifiers with potential relevance to V(D)J recombination follow.
Before leaving a general description of RING functions, it is also worth mentioning that self-organizing nuclear protein structures (or aggregates) contain a number of RING proteins. There is a suggestion that the RING itself may dispose a protein to associate in this manner, independent of its enzymatic activity. The significance is uncertain, but it could have thermodynamic effects on reactions that involve these proteins (51).

Regulation of RAG protein stability and coordination with the cell cycle
It is a truism that the brakes can be more important than the accelerator when driving. With respect to proteins, good control means being able to halt a reaction as well as to stimulate it. The RAG proteins are no exception, and as potentially dangerous proteins, it seems wise to keep them under strict control. RAG1 is a short-lived protein, with a half-life of 15 min, as measured by pulse-chase following transfection into pre-B cells (14). Under the same conditions, the core RAG1 decayed with a half-life of 18 min. At that time, the distinction seemed negligible, but now this experiment seems less informative. Rather than the decay characteristics of the bulk protein, especially as expressed transiently from a strong promoter and in the absence of added RAG2, it would be better to revisit the question under more physiologic conditions. Ultimately, it would be especially informative to distinguish the behavior of the rare protein molecules engaged in recombination from that of the vast majority of protein that never acts on DNA. This last thought is prompted by the possibility that the E3 ligase activity of RAG1 could be used to modify itself (or other members of the recombination complex). The modification could lead to conformational change, protein transport to a particular compartment, association with new partners, or degradation. The simultaneous expression and purification of full-length RAG1 and RAG2 yields less protein than co-expression of either core region with its full-length partner (33). From this finding alone, it seems plausible that the N-terminal E3 ligase of RAG1 could ubiquitylate the C-terminal segment of RAG2, leading to its degradation. In the absence of either of these domains, the complex is rendered more stable.
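To put the two half-lives quoted above (15 min for full-length RAG1 and 18 min for the core) on a common scale, the following sketch computes the fraction of protein expected to remain after a given chase time. It assumes simple first-order (single-exponential) decay, which is an assumption made here for illustration and not a claim from the cited experiment; the function names are hypothetical.

```python
import math


def decay_constant(half_life_min: float) -> float:
    """First-order rate constant (per minute) for a given half-life."""
    return math.log(2) / half_life_min


def fraction_remaining(t_min: float, half_life_min: float) -> float:
    """Fraction of protein left after t_min, ASSUMING first-order decay."""
    return math.exp(-decay_constant(half_life_min) * t_min)


if __name__ == "__main__":
    # Half-lives are the values quoted in the text (minutes).
    for label, half_life in (("full-length RAG1", 15.0), ("core RAG1", 18.0)):
        left = fraction_remaining(60.0, half_life)
        print(f"{label}: {left:.3f} of the protein remains after 60 min")
```

Under this assumption, one hour corresponds to four half-lives of the full-length protein, leaving about 6% of it, against roughly 10% of the core. The difference is modest, which fits the remark that the distinction seemed negligible for the bulk protein.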
The regulation may be more complicated. RAG2 experiences periodic degradation coordinated with the cell cycle. A body of work from the Desiderio laboratory (52) indicates that phosphorylation of thr490 occurs at the G1/S transition, coincident with the activity of cyclin A/CDK2. This may not, however, signal degradation directly. Rather, according to one report (53), degradation seems to follow a phosphorylation-dependent localization to the cytoplasm. RAG2 is polyubiquitylated in its core region, although one cannot determine whether this occurs during residence in the nucleus or cytoplasm, and subsequently delivered to the proteasome. The authors propose that the C-terminal non-core region suppresses the ubiquitylation of RAG2 prior to its phosphorylation. A nuclear localization signal in the C-terminal part of RAG2 was identified (54). A report from the Desiderio laboratory (55) also explores the connection between phosphorylation and nuclear localization but does not address the mechanism of degradation directly.
Can any of these activities explain the apparent reduction in translocations associated with full-length RAG2? Not yet. Coordination with the cell cycle makes sense. Given a choice, a reasonable cell would prefer that chromosomal recombination was isolated from DNA replication or mitosis. Degradation of RAG2, as discussed above, would prevent the initiation of V(D)J recombination as S phase begins. This form of coordination is passive. But, is there a way to inform the cell that recombination is taking place and to actively halt cell cycle progression until it is completed? At one time, it was thought that the free DNA ends, created during recombination, would signal their existence through the Ku proteins and DNA-PK. This, however, does not appear to be the case (56), perhaps because these ends remain hidden from damage sensors by persistent association with the RAG complex. An interesting possibility, in the light of the identification of the E3 ligase activity, is that the RAG proteins may report the status of recombination to the cell. Ubiquitylation is used to modify or degrade signaling proteins and thereby coordinate cell growth in other systems (57-59).

Histone modification
The targeting of V(D)J recombination to particular chromosomal loci is an important component in the regulation of this system. Epigenetic mechanisms impose much of the selectivity between the loci. Within a locus, both chromatin structure and DNA sequence influence the use of particular gene segments. The current understanding of the nature of these mechanisms is certainly covered well by others in the current issue of this journal. Recent studies have refined the degree to which the RAG proteins are shown to contribute at the level of RSS recognition (60-62). Because these sequences are not absolutely conserved, sequence variation in the RSS elements or surrounding them can bias recombination. No doubt, natural selection has favored the use of some V regions. However, in the arms race between a host and an invader, the microbial world always evolves faster than the host. Hence, in theory, the strength of V(D)J recombination is precisely not to display a strongly inherited pattern but rather to allow wide expression in order to provide the largest repertoire for subsequent clonal selection.
Because of my new interest in ubiquitin, I would like to point out that there are many chemical modifications that occur on nucleosomes. In addition to acetylation, phosphorylation, and methylation, nucleosomes can be ubiquitylated, sumoylated, and adenosine diphosphate-ribosylated. In decreasing order of abundance, ubiquitin has been found on histones H2A, H2B, H1, and H3. These modifications are usually single additions and do not appear to result in protein degradation. Despite prolonged effort, it has been difficult to assign a structural role for ubiquitylated histones (63,64). It appears, though, that histone ubiquitylation may influence other modifications. Ubiquitylation of H2B at lys123 is needed (transiently) to subsequently methylate histone H3 at lys4 (65-67). The ubiquitylation is executed by RAD6 with the help of Bre1, a RING-containing E3 (68,69). Both ubiquitylation and deubiquitylation seem able to affect histone methylation, and both can be associated with transcriptional activation (70,71). Is it so hard to imagine that the RAG proteins, with so many interesting behaviors already described, could play a similar role? Fig. 1 illustrates many of the issues surrounding the role of RAG1 as an E3 ligase, and it is included as a summary of this discussion.

Note
While this manuscript was under review, Jones and Gellert (72) published a report confirming the activity of RAG1 as a ubiquitin ligase and mapping a site of modification on the RAG1 protein.

Fig. 1. The recombination-activating gene 1 (RAG1) E3 ligase. This illustration represents our current understanding of the activation, enzymatic activity, targets, and consequences of this enzyme.