The P1' specificity of tobacco etch virus protease

Affinity tags have become indispensable tools for protein expression and purification. Yet, because they have the potential to interfere with structural and functional studies, it is usually desirable to remove them from the target protein. The stringent sequence specificity of the tobacco etch virus (TEV) protease has made it a useful reagent for this purpose. However, a potential limitation of TEV protease is that it is believed to require a Gly or Ser residue in the P1' position of its substrates to process them with reasonable efficiency. Consequently, after an N-terminal affinity tag is removed by TEV protease, the target protein will usually retain a non-native Ser or Gly residue on its N-terminus, and in some cases this may affect its biological activity. To investigate the stringency of the requirement for Gly or Ser in the P1' position of a TEV protease recognition site, we constructed 20 variants of a fusion protein substrate with an otherwise optimal recognition site, each containing a different amino acid in the P1' position. The efficiency with which these fusion proteins were processed by TEV protease was compared both in vivo and in vitro. Additionally, the kinetic parameters K(M) and k(cat) were determined for a representative set of peptide substrates with amino acid substitutions in the P1' position. The results indicate that many side-chains can be accommodated in the P1' position of a TEV protease recognition site with little impact on the efficiency of processing.

In the field of proteomics, one of the main challenges is developing efficient methods for high-throughput expression and purification of soluble proteins for structural and functional studies. High-throughput purification requires a general approach that will be effective for proteins with diverse chemical properties. Genetically engineered affinity tags are ideal for this purpose because they can be exploited to devise generic purification protocols. Additional benefits of affinity tags such as Escherichia coli maltose-binding protein (MBP) [1] and Schistosoma japonicum glutathione S-transferase [2] include the ability to protect their fusion partners (passenger proteins) from intracellular proteolysis [3,4] and increase their yield [5]. A further advantage of MBP is its remarkable ability to enhance the solubility of its passenger proteins [4,6,7].
Because of concerns about the impact of affinity tags on the structure or activity of a passenger protein, it is ordinarily desirable to remove them. It is this step in the process that has proven to be the Achilles heel of the fusion approach. Both chemical and enzymatic methods have been used to cleave fusion proteins at engineered sites [5,8,9]. However, chemical reagents suffer from a lack of specificity and often work effectively only under severe conditions that may irreversibly damage the passenger protein [10]. Enzymatic reagents, on the other hand, function under milder reaction conditions and typically exhibit greater sequence specificity.
Most fusion proteins are engineered so that the N-terminus of the passenger protein is genetically fused to the C-terminus of the affinity tag with a linker region containing a protease recognition site between them. In principle, enteropeptidase (enterokinase) and factor Xa can be used to generate passenger proteins with native N-termini after digestion of a fusion protein substrate because their primary specificity determinants are N-terminal to the scissile bond. In practice, however, both of these proteases have been observed to cleave fusion proteins at locations other than the intended target site, often resulting in the degradation of the passenger protein [10-13].
Biochemical and Biophysical Research Communications 294 (2002) 949-955. www.academicpress.com
In contrast to the aforementioned proteases, there have been no reported examples of cleavage at noncanonical sites in fusion proteins by the nuclear inclusion protease from tobacco etch virus (TEV protease). A potential disadvantage of TEV protease, however, is that its S1' pocket is presumed to be an important specificity determinant; amino acids other than Gly or Ser in this position are alleged to reduce proteolytic efficiency [14,15]. To investigate how stringent the requirement is for a Gly or Ser residue in the P1' position of the TEV protease cleavage site, we used a model fusion protein substrate, MBP-NusG, containing a canonical recognition site (ENLYFQ/G) that is processed very efficiently by TEV protease. Twenty MBP-NusG fusion proteins, each with a different amino acid in the P1' position, were constructed by site-directed mutagenesis. The efficiency with which these P1' variants were cleaved by TEV protease was compared both in vivo, using an intracellular processing system, and in vitro. Additionally, the kinetic parameters K(M) and k(cat) were determined for a representative set of peptide substrates with amino acid substitutions in the P1' position. These experiments constitute the first comprehensive investigation of P1' specificity in a potyviral protease.

Materials and methods
Plasmid expression vectors. pRK603 and bacterial strain DH5αZ1 were described previously [16]. pKM631, which produces an MBP-NusG fusion protein with Gly in the P1' position of the TEV protease recognition site, was constructed by polymerase chain reaction (PCR) amplification of the open reading frame (ORF) encoding NusG from Aquifex aeolicus genomic DNA with the following oligonucleotide primers: 5'-GAG AAC CTG TAC TTC CAG GGT ATG AGC GAG CAA CAG GTT CAG GAA C-3' and 5'-ATT AGT GAT GAT GGT GGT GAT GAA TCT TTT CCA CTT GGT CAA AGT CCA G-3'. This PCR amplicon was subsequently used as the template for another PCR with primers PE-277 and PE-278 [17], generating a second amplicon that was inserted by recombinational cloning into pKM596 [18] to yield pKM631.
The oligodeoxyribonucleotides used to construct plasmid expression vectors for the production of MBP-NusG fusion proteins with altered TEV protease recognition sites were: PE-431 (forward outer PCR primer), 5'-GGT TAA TAA AGA CAA ACC GCT GGG TG-3'; PE-430 (reverse outer PCR primer), 5'-GCC ATG AGG AGC TTG TCG TTC ATG TG-3'; PE-X (forward mutagenic PCR primer), 5'-TAC TTC CAG XXX ATG AGC GAG CAA CAG G-3'; PE-Y (reverse mutagenic PCR primer), 5'-GTT GCT CGC TCA TXX XCT GGA AGT ACA GG-3'. The MBP-NusG expression vectors were constructed by overlap extension PCR [19]. First, two separate PCRs were performed using pKM631 as the template: one with PE-431 and the appropriate reverse mutagenic primer (PE-Y), and the other with PE-430 and the corresponding forward mutagenic primer (PE-X). These two PCR amplicons were then combined and used as the template for a third PCR with the outer primers, PE-431 and PE-430. The final PCR product was digested with NcoI and PstI, and then ligated with the NcoI/PstI vector fragment of pKM631. The same procedure was used to construct all of the P1' variants. The nucleotide sequence of each construct was confirmed experimentally.
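The mutagenic primers differ from one construct to the next only in the P1' codon (XXX). As a minimal sketch, assuming the codon for each variant is supplied by the user (the codons actually chosen are not listed here, apart from the GGT that encodes Gly in the pKM631 primer), the primer pair for one variant could be written out as follows. The helper names are hypothetical; the flanking sequences are copied from the PE-X and PE-Y templates above.

```python
# Illustrative sketch: build the PE-X / PE-Y primer pair for one P1' codon.
# Flanking sequences come from the primer templates in the text;
# the function names are hypothetical.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (written 5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def mutagenic_primers(codon: str) -> tuple:
    """Return (forward, reverse) primers, both 5'->3', for a P1' codon."""
    codon = codon.upper()
    if len(codon) != 3 or any(base not in COMPLEMENT for base in codon):
        raise ValueError("codon must be three bases drawn from A, C, G, T")
    forward = "TACTTCCAG" + codon + "ATGAGCGAGCAACAGG"
    # The reverse primer anneals to the opposite strand, so it carries the
    # reverse complement of the codon.
    reverse = "GTTGCTCGCTCAT" + reverse_complement(codon) + "CTGGAAGTACAGG"
    return forward, reverse


if __name__ == "__main__":
    fwd, rev = mutagenic_primers("GGT")  # GGT encodes Gly, as in pKM631
    print("PE-X:", fwd)
    print("PE-Y:", rev)
```

Because the two primers are complementary across the mutated codon, the two first-round amplicons share an overlapping end, which is what allows them to be joined in the third PCR.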
In vivo processing experiments. Cells from single, drug-resistant colonies of E. coli DH5αZ1 containing an MBP-NusG fusion protein vector and pRK603, the TEV protease expression vector, were grown to saturation in 5 ml Luria broth [20] supplemented with the appropriate antibiotics (100 µg/ml ampicillin and 30 µg/ml kanamycin) at 37°C. The saturated culture was diluted 1:50 in the same medium and grown to early log phase (A600 = 0.3 to 0.5) at 37°C, at which point the temperature was shifted to 30°C (the optimum temperature for TEV protease processing) and both isopropyl-β-D-thiogalactopyranoside (IPTG) and anhydrotetracycline (final concentrations of 1 mM and 100 ng/ml, respectively) were added to initiate the production of the fusion protein and TEV protease. After 4 h of shaking at 30°C, the cells from 10 ml of each culture were recovered by centrifugation and resuspended in 1 ml of 20 mM Tris-HCl (pH 8.0), 1 mM EDTA. The cell suspensions were lysed by sonication, after which aliquots of the cell lysate were mixed with an equal volume of 2× SDS sample buffer [21] to produce samples of the total intracellular protein for sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE). The samples were heated at 90°C for 4 min and then centrifuged at 14,000g for 15 min prior to SDS-PAGE. Samples were analyzed on 10-20% Tris-glycine SDS-polyacrylamide gels (Novex) and visualized by staining with Gelcode Blue (Pierce).
Overproduction and purification of MBP-NusG fusion proteins. DH5αZ1 cells containing an MBP-NusG fusion protein expression vector were grown to saturation at 37°C in Luria broth supplemented with 100 µg/ml ampicillin. The saturated culture was diluted 1:50 into 1 L of the same medium and grown in a shake-flask to early log phase (A600 = 0.3 to 0.5). At this point, IPTG was added to a final concentration of 1 mM, and the culture was grown for an additional 4 h at 37°C. The culture was centrifuged at 10,000g for 10 min, and the cell pellet was stored at -80°C.
The cell pellet was resuspended in 10 ml lysis buffer (100 mM Na2HPO4, pH 7.8, 200 mM NaCl, 5 mM β-mercaptoethanol (BME), 0.1% Tween 20). The cells were lysed by sonication and then centrifuged at 10,000g for 20 min. The supernatant was mixed with 4 ml Ni-NTA resin (Qiagen), previously equilibrated with cell lysis buffer, and gently rocked at 4°C. After 1 h, the resin was loaded into a BioRad Poly-Prep column and then washed with 10 ml lysis buffer. The column was then washed with 20 ml of 100 mM Na2HPO4, pH 7.4, 200 mM NaCl, 5 mM BME, 10% glycerol, and 20 mM imidazole. Finally, the MBP-NusG fusion protein was eluted from the column with 4 ml of 100 mM Na2HPO4, pH 7.4, 200 mM NaCl, 5 mM BME, 5% glycerol, and 300 mM imidazole.
To concentrate the protein for dialysis, 8 ml of saturated ammonium sulfate solution was added to the eluate, and the precipitate was recovered by centrifugation at 7000g for 20 min. The pellet was resuspended in 2 ml of TEV protease reaction buffer (50 mM Tris-HCl, pH 8.0, 0.5 mM EDTA, 1 mM DTT) and then dialyzed extensively against the same buffer at 4°C.
In vitro processing experiments. The TEV protease used for the in vitro processing experiments has an amino acid substitution (S219V) that greatly reduces its rate of autolysis without compromising its catalytic activity [22]. All reactions were performed in TEV protease reaction buffer at 30°C. The concentration of the MBP-NusG fusion proteins was held constant at 1 mg/ml (14 µM), while the concentration of TEV protease varied depending on the efficiency with which each P1' variant was processed. TEV protease concentrations were 0.005 mg/ml (0.18 µM), 0.013 mg/ml (0.47 µM), 0.04 mg/ml (1.4 µM), or 0.2 mg/ml (7 µM). The in vitro processing reactions were initiated by combining the enzyme and substrate in 200 µl of TEV protease reaction buffer. At 30 min intervals over the course of 3.5 h, a 15 µl aliquot was removed from the reaction and mixed with 85 µl of 1× SDS sample buffer. The samples were denatured at 90°C for 4 min and then subjected to SDS-PAGE. The proteins were visualized by staining with Gelcode Blue.
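The four enzyme concentrations can be restated as molar enzyme-to-substrate ratios by dividing the fixed substrate concentration by each enzyme concentration. The short calculation below is only a worked-arithmetic sketch using the micromolar values quoted above; the function name is illustrative.

```python
# Worked arithmetic for the enzyme-to-substrate molar ratios.
# Concentrations (in micromolar) are the values quoted in the text.

SUBSTRATE_UM = 14.0                  # MBP-NusG fusion protein at 1 mg/ml
ENZYME_UM = [0.18, 0.47, 1.4, 7.0]   # TEV protease (S219V)


def substrate_per_enzyme(enzyme_um: float, substrate_um: float = SUBSTRATE_UM) -> float:
    """Return moles of substrate per mole of enzyme."""
    if enzyme_um <= 0:
        raise ValueError("enzyme concentration must be positive")
    return substrate_um / enzyme_um


if __name__ == "__main__":
    for enzyme in ENZYME_UM:
        ratio = substrate_per_enzyme(enzyme)
        print(f"{enzyme} uM enzyme -> approximately 1:{ratio:.0f}")
```

The quotients round to 78, 30, 10, and 2, which appear to correspond to the nominal ratios of 1:80, 1:30, 1:10, and 1:2 used in the in vitro experiments.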
Statistical analysis and densitometry. At least three independent experiments were performed to obtain numerical estimates of the fraction of each fusion protein that was processed by TEV protease in vitro. Gelcode Blue-stained gels were scanned with a Molecular Dynamics Personal Densitometer, and then the pixel densities of the bands corresponding to the intact fusion protein and the individual MBP and NusG moieties were obtained directly by volumetric integration. Using these pixel density values, the percentage of processed fusion protein at an individual time point was calculated by dividing the sum of the values for the MBP and NusG products by the total of that sum plus the value for the intact fusion protein. Using Microsoft Excel, regression lines were drawn and error bars were calculated based on the standard deviation of the population.
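The percent-processed calculation reduces to a ratio of integrated band densities. The following is a minimal sketch of that arithmetic, assuming the three band densities for a lane are already available as numbers; the function names and the example densities are invented for illustration and are not data from this work. The population standard deviation is used to mirror the error-bar description above.

```python
# Sketch of the percent-processed calculation from band densities.
# Function and argument names are illustrative, not from the original work.
from statistics import mean, pstdev


def percent_processed(intact: float, mbp: float, nusg: float) -> float:
    """Percentage of fusion protein cleaved, from integrated band densities."""
    products = mbp + nusg
    total = intact + products
    if total <= 0:
        raise ValueError("total band density must be positive")
    return 100.0 * products / total


def summarize(replicates):
    """Mean and population standard deviation across independent experiments."""
    values = [percent_processed(*bands) for bands in replicates]
    return mean(values), pstdev(values)


if __name__ == "__main__":
    # Made-up densities (intact, MBP, NusG) for three replicate lanes.
    example = [(500.0, 300.0, 200.0), (520.0, 290.0, 190.0), (480.0, 310.0, 210.0)]
    avg, sd = summarize(example)
    print(f"{avg:.1f}% processed (population SD {sd:.1f})")
```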
Oligopeptide synthesis and characterization. Oligopeptide substrates for TEV protease (Thr-Glu-Asn-Leu-Tyr-Phe-Gln-Xaa-Gly-Thr-Arg-Arg-NH2) were synthesized by standard 9-fluorenylmethyloxycarbonyl chemistry on a model 430A automated peptide synthesizer (Applied Biosystems). Stock solutions were made in distilled water and the peptide concentrations were determined by amino acid analysis, following hydrolysis in 6 N HCl, using a Beckman 6300 amino acid analyzer.
Enzyme kinetics. TEV protease assays were initiated by mixing 20 µl of TEV (S219V) protease solution (40-1500 nM) in 50 mM Na-phosphate, pH 7.0, 5 mM DTT, 800 mM NaCl, and 10% glycerol, with 20 µl of substrate solution (0.04-5.2 mM). The actual range of substrate concentrations was selected on the basis of the approximate K(M) values. Measurements were performed at six different substrate concentrations. The reaction mixtures were incubated at 30°C for 30 min and then stopped by the addition of 160 µl 4.5 M guanidine-HCl containing 1% trifluoroacetic acid (TFA). An aliquot was injected onto a Nova-Pak C18 reverse-phase chromatography column (3.9 × 150 mm, Waters Associates) using an automatic injector. The substrates and reaction products were separated by an increasing water-acetonitrile gradient (0-100%) in the presence of 0.05% TFA. To determine the correlation between peak areas of the cleavage products and their amount, fractions were collected and subjected to amino acid analysis. The k(cat) values were calculated by assuming 100% activity for the enzyme. Kinetic parameters were determined by fitting the data obtained at less than 20% substrate hydrolysis to the Michaelis-Menten equation, using the Fig. P program (Fig. P Software Corp.). Standard deviations for the k(cat)/K(M) values were calculated as described [23]. The k(cat)/K(M) value for the peptide with Val in the P1' position was determined from the linear part of the rate versus concentration profile.
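For readers who want to see how such parameters follow from rate data, the sketch below estimates K(M) and Vmax from initial rates obeying the Michaelis-Menten equation, v = Vmax[S]/(K(M) + [S]). It is not the procedure used here: it swaps the nonlinear fit for a simple Hanes-Woolf linearization so that it needs only the standard library. All function names and the numbers in the usage section are hypothetical. The last function illustrates the low-substrate limit, where v is approximately (k(cat)/K(M))[E][S], which is the reasoning behind taking k(cat)/K(M) from the linear part of the rate profile.

```python
# Sketch: estimate K(M) and Vmax from initial rates by a Hanes-Woolf
# linearization ([S]/v = [S]/Vmax + K(M)/Vmax). This is a simple stand-in
# for the nonlinear fit described in the text, not the authors' procedure.


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hanes_woolf(substrate, rate):
    """Return (K_M, Vmax) from substrate concentrations and initial rates."""
    ratios = [s / v for s, v in zip(substrate, rate)]
    slope, intercept = linear_fit(substrate, ratios)
    vmax = 1.0 / slope
    return intercept * vmax, vmax


def k_cat(vmax, enzyme_conc):
    """k(cat) = Vmax / [E], assuming the enzyme is 100% active."""
    return vmax / enzyme_conc


def efficiency_from_linear_region(substrate, rate, enzyme_conc):
    """k(cat)/K(M) from the slope of v versus [S], valid when [S] << K(M)."""
    slope, _ = linear_fit(substrate, rate)
    return slope / enzyme_conc


if __name__ == "__main__":
    # Synthetic, noise-free rates generated from assumed K_M and Vmax values.
    true_km, true_vmax, enzyme = 0.5, 2.0, 0.001
    conc = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
    rates = [true_vmax * s / (true_km + s) for s in conc]
    km, vmax = hanes_woolf(conc, rates)
    print("K_M:", km, "Vmax:", vmax, "k_cat:", k_cat(vmax, enzyme))
```

Linearized fits weight experimental error differently from a direct nonlinear fit, so with real, noisy data the two approaches can give somewhat different estimates.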

Construction of MBP-NusG fusion proteins
A bipartite fusion between E. coli MBP and Aquifex aeolicus NusG (MBP-NusG) was used as a model substrate to investigate the P1' specificity of TEV protease (Fig. 1). This fusion protein was selected primarily because it is an efficient substrate for TEV protease, but also because it is well expressed and highly soluble in E. coli. Twenty MBP-NusG fusion protein expression vectors were assembled by overlap-extension PCR [19], each one encoding a different amino acid in the P1' position of an otherwise canonical TEV protease recognition site (ENLYFQX). Hexahistidine tags were added to the C-termini of the MBP-NusG fusion proteins to facilitate their purification.

Intracellular processing of fusion proteins
Initially, we exploited an intracellular processing system [16] to compare the efficiency with which the twenty P1' variants were cleaved by TEV protease in vivo. TEV protease was co-expressed with each of the MBP-NusG fusion proteins in E. coli DH5αZ1 cells, and the composition of the total intracellular protein was examined by SDS-PAGE (Fig. 2). Remarkably, every fusion protein except the Pro variant was processed to some extent under these conditions, suggesting that the requirement for a Ser or Gly in the P1' position is not as stringent as is generally presumed. In fact, quite a few of the substrates were processed to the same degree as the fusion proteins containing the canonical Gly or Ser in the P1' position. These experiments also revealed that the fusion proteins with β-branched hydrophobic residues or Glu in the P1' position are among the least efficient substrates for TEV protease.

In vitro processing of fusion proteins
Because many of the fusion proteins were processed nearly to completion by TEV protease in vivo, it was not possible to establish a hierarchy for the P1' specificity of the enzyme solely on the basis of the intracellular processing experiments. Therefore, taking advantage of the C-terminal hexahistidine tag, all 20 fusion proteins were purified by immobilized metal chelate affinity chromatography for in vitro processing experiments, as exemplified in Fig. 3 for the fusion protein with Gly in the P1' position. Wild-type TEV protease readily undergoes autolysis at a specific site to generate a truncated product with greatly diminished activity [22,24]. Accordingly, to ensure that all of the enzymes would remain active during the in vitro processing experiments, we employed a mutant protease (S219V) that is considerably more stable than the wild-type enzyme and just as catalytically active [22]. The fusion protein with Gly in the P1' position was used to establish the initial conditions for in vitro processing because it possesses what is considered to be an optimal recognition site [25]. The substrate concentration was held constant at 1 mg/ml (14 µM), while varying concentrations of TEV protease were utilized. Aliquots were removed from the reactions at regular intervals and analyzed by SDS-PAGE to monitor the extent of processing. At an enzyme to substrate ratio of 1:80, approximately 50% of the Gly variant was cleaved over the course of several hours (Fig. 4A).
It was clear from the intracellular processing experiments that not all of the fusion proteins would be cleaved as efficiently as the Gly variant. Therefore, only the substrates that were processed to at least 90% completion in vivo were examined in vitro at an enzyme to substrate ratio of 1:80 (Fig. 4A). These experiments revealed that the fusion proteins with Ser or Ala in the P1' position were processed even more readily than the Gly variant, which is the P1' residue most commonly employed in engineered TEV protease recognition sites. The Met, Cys, and His variants were also processed to a substantial extent after 3.5 h (30-40%).
Instead of increasing the duration of the reactions to obtain comparable data for the less efficient substrates, the enzyme concentration was adjusted while all of the other variables were held constant. At an enzyme to substrate ratio of 1:30, approximately 75% of the His variant was processed after 3.5 h (Fig. 4B). A substantial fraction (ca. 30-70%) of the Cys, Phe, Gln, Tyr, Asn, and Trp variants was also cleaved under these conditions, but only a small portion of the remaining substrates was converted into products. Therefore, the processing of these less efficient substrates was subsequently compared at an enzyme to substrate ratio of 1:10. Under these conditions, the Asp, Thr, Glu, Leu, and Lys variants were all processed to a significant extent after 3.5 h (ca. 20-60%), but the remaining fusion proteins were not (Fig. 4C). Yet, except for the fusion protein with Pro in the P1' position, even the least efficient substrates for TEV protease (the Arg, Ile, and Val variants) were processed to a considerable extent (>50%) after only 3.5 h at an enzyme to substrate ratio of 1:2 (Fig. 4D). No processing (either specific or nonspecific) of the Pro variant was observed even after 24 h at an enzyme to substrate ratio of 2:1 (data not shown).

Processing of oligopeptide substrates
To corroborate the results obtained with the MBP-NusG fusion protein substrates, we also determined the kinetic parameters K(M) and k(cat) for a representative set of synthetic peptides with different residues in the P1' position (TENLYFQXGTRR-NH2); these data are summarized in Table 1. Compared to the substrate with Gly in the P1' position, none of the P1' substitutions resulted in an improved catalytic constant (k(cat)), while both higher and lower K(M) values were obtained. In general, the rank order of catalytic efficiencies (k(cat)/K(M)) of the peptides was in very good agreement with the hierarchy of processing efficiencies established for the fusion protein substrates, although the MBP-NusG fusion proteins with Gln and Trp in the P1' position seemed to be processed with comparatively greater efficiency than were the corresponding peptides. The minor discrepancies between the results obtained with the MBP-NusG fusion proteins and the peptide substrates could be due to the different assay conditions (e.g., pH, ionic strength), which may influence the kinetics of substrate cleavage in the case of some P1' residues.

Discussion
In addition to their obvious utility for protein purification, affinity tags can improve the yield of recombinant proteins, protect them from intracellular proteolysis, and in the case of MBP, enhance their solubility [1,2,4,5,7]. However, it is ordinarily desirable to remove the affinity tag from the passenger protein for functional and structural studies. Enzymatic methods are most commonly employed to remove affinity tags, yet not all proteases perform this task equally well. While factor Xa, enteropeptidase, and thrombin frequently cleave proteins at noncanonical sites [10,11], TEV protease is highly specific and active over a wide range of pH and ionic strength [25]. However, until now it has been presumed that TEV protease exhibits a strong preference for Ser or Gly in the P1' position of its substrates. While it is certainly true that some of the P1' variants (e.g., the β-branched hydrophobic residues) are comparatively inefficient substrates for the enzyme, it is equally clear that many different residues can be accommodated in the P1' position of an otherwise canonical TEV protease recognition site with little impact on the efficiency of processing. These include Met and Ala, two of the most common natural N-terminal residues, and Cys. Polypeptides with an N-terminal Cys can be used to assemble segmentally labeled proteins for multidimensional heteronuclear NMR experiments [26], and the digestion of appropriate fusion proteins with TEV protease should provide a useful avenue for the creation of substrates for in vitro peptide ligation. Moreover, with the exception of the Pro variant, even the least efficient MBP-NusG fusion protein substrates were processed to a considerable degree after only a few hours when enough TEV protease was added to the reaction.
Table 1. Kinetic parameters for oligopeptide substrates with amino acid substitutions in the P1' position
Unlike factor Xa, thrombin, or enteropeptidase, TEV protease can be used at a high concentration without triggering nonspecific proteolysis. Consequently, our results imply that it should be possible to produce recombinant proteins with any N-terminal amino acid other than Pro by digesting a fusion protein with TEV protease, provided that the canonical recognition site is processed efficiently in the same context.
A previous mutational analysis of a naturally occurring TEV protease recognition site also led to the conclusion that certain amino acids other than Gly or Ser could occupy the P1' position without abolishing processing [14]. However, the only amino acids examined in that study were Ser, Ile, Asn, Arg, Thr, Phe, Cys, and Asp. Curiously, in contrast to the results reported here, these investigators observed that processing occurred even more rapidly when either Ile or Asn occupied the P1' position of the substrate than when Ser did, although neither of these residues is found in the P1' position of any natural potyviral cleavage sites. The other P1' variants that they tested were observed to be far less efficient substrates for TEV protease. To account for the discrepancies between our results, it is important to understand the differences between our methods. Because their substrate consisted of a relatively long segment of the TEV polyprotein, it is possible that tertiary interactions within the substrate or between the protease and the substrate influenced the results. Moreover, the protease they utilized was a nuclear inclusion body preparation that was composed of an equimolar mixture of the full-length 49 kDa nuclear inclusion protease and the 54 kDa viral replicase, whereas the TEV protease employed in this study was the soluble 27 kDa catalytic domain of the 49 kDa protease. In the previous study, nuclear inclusion bodies containing the 49 kDa TEV protease were added to the products of an in vitro translation reaction containing the viral precursor. The concentration of protease in these reactions was approximately 2 µM. The substrate concentration was not determined. However, considering the range of yields that have been obtained in the rabbit reticulocyte system [27], it could not have been more than 0.6 µM and is likely to have been much lower than this.
Thus, in the previous study the concentration of enzyme exceeded that of the substrate, which was also far below the K(M). The present study was conducted under more realistic reaction conditions, using pure preparations of enzyme and substrate with a substantial molar excess of the latter in most experiments. Unfortunately, it is not possible to glean any direct information about the substrate-binding pocket of TEV protease because its three-dimensional structure has yet to be determined. However, our results indicate that TEV protease can accommodate a variety of amino acids in the P1' position of the cleavage site with relatively little impact on the efficiency of processing. The most efficient substrates tended to be those with the shortest side chains (e.g., Ser, Ala, and Gly), which are also the three residues that occur most frequently in the P1' position of natural cleavage sites for potyviral proteases. Thr is the next most common P1' residue in the natural substrates, but our results indicate that Met, Lys, His, and Asp are all tolerated as well as or better than Thr. It therefore seems likely that some other kind of selective pressure restricts the range of residue types that occur in the P1' positions of the natural cleavage sites in potyviral polyproteins.
It has not escaped our notice that some of the NusG proteins generated by intracellular processing should be substrates for the N-end rule degradation pathway in bacteria [28]. In particular, N-terminal Tyr, Trp, Leu, and Phe are believed to be primary destabilizing residues in E. coli. Although no effort was made to directly measure the half-lives of NusG proteins with potentially destabilizing N-terminal residues, the fact that none of them appeared to be present in substoichiometric quantities relative to the free MBP generated by TEV protease digestion suggests that they were not rapidly degraded in vivo. Although the rank order of relative destabilizing activities among the twenty amino acids is thought to be invariant from one protein reporter to another in a given environment, the actual in vivo half-lives can differ greatly among different proteins bearing the same N-terminal residue [29]. Accordingly, determinants other than the N-terminal residue, such as the extreme thermostability of Aquifex aeolicus NusG, may make it an intrinsically poor substrate for the N-end protease ClpAP. In any case, we note that intracellular processing by TEV protease of fusion proteins with noncanonical residues in the P1' position could be used to generate N-end substrates in E. coli for further study.