Identification, Characterization and Purification of MSC_0265, a Potential Immunogenic Antigen Homologue of Mycoplasma mycoides subsp. mycoides in Mycoplasma capricolum subsp. capripneumoniae

In silico identification and characterization of vaccine antigens has opened up new frontiers in reverse vaccinology, helping to mitigate the effects of livestock diseases through the development of new subunit vaccines. This study aims to characterize, express and purify MSC_0265 for eventual use in immunoassays and inoculation in goats. Mycoplasma mycoides subsp. mycoides (Mmm) and Mycoplasma capricolum subsp. capripneumoniae (Mccp) are similar pathogens at the genomic level and are the causative agents of Contagious Bovine Pleuropneumonia (CBPP) in cattle and Contagious Caprine Pleuropneumonia (CCPP) in goats, respectively. In this study, BLAST was used to identify the homologue of MSC_0265 in the Mycoplasma capricolum subsp. capripneumoniae genome and the protein it is similar to. MSC_0265 was also characterized using I-TASSER to predict secondary structure, solvent accessibility, normalised B-factor, 3D models and function. With cut-off points of 0.0 for E-value, 100% for query coverage and 90% for identity, MSC_0265, a pyruvate dehydrogenase enzyme, gave a high homology score on tBLASTn and BLASTp. The gene had earlier been cloned into the pGS21a vector, and the His-tagged protein was then expressed and purified by Ni-NTA affinity chromatography. This study identified the homologue of MSC_0265 as protein WP_029333261.1 in the Mycoplasma capricolum subsp. capripneumoniae genome (Accession NZ_LN515398.1) using tBLASTn and BLASTp. Additionally, MSC_0265 was characterized, and its optimal expression profile and estimated molecular weight were verified.


INTRODUCTION
Mycoplasma mycoides subsp. mycoides (Mmm), previously specified as Mmm Small Colony [1], and Mycoplasma capricolum subsp. capripneumoniae (Mccp) are two members of the Mycoplasma 'mycoides cluster' and are responsible for Contagious Bovine Pleuropneumonia (CBPP) in cattle and Contagious Caprine Pleuropneumonia (CCPP) in goats, respectively. Both diseases cause significant losses in livestock, particularly in Africa and parts of Asia, and are a threat to disease-free countries [2].
Studies have shown that species of the Mycoplasma 'mycoides cluster' can survive in non-primary hosts with bovine pathogens Mmm and M. leachii having been isolated from goats [3], and caprine pathogens, Mycoplasma capricolum subsp. capripneumoniae (Mccp) and Mycoplasma mycoides subsp. capri (Mmc) being isolated from cattle [4]. Isolation of different Mycoplasma 'mycoides cluster' members from a single diseased animal has also been reported [5], which has brought forth the idea of development of different variants arising from exchange of pathogen genetic material among animals kept together like in the case of pastoralism in most parts of Africa [6].
CBPP and CCPP are respiratory illnesses characterized by sero-fibrinous interstitial pneumonia, interlobular oedema and hepatization, which give the lung a marbled appearance, and by capsulated lesions termed sequestra in the lungs of affected cattle and goats, respectively. The occurrence of subacute, asymptomatic infections and of chronic carriers after the clinical phase of the disease is generally believed to create major problems in disease control. CBPP is present in the Middle East and Asia, and is now considered the most significant bacterial disease of cattle in Africa. The last reported occurrence of CBPP in Europe was in Portugal in 1999. CCPP causes high mortality rates of up to 80% [7].
Vaccines play a key role in the control and prevention of infectious diseases. Pasteur's procedure, to 'isolate, inactivate, and inject' the disease-causing agent, played a key role in the production of inactivated and live attenuated vaccines throughout the 20th century [8]. The sequencing of the first bacterial genome, that of Haemophilus influenzae, in 1995 and the arrival of the era of microbial genomics have led to many advances in vaccinology. The advent of genome sequencing technologies and the use of bioinformatics tools have enabled researchers to explore the genomes of microorganisms for antigen discovery and vaccine development.
The first protein based non-recombinant vaccine was a highly active vaccine to hepatitis B consisting of purified hepatitis B surface antigen (HBsAg) from human plasma [9]. The first vaccine to be produced using recombinant DNA technology was licensed in 1986 after the successful expression of HBsAg in yeast [10]. This new vaccine efficiently elicited protective antibodies after vaccination of chimpanzees [11], thereby replacing the plasma derived hepatitis B vaccine in human use.
The current CBPP and CCPP vaccines are live attenuated and have a number of shortfalls, which necessitates research and development of better vaccines. The new approach, currently known as reverse vaccinology [12], was first described in 2000, when it was used to identify novel meningococcal vaccine candidates from the genome sequence of a Neisseria meningitidis serogroup B strain. It involves in silico analysis of the whole genome of the pathogen to identify genes that encode proteins with potential as vaccine candidates. A general starting point is the prediction of surface or intracellular proteins. The selected surface proteins are then expressed in E. coli or other expression vehicles and used in immunoassays and animal field challenge to analyse immune responses. The applications of reverse vaccinology have since gone beyond the analysis of a single meningococcal genome [13]. Potential advantages of vaccines developed with this technique over the current live attenuated vaccines include greater safety, better efficacy, longer shelf life and thermostability. Subunit vaccine candidates typically consist of surface proteins or polysaccharides [14].
Advanced sequencing technologies and high-throughput proteomics have extended such analyses to several genomes within a species, as in the pan-genomic analysis of group B streptococcus (GBS) [15]. Additionally, closely related species have been compared, for example in the comparative genomic analysis of GBS, group A streptococcus (GAS) and Streptococcus pneumoniae, as have pathogenic and commensal members of the same species, as in the comparative genomics of E. coli [16,17].
In silico identification of Mmm protein antigens from the complete genome sequences of PG1 [18] and Gladysdale [19] was done at VIDO-InterVac, and a reverse vaccinology approach was used to determine protein antigen localization and adhesion probability and, finally, to predict vaccine candidates among the proteins using Vaxign [20]. In total, 58 Mmm proteins were identified. Additionally, eight predicted Mmm surface proteins that had been described elsewhere were included [21,22], making a total of 66 proteins. These proteins were used to inoculate cattle, and the immune responses were analyzed after challenge. Thirteen proteins gave the best titres [23].
Other innovative and important vaccine research fields have come up, such as immunoproteomics, structural biology and systems biology. These have become instrumental in vaccine development and in aiding researchers in overcoming the limits of conventional approaches in the discovery and development of novel vaccines especially for microorganisms whose culture has proven a challenge [24,25,26].

Download and Analysis of Protein Sequences
Sequences of the 13 potential Mmm proteins were downloaded from the NCBI protein database (www.ncbi.nlm.nih.gov/) [27] and saved in FASTA format. The Kyoto Encyclopedia of Genes and Genomes (KEGG) database, accessed at www.genome.jp/kegg [28], was used to verify the protein sequences. Analysis of the predicted PDB protein structure of MSC_0265 was done with I-TASSER, as explained under "I-TASSER Analysis of MSC_0265".

Pairwise Alignment of MSC_0265 and WP_029333261.1 Using EMBOSS Needle
The Mmm protein MSC_0265 was aligned to its Mccp homologue WP_029333261.1 using EMBOSS Needle, accessed at www.ebi.ac.uk/Tools/psa/emboss_needle/. EMBOSS Needle is a pairwise alignment program that creates an optimal global alignment of two sequences using the Needleman-Wunsch algorithm. Identity is the percentage of identical matches between the two sequences over the reported aligned region (including any gaps in the length). Similarity is the percentage of matches between the two sequences over the reported aligned region (including any gaps in the length) [30].
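The global alignment and the identity measure described above can be illustrated with a minimal Python sketch. It is not the EMBOSS Needle implementation: it assumes a simple match/mismatch score and a linear gap penalty in place of a substitution matrix and separate gap opening and extension penalties, and the function names and the short test sequences are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b (simplified scoring, linear gap penalty)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical positions as a percentage of the alignment length, gaps included."""
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * identical / len(aln_a)


# Hypothetical toy sequences, used only to exercise the functions
aln_a, aln_b = needleman_wunsch("GATTACA", "GATACA")
toy_identity = percent_identity(aln_a, aln_b)
```

Because gap columns count towards the alignment length, a single gap lowers the identity even when every remaining position matches.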

I-TASSER Analysis of MSC_0265
I-TASSER (Iterative Threading ASSEmbly Refinement) is a hierarchical approach to protein structure and function prediction. Structural templates are first identified from the PDB by the multiple threading approach LOMETS. For each target, I-TASSER simulations generate a large ensemble of structural conformations, called decoys. To select the final models, I-TASSER uses the SPICKER program to cluster all the decoys based on pairwise structure similarity, and reports up to five models, which correspond to the five largest structure clusters. The confidence of each model is quantitatively measured by the C-score, which is calculated from the significance of threading template alignments and the convergence parameters of the structure assembly simulations. The C-score is typically in the range of [-5,2], where a higher value signifies a model with higher confidence and vice versa. TM-score and RMSD are estimated from the C-score and protein length, following the correlation observed between these quantities. Since the top five models are ranked by cluster size, it is possible in rare cases for lower-ranked models to have a higher C-score. Although the first model is of better quality in most cases, lower-ranked models can be of better quality than higher-ranked ones, as seen in benchmark tests. If the I-TASSER simulations converge, fewer than five clusters may be generated; this usually indicates that the models are of good quality because of the converged simulations [31,32,33].
The server was accessed at http://zhanglab.ccmb.med.umich.edu/I-TASSER for prediction of secondary structure. The secondary structure is predicted from sequence information by the PSSpred algorithm [33], which works by combining seven neural network predictors from different parameters and PSI-BLAST [29].
Solvent accessibility and the normalized B-factor were also predicted. The B-factor is a value indicating the extent of the inherent thermal mobility of residues or atoms in proteins. In I-TASSER, this value is deduced from threading template proteins from the PDB in combination with sequence profiles derived from sequence databases. The reported B-factor profile corresponds to the normalized B-factor of the target protein, defined by B = (B' - u)/s, where B' is the raw B-factor value and u and s are, respectively, the mean and standard deviation of the raw B-factors along the sequence [31,32,33].
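The normalization B = (B' - u)/s amounts to a z-score over the residues of the sequence. The short Python sketch below illustrates it under stated assumptions: the function name is hypothetical, the population standard deviation is assumed (the source does not say which estimator I-TASSER uses), and the input values are arbitrary numbers rather than predicted B-factors.

```python
from statistics import mean, pstdev


def normalize_b_factors(raw):
    """Return (B' - u) / s for each raw B-factor B' in the sequence."""
    u = mean(raw)
    s = pstdev(raw)  # assumption: population standard deviation
    if s == 0:
        raise ValueError("all raw B-factors are equal; cannot normalize")
    return [(b - u) / s for b in raw]


# Arbitrary illustrative values, not data from this study
normalized = normalize_b_factors([10.0, 20.0, 30.0])
```

After normalization, positive values mark residues more mobile than the sequence average and negative values mark residues less mobile than average.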
After the structure assembly simulation, I-TASSER uses the TM-align structural alignment program to match the first I-TASSER model to all structures in the PDB library. It reports the top 10 proteins from the PDB that have the closest structural similarity, i.e. the highest TM-score, to the predicted I-TASSER model. Because of this structural similarity, these proteins often have a similar function to the target. PDB files for analysis were selected based on the best hits.

Expression and Purification of MSC_0265
Protocol 7 of the Qiagen QIAexpressionist™ manual was used for expression.
The BL21 STAR E. coli strain containing the histidine-tagged protein was grown overnight in 5 ml of phytone media with 5 µl of ampicillin (100 mg/ml; Sigma). The following day, the culture was transferred to 2-litre conical flasks containing 1 litre of phytone media (BD Biosciences) with 1 ml of ampicillin (100 mg/ml). Cultures were grown in an Innova 4430 shaker-incubator set at 37 ºC and 221 rpm. Induction with 1 mM IPTG (Thermo Scientific) was done once the culture reached an optical density of 0.25, measured using the Ultrospec 3100 pro spectrophotometer, and 2 ml samples were taken at 0 hrs (before induction) and at 2 hrs, 4 hrs, 6 hrs and 16 hrs (overnight) after induction.
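The working ampicillin concentration implied by these volumes can be checked with the dilution relation C1 x V1 = C2 x V2. The sketch below is illustrative only; the function name is hypothetical, and the small volume contributed by the stock solution is ignored.

```python
def final_concentration_ug_per_ml(stock_mg_per_ml, stock_volume_ml, culture_volume_ml):
    """Final concentration in ug/ml from C1 * V1 = C2 * V2.

    Assumes the added stock volume is negligible relative to the culture volume.
    """
    stock_ug_per_ml = stock_mg_per_ml * 1000.0
    return stock_ug_per_ml * stock_volume_ml / culture_volume_ml


# 5 ul of 100 mg/ml stock in the 5 ml starter culture
starter = final_concentration_ug_per_ml(100, 0.005, 5)
# 1 ml of 100 mg/ml stock in the 1 litre expression culture
main_culture = final_concentration_ug_per_ml(100, 1.0, 1000)
```

Both cultures work out to about 100 µg/ml of ampicillin, so the selection pressure is the same in the starter and the expression culture.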
For purification, Protocol 17 of the Qiagen QIAexpressionist™ manual was used. Purification was done after lysis of the overnight culture with 8 M urea lysis buffer and centrifugation. The process required cleared lysate from the 1-litre culture, Ni-NTA resin, empty columns and buffers (lysis, wash and elution). 50% Ni-NTA Agarose was added to the lysate in a ratio of 1:4 and mixed gently by shaking (200 rpm on a rotary shaker) for 15-60 minutes at room temperature. Each column had a capacity of about 12 ml and was soaked in distilled water before use. For each sample, 50 ml Corning tubes were used for the cell lysate, flow-through, Wash 1, Wash 2 and Wash 3, and 15 ml Corning tubes were prepared for Elution 1 and Elution 2. The lysate-resin mixture was then loaded carefully into an empty column. The first, second and third washes were each done using 8 ml of Buffer C, and the contents were collected in the 50 ml Wash 1, 2 and 3 tubes, respectively. Elutions 1 and 2 were done using 4 ml of Buffer D and Buffer E, respectively, and the contents were collected in the corresponding 15 ml Corning tubes.
10% separating and 4% stacking gels were then prepared for SDS-PAGE analysis. Staining was done using PageBlue protein staining solution (Thermo Fisher Scientific). The marker of choice was the 10-170 kDa PageRuler™ prestained protein ladder (Thermo Scientific).

Quantification of MSC_0265
Quantification was done using the Bio-Rad DC Protein Assay, consisting of Reagent A (alkaline copper tartrate solution), Reagent B (dilute Folin reagent) and Reagent S. A standard 96-well plate and 8 standard tubes (0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2 and 1.4 mg/ml) were used. Qiagen Buffer D was used as the blank. Standards and samples (5 µl each) were pipetted into a clean, dry microtiter plate; 25 µl of Reagent A was added to each well, followed by 200 µl of Reagent B. The plate was then gently agitated, and absorbance was read at 750 nm after 15 minutes.
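Sample concentrations in an assay of this kind are typically read off a standard curve fitted to the absorbances of the standards. The Python sketch below shows one way this could be done, assuming a straight-line fit (an assumption; the source does not state how the curve was fitted). The function names are hypothetical, and the absorbance values are synthetic numbers generated from a made-up line purely to exercise the code; they are not measurements from this study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_absorbance(absorbance, slope, intercept):
    """Invert the standard curve to estimate concentration in mg/ml."""
    return (absorbance - intercept) / slope


# Standard concentrations listed in the text (mg/ml)
STANDARDS_MG_PER_ML = [0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4]

# Synthetic absorbances from a made-up line; NOT measured data
synthetic_a750 = [0.02 + 0.25 * c for c in STANDARDS_MG_PER_ML]

slope, intercept = fit_line(STANDARDS_MG_PER_ML, synthetic_a750)
estimate = concentration_from_absorbance(0.02 + 0.25 * 0.5, slope, intercept)
```

Multiplying the estimated concentration by the total eluate volume would then give the total protein yield for the culture.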

tBLASTn analysis
When the 13 Mmm protein sequences were subjected to a tBLASTn analysis with cut-off parameters set at 0.0 for E-value, 100% for query coverage and 90% for identity, the proteins MSC_0160 and MSC_0265 gave the best homology scores against the Mccp (F38 strain) genome (Table 1).
MSC_0160 had an E-value of 0.0, query coverage of 100% and identity of 99%, while MSC_0265 had an E-value of 0.0, query coverage of 100% and identity of 90% (Table 1). The scoring parameters were BLOSUM62 as the matrix, gap costs of 11 for existence and 1 for extension, and a conditional compositional score matrix adjustment. The high homology scores of the Mmm proteins in the Mccp genome suggest shared ancestry between the Mmm and Mccp genomes (Table 2). This in turn suggests the presence of conserved genes encoding proteins with similar functions and a common ancestry among the two members of the Mycoplasma 'mycoides cluster'.
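The cut-off step can be expressed as a simple filter over the tabulated hits. In the Python sketch below, the values for MSC_0160 and MSC_0265 are those reported in the text; the third entry is a made-up hit included only to show a rejection, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query: str
    e_value: float
    query_coverage: float  # percent
    identity: float        # percent


def passes_cutoffs(hit, max_e_value=0.0, min_coverage=100.0, min_identity=90.0):
    """Apply the E-value, query coverage and identity cut-offs to one hit."""
    return (hit.e_value <= max_e_value
            and hit.query_coverage >= min_coverage
            and hit.identity >= min_identity)


hits = [
    Hit("MSC_0160", 0.0, 100.0, 99.0),       # values reported in the text
    Hit("MSC_0265", 0.0, 100.0, 90.0),       # values reported in the text
    Hit("made_up_protein", 1e-50, 95.0, 85.0),  # hypothetical, for contrast only
]

selected = [h.query for h in hits if passes_cutoffs(h)]
```

With these inputs, only MSC_0160 and MSC_0265 remain in the selected list.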

Pairwise Alignment of MSC_0265 and WP_029333261.1 Sequences Using EMBOSS Needle Program
The pairwise alignment of MSC_0265 (Accession number NP_975264.1) and WP_029333261.1 on the EMBOSS Needle platform gave an identity of 368/370 (99.5%) and a similarity of 369/370 (99.7%), with 0 gaps, using a gap penalty of 10 and the EBLOSUM62 matrix. Mismatches occurred only at positions 38 and 100 (Fig. 1). This high degree of similarity and identity leads to the prediction of similar characteristics, features and metabolic functions for the two proteins within their respective genomes.
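The reported percentages follow directly from the match counts and the alignment length, as the short check below shows. The function name is hypothetical; the counts are those given in the text.

```python
def percent(matches, aligned_length):
    """Matches as a percentage of the aligned length, to one decimal place."""
    return round(100.0 * matches / aligned_length, 1)


identity = percent(368, 370)    # identical residues
similarity = percent(369, 370)  # identical or similar residues
mismatches = 370 - 368          # with 0 gaps, the non-identical positions
```

The two non-identical positions agree with the mismatches reported at positions 38 and 100, and only one of them is a non-conservative substitution, since 369 of the 370 positions are scored as similar.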

Secondary structure prediction of MSC_0265 by I-TASSER
The sequence of MSC_0265 was submitted to the I-TASSER suite. The predicted secondary structure suggests that this protein is an alpha-beta protein, containing 16 alpha-helices (in red) and 7 beta-strands (in blue). "H," "S," and "C" indicate helix, strand, and coil, respectively (Fig. 2). Secondary structure prediction is important in the study of protein structure classification and function.

Prediction of solvent accessibility
The predicted solvent accessibility is presented in 10 levels, from buried (0) to highly exposed (9), and is predicted by the SOLVE program (Y. Zhang, unpublished). Most regions of this protein are predicted to be buried (Fig. 3) and are therefore minimally accessible to solvent.

Prediction of normalised B-factor
The regions at the N- and C-termini and most of the loop regions are predicted to have positive normalized B-factors, indicating that these regions are structurally more flexible than other regions. In contrast, the predicted normalized B-factors for the alpha and beta regions are negative or close to zero, suggesting that these regions are structurally more stable, as shown in Fig. 4.

Predicted 3 dimensional models of MSC_0265
Model 1 (Fig. 5) had the highest C-score, 1.79, and was therefore predicted to be closest to the target protein. Model 2 (Fig. 6) had a C-score of -2.99 and Model 3 (Fig. 7) a C-score of 0.28.

Predicted function using COACH
COACH is based on the I-TASSER structure prediction. COACH is a meta-server approach that combines multiple function annotation results from the COFACTOR, TM-SITE and S-SITE programs.

Ligand binding sites
The highest ranked hit generated a C-score of 0.87 and a cluster size of 153, denoting the number of templates in the cluster, with TDP as the predicted ligand. There were 17 ligand binding site residues, as shown in Table 3.
[Table 3: predicted ligand binding site residues. Residues recoverable from the extracted text: 166, 167, 168, 171, 193, 195, 197, 198, 199, 264.]
Predicted binding ligands are shown as green-yellow spheres and binding residues as blue ball-and-stick.

Enzyme commission numbers (EC) and active sites
The highest ranked PDB hit, 1qs0A, was generated with a confidence score (C-score EC) of 0.734.
The TM-score, which measures the structural similarity between the query and the template protein, was 0.873. RMSD is the root mean square deviation between residues that are structurally aligned by TM-align. The identity, which was 0.293, is the sequence identity in the structurally aligned region. The Enzyme Commission number was 1.2.4.4 and the active site residue was 264 (Table 4). Predicted Enzyme Commission active site residues are shown as coloured ball-and-stick in Fig. 9.

Expression of MSC_0265
Expression of MSC_0265 was analysed by SDS-PAGE on a 10% separating gel and 4% stacking gel stained using PageBlue. There was no leaky expression at 0 hours, and thereafter expression of the protein was consistent over the time points shown. Aliquots of expressed protein from each time point were loaded on the gel as shown. Phytone media therefore effectively supported the growth of the BL21 STAR (DE3) E. coli containing the insert. The use of E. coli as a cell factory is well established and studied, making it the most popular expression platform. The estimated molecular weight was confirmed by visualization to be 74 kDa.

Purification of MSC_0265
After expression and lysis of the E. coli, an aliquot of the overnight culture was subjected to purification. Non-specific proteins are washed through the column containing Nickel-NTA Agarose resin during the first wash cycles, using buffers at higher pH. The histidine-tagged protein that attaches to the resin is then eluted at pH 4.5. The efficiency of the process is indicated by minimal background on the gel, as shown in Fig. 11.

Quantification of MSC_0265
Protein quantification is important in analysing the expression potential of a protein, especially if the protein would eventually be produced in an industrial setup. It also sheds light on proteins that might need optimization of expression and purification conditions, and helps to identify generally poor expressers. A 1-litre culture of MSC_0265 yielded 2.748 mg of protein (Table 5), which is considered normal and suggests that production of this protein can be scaled up.

CONCLUSION
MSC_0265 is a pyruvate dehydrogenase E1 protein, the first component enzyme of the pyruvate dehydrogenase complex (PDC). The pyruvate dehydrogenase complex facilitates the transformation of pyruvate into acetyl-CoA in a process called pyruvate decarboxylation. Acetyl-CoA may then be used in the citric acid cycle to carry out cellular respiration; pyruvate dehydrogenase therefore contributes to linking the glycolysis metabolic pathway to the citric acid cycle and to releasing energy through reduced nicotinamide adenine dinucleotide (NADH).
MSC_0265 was identified as a potential high-scoring Mmm immunogenic antigen with a homologue in the Mccp genome (F38 strain, Accession NZ_LN515398.1), namely protein WP_029333261.1. The homologue in Mccp was found to have a similar sequence length and function. MSC_0265 was characterized and found to have a secondary structure that is alpha-beta in nature, with mostly buried solvent accessibility regions. The histidine-tagged protein was successfully expressed and purified, clearing the way for immunoassays, upscaling and the eventual challenge of goats.