A polarisable QM/MM description of NMR chemical shifts of a photoreceptor protein

ABSTRACT We present a polarisable QM/MM investigation of NMR chemical shifts of a photoreceptor protein belonging to the Blue Light-Using Flavin family. Two different structures have been proposed for this photoreceptor which show a large variability in terms of the position and orientation of the protein residues around the flavin chromophore. Here, the two structures have been investigated with our multiscale approach using both DFT and MP2 level of theory. The picture that comes out comparing the H chemical shifts of the flavin and the most strongly interacting protein residues with the available experimental data, indicates a different behaviour of the two structures, with one showing a better correlation with NMR measurements. This shows that hybrid quantum chemical-classical simulations of NMR chemical shifts can indeed become a valuable tool to investigate the structure of complex biosystems. GRAPHICAL ABSTRACT


Introduction
Quantum chemical simulations of NMR chemical shifts have been largely used to elucidate structures of molecu les [1][2][3][4][5], and to investigate solvent effects when combined with continuum solvation models or MM descriptions [6][7][8][9][10][11][12]. As a matter of fact, QM calculations of NMR properties could also represent an invaluable tool to study complex biological systems [13][14][15]. So far, however, the applications in this field are scarce mainly due to the high computational cost and the difficulty of a correct comparison with available experimental data.
In this study, we present an example that shows the potential of this strategy applied to photoresponsive proteins in particular. In these cases, in fact, the quantum bouring amino acids and lead to signal propagation, do not involve any intramolecular rearrangement of the chromophore upon excitation but changes in its hydrogen-bond environment [18,19]. Flavins in fact contain two hydrogen-bond acceptor C(2,4)O sites and one donor N(3)H site in the pyrimidine moiety. The H-bond changes around the flavin in the light-activated state are revealed by a characteristic 10-15 nm redshift of its first π -π * transition and a 20 cm −1 downshift of the C(4) = O stretching vibration compared to the dark-adapted state. Despite numerous experimental [17,[19][20][21] and computational [22][23][24][25][26][27] studies, the exact nature of the signalling state and the mechanism of photoactivation are not well understood and, in some aspects, controversial. What is commonly accepted is that the light-activation mechanism includes two essential amino acids, a conserved glutamine and a conserved tyrosine (Figure 1). The hydrogen bond responsible for spectroscopic changes is most likely donated by the conserved glutamine residue, which may either rotate or tautomerize in the light-adapted state to facilitate the new hydrogen bond [18,24,25].
In this study, we focus on one of the most widely studied proteins of the BLUF family, AppA, which acts as a transcriptional antirepressor of photosynthesis gene expression in Rhodobacter sphaeroides, which regulates cellular responses to light and oxygen. The dark state structure of AppA has been investigated with both X-ray [28,29] and nuclear magnetic resonance (NMR) techniques [30] but many questions remain about the position and orientation of the residues forming the binding pocket for the chomophore (here a flavin mononucleotide). The first crystal structure (PDB: 1YRX) [28] showed a tryptophan (TRP104) located close to flavin in the so-called Trp in conformation ( Figure 1(A)), whereas a subsequent crystal structure (PDB: 2IYG) [29] presented a different arrangement where the tryptophan is pointing away from the flavin while a methionine (MET106) is close to it (Met in conformation in Figure 1(B)). To further complicate the picture, solution NMR structures have shown a majority of snapshots in the Trp in conformation, but also some in the Met in conformation [30]. The different position of tryptophan and methionine corresponds to a different H-bond network and a different conformation of the other residue strongly interacting with the chromophore, the glutamine residue (GLN63). In particular, the Met in structure suggests that, in the dark, the glutamine side chain is oriented with its carbonyl oxygen near the OH group of the tyrosine residue (TYR21) and its amide nitrogen close to the flavin O4: in response to light this orientation is flipped by ∼ 180 degrees. This interpretation is exactly opposite to that given on the basis of the Trp in structure, where instead the dark state is characterised by the GLN63 carbonyl group oriented toward TRP104 (see Figure 1(A)). It has, however, to be noted that an unambiguous assignment of the conformation of the glutamine side chain is difficult to be obtained through X-ray diffraction data, because the electron densities of the nitrogen atom of NH 2 amino and oxygen atom of carbonyl of the GLN63 side chain are similar. As a matter of fact, the solution NMR investigation shows a highly flexible side chain for the glutamine. The research presented here aims at clarifying the structure of the dark state of AppA by calculating NMR chemical shifts of both flavin and the most interacting residues in the two different crystal structures available (Trp in and Met in ) and comparing them with the available experimental data. A multiscale approach is used in which quantum chemical calculations are integrated with a Molecular Mechanics (MM) model in such a way that the two descriptions are fully coupled through a polarisable MM embedding based on induced dipoles (from now indicated as MMPol) [31]. NMR calculations have been performed at both the B3LYP and secondorder Møller-Plesset (MP2) perturbation theory level so to have a comparison of two conceptionally different schemes thus providing some indication for the accuracy and reliability of the computational results. As far as we know, this is the first time that a MP2/MMPol approach is used to compute NMR properties, and as such, a short summary of the main aspects of the MP2/MMPol implementation and of its extension to nuclear magnetic shieldings is reported before presenting the application to AppA.

The QM/MMPol approach
In the MMPol model, the environment is represented by endowing each MM atom i with a point charge q i and with an isotropic polarisability α i , so that the electric field produced by the overall charge distribution induces a point dipole μ i at the classical atoms. Within this framework, the MMPol energy reads In Equation (1), we have introduced a generalised T L 1 L 2 ij and a modified T L 1 L 2 ij Coulomb interaction [32,33], namely: where s qq ij is a scaling factor [34], that can be set to zero to exclude an interaction, for instance between two bonded atoms, or to an intermediate value to scale it, according to the specific force field exclusion rules. Moreover, the damping function λ 1 (|r i − r j |) ensures that the Coulomb interaction between two point multipoles does not diverge when the two get too close, and is a fundamental ingredient of polarisable force fields, as it allows them to avoid the so-called polarisation catastrophe. Within the same formalism, the electrostatic potential and the electric field due to the QM charge density at the polarisable sites, become where Z J are the nuclear charges, χ μ and χ ν atomic orbital basis functions, P the one-body density matrix, and we have used the generalised Coulomb operator: In Equation (1), E MM i is the field due to the MM charges at the polarisable sites and the induced dipoles solve the following linear system of equations At the SCF level, the Hartree-Fock and MMPol equations are solved self-consistently. This is achieved by solving Equation (6) at each SCF iteration using a QM field computed with the current SCF density and then updating the Fock operator with the MMPol contribution The MP2 energy is computed using the molecular orbitals (MO) and orbital energies φ p , p obtained by solving the coupled HF/MMPol equations: where i, j, . . . denote occupied molecular orbitals and a, b, . . . virtual ones We note that no explicit MMPol contribution is included in the MP2 energy: this coupling scheme, which in the continuun solvation literature is known as Perturbation to the Energy (PTE), is consistent with a second-order perturbative expansion of the energy.

MP2/MMPol nuclear shieldings
MP2/MMPol NMR shielding tensors can be computed as analytical second derivatives of the MP2/MMPol energy with respect to the nuclear magnetic moment and an external magnetic field. As MP2 analytical second derivatives [35][36][37], as well as the calculation of magnetic properties with a polarisable embedding at the SCF level [15,38], have been discussed extensively elsewhere, we limit the discussion to the specific MMPol contribution to the MP2 corrections to the shielding tensors. In the following, we assume that the AO basis functions are fielddependent London Orbitals [39,40], which ensure the Gauge-origin invariance of the computed results. By first differentiating with respect to the nuclear magnetic moment of the j-th nucleus m j , we get where h m μν is the derivative of the one-electron integrals with respect to the nuclear moments and P MP2 = P HF + P (2) is the MP2 relaxed density matrix. In the MO basis, the second-order correction to the density is obtained from the MP2 amplitudes (occupied-occupied and virtual-virtual blocks) and by solving a set of perturbation independent, coupled-perturbed HF (CPHF) equations (occupied-virtual block), as reported by Gauss [35,41]. Such equations are usually referred to as to the Z-vector equations [42]. In the presence of a polarisable embedding, the CPHF equations include an explicit contribution from the environment. Using Casida's notation [43], the CPHF equations read jb (Ã +B) ai,bj P (2) bj = −L ai , and L is the so-called MP2 Lagrangian By further differentiating with respect to an external magnetic field, we get where h m j ,B μν is the second derivative of the one-electron integrals with respect to the nuclear moment m j and the external field B, P MP2,B μν = P HF,B μν + P MP2,B μν , and In equation (14), C pμ are the MO coefficients and their derivatives, that are parametrised in terms of the orbital rotations U pq . The latter are obtained from the MO orthonormality conditions (occupiedoccupied and virtual-virtual blocks) and from the solution to the magnetic CPHF equations (occupied-virtual block): where being G the usual two-electron contribution to the Fock operator, which contains the MMPol contribution in equation (7), G B the same, but evaluated with derived integrals, and S the overlap matrix.
As the magnetic Hessian does not include explicit MMPol contributions [15,38], the magnetic CPHF equations in the presence of the polarisable environment only require a modified right-hand side, as in equation (17). The same applies for the MP2 coupled-perturbed equations that are needed in order to evaluate the MP2 density derivatives. As these are very cumbersome, we do not report all the expressions, which can be found in Ref. [35]. We only point out here that, again, the polarisable environment does not modify the MP2 response kernel, but gives a contribution to the right-hand side of the response equations, i.e. the Lagrangian derivative, which only involves the induced dipoles and is of the form where the E B μν integrals are electric field integrals computed with differentiated London orbitals. All the other contributions are assembled using information available from previous intermediate computations, i.e. the HF/MMPol Fock matrix, MO coefficients, Magnetic CPHF right-hand side and MP2 relaxed density.
On a more technical note, we remark that all the purely MMPol quantities, including the solution to the MMPol equations, are computed with linear scaling computational cost using our FMM-based implementation [34,44], so that the overall cost of adding a polarisable environment is negligible with respect to the cost of the QM calculation.

Structures
The crystal structures of Trp in (PDB: 1YRX) and Met in (PDB: 2IYG) AppA were refined by adding Hydrogen atoms using the LeAP module of AmberTools. The protein was protonated at pH 7 using the H++ web server [45]. Hydrogen atoms were minimised with Amber at the MM level using the Amber ff14SB force field [46]. In order to minimise the bias coming from the unrefined crystal structure, for each crystal structure we optimised the FMN molecule at the B3LYP/6-31G(d) level of theory within a fixed protein environment described at MM level, in the ONIOM scheme [47]. In these optimizations, the tail of the flavin mononucleotide was kept mobile but was not included in the QM part. The MM part was described with the Amber ff14SB force field.

Chemical shifts
Nuclear shieldings were calculated using B3LYP/6-311+G(d,p) and MP2/6-311G(d,p) levels of theory. All the MP2 calculations were performed without frozen core orbitals. The QM part consisted of the isoalloxazine ring and the first carbon of the tail, as shown in Figure 2. In order to account for quantum mechanical effects of the most strongly interacting residues, we also added to the QM part the residues shown in Figure 2, which form a hydrogen-bonding network with the flavin. In particular, we considered the effect of ASN45, which is hydrogen bonded to the H3 atom of flavin, GLN63, which is hydrogen bonded to N5 and presents different conformations in the Trp in and Met in structures, and TRP104 or MET106, respectively, for the two structures. Finally, we also considered TYR21 which does not form directly a hydrogen bond with the flavin, but with GLN63. 1 H chemical shifts are reported with respect to TMS.  [48], whereas for the protein atoms we used the standard PDB atom names.

Results
In this section, we summarise the main results of our NMR calculations for AppA. In this analysis, DFT(B3LYP) and MP2 results are presented and, where available, compared with available experimental data. The presentation of the data is divided into parts: one focused on the 1 H, 15 N, and 17 O of the flavin chromophore and the second on 1 H of the protein residues forming the flavin binding pocket and most strongly interacting with it.

Flavin
In Table 1, we report the 1 H chemical shifts obtained at B3LYP and MP2 level for flavin in the two refined crystal structures of AppA, namely Trp in and Met in (see Computational details). In particular, four different models have been tested and compared. The gas-phase one considers an isolated flavin molecule but using the structure optimised within each of the two protein environments. The other models use the same structures but include the effects of the protein also in the calculation of the chemical shifts. In particular, the Full-MMPol model describes all the protein residues at classical level while the other models extend the QM subsystem to the H-bonded asparagine residue (ASN) or to the four residues more strongly interacting with the flavin (Cluster): see Figure 2 for the representation of the QM clusters for the two investigated crystal structures. In both ASN and Cluster models, the protein residues not directly included in the QM subsystem are described at MMPol level. Because of the large computational cost, the Cluster model at MP2 level has been reduced removing from the QM subsystem the tyrosine residue (TYR21) which is not directly connected to the flavin. To check the effect of this change, the B3LYP calculations have been repeated also for the reduced clusters, finding that the effects of including tyrosine in the QM subsystem are negligible on the NMR chemical shifts (see Table 1).
As it can be seen from the values reported in Table 1, the two QM descriptions give very similar results when moving from one proton to the other and, for the same proton, when changing crystal structure. This shows that the description of the different chemical and protein environment is reproduced in the same way with the two different levels of theory. By comparing the chemical shifts computed in gas phase on the Trp in and Met in structures, we can also see that the internal geometry of flavin has a small effect in determining differences between the two structures.
As a further analysis, in Figure 3 we report the change in nuclear shieldings for nitrogens and oxygens when moving from the gas-phase to Full-MMPol and the Cluster models (the complete set of nuclear shieldings is reported in Supplemental online material). As it can be seen from the graphs, the two QM levels give a similar description of the effects of the protein and also a similar behaviour with respect to the selected structure. We note, however, that MP2 data seem to show a larger sensitivity to the selected crystal structure, especially for nitrogens.
Once we have verified the similarity between DFT and MP2 results, we can move to a detailed analysis of the single nuclei and compare with experimental data. In this analysis we focus on DFT data, possibly employing MP2 results when the two methods give significant differences.
It is known that the chemical shift of N(3)H reacts quite sensitively to hydrogen-bonding interaction; in aprotic solvents (e.g. in CHCl 3 ), the resonance appears at ∼ 8 ppm, whereas in DMSO, a hydrogen-bond acceptor, it appears at ∼ 11 ppm [48]: this corresponds to a δ value of ∼ 3 ppm. In the AppA, a value of 11.16 ppm has been measured for such a proton which clearly indicates a strong H-bond interaction with the close-by asparagine  residue. Our calculations well reproduce these observations. In fact, N(3)H shows the largest shift when moving from the gas-phase to the Cluster model: δ is ∼ 3 ppm at B3LYP level for both structures and 3-5 ppm at MP2 level. It has to be noted that this shift due to H-bonds effects is only partially reproduced by a fully classical description: the δ value moving from gas-phase to Full-MMPol is around 1 ppm at B3LYP level and 1-2 ppm at MP2. The differences between Full-MMPol and cluster models for the N(3)H clearly show that charge-transfer effects between flavin and the ASN residue are possible. It has also been shown that the chemical shift of N(3)H increases by ∼ 0.6 ppm moving from the dark state to the light-induced signalling state [49]. Either a change in the electronic structure of the flavin isoalloxazine ring, or a change in the local residue environment has been invoked to explain the shift. As reported in the Introduction, a possible photo-induced change in the protein around flavin is the rotation of glutamine with the consequent change of H-bonding network around it. This change is indeed present in the two investigated structures (see Figure 1): we can thus use the differences calculated between them to compare with the dark-tolight changes. Looking at our calculated values, we see that in the B3LYP description, the change is ∼ 0.4 ppm when using the Cluster model while it becomes half if the Full-MMPol model is used. At MP2 level, instead, a large change (∼ 1.2 ppm) is found with the Full-MMPol model already, and it even doubles when the cluster model is used. This seems to show that an accurate evaluation of the chemical shift of the N(3)H proton is extremely challenging due to its high sensitivity to both the QM level and the local environment; nonetheless, our data confirm that a change in the local residue environment could explain the observed change in the chemical shift of N(3)H after illumination.

Protein residues
In addition to the flavin nuclei, the NMR measurements on AppA also allowed to quantify the chemical shifts of the protons of the most interacting residues [30]. As in the Cluster model we have access to these data in both the Trp in and Met in structures, we can extend the comparison with experiments also to the protein residues which most strongly interact with flavin. This comparison is shown in Figure 4 where we report a correlation plot between the calculated and measured 1 H chemical shifts for GLN63, TYR21, ASN45, TRP104 and MET106. In the same plot, we have also reported the values for the flavin protons.
In Trp in , an evident outlier is represented by the HD21 of asparagine (see Figure 2), which in our calculations presents a too small chemical shift, 3.53 ppm vs the measured 6.97 ppm. A much better agreement is instead found for the same proton when the Met in structure is used. We note that this behaviour is further confirmed by the MP2 description. In parallel, we note that the other proton in the same amino group (HD22), which is hydrogen bonded to O4, shows a higher chemical shift, 9.16 ppm in Trp in than in Met in , 7.99 ppm and the latter is much closer to the experimental value (7.43 ppm). This overestimated value in Trp in can be explained by the formation of a stronger hydrogen bond in this structure with respect to the Met in . In fact, we expect that the stronger hydrogen bonding there is, the more the proton is deshielded and the higher its chemical shift will be. Looking at the H-bond distance between O4 and HD22 we indeed find ∼ 1.84 Å in Trp in while it is around 2 Å in Met in . It is certainly true that the experimental values are the result of a thermal average over the protein motions which are instead neglected in our calculations. We expect that the HD21 and HD22 would show less pronounced differences if we took into consideration the fluctuations in the hydrogen bond lengths. However, the very different findings obtained in the two structures for HD21 can only be explained in terms of the different local environment consisting of TRP104 and MET106 in Trp in and Met in structures, respectively. Another interesting observation concerns the chemical shifts of the amino protons (HE21 and HE22) in GLN63: the values calculated for the Trp in structure are too large, thus indicating an excessive deshielding. The analogous chemical shifts calculated for the Met in structure, instead, are in good agreement with experiments. This time, however, the B3LYP behaviour is not confirmed by MP2 which, for these two protons, give similar accuracy for the two structures.
We further note that thermal fluctuations of the protein could also open channels inside the structure, which facilitate water solvation of the binding pocket, and in turn affect the chemical shifts of the flavin and residue side chains. There are no water molecules, in either structure, close to the flavin. However, in the Met in structure, there is a closed channel, adjacent to MET106, that could mediate the entry of water molecules into the flavin binding pocket ( Figure S1B). This channel was not found in the Trp in structure, because of the bulkier tryptophan residue in place of the methionine ( Figure S1A).
Finally, still in the Trp in structure, our calculations give an overestimated chemical shift for HZ2 in TRP104. To better understand this result, we have repeated the calculations of the chemical shift of TRP104 in the Met in structure: to do that we have enlarged the cluster so to include also this residue that, we recall, in this structure is placed much farther away with respect to flavin. The results obtained for this enlarged cluster in the Met in structure indeed improve the agreement with experiment. Combining the results of this and the previous section, we can conclude that our QM/MMPol investigation seems to show a more accurate agreement between calculations and experiments for the Met in structure: this is particularly evident if we use a B3LYP description, whereas MP2 reduces the differences between the two structures.

Summary
In this study, we have combined a polarisable MM model with both DFT and MP2 descriptions to investigate NMR chemical shifts of the BLUF photoreceptor AppA. This system has been selected because there is still a large debate in the literature not only about the molecular mechanism involved in its photoactivation but also on the real structure of the dark state. The available experimental data coming from X-ray and NMR measurements have in fact a large variability in terms of the position and orientation of the protein residues around the chromophore (a flavin). In particular, two different structures have been assessed and both of them have been investigated and compared in the present study. These two structures present the same five residues involved in the binding pocket, with two of them, namely methionine and tryptophan, exchanging their position close to the flavin. In parallel, the two structures report a different orientation of the glutamine which eventually induces a different H-bond network around the flavin. The picture that comes out, analyzing the 1 H chemical shifts of flavin and the protein residues that most strongly interact with it in the two structures, seems to indicate a better agreement with experiments for the structure that shows a methionine close to the flavin (Met in ) and the glutamine oriented with its amino group toward it, leaving the carbonyl active in H-bond interactions with the nearby tyrosine. We have, however, to add that this better agreement is very clear when a B3LYP level of calculation is used, whereas using an MP2 description the differences between the two structures become smaller. Moreover, the study here presented has been based on a static approach which uses refined crystal structures. A very important improvement would therefore be that of adding the dynamics of the system in its natural environment. Efforts in this direction are in progress in our group and they will likely give us a definitive answer about the real structure of AppA and, possibly, some important hints about the light-induced mechanism.