Free Energies from Adaptive Reaction Coordinate Forces (FEARCF): an application to ring puckering

Previously an adaptive reaction coordinate force biasing method based on probability distributions has been used to calculate the free energy of conformation, configuration and chemical reactions. This method has recently been generalised to perform calculations on multidimensional reaction coordinates. This paper presents details of this method, termed Free Energies from Adaptive Reaction Coordinate Forces (FEARCF). The efficiency of sampling of this method is demonstrated by applying it to the problem of sampling the many characteristic pucker conformations of a β-D-glucose pyranose using a semi-empirical PM3 Hamiltonian. The sampling ratio of the global minimum conformer (4C1) to the highest energy conformer (a planar hexapyranose ring) is 1.7:1. Pucker free energy surfaces such as the one presented here can be a useful tool in the analysis of enzymatic reactions involving molecular ring puckers in the Michaelis complex.


Introduction
The pathways and manner in which molecular rings deform plays a critical role in key molecular reactions and processes. A notable case is the puckering of a carbohydrate ring when forming a glycosidic bond with another carbohydrate (in the case of oligosaccharides) or a chemical moiety such as a nucleobase (in the case of nucleosides) or degrading the already formed glycosidic linkage through hydrolyses. Glycosyl transferases catalyse the synthesis of glycosidic bonds to form oligosaccharides [1] while glycosidases hydrolyse glycosidic bonds [2,3]. Both enzyme classes whether promoting the formation of the glycosidic bond or degrading of it have a strong affinity for the reducing carbohydrate to take on high energy conformations [4]. An oxocarbenium ion forms in the Michaelis complex accompanied by the distortion of the saccharide ring. The nature of the distortion is complex and poorly understood. This is mainly because enzyme substrates often contain saccharide rings which exhibit several forms of puckering in the transition state [5,6]. The puckering conformation of saccharides in the transition state of various glycosidases results in selective reaction pathways and specific reaction products. In the case of pyranose rings competing puckers have been observed for several members of the glycoside hydrolase family [5,7]. The puckering conformers observed in X-ray experiments and proposed from computational analysis include 1 S 5 , B 3,O , O S 2 [7,8].
The analysis and categorisation of the puckering of molecular rings is marked by the seminal work of Cremer and Pople [9]. Since then several researchers [7,8,10] have shown interest in the calculation and designation of puckering for monosaccharides. Here we report an adaptive force based method for calculating the potential of mean force of ring pucker. We reported the calculation of the potential of mean force (PMF) for glycosidic linkage rotation [11] and chemical reactions [12] using this approach which we now term Free Energies from Adaptive Reaction Coordinate Forces (FEARCF). We will detail the FEARCF method here in the context of ring puckering and then calculate the three-dimensional PMF for the -D-glucose pyranose ring. Although we will present the minima pucker conformations in this paper we will not discuss the stereoelectronic rationale for their existence as this will be reported elsewhere [13].

Method and theory
The force biasing method presented here is a generalisation of the method used to produce two dimensional conformational [11,14] and reaction [12] free energy surfaces. The concept of adjusting the biasing potential iteratively to evolve the free energy was first presented by Mezei as adaptive umbrella sampling [15]. Other methods have subsequently employed this concept and extended it using either forces to directly bias the system or potential biasing. A popular adaptive umbrella potential method developed by Laio and Parrinello uses a combination of Gaussians as a biasing potential from which course-grained forces are calculated to direct the system to regions that have not been previously sampled [16]. Of particular relevance here is Darve and Pohorille's adaptive biasing force (ABF) routine for thermodynamic integration that continuously updates the biasing forces at every step during the simulation [17]. The biasing force is estimated locally from sampled conformations of the system gained from previous steps in the simulation. The bias at a specified point in the reaction coordinate is applied as soon as the bin for that point has enough samples to estimate the mean force. These forces are then applied along generalised system coordinates. The result is that the instantaneous force acting on the coordinates of interest when subtracted from the equations of motion results in zero acceleration along the reaction coordinate. The choice of sample size is recommended to be small ($100) and the sampling is then improved by applying a biasing force of the form r where is a Langrange multiplier that can be related to the derivative of the free energy dA=d.
The FEARCF method is based on probability distributions and histograms. The reaction coordinate space is a discretised n-dimensional grid where the sampling frequency for a bin site is recorded for each simulation. In the case of ring puckering the dimensionality of the grid is N À 3 where N ¼ number of ring atoms. The population of this grid represents a running tally of the reaction coordinate probability density, derived from the history of simulations to that point. It is used as input for a multidimensional cubic-spline interpolation from which the reaction coordinate biasing forces are calculated. These forces are applied to all atoms used in the reaction coordinate definition to bias the next simulation's reaction coordinate trajectory away from previously sampled areas. The entire reaction coordinate space is equally sampled when the biasing forces are derived from the true PMF. The method requires no intervention from the user other than a physical understanding of the complexity of the computer experiment, which is used to make a judicious choice of reaction coordinate, and simulation length at each update of the biasing force. A combination of relatively short simulation times at the start of FEARCF experiment increasing to longer times (three or more times as long) as convergence is reached is recommended.

Hill-Reilly pucker coordinates
Hill and Reilly proposed an intuitive description of ring puckering [18]. For an N membered ring they reduced the Cartesian coordinates to N À 3 ring pucker coordinates. Similar to the classic Cremer-Pople definition a puckering conformation is described as the combination of deviations of each coordinate from a mean plane. The Cremer-Pople pucker coordinates (amplitude-phase pairs (q i , i ) and a single puckering coordinate q N/2 for even numbered rings) are usually translated to a spherical polar set to describe the pucker conformation in the case of six membered rings. This set consists of a pucker amplitude (Q), a phase angle (, 0 5 2) which in combination with another angle (, 0 ), where q 2 ¼ Q sin and q 3 ¼ Q cos , describe the chair, twist and boat conformations.
The Hill-Reilly definitions are derived from triangular decomposition of an N membered ring, which is easily visualised in terms of deviations of the atoms that comprise the ring, from mean planes. A monocyclic ring is decomposed into rotatable planes and a reference plane. The pucker coordinates are specified by the angle the rotatable planes make with respect to the reference plane. In this paper we consider only six membered rings and so we illustrate the Hill-Reilly definitions in Figure 1. For N ¼ 6 there are three puckering coordinates 0 , 1 , 2 2 [À,] where a positive angle indicates rotation of that ring plane above the reference plane. The angle of puckering is calculated from where q i is a vector normal to the rotatable plane i and n is the vector normal to the reference plane. We define a three dimensional reaction coordinate () as the combination of these pucker coordinates, i.e. ¼ ( 0 , 1 , 2 ).

Free Energies from Adaptive Reaction
Coordinate Forces (FEARCF) The potential of mean force, W() is calculated as a function of the three-dimensional coordinate set and is related to the probability density, P() by where k B is the Boltzmann constant and T is the temperature in kelvin. The PMF is derived in a canonical ensemble (constant NVT), which gives the Helmholtz free energy puckering surface. A correct analysis of free energy pathways is only possible if the reaction coordinate is sampled uniformly. To approach uniform sampling we used a variant of Mezei's adaptive umbrella sampling [15] which we applied to a two dimensional dihedral angle PMF earlier [11] and more completely to a four-dimensional intermolecular association hypersurface [19]. To traverse barrier heights larger than 3k B T a biasing potential, U() is applied. The best choice for U() would be The objective is to bias the new trajectory away from reaction coordinate space that has been sampled in previous simulations towards reaction coordinates that have not been sampled. The concept of iteratively adapting an umbrella potential by using the information collected from previous simulations to uniformly sample a single degree of freedom was proposed by Mezei [15]. In this procedure the umbrella potential U() is adapted from the first guess until it is equal to the inverse of the PMF. At the final stage of the adaptive umbrella sampling when U() is applied to the system the entire configurational space (high and low energy regions) will be sampled [15]. The method has been extended to multidimensional conformational space for proteins [20].
We modified the macromolecular program CHARMM [21] to calculate the effect of the perturbing forces generated from the surface of the adapting umbrella potential for three independent reaction coordinates [11]. We have generalised the method to multidimensions [19] and described it here for the threedimensional pucker free energy calculation. At each step of the simulation the biasing force for i applied to the atoms involved in the rotatable plane is calculated from the gradient of the umbrella potential for that puckering coordinate. For three-dimensional surfaces these are where the force F is calculated as the partial derivative i of the applied biasing reaction coordinate umbrella potential. The result is the ith ring plane is biased away from the areas it has sampled with a force opposite to the accumulated sampling density as shown in Equation (3). When implemented in a molecular dynamics (MD) routine the biasing forces for each puckering coordinate is added to the forces from the equilibrium Boltzmann dynamic forces applied to each atom. The biasing forces on each ring atom can be recovered from the reaction coordinate forces by recasting them in terms of the PMF: which can be rewritten in terms of the vector q i using the chain rule: Similarly the forces can be written in terms of the vector n, When these are combined (and including all other contributions), the Cartesian biasing forces (@W()/@x i ) on the ring atoms x i originating from the pucker axis a i (see Figure 1(a)) are calculated as @WðÞ @a i @a i @x 2ðiþ1Þ ¼ @WðÞ @x 2ðiþ1Þ ; @WðÞ @a i @a i @x 2i ¼ @WðÞ @x 2i ð8Þ (b) Description of pucker angle relative to the reference plane and rotatable plane 0 (r.p.0). A simple example of a negative force (to bend the flap towards the reference plane) applied to theta and perpendicular to the reference plane will apply forces to atoms 0, 1, 2 and 4 as indicated.
which are due to the reaction coordinate i . The total biasing force is calculated by summing the contributions from each reaction coordinate which is then applied to the ring atoms. This can be calculated for each of the ring atoms i where i are the labels (1-6) for the ring atoms (C1,C2,C3,C4,C5,O5). At the start of the FEARCF simulations W() and likewise P() are not known. To initialise the simulations a zero biasing potential is applied (U() ¼ 0). The resulting probability distribution is used as a first guess for W() from which an improved biasing potential U() for the next simulation is obtained. This process continues iteratively until the pucker conformational space is adequately sampled.
The FEARCF method is similar to the ABF method in that it uses external forces derived from the derivative of the free energy to achieve unbiased sampling of the reaction coordinate. The ABF method however, calculates the adaptive forces on the fly at each bin and uses a ramp function to reduce errors emanating from the incomplete sampling of bins. The ABF method therefore could converge faster in cases where the statistical errors arising from sampling not being optimal for all bins. We have however, not directly compared the two methods to make any definitive statements on this. The FEARCF method calculates new biasing forces over all bins from the histogram P(), which is constructed after each iteration together with all previous iterations using the WHAM procedure. This is done numerically using a multidimensional cubic-spline routine. The major advantage of the FEARCF method is therefore that it is easily extended to multiple dimensions as we show here while in the ABF method it is not straightforward to reconstruct A ð Þ from its derivatives.

Simulations details for b-D-glucose
We modelled the glucose molecule with the PM3CARB-1 [22] semi-empirical potential and conducted in vacuo dynamics in CHARMM 33b2 [21]. Each simulation was of 0.2 ns in length (first 17 iterations and from then on 0.6 ns), at a temperature of 298.15K. Velocity-verlet dynamics were implemented with the Nose´temperature thermostat. The potential of mean force was calculated using an iterative procedure. To further increase the reaction coordinate sampling, batches of eight simulations starting from various pucker conformations (initially all 4 C 1 and then the last conformer sampled from the previous run) were used with an initial biasing potential of zero. The sampled areas were collated to acquire P() from which the first guess for W() was calculated and the associated U() applied to the next batch of eight simulations. From this the biasing forces were derived from the reaction coordinate surface as described above to improve the sampling. This iterative procedure was completed 24 times totaling a sampling of 51.2 ns (17 Â 0.2 Â 8 þ 0.6 Â 5 Â 8).
Summing and weighting the histograms of all the biased simulations appropriately further improved the sampling rate of the reaction coordinate space. This was done using the Weighted Histogram Analysis Method (WHAM) [23]. The WHAM equations were applied iteratively as shown previously [24]; until the maximum difference (tolerance) between the previous and current iteration weighting coefficients (| f i jf iÀ1 j |) was less than 0.001.

Free energy of b-D-glucose ring pucker
A pyranose carbohydrate ring has 38 characteristic puckering conformers that are strainless [25]. These can be reduced to simpler unique classifications such as chairs (C), half chairs (H), boats (B), skews (S), and envelopes (E), which are illustrated in Figure 2. The individual conformers are named with a superscript indicating a ring atom positioned above a plane formed by the remaining atoms while a subscript indicates a position below this plane. It should be noted that this relative plane is not the same as the one we use in the Hill-Reilly description (Figure 1). It is not surprising that the three dimensional free energy surface, W(), which we calculated using the FEARCF method for -D-glucose is complex (Figure 3). The reaction coordinate axes ( 0 , 1 , 2 ) are orthogonal to each other and the fourth dimension, free energy (FE) is drawn in colour. In this figure, blue represents low energy conformers that are close in energy to the global minimum and red represents conformations that are at least 30 kcal mol À1 greater in energy than the global minimum conformation. To enhance the viewing of the conformers relative to each other we project two contoured slices W( 0 , 1 ) and W( 1 , 2 ) across the FE volume. Two perspectives are given, with Figure 3(a) displaying the chair conformations while Figure 3(b) shows the B 1,4 conformation on the opposite side of the FE volume.
All the stable low energy conformers have been identified and are shown in Table 1. First we consider the pathways between the two chair conformations. The optimal path goes via a minima boat conformation (Figure 3(a)). A significant advantage of the fourdimensional free energy volume is the opportunity to accurately locate minima free energy pathways that connect the various low energy conformations. We extract the values for conformations linking the two chairs where the 4 C 1 conformer is the lowest in energy (0.00 kcal mol À1 ) followed by the B 3,O (2.00 kcal mol À1 ) and 1 C 4 (4.86 kcal mol À1 ) pucker conformations and plot this in a single conflated dimension (Figure 4(a)). In the context of glycosyl transferases [1] and glycosyl hydrolases [3,26] reaction mechanisms this process of identifying pucker transitions from the stable 4 C 1 conformer to the thermodynamically less favoured E and H puckers surface can be an important tool in the analysis of catalytic reaction mechanisms. A conformational change in the ring of the -D-pyranoside substrate is required for nucleophilic assistance of the cleavage according to an intermediate S N 1/S N 2 mechanism. Heterolytic cleavage of the acetal C -O bond requires an antiperiplanar orientation of a doubly occupied, non-bonding orbital. This is referred to as the antiperiplanar lone pair hypothesis (ALPH) and implies that hydrolysis of -D-pyranoside requires a conformational change of the tetrahydropyran ring from a chair to a twist-boat or boat resulting in a pseudoaxial orientation of the aglycon [26,27].  The ALPH is supported by the crystal structures of three endo-glycosidases in complex with substrate analogues showing a skew boat or a flattened boat conformation of the tetrahydropyran ring [28]. Atomic force microscopy (AFM) experiments were performed on an amylose strand and the yielding behavior has been attributed to transitions of the -linked glucose rings from the 4 C 1 chair to a twistboat conformation [29]. Previously using a classical carbohydrate force field potential [30] we calculated AFM profiles for amylose. In contrast to the earlier proposal our classical AFM simulations revealed that the unfolding mechanism of amylose was principally due to the rotation of glycosidic bonds and not the puckering of individual glucose units [31].
The B 1,4 conformation which we observe here from the semi-empirical FE pucker volume resides in a shallow equilibrium well that is 7.91 kcal mol À1 above the 4 C 1 chair conformation and separated from it by a barrier height of 9.49 kcal mol À1 (Figure 4(b)). The high barriers separating the boat from the chair conformations supports our previous interpretation of the AFM experiment. That is, that the pyranose transition to boat conformers early on in the amylose pulling experiment is not possible.
The accuracy of the free energy volume is validated when all states in the reaction coordinate space are adequately sampled. Adequate sampling is generally accepted when a ratio of most sampled to least sampled states is at least 1:50 [20,32]. However, using the adaptive reaction coordinate forces to bias the simulation away from equilibrium favoured sites, sampling ratios of 1:5 have been calculated [11,14,24,30,31]. When we contour the high energy pucker conformations ( Figure 5), it is apparent that these bound the reaction coordinate volume, with a singular high energy conformer in the centre. Upon inspection of the high energy conformers on the periphery, we note that they are not of chemical interest, as they represent conformers where the pyranose ring is 'wrung' out or 'folded' up. The single conformer in the centre of the volume is planar with the closest equilibrium conformers being O E above it and E O below it.
The extent of conformational space is shown from the first unbiased simulation through to the 24th  iteration ( Figure 6). The 4 C 1 conformer is mostly sampled when there are no biasing forces applied with very few transitions to the E 3 , B 3,O , and 1 S 5 conformations. After application of the biasing forces, calculated from a first guess at the PMF, the ratio of sampling of E 3 , B 3,O , and 1 S 5 improve considerably. More of space is explored with conformers O S 2 3,O B and 1 C 4 now evident. After 24 iterations we achieve a ratio of 1:1.7 where the least sampled planar conformer is nearly equal to 4 C 1 , the most stable conformer. This far exceeds the 1:50 ratio used in other studies as a measure of convergence. The high energy planar and E 4 conformers have similar sampling.
Sampling ratios for several conformers relative to the lowest energy conformer ( 4 C 1 ) are even more favourable ( Table 2). We compare the free energies for several conformers calculated here with the FEARCF with those reported using a metadynamics routine [7]. The energy differences between the favoured 4 C 1 conformer are similar to that found by Parrinello et al. using their metadynamics routine. However, in the latter case the free energies of the less favoured conformers are slightly higher than the FEARCF free energies. This difference may be due to either the difference in theory used (i.e. density functional theory in the case of metadynamics simulations compared with the carbohydrate specific PM3 semi-empirical quantum description employed here) or it may be due to the extent of reaction coordinate sampling (sampling ratios were not reported in the metadynamics study).

Conclusions
Details of the adaptive reaction coordinate force method, which we have previously developed and used to derive specific conformational, configurational and reaction free energy surfaces, have been presented here. Specifically, we generalised the method for free energies from adaptive reaction coordinate forces method (FEARCF) and implemented it in CHARMM showing it possible to converge to the full free energy volume. While any PMF can be calculated using this method its use is particularly significant for the calculation of multidimensional free energy surfaces. To illustrate the accuracy and versatility of the method we apply it to a demanding problem of exploring the hexapyranose ring pucker conformational free energy space using a semiempirical PM3 quantum Hamiltonian. Excellent sampling of the conformational space is achieved demonstrating how high energy conformers are accessible using this adaptive reaction coordinate force to bias the trajectory about the reaction coordinate multidimensional volume. Although the FEARCF application presented here is in vacuo using a PM3-CARB1 semi-empirical level of theory it is easily applied to condensed phase explicit solvent simulations using either classical or higher level quantum theory. Furthermore, while this implementation has been done in CHARMM the method could be incorporated into any molecular dynamics algorithm.  Note: Sampling ratios relative to the lowest energy conformer -4 C 1 and averaged over the 24th batch).
There are several pathways from the global minima 4 C 1 chair to 1 C 4 chair conformation. Two possible pathways go via B 3,O or B 1,4 boat conformations with half chair and envelope conformational barriers reaching more than 10 kcal mol À1 . This illustrates that necessary high energy glucopyranose ring conformations observed in glycosylase pockets requires the enzyme to induce and preserve high energy half chair and envelope conformations required for successful hydrolyses and synthesis of the glycosidic bond.