From in silica to in silico: retention thermodynamics at solid-liquid interfaces.

The dynamics of solvated molecules at the solid/liquid interface is essential for a molecular-level understanding for the solution thermodynamics in reversed phase liquid chromatography (RPLC). The heterogeneous nature of the systems and the competing intermolecular interactions makes solute retention in RPLC a surprisingly challenging problem which benefits greatly from modelling at atomistic resolution. However, the quality of the underlying computational model needs to be sufficiently accurate to provide a realistic description of the energetics and dynamics of systems, especially for solution-phase simulations. Here, the retention thermodynamics and the retention mechanism of a range of benzene-derivatives in C18 stationary-phase chains in contact with water/methanol mixtures is studied using point charge (PC) and multipole (MTP) electrostatic models. The results demonstrate that free energy simulations with a faithful MTP representation of the computational model provide quantitative and molecular level insight into the thermodynamics of adsorption/desorption in chromatographic systems while a conventional PC representation fails in doing so. This provides a rational basis to develop more quantitative and validated models for the optimization of separation systems.


Introduction
Solute retention in reversed-phase liquid chromatography is an important process dominated by intermolecular interactions and has received considerable attention during the past decade. [1][2][3][4][5][6][7][8] Due to the molecular complexity of the system and because QSAR (quantitative structure activity relationship)-like interpretations of experimental data remained largely inconclusive, it is of major importance to improve the understanding of solute retention using computational modelling so that the composition, functionalization and other physico-chemical properties of specific columns can be quantitatively predicted. [9][10][11] Extensions of molecularly-resolved pictures to the mesoscale for liquid chromatography are particularly relevant as they have the potential to reduce development times and potentially improved selectivity of such columns. Despite the seemingly simple chemical composition of such systems, the atomistic understanding underlying the separation process remains elusive since the system is highly dynamical, heterogeneous and disordered [1][2][3][4] . Use of computational and experimental spectroscopic techniques such as IR [12][13][14] , Raman 15 and NMR [16][17][18] has provided important insights concerning the water/methanol mixture and proved to be useful in order to characterize these highly heterogeneous systems at the molecular level 1,7,[19][20][21][22][23] . The process is molecularly driven and requires bottom up strategies because decomposition schemes such as QSAR are not sufficiently detailed.
The separation mechanism involves specific and non-specific interactions between the solute and its environment which includes the solvent mixture, the support surface and the functionalized alkyl chains chemically bonded to the support surface. [24][25][26][27][28][29] Challenges further increase when separating structurally or functionally similar compounds (for example, isomers) where their corresponding peaks elute close to one another. 30 Progress has been made in understanding surface density, chain length, temperature, and solvent composition effects on retention. [31][32][33][34] However, quantitative studies are rare 8 and thus no quantitative retention model is available that would allow the chromatographic parameters, such as capacity factor, resolution and separation factor, for individual analytes separated under given conditions to be accurately predicted. Given the cost of industrial-scale experimentation, distinct composition parameters and operation scenarios: choice of solvent structure and mixtures, alkyl chains length and types and silica layer properties can now be computationally tested for separation viability. Such has been the case of in silico drug discovery. [35][36][37] A typical chromatographic system consists of a solid silica substrate tethered with hydrophobic alkyl chains forming the stationary phase, a solvent mixture and analyte molecules (see Figure 1). The alkyl chains in such systems are highly flexible which modulates the intermolecular interactions at the stationary/mobile interface which significantly influences retention and chemical selectivity. 39 The nature of the solvent mixture and alkyl chains is decisive for the separation process. Commonly used mixtures are water (H 2 O), together with an organic co-solvent such as acetonitrile (ACN), methanol (MeOH) or tetra-hydro furan (THF). The change in the solvent composition mainly modulates the hydrophobic nature of the environment which affects the elution time and improves the resolution of the analyte. 39 Experimental studies showed that separation of compounds depends decisively on the hydrophobicity of the studied compounds. [40][41][42] Contrary to normal phase liquid chromatography (NPLC), the stationary phase in RPLC is less polar than the mobile phase, and so the retention time increases as the polarity of the mobile phase decreases. Thus, non-polar analytes are more strongly retained than polar ones. 43 One of the eminent questions for RPLC is whether, at a molecular level, the retention process is dominated by partitioning or by adsorption or whether it is a combination of the two. Adsorption chromatography is based on the retention of solute by surface adsorption, whereas partition chromatography is based on the adsorption of solute by bulk stationary phase usually held in place by an inert scaffold of solid particles. 5 The retention mechanism based on hydrophobic interactions depends on the surface coverage of the stationary phase (C 18 in the present work). In materials with dense coverage ≥ 3.8 µmol/m 2 , partition predominates, whereas in materials with less dense coverage ≤ 3.8 µmol/m 2 , adsorption is of increasing importance. Because higher density coverage shows a homogeneous coverage of the supported surface and the unblocked silanols are shielded by the C18 ligands in materials with a high surface load.
Recent work has shown that atomistic simulations with ac-curate force fields are suitable to quantitatively describe the thermodynamics of complex systems. 8,44,45 They allow to explicitly determine the hydration free energy 44,45 and the free energy of desorption from the surface 8 which can be directly compared with the experimentally measured partition coefficients k. 4,39 However, these force fields can be based on point charges (PC) or multipoles (MTP) depending on the molecules considered. 45 For the specific case of benzene derivative solutes, first a PC force field was used to model the entire system and then a combined PC/MTP force field, where only the solutes (here benzene derivatives) are treated with multipoles (in order to take into account the anisotropic electron distribution of the aromatic cycle and the moieties attached to it) while the chromatographic system is treated with PCs, and the resulting quantitative atomistic simulations were compared to experiment. This strategy was previously found to provide the accuracy of MTP simulations but at a lower computational cost. 44 The present work discusses the use, merit and shortcomings of point charge-and multipole-based force fields for molecular simulations of solute molecules in heterogeneous systems at the solid/liquid interface. First, the parametrizations of both types of models are described. Next, the free energy of adsorption for two different solvent compositions are determined and specific cases are discussed. Finally, conclusions are presented.

The System
The model silica support consists of two 8.75 Å thick segments of the (101) face of a crystalline quartz lattice with dimensions 36 × 41 Å. This results in two -OH capped surfaces with a vicinal silanol density of 3.1µmol/m 2 . A model for a chromatographic column was then generated by covalently tethering C 18 alkyl chains to the silanol oxygen atoms of the quartz surface and orthogonal to the bulk quartz. The C 18 chains were randomly distributed over the surface silanols to give a surface coverage of 2.65 µmol/m 2 which is a typical value in experiments. 39 An equilibrated 80 Å thick solvent box (water/methanol) was superimposed on this arrangement which leads to a unit cell with dimensions of 36 × 41 × 97 Å 3 , see Figure 1.
The water model in the present work is the transferable intermolecular potential three point (TIP3P) 48 model. For the methanol (MeOH) molecule, parameters are taken from previous work 3 with an OH-equilibrium bond length of 0.96 Å. 49 The CT3-OH1 bond, the HA-CT3-OH1 and HA-OH1-CT3 angles, and the dihedrals 49

Force Field Parametrization
Two parametrizations for the solutes are considered in the following. One of them is based on a point charge (PC) representation of the electrostatics as is used in typical conventional force fields, including CHARMM 50 , AMBER 51 , GROMOS 52 and OPLS 53 , although exceptions to this have been explored. 54 Another possibility is to assign atom-centered multipoles (MTPs) to all or selected atoms. Here MTPs up to quadrupoles on non-hydrogen atoms and monopoles on hydrogens are used which has been shown to provide accurate representations of the electrostatic potential [55][56][57] and in particular to correctly capture the electron anisotropy around substituted, halogenated phenyls. 44,45,58,59 The parametrization of PhH and PhCl was that from previous work. 59 For PhOH and PhCH 3 , the parametrization procedure, for PC and MTP, followed the same protocol. 45 The starting topology and the bonded parameters are those of CGenFF 60,61 and only the electrostatic and van der Waals parameters were optimized. Briefly, first the point charges are optimized to represent the molecular ESP. In a next step these PCs are frozen and the atomic dipole and quadrupole moments are optimized to further improve ESP compared to the reference ab initio calculated electrostatic potential (ESP) at the MP2/aug-cc-pVDZ level of theory. 62,63 Next, the van der Waals parameters, starting from values of the CGenFF force field 64 , are scaled to reproduce thermodynamic properties: density ρ, heat of vaporization ∆H vap (determined from a simulation of the pure liquid according to where < E gas > and < E liq> are the ensemble average of potential energies of one molecule in the gas and liquid phases, T is the temperature and R is the gas constant) and hydration free energy ∆G hyd (determined using thermodynamic integration from simulations of one solute molecule in a 25 × 25 × 25 Å 3 box of TIP3P 48 water molecules). 45 The best-fit thermodynamic data for the four compounds based on PC and MTP models for the electrostatics are summarized in Table 1. It is important to point out that the van der Waals parameters for one compound differ depending on whether a PC or a MTP electrostatic model was used. A previously introduced score S which weights hydration free energies more heavily than heats of formation or pure liquid density is used to compare the different models on an equal footing. 45 The ∆G hyd is more heavily weighted because in the present study we are primarily concerned with this quantity. The solutes used in this study, along with the Molecular Electrostatic Potential (MEP) at the 10 −3 ea −3 0 isodensity surface. 46 The MEP is that from which the PCs and MTPs were derived by fitting to the ESP. Phenyl derivatives PhR are arranged according to increasing relative polarity:  [65][66][67] given in parenthesis using both PC and MTP electrostatics. Missing experimental data is represented by the symbol n/a (not available). The density is reported in g/cm 3 , the heat of vaporization and the free energy of hydration are reported in kcal/mol at 298 K. The score S = ∑ 3 i=1 w i (Obs i − Calc i ) 2 is that from Ref. 45 with w ρ = 1, w ∆H = 3 and w ∆G = 5 to differently weight the three observables and put most weight on the hydration free energy.

Molecular Dynamics Simulations
For all molecular dynamics (MD) simulations of the chromatographic system the same protocol was used using the CHARMM program 68 . After solvation, the systems were subjected to 2000 steps of steepest descent minimization to relieve strain. The system was then heated to T = 300 K and equilibrated at constant volume and temperature for 500 ps. Next, MD simulations were carried out with a time step of 1 fs. SHAKE 69 was used to constrain all bonds involving hydrogen atoms. All simulations were carried out with periodic boundary conditions. For PC-PC interactions, particle mesh Ewald (PME) 70 was used with a grid spacing of 1 Å and a relative tolerance of 10 −6 . Nonbonded interactions were truncated at a distance of 14 Å on an atom-by-atom basis, using a shift function for the electrostatic interactions and a switch algorithm for the van der Waals interactions. 71 For higher MTP interactions a power-law dependent switching was employed. 59 The atom positions of the bulk quartz surface with the exception of the exposed hydrogen atoms of the silanol groups were held fixed during the simulation (1664 atoms). Initially, the system without analyte molecules was heated to T = 300 K and equilibrated at this temperature for 500 ps, followed by 5 ns of production simulations. Next, the analyte molecules were introduced in the stationary phase at 3 Å above the silica surface and overlapping solvent molecules were removed. Then, the system was again minimized, heated and equilibrated using the same protocol.
Umbrella sampling simulations 72 were used to evaluate the free energy of binding along the z−direction G(z) of the analyte away from the surface (which is the xy−plane). The reaction coordinate was the distance of the center of mass of the analyte above the surface and the probe molecule was moved away from the surface along the z−direction towards the middle of the mobile phase, see Figures 3 and 4. The sampling consisted of 28 evenly spaced windows, separated by 1 Å for optimal overlap of the histograms. For every window, the simulation included 500 ps of equilibrium simulation and data accumulation took place for the next 2 ns. This gives 56 ns of sampling for a given system. The overall free energy profile was constructed from a weighted histogram analysis (WHAM). 73,74 Convergence of these free energy simulations was assessed by running independent simulations with 2 ns of equilibration and 5 ns of accumulation for PhCl, using both PC and MTP representations, in both solvent compositions 20:80 and 50:50. The results show that convergence within an US window is obtained after ≈ 1 ns of accumulation time after which the free energy contribution for the particular window fluctuates around its average. 75

Thermodynamics of Intercalation
For all analytes, experimental data for the retention time and partition coefficient for a C 18 column in different MeOH/H 2 O compositions (20:80 and 50:50) and for different C 18 coverage is available. 39 In order to make direct contact with this data, free energy simulations using umbrella sampling (US) were carried out. Typical snapshots from such simulations are shown in Figure 1. During US, the analytes' height above the surface z is constrained to within a window of 1 Å but it is free to rotate. It is evident that the C 18 chains are not fully extended but in a partially collapsed state as was already found in previous simulations 3,7,20,21 . The typical average extent is found to be 10 Å and 12 Å above the solid support for 20:80 and 50:50 MeOH/H 2 O solvent mixtures, respectively. 38 The computed adsorption free energy can then be compared in two different ways with experiments: first by calculating the change in free energy of the same compound in two different solvent compositions (∆∆G r 1 →r 2 , where r 1 and r 2 are two different MeOH/H 2 O ratios r) and, secondly, by comparing the difference in free energy of two different compounds in the same solvent (∆∆G c 1 →c 2 , where c 1 and c 2 are two different compounds c). The free energy difference between the solvated and adsorbed states is experimentally estimated from the measured partition coefficient K by the relationship: where R is the gas constant, and T is the temperature. K is the molar concentration of analyte in the stationary phase divided by the molar concentration of the analyte, which corresponds to the equilibrium constant K e . Thus, for a given solute X in two different solvent compositions (here 20:80 and 50:50 MeOH/H 2 O) the change in free energy is ∆∆G r 1 →r 2 = −RT ln k 50:50 k 20:80 (2) and the difference in adsorption free energy for two compounds c 1 and c 2 is The variable k is the capacity factor which is directly proportional to the retention time t r which, in turn, is a measure for the free energy difference for the solute in the mobile phase versus the stationary phase, respectively. Hence, k can be used instead of K in equation 1. 76 The capacity factors for all solutes considered in the present work are those from Figure 4C in Ref. 39   The results in Table 2 indicate that all analytes are more stable in 20:80 solvent composition compared to 50:50, which also agrees with experiment. 39    panel)). Figures 3 and 4 show that the free energy profiles exhibit an attractive region (up to z ≈ 14 Å, which is the average height of the stationary phase) after which the free energy increases by 2 to 3 kcal/mol (depending on the compound, the electrostatic model and the solvent composition) upon transfer from the hydrophobic stationary phase into the polar mobile phase. This transition region extends from 14 ≤ z ≤ 18 Å.
It is also noted that the free energy curves exhibit undulations, up to 0. The present simulations find minima along the free energy curves for 5 < z < 10 Å above the surface which is consistent with the interpretation of previous experiments that report an increase in the interaction energy for z < 18 Å and a minimum in the range between 5 and 10 Å above the stationary phase. 32 Considering the minimum at z ≈ 14 Å along the free energy curves before moving into the solvent shows the following order of stability: PhCl > PhCH 3 > PhH > PhOH, for both solvent compositions when using the MTP model (Figure 3), while using the PC model the order of stability is PhCH 3 > PhH > PhCl > PhOH ( Figure  4). The observed order of stability using the MTP electrostatics correlates with the retention factors k observed experimentally for these probes (k PhCl > k PhCH 3 > k PhH > k PhOH ) 39 while the order of stability using the PC model does not correlate with the retention factors but rather follows the order of polarity of the molecules PhCH 3 < PhH < PhCl < PhOH.
The difference between the free energy profiles using PC and MTP models originates from the electronic anisotropy around the phenyl ring (see Figure 2). This anisotropy is not accounted for when using a PC representation for the electrostatics 44 and affects the intercalation of the solutes within the alkyl-chains (stacking of the phenyl ring between the alkyl-chains, which represents the majority of the interactions) and thus desorption from the stationary phase. This shows that all phenyl-like solutes need to be represented with an MTP force field in order to accurately reproduce the experimental observables.
Another notable difference in the free energy curves is their behaviour for PhOH in the 50:50 MeOH/H 2 O mixture. Using a MTP representation there is an adsorption stabilization of -1.5 kcal/mol at around z = 5 Å whereas simulations with a PC model yield only a broad and unspecific minimum less than -0.5 kcal/mol deep which is indicative of partitioning.

Dynamics of Intercalation
The unbiased dynamics of the solute is also of interest. For this, the dynamics of the solute depending on its initial position with respect to the surface was investigated. Figures 5 and 6 show the evolution of the distance between the center of mass of the analyte (X) and the surface, using MTP and PC models in the simulations, respectively, for 20:80 (left panels) and 50:50 MeOH/H 2 O mixtures (right panels) for different initial positions of the solute (at 3, 9, 15, 21, and 27 Å above the surface) for 5 ns simulations. Independent of the chemical modification on the solute the probe molecules prefer to sample regions close to the surface -i.e. the stationary phase and the alkyl chains -and not the solvent mixture. This is consistent with the free energy simulations and with previous observations, 8,10,19 and was linked to the fact that the interface is rough and that the alcohol in water/alcohol solvent mixtures wets the surface but does not alter the stationary phase structure. 77 The surface of the stationary phase as defined by the average position of the CH 2 group extending farthest away from the silica support is indicated by dashed lines at 10  Depending on solvent composition, the probability for the analyte to sample the surface differs slightly. For 20:80 mixtures the interaction is typically stronger, i.e. the probe molecules sample regions closer to the surface whereas for 50:50 mixtures it is more loosely bound. This confirms the results from Table 2 and 3. A typical example is the free dynamics of PhCH 3 which remains localized at one surface for a 20:80 mixture whereas for 50:50 it interacts with either side of the stationary phase because specific solvent-surface interactions are preferred over hydrophobic solute-surface interactions, and so the solute migrates faster to the mobile phase. 39 Comparing the different probe molecules in 20:80 solvent mixture suggests that PhCl intercalates almost permanently, i.e. interacts most strongly for the time scale considered here (an aggregate of 140 ns) when using MTP electrostatics. Conversely, PhOH as the other extreme exchanges readily between the two interfaces with MTP electrostatics ( Figure 5). This is even more pronounced for a 50:50 solvent mixture. However, when comparing the different solutes in 20:80 and 50:50 solvent mixtures using PC electrostatics ( Figure 6) it is observed that PhCH 3 intercalates almost permanently over the time scale considered, followed by PhH, while PhCl samples regions in the solvent mixtures that are close to the alkyl chain/solvent interface. As for PhOH it migrates to the solvent mixture but does not exchange between the two interfaces ( Figure 6).

PhCl Anchors to Silanol Oxygens of the Surface
As mentioned earlier, with increasing polarity of the solute, its stability in the stationary phase typically decreases. This is, however, not the case for PhCl as found in experiments 39 and when using MTP electrostatics in simulations 8 . Using a PC representation in the simulations leads to ranking in the stability from simulations which follows the order of polarity. Hence, for PhCl it is meaningful to further analyse the molecular origin for the increased interaction strength with the stationary phase which can be captured with MTP electrostatic but not with a simpler PC representation.
PhCl, where Cl is covalently bonded to the phenyl ring, does not follow this trend because the halogen atom exhibits a sigma-hole, that results in a positive electrostatic potential along the C-Cl bond. Such a charge distribution ("janus-like") can interact with negatively charged sites (here, the silanol oxygens) and the negative electrostatic potential on the flanks can interact with positively charged sites (here, the silanol hydrogens). 78 Thus PhCl uses the sigmahole feature of the halogen as a double anchor to stick to the silica surface, which then reinforces its stability in the stationary phase. 8 On the contrary, a simple point charge model for the electrostatic potential, which assigns a partial charge to each atom, is unable to reproduce the thermodynamics of halogenated compounds, as it fails to describe the positive lobe of the sigma-hole, and the quadrupolar electrostatics around aromatic rings. 44,45,79,80 To substantiate the difference between a PC and a MTP representation, Figure 7 (Figure 4), suggesting that a PC model for PhCl is barely capable of distinguishing between PhH and PhCl. The time evolution of the PhCl-surface distance along 5 ns starting from z = 3 as initial position in 50:50 MeOH/H 2 O solvent mixture ( Figure 7B) shows that PhCl simulated with PC electrostatics (green) compared to MTP (black) migrates more rapidly into the mobile phase.
In addition, the surface-to-PhCl center of mass distance time series obtained from PC simulations ( Figure 7B) resemble more those for PhH and PhCH 3 (Figure 6, sampling of the interfacial region) than the time series obtained from MTP simulations of PhCl ( Figure 7B, interaction with the surface). This, again, is indicative that a PC model has difficulties to distinguish between PhCl, PhH and PhCH 3 . Moreover, the corresponding distance probability distribution P(z) averaged over 140 ns of MD (inset in Figure 7B) mirrors the MTP-PC difference in the free energy obtained with US. Third, Figure 7C reports the cumulative number of water (red), methanol (blue) and silica (green) oxygen atoms within 4 Å of Cl during 5 ns of MD simulation starting from z = 3 using both MTP (left panel) and PC (right panel) simulations. Compared to PC sampling, only the simulations with MTP find the Cl atom interacting with the surface-silanol oxygens. In addition, when detaching from the surface, the solute engages in interactions (see Figure 8B) with the mobile phase oxygens (water or methanol oxygens). This rationalizes the functional role of the solvent even for regions deep within the stationary phase (solvent structuring Figure 8B). Finally, Figure 7D shows the projection of the C-Cl bond onto the xy-plane for different z-ranges from MTP (grey and black segments) and PC simulations (light and dark red segments). MTP simulations show how the C-Cl bond points towards the silica surface for z < 5 Å, and for larger values of z, the Cl atom interacts with the water and methanol oxygens and thus the molecule is oriented towards the mobile phase. On the contrary, for z < 5 Å PC simulations the C-Cl bond points towards the mobile phase. Figure 8 shows the PhCl-sampled conformations using MTP electrostatics. At short distances from the stationary phase, PhCl samples two well-defined conformations corresponding to a halogen bond between Cl and the O of the silanol and to a hydrogen bond between Cl and the hydroxyl H of the silanol, respectively ( Figure 8A). Further away from the surface (z > 6 Å) the C-Cl bond samples a wider range of conformations and the solvent molecules tend to structure between the analyte and the silica surface ( Figure 8B). In summary, the positive cone of the halogen in PhCl acts as an anchor to engage in interactions with the silica surface ( Figure 8A), which then reinforces its stability in the stationary phase (Table 3), and successively cheats its way towards the top.

Conclusion
Retention in HPLC simultaneously depends on various types of intermolecular interactions between the solute and the stationary phase, the solute and the mobile phase, and the stationary and  the mobile phases. The present work demonstrates that the combination of accurate MTP force fields and free energy simulations provides the necessary accuracy required for quantitative interpretations of experiments, contrary to entirely PC-based models. It is shown that MD simulations lead to a more detailed microscopic understanding of the solute transfer process, and these studies show that experimentally relevant quantities in chromatography can be accurately computed with current computer capabilities. The results rationalize previous difficulties 24,25,30 encountered in studying retention thermodynamics, explore the effects of a larger number of chromatographic parameters, and extend MD sampling techniques from methane 9,11 to chemically more meaningful and challenging solutes.
Improved characterization of the molecular-level details of retention in chromatographic systems offers a number of future possibilities. For one, this will assist in predicting retention/separation times from computation and to eventually develop more effective, rapid scoring techniques based on molecular-level information. Secondly, this information can be used for column design. In the past, a universal solute/solvent retention index system of RPLC has been developed and tested with a library of compounds and mobile phases. 81 It has been found by examining RPLC retention data by principal component analysis (PCA) 82 and target transformation factor analysis 83 that the data obtained from three different columns share a common factor space and that three vectors are sufficient to describe this retention data. The resulting eigenvector matrix associated with analyte compounds from singular value decomposition is found to be characteristic of the retention behavior of the compounds and independent of the mobile phases and reversedphase columns used for the measurements. 81 This opens up the possibility to combine results from quantitative MD simulations -which can generate large amounts of meaningful data with molecular resolution -with machine learning techniques. Another future avenue is to extract the enthalpic and entropic contributions for libraries of compounds in chromatographic systems of different chemical composition.