Calculation of Solvation Free Energies with DCOSMO-RS

The concept of dielectric continuum models has turned out to be very fruitful for the qualitative description of solvation effects in quantum chemical calculations, although from a theoretical perspective its basis is questionable, at least if applied to polar solvents, since the electrostatic nearest neighbour interactions in polar solvents are much too strong to be described by macroscopic dielectric continuum theory. Based on this insight the Conductorlike Screening Model for Realistic Solvation (COSMO-RS) had been developed, which gives a thermodynamically consistent, quantitative description of solvation effects in polar and non-polar solvents, even in mixtures and at variable temperature, starting from quantum chemical calculations of solute and solvent molecules embedded in a virtual conductor (COSMO). While COSMO-RS usually only requires quantum chemical calculations in the conductor and thus does not allow for studying of the concrete solvent influence on the solute electron density, the direct COSMO-RS (DCOSMO-RS) has been introduced, which uses the -potential, i.e. a solvent specific response function provided by COSMO-RS, as a replacement of the conductor or dielectric response employed in continuum solvation models. In this article we describe the current status of DCOSMO-RS and demonstrate the performance of the DCOSMO-RS approach for the prediction of free energies of solvation.


Introduction
Quantum chemical methods have been developed since more than 50 years, and the combination of many methodological developments with the exponential growth of available compute power has made them indispensible tools for many areas of chemistry. But quantum chemistry by itself is only suitable for individual molecules or smaller clusters of molecules, and it is less applicable to condensed systems with the exception of periodic structures, which can be reduced to small systems by making use of the periodic boundary conditions. Quantum chemistry is not well suited for treating molecules in the liquid state, in which each molecule is interacting with a large number of fluctuating neighbouring atoms. But since at least 90% of industrial chemistry and almost 100% of biochemistry takes place in solution, the restriction of quantum chemistry to gas phase or ordered crystalline states constitutes a serious limitation of the practical applicability and acceptance of quantum chemistry. Unfortunately, a fundamental treatment of liquid phases would require sufficiently long ab initio molecular dynamics simulations of sufficiently large ensembles of molecules, which is currently and in foreseeable future not feasible, at least not as a routine tool.
Therefore the inclusion of solvation effects into quantum chemical calculations inevitably requires the introduction of some empiricism. One route is the combination of higher level quantum chemistry for a core region of interest with lower level, often force field representations for the surrounding solvent molecules, the so called QM/MM approach. 1 Another, computationally much more efficient approach is the dielectric continuum solvation approach, in which the entire solvent is represented by a simple dielectric continuum, supplemented by a few surface proportional non-dielectric contributions. Despite or because of its simplicity, this approach has become widely accepted and used in the quantum chemistry community. The polarizable continuum model PCM, developed by Tomasi and many co-workers over more than three decades, comprises the most developed implementation of the dielectric continuum solvation approach. 2 It has been extended to many levels of ab initio quantum theory and many kinds of property calculations. Another variant of the dielectric continuum models is the conductor-like screening model COSMO, 3 which was originally independently developed, but shows a high degree of similarity to PCM. The novelty in COSMO was the usage of a scaled conductor boundary condition, instead of the exact dielectric boundary condition. This slightly approximate COSMO boundary condition was simpler to handle and computationally more efficient than standard PCM, and it turned out to strongly reduce the problem of outlying charge. 4 Therefore COSMO and its PCM implementation C-PCM have become very popular. 5 If used with the original COSMO scaling function f(((, which unfortunately is not the default in C-PCM, COSMO is as accurate as PCM, at least for neutral solutes, even at low dielectric constants 6 . The most empirical part of all dielectric continuum solvation models (DCSMs) is the construction of the interface between the solute and dielectric continuum. In the vast majority of DCSMs these cavities are constructed based on atom centred spheres with radii that are roughly 20% larger than typical Bondi radii. The number of empirical parameters introduced in this way varies strongly, since some models as COSMO just use element specific radii, i.e. one parameter per element, while many others employ atom-type specific radii, often even depending on the charge of the atom, and sometimes even on the solvent. Our own finding is that element specific radii are good enough, at least for neutral compounds.
One big problem of the DCSMs results from the non-electrostatic short range interactions. While the dispersive interactions appear to be reasonably well approximated by element or atom type specific and surface proportional terms, which introduce one more element specific parameter, hydrogen bonding (hb) is completely out of the concept of DCSA. While the hb interactions between solvent molecules may be somehow hidden in the empirical, solvent specific, and mostly solute surface proportional cavity formation energy, there is no way in which the variation of the polarity of the hb donors and acceptors of the solute enters into the interaction energy expressions of the available DCSMs.
Furthermore, even a hypothetical liquid Lennard-Jones fluid of spherical molecules having permanent dipole moments, in the order of the dipole moment of water, would not behave like a dielectric continuum on a molecular length scale. If we assume a size of the solvent molecules similar to water, then nearest neighbour dipole interaction energy is about 4 kcal/mol, and the difference between aligned and anti-aligned dipoles is 8 kcal/mol. 7 But for applying the macroscopic, linear response dielectric theory, the dipole interaction energy needs to be small compared to the thermal energy, which is 0.6 kcal/mol at 298K. This simple numerical exercise proves that even the electrostatic interactions within the first solvation shell cannot at all be correctly described by the dielectric continuum approach. Given these deficiencies the considerable success of DCSMs is surprising. From a theoretical perspective the only solvents, for which the DCSMs should be applicable for systematic reasons are non-polar solvents as alkanes or noble gases, having no significant permanent local dipole moments or quadrupole moments.
The Conductorlike Screening Model for Realistic Solvation (COSMO-RS) was introduced in order to overcome these problems. [7][8][9] It starts from COSMO calculations for the solute and solvent molecules in the limit of a conductor, i.e. with The resulting surface polarization charge densities  are used for a quantification of the electrostatic and hydrogen bond interactions of the molecules. By a statistical thermodynamics treatment of all possible pair-wise interactions of the surface segments of the solvent COSMO-RS generates a free energy function, the so-called solvent -potential µS(), which is the free energy of a surface of polarization charge density  in the solvent. Integration of this -potential over the surface of a solute X then leads to an expression for the chemical potential of the solute X in the solvent S. The full concept of COSMO-RS is described in more detail elsewhere. [7][8][9] COSMO-RS was shown to provide currently the most accurate predictions Gsolv on a dataset of 2434 neutral compounds. 10 Meanwhile it is widely validated and used for many different kinds of fluid phase thermodynamics property predictions for chemical engineering, drug design and development, and many other purposes. [11][12][13][14][15] Advantages of COSMO-RS over conventional DCSMs are the equal treatment of solutes and solvents, which avoids any special parameterization of solvents that was required in conventional DCSMs, and its applicability to mixtures and variable temperatures. Furthermore, COSMO-RS is very efficient, because the time consuming quantum chemical calculations are only required once per molecule, and not repeatedly in each solvent.
Nevertheless, the latter advantage also provides a certain disadvantage, because COSMO-RS thus does not yield the solvent specific response of the molecular properties, e.g. electron density, dipole moment, spectroscopic properties, etc., which can routinely be calculated with DCSMs. Therefore the direct COSMO-RS (DCOSMO-RS) approach has been introduced, 16,17 which replaces the empirical dielectric continuum solvent response functions by the solvent -potential, which results from regular COSMO-RS. Thus the solvent response includes hydrogen bonding and the more realistic electrostatic response function of COSMO-RS. This approach has already been introduced and applied to spectroscopic property prediction previously. 16 In this paper we present our latest, improved version of DCOSMO-RS and demonstrate its ability to predict solvation free energies of molecules in a broad variety of solvents.

Theory
In conventional DCSMs the solvation energy of a solute in a solvent is a quadratic expression with respect to the polarization charge densities, which can be most simply seen in the COSMO formalism, where the COSMO energy contribution of a molecule X in a solvent S with dielectric constant (S) is given as where the qi, si and i are the charges, surface areas and polarization charge densities, respectively, on the surface segments of the solute X.

X ij
A is the COSMO interaction matrix of the surface segments.
The polarization charges densities are defined by where inv X ij A , is the inverse of the COSMO interaction matrix and X j  is the electrostatic potential produced by the solute X, i.e. its nuclei and electron density, on segment j. In eq. 1b the expression has been split into a conductor contribution, and an (S) dependent contribution, which reflects the deviation of the solvent from a perfect conductor. The COSMO operator required for HF or DFT calculations then is derived from the functional derivative of X S E with respect to the electron density X el  .
In COSMO-RS the deviation of the solute energy of a solute X in a solvent S relative to a perfect conductor is given by is the -potential of the solvent, which can be calculated within the standard COSMO-RS formalism, and the last term is the combinatorial contribution, which takes into account the size ratio of solute and solvent, and which does not depend on the electron density. If we replace the last term in eq. 1b by we get a new expression for the free energy of the solute in solvent S, which we consider as the DCOSMO-RS expression, being composed of the COSMO expression for the conductor limit and a COSMO-RS contribution.
The factor Because the parabolic part of the -potential can be described well by the COSMO model, 18 the problem can be re-formulated using the  dependent COSMO equation combined with a reduced -potential Both models (eq. 4 and eq. 5) coincide in the limit of the perfect conductor (   dataset. 19 Some differences still have to expected due to the fact that the polarization charge densities now are different from those in a perfect conductor due to increased backpolarization of the solute X in the DCOSMO-RS procedure.

Results
The mean unsigned errors (MUEs) of the COSMO-RS and DCOSMO-RS free energies of solvation of the neutral compounds of the SM8 data set 19 are given in table 1. For reference we added the respective MUEs for when the bare COSMO to gas phase energy difference is used. The DCOSMO-RS calculations have been performed with the two models described in the theory section. The results obtained with finite (eq. 5) and infinite (eq. 4) dielectric constant are denoted as eps and inf, respectively.   The plots in figure 3 show a poor correlation between the bare COSMO predictions and the experimental values, which is substantially improved by the two DCOSMO-RS models. The best correlation with respect to slope, intercept and coefficient of determination can be found for the COSMO-RS predictions.
Because of the direct influence of the -potential on the SCF calculation, which does not exist in the pure COSMO-RS theory, we need to adjust the gas  parameter. With a MUE of about 0.7 kcal/mol the DCOSMO-RS models provide a substantial improvement of the pure COSMO approach. The performance is only slightly worse than the one of the SM8 model (MUE=0.59 kcal/mol) which has been fitted on the data set under consideration. 19 The predictions of the subset of the free energies of hydration show the same trends as for the full data set. The MUEs on the COSMO level are nearly the same whereas the COSMO-RS and DCOSMO-RS are less accurate for the aqueous solvation subset.
The achieved over all accuracy of DCOSMO-RS is almost equal to the accuracy of IEF-PCM/MST as reported on a subset of 960 data points for non-aqueous solvation (0.64 kcal/mol). 10 The performance on the subset for aqueous solvation is also similar.
The overall performance of the COSMO-RS model is slightly better than the value reported in ref. 10 (MUE=0.48 kcal/mol) which is mainly due to the newer parameterization that has been used in this work.
The two DCOSMO-RS models exhibit an asymmetrical error distribution, which can also be found for the COSMO-RS method albeit in general with smaller deviations. The largest positive deviations in the order of 2.7-3.0 kcal/mol appear for tertiary amines, which are known since long to cause problems in COSMO-RS, and for formaldehyde in alcohols. For the latter its tendency to react with solvents may put some question marks on the experimental data. For formaldehyde in 1-butanol a reaction free energy of ~1.4 kcal/mol can be derived from the work of Peschla et al.. 27 Larger errors can be found at the negative deviation side. Here the largest outlier (-7.12 kcal/mol) is diethyl-4-nitrophenylthiophosphate, i.e. the pesticide parathion, in benzene. In this case we clearly doubt the experimental value, since on the one hand the predictions for parathion in water and octanol are both within an error range of less than 2 kcal/mol, and on the other hand the CRC Handbook of Pesticides 28 reports miscibility of parathion in benzene, which would correspond to a much more negative value of Gsolv than reported in the SM8 dataset. The same argument most likely also applies to dimethyl-4-nitrophenylthiophosphate in benzene and chloroform, which are both within the top 20 outliers. More serious are several solutes with electronically coupled donors and acceptors in hydrogen bonding solvents, as urea, water and hydrazine in water, which are reasonably well predicted by COSMO-RS but large negative outliers in D-COSMO-RS. Therefore these seem to be over-polarized by D-COSMO-RS.

Conclusions
The improved DCOSMO-RS method based on element specific -potentials provides a robust and widely applicable method for the quantitative treatment of solvation effects in a wide range of solvents.
Solvation free energies for a broad variety of solutes in very different solvents can be predicted with an accuracy of ~0.7 kcal/mol (MUE) on a test set of 2434 data. This is a substantial improvement over the pure COSMO approach and only ~0.3 kcal/mol worse than the accuracy of the full COSMO-RS procedure. It is only slightly worse than the SM8 method, which had been parameterized exactly on the considered dataset.
Since COSMO-RS is applicable to a very wide range of pure and mixed solvents, including ionic liquids, at variable temperature without requiring any experimental information about the solvents, -potentials are easily available for all these solvents without additional parameterization. Thus DCOSMO-RS can be easily applied to almost any solvent, including mixtures. Its strength surely is not the bare prediction of solute free energies. For this purpose the COSMO-RS method is still more efficient and more accurate, and thermodynamically more consistent. But the presented improved DCOSMO-RS now can serve as an efficient tool which combines a robust quality solvation free energy prediction with the capability for response property calculations beyond the dielectric continuum limitations.

Supporting Information Available:
The element-specific -potentials for all solvents used in this paper and the list of experimental and calculated solvation free energies for the 2346 compounds are available as supplementary material free of charge via the internet at http://pubs.acs.org.