Ab Initio Effective One-Electron Potential Operators:Applications for Charge-Transfer Energy in Effective Fragment Potentials

The concept of effective one-electron potentials (EOP) has proven to be extremely useful in efficient description of electronic structure of chemical systems, especially extended molecular aggregates such as interacting molecules in condensed phases. Here, a general method for EOP-based elimination of electron repulsion integrals (ERIs) is presented, that is tuned towards the fragment-based calculation methodologies such as the second generation of the effective fragment potentials (EFP2) method. Two general types of the EOP operator matrix elements are distinguished and treated either via the distributed multipole expansion or the extended density fitting schemes developed in this work. The EOP technique is then applied to reduce the high computational costs of the effective fragment charge-transfer (CT) terms being the bottleneck of EFP2 potentials. The alternative EOP-based CT energy model is proposed, derived within the framework of intermolecular perturbation theory with Hartree–Fock non-interacting reference wavefunctions, compatible with the original EFP2 formulation. It is found that the computational cost of the EFP2 total interaction energy calculation can be reduced by up to 38 times when using the EOP-based formulation of CT energy, as compared to the original EFP2 scheme, without compromising the accuracy for a wide range of weakly interacting neutral and ionic molecular fragments. The proposed model can thus be used Abstract The concept of effective one-electron potentials (EOP) has proven to be extremely useful in efﬁcient description of electronic structure of chemical systems, especially extended molecular aggregates such as interacting molecules in condensed phases. Here, a general method for EOP-based elimination of electron repulsion integrals (ERIs) is presented, that is tuned towards the fragment-based calculation methodologies such as the second generation of the effective fragment potentials (EFP2) method. Two general types of the EOP operator matrix elements are distinguished and treated either via the distributed multipole expansion or the extended density ﬁtting schemes developed in this work. The EOP technique is then applied to reduce the high computational costs of the effective fragment charge-transfer (CT) terms being the bottleneck of EFP2 potentials. The alternative EOP-based CT energy model is proposed, derived within the framework of intermolecular perturbation theory with Hartree–Fock non-interacting reference wavefunctions, compatible with the original EFP2 formulation. It is found that the computational cost of the EFP2 total interaction energy calculation can be reduced by up to 38 times when using the EOP-based formulation of CT energy, as compared to the original EFP2 scheme, without compromising the accuracy for a wide range of weakly interacting neutral and ionic molecular fragments. The proposed model can thus be used routinely within the EFP2 framework.

Effective one-electron operators can be systematically used to convert fragment-based methods into effective fragment potentials. As an example, we developed an efficient charge transfer energy method which can be used in ab initio force fields such as EFP2, with less computational effort and maintaining accuracy.

INTRODUCTION
In the framework of the theory of intermolecular forces, charge transfer (CT) is a rather technical term that originates due to algebraization in finite basis sets and arbitrary fragmentation of a quantum mechanical system. [1][2][3][4][5] Nonetheless, in supermolecular calculations of a complex the charge transferred among interacting subsystems can be quite significant and the corresponding CT effect on the interaction energy is known to be often non-negligible. This is particularly evident in donor-acceptor systems such as H-bonded species and charged complexes. 6,7 Since CT is associated with 'interfragment' matrix elements, it constitutes a part of a variety of inductionrelated effects that are rigorously defined in the symmetry adapted perturbation theory (SAPT). 3 Therefore, evaluation of CT contribution to the intermolecular interaction potential 4 is far from trivial due to its notably complex quantum mechanical (QM) origins and cannot be realized in terms of any classical nor semi-classical approach. Even at the Hartree-Fock 8 (HF) approximation any separation of charge transfer from intramolecular polarization is arbitrary and depends strongly on the chosen basis set, 9,10 to a point of becoming meaningless in the limit of a complete basis. 2 However, Stone and Misquitta showed that CT energy can in principle be extracted from SAPT calculations by comparing the induction energies of fictional systems in which basis functions are centered either solely on the monomers, or the entire interacting complex. 5 CT energy was also formulated by Murrell et al. 11 in their perturbation theory in the region of small wavefunction overlap up to second order. Nevertheless, all these theories are computationally too demanding to be applied in efficient calculations of intermolecular forces during molecular dynamics simulations, as they require evaluation of the electron repulsion integrals (ERIs) and their four-index transformation to molecular orbital (MO) basis.
The apparent difficulty in theoretically characterizing the CT energy in terms of the interacting molecular fragments is indeed a challenge in the development of modern force fields or ab initio fragmentation methods, 12 designed for modeling structure and dynamics of condensed phase systems. 13 This needs to be contrasted with the Coulombic electrostatics, non-Coulombic repulsion due to Pauli exclusion principle, or even dispersion and induction interactions, which all to a certain extent can be adequately described by relatively simple and computationally feasible mathematical models, like the distributed multipole moments of charge densities, the model van der Waals repulsive and attractive potentials, or the various distributed polarizability models. For that reason, the CT effects are usually not explicitly included in most of molecular mechanics force fields developed up to date. 14 Only a few polarizable force fields exist which explicitly incorporate the CT effects in an ab initio manner and are readily applicable to the condensed-phase simulations. 15 The notable examples include the second generation of the Effective Fragment Potential (EFP2) method, 9,[16][17][18][19][20][21][22][23] the Sum of Interactions Between Fragments Ab initio Computed (SIBFA) method 24,25 , the Explicit Polarization (X-Pol) method, 26 and the S/G-1 approach. 27 In all of the above methods, CT energy formulation is based on the antisymmetrization of certain subsets of Hartree products of monomer wavefunctions and a few additional approximations. Naturally, CT can also be implicitly included by performing full QM electronic structure simulations or non-force-field-based fragmentation techniques. 28,29 EFP2, being the most commonly used ab initio force field, was derived from the first-principles at the HF level 9,17,18,[30][31][32] and augmented with intermolecular dispersion effects by the response theory. 33,34 That is to say, the total intermolecular interaction potential, which is completely free of any semi-empirical parameters, is approximated as where E Coul is the Coulombic interaction energy of the unperturbed charge-density distributions of the monomers, represented by the distributed multipole approximation with damping to account for the charge-penetration effects, 35 E Ex−Rep is the exchange-repulsion energy originating from the Pauli exclusion principle, 30,31 whereas E Ind and E Disp are the induction and dispersion energies obtained from the distributed polarizability approximation, [32][33][34] and finally E CT is the CT energy, 17,18 being in the limelight of this work.
Despite the considerable success of the EFP2 approach in accurately modeling the extended molecular systems like solutions [21][22][23]36 and recently even biomolecules [37][38][39][40] with the level of accuracy reaching in many cases 19 that of the second-order Møller-Plesset perturbation theory, 41 evaluation of the CT energy in EFP2 model is still relatively costly for typical applications in the molecular dynamics simulations. It has been reported that the implementation of the EFP2 CT energy and gradient in GAMESS (US) computer program 42 with canonical molecular orbitals (CMO) is on average 20-30 times more computationally demanding than the other components. 17,19 Recent advancement of Xu and Gordon 18 reduced this cost further by up to 50-60% by minimizing the size of the virtual orbital space via the use of quasi-atomic minimal-basis orbitals. 43 (QUAMBOs) In this approach, the diagonalization of a Fock matrix in QUAMBO basis yields the original HF occupied orbitals and the virtual valence orbitals (VVO), which are then used in the CT energy evaluation instead of the original canonical virtual MOs (CVOs). Nevertheless, even with this improvement, being now a standard default in most of EFP2 applications, the EFP2 CT term still remains the most time-consuming to evaluate from among all the EFP2 contributions.
In effect, the CT energy component is sometimes ignored when applying EFP2 to chemical problems. [36][37][38][44][45][46][47][48][49] In fact, EFP2 CT term is available only in the GAMESS (US) quantum chemistry program, 42 whereas it is neither supported in the official release of the recent LIBEFP library for linking quantum chemistry packages with the EFP2 functionalities, 50 nor in the Q-CHEM quantum chemistry program, 51 contrary to the remaining electrostatic, exchange-repulsion, induction and dispersion EFP2 terms.
One of the main goals of this work, apart from developing a more efficient model of the CT energy that is compatible with the EFP2 approach, is to extend the definition of effective one-electron potential (here referred to as the EOP) technique, that has been widely explored in the past, 8,10,17,[52][53][54][55][56][57][58][59][60] to simplify the rigorous and costly fragment-based quantum chemical models of extended systems with a particular emphasis on solvation phenomena and molecular dynamics simulations.
The presented EOP technique of removing ERIs from the working equations follows the notion of the importance of one-electron densities in chemistry, 53 Effective one-electron potential (EOP) operator can be expressed asv eff = λv nuc + dr r v eff el (r) r , where the electronic part, associated with a certain effective density ρ eff , is given by Eq.(S1), whereas the nuclear part is defined in Eq.(S5) -see SI for details about the notation. Consider now an arbitrary linear functional F that explicitly depends on the ERIs. In this work, EOP-based ERI elimination procedure is defined by the following wherev eff,A is the EOP operator associated with molecule (fragment) A. Note that its mathematical form depends on the linear functional F . In Eq. (2), the summations over k and l orbitals are incorporated into a single one-electron matrix element. Thus, the total computational effort is, in principle, reduced from the two-fold sum involving evaluation of ERIs to just one much easier to compute OEI. It is also possible to generalize the above expression even further by summing over distinct linear functionals F t , as well as over where the capital italic letters denote subsets of orbitals associated with a particular fragment and X = A or B (see the discussion below). The above design has the advantage that it opens the possibility to define firstprinciples effective fragments as long as the functionals F t are well defined, computable and can be partitioned in between the interacting fragments. The derivation of Eq. (3) is shown in Appendix A.
Three unique classes of ERIs can be recognized based on the basis function partitioning scheme within the system composed of two molecules (shall be A and B throughout the course of this work). They are as follows: (see Eq. (S13) in SI). In this work, however, this method is considered already too expensive for application in the CT energy because evaluation of Eq. (S13) requires calculation of electrostatic potential and electrostatic potential gradient(s) OEIs. These kind of integrals are typically the most expensive when compared to other standard OEIs such as overlap or kinetic energy integrals. Therefore, Coulomb EOP matrix elements will be treated via semi-classical DMTP expansion (see Section S4 in SI) which is more approximate but much less expensive approach. The overlap-like EOP matrix elements will be treated via the extended density fitting method, which is discussed next.

Extended Density Fitting of EOPs
Extended density fitting of EOPs, which will be referred to as the EDF scheme, is applicable in case of matrix elements of i A v eff,A j B type. These matrix elements require ERIs of type AA AB only. To get the ab initio representation of such an overlap-like matrix element, one can use a procedure similar to the typical density fitting (DF) or resolution of identity (RI), which are nowadays routinely used to compute electronrepulsion integrals (ERIs) more efficiently, and reduce computational cost of post-Hartree-Fock methods. 61 Density fitting was also applied to design ab initio force fields. 57,58 An arbitrary one-electron potential acting on state vectors A of molecule A can be expanded in an auxiliary and generally non-orthogonal AO space a centered on A aŝ where under the necessary assumption that the auxiliary basis set is nearly complete, i.e., a S −1 aa a ∼ = 1 aa . In the above equations, S ab = a b denotes the overlap AO integrals matrix. In practice, basis sets approximately fulfilling such a resolution of identity are relatively much larger than the primary AO basis sets. Therefore, it should be computationally more efficient to utilize smaller auxiliary AO basis set m , which satisfieŝ In Eq. (6), the EOP matrix V A m is for the time being unknown. To find an expression for V A m , consider a certain orthonormal MO basis for T aX = S −1 aa a X , which is as small as possible but accurately represents the EOP operator, i.e., with T † aX S aa T aX = 1 XX due to orthonormality. The similarity transformation matrix T aX can be found from essential eigenvectors of the covariance matrix of V A a expressed in the orthogonal RI basis after Löwdin symmetric orthogonalization, i.e., where VãV † a = U˜aXgXXU † aX (10) and The operator Q in Eq. (9) selects only eigenvectors U˜aX associated with the non-vanishing eigenvalues stored in the diagonal matrix gXX. Note that the size of basis X is bounded from above by the number of state vectors A, which is also an upper bound for the size of the auxiliary AO basis m. The latter can be found by maximizing the overlap between the X MOs and their approximate expansion in basis m, i.e., where the overlap matrix is The approximate transformation matrix in m basis can be found by using the basis set projection method of Polly et al., 62 From the above analysis, the approximate identity operator is which results in the following expression for the EOP matrix, Eqs. (9)-(16) define the EDF method and once basis m is found Eq. (6) can be used to effectively eliminate ERIs and replace them with products of EOP matrix V A m and overlap matrix involving m AOs. Note that in the limiting case of m = a the EDF method reduces to the usual density fitting in the nearly-complete AO basis according to Eq. (5).
It is emphasized here that other possibilities of formulating the V A m matrix exist (see Section S5 in SI). However, it was found that Eq. (16) seems to be the most computationally efficient because only overlap integrals are required.

Charge Transfer Interaction Energy for Fragment Potentials
As the theory that is necessary to eliminate ERIs from fragment-based models has been given in the previous section, we shift our attention to the CT energy formulation for the EFP2 model. In the CT treatments of bimolecular complexes, the CT energy can be expressed as a sum of the stabilization energy due to excitations from molecule A to B and vice versa, i.e.,

EFP2 Model
To provide a complete account on the new CT energy expression proposed here, the original EFP2 formulation is briefly reviewed first. Li, Gordon and Jensen used the expansion of the overlap density in Taylor series and found four different approximate theories for the CT energy. 17 The optimal theory, which was shown to well reproduce the CT energies obtained by using the reduced variational space (RVS) analysis of Stevens and Fink, 63 reads as where and in which the summations extend over occupied (denoted by 'Occ'), virtual (denoted by 'Vir') or both (denoted by 'All') sets of MOs. Note that in case of using QUAMBOs as MO basis for Fock matrix VVOs and resulting orbital energies, instead of the original canonical HF orbitals and energies, are to be used. The effective potential energy matrix elements are defined by and are evaluated by expanding thev B tot operator in distributed multipole series according to Eq. (S13). The apparent success of the EFP2 scheme is rooted in the dramatic simplifications of the ab initio expressions for interaction energy, in which the relatively costly ERIs have been effectively removed from the working models while maintaining the required accuracy. That is, in the case of the EFP2 CT energy component, the MO energies ε i are constant parameters, whereas the overlap S nm , kinetic energy T mn and effective oneelectron electrostatic potential U eff in matrix elements are all certain types of OEIs, orders of magnitude cheaper to evaluate than ERIs. Unfortunately, due to extensive summations over virtual orbitals, evaluating Eq. (18) is still considerable in cost because typically large basis sets need to be used for generating the EFP2 parameters.
In effect, calculation of U eff in is more expensive as compared to other types of OEIs and is the bottleneck of EFP2 CT energy calculation, even when using VVOs.
In the following subsections, the alternative model of the CT energy is proposed by introducing EOPs.
Although application of the EOP method to the CT formulation in EFP2 method is probably possible, it would be relatively difficult to discuss the resulting EOP-based EFP2 models because there are four distinct versions of this theory with a set of different approximations, selected based on performance assessment rather than a rigorous derivation manner. 17 Instead, perturbation theory of Murrell et al. 11 with the explicit formulation for closed shell systems by Otto and Ladik, 10 which is somewhat more rigorous than the EFP2 CT model, is chosen as a starting point in this work. It is believed that this choice will enable a clear demonstration of the EOP technique in fragment-based modeling.

Otto-Ladik's Model: Starting Point
The CT energy at HF level of theory can be expressed by 10,11 where the coupling constant is given by Otto and Ladik, 10 here referred to as the OL method, as where i v kl j ≡ − i j kl (see the notation convention explained in SI). In the above expression, the following effective one-electron potential operators, were introduced without making any approximation to the original equation from Ref. 10 Note that ERIs in MO basis are necessary to evaluate all terms in Eq. (23).

Otto-Ladik's Model: Application of EOP Technique
One can immediately notice that the five summation terms from Eq. (23) can be classified based on Table 1 into three groups regarding the type of ERIs that are required: (i) overlap-like AB BB -the first two terms; (ii) Coulomb-like AA BB -the third term and (iii) Coulomb-like BB AA -the two last terms. Note also that there are no exchange-like terms needed in this case. Therefore, all the contributions can be re-cast in terms of the EOPs.
Group (i). Group (i) can be rewritten by using Eq. (3) to eliminate the interfragment ERIs of the overlap-like type: where the EOP matrix is given according to Eq. (5) by Note that all the calculations that are required to obtain V B ξ n are performed solely on the densities and basis sets associated with the unperturbed molecule B. Therefore, V B ξ n can be considered as effective fragment parameters used to compute the first two terms of Eq. (23) and the final expression reads which is a great simplification over the original form of group (i) because only the overlap integrals between the ith MO on molecule A and ηth auxiliary orbital on molecule B need to be evaluated. Note that the only approximation made so far was the application of density fitting and resolution of identity. If the RI basis set is sufficiently large, the errors due to this approximation can be minimal and negligible in principle. Alternatively, the optimized auxiliary AO basis set instead of the RI AO basis set can be used according to the EDF method developed in Section 2.1.2. In this case, the matrix elements of the EOP matrices are given by Eq. (16)  Group (ii). The term belonging to this group can be considered as a sum of interaction energies between the total charge density distribution of molecule B and the partial density ρ ik (r) of molecule A, weighted by the overlap integrals S nk . Using the distributed multipole expansion based at the charge centroids of the localized molecular orbitals (LMOs), r i = i r i with χ i (r) being localized, this group can be approximated by Here, it was assumed that |ρ ik (r)| ρ ii (r) for i = k in most locations in the case of LMOs, which allows one to conjecture that Now, the DMTP expansion of the interaction energy in the right hand side of Eq. (30) can be expressed as because the distributed charges q i = −1 whereas the distributed dipole moments centered at their respective LMO charge centroids vanish. 64 This means that Eq. (29) can be finally given as follows: Therefore, only overlap integrals and relative distances between atomic and LMO centroid positions are needed, which leads to a great reduction of the calculation cost, as compared either to the original expression or to the multipole expansion (left-and right-hand sides of Eq. (29), respectively). We shall refer to this approximation as to the local overlap approximation (LOA) resulting in similar expressions to the ones obtained by Jensen and Gordon in their exchange-repulsion interaction energy EFP2 model (see Eq. (39) in Ref. 30 ). Note that, to make this approximation valid, occupied molecular orbitals need to be spatially localized.
Group (iii). The terms with the overlap integrals involving the occupied MO on A can be combined into a single summation term, i.e., where the effective potential v A,eff ik (with the associated effective density In order to include the ρ B jn density, it is approximately represented here by a set of effective cumulative atomic charges {q B,( jn) y } associated with the effective one-particle density matrix In this work, the effective charges were defined via the Mulliken method as discussed in SI with λ = 0. By applying the LOA for the ρ A ik density the effective potential from Eq.
which leads to Final EOP-based forms of the coupling constant. Gathering the results from previous paragraphs, the coupling constant can be given as follows where the symbols G n (n = 1, 2, 3) denote a particular group of terms from Eqs. (28), (32) and (37), and the primes denote the localized MOs with the L A matrix being the CMO-LMO transformation matrix of molecule A. For completeness, we list all the EOP-approximated contributions to the coupling constants below: where the auxiliary variables are defined as Note that the LOA-based contributions G 2 and G 3 need to be transformed back to the canonical MO basis.
Thus, the final working formula for the interaction energy due to CT with excitations from A to B reads The total CT energy is given by the sum of the above contribution and the twin contribution due to CT from molecule B to A according to Eq. (17).

METHODS
Implementation. The benchmark CT energy was assumed to be the CT energy defined in the RVS method. 63 All the models that were used to test the theory presented in this work, i.e., the EFP2, OL, EOP and RVS methods, as well as the EDF method, were implemented in our in-house plugin to PSI4 quantum chemistry program. 65 For the CT EFP2 component, potential energy integrals from Eq. (S13) were calculated with the CAMM up to quadrupoles (distributed centers are atoms), instead of the DMA (distributed centers are atoms and mid-bond points) as implemented in most of quantum chemistry programs. The choice of CAMM versus DMA was due to convenience of implementation and, because the quantitative accuracy of our EFP2 CT energy code is comparable to the EFP2 code of GAMESS (US). Thus using only atomic distribution centers does not affect the interpretation of results in Section 4. It was also found that accuracy of the LOA from Eqs. (32) and (37) is usually slightly better when the Boys localization method 66 is used, as compared to the Pipek-Mezey method. 67 Henceforth, the former method was used for molecular orbital localization throughout all the production calculations. CT energies, unless the EDF scheme (Eq. (16)) was utilized in the case of which the auxiliary basis sets were optimized for each species separately (see description below). For helium, 6-311+G(d,p) primary 80 and augcc-pVDZ-RI auxiliary 84 basis sets were used.
Validation: Multi-Fragment Complexes. To test the total accuracy of the EFP2 method with the EOPbased CT term, three different multi-fragment model systems were selected. The structures were randomly sampled from classical molecular dynamics simulations of bulk water, DMSO and (C 2 mim)(NTf 2 ) ionic liquid in standard conditions, and subsequently optimized at EFP2/6-31++G(d,p) level by using the 'GLOBOPT' routine of GAMESS (US) program. 42 Due to considerable size of the complexes chosen, 6-31++G(d,p) basis set [73][74][75][76]78,82,85 was used to obtain the interaction energies. Full RHF interaction energies were calculated as the reference values in the aggregate-centered basis set 86 (ACBS), to eliminate the basis set superposition error (BSSE). To superimpose EFP2 parameters (including the EOP matrices), the Kabsch method was used. 87,88 Generation of the EFP2 parameters for reference EFP2 interaction energy calculations was undertaken by using the 'MAKEFP' routine of GAMESS (US) program. 42 The DMA method 89,90 was used to generate the DMTP expansion for electrostatics. Due to technical aspects of the implementation in our in-house code, the core MOs were also included. Boys method was used to localize the MOs. 66 Charge-penetration effects for Coulomb and induction interactions were taken into account by using the overlap-correction method 91 and the Tang-Toennies method, 92 respectively. The Tang-Toennies damping constant was set to a default value of 0.6 in GAMESS (US). Infinite cutoff thresholds for the evaluation of Coulomb, induction, exchange-repulsion and CT interactions were assumed. Since the reference level of theory is HF, dispersion interactions were not considered in this work and, therefore, they were omitted in all the EFP2 interaction energy calculations.
Auxiliary AO Basis Set Optimizations. The optimization routine of the EDF method was fully automatized in our in-house computer program. Basin hopping global optimization algorithm [93][94][95][96] was used, as implemented in the SCIPY Python library. 97 Since the local minima of the objective function are usually separated by c.a. 0.002 × Z max where Z max is equal to the number of VVOs, the consecutive basin hopping step was assumed to be accepted with the Metropolis probability of exp − Z new −Z old 0.002×Z max . In all of our optimizations the optimal value of objective function exceeded 0.99 × Z max and was reached in only ten basin hopping steps.
To find the local minimum in each step, the Sequential Least Squares Programming (SLSQP) method 98 was used with tolerance 1.0 × 10 −9 , whereas bounds for orbital exponents and contraction coefficients were set as [0.0002, 5000.0] and [0.001, 1.000], respectively. Additional constraint for the normalization of contraction coefficients was also implemented. All optimized auxiliary basis sets were assumed to be triply-contracted minimal basis sets with a single set of polarization p-type functions on the 1 st -row atoms and d-type on larger atoms. The starting guess parameters were found to be only weakly dependent on the chemical composition and the primary basis set. The optimized basis sets for selected molecules are given in SI, Section S8.
CPU Time Profiling. Time profiling of the code for the EFP2 and the EOP methods was performed for all the computational operations required for a single point energy calculation in a hypothetical sequential run on multiple geometries such as during a typical molecular dynamics calculation. Therefore, all the multi- Color codes: neutral hydrogen bonded neutral dipole interacting neutral charge transfer neutral weakly interacting neutral π-π stacking ionic (b)     Table 2, and the scatter plots against the reference (RVS) is shown in Figure 1 for the VVO basis, and Figure (S1) for the CVO basis. The overall accuracies of all the models are good and comparable within approximately 1 kcal/mol tolerance, with the OL/CVO method being the most accurate (RMSE of 1.13 kcal/mol), and EFP2/CVO and OL/VVO method being the least accurate (RMSE of about 2.6 kcal/mol). The EOP method performs on average slightly better than the EFP2 method by about 0.6 kcal/mol regardless of the type of virtual MOs (CVOs or VVOs) being used. Interestingly, although ideally the accuracies of the EOP method should be similar as the OL method, our results show that the accuracy of the EOP method is better when using the VVO basis, and worse when using the CVO basis, as compared to the OL method. Close inspection of this discrepancy shows that the ERI elimination technique works quantitatively well only for the overlap-like ERIs (G 1 terms), whereas the contributions of the Coulomb-like ERIs (G 2 and G 3 terms) are typically overestimated relative to the corresponding OL results, especially when the VVOs are used. Therefore, the errors due to the LOA in G 2 and G 3 terms contribute to an accidental error cancellation in the VVO basis and loss of accuracy in the CVO basis. It is also worth noting that despite quite similar overall accuracy of EFP2 and EOP models in the VVO basis, only EOP model correctly describes the neutral charge-transfer complexes (compare blue dots with blue crosses in Figure 1a).
However, in the case of H-bonded neutral systems as well as ionic systems (black and red markers in Figure 1a, respectively) VVO EFP2 performs better than VVO EOP by roughly 1 kcal/mol.
The asymptotic dependence of the CT/VVO energy shown in Figure 2 indicates that in the region near the equilibrium and farther the EOP method (see solid green dots) is similarly accurate as the EFP2 method (see solid blue dots) except for the H 2 O-NH + 4 system, in the case of which the EOP model overestimates the CT energies roughly by a factor of 2 whereas the EFP2 model well reproduces the RVS estimates. In the shortdistance regions, the EFP2 method works quantitatively well for neutral systems, and slightly underestimates the CT energies for ionic systems, while the EOP model consistently underestimates the CT energies in all cases except for the H 2 O-NH + 4 system. In contrast to the VVO basis, using the CVO basis with combination of the EOP formulation leads to unreasonably large overestimation of the CT energy in the ionic systems. Therefore, the CVO basis is generally not suitable for the EOP model and, from that point onwards, we focus our attention only on the VVO-based models.
EOPm Variant. Interestingly, using optimal polarization auxiliary basis sets as alternative ('EOPm' variant, green open circles in Figure 2) drastically improves the behavior of the EOP model in the short-distance region. This is likely due to the fruitful cancellation of errors upon truncation of the RI space in the G 1 terms. Near the equilibrium and in the long-distance limit 'EOP' and 'EOPm' variants generate similar results.
Multi-fragment systems. From the above results one can conclude that the EFP2 model and the EOP CT model perform on overall comparably well when the VVO basis is used, except for the 'EOP' variant in the short-range regions in the case of which CT energies are often underestimated. Nevertheless, accuracies near equilibrium geometries are acceptable for both 'EOP' and 'EOPm' variants. Therefore, we have applied the following three variants of the full EFP2 model for several multi-fragment model systems (Table 3): (i) original EFP2 formulation; (ii) original EFP2 but with the CT term replaced with the 'EOP' variant -denoted as the EFP2 EOP method; (iii) EFP2 with EOP-based term in 'EOPm' variant -denoted as the EFP2 EOPm method. As can be seen from Table 3, the accuracy of the EFP2 EOP method is good and comparable to the original EFP2 method in most of the cases studied. The total errors fall in the range of 2-4 kcal/mol as compared to the full HF results, except for the (C 2 mim + ) 4 (NTf − 2 ) 4 system where the EOP term introduced error of 7.1 kcal/mol. It is also found that the EFP2 EOPm method performs rather similarly in (H 2 O) 15 and (DMSO) 9 systems, and even slightly better in the (C 2 mim + ) 4 (NTf − 2 ) 4 system. For the latter, the error was reduced by 3 kcal/mol as compared to the EFP2 EOP method, resulting in a total error of 4.1 kcal/mol. However, EFP2 method is more accurate in ionic liquid system, which is also consistent with Table 2 where the EFP2 model outperforms EOP in the ionic systems database set. Nevertheless, the general accuracy of the EOP and EOPm models is acceptable.

Reduction of computational costs
The utmost goal of this work is to reduce the computational cost of the CT energy evaluation in the calculations involving effective fragment potentials. In Table 4 estimation of the computational cost of the EFP2 and EOP models is shown. It is apparent that EFP2 requires much more quantities to be computed as compared to the EOP method ('Calculables' in the table). Clearly, evaluation of the EFP2 CT expression from Eq. (18) involves quite a number of different types of OEIs. According to our estimations that assume sequential (two-step) twoindex AO-MO transformations of OEI matrices and large AO basis sets, the computational cost is of an order of 2p 3 (s + t) + 3vop 2 , where the o and p denote the number of occupied orbitals and the number of atomic basis functions, respectively. Here, s, t and v are the relative costs of evaluation of the overlap, kinetic energy, and multipole potential OEIs, respectively, with the latter being most expensive but necessary to compute U eff matrices from Eq. (21). On the contrary, EOP-based expression from Eq. (41) requires only overlap OEIs that are the least expensive, and has the cost of approximate magnitude of 2sop 2 for relatively small auxiliary basis sets. Note also that, among the calculables that are needed in each EOP-based CT energy evaluation, are the auxiliary vectors and matrices from Eqs. (40a) and (40b), the cost of which is negligible. The amount of effective fragment parameters, that needs to be superimposed during the calculations by applying rotation of orbitals and basis functions ('Superimposable parameters') is rather the same in EFP2 and EOP models.
This includes the LCAO-MO coefficients in canonical basis, and the primary basis set, with an addition of the auxiliary basis set for the EOP model. Therefore, the cost of parameter superimposition should not be significantly larger as in the EFP2 formulation, provided sufficiently small auxiliary basis set is used. For example, assuming a water dimer system and 6-311++G(d,p) primary and minimal auxiliary basis set with s = t = v ≈ 1, the EOP CT method is predicted to be roughly 12-16 times faster than EFP2 CT method. In practice, the parameters t and v will have larger values, especially the latter.
Comparison of the time profiling of full EFP2 and EFP2 EOP methods, shown in Table 3, reveals considerable CPU time savings when the EOP CT term is used. Based on our implementation, 8-, 23-and 33-fold speed-ups in the case of (H 2 O) 15 , (DMSO) 9 and (C 2 mim + ) 4 (NTf − 2 ) 4 systems were observed, respectively. The CPU time saving generally increases with the size of the fragment.
It was also found that using the 'EOPm' variant leads to a further 50% reduction of the CPU time for the CT term evaluation as compared to the 'EOP' variant. In effect, the CPU time for the total energy evaluation reduces by a factor of 1.6 when using the EFP2 EOPm method, since the EOP-based CT term is of comparable cost as the EFP2 exchange-repulsion term. As a result, the total CPU time required to evaluate total interaction energy with the EFP2 EOP(m) methods gets dramatically reduced. The order of magnitude of these speed-ups is in good agreement with our theoretically predicted values based on Table 4 discussed above.
The time needed to compute the CT parameters in the preparatory EFP2 calculation is comparable when using the original EFP2 method and the EFP2 EOPm method. However, minimization of the auxiliary basis set size through the EDF scheme, which is required for the 'EOPm' variant, is more costly and, based on our current implementation, requires additional few CPU hours. Nevertheless, the overall CPU time required to compute the EOP CT parameters in the 'EOPm' variant is still negligible as compared to the CPU time needed for a typical molecular dynamics simulation employing the EFP2 force-field.

Memory requirements
The memory requirements for generating the EFP2 and EOP CT parameters in the 'EOP' variant are virtually the same. Slightly more memory is required for generating parameters in the 'EOPm' variant due to the additional storage of intermediate basin hopping solutions and steps during the gradient search when optimizing the auxiliary basis set. However, memory requirements are still much lesser than those to solve the coupled-perturbed Hartree-Fock (CPHF) equations, 100,101 which is necessary for the preparatory step of EFP2 parameters generation. Therefore, memory requirements are not expected to be larger as compared to the EFP2 in a typical implementation of the EOP and the EDF method.

CONCLUSIONS
In this work, the effective one-electron potential operator technique of eliminating electron repulsion integrals in the fragment-based theories of intermolecular interactions was proposed. It was shown that in general case two types of EOPs can be defined and worked out either through the density fitting or the distributed multipole expansion. For the first group of EOPs, the density fitting was extended via the optimization of the auxiliary AO basis set to further reduce computational costs.
The EOP technique was then applied to calculate the charge transfer energy in a variety of bi-molecular complexes as well as a few multi-fragment systems within the framework of the state-of-the-art EFP2 model.

Conflict of interest
The authors declare that they have no conflict of interest.

A General Form of Interfragment-Separable ERI Functionals
Here we derive Eq. (3) which defines the interfragment ERI elimination technique. To generate EOP matrix elements out of interfragment ERIs, the ERI functional must be linear with respect to ERIs and separable in between fragments so that the effective potential operators involving orbitals of same fragment can be defined.
Here we consider a general example of such ERI functional, that involves a generalized Coulomb and exchange operators acting only within the same fragment orbital subspace. For the case of the BA AA ERI class, where the linear functional of the electron-electron repulsion operator |r 1 − r 2 | −1 is , as well as functionals f A t J and f A t K are arbitrary as long as they depend solely on fragment A. For instance, it immediately follows that Now consider the sum over Coulomb and exchange functionals, F t J + F t K . When ERIs are represented in the same basis a unified EOP operator cannot be defined. However, one can make use of another (arbitrary) basis via the unitary transformation U such that and expand j and k orbitals in the expressions for the Coulomb and exchange functionals, respectively. Such basis sets might be for example molecular orbitals and atomic basis functions. Now, the summations over ERIs can be combined and after a suitable interchange of dummy summation variables the EOP is given by where the EOP operator reads v eff,A,(t) j Defining EOPs for the BB AA ERI class allows the following functionals (Y A,(t) kl matrices being arbitrary) which immediately can be re-cast via an EOP By summing over t (all distinct functionals), one arrives to two-electron part of Eq. (3). The derivation of the one-electron part of this equation is straightforward.     It was assumed that the number of virtual orbitals is equal to n.
download file view on ChemRxiv main.pdf (3.25 MiB)

Effective One-Electron Potential Operators
To establish the notation within the present work, the formalism of effective one-electron potential (EOP) operators is given here. The one-electron Coulomb static effective potential v eff (r) produced by a certain effective one-electron charge where r is a spatial coordinate. For convenience, the total effective density can be split into nuclear and electronic contributions, where λ is a certain parameter and is assumed to be either 1 or 0 in this work. Formally, the electronic part can be expanded in terms of an effective bond order matrix, or effective one-particle density (EOD), represented in a certain (arbitrary and general at this stage of presentation) basis of orbitals, {φ α (r)}, Based on that, the operator form of the effective potential can be written aŝ v eff = λv nuc + dr r v eff el (r) r (S4) with the nuclear operator defined bŷ and r α ≡ φ α (r). The matrix element of the effective potential operator is therefore given by where Z x is the atomic number of the xth atom, and the symbol ' At' denotes all atoms that contribute to the effective potential.
In the above equation and throughout the work, for any one-electron operator O(1) and the ERI is defined according to

Notation of Matrices
In this work, matrices are represented either by explicit indexed description of its matrix elements, i.e., A i j , or by bold symbols, i.e., A (the particular notation form depends on convenience of presentation). In the latter case, the vector spaces are sometimes emphasized to avoid confusion, e.g., A aX denotes a matrix A which rows are expanded in a vector space a and columns in vector space X. Vector space is here denoted as a matrix itself, comprising of the set of basis vectors, i.e., a ≡ a(r) = {φ 1 a (r), φ 2 a (r), · · · } (being the columns of the matrix a). Equivalently, using the Dirac notation rather than orbital functions, one can also write a = 1 a , 2 a , · · · . Matrix-matrix multiplication in bold notation is defined as The general operator can be represented as a matrix in bold notation according to where C i j = ∑ i ∈a C ii U i j (or, equivalently, C ab = C aa U ab ) and U ab is the similarity transformation matrix between arbitrary complete basis sets a and b with overlap matrix denoted as the inner product a b ≡ S ab . We use such a notation in Section 2.2 in the main text because manupilating basis set transformations, as well as matrix representations of identity operator in various vector spaces are much more straightforward as compared to the traditional explicit index notation.

Multipole Expansion of the Potential Energy Integrals
One-electron potential energy integrals required to evaluate the EFP2 CT term 1 can be expressed through (S13) In the above equation,q eff w ,μ eff w andΘ Θ Θ eff w are the quantum operators of effective distributed monopole (charge), dipole moment and quadrupole moment, respectively, centered on the wth site at r w , whereas T (d) we are the so called interaction tensors of rank d between r w and r e , with the latter being the electronic coordinate. Explicit forms of interaction tensors can be found elsewhere. 2 Note that j and l can belong to either molecule. In this way, ERIs are no longer needed and the computational cost reduces appreciably.

Semi-Classical Multipole Expansion of EOPs
Semi-classical multipole expansion is the most applicable in case of matrix elements of the type j B v eff,A l B because it can be considered as a Coulombic interaction between ρ B jl and ρ eff,A . These matrix elements require ERIs of type AA BB only. In general, given certain two effective one-electron density distributions, the associated effective Coulombic interaction energy can be estimated from the classical formula according to The above integral can be approximated by applying the DMTP expansion to both effective potential operators and dropping theˆsymbol in the multipole operators which leads to: 3 The symbol ' ' denotes the sum of all the tensor contractions performed over the DMTP's of molecule X and Y to yield the associated interaction energy. The choice of the distribution centres as well as the truncation order of the multipole expansion is crucial in compromising the accuracy and computational cost of the resulting expressions. There are many ways in which this can be achieved, e.g., through the distributed multipole analysis (DMA) of Stone and Alderton, 4,5 the cumulative atomic multipole moments (CAMM) of Sokalski and Poirier, 6 the localised distributed multipole expansion (LMTP) of Etchtebest, Lavery and Pullman 7 or other schemes based on fitting to electrostatic potential, such as ChelpG. 8 In this work, CAMM method with Mulliken partitioning of the AO space were chosen because of their simplicity. One of examples of such EOP-based models reported already in the literature is the transition cumulative atomic multipole moments (TrCAMM) model for estimation of the excitation energy transfer (EET) couplings in the Förster limit, 9 or the vibrational solvatochromic distributed multipole moments (SolCAMM) for the determination of the vibrational frequency shifts of spatially localized IR probes. 10 In general, having the EOD and λ , the effective cumulative atomic charges are given by Note that P eff αβ can refer to the particular electron spin, or their sum as a bond order matrix as well. Equations for the effective distributed dipoles, quadrupoles, octupoles and hexadecapoles can be obtained from Ref. 9

Density Fitting of EOPs: Alternative Approach
In contrast to the EDF scheme in the main text, an alternative scheme, which requires additional calculations of the twocenter ERIs, was also developed in this work. This alternative scheme was implemented by us and was found to generate accurate EOP matrices in auxiliary AO basis set m. However, due to the need of calculating ERIs (which is more costly than just overlap integrals as compared to Eq. (16) in the main text) we do not use this method in production calculations. Additionally, EDF method from the main text allows to estimate the minimal size of basis m, which is another advantage 1-12 | 3 over the alternative approach. Nevertheless, we present this development below for completness and future reference.
An alternative way of finding an optimal auxiliary AO basis set m is via using a well known method in density fitting, mainly minimization of the self-energy 11 with the error density defined by and the EOP matrix given bŷ By requiring that Note that, while R ηξ is a typical 2-center ERI that can be evaluated by standard means, 12 b (i) η matrix elements are not trivial to calculate because the EOP operator, which contains integration over an electron coordinate, is present inside the double integral. Therefore, the following triple integral, has to be computed. Obtaining all the necessery integrals of this kind directly by performing integrations of Eq. (S23) is very costly even for medium sized molecules. 13 However, one can introduce the effective potential in order to eliminate one integration. This can be achieved by performing additional density fitting in a nearly complete intermediate basis 13 where the 'RI' symbol in the summation denotes the nearly-complete RI AO basis set. The working equation is therefore given by which can be easily evaluated by noting that It is emphasized here that, as long as the state vector i , EOP operatorv A , and the auxiliary and intermediate basis sets depend solely on the unperturbed molecule A, the matrix elements V A iξ can be calculated just once and stored as effective fragment parameters. We noticed that the above scheme can be used for incomplete auxiliary AO basis sets.
6 Accuracy of the EOP method with CVOs Below, Cartesian coordinates (in Å) of bi-molecular complexes used for asymptotic analysis from Figure 2 in the main text are shown. They were obtained by energy optimizations at HF/6-31+G(d,p) level of theory. [19][20][21][22] To generate the displaced structures see the main text for details.