Multiresolution quantum chemistry in multiwavelet bases: Hartree–Fock exchange

In a previous study we reported an efficient, accurate multiresolution solver for the Kohn-Sham self-consisitent field (KS-SCF) method for general polyatomic molecules. This study presents an efficient numerical algorithm to evalute Hartree-Fock (HF) exchange in the multiresolution SCF method to solve the HF equations. The algorithm employs fast integral convolution with the Poission kernel in the nonstandard form, screening the sparse multiwavelet representation to compute results of the integral operator only where required by the nonlocal exchange operator. Localized molecular obitals are used to attain near linear scaling. Results for atoms and molecules demonstrate reliable precision and speed. Calculations for small water clusters demonstrate a total cost to compute the HF exchange potential for all n(occ) occpuied MOs scaling as O(n(occ) (1.5)).


I. INTRODUCTION
In a previous work, 1 we presented a practical multiresolution method 2,3 in multiwavelet bases to solve the allelectron local density approximation ͑LDA͒ Kohn-Sham ͑KS͒ ͑Ref. 4͒ equation for molecules. The approach described in Ref. 5 was employed for the solution of the integral form of the density functional equations. In this paper, we extend the approach to include Hartree-Fock ͑HF͒ exchange using an algorithm that will scale asympotically linearly with the molecule size while maintaining a guarantee of arbitrary finite precision.
The evaluation of HF exchange is necessary to determine the electronic wave functions for HF method 6 and the KS method with hybrid exchange functionals ͑such as B3LYP͒ ͑Refs. 7 and 8͒ as well as post-HF methods. The HF ͑or exact͒ exchange term is a nonlocal operator, and this evaluation has been regarded as a major computational bottleneck for electronic structure calculations. 9 The widely used Gaussian basis implementation with the direct selfconsistent field ͑SCF͒ method 10 demands O(N 3 -4 ) scaling cost for small and moderate-size molecules, and possibly O(N 2 ln N) for large systems. 11 Several algorithms have been proposed for O(N) Hartree-Fock exchange in the Gaussian bases, [12][13][14] based upon assuming or forcing an exponential asymptotic behavior in the density matrix, (r,rЈ) ϳexp(Ϫ͉rϪrЈ͉/ᐉ) ͑Ref. 15͒, where ᐉ is a system-specific length-scale parameter. However, the demonstrations of linear scaling behavior are limited to using relatively small and nondiffuse basis sets for large systems. Diffuse functions are often chemically essential even in HF or density functional theory ͑DFT͒ calculations, and present a significant computational challenge in large systems, such as the DNA stacking problem. 16 Furthermore, linear scaling algorithms based upon forcing sparsity unavoidably introduce an error in the total energy that grows with the system size unless the truncation threshold is reduced with the system size. This is not practical for computing small chemical energy differences in very large systems.
These previous methods have relied exclusively upon spatially localized representations to achieve efficiency by introducing sparsity into the density matrix and/or integrals. However, this approach immediately precludes efficient treatment of either the smoothly varying component of density, or dense blocks of the density matrix such as inevitably occur within atoms or between neighboring atoms. Forcing exponential decay in metallic systems or systems with small highest occupied molecular orbital-lowest unoccupied molecular orbital ͑HOMO-LUMO͒ gaps, such as graphitic sheets, will generate incorrect results since the density matrix does not decay exponentially. The previous O(N) HF exchange method reverts to an O(N 2 ) complexity in such systems. No systematic analysis has been performed to date on the errors introduced by these exchange approximations. This is an especial concern in multicomponent systems such as required in the study of catalytic processes involving conjugated hydrocarbons on the surface of nanoscale metal particles absorbed on oxide surfaces. In contrast, a multiresolution approach, by separating the behavior of functions between length scales is efficient and maintains a guarantee of precision regardless of the physical composition of the system.
Recent success with hybrid DFT exchange-correlation functionals 7,8 has motivated the development of efficient techniques to include HF exchange in plane-wave ͑PW͒ DFT molecular dynamics. 17 A straightforwardly optimized procedure, which was reported by Chawla and Voth 18 in conjunction with the all-electron ''projector augmented wave'' method, 19 formally scales in O(n occ 2 N PW ln N PW ) for both en-ergy and its gradient ͑with respect to orbitals͒ of the HF exchange, where n occ and N PW are the numbers of occupied MOs and plane waves, respectively. A systematic approach to multiresolution constructions started with the development of wavelet bases, see Ref. 2 and references therein. For numerical applications, the results in Ref. 3 pointed out a practical approach to reducing the computational cost. One of the results of Ref. 3 was the introduction of the nonstandard form ͑NS-form͒ for representing operators in multiresolution bases. However, the straightforward generalization of NS-form ͑or for that matter, the standard form͒ to multiple dimensions is too expensive for practical applications. Our approach is based on using NS-form and the separated representations of operators that were first used in Ref. 20 and significantly extended in Ref. 21. The basic point of Ref. 21 is that many apparently nonseparable operators are, in fact, separable with a finite but arbitrary precision. Moreover, the number of terms necessary for such representations is remarkably small.
In the following, we first present essential aspect of the numerical approach, describe the algorithm for the HF exchange, then the combination with localized orbitals. Finally, we provide some numerical results and conclusions.

II. MATHEMATICAL BACKGROUND
Multiresolution wavelet and multiwavelet expansions organize functions and operators efficiently in terms of proximity on a given scale and between the length scales. 2,3,5,20 Multiresolution decomposes the Hilbert space L 2 (R d ), d у1 into a chain of closed subspaces ͑refinement scales͒, The wavelet subspaces W j are defined as an orthogonal complement of V j in V jϪ1 , thus In a previous study, 1 we presented a practical approach for fast computation with integral operators ͑notably Green functions for the Poisson and bound-state Helmholtz equa-tions͒ in the NS-form. Let T be an operator such as the integral convolution operator with the kernel K,

͑3͒
As was introduced in Ref. 3, by using orthogonal projection operators P j :L 2 (R d )→V j , and Q j :L 2 (R d )→W j with Q j ϭ P jϩ1 Ϫ P j , the NS-form of the operator T is expressed as where T j ϭ P j TP j , A j ϭQ j TQ j , B j ϭQ j TP j , and C ϭ P j TQ j . This Eq. ͑4͒ is derived from a ''telescopic'' series represented by the projection operators as The application of the NS-form of the operator T to the function f, i.e., T n f n , ͓Eq. ͑3͔͒ is also given as a telescopic series We can rewrite Eq. ͑7͒ taking the relation P j Q j ϭQ j P j ϭ0 into account, In this paper, the elements of T n are denoted ͑in one dimen-sion͒ by ͓r lm where il n (x) is a basis for the projection operator P n . 5 The elements of A nϪ1 , B nϪ1 , and C nϪ1 ͑equivalently, T n ϪT nϪ1 ) denoted by ␣ lm n , ␤ lm n , and ␥ lm n , respectively, are computed from those of T n by the two-scale relations. The key in NS-form is that this representation of an operator uses bases with supports on squares and has an advantage that it acts scale by scale, without explicitly involving the interactions between different length scales. Futhermore, the sparsity in the multiwavelet space (W j ), which is guaranteed by the vanishing moment property, eventually enables a linear scaling operation as O͓(Ϫln ⑀)N͔, where ⑀ is the scheduled accuracy, and N is the number of boxes, or coefficients.
In one dimension the implementation of the above formulation using the kth Legendre polynomial bases is very straightforward for the application of an integral operator with O͓(Ϫln ⑀)k 2 N͔ operation cost. However, in three dimensions the cost is nominally O͓(Ϫln ⑀)k 6 N͔ operations, which has a prohibitively large overhead for a practical code. Separated representations of integral operators 1,20 and use of the low operator rank of the resulting one-dimensional operators, reduce the computational cost to O͓(Ϫln ⑀)k 3 N͔ operations, which enables practical computation with multiwavelet bases in three and higher dimensions.

III. ALGORITHM
Initially, most of the equations are written for simplicity as if in one dimension. The extension to three dimensions is discussed at the end. The application of the Hartree-Fock operator (K ) to a function ͓ f (x)͔ is given by where the density matrix in coordinate presentation is given by where (x) is the th occupied molecular orbital ͑MO͒, and n occ is the number of occupied MOs.
We already have a fast algorithm in multiwavelet bases to apply Coulomb operator ͑i.e., the Green's function to the Poisson equation͒ ͑Ref. 1͒, which may be used to implement exchange in a straightforward fashion, as follows: This is evaluated by looping over MOs ( ), and forming the product of the MO with the target function ͑f͒; applying the Coulomb operator; and multiplying the resulting potential by the MO. During each iteration of the solution of the Hartree-Fock equations, this operator must be applied to each occupied orbital. Therefore, the nominal cost of this approach is roughly O(Vn occ 2 ), where V is the system volume. The dependence on V will probably be weaker than this due to the adaptive multiresolution representation and the potential being smooth at long range. Therefore V might be neglected from scaling expressions hereafter. If the occupied orbitals are localized ͑or, equivalently, if the density matrix is sparse͒, there will only be O(n occ ) nonzero products of the occupied orbitals, and the above algorithm reduces to O(Vn occ ). The factor proportional to the system volume arises from computing in all space the electrostatic potential due to the product (xЈ) f (xЈ). However, the very next step in the algorithm is multiplication by the localized MO. If the algorithm that applies the Coulomb operator is modified to compute the potential only in the volume where the MO is significant, the overall cost can be reduced to O(n occ ), or O(1) for each localized target function.
Our goal is to modify the Coulomb operator so that it computes the potential to the required precision in the volume where the localized MO is significant, and outside this volume does as little work as possible. Since the NS-form 3 of the Coulomb operator computes contributions at all length scales, it is inevitable that there will be nonzero values outside of the desired minimum volume. However, the finerscale contributions need to be computed only where necessary, and only to a precision controlled, in part, by the magnitude of the orbital.
We denote the product of a MO ͓(x)͔ and the target function ͓ f (x)͔ by g(x)ϭ(x) f (x), and the electrostatic potential due to the charge density g(x) by u(x). Then in some subvolume ⍀ we wish to determine if the exchange potential may be neglected. A norm-wise bound on the contribution may be obtained from the Schwarz inequality

͑13͒
The 2-norm of a function in the scaling function basis is simply the 2-norm of the coefficients, since the basis functions are orthonormal.
The potential is formed from the product of the matrix representation at level n of the Coulomb operator (T n with elements ͓r lm n ͔ i j ) and the function coefficients (g li n ) The translation l labels a subvolume at level n. Due to the multiscale summation, and also to minimize the computation, we must screen each contribution separately. A normwise bound for contribution in box l from box m is provided by the Schwarz inequality, which for matrices becomes the Frobenius norm At each level of the multiscale summation, the dominant contributions come from the box and its 26 nearest neighbors, so we can neglect contributions if The multiscale summation over the n max refinement levels is accomplished through the nonstandard form ͓Eq. ͑5͔͒. The difference T n ϪT nϪ1 decays very rapidly with distance since the long range ͑i.e., smoother͒ parts of the operator are accurately represented at the coarser levels. Thus, the screening is in practice performed using ʈs l n ʈ 2 ʈr lm n Ϫr lm nϪ1 ʈ 2 ʈg m n ʈ 2 р ⑀ 27n max . ͑17͒ The norm of the target function ͓g(x)ϭ (x) f (x)͔ in each box can be precomputed and stored. Since the Coulomb operator is an integral convolution, the matrix r lm n actually only depends upon lϪm. Furthermore, the operator is homogeneous and, therefore, the matrix at level n can be obtained by rescaling the matrix at level 0. These properties permit the operator matrices to be computed as needed with little computational overhead. Finally, the norm of the molecular orbital in each box in which it is significant can also be precomputed and stored. During the application of the operator, the norm of the MO may be required in a box which has not been precomputed. This can happen during the computation of contributions at a locally finer or coarser level than that required to resolve the MO. In this instance, the two-scale relationship 5,22 can be used to compute the missing coefficients. This additional computational cost is minimized by appropriate ordering of the screening tests and loops, and by reuse in the application of the exchange operator to multiple target functions. The algorithm that applies the screened Coulomb operator to g(x) is then as follows: Since the multiwavelets have at least k vanishing moments ͑or multipoles͒, the numerical significance of the coefficient block ␣ lm n , which is one of NS-form elements ͑denoted by ''͉NS͉'' in the above algorithm͒ of the interaction with the kernel of Poisson equation (͉rϪrЈ͉ Ϫ1 ) in full multiwavelet representation, will be asymptotically proportional to ͉lϪm͉ Ϫ2kϪ1 . Also, the most slowly decaying elements in the other two blocks of NS-form elements, ␤ lm n , and ␥ lm n , decays as ͉lϪm͉ ϪkϪ1 . Furthermore, the multiscale representation of the function will introduce additional sparsity. Since we usually use Legendre polynomials of relatively high order (kϭ5,7,9) for reliable calculations, each submatrix of the operator in NS-form is expected to be significant just in a narrow band around the diagonal. This aspect leads to the linear-scaling computation cost in the application of the operator to the function.
The extension to three dimension is accomplished by replacing the translations ͑l and m͒ and the multiwavelet indices ͑i and j͒ with three vectors. However, as previously described in Ref. 1, an efficient algorithm requires the use of a separated form for the operator to reduce the effective scaling to quartic in order of the multiwavelet. The separated form is given by and the number of terms included in the summation over is referred to as the separation rank of the operator in analogy to the two-dimensional singular-value decomposition. A suboptimal separation is currently obtained by fitting 1/r ͑the kernel of the Coulomb operator͒ to a sum of Gaussian over a large range of r. The matrix elements of the one-dimensional operators ͑Gaussian convolutions͒ are readily generated as described in Refs. 1, 3, and 5. Application of the separated nonstandard form (T lm n ϪT lm nϪ1 ) of the operator can be accomplished in several ways. The most straightforward is to apply separately T lm n and T lm nϪ1 , and then to subtract the results. However, this is both inefficient and can result in loss of significant digits. A better approach is subtract the separated representations of the two operators and then perform a rank reduction as described in Ref. 21. The NS operator is expected to be low rank, 21 and, it is possible to avoid some of the loss of precision. The most accurate, and potentially most efficient, approach is to use the two-scale relationship to form individually the 63 blocks of the nonstandard form, 3,5 each in a separated representation. This avoids the loss of precision arising from taking the difference T lm n ϪT lm nϪ1 and permits exploitation of the differing sparsity in each block of the nonstandard form. However, in the separated representation the necessary code is much more complicated and we have not yet pursued this approach.

IV. LOCALIZATION OF ORBITALS
Our formulation for the HF method is based on the canonical molecular orbitals ͑CMOs͒ ͓i.e., ϭϪ2Ĝ * (V• )], which are spatially delocalized. In our implementation of the integral operators, the Coulomb potential can be computed in O(Vn occ ) due to the locality of the density function (x), which is invariant to the unitary transformation among the occupied MOs, whereas the computational cost to apply the HF exchange operator to all of the occupied MOs depends on the spatiality of the MOs. The nominal cost of the HF exchange operator with all of the delocalized CMOs is estimated to be roughly O(Vn occ 2 ). The quadratic dependence on n occ can be reduced by using the localized molecular orbitals ͑LMOs͒ via the cutoff mechanism implemented in NS-form of the HF exchange operator. The LMOs are obtained via the unitary transformation among the CMOs as where U is an unitary matrix giving a rotation between the CMOs and LMOs. The HF exchange operator to all of CMOs is calculated via the unitary transformation of HF exchange for all of the LMOs as The algorithm to compute HF exchange via LMOs is summarized as follows: ͑1͒ Obtain the unitary matrix for Foster-Boys localization. 23 The matrix is simply obtained from the n occ

A. Hartree-Fock calculation on atoms
An initial test of the implementation was performed upon the neutral atoms He, Be, Ne, Mg, Ca, and Sr. Table I lists the total and the HOMO energies along with the results of Thakkar and co-workers. 24 In the multiresolution calculations, the nuclear potential smoothing parameter was chosen so as to yield an energy accurate to at least 10 Ϫ6 hartree ͑denoted by ⑀ nuc ϭ10 Ϫ6 ), the box size was set as L ϭ80 bohrs and D 2h symmetry was used. We could not use LMOs because the symmetry usage forced the MOs to be delocalized. We used the seventh and ninth order multiwavelet bases and solved to a residual in the MOs ͓denoted by r(MO)] of 10 Ϫ5 and 10 Ϫ6 , respectively. For the Ca and Sr atoms, a more precise calculation with kϭ11 and r(MO) ϭ10 Ϫ6 was additionally carried out. The results agree with the atomic data from Thakkar, which was calculated with Slater-type functions to predict total energies within around 10 Ϫ6 hartree, which is the accuracy delivered in the previous testing LSDA calculations. 1 Systematic convergence to the correct atomic limit is observed in the multiresolution calculations. For all the test atoms except Sr, the ninth order multiwavelet bases reproduce Thakkar's results for the total energies with the five decimals. For Sr, the 11th multiwavelet bases give an excellent agreement with Thakkar's total energy within 1ϫ10 Ϫ6 hartree. The accuracies of the total energies obtained by the seventh multiwavelet bases are 9ϫ10 Ϫ8 , 2ϫ10 Ϫ6 , 2ϫ10 Ϫ5 , 5ϫ10 Ϫ5 , 9ϫ10 Ϫ5 , and 4ϫ10 Ϫ4 hartree for He, Be, Ne, Mg, Ca, and Sr, respectively. For Ca and Sr, the ninth multiwavelet bases yield the accuracy as 3ϫ10 Ϫ6 and 2ϫ10 Ϫ5 hartree for the total energy, respectively.

B. Hartree-Fock calculation on diatomic molecules
Hartree-Fock calculations were performed upon several homonuclear/heteronuclear diatomic molecules near their equilibrium geometries. The molecules and their bond lengths are from a previous study by Pahl and Handy. 25 For the mononuclear diatomic molecules, we computed H 2 , C 2 , N 2 , and F 2 with the bond lengths 1.400, 2.358, 2.068, and 2.668 bohrs, respectively. For the heteronuclear systems BH, HF, BF, CO, and NO ϩ with the bond lengths 2.3289, 1.7328, 2.386, 2.132, and 2.0092 bohrs. The box size was set as the bond length times 32 bohrs in order to locate the nuclei at dyadic points.
Tables II and III list the total and HOMO energies for the diatomic molecules, and the dipole moments for the heteronuclear molecules. The results for the homonuclear diatomic molecules were obtained with the D 2h symmetry and with the nuclear potentials smoothed within the accuracy 10 Ϫ6 and 10 Ϫ7 hartree ͑denoted by ⑀ nuc ϭ10 Ϫ6 and 10 Ϫ7 , respec-tively͒. For the heteronuclear diatomic molecules, we used C 2v symmetry and ⑀ nuc ϭ10 Ϫ6 . The tables include the extrapolated results reported by Pahl and Handy, 25 the numerical Refs. 26 and 27, and the Gaussian calculations with augcc-pVXZ, (XϭT,Q,5) ͑Ref. 28͒ using the NWCHEM program package. 29 As to the homonuclear diatomic molecules, the most accurate multiresolution results obtained were with 11th multiwavelet bases, residuals of MOs r(MO)Ͻ10 Ϫ6 , and ⑀ϭ10 Ϫ7 . Our calculations show systematic convergence with an accuracy of at least 10 Ϫ6 hartree for the total energies. We observed that the total energies are already converged with ninth multiwavelet bases and r(MO)р10 Ϫ6 within the accuracy 10 Ϫ6 hartree. There is no significant error within an expected accuracy between two smoothing parameters ⑀ nuc ϭ10 Ϫ6 and ⑀ nuc ϭ10 Ϫ7 , as expected. The resulting total energies agree with the previous numerical calculations 26 27 The deviations of aug-cc-pV5Z from the multiresolution calculations with the 11th multiwavelet bases are 1.9 ϫ10 Ϫ5 , 1.2ϫ10 Ϫ4 , 2.2ϫ10 Ϫ4 , and 4.4ϫ10 Ϫ4 hartree for the total energies of H 2 , C 2 , N 2 , and F 2 , respectively. Since we observed that the ninth order multiwavelet bases r(MO)р10 Ϫ6 and ⑀ nuc ϭ10 Ϫ6 gave sufficient accuracy, we did not use either the 11th multiwavelet bases or ⑀ nuc ϭ10 Ϫ7 for the hetronuclear diatomic molecules. For these molecules, the dipole moments were also computed, the implementation of which is discussed elsewhere. 30 The  results for the total energy again agree to three decimal places with the previous numerical calculations 27 and to four places with those of Pahl and Handy. 25 The differences in the energy between the multiresolution calculations with the 9th multiwavelet bases and aug-cc-pV5Z are 4.2ϫ10 Ϫ5 , 2.3 ϫ10 Ϫ4 , 2.6ϫ10 Ϫ4 , 2.3ϫ10 Ϫ4 , and 2.7ϫ10 Ϫ4 hartree for the total energies of BH, HF, BF, CO, and NO ϩ , respectively. The maximum difference in dipole moments is 1.9 ϫ10 Ϫ4 debye for HF molecule.

C. Hartree-Fock calculation for the benzene molecule
HF calculations were performed upon the benzene molecule. We took the same geometry, R C-C ϭ1.3862 Å and R C-H ϭ1.0756 Å, as used by Pahl and Handy. 25,31 The nuclear potentials were smoothed with ⑀ nuc ϭ10 Ϫ6 , and the box size was set to Lϭ45 bohrs. We used the seventh and ninth order multiwavelet bases with r(MO)р3ϫ10 Ϫ4 . The calculations were performed using both D 2h and C 1 symmetry. In the C 1 symmetry, we employed both the CMOs and also the LMOs to verify the accuracy and efficiency of the LMOs-based HF exchange method. In summary, three calculations were performed: with D 2h symmetry, without symmetry in CMOs, and without symmetry in LMOs. Table IV shows the total and HOMO energies, the timings for applying HF exchange operators and the localization to all of the MOs, and the number of coefficient blocks for the multiwavelet bases ͑their Frobenius norm Ͼ5ϫ10 Ϫk ) for each CMO/LMO. The parentheses in the timing columns mean the average times for each MO. The central processing unit ͑CPU͒ time was measured on a single 1.3 GHz Power4 processor on IBM p690 system.
The ninth order multiwavelet bases with r(MO)р3 ϫ10 Ϫ4 yielded consistent results for the three calculations. For the seventh order multiwavelet bases, the differences in the total energies are within 1ϫ10 Ϫ4 hartree. The deviation of the total energies with the seventh order multiwavelet bases from the ninth order multiwavelet bases is 1Ϫ2 ϫ10 Ϫ4 hartree. Pahl and Handy's result agrees with the mul-tiresolution calculation within the available digits. The augcc-pV5Z basis has an error of 5ϫ10 Ϫ4 hartree in the total energy.
The times for applying the HF exchange operator with the D 2h symmetry is around eight times faster than that for the CMO-based HF exchange operator with the C 1 symmetry. The eightfold speed up is an expected computational reduction by the spatial symmetry usage in our implementation. The same scale is also observed in reduction of the number of coefficient blocks between D 2h and C 1 calculations. By using LMOs, the timing are improved by factors of 1.9 and 2.6 for the seventh and ninth multiwavelet bases, respectively. The times for the localization were negligibly small, around 3% of total times for applying the HF exchange operator. The localization reduces the numbers of blocks to 76% and 49% for the seventh and ninth order multiwavelet bases, respectively.

D. Computational scaling of Hartree-Fock exchange for n †H 2 O ‡, nÄ1, 2, 3
We examined the computational scaling of the Hartree-Fock exchange calculation using the water monomer, dimer, and trimer, whose geometries are listed in Table V. Table VI shows the total and HOMO energies, the timings to compute HF exchange and localize MOs, the numbers of coefficient blocks ͑their Frobenius norm Ͼ5ϫ10 Ϫk ). The parentheses in the timing columns mean the averages for each MO. The table also includes the CPU times for computing Coulomb potential. The calculations were performed with the C 1 symmetry, ⑀ nuc ϭ10 Ϫ6 , and the box size Lϭ60 bohr. The seventh and ninth order multiwavelet bases were used with r(MO) р3ϫ10 Ϫ4 . The results were obtained in two approaches to compute the HF exchange using CMOs and LMOs, so that we can directly compare the computational scaling between the two approaches. Table VI includes the ratios of timings against the CMO-based HF exchange calculation for the monomer. The same calculations were carried out using NWCHEM with the default screening threshold.
As to the accuracy, the calculations with the CMO-and LMO-based HF exchange in the ninth order multiwavelet bases yield total energies consistent within 10 Ϫ6 hartree, and those in the seventh order multiwavelet bases agreed with those in ninth order multiwavelet bases at the first six digits.
Overall, the LMOs reduce the computational demanding for applying the HF exchange operator. Even for the monomer, the CPU time for LMOs was 10%-15% faster than that for CMOs. The timing ratios for the total operation for all MOs are estimated as 1.0ϫn 1.74 (kϭ7, CMOs͒, 0.79ϫn 1.53 (kϭ7, LMOs͒, 1.0ϫn 1.91 (kϭ9, CMOs͒, and 0.87ϫn 1.48 (kϭ9, LMOs͒. The best scaling for the HF exchange calculation for this system is obtained as 1.5 on the number of waters, which is successfully smaller than quadratic scaling. We note the Coulomb potential is computed in O(n) scaling.

VI. CONCLUSION
An accurate, efficient algorithm has been presented to compute the HF exchange operator using multiresolution analysis in multiwavelet bases. The present implementation of the HF exchange in MADNESS ͑Ref. 32͒ is a straightforward extension of the fast algorithm in multiwavelet bases to apply Coulomb operator ͑i.e., the Green's function to the Poisson equation͒ in three dimensions. We use the NS-form for application of the operator to the function, which allows us to do the application in a linear scaling fashion and efficiently within a guaranteed finite accuracy. Specifically for the HF exchange operator, we have implemented screening for the weak nonlocal interaction in the HF exchange to avoid O(n occ 2 ) scaling operation for each target MO or totally O(n occ 3 ) scaling for all the occupied MOs. The asymptotic scaling is formally reduced to a total operation cost to O(n occ ), or O(1) for each target localized MO.
We demonstrated the HF SCF calculations for the manyelectron closed-shell systems including a number of atoms, diatomic molecules, polyatomic molecules such as benzene molecule and water monomer, dimer and trimer. The performance was satisfactory in that the results yielded the required precision, and the computational cost was observed to scale as expected.
We used Foster-Boys's LMOs for the HF exchange calculations to enhance the sparsity. The illustrative calculations using benzene and water trimer demonstrated that the LMOs worked efficiently to reduce the computational cost and scal- ing in the C 1 symmetry without losing accuracy. In the water cluster, we observed a computational scaling for LMO-based HF exchange of O(n occ 0.5 ) for each target MO, and O(n occ 1.5 ) for all the occupied MOs.