Introducing DDEC6 atomic population analysis: part 1. Charge partitioning theory and methodology †

Net atomic charges (NACs) are widely used in all chemical sciences to concisely summarize key information about the partitioning of electrons among atoms in materials. The objective of this article is to develop an atomic population analysis method that is suitable to be used as a default method in quantum chemistry programs irrespective of the kind of basis sets employed. To address this challenge, we introduce a new atoms-in-materials method with the following nine properties: (1) exactly one electron distribution is assigned to each atom, (2) core electrons are assigned to the correct host atom, (3) NACs are formally independent of the basis set type because they are functionals of the total electron distribution, (4) the assigned atomic electron distributions give an e ﬃ ciently converging polyatomic multipole expansion, (5) the assigned NACs usually follow Pauling scale electronegativity trends, (6) NACs for a particular element have good transferability among di ﬀ erent conformations that are equivalently bonded, (7) the assigned NACs are chemically consistent with the assigned atomic spin moments, (8) the method has predictably rapid and robust convergence to a unique solution, and (9) the computational cost of charge partitioning scales linearly with increasing system size. We study numerous materials as examples: (a) a series of endohedral C 60 complexes, (b) high-pressure compressed sodium chloride crystals with unusual stoichiometries, (c) metal – organic frameworks, (d) large and small molecules, (e) organometallic complexes, (f) various solids, and (g) solid surfaces. Due to non-nuclear attractors, Bader's quantum chemical topology could not assign NACs for some of these materials. We show for the ﬁ rst time that the Iterative Hirshfeld and DDEC3 methods do not always converge to a unique solution independent of the initial guess, and this sometimes causes those methods to assign dramatically di ﬀ erent NACs on symmetry-equivalent atoms. By using a ﬁ xed number of charge partitioning steps with well-de ﬁ ned reference ion charges, the DDEC6 method avoids this problem by always converging to a unique solution. These characteristics make the DDEC6 method ideally suited for use as a default charge assignment method in quantum chemistry programs.


Introduction
Net atomic charges (NACs) are a ubiquitous concept in all chemical sciences. It is difficult to imagine chemistry being learned at either an introductory or advanced level without some reference to NACs. 1 For example, the pH scale measuring hydrogen ion activities in solution embodies the concept of hydrogen atoms carrying positive NACs. In biochemistry, many cellular functions depend on the transport of charged atoms such as K + , Na + , Ca 2+ , and Cl À across cell membranes. 2 Experiments measuring the water molecule's dipole moment imply a negative NAC on its oxygen atom and a positive NAC on each of its two hydrogen atoms. 3 NACs also play an important role in solid state physics, where oxygen atoms in solid oxides carry negative NACs to enable oxygen ion transport. 4 Zwitterions, which are widely encountered in amino acids, illustrate that important chemical behaviors depend not only on the overall molecular charge but also on the net charges of local regions within a molecule. 5 Our overall objective is to develop an atomic population analysis method with characteristics suitable for use as a default method in popular quantum chemistry programs. This requires the method to maximize broad applicability and minimize failure. The algorithm should converge rapidly and reliably to a unique solution with low dependence on the basis set choice. The assigned NACs, atomic spin moments (ASMs), and other atoms-in-materials (AIM) properties should be chemically consistent, accurately describe electron transfer directions, and correlate to experimental data for reference materials.
Existing atomic population analysis methods are not wellsuited for use as a default method in quantum chemistry programs: (1) Mulliken 6 and Davidson-Löwdin 7 population analyses do not have any mathematical limits as the basis set is systematically improved toward completeness, and they are not directly applicable to plane-wave basis sets. 8 (2) Bader's quantum chemical topology (QCT) has many theoretically desirable properties, but it can lead to non-nuclear attractors that produce undened NACs. 9 (3) Electrostatic potential tting methods (e.g., ESP, 10 Chelp, 11 Chelpg, 12 REPEAT 13 ) do not have good conformational transferability and assign unreasonable charge values to some buried atoms. [13][14][15][16][17] Including constraints (e.g., RESP 13,14 methods) improves this, but the form of these constraints is exible leading to numerous possible charge values. Simultaneously tting across multiple conformations is an another possible solution, but this requires computing electron distributions for many different system geometries. 17 Also, nonporous systems do not possess a surface outside which to t the electrostatic potential. (4) The original Hirshfeld (HD) 18 method, which is based on neutral reference atom densities, usually underestimates NAC magnitudes. [19][20][21][22] While the Iterative Hirshfeld (IH) method improves the NAC magnitudes, 21 it exhibits the bifurcation problem described in Section 2.4. (5) The Charge Model 5 (CM5) was parameterized to give NACs that approximately reproduce static molecular dipole moments. 20,23 The CM5 charges include an empirical correction to the HD NACs. 20,23 The main limitation of CM5 is that it only partitions the integrated number of electrons for each atom, and thus cannot be used to compute AIM properties such as atomic multipoles, bond orders, etc. (6) Because Atomic Polar Tensor (APT) 24 and Born effective charges 25 require computing system response properties (via perturbation theory or atomic displacements), they are not well-suited for use as a default atomic population analysis method. (7) The Natural Population Analysis (NPA), 8 Natural Bond Orbital (NBO), 26 Adaptive Natural Density Partitioning (ANDP), 27 Intrinsic Atomic Orbital, 28 Intrinsic Bond Orbital, 28 and several related approaches 29 provide chemically meaningful localized atomic and bonding orbitals. These approaches have a number of advantages, including the recovery of heuristic chemical bond forming concepts. In the near future, modications of some of these methods might emerge with sufficiently wide applicability and convergence robustness to be used as default methods in quantum chemistry programs. Recently, some of these methods have been tested on a limited number of periodic or semiperiodic materials, [29][30][31][32] but not yet on enough materials to draw denite conclusions about their suitability for being used as a default method for analyzing dense solids. Because these methods require generating localized orbitals, using planewaves or other delocalized basis functions poses a challenge for their application. 30 In this article we present a new atomic population analysis method, called DDEC6, that is a renement of the Density Derived Electrostatic and Chemical (DDEC) approach. This method is an explicit functional of the electron and spin distributions with no explicit basis set dependence. We developed the DDEC6 charge partitioning algorithm using a scien-tic engineering design approach that resembles the process used to build airplanes. Similar to constructing an AIM partitioning, there is more than one conceivable way to build an airplane. One could make an airplane longer or shorter, for example. Yet, it is not accurate to say airplane design (or AIM partitioning design) is an arbitrary process, because the scien-tic method is utilized. Like airplanes, our AIM partitioning method has been scientically engineered to meet chosen performance goals. This involved a process of constructing and testing prototypes to rene the design until all performance goals were achieved. When a new airplane is designed, prototypes are built and tested in wind tunnels to determine which shapes achieve appropriate li, minimize drag, and respond favorably to air turbulence. Not only should an airplane y, but it should take off and land smoothly, have good fuel efficiency, and so forth. All of these aspects are tested when developing a new airplane design. Our process for building an AIM partitioning method is similar. Specically, we built and tested prototypes to improve the control, efficiency, accuracy, and robustness. We made extensive comparisons to experimental data during this development process. 33 This scientic engineering design approach requires choosing performance goals. Table 1 lists nine desirable features we chose for assigning NACs. The rst criterion is to assign exactly one electron distribution per atom in the material. This criterion is fullled by many but not all charge assignment methods. For example, Bader's QCT yields nonatomic electron distributions in materials with non-nuclear attractors. 9,34,35 The second criterion is to assign core electrons to their host atom. This criterion is not appropriate for APT and Born effective charges that quantify the system's response to nuclear displacements. Methods that directly t the electrostatic potential without regard for atomic chemical states also do not satisfy this criterion. Since a goal of AIM methods is to describe atomic chemical states, they should preferably assign core electrons to the host atom. The third criterion is to assign NACs as functionals of {r(r)}. The main purposes of our NACs Table 1 Nine desirable features (performance goals) we have chosen for assigning NACs (1). Exactly one assigned electron distribution per atom (2). Core electrons remain assigned to the host atom (3). NACs are functionals of the total electron density distribution (4). Assigned atomic electron distributions give an efficiently converging polyatomic multipole expansion (5). NACs usually follow Pauling scale electronegativity trends (6). NACs for a particular element have good transferability among different conformations that are equivalently bonded (7). The assigned NACs are chemically consistent with the assigned ASMs (8). Predictably rapid and robust convergence to a unique solution (9). Computational cost of charge partitioning scales linearly with increasing system size are to convey information about charge transfer between atoms and to approximately reproduce the electrostatic potential surrounding a material. Since charge transfer between atoms cannot occur without effecting r(r) and r(r) determines the electrostatic eld surrounding the material, it makes sense to construct the NACs as functionals of {r(r)}. The fourth criterion is to assign atomic electron distributions to give an efficiently converging polyatomic multipole expansion. Polyatomic multipole expansions including multipolar and charge penetration terms of arbitrarily high order provide a formally exact representation of the electrostatic potential. [36][37][38][39][40][41] In practice, this expansion is normally truncated at some nite order; therefore, we wish to reproduce the electrostatic potential with good accuracy using the leading terms of the polyatomic multipole expansion. The h criterion is the assigned NACs should usually follow Pauling scale electronegativity trends. The Pauling scale electronegativity was parameterized to describe typical electron transfer directions in chemical bonds, where higher electronegativity elements typically take electrons from lower electronegativity elements. 42,43 The sixth criterion is that NACs for a particular element have good transferability among different conformations that are equivalently bonded. We choose this criterion, because one of our goals is to assign NACs with good conformational transferability that are well-suited to construct exible force-elds for classical atomistic simulations of materials. The seventh criterion is that the assigned NACs should be chemically consistent with the assigned ASMs. We will have more to say about this seventh criterion in Section 5.4. The eighth criterion is that the AIM distributions should have predictably rapid and robust convergence to a unique solution. The ninth criterion is that the computation cost of charge partitioning should ideally scale linearly with increasing system size. This criterion is desirable to have the method's computational cost remain competitive as the number of atoms in the unit cell increases.
As a point of clarication, we intend that these criteria should be satised across a broad range of materials encompassing molecules, ions, nanostructures, solid surfaces, porous solids, nonporous solids, and other complex materials. Notably, developing a reliable method for charge partitioning in dense periodic solids is not simply the task of adding periodic boundary conditions to a charge partitioning method initially developed for small molecules. Small molecules are comprised mainly of surface atoms with few buried atoms. In contrast, dense solids are comprised mainly of buried atoms with few surface atoms. Therefore, charge assignment methods that work well for surface atoms but poorly for buried atoms are problematic for bulk solids. Currently, the most commonly used charge partitioning method for dense solids is Bader's QCT. 44 Because two charge partitioning methods that give practically equivalent results for molecules with lots of surface atoms sometimes produce spectacularly different results when applied to dense solids, 19 correlations between NAC methods for molecular test sets should not be extrapolated to dense solids. In summary, charge partitioning in dense solids is an intrinsically more difficult problem than charge partitioning in small molecules.
The remainder of this article is organized as following. Section 2 presents an overview of charge partitioning operators, atoms-in-materials methods, the DDEC approach, and the bifurcation or 'runaway charges' problem. Section 3 presents the theory and equations dening the DDEC6 method. Section 4 summarizes computational details for the quantum chemistry calculations, electrostatic potential expansion, and Ewald summation. Section 5 contains the performance test results: 5.1 Convergence speed, 5.2 Atomic dipole magnitudes, 5.3 Conformational transferability and accuracy for reproducing the electrostatic potential, 5.4 Quantifying the consistency between assigned NACs and ASMs, 5.5 Retaining core electrons on the host atom and assigning exactly one electron distribution per atom, and 5.6 NACs usually follow electronegativity trends. Section 6 contains our conclusions.

Charge partitioning operators
Chemical systems are comprised of atomic nuclei surrounded by an electron cloud. This electron cloud can be computed using quantum chemistry calculations. Throughout this article we use the Born-Oppenheimer approximation, in which the electron cloud is assumed to equilibrate rapidly with respect to the nuclear motions. The NAC for atom A (q A ) equals its nuclear charge (z A ) minus the number of electrons assigned to it (N A ): Herein we use the same notation as previously, except we use (L 1 , L 2 , L 3 ) instead of (k 1 , k 2 , k 3 ) to specify a translated image of atom A: "Following Manz and Sholl, 45 we begin by dening a material as a set of atoms {A} located at positions {R A }, in a reference unit cell, U. For a nonperiodic system (e.g., a molecule), U is any parallelpiped enclosing the entire electron distribution. The reference unit cell has L 1 ¼ L 2 ¼ L 3 ¼ 0, and summation over A means summation over all atoms in this unit cell. For a periodic direction, L i ranges over all integers with the associated lattice vectorṽ i . For a nonperiodic direction, L i ¼ 0 andṽ i is the corresponding edge of U. Using this notation, the vector and distance relative to atom A are given bỹ and r A ¼ |r A |." 19 In this article, we are only interested in studying time-independent states of chemical systems. For such systems, a time-independent electron distribution can be theoretically computed or experimentally measured 46,47 where J el is the system's multi-electronic wavefunction within the Born-Oppenheimer approximation and r^(r) is the electron density operator. Before considering specically how to partition the electron density operator among atoms in a material at each spatial position, we rst consider various schemes that partition only the integrated number of electrons. Electrostatic potential tting (ESP, 10 Chelp, 11 Chelpg, 12 REPEAT 13 ) methods optimize NACs by minimizing the root-mean-squared-error (RMSE) over a chosen set of grid points located outside the material's van der Waals surface. The APT charge quanties the change in dipole moment due to the displacement of a nucleus. 24 APT and related dipole-change-derived NACs are useful for representing infrared (IR) spectra intensities. 48 In periodic materials, Born effective and related charge methods quantify the change in electric polarization due to the displacement of a nucleus and its periodic images. 25 A key limitation of electrostatic potential tting, APT, and Born effective charges is that they are not designed to retain core electrons. For example, the Born effective charge of Ti in the cubic phases of BaTiO 3 , CaTiO 3 , SrTiO 3 , and PbTiO 3 ranges from 6.7 to 7.2, which exceeds the nominal number of 4 valence electrons for a free Ti atom. 49 CM5, which uses HD charges as input, was parameterized to give NACs that approximately reproduce static molecular dipole moments. 20 The CM5 NACs do a much better (but not perfect) job of retaining core electrons on the host atom. 20,23 The Voronoi deformation density (VDD) method assigns NACs according to the integral of the deformation density (i.e., the difference between r(r) and a sum of spherically symmetric neutral reference atoms) over the Voronoi cell enclosing each atom. 50 AIM methods partition the electron distribution at each spatial position subject to the constraints denoting summation over all atoms in the material, and r A (r A ) is the electron distribution assigned to atom A. Because r(r) $ 0, constraint (4) allows to be interpreted as the probability of assigning an electron at positionr to atom A. AIM methods include HD, 18 61 Bader's quantum chemical topology (QCT) partitions the electron cloud into non-overlapping compartments that satisfy the Virial theorem because they have zero-ux surfaces: Vr$dn ¼ 0 where dn is the differential surface normal unit vector. [62][63][64] Because non-atomic Bader compartments exist in materials with non-nuclear attractors, 9 Bader's QCT is not strictly a partition into atomic electron distributions. However, Bader's QCT has historically been categorized with AIM methods, because it pioneered the theoretical development of the AIM concept. [62][63][64][65] For the study of electrides, a non-nuclear attractor describing the electron ion would normally be considered an advantage. 34

Fundamentals of vectorized charge partitioning
We use the term vectorized charge partitioning to denote the class of AIM charge partitioning methods for which the relative probability of assigning electrons at positionr A to atom A can be represented in terms of some spherically symmetric atomic weighting factor, w A (r A ): where We call this vectorized charge partitioning, because for each atom w A (r A ) forms a one-dimensional array of w A values corresponding to a series of r A values. The whole quest to dene the charge partitioning method thus reduces the problem of nding a three-dimensional array of f A (r A ) values for each atom to that of nding a one-dimensional array of w A (r A ) values for each atom. This reduction in parameter space from three to one degrees of freedom per atom makes vectorized charge partitioning computationally efficient, because one-dimensional rather than three-dimensional arrays need to be computed and stored for each atom.
A key use of NACs is to construct point-charge models to regenerate the electrostatic potential in classical molecular dynamics and Monte Carlo simulations. 66 From Gauss's Law of Electrostatics it directly follows that the electrostatic potential exerted outside a spherically symmetric charge distribution is identical to an equivalent point charge placed at the sphere's center. Hence, it is wise to assign approximately spherically symmetric atomic electron distributions so that a point-charge model comprised of the NACs will approximately reproduce the electrostatic potential surrounding the material. This can be accomplished by making for then eqn (7) simplies to The differential path action dS allows us to study convergence properties of vectorized charge partitioning methods: 19 Stationary points of the path action for every atom, which yields eqn (7) as the only solution(s). 19 Due to its path dependence, S is not a functional of {r A (r A )}. The path action S is a kind of mapping that takes a path in {r A (r A )} optimization space as its input and returns a real number as its result. In practice, only the differential path action dS needs to be considered. A second key purpose of NACs is to represent the chemical states of atoms in materials. This requires the assigned {r A (r A )} to have atomic-like properties. The spherically averaged electron distributions of isolated atoms decay approximately exponentially with increasing r A $ 2Å: To maximize the transferability of atomic chemical properties between isolated atoms, molecules, porous solids, solid surfaces, non-porous solids, and nano-structures, the atomic electron distributions in materials should be assigned to approximately follow eqn (15).
Another key consideration is the number of electrons assigned to each atom should resemble the number of electrons contained in the volume of space dominated by that atom. The volume of space dominated by atom A is dened as the spatial region for which r A (r A ) > r B (r B ) for every B s A. If the w A (r A ) for anions are too diffuse in ionic crystals, this might cause too many electrons in the volume of space dominated by the cations to be mistakenly assigned to the anions. As shown in Section 5.5, this can lead to situations where the total number of electrons assigned to the cations is even lower than their number of core electrons. To avoid this mistake, some care should be given to quantify how many electrons are in the volume of space dominated by each atom. The number of electrons assigned to each atom should then be optimized to resemble this value, subject to additional optimization criteria.
To maximize chemical transferability, it is desirable to have each atom in a material resemble a reference ion of the same element having similar (but not necessarily identical) net charge. For example, a Na 1+ ion in a material should resemble a Na reference ion having a charge of approximately +1. Therefore, we use reference ions to construct part of the atomic weighting factors, {w A (r A )}.
The HD method uses neutral atoms as the reference states: 18 The extremely poor performance of the HD method can be explained by the fact that in partially or totally ionic systems r(r) does not approximately equal the sum of neutral atom densities: Thus, eqn (9)- (11) are not consistently satised by the HD method. Consequently, the HD NACs usually give a poor representation of the electrostatic potential surrounding a material. The IH method improves upon the HD method by using self-consistently charged reference states: 21 While the IH method offers a clear improvement over the HD method, the performance of the IH method is still not optimal. Specically, the IH method does not accurately account for the relative contraction or expansion of each ionic state due to its local environment. For example, an atomic anion in an ionic crystal is usually more contracted than the corresponding isolated atomic anion, because the cations in the ionic crystal provide charge balance and electrostatic screening that reduces electrostatic repulsion between excess electrons in the bound atomic anion. While it is possible to use charge-compensated reference ions in the IH method, 53,67 the overall accuracy of constructing W(r) z r(r) is still limited in the IH method by using a single set of reference ions that do not respond to their local environment. This problem is overcome in the DDEC3 and DDEC6 methods by conditioning the reference ion densities to match the specic material. This conditioning describes the contraction or expansion of reference ions in response to their local environment while still only requiring a single reference ion library as input. 19 Eqn (10) will be fullled across the widest variety of systems if {w A (r A )} are themselves derived from partitions of r(r). The ISA method is an early example of a charge partitioning scheme in which {w A (r A )} are derived from a partition of r(r). 55,56 In the ISA method, Although the ISA method clearly fullls eqn (9)-(11), the ISA NACs have poor conformational transferability and are chemically inaccurate for many materials (especially, non-porous materials). 19,45,[68][69][70] In Section 3 below, we construct a new charge partitioning scheme, called DDEC6, that combines electron localization, reference ion weighting, and spherical averaging to create an accurate and robust charge partitioning method.
resemble reference ion states maximizes the chemical transferability of the assigned {r A (r A )} and NACs, while making {w A (r A )} resemble {r avg A (r A )} causes the NACs to approximately reproduce the electrostatic potential surrounding the material. 19,45 Three variants of the DDEC method have been previously published. 19,45 In the DDEC/c1 and DDEC/c2 methods, the atomic weighting factors are dened by with c DDEC/c1 ¼ c DDEC/c2 ¼ 1/10. 45 During the iterative updates, the reference ion charges are updated to match the AIM charges, as done for the IH method. The reference ions, are computed using charge compensation and dielectric screening. 45 In the DDEC/c1 method, charge compensation and dielectric screening were modeled by computing the reference ion densities for atoms placed in a periodic array with a uniform compensating background charge. 45 In the DDEC/c2 method, charge compensation and dielectric screening were modeled by computing the reference ion densities for atoms enclosed by a spherical charge compensation shell. 45 For anions, the shell radius and compensating charge are carefully selected to minimize the system's total energy. 45 For cations of charge +q, the compensating charge is Àq and the shell radius is the average radius of the outermost q occupied Kohn-Sham orbitals of the isolated neutral atom. 45 We recently reported a complete set of these charge-compensated reference ions for all elements atomic number 1 to 109. 71 In the DDEC3 method, these reference ion densities are smoothed to ensure the following. 19 (1) Each smoothed reference ion density, r ref , decreases monotonically with increasing r A . 19 (2) For a particular r A value, ) decreases monotonically with increasing r A . 19 The DDEC6 method introduced in this article uses the same smoothed reference ion library as the DDEC3 method.
In this article, the term charge partitioning functional means a functional F whose stationary points yield the corresponding atomic charge distributions. A point is a stationary point if and only if the full derivative of F is zero: dF ¼ 0. This requires all of the rst-order partial and variational derivatives of F with respect to the independent variables and functions, respectively, to be zero. Nalewajski and Parr showed the HD method minimizes the charge partitioning functional where G(r) is a Lagrange multiplier enforcing constraint (4). 59 Later, Bultinck et al. showed the ISA method minimizes a similar charge partitioning functional where r avg A (r A ) appears instead of r ref A functional or path action is convex if and only if its curvature is non-negative. Smooth convex functionals and path actions have a unique minimum. Both the ISA and non-iterative Hirshfeld charge partitioning functionals are convex and possess a single minimum leading to unique solution. 70 The DDEC/c1,c2 methods minimize the partial derivative of  (20). Therefore, the object to be minimized for the DDEC/c1,c2 methods is not the functional shown in eqn (23), but rather Manz's path action S described in the previous section. This can be a potential source of confusion, because the rst paper on the DDEC/c1,c2 method predated the introduction of the path action S and relied on partial (not proper) minimization of eqn (23). 45 With the introduction of the path action S, this earlier approach that was not completely variational should be abandoned.
The same problem has also occurred in early literature on the IH method that predated Manz's path action S. Specically, setting c ¼ 1 in eqn (23) does emphatically not yield a charge partitioning functional for the IH method, because it fails to properly impose the constraint {q ref (23) or through the addition of Lagrange multipliers to eqn (23) does not yield the IH weighting factors, but rather yields a new AIM method whose performance was not as good as the IH method. 54 The object to be minimized for the IH method is the path action S. Interestingly, a proposed proof 72 that the IH method always converges to a unique minimum (independent of the starting conditions) is invalid, because it was based on partial (not proper) minimization of eqn (23) rather than using the correct object to be minimized. In the following section, we present specic examples of materials for which the converged IH NACs depend on the initial guess, thus proving the IH optimization landscape is sometimes non-convex.
The DDEC3 method was introduced to improve the accuracy of charge partitioning in dense solids containing short bond lengths. 19 For these materials, the DDEC/c1,c2 methods yielded extremely poor results. 19 The DDEC3 method improved the accuracy by introducing reference ion conditioning (see Section 3.1) and a constraint forcing the w A (r A ) tails of buried atoms to decay at least as fast as exp(À1.75 r A ). 19 The DDEC3 method starts with the same reference ion library as the DDEC/c2 method. DDEC3 smooths these reference ions to improve the optimization landscape curvature. 19 Finally, the reference ion weighting, c DDEC3 ¼ 3/14, was set to a theoretically derived value that achieves a balance for all materials. 19 2.4 The bifurcation or 'runaway charges' problem The 'runaway charges' problem was rst noted for the DDEC/c2 method in the paper by Manz and Sholl introducing the DDEC3 method. 19 The DDEC3 method was introduced to correct this problem for dense materials by introducing reference ion conditioning and constraints preventing the tails of buried atoms from becoming too diffuse. 19 The DDEC3 method dramatically improved over DDEC/c2, but stopped short of a provably convex optimization functional. 19 Among the hundreds of materials studied to date, we have only noticed one system for which the 'runaway charges' problem still occurs for the DDEC3 method: the H 2 triplet state for a constrained bond length of 50 pm. The discovery of one system with this problem indicates other systems with this same problem probably exist. The two H atoms in H 2 are symmetrically equivalent in the CISD/aug-cc-pvqz wavefunction and electron distribution. As shown in Fig. 1, during DDEC3 partitioning the NACs diverged from the initial values (HD charges) of +3.3 Â 10 À4 and À3.3 Â 10 À4 to nal converged values of +0.50 and À0.50 aer 37 iterations. This indicates the optimization landscape contains a bifurcation instability that leads to symmetry breaking. Only tiny initial differences in the input density grid les determine which of the two H atoms will head towards a NAC of +0.50 while the remaining one heads towards À0.50. The same type of bifurcation instability also occurs for the IH method in some dense solids. As shown in Fig. 1, during IH partitioning the NACs of symmetry-equivalent atoms bifurcate from the initial values (HD charges) of +4.8 Â 10 À4 and À4.8 Â 10 À4 to +0.97 and À0.97 for the Mn crystal and from +4.3 Â 10 À3 , À5.4 Â 10 À3 , À3.5 Â 10 À3 , and +4.6 Â 10 À3 to À0.36, À0.39, À0.38, and +1.1 for the Rh crystal before the VASP program gave up aer 150 charge cycles. (This IH analysis was performed in VASP 5.3.5 using the PBE 73 functional.) Again, tiny differences in the initial conditions determines which of the symmetry-equivalent atoms will head towards the large positive NACs and which will head towards the negative NACs.
To understand why previous DDEC and IH methods sometimes yield bifurcation, we now compute their optimization landscape curvature. The path action's curvature is given by 19 "If the curvature is positive everywhere, i.e., d 2 S > 0, then the action has only one stationary point, and this stationary point is its global minimum." 19 The solution lies within the search space satisfying constraint (4).
Let S be the path action S restricted to paths lying inside this valid search space. Combining eqn (24) and (25) yields the optimization landscape curvature: Substituting w model ) c (r avg A (r A )) 1Àc as a DDEC-style weighting factor yields the curvature where Bultinck et al. previously proved the ISA curvature (eqn (28)) is non-negative. 70 We now consider several special cases. Case 1: The ISA method, which corresponds to c ¼ 0. This case gives d 2 S ISA $ 0 (i.e., positive semi-denite curvature) indicating a convex optimization landscape with a unique solution. However, the optimization landscape can be approximately at (d 2 S ISA / 0) in regions. Case 2: The non-iterative Hirshfeld method, which corresponds to c ¼ 1 and dq ref for any |dr A (r A )| > 0. This positive denite curvature indicates a convex optimization landscape with a unique solution.
Because the curvature is positive denite, the optimization landscape is not approximately at anywhere. Case 3: The IH method, which corresponds to c ¼ 1 and  (29), the convexness or non-convexness of the IH method for a specic material depends on the particulars of the reference ion set used. Thus, under some conditions the IH method may have a negative optimization curvature. This yields the bifurcation or 'runaway charges' problem shown in Fig. 1 and Table 2. Case 4: The DDEC/c2 method, which corresponds to c ¼ 1/10 and dq ref A as dened in eqn (31). This case has similar characteristics to Case 3 and yields the bifurcation or 'runaway charges' problem for the same reason. Case 5: The DDEC3 method improved over the DDEC/c2 method by increasing the d 2 S ref curvature term in eqn (29), 19 but it stopped short of a provably convex optimization landscape. Thus, under rare circumstances (e.g., H 2 triplet with 50 pm constrained bond length in Fig. 1) it leads to the bifurcation or 'runaway charges' problem.
The DDEC6 method cannot lead to 'runaway charges' in any material, because it involves a prescribed sequence of exactly seven charge partitioning steps. For purposes of illustration, assume the initial symmetry breaking present in the input grid les is some small positive amount 3. For purposes of illustration, let us further assume that during each subsequent charge partitioning step the amount of symmetry breaking is multiplied by some nite factor K. Aer X charge partitioning steps, the amount of symmetry breaking will thus be K X 3. If X is small or if |K| # 1, the amount of symmetry breaking will be a modest multiple of 3. In the DDEC6 method, X ¼ 7, so that even when |K| > 1 the magnitude of symmetry breaking can be contained as long as 3 is small. By making the input density grids more precise, the value of 3 and hence the nal DDEC6 symmetry breaking ($K 7 3) can be made as small as desired. On the other hand, if X is large and |K| > 1, the symmetry breaking will be profoundly severe (i.e., 'runaway charges'). For the DDEC/c2, DDEC3, and IH methods, X may be arbitrarily large leading to 'runaway charges' when |K| > 1. Since X is arbitrarily large for these methods, the value of K X 3 cannot be contained for |K| > 1 even if 3 is made a smaller positive number. Thus, improving the input density grid precision does not necessarily alleviate the 'runaway charges' problem for the DDEC/c2, DDEC3, and IH methods.
We also tested a second strategy that solves the bifurcation or 'runaway charges' problem by constructing the selfconsistent convex charge partitioning functional: k A is a Lagrange multiplier that enforces the constraint {r xed_ref A (r A )} is a set of xed reference densities that is not updated during the self-consistent cycles. gives the solution The curvature is Inserting eqn (28) into (36) yields which is positive denite if c > 0. Thus, the functional is convex with a unique minimum. Through computational tests, we found nearly optimal results are obtained by setting c convex ¼ 1/2. We computed {r xed_ref A (r A )} by applying the upper and lower bound exponential decay constraints while minimizing a distance measure between r xed_ref A (r A ) and r cond A (r A )hr(r)/ r cond (r)i r A . (See Section S3 of the ESI † for computational details of the Convex functional.) As shown by results in Section 3.1, the overall performance of this method is slightly inferior to the DDEC6 method.
We performed computational tests proving the bifurcation or 'runaway charges' problem is associated with non-unique minima on the optimization landscape. Table 2 summarizes computational results for the H 2 triplet system. For consistency, all four methods compared used the same density grids, same reference ion library, 71 and same integration method. (This integration method is explained in Section S2 of the ESI. †) The pair of numbers (q 1 ,q 2 ) denote the charges associated with the rst and second H atoms, respectively. The initial guess for the IH and DDEC3 methods is the starting reference ion charge. For the DDEC3 method, the conditioned reference densities and the w A (r A ) were initialized to equal the starting reference ion densities. For the Convex functional (eqn (32)), r avg A (r A ) was initialized to equal a reference ion density having the charge listed as the initial guess. The DDEC3 and IH methods exhibited pronounced bifurcation, with the converged NACs highly dependent on the initial guess. In contrast, the Convex functional exhibited the unique solution (À0.0093,0.0093) independent of the initial guess. Because the DDEC6 NACs follow a xed protocol of seven charge partitioning steps, they do not require any form of initial guess and converged to the unique solution (À0.0209,0.0209). These results prove the DDEC3 and IH methods do not have a convex optimization functional for some materials.

The DDEC6 method
DDEC6 partitioning is always performed using an all-electron density. For quantum chemistry calculations performed using effective core potentials (also known as pseudopotentials), the core electrons replaced by the effective core potentials are added back in prior to DDEC6 partitioning.
Also, throughout all stages of DDEC6 partitioning, zero tolerances are used to avoid division by zero errors. Specically, if-then loops were included to smoothly handle attempted divisions by numbers with magnitudes smaller than a chosen zero tolerance (e.g., 10 À10 ). Here, we present the DDEC6 steps and equations in a way that is independent of the integration grids employed. Specic integration procedures are described in Section S2 of the ESI. †

Conditioning steps and the equivalent reference ion weighting c equiv
The previous section demonstrated that using a xed number of charge partitioning steps (e.g., X ¼ 7) can alleviate the 'runaway charges' problem. In this section, we show how to construct schemes having a xed number of charge partitioning steps that achieve the DDEC goal of simultaneously optimizing where hi r A denotes spherical averaging. We could perform another conditioning step by reinserting r generic (r A ) into the right-hand sides of eqn (38) and (39). This process could be repeated until some desired number, c, of conditioning steps have been performed. Starting with the smoothed reference ions, r ref A second conditioning step produces Aer c conditioning steps, where hrðrÞ=r generic ðrÞi rA is a geometric average of all of the hr(r)/ r generic (r)i r A style terms. All of the previous DDEC schemes had Inserting eqn (45) into (7), taking the spherical average of both sides, and rearranging yields In general, {r some_ref A (r A )} may be produced by conditioning the smoothed reference ions r ref where the overbar denotes the (weighted) geometric average of hr(r)/r generic (r)i r A style terms (such as those appearing in eqn (40), (42) and (46)). Comparing eqn (46) and (47), the equivalent amount of reference ion weighting on a conditioning adjusted basis is therefore We now discuss specic examples. Case 1: DDEC/c1 and DDEC/c2 methods used no reference ion conditioning (i.e., c ¼ 0) with c ¼ 1/10 to give c DDEC2 equiv ¼ 1/10. 45 Case 2: The HD and IH methods use no reference ion conditioning (i.e., The ISA method has two completely equivalent representations: either an innite number of conditioning steps (i.e., c ¼ N) } are obtained in either representation. In practice, the two representations are not distinguishable and have the same computational algorithm.) When developing the DDEC3 method, Manz and Sholl showed that one conditioning step (i.e., c ¼ 1) combined with c ¼ 3/14 yields an appropriate balance between reference ion weighting and spherical averaging independent of the material. 19 A scheme containing a xed number of charge partitioning steps can be constructed to yield a similar c equiv value.
are conditioned four times to yield {w DDEC6 A (r A )} (i.e., c ¼ 4 and c ¼ 1) and hence ve times to yield the nal DDEC6 {r avg A (r A )}, this corresponds to c DDEC6 equiv ¼ 1/5. c DDEC3 equiv ¼ 3/17 lies between this value of 1/5 and the value of 1/6 that would be obtained from a total of six conditioning steps. We chose ve conditioning steps, because this is more conservative. Increasing the number of conditioning steps leads to a slight decrease in the atomic multipoles at the expense of losing some of the chemical transferability. Yet even with ve conditioning steps, we achieved DDEC6 atomic multipoles approximately 2-5% lower in magnitude on average (see Section 5.2) than the DDEC3 atomic multipoles. As explained in subsequent sections, this is due to using an accurate xed reference ion charge and a weighted spherical average during some of the conditioning steps.
We now return to the bifurcation or 'runaway charges' problem demonstrated in the previous section. The rst possible solution is to use a xed number of charge partitioning steps. In other words, to use c ¼ 4 and c ¼ 1 for a total of ve conditioning steps: c DDEC6 equiv ¼ 1/5. The second possible solution is to use 0 < c < 1 where {r A (r A )} are recovered by minimizing a provably convex optimization functional or path action. For example, F convex shown in eqn (32). Two conditioning steps (i.e., c ¼ 2) are involved in computing {r xed_ref A (r A )}. Using c convex ¼ 1/2 yields c convex equiv ¼ 1/4. We programmed both of these strategies and tested them for all systems described in this article. (Our computational method for computing the Convex functional NACs is described in Section S3 of the ESI. †) Both strategies used identical {q ref A } values. Both strategies alleviated the bifurcation or 'runaway charges' problem. While both approaches yielded reasonable results, the rst approach proved superior to the second. The rst approach allowed using a weighted spherical average in place of a simple spherical average during some of the conditioning steps. The second approach required a simple spherical average during the selfconsistency iterations to prove convexness of the functional (see eqn (36)). The rst approach was more computationally efficient and converged in 7 charge cycles compared to 11-14 charge cycles for the second approach. As shown in Tables 3 and  4, the rst approach yielded slightly lower atomic multipole moments on average, reproduced the electrostatic potential slightly more accurately on average, and described electron transfer trends in dense solids slightly more accurately on average. Therefore, in the end we decided to go with the xed number of charge partitioning steps (approach 1: DDEC6) as opposed to minimizing F convex (approach 2). Table 3 summarizes computational tests for six different DDEC algorithms. The column labeled 'Simple spherical average' uses an algorithm identical to the DDEC6 method, except r avg A (r A ) is used in place of r wavg A (r A ). The performance of this algorithm is substantially worse than DDEC6 but still an improvement over DDEC3. The column labeled 'Convex functional' uses the charge partitioning functional of eqn (32). The Convex functional also did not perform as well on average as DDEC6. The columns labeled '4 charge partitioning steps' and '10 charge partitioning steps' use an algorithm identical to the DDEC6 method but stop aer 4 and 10 charge partitioning steps, respectively, instead of the 7 charge partitioning steps used in the DDEC6 method. The overall performance of the '4 charge partitioning steps' and '10 charge partitioning steps' algorithms were similar to that of the DDEC6 method. Specically, results for about half of the materials are improved upon going from 7 to 4 charge partitioning steps, while results for the other half of the materials are improved upon going from 7 to 10 charge partitioning steps. We chose 7 charge partitioning steps for the DDEC6 method, because it offers a suitable compromise.
The overall performance of each method was scored based on twelve computational tests: 1. Squared correlation coefficient (R 2 ) between computed NACs and the X-ray photoelectron spectroscopy (XPS) coreelectron binding energy shis for the Ti-containing solids Ti, TiB 2 , TiO, TiN, BaTiO 3 74)/aug-cc-pvtz optimized geometry and electron distribution). An error closer to zero is better.
12. The time in minutes required to perform charge analysis on the NaCl crystal supercell containing 54 atoms running on a single processor core in Intel Xeon E5-2680v3 on the Comet supercomputing cluster at SDSC. This is the total wall time from program start to program nish, including the input le reading, core electron partitioning, valence electron partitioning, computation of multipole moments, output le printing, etc.
The target values for tests 4, 5, and 9 (dense solids) are charges that approximate the number of electrons in the volume dominated by each atom and were chosen to approximate Bader charges. 33 The target values for tests 6, 7, and 8 (molecules) are charges that will more closely reproduce the electrostatic potential and were chosen to approximate ESP charges.
NACs are oen used to construct force-elds for classical molecular dynamics and Monte Carlo simulations. For these applications, the NACs should approximately reproduce the molecular electrostatic potential (MEP). Conformational transferability is important for the construction of exible force-elds. Table 4 compares the accuracy of the DDEC6 and Convex functional methods for reproducing the electrostatic potential across various conformations of carboxylic acids, Li 2 O  molecule, and Zn-nicotinate metal-organic framework. Results listed under the 'carboxylic acids' column in Table 4 are the averages for the previously published conformations of ve carboxylic acids. 19,33 The Li 2 O conformations span angles from 90 to 180 in 10 increments. 33 The Zn-nicotinate conformations are described in Section 5.3. When NACs were optimized specically for each molecular conformation, V(r) was described most accurately using the Convex functional for the carboxylic acids and most accurately by the DDEC6 method for the Li 2 O molecule and Zn-nicotinate MOF. When the conformation averaged and low-energy conformation NACs were used to construct point-charge models for every system conformation, the DDEC6 method reproduced V(r) most accurately for all three types of materials. This shows the DDEC6 NACs are more accurate than the Convex functional NACs for reproducing V(r) across multiple system conformations. In summary, the DDEC6 NACs have better overall performance than the Convex functional NACs. Finally, we performed additional computational tests proving the equivalence relations described in this section. Specically, we also analyzed nearly all systems in this article using c ¼ 1 and c ¼ 1/3 (yielding c equiv ¼ 1/4) and obtained similar (but not strictly identical) results to the c ¼ 2 and c ¼ 1/2 case discussed above. The computational cost was increased to $24 charge cycles for convergence.

Determining the DDEC6 reference ion charge value (charge cycles 1 and 2)
The reference ion charge is the most signicant difference between the DDEC6 and DDEC3 methods. In the DDEC3 method, the reference ion charge is the same as the AIM NAC: some theoretical appeal, it also comes with an important disadvantage. In dense materials, the diffuse nature of anions can cause the number of electrons assigned to an anion to be much greater than the number of electrons in the volume dominated by the anion. This causes several related problems. First, NACs assigned with q ref A ¼ q A may fail to assign core electrons to the proper atom in some materials (see Section 5.5). Second, they do not properly describe electron transfer trends in some materials. 33 Third, the correlation between NACs and core electron binding energy shis will be weakened due to the overly delocalized assignment of electrons to the anions (see Ti-containing compounds in Table 3). Fourth, the accuracy of reproducing the electrostatic potential may be slightly degraded in some materials for which the anion charges are overestimated in magnitude. 33 There are two competing philosophies of electrons belonging to an atom: localized and stockholder atomic charges. For localized atomic charges, electrons in the volume dominated by an atom are assigned almost entirely to that atom. Non-overlapping compartments, such as those encountered in Bader's QCT and Voronoi cell partitioning, are the most extreme limit of localized charge partitioning. 50,57,58,62,63 Localized NACs (e.g., Bader or Voronoi cell NACs) convey useful information about charge transfer trends and core-electron binding energy shis, but they are not well-suited to reproducing the electrostatic potential surrounding a material. 19,33,45,75,76 On the other hand, stockholder-type charge partitioning schemes assign overlapping atomic electron distributions. 18 DDEC3 NACs, which do not incorporate localized atomic charge information, are usually more accurate than Bader NACs at reproducing V(r) but suffer the problems mentioned in the previous paragraph as shown in Section 5.5 and ref. 33.
To achieve the best of both worlds, the DDEC6 method uses a xed reference ion charge consisting of a weighted average of localized and stockholder charges. Specically, the DDEC6 reference ion charge value, q ref A , is set using the following scheme: is computed using the neutral atom reference densities (eqn (55)). N 2,Loc A is computed using charged reference ions (eqn (57)). The fourth power appearing in eqn (55) and (57) makes w s,Loc A (r A ) {s ¼ 1,2} change continuously while also becoming negligible in the volume of space for which r ref . When using an exponent of 4, the ratio r ref ¼ 2 corresponds to assigning 16 times higher density to atom A compared to atom B at this grid point. This provides an appropriate balance between transition sharpness and smoothness. Using an exponent less (more) than 4 would cause the density transition between atoms to be more gradual (sharper). An arbitrarily high exponent would correspond to the limit of non-overlapping atoms. Eqn (50) updates the reference ion charges by adding a fraction ' of this localized charge to a fraction (1 À ') of the stockholder charge based on minimizing the information distance to the (prior) reference ion densities. The nal reference ion charge (eqn (49)) determined in this manner is a compromise between counting electrons in the volume dominated by each atom and counting electrons in proportion to the (prior) reference ion densities.
The superscript numerals 1 and 2 refer to the charge cycle in which that quantity is computed. The atomic weighing factors for the rst charge cycle are computed using neutral reference atoms (eqn (54) and (55)). The atomic weighting factors for the second charge cycle are computed using eqn (56) and (57) based on the results of the rst charge cycle. Each of the rst two charge cycles rst computes W i,Stock (r) and W i,Loc (r) by a loop over atoms and grid points to compute the sum in eqn (53). Each of the rst two charge cycles then performs another loop over atoms and grid points to compute N i,Stock A and N i,Loc A via eqn (52). The stockholder, localized, and updated reference charges are then computed via eqn (50) and (51). The rst charge cycle yields the HD NACs: q HD (Even though the CM5 NACs are not used in DDEC6 charge partitioning, they were computed at this stage using the HD NACs and the CM5 denition of Marenich et al. 20 ) The second charge cycle yields the DDEC6 reference ion charges (eqn (49)).
The fraction ' of localized atomic charge (q s,Loc A ) used to compute these xed reference ion charges was decided through a scientic engineering design approach in which we tested dozens of alternatives. (See Section S1 of the ESI † for a summary of additional alternatives explored.) Table 5 compares results for ' ¼ 0 to 1 in 1/6 increments. The accuracy was quantied using ve tests: 1. Squared correlation coefficient (R 2 ) between computed NACs and the X-ray photoelectron spectroscopy (XPS) coreelectron binding energy shis for the Ti-containing solids Ti, TiB 2 , TiO, TiN, BaTiO 3 , TiCl 4 , PbTiO 3 , CaTiO 3 , SrTiO 3 , and TiO 2 . 33 R 2 closer to one indicates a stronger correlation.
2. The NAC of the Co atom in solid CoO 2 minus that in solid LiCoO 2 . Simple chemical arguments and electron density difference maps indicate this results should be signicantly positive. 23 3. The NAC of Mg in crystalline MgO. 33 A NAC closer to $1.7 was considered better.
4. The root-mean-squared error (RMSE) in kcal mol À1 for reproducing the electrostatic potential surrounding the H 2 O molecule. A lower RMSE is better.
5. The root-mean-squared error (RMSE) in kcal mol À1 for reproducing the electrostatic potential surrounding the H 2 PO 4 À molecular ion. A lower RMSE is better.
As shown in Table 5, some of these tests beneted from small values of ' while others beneted from large values of '. The minimum RMSE for H 2 O occurs near ' ¼ 2/3. Two of the remaining four tests exhibited optimal performance for ' > 2/3, while the remaining two tests exhibited optimal performance for ' < 2/3. Therefore, we have chosen as a suitable compromise to dene the DDEC6 method.

Computing the conditioned reference ion density (charge cycle 3)
The next step (i.e., third charge cycle) is to compute the conditioned reference ion densities, r cond A (r A ). This conditioning matches the reference ion densities to the specic material of interest. First, a loop of over grid points and atoms is performed to compute r ref (r) via eqn (41). Then another loop over grid points and atoms is performed to compute Y avg A (r A ) via eqn (40). In the DDEC6 method, each conditioned reference density, r cond and to integrate to the number of electrons in the reference ion: These constraints were introduced for theoretical appeal to ensure expected behavior. In tests we performed, these constraints had only a small effect on the NACs. They are not present in the DDEC3 method. {r cond A (r A )} is found by minimizing the functional where G I A (r A ) and F I A are Lagrange multipliers enforcing constraints (59) and (60), respectively. The minimum of h I is found by setting Inserting eqn (61) into (62) and using integration by parts to simplify gives the solution Because it directly follows that h I (r cond A (r A )) is a convex functional. Therefore, {r cond A (r A )} is uniquely determined for a given {Y avg A (r A )} input.  Fig. S1 of the ESI. † Because our algorithm cuts the size of the search domain by better than half in each reshaping cycle, it is mathematically guaranteed to always converge to the correct solution in a few reshaping cycles.
Aer computing {r cond A (r A )}, a loop over grid points and atoms is performed to compute Then another loop over grid points and atoms is performed to compute

Updating {w A (r A )} (charge partitioning steps 4 to 7)
First, we provide a brief overview of charge partitioning steps 4 to 7. The fourth through seventh charge partitioning steps compute the weighted spherical average and enforce the exponential tail constraints. Because it is advisable to enforce the N val A $ 0 constraint only aer the exponential tail constraints have already been enforced, the N val A $ 0 constraint is enforced beginning with the h charge partitioning step. k A is the Lagrange multiplier that enforces N val A $ 0. We now explain the distinction between the terms 'charge partitioning step' and 'charge cycle'. Here, the term 'charge cycle' refers to a sequence of loops that involves computing W(r) from {w A (r A )}, followed by a stockholder partition (eqn (68)) to compute {N A }, and nally the update of {w A (r A )}. Charge cycles that update {k A } but not any other contributions to {w A (r A )} are part of the same 'charge partitioning step'. Thus, in general, each charge cycle follows one of two paths: (a) if the value of update_kappa is FALSE, the charge partitioning cycle follows path 6a in which the exponential tail constraints are applied, and (b) if the value of update_kappa is TRUE, the charge partitioning cycle follows path 6b in which {k A } is updated. The rst charge cycle in a charge partitioning step follows path 6a and the subsequent charge cycles (if any) in a charge partitioning step follow path 6b. Fig. S2 in the ESI † is a ow diagram summarizing the DDEC6 method. Now, we provide the detailed sequence for charge partitioning steps 4 to 7. The fourth charge cycle uses: Initialize the following variables: update_kappa ¼ FALSE, com-pleted_steps ¼ 3, charge_cycle ¼ 4, and {k A } ¼ {0.0}. The fourth and later charge cycles use the following sequence of steps: 1. In the rst loop over grid points and atoms, the sum in eqn (8) is computed at each grid point.
2. In the second loop over grid points and atoms, the following quantities are computed: All of these quantities except {r A (r A )} are stored. 3. A weighted spherical average density, r wavg A (r A ), is then computed: Eqn (73) has the form of a weighted spherical average density where r A (r A ) is weighted at each grid point proportional to Examining eqn (75), the relative weight assigned to each point is bounded by The lower bound of $0.1 occurs, because it is not possible to have hw A (r A )/W(r)i r A ¼ 0 if w A (r A )/W(r) z 1 for any grid point on the same radial shell. For a specic r A , positions with larger w A (r A )/W(r) receive smaller u A (r A ), and positions with smaller w A (r A )/W(r) receive larger u A (r A ). Thus, r wavg A (r A ) weights portions of r A (r A ) that overlap other atoms more heavily than those portions that do not overlap other atoms. At this stage, we also compute: 4. On the fourth charge cycle, update_kappa is not altered. On the h and subsequent charge cycles, update_kappa is set to TRUE if N val A < À10 À5 electrons (e) for any atom. If update_ kappa is TRUE and the N A changes between successive charge cycles were less than 10 À5 electrons for each atom two consecutive times in a row then update_kappa is reset to FALSE and {k A } is reset to {0.0}. Otherwise, the current value of update_ kappa is not altered. This sequence of steps has the effect of: (a) setting update_kappa to TRUE when the number of valence electrons is negative for any atom on the h and later charge cycles, (b) setting update_kappa to FALSE when {k A } are converged for a particular charge partitioning step, and (c) starting each charge partitioning step with the initial guess {k A } ¼ {0.0}.
5. If the value of update_kappa is FALSE (i.e., {k A } are converged for that charge partitioning step), then completed_ steps is incremented by +1. If completed_steps ¼ 7, the iterative charge cycles are nished and exit at this point.
6a. If the value of update_kappa is FALSE, then {w A (r A )} is updated to impose the constraints which are illustrated in Fig. 2.
(i) Constraint preventing buried tails from becoming too diffuse: analogous to the DDEC3 method, 19 G A (r A ) is computed to make sure the tails of buried atoms do not become too diffuse. In the DDEC6 method, we set instead of the DDEC3 expression for s A (r A ). The constraints are imposed by minimizing the following optimization functional where G II A (r A ) and F II A are Lagrange multipliers enforcing constraints (80) and (82), respectively. 19 h II (G A (r A )) is a convex functional with the unique minimum: 19 Section S4.3 of the ESI † describes the iterative algorithm we used to compute {G A (r A )}. This description includes an algorithm ow diagram in Fig. S3 of the ESI. † Because our algorithm cuts the size of the search domain by better than half in each reshaping cycle, it is mathematically guaranteed to always converge to the correct solution in a few reshaping cycles.
(ii) Constraint preventing buried tails from becoming too contracted: H A (r A ) is then computed to make sure the tails of buried atoms do not become too contracted. Specically, the constraints are imposed by minimizing the following optimization functional where G III A (r A ) and F III A are Lagrange multipliers enforcing constraints (85) and (87), respectively. h III (H A (r A )) is a convex functional: The unique minimum is In practice, H A (r A ) can be easily found by starting with the initial estimate H est A (r A ) ¼ G A (r A ) and enforcing constraint (85) by recursively setting beginning with the second radial shell and continuing outward until the last radial shell. Finally, H A (r A ) is normalized to satisfy constraint (87): Comments on these exponential tail constraints: Constraint (80) preventing the tails of buried atoms from becoming too diffuse is the same for the DDEC3 and DDEC6 methods. The DDEC6 method adds constraint (85) to prevent the tails of buried atoms from becoming too contracted. The limiting exponent of 2.5 bohr À1 corresponds to w A (r A ) in the buried tail being cut in half for an r A increase of 0.277 bohr (0.147Å). There is no reason for w A (r A ) to decrease more rapidly than this in the buried tail. The integrals of the optimization functionals h I,II,III in eqn (61), (83) and (88) have similar forms, except that a square root appears in the denominator of the rst integral in h I,II but not in h III . This exponent in the denominator of the rst integral of the optimization functional is called the reshaping exponent. 19 A reshaping exponent x ¼ 1/2 is used in the optimization functional h II (eqn (83)) that enforces constraint (80) preventing tails of buried atoms from becoming too diffuse. 19 A reshaping exponent x ¼ 1/2 is also used in the optimization functional h I (eqn (61)) that reshapes the conditioned reference ion densities. The value x ¼ 1/2 is appropriate for these cases, because it shis electron density from the tail region into the intermediate region where atoms interface. 19 A reshaping exponent x ¼ 1 is used in the optimization functional h III (eqn (88)) that enforces constraint (85) preventing tails of buried atoms from becoming too contracted. Eqn (91) implementing constraint (85) adds electron density to the tail region thereby requiring density to be removed during renormalization. Removing electron density during renormalization with x < 1 is ill-behaved, because this will completely deplete the electron density in the buried tail region due to w A (r A ) x /w A (r A ) [ 1 when w A (r A ) ( 1. Using x ¼ 1 avoids this problem.
6b. If the value of update_kappa is TRUE, then {k A } is updated by setting where k old A is the value of k A before the update is applied. The {w A (r A )} is then updated via eqn (96). For an individual charge partitioning step, this process corresponds to minimizing the functional where G(r) and {k A } are Lagrange multipliers enforcing constraints (4) and (33), respectively. Within an individual charge partitioning step, the {H A (r A )} remain constant during the update of {k A }. Consequently, the curvature is given by for any |dr A (r A )| > 0. This positive denite curvature indicates a convex optimization landscape with a unique solution. For an atom without any overlaps (i.e., isolated atomic ion limit), u A ¼ 0 and the converged r A (r A ) is independent of k A . Therefore, when u A is negligible (e.g., u A < 10 À7 ) we set k A ¼ 0.0 to avoid division by zero in eqn (93). 7. The updated atomic weighting factor is The charge_cycle is incremented by +1, and the calculation returns to # 1 above to start the next charge cycle. Table 6 summarizes the ve differences between the DDEC3 and DDEC6 methods. First, the DDEC6 method uses xed reference ion charge values rather than self-consistently updating them as in the DDEC3 method. Second, the DDEC6 method uses c ¼ 4 and c ¼ 1 to yield a xed number of charge partitioning steps with c DDEC6 equiv ¼ 1/5. In contrast, the DDEC3 method uses a self-consistent iterative scheme with c ¼ 1 and c DDEC3 ¼ 3/14 to yield c DDEC3 equiv ¼ 3/17. Third, the DDEC6 method uses a weighted spherical average in place of the simple spherical average used in the DDEC3 method. Fourth, the DDEC6 method incorporates constraints to make the conditioned reference ion densities monotonically decreasing and to integrate to N ref A . Fih, the DDEC6 method adds a constraint to ensure the buried tails of w A (r A ) do not decay too quickly. Both the DDEC3 and DDEC6 methods include the same constraint to ensure the buried tails of w A (r A ) do not decay too slowly. These constraints make the buried tail of w A (r A ) decay exponentially with increasing r A .

Quantum chemistry calculations
We performed periodic quantum chemistry calculations using VASP 77,78 soware. Our VASP calculations used the projector augmented wave (PAW) method 79,80 to perform all-electron frozen-core calculations including scalar relativistic effects with a plane-wave basis set cutoff energy of 400 eV. Calculations specifying "2 frozen Na core electrons" or "10 frozen Na core electrons" used PAWs for the Na atom including 2 or 10 frozen core electrons, respectively. For all systems, the number of kpoints times the unit cell volume exceeded 4000Å 3 . This is enough k-points to converge relevant properties including geometries and AIM properties (NACs, ASMs, etc.). Except for the solid surfaces, geometry optimizations relaxed both the unit cell vectors and ionic positions. The solid surface calculations used the DFT-optimized bulk lattice vectors and relaxed the ionic positions. Where noted, experimental crystal structures or other geometries from the published literature were used. A Prec ¼ Accurate ($0.14 bohr) electron density grid spacing was used. Bader NACs were computed using the program of Henkelman and coworkers. 44 We performed non-periodic quantum chemistry calculations using GAUSSIAN 09 (ref. 81) soware. ESP NACs were computed in GAUSSIAN 09 using the Merz-Singh-Kollman scheme. 10,82 Some materials mentioned in the developmental tests of this article are studied in more detail in the sequel article.

Electrostatic potential expansion: atomic multipole moments and charge penetration terms
In this section, we review the basic principles of expanding the electrostatic potential, V(r), into atomic contributions. The electrostatic potential is important for understanding interactions in all chemical systems. 83-85 AIM methods provide a formally exact expansion of V(r) by partitioning the total electron distribution r(r) into atomic electron distributions, {r A (r A )}: where B A (r A ) and C A (r A ) are terms due to atomic multipoles (AMs) and penetration of the atom's electron cloud, respectively. 39,45,84,86,87 Outside r A (r A ), the charge penetration term C A (r A ) vanishes reducing V A (r A ) to a multipole expansion. 88 Although eqn (98) is formally exact for all AIM methods, in practice B A (r A ) and C A (r A ) are truncated at some nite orders leading to approximate V(r) expansions. 86,87,89 Therefore, AIM methods providing more rapidly converging V(r) expansions are more convenient for constructing force-elds. For constructing exible force-elds, the AIM NACs should preferably have good conformational transferability. Alternative expansions of V(r) and system multipole moments based on distributed multipole analysis and Gaussian density functions are given in the related literature. [36][37][38]40,41 For each system, we computed atomic multipoles up to quadrupole order using well-known formulas. Atomic dipoles, are the leading component of B A (r A ). Atomic quadrupoles have the form where T is a second degree polynomial. The ve linearly independent quadrupole components can be expressed as (i) is the nuclear position. The traceless atomic quadrupole tensor, Q A , is dened byT A ¼r ArA À ðr A Þ 2d =3, whered is the identity tensor. The three eigenvalues ofQ A are independent of molecular orientation and coordinate system, and they sum to zero. For a spherical atom, all three quadrupole eigenvalues are zero. However, three zero quadrupole eigenvalues does not mean the Non-iteratively set based on a special partitioning: How spherical averaging is incorporated Type of spherical average used  Simple  Weighted  4 Constraints applied to each conditioned reference ion density -Monotonically decreasing with increasing r A and integrates to N ref Tail constraints applied to w A (r A ) during 4 th and later charge cycles Tails of buried atoms decay no slower than exp(À1.75 r A ) Tails of buried atoms decay no slower than exp(À1.75 r A ) and no faster than exp(À2.5 r A ) atom is necessarily spherical, because such an atom could have non-zero dipole or higher order multipole (e.g., octapole) moments that indicate deviation from spherical symmetry. An oblate spheroidal density has one negative and two equal positive quadrupole eigenvalues. A prolate spheroidal density has one positive and two equal negative quadrupole eigenvalues. An ellipsoidal density can have three unequal quadrupole eigenvalues.
For nite clusters, molecules, and ions, the total dipole moment, m ¼ |m|, is given bỹ Therefore, a charge model including both NACs and atomic dipoles reproducesm exactly to within a grid integration tolerance. A point-charge only model includes the rst term in eqn (102) but neglects the atomic dipole terms. Unless a pointcharge only model is explicitly dened to reproducem (or higher order multipole) exactly, it will generally reproducem (or higher order multipole) only approximately except in cases wherem (or higher order multipole) is zero by symmetry. 90 (The general idea to constrain atom-centered point charges to exactly reproduce the molecular dipole moment is impossible for planar molecules placed in an electric eld perpendicular to the molecule's plane. For this reason, we abandon the idea to constrain atom-centered point charges to exactly reproduce the system's dipole moment.) The total p th order multipole of a nite cluster, molecule, or ion can be expressed as a sum over NACs and atomic multipoles up to p th order. 86,91 Spherical cloud penetration, C avg A (r A ), is the leading term in C A (r A ): Eqn (104) is the well-known result arising from basic electrostatic principles. The tail of r avg A (r A ) decays approximately exponentially with increasing r A : Inserting (105) into (104) yields, To determine the parameters ‫א‬ and ‫ב‬ for each atom, the CHAR-GEMOL program performed a linear least squares t of ‫ב‬ À ‫א‬ r A to ln(r avg A (r A )) over the range r min_t ¼ 2Å to r cutoff ¼ 5Å. 19 The Rsquared correlation coefficient for this linear regression was usually >0.99 indicating a nearly exact t. Previous studies have used different variations of exponential or Gaussian decaying densities (sometimes multiplied by polynomials in r A ) or exponential damping of the 1/r or multipole potential to provide approximations of charge penetration energies. 41,[92][93][94][95][96] A charge model's accuracy for reproducing V(r) can be quantied by the root mean-squared error (RMSE) in the electrostatic potential quantied over a chosen a set of grid points. [11][12][13]15,19,97,98 The specic method we used to compute RMSE is detailed in previous work and includes a constant potential adjustment to equalize the average electrostatic potentials (over the chosen set of grid points) of the charge model and full electron distribution. 13,19 The charge model may include point charges, dipoles and/or higher order multipoles, spherical cloud penetration, and/or aspherical cloud penetration terms. The relative root mean squared error (RRMSE) is the RMSE for the charge model divided by the RMSE of a null charge model having {V A (r A )} ¼ 0. 10,13,15,16,82 The RMSE and RRMSE were computed over a uniform grid of points lying between surfaces dened by g inner and g outer times the van der Waals (vdW) radii. We set (g inner ,g outer ) ¼ (1.4,2.0) for nonperiodic materials and (g inner ,g outer ) ¼ (1.3,20.0) for periodic materials. 15,16,82 For three-dimensional materials containing only nanopores (e.g., MOFs, zeolites), g outer ¼ 20.0 is sufficiently large that the RMSE grid points span the entire pores. We used the same UFF vdW radii as Campaña et al. 13 (REPEAT program) which are listed in the ESI † of Watanabe et al. 15

Ewald summation
In the periodic materials, the Ewald summation method of Smith, including NACs and (optionally) atomic dipoles, was used to compute electrostatic potentials for RMSE calculations. 99 This Ewald summation separates the Coulomb potential into a short-range portion summed in real space and a longrange portion summed in reciprocal space: We set the Ewald summation convergence parameter to a E ¼ p/ (10Å). Enough real space replications of the unit cell were included such that every point in the reference unit cell was surrounded by at least 3/a E real space distance in all directions: This corresponds to an erfc(a E r A )/r A cutoff of (a E /3)erfc(3) ¼ 6.9 Â 10 À7 bohr À1 . The reciprocal lattice vectors are dened bỹ The reciprocal space summation encompassed integer multiples b i of the corresponding reciprocal lattice vectors to yield the long-range portion of the electrostatic potential wherer A in eqn (113) is computed using (L 1 , L 2 , is excluded from the sum in eqn (113). V unit_cell is the unit cell volume. Our reciprocal space cutoff includes at least all reciprocal space vectors having 0\ k # 4 ffiffiffi ffi p p a E . Noting that each term in the reciprocal space term includes exp(Àk 2 /(4a E 2 )) as a multiplier, our reciprocal space cutoff corresponds to exp(Àk 2 /(4a E 2 )) z exp(À4p) z 3.5 Â 10 À6 . Because it is a short-range effect, spherical charge penetration can be included entirely in the real space summation using the analytic potential of eqn (106). While the spherical cloud penetration effect is small over grid points used to compute RMSE, it becomes increasingly important for smaller r A values.

Convergence speed
DDEC charge and spin partitioning use a cutoff radius (e.g., 5Å) to achieve a computational cost that scales linearly with increasing number of atoms in the unit cell aer the initial electron and spin density grids have been generated. 19,100 When combined with a linearly scaling quantum chemistry program (e.g., ONETEP), this provides computationally efficient charge analysis even for systems containing thousands of atoms in the unit cell. 101,102 Fig. 3 plots the wall time for computing DDEC3 and DDEC6 NACs, atomic multipoles, and electron cloud decay exponents for the NaCl crystal (ambient pressure) as a function of the number of atoms in the unit cell. The unit cells containing 16 and 54 atoms were constructed by forming 2 Â 2 Â 2 and 3 Â 3 Â 3 supercells, respectively, that were used as input for computing the density grid les in VASP. This calculation utilized a volume of 2 Â 10 À3 bohr 3 per grid point. The wall time in Fig. 3 begins when the CHARGEMOL program is rst entered prior to reading the VASP density grid les and continues until the moment aer the computed NACs, atomic multipoles, and electron cloud decay exponents have been written to the net_atomic_charges.xyz le. As expected, Fig. 3 shows the required wall time depends linearly on the number of atoms in the unit cell. Even though the computation was run on a single processor core, only six minutes were required for DDEC6 charge analysis of the unit cell containing 54 atoms. This was only one-h of the time required for DDEC3 charge analysis of the same material. Much larger times are required for DDEC3 or DDEC6 calculations reading in Gaussian basis set coefficients (e.g., GAUSSIAN 09 generated .wfx les), because in such cases the density grids must be explicitly computed within the CHARGEMOL program.
The main difference in computational cost between DDEC3 and DDEC6 arises from the number of charge cycles required for convergence. In fact, the electron partitioning scheme is the only computational difference between DDEC3 and DDEC6. As shown in Table 7, more charge cycles are required for DDEC3 convergence than for DDEC6 convergence. For all materials we studied, fewer than 200 DDEC3 charge cycles were required. 19 For all materials, seven DDEC6 charge partitioning steps are required. More than one DDEC6 charge cycle per charge partitioning step is required only when the N A $ N core A constraint is binding, because in this case {k A } must be iteratively computed. For Cs@C 60 with 54 simulated frozen core electrons, 18 DDEC6 charge cycles were required to complete the seven charge partitioning steps. All other materials we studied converged in seven DDEC6 charge cycles. It is gratifying that the extra accuracy of DDEC6 compared to DDEC3 comes with a reduced computational cost. Fig. 3 The computational cost of DDEC3 and DDEC6 charge partitioning scales linearly with increasing system size. Computation for NaCl crystal (ambient pressure) performed with serial Fortran CHARGE-MOL program executed on a single processor core in Intel Xeon E5-2680v3 on the Comet supercomputing cluster at the San Diego Supercomputing Center.

Atomic dipole magnitudes
To produce an efficiently converging atom-centered polyatomic multipole expansion, the atomic dipoles and higher atomic multipoles should not be too large in magnitude. The right panel contains the following dense materials: TiCl 4 crystal, SrTiO 3 surface slab, Pnma NaCl 3 crystal (2 frozen Na core electrons), P4mmm Na 2 Cl crystal (2 frozen Na core electrons), natrolite, NaF surface slab, Mo 2 C surface with K adatom, Cmmm Na 2 Cl crystal (2 frozen Na core electrons), Pd crystal with interstitial H atom, Pd 3 Hf crystal with interstitial H atom, Pd 3 In crystal with interstitial H atom, Pd 3 V crystal with interstitial H atom. The slopes of the best t lines constrained to have an intercept of (0,0) were 1.0462 (1.0222) with R-squared correlation coefficient ¼ 0.9263 (0.9496) for the surface atom materials (dense materials). This shows the atomic dipole magnitudes are about 2-5% larger in magnitude for DDEC3 compared to DDEC6. These small atomic multipoles allow DDEC6 NACs to approximately reproduce V(r) surrounding a material.
Why are the DDEC6 atomic dipole magnitudes slightly smaller on average than those for the DDEC3 method? The atomic dipole magnitude, m A , can be written as One strategy to slightly reduce m A is to make the assigned {r A (r A )} slightly less diffuse without signicantly altering r(r)/ W(r). Because the integral contributions in eqn (115) are proportional to r A , minimizing the contributions for large r A values will decrease m A . The DDEC6 anions are typically less diffuse than the DDEC3 anions, because DDEC6 uses a partially localized reference ion charge (q ref ) instead of the AIM charge used as reference (q ref A ¼ q A ) in DDEC3. A second strategy for minimizing m A is to make r(r)/W(r) as close to 1 as feasible in the bonding regions where atoms overlap. The integral contributions in eqn (115) do not depend on w A (r A ) in regions where atoms do not overlap signicantly, because w A (r A ) z W(r) in those regions. The weighted spherical average, r wavg A (r A ), weights more heavily the regions where atom overlaps are substantial. Thus, r wavg A (r A ) makes r(r)/W(r) as close to 1 as feasible specically within the regions where atom overlaps are substantial. This is precisely those regions where integral contributions to m A can be suppressed. For this reason, using r wavg A (r A )substantially outperforms the simple spherical average, r avg A (r A ), for the purpose of minimizing m A . This reduction in m A also causes the NACs to more accurately reproduce V(r) surrounding the material.

Conformational transferability and accuracy for reproducing the electrostatic potential
For some applications, the preferred strategy is to use NACs from quantum chemistry calculations to build an electrostatic model in exible force-elds for classical molecular dynamics and Monte Carlo simulations. Gas adsorption and diffusion in porous crystalline materials is a common example. 66 The simulations of large biomolecules is another common example. 103 For these applications, exibility of the material may play a key role. 104 Thus, it is important for the NACs to simultaneously have good conformational transferability and approximately reproduce the electrostatic potential around the material. This is a challenging criterion, because NACs directly t to the electrostatic potential (without additional tting criteria) oen have poor conformational transferability. 14,17 Fig . 5 shows the Zn-nicotinate metal-organic framework (MOF). This structure is comprised of one-dimensional pore channels having an approximately square cross-section. The electrostatic potential is most positive near the atomic nuclei and becomes most negative near the pore centers. In Fig. 5, a contour of electrostatic potential isovalue is displayed as a green surface.
To assess the effects of framework exibility on the Znnicotinate MOF, we performed an ab initio molecular dynamics (AIMD) calculation in VASP. This AIMD simulation used a planewave cutoff energy of 400 eV, the PAW method, and PBE functional for 1200 femtoseconds (fs) with a time step of 1 fs. (Electronic energies were converged to 10 À4 eV. A normal precision grid was used, which had a 0.36 (0.18) bohr uniform grid spacing along each lattice direction for the orbital (electron density) fast Fourier transforms.) A canonical ensemble at T ¼ 300 K was simulated using a Nosé thermostat. 105 The Nosé thermal coupling parameter 105 was adjusted until the period of thermal oscillations was approximately 16 fs (i.e., 16 time steps). This gave reasonable temperature uctuations that preserved the MOF's chemical integrity. A period of $250 fs was allowed for thermal equilibration. Twenty-one conformations were used for the subsequent charge analysis: the DFT-optimized minimum energy geometry and 20 AIMD conformations corresponding to time steps 250, 300, 350, . 1200. For each of these conformations, the valence and total all-electron densities and electrostatic potential were generated in VASP using singlepoint (xed-geometry) calculations with a PREC ¼ Accurate ($0.14 bohr) grid. Electrostatic potential tting NACs were calculated using the REPEAT method and associated soware code by Campaña, Mussard, and Woo. 13 For the REPEAT method, NACs were t outside surfaces dened by g R ¼ 1.0 and 1.3 times the atomic vdW radii. As previously noted, REPEAT NACs are highly sensitive to the particular value of this vdW multiplier g R . 13,15,16 Campaña et al. 13 recommended the value g R ¼ 1.0. Chen et al. recommended the value g R ¼ 1.3. 16 Table 8 summarizes electrostatic potential RMSE and RRMSE values averaged over all 21 system conformations. These were computed on a uniform grid dened by an inner vdW multiplier of 1.3 and an outer vdW multiplier limited only by the pore size. When using the conformation averaged NACs and the conformation specic NACs, the REPEAT method produced a more accurate representation of the electrostatic potential than the DDEC6 NACs. By denition, electrostatic potential tting methods (such as REPEAT) should produce a more accurate representation of the electrostatic potential than other types of atom-centered point charge models when using the conformation specic NACs. Including atomic dipoles in the conformation specic NACs dramatically lowered the DDEC6 RMSE from 2.99 to 0.55 kcal mol À1 , which was even better than the REPEAT values for both g R ¼ 1.0 and 1.3. This means that for reproducing the electrostatic potential of a rigid framework, the DDEC6 method including atomic dipoles sometimes outperforms the REPEAT NACs. As shown in Table  8, including the spherical charge penetration term had negligible effect. When using the low energy conformation NACs, REPEAT with g R ¼ 1.0 yielded the lowest RMSE and RRMSE values. For the low energy conformation NACs, the DDEC6  RMSE and RRMSE values were between the REPEAT values using g R ¼ 1.0 and 1.3. Table 9 summarizes information about the conformational transferability of the NACs. The DDEC6 NACs had excellent conformational transferability with a mean unsigned deviation (MUD) #$0.01 for each atom type. Moreover, the max and min DDEC6 NACs for each atom type differed by <0.1 e. The REPEAT method had better conformational transferability with g R ¼ 1.0 than with g R ¼ 1.3. For g R ¼ 1.0, three of the atom types exhibited uctuations > 0.5 e as measured by the difference between max and min NACs. For g R ¼ 1. What are the implications of these results for developing force-elds to reproduce the electrostatic potential surrounding materials? If the goal is to reproduce the electrostatic potential as accurately as possible surrounding a rigid material using an atom-centered point-charge model without regard for the chemical meaning of those NACs, then methods such as ESP, 10 Chelp, 11 or Chelpg 12 for molecular systems or REPEAT 13 for periodic materials or the Wolf-summation technique of Chen et al. 16 are preferable, because these methods minimize RMSE without regard for the chemical meaning of the NACs. If the goal is to produce chemically meaningful NACs that reproduce the electrostatic potential as accurately as possible surrounding a rigid material, the DDEC6 method is preferable with or without including atomic dipoles, because this method assigns atomic electron distributions to resemble real atoms and reproduce the electrostatic potential. For constructing exible, non-reactive force-elds, NACs based on the low-energy structure or an average across multiple system conformations can be used. Depending on the material and computational details, either the DDEC6, REPEAT 13 (especially its extension to simultaneously t multiple conformations 17 ), ESP, 10 Chelp, 11 Chelpg, 12 or Wolf-summation technique 16 may yield the more accurate exible force-eld NACs.

Quantifying the consistency between assigned NACs and ASMs
An AIM method should preferably yield chemically consistent NACs and ASMs. For a special type of system, the consistency between assigned NACs and ASMs can be quantitatively measured. Consider a single neutral atom or a +1 atomic cation having only one easily removable electron. For convenience, we refer to these atoms or atomic ions as containing only one labile electron. Next, consider an uncharged host system containing only deeply bound electrons that are paired. If we combine the atom or atomic ion having one labile electron with the host system having paired electrons to form a weakly or ionically bound endohedral complex, a portion of the labile electron's density may be transferred to the host system's atoms. Since there is only one labile electron in a background of strongly held effectively paired electrons (in the endohedral and host system atoms) and (optionally) strongly held like-spin unpaired electrons (in the endohedral atom), the labile electron's spin cannot be locally cancelled by any other electrons in the system. In this case, the amount of electron density transferred from the endohedral atom to the host system should equal the amount of spin magnetization density transferred from the endohedral atom to the host system. This leads to the following quantication of consistency D ¼ ðq total À NAC endohedral Þ |fflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl} charge transferred to host þ ðM total À ASM endohedral Þ |fflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl{zfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflfflffl} spin magnetization transferred to host (116) where q total $ 0 is the total system charge (in atomic units), M total $ 0 is the system's total spin magnetic moment (in atomic units), and NAC endohedral and ASM endohedral are the assigned NAC and ASM of the endohedral atom in the endohedral complex. In the ideal case, D / 0, because the single labile electron should transfer equal amounts of spin magnetization and negative charge to the host. This condition should also be fullled in situations where a weakly bound endohedral atom has approximately zero labile electrons. It is not necessarily fullled in cases where the number of labile electrons exceeds one, because in such cases the multiple labile electrons might be transferred into orbitals of opposing spins leading to physically different amounts of transferred spin magnetization and transferred negative charge. Nor should it be fullled in cases where a strong covalent bond forms between the endohedral atom and the host.
As specic examples, we consider the Li@C 60 , N@C 60 , Cs@C 60 , Xe@C 60 , [Eu@C 60 ] 1+ , and [Am@C 60 ] 1+ endohedral fullerenes. The C 60 host has only deeply held paired electrons. 106 (Experiments show C 60 has a rst ionization energy of 6.4-7.9 eV, an electron affinity of approx. 2.6-2.8 eV, and a rst optical transition of approx. 3.2 eV. [107][108][109][110] ) These complexes were chosen as examples, because they span a wide range from light to heavy elements having zero to one labile electrons. The Li and Cs  113 This can be explained by the high electronegativity of the N atom, which retains its three unpaired electrons. Thus, we consider none of the electrons in N@C 60 to be labile. We optimized the geometries and electron distributions of these endohedral fullerenes in VASP using the PBE functional with the PAW method and a 400 eV plane-wave cutoff. A 20Å Â 20Å Â 20Å cubic unit cell was used. The positions of all atoms in the system were optimized until the forces on every atom were negligible. Due to the almost spherical nature of the C 60 enclosure, only the equilibrium displacement of the endohedral atom from the cage's center should be considered signicant. Therefore, we did not attempt multiple initial geometries with different angular variations in the endohedral atom's position. Fig. 6 displays the optimized geometries. In Table 10, the optimized offset is the distance of the endohedral atom from the center of the C 60 group. (The center of the C 60 group was computed by averaging the (x, y, z) coordinates of the carbon atoms.) The N, Xe, and Cs atoms were located at the center of the C 60 group (within a computational tolerance). The central position of the N atom and quartet spin state agree with electron paramagnetic resonance (EPR) and electron-nuclear double resonance (ENDOR) experiments. 113 As reviewed by Popov et al., a wide variety of spectroscopic experiments show the noble gas atom in Ng@C 60 complexes (Ng ¼ He, Ne, Ar, Kr, or Xe) resides at the cage's central position. 114 The Cs@C 60 geometry optimization converged to an energy minimum having a centrally located Cs atom, even though the calculation was started using a non-zero offset of 0.46Å. This shows the central Cs position is at least a local (and perhaps global) energy minimum. In the optimized structures, the Li, Eu, and Am ions were displaced by >1Å from the center. Our calculated offset for Li@C 60 is in good agreement with previous computational studies. 115,116 Prior calculations on neutral Eu@C 60 and Am@C 60 complexes also indicate an off-center position. [117][118][119] In Table 10, the mean absolute inconsistency between the assigned NACs and ASMs is quantied as the average of the absolute value of D for the six materials. Because CM5 is a correction to the HD NACs, the CM5 method utilized the HD ASMs. Among the four methods, the HD NACs and ASMs were the most inconsistent with an average inconsistency of 0.6 electrons. The DDEC6 NACs and ASMs were the most consistent with an average inconsistency of 0.15 electrons. The CM5 and Bader methods had intermediate performance. Because the HD and DDEC6 ASMs were nearly the same, the poor performance of the HD method must have been due to its inaccurate NACs.
Examining the DDEC6 results in Table 10, $1 electron was transferred from the Li and Cs atoms, leaving a Li 1+ or Cs 1+ cation in the center having negligible unpaired spin. The CM5 NAC of 1.468 for the Cs atom seems too high, because this implies removal of some of its outer core electrons. The HD, CM5, and DDEC6 methods all gave $0.3 electrons transferred in the Xe system, but the Bader method gave $0.1 transferred electrons in this material. All four methods gave the least amount of electron transfer for the N system compared to other systems. In the Eu and Am cationic systems, $0.4 electrons were transferred from the endohedral metal atom to the C 60 host to give a NAC of $1.4 and an ASM of $7.4 for the endohedral metal atom. Overall, these results demonstrate reasonable consistency between the assigned DDEC6 NACs and ASMs. We were rst motivated to improve upon the DDEC3 method by a series of calculations on sodium chloride crystals having unusual stoichiometries. Specically, we computed NACs for the ten high-pressure crystal structures reported by Zhang et al. 120 and the ambient-pressure NaCl structure shown in Fig. 7. We generated the electron density in VASP using the PBE functional and (a) the experimental X-ray diffraction geometries 120 for the ten high-pressure crystals and (b) the PBEoptimized geometry for the ambient-pressure Fm3m-NaCl. As shown in Table 11 and Fig. 8, the DDEC3 NAC for at least one Na atom is larger than +1.0 for the following cases: (a) 1.311 for Na(3) atoms in Cmmm-Na 2 Cl crystal at 180 GPa, (b) 1.235 for Na(1) and 1.295 for Na(2) atoms in Cmmm-Na 3 Cl 2 crystal at 280 GPa, (c) 1.219 for Na atoms in Imma-Na 2 Cl crystal at 300 GPa, (d) 1.035 for Na(1) and 1.108 for Na (2) 121 ) Based on these results, we concluded that the DDEC3 method overestimates atomic charge magnitudes in some materials. If ten Na core electrons are frozen, the DDEC3 NACs for the Na atoms are constrained to be #+1.0 as shown in Table 11, but this is not a satisfactory solution because we want NACs to be approximately independent of the number of frozen core electrons. This observation led us to explore numerous potential modications to the DDEC method, which aer testing dozens of potential modications culminated in the DDEC6 method. As shown in Table 11 and Fig. 8, the DDEC6 NACs have the expected behavior being #+1.0 for each of the Na atoms. Moreover, the DDEC6 NACs were nearly insensitive to whether 2 or 10 frozen Na core electrons were used.
Bader's quantum chemical topology 62-64 cannot be used to compute NACs for some of these materials, because it assigns compartments not belonging to any atom (or to multiple atoms simultaneously) in the following cases: (a) Cmmm-Na 2 Cl crystal at 180 GPa irrespective of the number of frozen Na core electrons, (b) P4/m-Na 3 Cl 2 crystal at 140 GPa irrespective of the number of frozen Na core electrons, (c) P4/mmm-Na 3 Cl crystal at 140 GPa when using 10 frozen Na core electrons, and (d) P4/ mmm-Na 2 Cl crystal at 120 GPa when using 10 frozen Na core electrons. Bader compartments for these four materials are detailed in Table 12. As it should be, the assignment of these Bader compartments was based on the full (i.e., valence + (frozen) core) electron density, not simply the valence density or the valence pseudodensity. At rst, one might propose  each non-nuclear attractor could be assigned to one of the nearby atoms, but this is not satisfactory because in some cases such an assignment cannot be made without destroying the crystal symmetry. For example, the P4/mmm-Na 2 Cl crystal at 120 GPa (modeled with 10 frozen Na core electrons) contains one non-nuclear attractor whose closest atoms are the two equivalent Na(3) atoms; therefore, it is impossible to assign this non-nuclear attractor to one of the closest atoms without breaking the crystal symmetry. Alternatively, one could propose to divide the electron density and/or volume of each non-nuclear attractor amongst the nearby atoms in a way that preserves the system's symmetry, but it is not presently clear whether this could be done in a way that preserves most of the important properties of the Virial compartments. Specically, each of the Bader compartments satises the Virial theorem and behaves as an open quantum system, but divided pieces of such compartments may not. 62,64 It might be possible that divisions of a non-nuclear attractor could be made that satisfy the net zero ux condition (and Virial theorem) over each division volume but not the local zero ux condition in the bounding surfaces, but it is not presently clear whether such a partitioning would  67. e Bader NACs cannot be reported because Bader analysis yields more compartments than atoms (see Table 12). For P4/mmm-Na 3 Cl crystal at 140 GPa and P4/mmm-Na 2 Cl crystal at 120 GPa, the Bader NACs for two Na frozen core electrons are listed but non-nuclear attractors occur for ten Na frozen core electrons.
always have a unique denition even if constrained to preserve the system's symmetry.
In materials for which there is a one-to-one correspondence between Bader compartments and atoms (i.e., no non-nuclear attractors), the Bader NACs are computed by integrating the number of electrons over each compartment. In such cases, the Bader method oen yields reasonable NACs for dense ionic solids. Examining Table 11, the DDEC6 and Bader NACs using 2 frozen Na core electrons exhibited similar trends for all of the sodium chloride crystals where the Bader NACs were dened. Of particular interest, the Cl NAC was signicantly more negative than À1.0 for some of the materials. The Bader NACs were more sensitive than the DDEC6 NACs to whether 2 or 10 frozen Na core electrons were used. For example, in Imma-Na 2 Cl crystal at 300 GPa the DDEC6 NAC for the Cl atom was À1.628 (2 frozen Na core electrons) and À1.570 (10 frozen Na core electrons) compared to the Bader Cl NAC of À1.351 (2 frozen Na core electrons) and À0.633 (10 frozen Na core electrons). The reason for this larger sensitivity of the Bader NACs on the number of frozen core electrons is that according to an integration routine now used in popular Bader analysis programs the frozen core electrons are assigned wholly to the host atom while non-frozen electrons crossing into neighboring Bader compartments are divided amongst several atoms. 44 This artifact could be removed by partitioning all electrons (i.e., both frozen and non-frozen) according to their density in each of the Bader compartments, yet even so the sensitivity of the number of Bader compartments on the number of frozen core electrons (e.g., P4/mmm-Na 3 Cl crystal at 140 GPa and P4/mmm-Na 2 Cl crystal at 120 GPa) would persist. Alternatively, one could choose a small number of frozen core electrons to ensure the amount of frozen core electron density spilling into neighboring compartments is negligible. Consequently, Bader NACs with 2 frozen Na core electrons are more reliable than those with 10 frozen Na core electrons.
Interestingly, a recent computational study has predicted the existence of even more high-pressure sodium chloride phases with unusual stoichiometries. 122 That study also performed Fig. 8 Largest magnitude Na atomic charges in compressed sodium chloride crystals. These were computed using 2 frozen Na core electrons. Based on chemical arguments, at least 10 electrons should be assigned to each Na atom. The DDEC3 method gives many Na atom charges >1, which indicates some electrons are not assigned to the correct atom. The DDEC6 method fixes this problem. Bader analysis on several high-pressure sodium chloride crystals, but all but one of those computations were performed for different pressures or different phases than the Bader charge results presented here. 122 The non-nuclear attractor for P4/m-Na 3 Cl 2 (at 125 GPa) and Cmmm-Na 2 Cl (at 200 GPa) was noted in that study. 122

NACs usually follow electronegativity trends
A method for computing NACs should accurately describe electron transfer directions. Table 13 lists the Spearman rank coefficient between DDEC6 NACs and Pauling scale electronegativity for 14 materials containing four or more different elements. The Spearman rank coefficient quanties how well two variables can be related by a monotonic function. A Spearman rank coefficient of +1.0 (À1.0) indicates the two variables are related by a perfectly monotonically increasing (decreasing) function. A Spearman rank coefficient of 0.0 indicates the two variables are not correlated at all. For each material, the average DDEC6 NAC was computed for each element. For each material, the elements were ranked from highest to lowest electronegativity. For example, in B-DNA the elements ranked according to electronegativity were 1. O, 2. N, 3. C, 4. P, 5. H, 6. Na. A linear least-squares regression between the average NACs and the whole number electronegativity ranks was then performed. The Spearman rank coefficient equals Pearson's correlation coefficient R for this linear regression, where 0 # R 2 # 1 is the squared correlation coefficient. Nine of the 14 materials had a Spearman rank coefficient of 1.00, indicating the average DDEC6 NACs followed exactly the same order as the element electronegativities. The remaining ve materials had Spearman rank coefficients between 0.60 and 0.94, indicating the average DDEC6 NACs followed approximately but not exactly the same order as the element electronegativities. These results show DDEC6 NACs usually (but not always) follow Pauling scale electronegativity trends. The exceptions are not to be regarded as a deciency of either the DDEC6 NACs or the Pauling scale electronegativities, because element electronegativities can only describe the usual direction of electron transfer. The specic direction of electron transfer is affected by the chemical environment. For example, while electrons are usually transferred from carbon to the more electronegative oxygen, experiments show carbon monoxide is an exception with electron transfer from oxygen towards carbon. 123 Boron monouoride is another exception with electrons transferred from uorine towards boron. 124 Furthermore, multivalent cations can sometimes acquire a positive NAC greater than that of less electronegative monovalent cations, because the multivalent cations may acquire a NAC greater than +1. For example, the multivalent P, Al, and Si atoms in B-DNA and natrolite acquired higher NACs than the monovalent Na atoms.

Conclusions
The main utility of net atomic charges (NACs) is they concisely convey important information about the electron distribution in materials. Due to the continuous nature of the electron cloud in a material, there is some exibility in how to partition the total electron distribution among atoms-in-materials. 125 In this article, we introduced a new method, called DDEC6, for dening atoms-in-materials and computing NACs in periodic and non-periodic materials. The DDEC6 NACs are well-suited both for understanding charge-transfer in materials and for constructing exible force-elds for classical atomistic simulations of materials. Our method can be applied with equal validity to small and large molecules, ions, porous and non-porous solids, solid surfaces, nanostructures, and magnetic and nonmagnetic materials irrespective of the basis set type used. This broad applicability makes it ideally suited for use as a default atomic population analysis method in quantum chemistry programs. Actually incorporating DDEC6 into popular quantum chemistry programs will require additional work. For example, it might be desirable to implement DDEC6 on the same integration grid already used in the respective quantum chemistry program.
We used a scientic engineering design approach to achieve nine performance goals: (1) the total electron distribution is partitioned among the atoms by assigning exactly one electron distribution to each atom, (2) core electrons remain assigned to the host atom, (3) NACs are formally independent of the basis set type because they are functionals of the total electron distribution, (4) the assigned atomic electron distributions give an efficiently converging polyatomic multipole expansion, (5) the assigned NACs usually follow Pauling scale electronegativity trends, (6) NACs for a particular element have good transferability among different conformations that are equivalently bonded, (7) the assigned NACs are chemically consistent with the assigned ASMs, (8) the method has predictably rapid and robust convergence to a unique solution, and (9) the computational cost of charge partitioning scales linearly with increasing system size. The DDEC6 method makes ve modications relative to the DDEC3 method: (a) a xed reference ion charge is used based on a weighted average of the electron population in a localized atomic compartment and a stockholder partition, (b) the conditioned reference ions are constrained to decay monotonically with increasing r A and to integrate to the correct number of electrons (N ref , (c) four conditioning steps instead of (r some_ref A ) c (r avg A ) 1Àc are used to construct w A (r A ), (d) a weighted spherical average improves the effect of spherical averaging during charge partitioning, and (e) the atomic weighting factor w A (r A ) is constrained to decay no faster than exp(À2.5 r A ) in an atom's buried tail.
We now summarize key computational results. We showed for the rst time that the DDEC3 and IH methods sometimes converge to non-unique solutions that depend on the starting guess, because their optimization landscapes are sometimes non-convex. For example, the DDEC3 and IH methods assigned NACs that severely broke the molecular symmetry for the H 2 triplet molecule with constrained 50 pm bond length. The DDEC6 method removes this problem by using a xed reference ion charge with a total of seven charge partitioning steps. For a series of high-pressure sodium chloride crystals with unusual stoichiometries, we found the DDEC3 method sometimes gives NACs in excess of +1.0 for the Na atoms and Bader's quantum chemical topology sometimes yields non-nuclear attractors while the DDEC6 method exhibits neither of these problems. For six endohedral fullerenes containing zero to one labile electrons, the consistency between assigned NACs and ASMs was quantied for the Hirshfeld, CM5, Bader, and DDEC6 methods. Among these four methods, the DDEC6 method gave the most consistent agreement between assigned NACs and ASMs. A study of various conformations of the Zn-nicotinate MOF showed the DDEC6 NACs have excellent conformational transferability and are ideally suited for constructing exible force-elds to approximately reproduce the electrostatic potential across various system conformations. Computational tests for various molecular and solid materials showed the DDEC6 atomic dipole magnitudes are slightly smaller on average than the DDEC3 ones. This is desirable to produce an efficiently converging atom-centered polyatomic multipole expansion for reproducing the electrostatic potential surrounding the material. The DDEC6 NACs also follow Pauling scale electronegativity trends on average, but of course the specic electron transfer direction between two elements is affected by their chemical environment. Due to the fewer number of required charge partitioning steps, DDEC6 charge partitioning converges several times faster than DDEC3 charge partitioning.
In summary, DDEC6 offers substantially improved accuracy and lower computational cost than DDEC3. We therefore recommend the DDEC3 method be replaced with the DDEC6 method. We performed additional computational tests of the DDEC6 method across the wider set of materials described in the sequel article. 33 All of those results agree with the ndings presented here.