Optimization of parameters for semiempirical methods. III. Extension of PM3 to Be, Mg, Zn, Ga, Ge, As, Se, Cd, In, Sn, Sb, Te, Hg, Tl, Pb, and Bi

Using a recently developed procedure for optimizing parameters for semiempirical methods,1 PM3 has been extended to a total of 28 elements. Average ΔHf errors for the newly parameterized elements are Be: 8.6, Mg: 8.4, Zn: 5.8, Ga: 14.9, Ge: 11.4, As: 8.5, Se: 11.1, Cd: 2.6, In: 11.3, Sn: 9.0, Sb: 13.7, Te: 11.3, Hg: 6.8, Tl: 6.5, Pb: 7.4, and Bi: 10.9 kcal/mol. For some elements the paucity of data has resulted in a method which, while highly accurate in reproducing the reference data, is likely to be only poorly predictive.


INTRODUCTION
Until recently, the rate-determining step in obtaining parameters for semiempirical methods was the time taken to optimize the parameters. When a new method was developed, parameters were normally determined initially for only four elements: H, C, N, and O. A new automatic optimization procedure1 has been developed which shifts the rate-determining step to the assembly of suitable experimental reference data. Using that procedure, parameters for 12 elements were optimized simultaneously. Average errors in heats of formation for 11 of the 12 elements are less than those obtained using MNDO or AM1, aluminum in AM1 being the exception.2,3 Here, the PM3 method has been extended to include 13 main group elements and three transition metals. The transition metals (Zn, Cd, and Hg) all have filled d shells, allowing them to be treated using only an sp basis set. A comparison of the chemistries of these elements is presented here.
In principle, parameters for all elements available within a given method should be optimized simultaneously. In practice, serial optimization has hitherto proved necessary, for two reasons. First, the computational requirements have precluded the simultaneous optimization of parameters for many elements. Second, once a set of parameters has been made generally available, it is considered important4 that their values not be changed unless a significant improvement has been made. Although the first obstacle to large-scale optimization of parameters has been overcome, the second reason for not optimizing all parameters for all elements simultaneously is still valid. For this reason, parameters for the original 12 elements reported earlier were held constant while the parameters for the 16 elements reported here were optimized.
For several metals the dearth of reference data resulted in an apparent paradox: the optimized parameters allow PM3 to accurately reproduce the reference data, but the parameters are so poorly defined that the method is likely to perform poorly when used as a predictive tool. Only when enough reference data are used to uniquely define the parameter set will the accuracy of prediction equal that of the set used in the optimization of the parameters.
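The paradox can be seen in a deliberately tiny toy model (hypothetical, not a PM3 calculation): a quadratic with three adjustable parameters fitted to only two reference points. Both parameter sets below reproduce the reference data exactly, so the sum of squares (SSQ) cannot distinguish them, yet they disagree badly at a point outside the reference set.

```python
# Toy model (hypothetical): three adjustable parameters, only two reference data.
def model(params, x):
    a, b, c = params
    return a + b * x + c * x * x

def ssq(params, data):
    """Sum of squared errors over the reference data."""
    return sum((model(params, x) - y) ** 2 for x, y in data)

reference = [(0.0, 1.0), (1.0, 3.0)]  # (x, observed value)
set_1 = (1.0, 2.0, 0.0)
set_2 = (1.0, 0.0, 2.0)

print(ssq(set_1, reference), ssq(set_2, reference))  # 0.0 0.0: both fit exactly
print(model(set_1, 2.0), model(set_2, 2.0))          # 5.0 9.0: predictions disagree
```

Adding a third, independent reference point would remove the ambiguity, which is the sense in which enough reference data must be used to define the parameter set uniquely.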

METHOD
The basic technique used in optimizing the parameters for the elements reported here was similar to that used earlier. In the case of cadmium, the scarcity of experimental data precluded a unique definition of the minimum in parameter space. To allow cadmium to be studied, the Gaussian core-core terms were omitted from the theoretical framework of PM3.
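For reference, in AM1 and PM3 the core-core repulsion between atoms A and B is the MNDO term plus a sum of Gaussian corrections scaled by Z_A Z_B / R_AB, with one set of Gaussians belonging to each atom. The sketch below computes only that Gaussian correction; the `Gaussian` record, the function name, and all parameter values are hypothetical, and units are left abstract. Giving an element an empty list of Gaussians, as was done for cadmium, removes only that element's own terms; a partner atom's Gaussians still contribute.

```python
import math
from dataclasses import dataclass

@dataclass
class Gaussian:
    a: float  # amplitude (hypothetical value)
    b: float  # exponent (hypothetical value)
    c: float  # center distance (hypothetical value)

def gaussian_core_core(z_a, z_b, r, gaussians_a, gaussians_b):
    """Gaussian part of the AM1/PM3 core-core term:
    (Z_A Z_B / R) * sum over both atoms' Gaussians of a*exp(-b*(R - c)**2)."""
    total = sum(g.a * math.exp(-g.b * (r - g.c) ** 2)
                for g in list(gaussians_a) + list(gaussians_b))
    return z_a * z_b / r * total

# An element whose Gaussian list is empty (as for cadmium here) adds nothing itself.
no_gaussians = []
partner = [Gaussian(a=1.0, b=1.0, c=2.0)]
print(gaussian_core_core(2.0, 2.0, 2.5, no_gaussians, no_gaussians))  # 0.0
print(gaussian_core_core(1.0, 1.0, 2.0, no_gaussians, partner))       # 0.5
```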
The optimization was carried out as follows. First, parameters for each new element were optimized using all data available for that element, but excluding data involving any of the other 15 new elements. A subsequent optimization was then carried out for each element using all compounds of that element, including compounds between the element being parameterized and one or more of the other new elements. This second optimization used as trial parameters the optimized parameters obtained in the first step. The resulting parameters are given in Table I (Optimized parameters for MNDO-PM3).
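The two reference sets used in this scheme can be sketched as simple filters over a compound list. This is an illustrative sketch, assuming each compound is stored with the set of element symbols it contains; the names `step1_set` and `step2_set` and the three example compounds are hypothetical.

```python
# The 16 elements parameterized in this work.
NEW_ELEMENTS = {"Be", "Mg", "Zn", "Ga", "Ge", "As", "Se", "Cd",
                "In", "Sn", "Sb", "Te", "Hg", "Tl", "Pb", "Bi"}

def step1_set(element, compounds):
    """First pass: compounds of `element` that contain no other new element."""
    return sorted(f for f, els in compounds.items()
                  if element in els and not (els & NEW_ELEMENTS) - {element})

def step2_set(element, compounds):
    """Second pass: every compound of `element`, including mixed new-element compounds."""
    return sorted(f for f, els in compounds.items() if element in els)

compounds = {"GeH4": {"Ge", "H"}, "GeTe": {"Ge", "Te"}, "TeF6": {"Te", "F"}}
print(step1_set("Ge", compounds))  # ['GeH4']
print(step2_set("Ge", compounds))  # ['GeH4', 'GeTe']
```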

STRUCTURE OF TABLES
As with the earlier work, the tables involved are quite large, so they have been structured to allow rapid location of any particular species. However, the large number of inorganic compounds made the Cox and Pilcher sequence25 used earlier unsuitable here. Instead, the location of a particular species within a table is determined by the following rules. The new elements are presented in order of increasing atomic number. Within each element's set, species appear in the same order as in the JANAF Thermochemical Tables. That order uses a modified Hill indexing system [J. Am. Chem. Soc., 22, 478 (1900)] and is purely alphabetic, based on the empirical formula. Compounds involving two new elements are cited under both elements: thus, germanium telluride appears under germanium and under tellurium; however, in the statistical analysis each compound is counted only once.
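These listing rules can be sketched as follows, assuming a purely alphabetic ordering of the element symbols in the empirical formula (a simplification of the modified Hill system; all helper names are hypothetical). Each compound is listed under every new element it contains, but counted once for statistics.

```python
import re
from collections import Counter

def element_counts(formula):
    """Parse a simple empirical formula such as 'SeF6' into element counts."""
    counts = Counter()
    for symbol, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(n) if n else 1
    return counts

def alphabetic_formula(formula):
    """Rewrite a formula with element symbols in alphabetical order, e.g. SeF6 -> F6Se."""
    counts = element_counts(formula)
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in sorted(counts))

def build_index(formulas, new_elements):
    """List each compound under every new element it contains (new_elements is
    assumed to be given in order of increasing atomic number)."""
    index = {el: [] for el in new_elements}
    for f in formulas:
        for el in element_counts(f):
            if el in index:
                index[el].append(f)
    for el in index:
        index[el].sort(key=alphabetic_formula)
    return index

def unique_count(index):
    """A compound listed under two new elements is counted only once."""
    return len({f for entries in index.values() for f in entries})

idx = build_index(["GeTe", "GeH4", "TeF6"], ["Ge", "Te"])
print(idx)                # {'Ge': ['GeH4', 'GeTe'], 'Te': ['TeF6', 'GeTe']}
print(unique_count(idx))  # 3
```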

HEATS OF FORMATION
Computed and observed heats of formation are presented in Table II. A summary of the average errors in ΔHf for all species studied is given in Table III. To allow comparison with the first 12 PM3 elements parameterized, average errors for these elements are also presented in Table III. The averages reported in Table III are for all compounds reported here and in the earlier27 work. Several faults in the earlier tables have been corrected, and about 100 more compounds have been surveyed; these additions and corrections are reflected in Table III.
The average error in ΔHf for the newly parameterized elements is 9.6 kcal/mol, exactly the same as that for the first set of 12 elements. Preliminary attempts to determine parameters for the remaining main group elements (i.e., the alkali and alkaline earth metals) indicate that average errors for these elements are likely to be larger than those reported here.
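A plain average of the sixteen per-element values quoted in the abstract is about 9.3 kcal/mol, not 9.6. The difference presumably arises because the overall figure averages over individual compounds, so that elements with many reference compounds carry more weight; that interpretation is an assumption. The sketch contrasts the two calculations; the compound errors in the second example are hypothetical.

```python
# Per-element average ΔHf errors (kcal/mol) quoted in the abstract.
PER_ELEMENT = {"Be": 8.6, "Mg": 8.4, "Zn": 5.8, "Ga": 14.9, "Ge": 11.4, "As": 8.5,
               "Se": 11.1, "Cd": 2.6, "In": 11.3, "Sn": 9.0, "Sb": 13.7, "Te": 11.3,
               "Hg": 6.8, "Tl": 6.5, "Pb": 7.4, "Bi": 10.9}

def unweighted_mean(values):
    """Simple mean: every element counts equally, regardless of its number of compounds."""
    values = list(values)
    return sum(values) / len(values)

def overall_mae(errors_by_compound):
    """Mean absolute ΔHf error over compounds; keying by compound counts each one once."""
    return sum(abs(e) for e in errors_by_compound.values()) / len(errors_by_compound)

print(round(unweighted_mean(PER_ELEMENT.values()), 1))        # 9.3
print(overall_mae({"compound_A": 3.0, "compound_B": -5.0}))   # 4.0
```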

STATISTICAL GEOMETRIES
A comparison of experimental and computed geometries is given in Table IV, and a statistical summary of the errors in bond lengths is given in Table V. As with ΔHf, a comparison with the original 12 elements is useful, so average errors in bond lengths for those elements are also given in Table V. Average bond length and angle errors for all elements investigated are given in Table VI. With the exception of some magnesium and gallium compounds, most geometries are of useful accuracy.

Beryllium
Two possible geometries have been reported for dicyclopentadienylberyllium: a C,, y lo 2x and an y l-y' structure.2~ PM3 predicts a symmetric D5h structure, although the beryllium atom is not rigidly held in place: the vibrational frequency for horizontal motion is only 126 cm⁻¹.

Magnesium
Although all four magnesium dihalides are observed to be linear,30 PM3 predicts the X-Mg-X angle to be 109.9° (fluoride), 154.9° (chloride), 165.8° (bromide), and 180.0° (iodide). An attempt was made to generate a set of magnesium parameters that would predict the observed angle; this was not successful, and further work is needed. That a limited sp basis set should not only predict MgF2 to be strongly bent, but also keep it bent despite efforts to make it linear, is unexpected, and may indicate a limitation in either the parameterization or the NDDO approximation.
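The X-Mg-X angles quoted above are ordinary three-atom bond angles. A minimal sketch of how such an angle follows from Cartesian coordinates is shown below; the coordinates are hypothetical, and a linear triatomic gives 180°.

```python
import math

def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with b the central atom; points are (x, y, z) tuples."""
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard against rounding
    return math.degrees(math.acos(cos_angle))

# Hypothetical halogen-metal-halogen coordinates (angstroms)
print(round(bond_angle((-1.8, 0.0, 0.0), (0.0, 0.0, 0.0), (1.8, 0.0, 0.0)), 1))  # 180.0
print(round(bond_angle((1.8, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.8, 0.0)), 1))   # 90.0
```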

Zinc
Most of zinc chemistry is relatively simple: zinc is almost always two-coordinate, with a ligand-zinc-ligand angle of 180°. Some nonclassical structures do exist, however, examples being the pentahapto complexes involving cyclopentadienyl rings. Both PM3 and AM1 correctly predict the η5 structure of cyclopentadienylmethylzinc and the η5-η1 structure of bis(pentamethylcyclopentadienyl)zinc (Figure 1).

Gallium
Average PM3 ΔHf errors for gallium compounds are very large. No obvious reason for this is apparent, although faulty optimization or inaccurate experimental data are the prime suspects.

Germanium
Of all metals reported here, the largest number of experimental reference data is available for germanium. It is likely that the predictive power of PM3 when applied to germanium compounds will be the same as for the set reported here. A recent X-ray structure of 2,2,5,5-tetramethyl-1,3-diselena-2-germacyclohexane has been published.71 This structure is qualitatively reproduced by PM3 (Fig. 2). In it, both germanium and selenium have organometallic and intermetallic bonds. Of the compounds investigated, this had the most complicated structure.
The ground state of the germanium atom is incorrectly predicted by PM3 to be 4s¹4p³. Attempts to force the correct configuration made the overall SSQ considerably larger, so the incorrect atomic configuration was preferred as the lesser of two evils.

Arsenic
With the exception of triethylarsine, the thermochemistry and stereochemistry of arsenic compounds are predicted with satisfying accuracy. As with triethylphosphine and triethylstibine, the experimental ΔHf of triethylarsine is unexpectedly large (13.4 kcal/mol), particularly when compared with that of trimethylarsine (2.8 kcal/mol). PM3 predicts the triethyl derivative to be 5.5 kcal/mol more stable than the trimethyl. If this compound is ignored, the average error for arsenic compounds drops to 7.4 kcal/mol.

Selenium
Along with tellurium, selenium forms the widest range of types of bonds, bonding to 12 different elements, ranging from the highly ionic SeOF2 to the 100% covalent Se2. In addition, oxidation states from -2 (H2Se) to +6 (SeF6) are reproduced.

Cadmium
Because of the paucity of experimental data on cadmium compounds, the number of reference data used is very small. As a result, the optimized parameters allow the available data to be reproduced with unprecedented accuracy. An unfortunate consequence of the small number of reference data is that the parameter set could not be well defined, and it is highly likely that the predictive power of PM3 when applied to cadmium compounds will be very poor. For the same reason, the Gaussian core-core parameters for cadmium were omitted.

Indium
With the exception of the In-X-In angles for the oxide and selenide, all indium geometries are in good agreement with experiment: PM3 predicts these In-X-In angles to be 180°. This fault does not occur in In2Te, but does occur in Ga2O.

Tellurium
Tellurium exhibits oxidation states of -2, 0, 1, 2, 3, 4, 5, and 6. Three hypervalent compounds are represented here: TeO,, TeF6, and Te2F10. Te2F10 is predicted to have a DM structure with an unusually long Te-Te bond (3.18 Å). The total bonding between the two TeF5 groups is quite large, 0.943, composed of a bond3' of order 0.690 to the other tellurium, four bonds of order 0.053 to the nearer fluorine atoms, and a long-distance bond of order 0.039 to the distant axial fluorine.
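Written out, the bond-order decomposition for one tellurium atom is shown below. The quoted components sum to 0.941 rather than 0.943; the small difference is presumably due to rounding of the individual bond orders (an assumption, since unrounded values are not given).

```latex
\[
\underbrace{0.690}_{\mathrm{Te-Te}}
+ \underbrace{4 \times 0.053}_{\mathrm{Te\cdots F_{near}}}
+ \underbrace{0.039}_{\mathrm{Te\cdots F_{axial}}}
= 0.941 \approx 0.943
\]
```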

Lead
An inconsistency in the reported value of the ΔHf of PbCl4 has been found. The JANAF tables give the ΔHf(g) of PbCl4 as -132.0 ± 20 kcal/mol, while the National Bureau of Standards reports a value of -78.7 kcal/mol for ΔHf(l). As heats of vaporization are always positive, ΔHf(g) must be more positive than -78.7 kcal/mol. PM3 predicts the ΔHf(g) of PbCl4 to be -61.8 kcal/mol, consistent with this bound. This suggests that the NBS value is likely to be more accurate than the JANAF value.
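The argument rests on ΔHf(g) = ΔHf(l) + ΔHvap with ΔHvap > 0. A minimal check (the function name is hypothetical) shows that even the most positive value within the JANAF uncertainty, -132.0 + 20 = -112.0 kcal/mol, lies below the liquid-phase value, whereas the PM3 value does not.

```python
def consistent_with_liquid(dhf_gas, dhf_liquid, uncertainty=0.0):
    """True if a gas-phase ΔHf (kcal/mol) can lie above the liquid-phase ΔHf.

    Since ΔHf(g) = ΔHf(l) + ΔHvap and ΔHvap > 0, ΔHf(g) must exceed ΔHf(l).
    The most positive value allowed by the stated uncertainty is tested."""
    return dhf_gas + uncertainty > dhf_liquid

DHF_LIQUID_NBS = -78.7   # PbCl4(l), NBS
DHF_GAS_JANAF = -132.0   # PbCl4(g), JANAF, quoted uncertainty 20 kcal/mol
DHF_GAS_PM3 = -61.8      # PbCl4(g), PM3

print(consistent_with_liquid(DHF_GAS_JANAF, DHF_LIQUID_NBS, 20.0))  # False
print(consistent_with_liquid(DHF_GAS_PM3, DHF_LIQUID_NBS))          # True
# Heat of vaporization implied if both the PM3 and the NBS values were exact
print(round(DHF_GAS_PM3 - DHF_LIQUID_NBS, 1))                       # 16.9
```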

Bismuth
As with the phosphorus, arsenic, and antimony analogues, triethylbismuth is predicted to have a heat of formation much lower than that observed. Since the ΔHf values of both the trimethyl and triphenyl derivatives are accurately reproduced, it is likely that the experimental ΔHf of the triethyl derivative is incorrect.

General
The errors in ΔHf range from 39.5 (InO) to -37.8 kcal/mol (Sb3+); this range is much narrower than that found for the organic elements. However, in Group V that is true only for nitrogen (Table VII). In the case of phosphorus, experimental data are available for all the methyl and ethyl phosphines. From these (Figure 3), we see that the anomaly appears only in the case of triethylphosphine. If this effect is genuine, and is also found in the chemistries of arsenic, antimony, and bismuth, then a theoretical explanation must be sought. At the PM3 level no such phenomenon is predicted.

Atoms
Isolated atoms are used in semiempirical methods to define the zero of energy. Consequently, their ΔHf should be independent of the parameters. In two instances, however (Al and Ge), the error in the ΔHf of the isolated atom is nonzero. For those atoms, the ground state of the atom as calculated by PM3 is different from that observed experimentally. Within the optimization, the option exists to force the correct electronic configuration. However, when this was done, the overall SSQ (sum of squares of the weighted errors) became considerably larger than when an incorrect atomic electronic configuration was allowed. Instead of forcing the correct configuration, therefore, the atoms involved were given the default weight, and the parameters optimized. In the case of germanium the atomic configuration predicted by PM3 is 4s¹4p³, i.e., the same as in normal germanium chemistry.
Figure 2. X-ray and PM3 geometries for 2,2,5,5-tetramethyl-1,3-diselena-2-germacyclohexane (A: PM3 geometry; B: X-ray geometry).
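In MNDO-type methods the heat of formation is obtained from the calculated total energy by subtracting the energies of the isolated atoms in their reference configuration and adding the experimental ΔHf of the gaseous atoms. A minimal sketch, assuming energies in eV and a conversion of about 23.061 kcal/mol per eV; the atom label "X" and all energies are hypothetical. If the calculation on an isolated atom reproduces the reference atomic energy, its computed ΔHf equals the experimental value exactly; if it converges to a different, lower-energy configuration, a nonzero error appears, as happened for Al and Ge.

```python
EV_TO_KCAL = 23.061  # approximate conversion, kcal/mol per eV

def heat_of_formation(e_total_ev, atoms, e_isol_ev, dhf_atom_kcal):
    """ΔHf = (E_total - sum of isolated-atom energies) * conversion + sum of atomic ΔHf."""
    e_atoms = sum(e_isol_ev[a] for a in atoms)
    dhf_atoms = sum(dhf_atom_kcal[a] for a in atoms)
    return (e_total_ev - e_atoms) * EV_TO_KCAL + dhf_atoms

# Hypothetical atom "X": reference energy -100.0 eV, experimental ΔHf 50.0 kcal/mol
E_ISOL = {"X": -100.0}
DHF_ATOM = {"X": 50.0}
print(heat_of_formation(-100.0, ["X"], E_ISOL, DHF_ATOM))  # 50.0: zero error
print(heat_of_formation(-100.5, ["X"], E_ISOL, DHF_ATOM))  # about 38.47: nonzero error
```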

Predictive Power of MNDO/PM3
When the original 12 elements were parameterized, only about a third of the data reported27 was used in the optimization calculation, the remainder being used in the surveys only. If a surveyed molecule was badly predicted, it was then added to the optimization set. This meant that, although a large number of data were "predicted," the manner in which the optimization was done predisposed the method to be as accurate in prediction as it was in reproducing the set used in the optimization. When parameters are being optimized there does not appear to be any way in which the resulting method can be demonstrated to have predictive power: if a trial method made poor predictions, it could be improved by adding the poorly predicted data to the optimization set. An optimum method would thus implicitly involve all available data.

Figure 3. Experimental ΔHf of phosphines.
In extending the method to new elements, the predictive power of the first set of elements becomes apparent. If the method were not predictive, then the errors for the next set of elements, even when fully optimized parameters were used, would be much larger than for the first set.
Examination of Table III shows that the predictive accuracy of PM3 when applied to the first 12 elements is virtually the same as the accuracy obtained in the original parameterization: errors in ΔHf for seven elements became slightly larger, four became slightly smaller, and one was unchanged; similar effects are observed for the geometries (Tables V and VI). None of the compounds reported here could have been used in determining the parameters for the first set of 12 elements. It therefore follows that the predictive accuracy for the first set of 12 elements is similar to the accuracy reported in the surveys. Differences between these tables and those reported earlier27 are due to the correction of various errors in the original work and, in the case of AM1, to new parameters being made available. No mixed parameter sets were involved in this work. A systematic error was found in the earlier work; it is corrected in Table V presented here.
The predictive accuracy of PM3 for the elements reported here is likely to be lower than that of the surveys given here for cadmium and bismuth, but likely to be much better for tin, selenium, and germanium. Of course, the predictive power of the current set of elements could be determined by extending PM3 to compounds containing elements which have not yet been parameterized.
At present there is no easy way to partition errors between those resulting from theoretical limitations in the computational method and those resulting from experimental inaccuracies. However, computational errors tend to be systematic, whereas experimental errors tend to be random. By making post hoc corrections to the computed values, very high accuracy can be obtained. Thus for the n-alkanes with more than three carbon atoms, adding a correction of (-3.5 + 0.56n) kcal/mol, where n is the number of carbon atoms, to the PM3 ΔHf gives predicted values within 0.1 kcal/mol of experiment.
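The correction can be applied as below. The second function sketches how such a systematic correction could be derived, by an ordinary least-squares fit of the residuals (experiment minus PM3) against the number of carbon atoms; this is a generic technique offered for illustration, not a procedure stated in the text, and the residuals in the usage example are synthetic.

```python
def alkane_correction(n_carbon):
    """Post hoc correction (kcal/mol) added to the PM3 ΔHf of an n-alkane with n > 3."""
    if n_carbon <= 3:
        raise ValueError("correction stated only for n-alkanes with more than three carbons")
    return -3.5 + 0.56 * n_carbon

def fit_linear_correction(ns, residuals):
    """Least-squares fit of residual = a + b*n, where residual = experiment - PM3."""
    m = len(ns)
    mean_n = sum(ns) / m
    mean_r = sum(residuals) / m
    sxx = sum((n - mean_n) ** 2 for n in ns)
    sxy = sum((n - mean_n) * (r - mean_r) for n, r in zip(ns, residuals))
    b = sxy / sxx
    return mean_r - b * mean_n, b

# Synthetic residuals lying exactly on the stated correction line
ns = [4, 5, 6, 7, 8]
a, b = fit_linear_correction(ns, [alkane_correction(n) for n in ns])
print(round(a, 2), round(b, 2))  # -3.5 0.56
```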

CONCLUSION
The PM3 method has been extended to a total of 28 elements with only minor loss of accuracy in geometry prediction. PM3 is based upon the NDDO approximation, as are MNDO and AM1. In PM3, the NDDO approximation has been shown to be extremely robust, able to represent the chemistry of a large number of elements involved in a wide spectrum of bonds, ranging from purely covalent, through highly ionic and the nonclassical bonds found in cyclopentadienyl complexes, to dative or donor-acceptor bonds.
In addition, PM3 has shown that the NDDO approximation can accommodate a wide range of oxidation states, from -3 in, e.g., ammonia to +6 in such hypervalent systems as H2SO4 and TeF6 (+5 for phosphorus in H3PO4).
The range of errors in ΔHf is smaller than that found for the organic elements. This may be due to the limited number of reference data available; if so, then, as more data become available, some highly inaccurate predictions are likely to be revealed. In particular, for Cd and Bi, the lack of data has resulted in a spuriously high accuracy, which will certainly drop as more data are generated.
The parameters for the sixteen elements reported here were optimized simultaneously, but not in one large optimization of the type used for the first twelve elements. Instead, each parameter set was optimized individually, using the previously optimized parameters and the current set of parameters for the other elements reported here. Although advocated earlier, a general optimization of all parameters for all elements being studied proved inefficient relative to the cyclic optimization used here. The main limitation of a general optimization was that a single faulty data set could rapidly corrupt the parameters for many elements, a disaster that is avoided when serial or cyclic optimization is used. Irrespective of which scheme is used, the final parameters should be the same.
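The claim that the scheme should not change the final parameters can be illustrated with an entirely hypothetical least-squares toy: two "elements" with one parameter each, one reference compound per element, and one mixed compound whose value depends on both parameters. Starting from the single-element fits (the first step described under Method), cyclic re-optimization of one parameter at a time converges to the same minimum as solving for both at once. This holds because the toy SSQ has a unique minimum; where the data do not define the minimum uniquely, as for cadmium, different schemes could end at different parameter sets.

```python
# Toy SSQ(a, b) = (a - d1)**2 + (b - d2)**2 + (a + b - d3)**2  (hypothetical)
def ssq(a, b, d1, d2, d3):
    return (a - d1) ** 2 + (b - d2) ** 2 + (a + b - d3) ** 2

def cyclic_optimize(d1, d2, d3, sweeps=50):
    a, b = d1, d2  # step 1: each parameter fitted to its own compound only
    for _ in range(sweeps):
        a = (d1 + d3 - b) / 2.0  # exact minimizer of ssq over a with b held fixed
        b = (d2 + d3 - a) / 2.0  # exact minimizer of ssq over b with a held fixed
    return a, b

def joint_optimize(d1, d2, d3):
    # Normal equations of the joint fit: 2a + b = d1 + d3 and a + 2b = d2 + d3
    r1, r2 = d1 + d3, d2 + d3
    return (2 * r1 - r2) / 3.0, (2 * r2 - r1) / 3.0

print(cyclic_optimize(1.0, 2.0, 4.0))  # both calls give values close to (4/3, 7/3)
print(joint_optimize(1.0, 2.0, 4.0))
```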