Excitation Energies of Canonical Nucleobases Computed by Multiconfigurational Perturbation Theories

In this computational work, we assessed the performance of ab initio multireference (MR) methods for the calculation of vertical excitation energies of five nucleobases: adenine, guanine, cytosine, thymine and uracil. In total, we have studied 38 singlet and 30 triplet excited states. Where possible we used the multireference configuration interaction (MRCI) method as a reference for various flavors of multireference perturbation theory to second order. In particular, we have benchmarked CASPT2, NEVPT2 and XMCQDPT2. For CASPT2, we have analyzed the single‐state, multistate (MS) and extended MS variants. In addition, we have assessed the effect of the ionization potential electron affinity (IPEA) shift. For NEVPT2, we have used the partially and the strongly contracted variants. Further, we have tested the commonly used RI‐CC2, RI‐ADC2 and EOM‐CCSD methods. Generally, we observe the following trends for singlet excited states: NEVPT2 is the closest MR method to MRCISD+Q, closely followed by CASPT2 with the default IPEA shift. The same trend is observed for triplet states, although NEVPT2 and CASPT2‐IPEA are getting closer. Interestingly, the n, π* singlet excited states were described more accurately than π, π* excited states, while for triplet states the trend is inverted except for NEVPT2. This work is an important benchmark for future photochemical investigations.


INTRODUCTION
The five canonical nucleobases (NBs) (adenine, cytosine, guanine, thymine and uracil) (Fig. 1) are the essential building blocks of DNA and RNA. They absorb UV radiation below 300 nm, which can trigger a photochemical process that, eventually, can lead to structural damage. Fortunately, the NBs have very high photostability due to efficient ultrafast radiationless relaxation mechanisms to the ground state that prevents destructive photochemical reactions (1). The manifestation of these protective mechanisms is found experimentally in ultrashort excitedstate lifetimes. To better understand these mechanisms, their photophysics and photochemistry have been the subject of extensive theoretical studies that have been recently summarized in a review (2). The molecular basis of the photoprotection mechanisms, attributed to natural selection and evolution (3), can be explained by the presence of conical intersections (CIs) that are topological features of potential energy surfaces (PESs) that allow radiationless decay. It is important to emphasize that the existence of CIs does not guarantee a radiationless decay, because the presence of an energy barrier on the PES can prevent the access of a CI. However, if an efficient radiationless decay is feasible, it eliminates harmful photochemical processes that could lead to DNA photolesions.
The accurate description of the photochemical pathway starts with the NB being promoted from the ground state to an excited-state PES. Hence, a precise calculation of the relative position of the excited states and oscillator strengths is required for understanding the primary photochemical events in this important class of biomolecules. Therefore, the vertical excitation energies of the five NBs have been assessed for a broad range of electronic structure methods (4)(5)(6)(7). In addition, computational studies require the localization of stationary points on the PESs and the determination of relaxation pathways connecting the critical points and passing through regions of degeneracy between singlet states, or states of the same spin multiplicity in general, the so-called conical intersections. Recent studies suggest that also valence excited triplet states take part in the photochemical mechanisms of NBs (8)(9)(10). Therefore, the relative position of triplet states must be considered as well in the calculations. Through the intersystem crossings, regions of degeneracies between states of different spin multiplicities (that is singlet and triplet states), they might be also involved in the photochemistry. Hence, the description is highly demanding and requires quantum chemical methods able to recover not only significant amounts of static and dynamic electron correlation, but also to describe PESs near degeneracies (11). These requirements are met by multireference methods (MR). In contrast to single-reference methods (SR) that rely on a single reference function, multireference methods are inherently more flexible as they are based on a multiconfigurational wave function. These multiconfigurational methods allow a balanced description of the valence excited states of different nature, for example p, p*and n, p* electronic states, PESs, state degeneracies and minimum energy paths connecting the most relevant structures (12)(13)(14).
Multireference methods can be divided into three major categories: (i) multireference perturbation theory (MRPT) (15), (ii) multireference configuration interaction (MRCI) (11) and (iii) multireference coupled-cluster (16) methods. Despite significant progress in methodological development (17,18) and substantial hardware acceleration, the MRPT family of methods including perturbation up to second order (MRPT2) is dominating the computational photochemistry studies of NBs due to its accuracy at a favorable computational cost. A fourth category was put forward by Li Manni et al. (19), who proposed to add some amount of density functional correlation to the CASSCF (20) (complete active-space self-consistent field) wave function. This method is called multiconfiguration pair-density functional theory. An alternative approach for combining CASSCF wave functions and DFT theory was presented recently by Hubert et al. (21) (long-range CASSCF short-range DFT), with promising results for computing excitation energies of NBs. However, none of these recent methods is available in commonly available packages yet.
The complete active-space second-order perturbation theory (CASPT2) (20,(22)(23)(24) is the most widely used MRPT2 method and considered as the most successful method for computing excited state properties of medium to large system sizes, including NBs. The so-called CASPT2//CASSCF protocol has been applied successfully to investigate the photochemistry of canonical NBs and derivatives, as reviewed recently by several authors (25)(26)(27). The CASPT2 method is based on a zeroth-order CASSCF reference wave function. Examples of the early applications of the CASPT2 method to investigate the spectroscopy of organic (including NBs) and inorganic compounds are documented in the literature (24,28,29), from which it becomes apparent that the accuracy of the computed excitation energies is 0.1-0.2 eV, at a moderate computational cost. The original formulation of the CASPT2 theory (hereafter called single-state CASPT2, SS-CASPT2) uses internally contracted configuration spaces spanning the first-order interaction space of the reference wave function exactly. The advantage is that large CASSCF configuration expansions can be used as reference wave functions. The disadvantage is that internally contracted configurations are state-specific and introduce errors, for instance, in the region of avoided crossings and when valence-Rydberg mixing occurs at the CASSCF level. The solution proposed by Roos and coworkers (30) to deal with these problematic situations involves the coupling of several electronic states at second order via an effective Hamiltonian approach, known as the multistate (MS) CASPT2 method. The presence of intruder states is another frequent problem encountered during CASPT2 calculations, which is remedied with the imaginary level-shift technique proposed by Forsberg and Malmqvist (31). Another drawback of the CASPT2 method is the unbalanced description of closed-shell systems in comparison with unpaired electrons (32). To overcome this problem, Ghigo et al. (33) proposed a modified definition of the zeroth-order Hamiltonian employed in the CASPT2 method, coined as the ionization potential electron affinity (IPEA) shift. The recommended value for the correction was optimized with a benchmark set of 49 diatomic molecules. The computations were performed with the SS-CASPT2 variant, ANO-RCC basis sets and including scalar relativistic effects via the Douglas-Kroll Hamiltonian. Based on the fitting of the benchmark against experimental quantities, they obtained a value of 0.25 Hartree for the shift. This IPEA shift (0.25 Hartree) became the default value even if the studied molecule is very different from those included in the benchmark set. Recently, the generalized use of this IPEA parameter has been questioned in several studies (34)(35)(36).
The n-electron valence state perturbation theory (NEVPT), developed by Angeli, Cimiraglia and collaborators (37)(38)(39)(40)(41)(42)(43), is another multiconfigurational perturbation theory based on a zeroth-order CASSCF wave function. The main difference between the CASPT and NEVPT is the choice of the zeroth-order Hamiltonian, which is the bielectronic Dyall Hamiltonian (44) in the case of NEVPT. This choice of the zeroth-order Hamiltonian provides several advantages: (i) It is size consistent and (ii) free from intruder states. Therefore, there are less parameters and shifts to be used in comparison with the CASPT2 method. The second-order NEVPT treatment can recover the most relevant portion of the dynamic electron correlation effects and is abbreviated by the NEVPT2 acronym.
Besides the approaches mentioned above, the XMCQDPT2 (extended multiconfiguration quasi-degenerate perturbation theory) has been developed by Granovsky (45). The XMCDQPT2 theory employs a zeroth-order Hamiltonian (H 0 ) invariant with respect to unitary transformation of the model space, in contrast to the one considered in the MS-CASPT2 theory. Due to the choice of H 0 , the XMCQDPT2 method is also invariant and the perturbations are taken as the true two-particle operator. As proposed by Granovsky, the XMCQDPT2 is equivalent to the MR-MP2 theory for a SR case, stable with respect to the model space, and does not include artificially large off-diagonal elements, making it more efficient for the treatment of conical intersections and avoided crossing regions than the above-mentioned methods. In XMCQDPT2, intruder state problems can be handled with the intruder state avoidance (ISA) technique (46). It is interesting to point out that with a proper selection of the shift parameter the ISA and imaginary shift technique can be identical. Recently, this extension was adapted to the CASPT2 approach by Werner and coworkers (47). The resulting extended multistate (XMS) CASPT2 method was recently implemented in MOLCAS 8 by Malmqvist. The main difference of the XMS-CASPT2 method with respect to the XMCQDPT2 method is the internal contraction, preserving some of the later properties as, for instance, the invariance under unitary transformations of the model space; in the case of the former method, intruder states can be treated with the imaginary level-shift technique proposed by Forsberg and Malmqvist (31). In addition to the well-established MRPT2 methods, there has been recent development of new emerging approaches to add dynamic correlation effects to multiconfigurational wave functions. Among them are the Van Vleck perturbation theory (48), the canonical transformation theory (49,50), the GASPT2 (second-order perturbation theory for generalized active-space selfconsistent-field wave functions) (51), the SplitGAS (52,53) and new developments of state-specific MRPT (54). In addition, efficient approximations have been introduced to reduce the computational cost and provide favorable scalability, for example the frozen natural orbital complete active-space second-order perturbation theory method (FNO-CASPT2) (55) and the domainbased local pair natural orbital theory applied to the NEVPT2 method (56).
There is a growing interest in the assessment of multireference methods for an accurate description of excited-state energies in organic molecules, including DNA and RNA building blocks. The work by Thiel and coworkers (4) has been considered as a reference work. In that paper, the singlet and triplet excitation energies and one-electron properties of a set of organic molecules were computed at CASPT2 level and a hierarchy of coupled-cluster methods (CC2, CCSD and CC3). The most accurate method in this benchmark is CC3, an iterative coupled-cluster approach including connected triples (57), which was used as reference instead of experimental values. The choice is motivated by the fact that most spectra are recorded in solution, that different peaks might overlap and that vertical excitation energies do not correspond to absorption maxima, making it difficult to directly compare experimental values to the ones obtained with quantum chemical methods. However, the CC3 method is too expensive for the calculation of NBs. Hence, the benchmark study by Thiel and coworkers (4) is based on values obtained at the CASPT2 level, including only singlet states. Recently, one of the present authors (6) benchmarked the NEVPT2 method for the calculation of excitation energies of the same set of molecules. However, it is not clear how the different MRPT2 methods compare among themselves, because a reference value is missing. To resolve this problem, we have used internally contracted multireference configuration interaction singles and doubles (MRCISD) with the Davidson correction (+Q) as a reference, if feasible. The MRCISD+Q method is based on the same reference wave function as the MRPT2 methods but is expected to be the most accurate method in this work (58). This makes it an ideal reference to assess which is the most accurate MRPT2 method for NBs. The focus of our present work is to compare the performance of the above-mentioned MRPT2 methods: SS-CASPT2 and MS-CASPT2 with and without IPEA shift correction, XMS-CASPT2, NEVPT2 and XMCQDPT2 methods for the calculation of vertical excitation energies and oscillator strengths of all five NBs. To ensure compatibility, we employ the same geometry and basis sets in all cases, allowing us to derive our own set of reference data, without resorting to experimental results. In addition, we have extended the set of excitation energies by computing triplet excited states.
To complement the multireference methods, we have also computed the excitation energies using accurate SR methods that are frequently used in computational photochemistry studies. In particular, we have tested the second-order approximate coupledcluster (59) (CC2) and algebraic diagrammatic construction to second-order (60,61) (ADC(2)) methods, owing to high accuracy at low computational demand, which holds especially for their resolution-of-the-identity (RI) variants RI-CC2 (62,63) and RI-ADC(2) (64). The results obtained with these methods will therefore also be compared.

METHODOLOGY
All calculations in this work were carried out consistently with the benchmark reported by Thiel and coworkers (4), that is with the fully optimized contracted Gaussian basis sets of triple-f valence quality (TZVP), developed by Ahlrichs and coworkers (65), and geometries optimized at MP2/6-31G* level of theory with frozen core orbitals. The geometry of guanine is missing in the Thiel benchmark; therefore, we have optimized it at the same level of theory employing ORCA software (66). The coordinates for this optimized geometry are given in Table S1. The five NBs included in this work are in their most relevant biological form. The C s point group symmetry was imposed in all calculations, computing each multiplicity and symmetry block separately, for consistency with previous results (4); in the C s point group symmetry, the pp Ã states belong to the A 0 irreducible representation and the np Ã to the A 00 irreducible representation. The CASSCF wave functions were computed employing the state-averaged procedure for computing several states of the same spatial and spin symmetry at the same time. The excitation energies for the n; p Ã -type states were obtained using the ground-state energy computed with the same active space employed for the calculation of the corresponding excited states.
Calculation of excitation energies. The singlet excitation energies were computed employing the same active space and number of states reported by Thiel and coworkers (4), except for guanine which was not considered in the original work. In the case of guanine, the active space encompasses all p and p Ã orbitals, as well as the n orbitals, including in the calculation a total of five 1 A 0 ( 1 ðp; p Ã Þ) and four 1 A 00 ( 1 ðn; p Ã Þ) states. As to the triplets, three 3 A 0 ( 3 ðp; p Ã Þ) and three 3 A 00 ( 3 ðn; p Ã Þ) excited states were computed, with the same active space used for the calculations of the singlet excited states. The employed active spaces for all NBs were visualized combining grid viewer (GV) (67) and POV-Ray (68). Further details can be found in the corresponding sections.
Internally contracted MRCI calculations (69)(70)(71) were carried out with the MOLPRO 2015.1 software (72,73), with the default values. The relaxed Davidson correction is employed to obtain the MRCISD+Q energies (74). Note that MRCI calculations were performed for the pyrimidine NBs (i.e. cytosine, thymine and uracil) only, because the active spaces employed for the purine NBs (i.e. adenine and guanine) resulted in very large configuration interaction expansions.
CASPT2 excitation energies were computed for single-state (SS-CASPT2) and MS (MS-CASPT2) treatments, with and without the default IPEA (33) shift correction (0.25 and 0.00 Hartree, hereafter coined as IPEA(0.25) and IPEA(0.00), respectively). Intruder states were treated with an imaginary level-shift correction (31) of 0.2 atomic units (au). The use of the imaginary level shift is the recommended approach in MOLCAS. However, in contrast Thiel and coworkers employed the real level-shift technique in the original benchmark which might lead to small differences in the excitation energies between the original benchmark and the present work. The extended MS (XMS-CASPT2) calculations were carried out without IPEA shift correction (IPEA = 0.00), but with an imaginary level shift of 0.2 au. The CASPT2 calculations were performed with the MOLCAS8 software (75) and the XMS-CASPT2 calculations were carried out with a developers' version of the same program.
NEVPT2 calculations were carried out with the ORCA software (66). Although different spatial and/or spin symmetry blocks can be averaged with ORCA, they were computed separately, as described above, to guarantee full compatibility with the results reported by Thiel and coworkers (4) and other results obtained in this contribution. It is also worth mentioning that the results reported here based on ORCA calculations refer to the strongly contracted (SC) NEVPT2 variant (38,39); all calculations were made without truncation of the three-and four-body density matrices (nef d3tpre ¼ 0:00 and nev tpre ¼ 0:00 options in ORCA). Doubly occupied and virtual orbital energies in the Dyall Hamiltonian (44) were obtained with a state-specific Fock operator (nev_canonstep 1 option in ORCA). Furthermore, the CASSCF energy and orbital gradient convergence thresholds were set to 10 À9 and 10 À5 Hartree, respectively, and the augmented Hessian Newton-Raphson step threshold set to 10 À9 Hartree; as stated previously, the NEVPT2 approach is free from intruder states problems, so no level-shift technique is required. Also partially contracted (PC)-NEVPT2 excitation energies were computed employing MOLPRO. The lower degree of contraction should, in principle, yield more accurate energies than the SC variant. To ensure consistency, we have compared the SC-NEVPT2 excitation energies from MOLPRO and ORCA that were found to be identical. However, we have encountered some difficulties for the calculation of the reference energy for the excitation energies of ðn; p Ã Þ states. These calculations required the computation of the ground-state energy with the same active space employed for computing the energies of the ðn; p Ã Þ excited states, that is with the inclusion of ntype orbitals in the active space, which is difficult because their occupation number was close to 2.0. For this purpose, we have frozen one or two orbitals in the active space in the MOLPRO calculations.
XMCQDPT2 energies were computed using FIREFLY software (version 8.0.1, build number 8540) (76), which is partially based on the GAMESS (US) source code (77). In these calculations, the ISA shift was set to 0.02 au to avoid intruder states. The CASSCF step was carried out employing the graphical unitary group CSF (configuration state function) approach, considering the calculation converged when the maximum permissible asymmetry in the Lagrangian matrix was less than 1.0 9 10 À5 au and the energy change smaller than 1.0 9 10 À8 atomic units. The XMCQDPT2 calculations were carried out employing the recommended thresholds for numerical cutoffs. Inclusion of n-type orbitals in the active space for the ground-state calculations leads to orbital rotation as reported above for the ORCA calculations. In order to obtain this energy, we have decided to approximate the state-specific solution with an extrapolation over a series of state-averaged solutions without using point group symmetry. For uracil, thymine and cytosine, the S 0 , S 1 , S 2 and S 3 and for guanine the S 0 , S 6 , S 7 and S 8 states were initially taken with equal weights, and then, the weights of the additional excited states were successively decreased by a factor of two. The series of the obtained CASSCF and XMCQDPT2 energies were quadratically extrapolated to a weight of zero for the excited states (Figs. S1 and S2). In a similar manner, but with employing C s point group symmetry, we obtained the energies for the three lowest lying 3 A 00 states of guanine, but only one additional state, that is 4 1 (np*), was taken into account ( Fig. S3).
Linear-response RI-CC2 (59,62,63) and RI-ADC2 (60,61,64) calculations were performed with the Turbomole 7.0 software (78,79), with frozen core orbitals, and the default TZVP auxiliary basis sets. The SCF convergence criterion was set to 1.0 9 10 À8 au; otherwise, default values were used. The reported excitation energies are in general agreement, that is AE0.01 eV, with previously published results for ADC(2) and CC2 calculations using different programs without RI-approximation, see Refs. (7) and (80), respectively. The only larger deviations are found for the ADC(2) calculations of the 4 1 (n, p*) states for uracil and thymine with deviations of 0.05 and 0.06 eV, respectively. EOM-CCSD calculations were realized with Gaussian 09 (81) using the default values for convergence criteria. It is known that this method yields the same excitation energies as linear-response CCSD calculations, whereas differences in the oscillator strength between these methods exist (82). In the following, we only report the excited states of the SR methods that mainly exhibit the right excitation character, that is p; p Ã for A 0 symmetry and n; p Ã for A 00 .
Although multireference calculations can take advantage of computational procedures for speeding up integral calculations, as for instance the RI (resolution of identity) and Cholesky decomposition, they were not employed in this work. However, the effect of such approximations in multiconfigurational calculations carried out with MOLCAS can be verified in the results published in the literature (83,84), which indicate that in the case of Cholesky decomposition with the default decomposition threshold (10 À4 au) the mean error is about 0.001 eV, and smaller for larger basis sets.
Calculation of oscillator strengths. In general, the oscillator strength (f) in dipole form is given as: (85) This equation shows the two components of the oscillator strength: DE ij is the excitation energy for a transition from state i to j in atomic units and hw i jljw j i is the electronic transition dipole moment in the length representation. It is not possible for all methods to calculate the oscillator strength at the same level of theory as the excitation energies because the transition densities might not be available, in particular for MRPT2 methods. Hence, in some cases the oscillator strength is computed with excitation energies at one level of theory and transition dipole moments at a lower level of theory.
For the SR methods CC2, ADC(2) and EOM-CCSD, the oscillator strengths were computed from all entities at the same level of theory. This also holds for the MRCISD calculations in MOLPRO. However, the Davidson correction (+Q) is only applied to the energies and is not affecting the wave function; hence, it is not considered in the calculation of transition densities. At the SS-CASPT2 level, the oscillator strengths were computed with CASSCF transition densities and the corresponding SS-CASPT2 excitation energies. The same procedure is also applied to calculate NEVPT2 oscillator strengths. However, at the MS-CASPT2 level, the oscillator strengths were obtained by combining the electronic transition dipole moments computed with the perturbatively modified wave function and the MS-CASPT2 energies, using the CAS state interaction (CASSI) method (86). The XMS-CASPT2 as well as the XMCDPT2 oscillator strengths are computed following the same principle.

Excitation energies
Uracil. The discussion starts with the uracil pyrimidine NB. Its active space comprises all p-orbitals, that is 10 electrons in 8 orbitals for p ! p Ã excitations. In addition two occupied lone pairs, n-type orbitals, localized on the oxygen atoms are taken into account for description of the n; p Ã states, resulting in an active space of 14 electrons in 10 orbitals (CAS(14,10)) ( Fig. 2).
Uracil's excitation energies are listed in Table 1. Starting with the 1 ðp; p Ã Þ states, it is observed that the relaxed Davidson correction leads to a lowering of the excitation energies when going from MRCISD to MRCISD+Q. For the 3 ðp; p Ã Þ states, this correction increases the excitation energies, whereas it decreases them again for both singlet and triplet n; p Ã -type states. In the following, the MRCISD+Q values are taken as the reference.
Strongly contracted NEVPT2 underestimates excitation energies for the 1 ðp; p Ã Þ states, but it is the best performing multireference method for these type of excitations. Surprisingly, the second best method is the PC NEVPT2 with slightly lower excitation energies. For the 3 ðp; p Ã Þ states, excitation energies are overestimated for both NEVPT2 methods and they perform rather similarly. In contrast, for the electronic states with n; p Ã character, the results for the strongly contracted variant scatter around the reference values, so on average the method does neither overestimate nor underestimate the excitation energies, whereas the PC method underestimates excitation energies. Finally, the maximum absolute deviation of 0.42 eV found with strongly contracted NEVPT2 for uracil is the smallest of all multireference methods.
Turning to the CASPT2 results, it is worth mentioning that the difference between SS and MS calculations is relatively small with excitation energies differing less than 0.2 eV. However, the use of the IPEA shift leads to an upward shift of excitation energies of at least 0.23 eV and up to 0.63 eV. The IPEA(0.00) excitation energies are found to be too low for all types of excited states investigated, and therefore, the use of the IPEA shift increases agreement with the reference value. In the case of 3 ðp; p Ã Þ states, the IPEA(0.25) CASPT2 calculations scatter around the reference values, resulting in the most accurate results for these excited states. For the other excited states, that is 1 ðp; p Ã Þ, 1 ðn; p Ã Þ and 3 ðn; p Ã Þ, the excited-state energies are still too low.
The performance of XMS-CASPT2 and XMCQDPT2 is close to each other. In the case of 1 ðp; p Ã Þ states, their excitation energies are similar to the IPEA(0.00) CASPT2 results with no clear trend and similar deviations from the reference values. The excitation energies for the 3 ðp; p Ã Þ states are systematically higher than for the IPEA(0.00) CASPT2 calculations and their deviations from the reference values are slightly smaller than for NEVPT2. However, for the excited states with n; p Ã character, the two methods severely underestimate the excitation energies by at least 0.62 eV and their mean absolute deviations of around 1.00 eV are the highest of all employed methods.
For the SR methods, we find that EOM-CCSD yields excitation energies for 1 ðp; p Ã Þ and 1 ðn; p Ã Þ states that are the closest to the reference values and they spread in both directions. The deviations of its excitation energies for the 3 ðp; p Ã Þ states are similar to those of IPEA(0.00) CASPT2 and for the 3 ðn; p Ã Þ states, the excitation energies scatter more widely around the reference values than for strongly contracted NEVPT2, whereas the mean deviation is similar. For RI-CC2, the deviations for p; p Ã excitation energies are smaller than for n; p Ã . For the former, the differences to the reference are even slightly smaller than for strongly contracted NEVPT2, whereas for the latter, its performance is placed between IPEA(0.25) and IPEA(0.00) CASPT2 calculations. This trend of different performance for different transitions also holds for RI-ADC(2), whereas its general accuracy is lower than that observed with the RI-CC2 method. For 3 ðp; p Ã Þ, 1 ðn; p Ã Þ and 3 ðn; p Ã Þ states, its performance is close to the IPEA(0.00) CASPT2 results; for 1 ðp; p Ã Þ, it is even slightly closer to the reference values than the IPEA(0.25) CASPT2 calculations. Note that the mean deviation of this method for the 3 ðp; p Ã Þ states is similar to the ones for EOM-CCSD. Overall and to sum up the discussion for uracil, NEVPT2 is the best performing multireference method and there is a clear trend in accuracy for the SR methods: EOM-CCSD > RI-CC2 > RI-ADC(2) with the exception of 1 ðp; p Ã Þ states for which EOM-CCSD performs only as good as RI-ADC (2).
Regarding the state ordering, all methods agree that the lowest singlet state is of n; p Ã character, followed by a p; p Ã -type bright state. However, the difference in energies for the lowest excited state of both symmetries depends strongly on the method used for the calculation and ranges from 0.1 eV for SS-and MS-CASPT2 without IPEA shift to 0.85 eV for XMS-CASPT2 and XMCQDPT2. For MRCISD+Q, this difference is 0.71 eV. It is best reproduced by XMS-CASPT2 and XMCQDPT2 for multireference methods and for SR methods by RI-ADC(2) followed by RI-CC2 and EOM-CCSD. For the other methods, that is SSand MS-CASPT2 with and without IPEA shift and NEVPT2 in both variants, this difference is underestimated. In addition, the lowest lying excited triplet state of both symmetries exhibits excitation energies that are lower than the first bright transition for all methods employed, but the energy difference also depends on the actual method. Therefore, the various benchmarked methods agree in the ordering of the excited states, but energy differences between the states vary strongly, which might influence the computational studies of the ultrafast deactivation mechanism.
Thymine. The replacement of uracil's hydrogen that is bound to the carbon atom in position five of the ring by a methyl group results in the thymine NB (see Fig. 1). Its active space is similar to uracil, with one additional occupied p-type orbital, that is 12 electrons in 9 orbitals for pp Ã and 16 electrons in 11 orbitals for np Ã -transitions (Fig. 3).
As can be seen in Table 2, its excitation energies are similar to uracil. In general, singlet and triplet excitation energies for n; p Ã states are slightly higher compared to uracil, while the opposite is observed for the 3 ðp; p Ã Þ states, whereas no clear-cut trend is found for the 1 ðp; p Ã Þ states. As for uracil, the Davidson correction for the MRCISD calculation leads to higher excitation energies for the 3 ðp; p Ã Þ states and otherwise lowers them. Again, strongly contracted NEVPT2 is the best performing multireference method for the 1 ðp; p Ã Þ, 1 ðn; p Ã Þ and 3 ðn; p Ã Þ states; the influence of the IPEA shift is larger than the use of SS-or MS-CASPT2 calculations; IPEA(0.25) CASPT2 yields the smallest deviations for the 3 A 0 states; XMS-CASPT2 and XMCQDPT2 perform similarly by underestimating excitation energies for n; p Ã states significantly; the accuracy of the SR methods decreases from EOM-CCSD over RI-CC2 to RI-ADC(2) with the exception of 3 ðp; p Ã Þ for which RI-CC2 is the closest to the reference and EOM-CCSD and RI-ADC(2) perform about equally well. Also the state ordering is similar to uracil, that is the fact that a dark 1 ðn; p Ã Þ state lies below the first bright 1 ðp; p Ã Þ transition holds for most of the methods, with the exceptions of IPEA(0.00) MS- and SS-CASPT2 and PC NEVPT2 and that the lowest lying triplet state of both symmetries exhibits lower excitation energies than the first bright singlet transition. Overall, performance of the methods with regard to the MRCISD+Q reference and the ordering of the lowest energy states are comparable to that observed in uracil.
Cytosine. Cytosine is related to uracil by substitution of the oxygen atom in position four with an amino group and adding a double bond between the carbon atom connected to this group and the nitrogen atom in position 4. It has the same number of double bonds and occupied p-orbitals as uracil and, therefore, their p-systems have the same size (10 electrons in 8 orbitals) (Fig. 4). For describing the n; p Ã excited states, the lone pairs belonging to the oxygen and nitrogen atoms have to be taken into account resulting in an active space of 14 electrons in 10 orbitals, which is the same size as the corresponding active space for uracil.
The computed excitation energies for cytosine are displayed in Table 3. Going from the MRCISD to the MRCISD+Q excitation energies, we observe that the different classes of cytosine excited states exhibit the same trends reported previously for uracil and Table 1. Vertical excitation energies DE (eV) for the lowest lying excited states of uracil obtained at the ground-state MP2/6-31G* geometry and TZVP basis sets. The MRCISD+Q energies are taken as the reference values (in bold). Highest root included the following: five 1 A 0 ( 1 ðpp Ã Þ), four 1 A 00 ( 1 ðn; p Ã Þ), three 3 A 0 ( 3 ðpp Ã Þ) and three 3   thymine. Also the performance of various methods for the different classes of transitions follows roughly the same general findings as for uracil and thymine. Strongly contracted NEVPT2 is the multireference method for the 1 ðp; p Ã Þ states that performs best, but for cytosine it is similar to the IPEA(0.25) CASPT2 calculations for the other states. The PC NEVPT2 method is the multireference approach closest to the MRCISD+Q reference for the triplet states of both symmetries. The difference between SSand MS-CASPT2 calculations is still small compared to the influence of the IPEA shift, but it is more pronounced for cytosine than for uracil and thymine. The IPEA shift now alleviates the problem of the underestimated excitation energies for the 3 ðp; p Ã Þ, 1 ðn; p Ã Þ and 3 ðn; p Ã Þ states leading to smaller deviations from the reference values for the n; p Ã transitions. However, excitation energies of IPEA(0.25) CASPT2 are still systematically too low for the 1 ðp; p Ã Þ states. The XMS-CASPT2 and XMCQDPT2 methods give results similar to each other, again, performing slightly better for p; p Ã transitions than the IPEA(0.00) CASPT2 methods, but significantly worse for the n; p Ã transitions underestimating these excitation energies by around 0.85 eV.
Referring to the SR methods, the same general trend in accuracy observed previously is reproduced for cytosine. The EOM-CCSD method is the best performing SR approach for the 1 ðp; p Ã Þ, 1 ðn; p Ã Þ and 3 ðn; p Ã Þ states; the RI-CC2 method is about as accurate as strongly contracted NEVPT2 for the pp Ã transitions and is placed between IPEA(0.25) and IPEA(0.00) CASPT2 for n; p Ã excitation energies. As for the RI-ADC(2) method, the deviations observed for the 1 ðn; p Ã Þ and 3 ðn; p Ã Þ excitation energies are slightly worse than those obtained with the IPEA(0.00) CASPT2 method, but better than those computed with XMS-CASPT2 and XMCQDPT2. Furthermore, the 1 ðp; p Ã Þ excitation energies are similar to those obtained with the IPEA (0.25) CASPT2 approach, and the differences in excitation energies for the 3 ðp; p Ã Þ states fall in between those reported for the IPEA(0.25) and IPEA(0.00) CASPT2 methods. Finally, the deviations of EOM-CCSD from the reference for the 3 ðp; p Ã Þ states are similar to the ones of RI-ADC(2).
For cytosine, the p; p Ã electronic states have significantly lower excitation energies than for uracil and thymine. Therefore, the corresponding lowest singlet state is now lower in energy than the dark n; p Ã state. As before the difference in energy between the two states depends strongly on the quantum chemical method: The coupled-cluster methods yield differences that are too small and the multireference methods normally overestimate this value. Exceptions are found for XMS-CASPT2 and XMCQDPT2, for which, due to the underestimation of the energies for the n; p Ã states, the differences are too small. Furthermore, the two lowest lying 3 ðp; p Ã Þ states have now, in general, smaller excitation energies than the 3 ðn; p Ã Þ states, whereas for uracil and thymine the first 3 ðn; p Ã Þ state has excitation energies lying between the two lowest 3 ðp; p Ã Þ states. Finally, these two 3 ðp; p Ã Þ states are either below the excitation energy of the first bright singlet transition or relatively close to it, so that they might be of importance for a deactivation via intersystem crossing.
Adenine. The purine NBs adenine and guanine have active spaces significantly larger than those of the pyrimidine-derived NBs. To describe the p; p Ã adenine excited states, we have employed an active space with all p and p Ã electrons and orbitals, resulting in a total of 12 electrons in 10 orbitals (Fig. 5). As to the n; p Ã electronic states, it is necessary to add the lone pairs from the three nitrogen atoms to the active space, which then contains 18 electrons in 13 orbitals.
The described active spaces proved to be too large for MRCISD calculations, and therefore, the discussion below is based on the results obtained with the strongly contracted NEVPT2 method as a reference. The excitation energies are listed in Table 4. Taking strongly contracted NEVPT2 as the reference, mean absolute deviations are in general lower than for the smaller NBs, mainly caused by the different choice of the reference method. However, apart from that, similar conclusions are reached. PC NEVPT2 is rather close to the reference values, in particular for the triplets. For the CASPT2 results, one notices a Table 2. Vertical excitation energies ME (eV) for the lowest-lying excited states of thymine obtained at the ground state MP2/6-31G* geometry and TZVP basis sets. The MRCISD+Q energies are taken as the reference values (in bold). Highest root included: six 1 A 0 ( 1 (pp*)), four 1 A″( 1 (np*)), three 3 A 0 ( 3 (pp*)), and three 3 A″ ( 3 (np*)). Active spaces: A 0 : CAS (12,9); A″: CAS (16,11 higher impact on the excitation energies when the IPEA shift is used, than from variations between MS or SS calculations. The MS formulation improves the agreement with the reference value (NEVPT2) by about 0.10 eV for p; p Ã excitation energies, whereas the deviations for n; p Ã states are independent of this choice. As before, XMS-CASPT2 and XMCQDPT2 have mean absolute deviations for p; p Ã excitation energies lying between IPEA(0.25) and IPEA(0.00) CASPT2, while the excitation energies for n; p Ã states are underestimated. For the SR methods, RI-CC2 and EOM-CCSD show about the same deviations for the 1 ðp; p Ã Þ states. This is most likely caused by the change in the reference method as strongly contracted NEVPT2 systematically underestimates this kind of excitation energies for the pyrimidinebased NBs and the RI-CC2 excitation energies are found to be lower than the EOM-CCSD ones for these states. For the other classes of states, the same ordering in accuracy as before is found: EOM-CCSD > RI-CC2 > RI-ADC(2) with the exception of the 3 ðp; p Ã Þ states, for which EOM-CCSD is the worst performing SR method. Overall, EOM-CCSD is the method which agrees the best with the reference for the 1 ðp; p Ã Þ, 1 ðn; p Ã Þ and 3 ðn; p Ã Þ states. RI-CC2 performs similarly for the 1 ðp; p Ã Þ states and its deviations are otherwise close to the IPEA(0.25) CASPT2 calculations. RI-ADC(2) has slightly larger deviations for all kinds of excited states, with deviations closer to the IPEA(0.25) than to the IPEA(0.00) CASPT2 calculations. The state ordering is similar to cytosine in the sense that the lowest singlet excited state is of p; p Ã character and in general two triplet states of the same symmetry are lower in energy than this state. However, the difference in energy between the lowest excited state in each symmetry is now smaller than in  the case of cytosine and the first 3 ðn; p Ã Þ state lies either between the two lowest 3 ðp; p Ã Þ states or is at least close to the second one.
Guanine. In comparison with adenine, guanine has an even larger active space requiring 14 electrons distributed in 11 orbitals for the description of the p; p Ã valence electronic states (Fig. 6). As for adenine, three lone pairs have to be considered, one for each of the two nitrogen atoms and one localized on the oxygen atom. So the active space comprises 20 electrons in 14 orbitals to study n; p Ã electronic states. As can be seen in Table 5, the deviations of the methods from the SC-NEVPT2 reference exhibit similar trends as noted for adenine above. The largest difference in the trend is found for the RI-CC2 and RI-ADC(2) methods, specifically for states of n; p Ã character. The former has now mean absolute deviations that are larger than for the IPEA(0.25) CASPT2 calculations, and the latter mean deviations that are even larger than the IPEA (0.00) CASPT2 calculations. But for the p; p Ã excited states, both methods perform similar as for adenine. Regarding the state ordering, as it was found for adenine, the lowest singlet transition has p; p Ã character. Compared to adenine, the n; p Ã states are higher in energy, so that the energy difference between the lowest 1 ðn; p Ã Þ state and the lowest 1 ðp; p Ã Þ state is larger. As to the triplets, in all methods the two lowest lying states are of p; p Ã nature.

Oscillator strengths
The oscillator strengths computed at different levels of theory will be compared and discussed in this section. The excited state calculations were carried out using C s symmetry; therefore, there are no transition density matrices for states of different symmetries in the CASSCF calculations. For this reason, the oscillator strengths cannot be computed for vertical excitations between the ground state and the n, p* states.
The first vertical excitation of uracil has significant oscillator strength which means it is a spectroscopic state. Hence, the photoreaction path observed in uracil starts from a lowest lying 1 ðp; p Ã Þ excited state (2 1 (p, p*)), the so-called bright state (26). As can be seen in Table 6, all methods considered in this study predict the existence of a low-lying (2 1 (p, p*)) state with large oscillator strength. The sequence of excited states ordered by the absorption intensity, as predicted by the oscillator strengths values, is the same for all computational methods: 5 1 (p, p*) > 2 1 (p, p*) > 4 1 (p, p*) > 3 1 (p, p*). NEVPT2 and CASPT2 are in general very close to the MRCISD values. However, we observe that for MS-CASPT2 the oscillator strength is increasing for states with small intensity at SS-CASPT2 level. While for the brightest state the MS-CASPT2 oscillator strength is significantly decreasing. The largest deviations from MRCISD reference are found for the XMS-CASPT2 method. Surprisingly, the XMCQDPT2 method is performing significantly better. The analysis of the SR methods shows that the oscillator strengths for the lowest four states are in good agreement with the reference values. However, the value for the highest excited state is strongly underestimated.
The computed oscillator strengths for thymine are displayed in Table 7. As expected (26), the bright state, predicted at all levels of theory, is the lowest 1 ðp; p Ã Þ excited state in the Franck-Condon region. The computed decreasing order of the oscillator strength is also the same for all MR methods: 5 1 (p, p*) > 2 1 (p, p*) > 4 1 (p, p*) > 3 1 (p, p*) > 6 1 (p, p*). However, for the SR methods the 6 1 (p, p*) state has higher oscillator strength than 2 1 (p, p*). In addition, RI-ADC(2) is severely underestimating the strength of the 5 1 (p, p*) excitation. Otherwise, we observe the same trends for the methods as in the case of uracil. Table 8 displays the oscillator strengths computed for cytosine. All methods predict that the 4 1 (p, p*) electronic state carries most of the intensity at the Franck-Condon region. The order of intensity for two lowest lying 1 ðp; p Ã Þ states is reproduced for all the methods except for XMS-CASPT2, where the first excited state is more intense than the second one. Also the sixth state seems to be problematic. Only RI-CC2 can correctly reproduce the trend of MRCISD.
Adenine is known to exhibit two low-lying electronic states of p; p Ã nature (26), the 1 ðp; p Ã L a Þ and 1 ðp; p Ã L b Þ states, the former being higher in energy and related to the electronic transition bearing the largest intensity (oscillator strength). The relative energetic order of the 1 ðp; p Ã L a Þ and 1 ðp; p Ã L b Þ states in adenine is controversial; while ab initio methods predict the 1 ðp; p Ã L b Þ state to be lower in energy than the 1 ðp; p Ã L a Þ state, recent magnetic CD spectra assign the inverse order (L b < L a ), which is corroborated by time-dependent density functional calculations (87). The present study cannot be directly compared to the experiment because the calculations are carried out in the gas phase, while the experiment is carried out in the solvent. The solvation is expected to play an important role for the relative position of these two excited states because of their different dipole moments. Table 4. Vertical excitation energies DE (eV) for the lowest lying excited states of adenine obtained at the ground-state MP2/6-31G Ã geometry and TZVP basis sets. The SC-NEVPT2 energies are taken as the reference values (in bold). Highest root included the following: seven 1 A 0 ( 1 ðpp Ã Þ), four 1 A 00 ( 1 ðnp Ã Þ), three 3 A 0 ( 3 ðpp Ã Þ) and three 3 A 00 ( 3 ðnp Ã Þ). Active spaces: A 0 : CAS(12,10); A 00 : CAS (18,13 Figure 6. Guanine CAS (20,14) in terms of state-averaged valence natural orbitals. The active space was visualized using GV (67) and POV-Ray (68).
As MRCISD calculations were not feasible, we are using NEVPT2 oscillator strengths as reference values, which does not mean these are the most accurate values. However, as can be seen in Table 9, we observe similar trends as in the case of pyrimidine NBs: (i) SS-CASPT2 and XMCQDPT2 values are the closest to NEVPT2 among all MRPT2 methods, (ii) the order of MS-CASPT2 values is inverted with respect to SS-CASPT2, (iii) XMS-CASPT2 performs significantly inferior than XMCQDPT2.
We observe that NEVPT2 and both SS-CASPT2 and the XMCQDPT2 methods yield similar oscillator strength. For these methods, the first excited state has a high values, while the second one is almost zero. For MS and XMS-CASPT2 calculations as well as for the SR methods, the second excited state has a higher oscillator strength. The difference between the SS and MS methods can be connected to the difficulty of these methods in the description of the two lowest lying p; p Ã states due to their energetic proximity (Table 4): When the calculation is done averaging five singlet states, the lowest pair of p; p Ã electronic states does not interact so strongly, as compared to the case of seven states, and the right order of the two transitions is obtained. The SR methods predict the brightest state in agreement with NEVPT2; however, all the other states are in a different order. Table 5. Vertical excitation energies DE (eV) for the lowest lying excited states of guanine obtained at the ground-state MP2/6-31G Ã geometry and TZVP basis sets. The SC-NEVPT2 energies are taken as the reference values (in bold). Highest root included the following: five 1 A 0 ( 1 ðpp Ã Þ), four 1 A 00 ( 1 ðnp Ã Þ), three 3 A 0 ( 3 ðpp Ã Þ) and three 3 A 00 ( 3 ðnp Ã Þ). Active spaces: A 0 : CAS (14,11); A 00 : CAS (20,14 It is a consensus that the bright state of guanine is the 1 ðp; p Ã L a Þ state (26). However, the second low-lying state of p; p Ã character ( 1 ðp; p Ã L b Þ) also has a large oscillator strength and can be populated at higher excitation energies. But this trend can be inverted in solution-the biologically relevant environment-as predicted experimentally (88) and theoretically (89). As stated above for the case of adenine, a comparison of the calculated excitation energies to the measured absorption maxima is not possible.
As listed in Table 10, we observe as a trend, that all MR methods except XMCQDPT2 predict large oscillator strengths for the two lowest lying p; p Ã states, and the correct decreasing order (2 1 (p, p*) > 3 1 (p, p*)). However, XMCQDPT2 and the SR methods have the order inverted for these two states. The trend for the higher lying states (4 1 (p, p*) > 3 1 (p, p*))) is the same for all methods.

CONCLUSION
We have presented a benchmark of MRPT2 methods for vertical excited-state energies of five NBs. The present work is an important extension of the previous study of Thiel and coworkers by: (i) guanine extension of the four NBs; (ii) providing a new reference for pyrimidines, MRCISD+Q, that is based on the same zeroth-order wave function as the MRPT2 methods; (iii) complementing the set with triplet excitation energies; and (iv) including new MRPT2 variants such as XMS-CASPT2 and XMCQDPT2.
Among the MRPT2 methods, we find the following trends: For singlets, the best agreement with the MRCISD+Q reference values is obtained for NEVPT2 and it is worth noting that the strongly contracted variant performs slightly better than the PC one. CASPT2 calculations with the default IPEA shift follow closely. This trend is also observed for triplet states, but in this case, NEVPT2 in both variants and CASPT2-IPEA exhibit nearly the same accuracy. Also the use of SS or MS formulation does not have much influence on the excitation energies, but the latter method yields results a bit closer to the reference. In contrast, the use of the IPEA shift has the largest impact on the excitation energies: Employing this shift, the excitation energies increase, ameliorating the underestimation found for Table 8. Oscillator strengths for the electronic transitions from the ground-state to the lowest lying 1 ðpp Ã Þ excited states (A 0 symmetry) of cytosine, obtained at the ground-state MP2/6-31G Ã geometry and TZVP basis sets. Highest root included the following: six 1 A 0 , active space: 1 A 0 : CAS (10,8 the CASPT2 method. For ðp; p Ã Þ electronic states (A 0 symmetry), the performance of XMS-CASPT2 and XMCQDPT is slightly better than CASPT2 without IPEA shift. However, among all MR methods these two methods are the farthest away from the chosen reference in the case of states with ðn; p Ã Þ character (A 00 ) symmetry. Overall, the excited states of 1 ðn; p Ã Þ nature (A 00 symmetry) were described more accurately than 1 ðp; p Ã Þ states (A 0 symmetry), while for triplet states the trend is inverted except for NEVPT2. In contrast, the SR methods exhibit smaller deviations from the reference for states of p; p Ã -type (A 0 symmetry) compared to those of n; p Ã nature (A 00 ). The agreement with the reference increases from RI-ADC(2) to RI-CC2 to EOM-CCSD with the exception of the 3 ðp; p Ã Þ states. For singlets, EOM-CCSD performs slightly better than NEVPT2, RI-CC2 similar to MS-CASPT2 with IPEA shift and the accuracy of RI-ADC(2) lies between CASPT2 calculations with and without IPEA shift. For triplets, the accuracy of the SR methods decreases: The deviations found for RI-ADC(2) are now close to the CASPT2 calculations without IPEA shift and RI-CC2 exhibits errors lying between CASPT2 calculations with and without IPEA shift; for EOM-CCSD, its accuracy is close to RI-ADC (2) in case of 3 ðp; p Ã Þ states, whereas the 3 ðn; p Ã Þ states tend to be in similar good agreement with the reference as the excitation energies of the singlet states.
As to the oscillator strengths, there are clear trends for MRPT2 methods. NEVPT2 and SS-CASPT2 produce oscillator strengths that are close to the MRCISD reference values. There is a deviation between SS and MS variants, the latter is found to underestimate the oscillator strengths of the bright states and overestimate them for the dark states with respect to the calculated intensities. For the SR methods, we find a closer agreement to MRCI, which is due to the transition densities being calculated at the same level of theory as the excitation energies. In case of MRPT2 methods, these transition densities come from CASSCF calculations, which is less accurate. However, the order of bright states computed with SR methods is inverted in some cases when these states are close in energy and have comparable oscillator strengths.
We believe the present assessment of MRPT2 methods will serve as a reference in photochemical studies of the five NBs. Further, it will be important for photochemical studies of the reactivity that rely on a accurate description and the correct order of the excited states in the Franck-Condon region.
Acknowledgements-This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement 678169 "PhotoMutant"). A.C.B. and A.V.S.A. thank the Conselho Nacional de Desenvolvimento Cient ıfico e Tecnol ogico (CNPq) for research fellowships. These authors also want to thank STI Superintendência de Tecnologia da Informac ßão) of the University of São Paulo for services and computer time. Further, we would like to thank the Regional Computing Center of the University of Cologne (RRZK) for providing CPU time on the DFG-funded supercomputer CHEOPS and for their support. Dr. A. A. Granovsky is acknowledged for the assistance with FIREFLY calculations, and Dr. V. Veryazov is thanked for providing a modified version of the GV. The authors thank Dr. Per-Ake Malmqvist for providing the developers' version of MOLCAS with the XMS-CASPT2 implementation.

SUPPORTING INFORMATION
Additional Supporting Information may be found in the online version of this article: Figure S1. The quadratic extrapolation of the S 0 energy of cytosine, thymine, and uracil from 4 roots state-averaged calculation with CASSCF and XMCQDPT2 by decrease of the weight of the second, third, and fourth root. Figure S2. The quadratic extrapolation of the S 0 energy of guanine from 4 roots state-averaged calculation with CASSCF and XMCQDPT2 by decrease of the weight of the sixth, seventh, and eighth root. Figure S3. The quadratic extrapolation of the 1 3 A″, 2 3 A″, and 3 3 A″ energies of guanine from 4 roots state-averaged calculation with CASSCF and XMCQDPT2 by decrease of the weight of the fourth root. Table S1. Cartesian coordinates of guanine molecule.