Exact method for numerically analyzing a model of local denaturation in superhelically stressed DNA

Local denaturation, the separation at specific sites of the two strands comprising the DNA double helix, is one of the most fundamental processes in biology, required to allow the base sequence to be read both in DNA transcription and in replication. In living organisms this process can be mediated by enzymes which regulate the amount of superhelical stress imposed on the DNA. We present a numerically exact technique for analyzing a model of denaturation in superhelically stressed DNA. This approach is capable of predicting the locations and extents of transition in circular superhelical DNA molecules of kilobase lengths and specified base pair sequences. It can also be used for closed loops of DNA which are typically found in vivo to be kilobases long. The analytic method consists of an integration over the DNA twist degrees of freedom followed by the introduction of auxiliary variables to decouple the remaining degrees of freedom, which allows the use of the transfer matrix method. The algorithm implementing our technique requires O(N) operations andO(N) memory to analyze a DNA domain containing N base pairs. However, to analyze kilobase length DNA molecules it must be implemented in high precision floating point arithmetic. An accelerated algorithm is constructed by imposing an upper bound M on the number of base pairs that can simultaneously denature in a state. This accelerated algorithm requires O(MN) operations, and has an analytically bounded error. Sample calculations show that it achieves high accuracy ~greater than 15 decimal digits ! with relatively small values of M (M,0.05N) for kilobase length molecules under physiologically relevant conditions. Calculations are performed on the superhelical pBR322 DNA sequence to test the accuracy of the method. With no free parameters in the model, the locations and extents of local denaturation predicted by this analysis are in quantitatively precise agreement with in vitro experimental measurements. Calculations performed on the fructose-1,6-bisphosphatase gene sequence from yeast show that this approach can also accurately treat in vivo denaturation.@S1063-651X~99!06003-1#


INTRODUCTION
Unconstrained linear DNA molecules in solution at physiological temperatures and ionic conditions adopt the wellknown Watson-Crick B-form structure, a right handed double helix conformation.However, DNA can occur in several other conformations.The biologically most important alternate conformation is the locally denatured ͑i.e., strand-separated͒ state, in which the base pairing between the two strands of the B-form DNA duplex is locally disrupted.Because local denaturation is an essential step in both transcription and replication, the two central functions of DNA, its occurrence must be stringently controlled in vivo.One means of exerting this control involves topological regulation of the unwinding torsional stresses that are imposed on the DNA ͓1͔.
DNA within living systems is organized into topological domains, typically several kilobases in length, consisting either of circular molecules or of closed loops within larger molecules ͓2͔.The topological constraint on a closed-loop domain is precisely equivalent to that on a circular molecule.In either case the linking number L of the domain is fixed.L is the number of times either strand of the DNA links through the closed circle formed by the other strand.Equivalently, L is the total number of turns either strand of the DNA makes about the central axis curve of the domain, counted according to sign, when that central axis curve is planar.The linking number L is a global topological invariant of the domain: so long as both strands of the DNA remain continuous, the value of L cannot change.However, enzymes can alter the linking numbers of domains by processes involving transient strand breakage and relinking ͓3͔.In vivo, these enzymes typically act ͑often in concert with other processes͒ to decrease the linking number L below the value L o characteristic of the unstressed B-form double helix ͓3,4͔.The resulting ͑negative͒ linking difference ␣ϭLϪL o Ͻ0 imposes untwisting torsional stresses on the DNA domain involved, placing it in a ͑negatively͒ superhelical state.The domain can accommodate this condition in two ways.First, the B-form helix can bend and twist, deformations that require energy.Second, local regions of the DNA domain can undergo conformational transitions, such as denaturation, that decrease the helicity of the sites involved.These transitions accommodate part of the imposed linking difference ␣, which allows the rest of the domain to relax by a corresponding amount.Denaturation will be energetically favored when the energy of deformation relieved by this partial relaxation exceeds the cost of the conformational transition ͓5͔.The localization of denaturation at specific sites within a domain *Author to whom correspondence should be addressed.
results from the sequence dependence of the denaturation energy.Under a wide variety of environmental conditions AT base pairs on average require less free energy to separate than do GC pairs, with significant modulation due to near neighbor effects ͓6-9͔.Hence, sites of local denaturation tend to be concentrated at AT-rich regions within a negatively superhelical domain.
The global superhelical constraint of fixed linking number L effectively couples together the conformational states of all the base pairs within a DNA domain.A transition at any one site alters its helical twist, which changes the distribution of the linking difference ␣ throughout the domain, and thereby alters the stresses experienced by all other base pairs.This effective global coupling can lead to qualitatively entirely different types of transition behaviors than occur in the thermal denaturation of unconstrained polymers ͓10,11͔.In unstressed linear polymers undergoing thermal denaturation, the probability of transition of each monomer typically increases monotonically with temperature.However, the imposition of negative superhelical on a DNA domain can lead to much more complex transition behavior.For example, the probability of denaturation of individual base pairs need not increase monotonically with the denaturing constraint ␣.Instead, as this negative superhelicity becomes more extreme, denaturation at new sites may be coupled to reversions back to B form ͑i.e., rejoining͒ of sites that had been denatured ͓1͔.Moreover, whereas local changes of base sequence have at most local effects on thermally driven transitions ͓12-14͔, in stress-induced denaturation small local sequence alterations can have global consequences.For example, deletion of a 16-bp ͑base pair͒ region in a 4-kb circular DNA completely changes the locations and extents of superhelical denaturation throughout the molecule ͓15,16͔.
Stress-induced local denaturation has been shown experimentally to be involved in several important biological processes.The unique replication origin in the E. coli genome contains a stress-destabilized site located at a specific position relative to other markers ͓15͔.Sequences that have been modified in a way that preserves the susceptibility of this site to superhelical denaturation retain their in vivo activity, while mutations that either degrade this susceptibility or move the position of the denaturing region by as few as 50 base pairs destroy the function of the replication origin.No other sequence specificity is observed around this position.Stress-induced denaturation also plays several known roles in DNA transcription.For example, expression of the c-myc oncogene is regulated in part by the upstream far upstream sequence element ͑FUSE͒ region, which is denatured in vivo under conditions where this gene is transcriptionally active ͓17͔.Transcription of c-myc requires binding of the FUSE binding protein ͑FBP͒ regulatory protein to the unpaired DNA strands at the FUSE site.In a second example, expression of the ilvP G operon in E. coli is enhanced by binding of integration host factor ͑IHF͒ to a site 90 base pairs upstream of the transcription start site.However, this enhancement only occurs when the negative superhelicity of the DNA is sufficient to drive denaturation of the region abutting the IHF binding site.Under these circumstances IHF binding forces this site to revert to B form, which causes the next most easily destabilized region, around the transcription start site, to denature ͓18͔.Stress-induced denaturation also has been implicated in the termination of transcription ͓19͔.Sequence alterations that degrade the susceptibility to denature within the 3Ј terminal flank of the yeast FBP1 gene decrease its frequency of correct termination in vivo.Stress-denaturable sites are involved in the binding of DNA to other cellular structures.These include sites where the DNA attaches to the chromosomal matrix ͓20,21͔ as well as the centromere region, where the DNA binds to the cellular apparatus that separates the two copies of a chromosome at cell division ͓22͔.
Local denaturation is the most extreme form of stressinduced DNA duplex destabilization.However, less extreme forms also may be biologically important.A process requiring local separation of the duplex strands may be controlled by proteins that can contribute some energy to this unpairing event, but not enough to drive it unless the site involved is already marginally destabilized.Such sites may become active when stresses are imposed on the DNA that decrease the energy required for their local denaturation without necessarily causing complete opening.For this reason it is also important to understand how imposed stresses affect the incremental energy required to denature individual sites within a DNA sequence.
Several approximate methods have been developed to analyze conformational transitions in superhelical DNA molecules.The first theoretical treatment demonstrated that local denaturation can be energetically favored in molecules experiencing untwisting torsional stresses ͓23͔.It performed a simple two-state analysis in which only the competition between the untransformed state and the single energetically most favored denatured state was considered.Thereafter, three approximate statistical mechanical methods were developed to treat this problem.
The first of these approaches was a modification of the standard method to analyze helix-coil transitions in unconstrained linear molecules ͓24,25͔.DNA that is circular was analyzed as though it were linear, with one unopenable base pair added to each end to reduce end effects.Then an energy renormalization step was performed to account approximately for the effects of the superhelical constraint.This approach did not impose the correct topological condition on the DNA.It also incorrectly assumed that sites of denaturation were torsionally undeformable, so that no part of an imposed linking difference ␣ could be absorbed by interstrand twisting at the denatured sites.Single stranded DNA actually is highly flexible, so that large amounts of the imposed linking difference can be absorbed by such twisting at little cost in energy ͓26͔.These oversimplifications severely limited the accuracy and utility of this method.Recently this approach has been extended by developing a more sophisticated self-consistent renormalization technique ͓27͔.
A second approximate analytic method has been developed that imposes the correct topological condition on the DNA domain and includes the torsional deformability of the unpaired regions ͓5,16͔.In this approach an energy threshold is specified, and all states of denaturation are found whose energies exceed that of the minimum energy state by no more than this threshold amount.The cumulative effect of all states whose energies do not satisfy this threshold condition is estimated by a density of states calculation.From this information an approximate partition function is constructed, and approximate ensemble average values of important pa-rameters are calculated.These include both the denaturation probability and the destabilization energy of each base pair in the DNA sequence.The accuracy of this method increases as the energy threshold is raised, although the number of included states, and hence also the computation time, grow approximately exponentially.This technique has internal controls that allow the user to achieve a specified level of accuracy by setting the energy threshold appropriately.Sample calculations show that reasonable accuracy can usually be achieved using moderate thresholds, for which the algorithm implementing this approximate method executes efficiently.
Extensive calculations have shown that the predictions of this approximate method are in close quantitative agreement with experimental results ͓16͔.Experimentally determined energy parameter values are used in these calculations, so they have no free parameters.However, comparisons with experiments show that they correctly predict the sites and extents of denaturation as functions of the imposed linking difference, as well as the substantial effects on transition behavior that can result from even modest base sequence modifications.The close accord with experiment achieved by this method has enabled its use to predict the stress-induced destabilization properties of DNA sequences for which experimental information is not available ͓1,28͔.
Despite its successes, this approximate method has several shortcomings.It treats denaturation as copolymeric, with one separation energy ascribed to AT base pairs and another to GC pairs.As presently constructed it cannot handle more detailed transition energies, such as arise from the influence of near neighbor base pair identities, chemical modification of bases, bound ligands, abasic sites, pyrimidine dimers, or other molecular lesions.All of these local alterations of DNA are known to occur in vivo, and all can have a variety of important biological effects ͓29-33͔.This method would also be difficult to extend to analyze competitions between local denaturation and other types of transitions in topologically constrained DNA molecules.And for technical reasons it cannot handle cases where the low energy states contain four or more distinct sites of simultaneous denaturation, as can occur in very long molecules (NϾ15 000 bp) or at high temperatures.
The third method that was developed to analyze conformational transitions in superhelical DNA is a generalized Monte Carlo sampling technique ͓34͔.This approach can treat some of the special cases that neither of the previously described methods could handle, such as high temperatures and very long molecules.Also, it can be easily extended to treat multiple competing transitions.However, it is difficult to determine the frequency of occurrence of high energy states accurately using Monte Carlo sampling, so this method can estimate the destabilization energies of only the most strongly destabilized base pairs.It is comparatively slow to execute, and its accuracy in calculating many quantities is often less than that achieved by alternative methods.For these reasons Monte Carlo is the method of choice only in cases where no alternative approach is feasible.
In this paper we present a numerically exact technique to calculate the equilibrium properties of the denaturation ͑strand separation͒ transition in circular superhelical DNA molecules of specified base sequence and kilobase length.
This approach has several advantages over its predecessors.It imposes the correct topological constraint on the DNA domain, it can treat any dependence of base pair separation energies on base sequence, and it can handle many situations which other methods find intractable.These include large numbers of simultaneously open regions, high temperatures, near neighbor effects, and the presence of chemical modifications such as abasic sites, methylated bases, lesions, or adducts.It explicitly treats fluctuations of interstrand twisting within denatured regions.Some earlier treatments ignored this phenomenon entirely ͓24,25͔, while others assumed these torsional deformations occurred at a mechanical equilibrium configuration of minimum energy ͓5,16,23,27͔.The exact method also can be extended to treat competitions between denaturation and other types of transitions.As the exact contributions from all states are included, the accuracy of the method is limited only by its computational implementation.We also discuss an accelerated approximate implementation of the algorithm with analytic error bounds which can provide a speed-up of about an order of magnitude while retaining high accuracy.

DERIVATION OF THE METHOD ''Effective Hamiltonian'' and free energy considerations
We consider a closed circular DNA molecule containing N base pairs, on which a linking difference ␣ has been imposed.Each base pair is regarded as being susceptible to transition to an alternative secondary structure ͑i.e., different helical structure and/or separation of the strands of the du-plex͒.Here the alternative secondary structure is assumed to be local denaturation ͑strand separation͒, although other possibilities can be treated with the same formalism.We will describe each state available to the DNA molecule, and ascribe an energy to that state.
We explicitly model only the DNA molecule itself.However, because the energy parameters used as inputs into the model have been determined from in vitro experiments, they implicitly include the effects of solvation, ionic conditions, and other environmental factors.The resulting ''effective Hamiltonian'' therefore implicitly incorporates a dependence on these environmental conditions.
More generally, consider a Hamiltonian H 0 (x ជ ,y ជ ), where x ជ refers to the DNA degrees of freedom which will be explicitly considered, and y ជ refers to any other DNA degrees of freedom as well as to the environmental degrees of freedom.The constant temperature and pressure partition function Z of the DNA plus environment ͓35͔ is then given by with the Gibbs free energy Here, Tr x ជ ,y ជ refers to sums or integrals over x ជ and y ជ as appropriate and ␤ϭ1/(k B T), where k B is Boltzmann's constant and T is the absolute temperature.Equation ͑1͒ can also be written in the form

͑3͒
where Because of the form of Eq. ͑5͒, H(x ជ ) is sometimes considered to refer to the ''free energy'' of a particular system A plus its environment B, for a fixed configuration of that system A. Alternatively, one can consider H(x ជ ) as an ''effective Hamiltonian,'' with interactions between the x ជ degrees of freedom renormalized by the environment ͑and possibly temperature dependent͒.We will primarily use the terminology ''effective Hamiltonian'' and ''energy'' in this paper.For a quantity O which depends only upon the x ជ degrees of freedom ͓OϭO(x ជ )͔, the expectation value is given by Thus, as long as an effective Hamiltonian H(x ជ ) is used, expectation values can be calculated as usual.This is the formal basis for the procedure we will follow.

States and their energies
In each state of the DNA molecule the linking difference ␣ is partitioned among three factors.First, the secondary structure is specified by describing which base pairs are denatured in that state.Denaturation decreases the unstressed helicity of the involved base pairs from that characteristic of the B-form duplex to that of the untwisted condition.If there are n denatured base pairs in the molecule in the given state, the total change in unstressed helicity resulting from this transition is Ϫn/A, where Aϭ10.4 bp/turn for denaturation ͓36͔.Only when nϭϪA␣ does the extent of denaturation exactly relax the imposed linking difference.All states for which n ϪA␣ will experience some level of uncompensated superhelicity.The resulting torsional stresses cause the two single strands comprising a denatured region to twist around each other.We denote the total change of twist arising from this effect by T. Finally, the residual linking difference ␣ r is that portion of ␣ that is not expressed by either of the above two structural alterations.According to the formalism presented above, this residual deformation need not be decomposed further, since the energetics associated with ␣ r have been determined experimentally.
These deformations are all coupled together by the topological constraint arising from the constancy of the linking difference ␣: With ␣ and the secondary structure of each base pair specified, the torsional deformation T and the residual linking difference ␣ r still can vary, provided they do so in a reciprocal manner consistent with Eq. ͑7͒.We consider in turn each type of deformation and its associated energetics.

Local denaturation
Let n j , 1рjрN, be a variable whose value is n j ϭ1 when the base pair at position j is denatured ͑sometimes also called ''separated'' or ''open''͒ and n j ϭ0 when it is in the B form ͑i.e., ''bonded'' or ''closed''͒.Because the molecule is circular, base pairs 1 and N are neighbors.To accommodate this periodic boundary condition, we set n Nϩ j ϭn j as needed.Specifying the value of each n j determines a unique state of secondary structure of the molecule, in which the total number of denatured base pairs is We denote the energy required to denature base pair j if it is at the edge of an open region by b j .The values of b j can be assigned individually to each base pair.In contrast to previous approaches ͓5͔, this method places no restrictions on the values they can have.This energy of denaturation is known to vary in complex ways with base sequence and environmental conditions.Values of b j have been measured experimentally as functions both of base pair composition ͓6͔ and of ionic strength ͓37͔.The near-neighbor sequence dependence of the enthalpy and entropy of denaturation have been determined under various environmental conditions ͓7-9͔.Energies of denaturation have been evaluated for methylated bases, and for abasic sites ͓31,32,38͔.Previous theoretical analyses of superhelical DNA denaturation assumed copolymeric transition energetics, with a single value b AT ascribed to each AT base pair, and another value b GC given to each GC pair ͓5͔.Under the environmental conditions of the experiments used to detect superhelical denaturation, these are b AT ϭ0.26 kcal/mol and b GC ϭ1.31 kcal/mol ͓16͔.
A ''run'' is a region composed entirely of separated base pairs.Since n j only changes with j at the boundary of a run, the number r of runs in a state of a circular molecule can be expressed as An initiation energy a is required to nucleate a run of denaturation.This arises in large part from the energy needed to break the extra hydrophobic ''stacking'' interaction that must be disrupted when the first base pair in a run is separated.The initiation energy for denaturation is large, a Ϸ10-13 kcal/mol, depending on environmental conditions ͓16,39-42͔.In the calculations reported below we use the value aϭ10.8 kcal/mol that is appropriate for the experimental conditions under which superhelical denaturation is measured ͓16͔.
Hence the total chemical energy needed to denature the base pairs in the state is

Interstrand twisting within denatured regions
Because single stranded DNA is highly flexible and the denatured regions within a superhelical molecule generally remain torsionally stressed, the unpaired strands comprising them will tend to interwind.If the base pair at site j is denatured (n j ϭ1), and has a helical twist of j rad/bp, the energy associated with this deformation is This energy arises in part from configurational restrictions due to helical interwinding.The value of the torsional stiffness, CϷ9.3ϫ10 Ϫ21 erg cm, is known from experiments ͓16,42͔.We do not explicitly model fluctuations in the twist of bonded ͑nondenatured͒ base pairs, as the torsional stiffness of B-form DNA is about two orders of magnitude greater than that of the individual denatured strands ͓26͔.
Instead, this effect is subsumed within ␣ r , the residual superhelicity.
We will consider the torsional deformations j at two levels of detail.In case ͑1͒, j is set to for each separated base pair j, so that the total twist T of the open regions is This is done primarily to enable comparisons with previous treatments ͓5,16͔.In case ͑2͒, we allow the j associated with each denatured base pair to fluctuate independently, giving Note that n j , and hence this summand, is nonzero only at the n denatured base pairs in the state under consideration.

Residual superhelicity and total energy
Once the separated base pairs and their torsional deformations are specified, the residual linking difference is determined as This residual linking difference is comprised of twisting of the B-form regions, as well as bending deformations.However, for present purposes, ␣ r need not be decomposed into these constituents because the energy associated with it is known from experiments.
The energy associated with superhelical deformations has been measured by several experimental techniques to be quadratic in the linking difference under conditions where no denaturation occurs ͓43-45͔.͑In that case ␣ r ϭ␣.͒The same quadratic functional form also has been experimentally found for the residual linking difference when denaturation does occur ͓16,42͔: The coefficient K has been determined experimentally to vary inversely with molecular length N, having the value K Ϸ2220RT/N at the physiological temperature 37 °C.The total energy associated with a state depends on the manner in which the torsional deformations j are being modeled.Adding all the contributions, we find that the Hamiltonian H 1 for case ͑1͒, where j ϭ, is ͕͑aϩb j ͒n j Ϫan j n jϩ1 ͖.

͑16͒
In case ͑2͒, where each of the n separated base pairs is torsionally deformed at a rate of j rad/bp, the Hamiltonian is ͕͑aϩb j ͒n j Ϫan j n jϩ1 ͖.

Calculation of the partition function
The calculation of the partition function involves summing and integrating the usual Boltzmann factor e Ϫ␤H over all the states available to the system, where H here refers to the Hamiltonian of a given state and ␤ϭ1/(k B T).We proceed by first eliminating the degrees of freedom associated with the j 's, the twisting of the separated DNA strands.
First, consider case ͑1͒, where j ϭ for each separated base pair.For each number n of separated base pairs and imposed linking difference ␣, we minimize Eq. ͑16͒ with respect to .This leads to the condition ␣ r Kϭ2C.͓The same condition follows when Eq. ͑17͒ is minimized with respect to the j 's.͔If we replace by this value, as was done in previous work ͓5,16͔, the effective Hamiltonian H 1 , now dependent only on the n j 's, becomes ͕͑aϩb j ͒n j Ϫan j n jϩ1 ͖.

͑18͒
Under this minimizing assumption the partition function associated with H 1 has the form where denotes summation over the 2 N states of secondary structure.
In case ͑2͒, where each j corresponding to an open base pair is allowed to fluctuate, the partition function has a similar form, differing only in the prefactor Q 2 (n).One can integrate over the n continuous degrees of freedom j to obtain the expression

͑22͒
Performing a matrix version of the completion of squares, one may evaluate this integral to be . ͑23͒ In each case the partition function may be expressed as where equals 1 or 2 depending on the case being considered.We will henceforth drop the subscript unless specifically noted, as our calculation strategy will apply in both cases.
In both cases ͑1͒ and ͑2͒ the partition function may be written in the form Z ϭTr͕exp(Ϫ␤H )͖, where H is a function of the n j 's only.From Eqs. ͑20͒ and ͑23͒, one of the terms in each H is ln͕Q (n)͖, which contains the factor (␣ϩn/A) 2 .Since n 2 ϭ ͚ j,kϭ1 N n j n k , this shows that a coupling is induced in H between every pair ͑j,k͒ of the base pairs.
A naive calculation of the partition functions of Eq. ͑24͒ would require a computation time growing exponentially with the number of sites N.However, an expression having the functional form of Eq. ͑24͒ can be evaluated in polynomial time using the following procedure.First we write Q(n) as where ␦ m,n is the Kronecker ␦ function: ͑26͒ Expressing the ␦ function in the form where iϭͱϪ1, we obtain

͑28͒
Placing this expression for Q(n) into Eq.͑24͒, using the fact that nϭ ͚ jϭ1 N n j , and rearranging terms, yields where and with and . ͑33͒ H(k) has the form of a one-dimensional lattice gas ͑or, equivalently, Ising model͒, in which the chemical potential ͑magnetic field͒ is site dependent and complex.
The F(k) term derives only from the chemical energies associated with base pair separations, while the q(k) term depends upon mechanical parameters associated with topological and geometric factors.These are the imposed linking difference ␣, the residual superhelicity ␣ r , and the torsional deformations of the denatured regions.Separating the factors arising from the denaturation transition from those derived from the topological constraint enables efficient evaluation of the entire expression.F(k) requires a summation over all states S of the secondary structure of the molecule.This summation will be performed using the transfer matrix method ͓46͔, as described below.Once F(k) and the appropriate prefactor Q(m) have been evaluated, the balance of the computation is straightforward.We will show that all the expectation values of interest can be calculated using equations having the general form of Eqs.͑29͒-͑33͒.

Transfer matrix method
We begin by briefly reviewing the transfer matrix method ͓46͔, originally formulated for the Ising model.First, sup-pose M 1 ,...,M N are qϫq square matrices.We number rows and columns of these matrices from 0 to qϪ1, and here denote the element in the ith row and the jth column of M l by m i, j l .The product of all these matrices is and the trace of P is To illustrate the transfer matrix method, we calculate the F(k) of Eq. ͑31͒ needed to compute the partition function.We set where we here choose H l (k) to have the symmetric form

͑37͒
which is the form we will use in our algorithmic implementation.͑This choice is somewhat arbitrary; we know of no particular advantage to using a symmetric versus a non-symmetric form.͒Each H l (k) only depends upon the variables n l and n lϩ1 .
Because the collection of all states of the system S is exhausted by permitting each variable n l , lϭ1,...,N, to take on every possible value, it follows that In this form F(k) has the same structure as that given in Eq. ͑35͒, with qϭ2.This shows that F(k) can be expressed as the trace of the product of 2ϫ2 transfer matrices M l (k), l ϭ1, . . .,N, one for each base pair in the molecule.The transfer matrix M l (k) has entry m(k) i, j l ͑iϭ0,1 and j ϭ0,1) corresponding, respectively, to the values of e Ϫ␤H l (k) arising when n l ϭ0,1 and n lϩ1 ϭ0,1.Thus, the matrix M l (k) that occurs when one uses the symmetric form of This shows that the function F(k) in Eq. ͑31͒ may be expressed as

͑40͒
Computations using this method require an evaluation of products of large numbers of matrices.In the standard onedimensional Ising model, the transition energetics are identical at every position, so the transfer matrices are all the same and the trace can be expressed as the sum of powers of eigenvalues ͓46͔.In the present case the energy of transition varies with base sequence, so the transfer matrices M l (k) associated with different base pairs will not be identical.As multiplication of these matrices does not commute, this operation must be performed numerically.The numerical implementation of this approach is described in a later section.

Calculation of ensemble averages
The ensemble average values of several quantities provide important insights into the transition behavior of a superhelical DNA molecule.These include the average numbers n ¯of denatured base pairs and r ¯of runs of transition, and the average total twist T ¯of the denatured regions.The ensemble average residual linking difference ␣ ¯r , obtainable from T ¯, is an important parameter to calculate because it can be experimentally measured using gel electrophoresis techniques ͓42͔.The biologically most interesting information involves the locations where denaturation occurs, and the relative extents and energy costs of transition at those locations.We evaluate locations and extents of transition by calculating the probability of transition p l ϭn ¯l individually for each base pair 1 рlрN.The resulting transition profile is typically displayed graphically by plotting p l against sequence location l. ͑An example is displayed as the upper graph in Fig. 4 below.͒A method to calculate the destabilization energies of individual base pairs is described in a later section.
The additional quantities that must be calculated in order to evaluate these ensemble averages all have the functional forms expressed in Eqs.͑29͒-͑33͒.Only the forms of the prefactors Q(m) and the summands of F(k) may differ.Hence they can be evaluated by the general procedure that was described above for the partition function.

Calculation of the average number n ¯of separated base pairs
The ensemble average number of separated base pairs n ¯is where This expression may be evaluated using the same technique as was described above for the partition function, with the sole modification that Q(n) is replaced by nQ(n).As this is still a function of n alone, the procedure applies unchanged.Alternatively, one may evaluate n ¯as the sum of the transition probabilities n ¯l of the individual base pairs, which we now consider.

Calculation of n ¯l
The transition profile of a DNA sequence at linking difference ␣ is a graph of the equilibrium probability of transition p l , 1рlрN, of each base pair in the molecule.Here p l ϭn ¯l is given by where Z(n l ) is the contribution to the partition function from all states in which base pair l is separated: Q͑n ͒n l e Ϫ␤ ͚ jϭ1 N ͕͑aϩb j ͒n j Ϫan j n jϩ1 ͖ .͑44͒ Z(n l ) may be cast in the functional form of Eq. ͑29͒, with F(k) replaced by Here the H j (k)'s are the same as those used in the evaluation of the partition function.This has the effect of replacing the transfer matrix M l (k) by that corresponding to n l exp(Ϫ␤H l (k)), All the other NϪ1 transfer matrices remain unchanged, as do the q(k)'s.Because Z(n l ) is the partition function with base pair l always separated, one can calculate the free energy ⌬G(l) required to separate base pair l as ⌬G͑l ͒ϭ͑ Ϫk B T ͕͒ln Z͑n l ͒Ϫln Z͖ϭϪ ln p l ␤ .͑47͒

Calculation of the average number r ¯of runs
The ensemble average number of runs of transition, r ¯, is given by where Z͑r ͒ϭ ͚ S Q͑n ͒re Ϫ␤͚ jϭ1 N ͕͑aϩb j ͒n j Ϫan j n jϩ1 ͖ .͑49͒ Expressing r as where Z͑n l n lϩ1 ͒ϭ ͚ S Q͑n ͒n l n lϩ1 e Ϫ␤ ͚ jϭ1 N ͕͑aϩb j ͒n j Ϫan j n jϩ1 ͖ .

͑52͒
The expressions Z(n l ) have been calculated above for each l.
The term Z(n l n lϩ1 ) again has the functional form of Eq. ͑29͒, with F(k) now replaced by In this case the transfer matrix M l (k) is replaced by that corresponding to n l n lϩ1 exp"Ϫ␤H l (k)…, while all other transfer matrices remain unchanged.͓Equivalently, one could replace exp "Ϫ␤H l (k)… by n l exp"Ϫ␤H l (k)… and exp"Ϫ␤H lϩ1 (k)… by n lϩ1 exp"Ϫ␤H lϩ1 (k)…, giving two matrices having the form of Eq. ͑46͒.͔

Calculation of the average total twist T We
now consider the ensemble average total twist T ¯of the unpaired ͑open͒ regions, which we write in the form For both cases ͑1͒ and ͑2͒ (ϭ1 and 2͒, we find that PRE 59 3415 EXACT METHOD FOR NUMERICALLY ANALYZING A . . . where The expressions Q ,T that result are functions of n alone, so the evaluations of Z(T ) proceed in the same manner as was described above for the partition function.

Calculation of residual twisting ␣ ¯r
Using Eq. ͑7͒ to express ␣ r in terms of the other deformations, one obtains The procedures used to calculate the two average values on the right hand side of this equation have already been described.

Calculation of ''energy'' H We
now calculate the H ¯'s, the ensemble averages of the effective Hamiltonians of Eqs.͑16͒ and ͑17͒.These averages are used in the calculations of base pair destabilization energies, discussed below.As a reminder, ϭ1 and 2 refer to our different assumptions regarding twisting deformations, with ϭ2 the more generally physically relevant.
For case ͑1͒ (ϭ1), we use the H 1 of Eq. ͑16͒ to obtain The partition function Z 1 has already been calculated.Z 1 (H) is given by where, from Eq. ͑18͒, As the first term of Eq. ͑60͒ has the general form of the partition function of Eq. ͑24͒, and Z 1 (n j ) and Z 1 (n j n jϩ1 ) have been evaluated above, H ¯1 can be calculated using the procedure developed above for Eqs.͑29͒-͑33͒.
Integrating over the j 's, we find that case ͑2͒ reduces to the form of Eqs.͑60͒ and ͑61͒ as well.Specifically, Z 2 has already been evaluated, and Z 2 (H) is given by where

Calculation with fixed base pair separations
Several types of externally imposed conditions may affect the secondary structure of specific base pairs in vivo.Sitespecific DNA binding proteins or enzymes may hold particular base pair͑s͒ either open (n l ϭ1) or closed (n l ϭ0).Alternatively, abasic sites are created when the purine or pyrimidine base at a site is lost.This does not disrupt the continuity of the sugar-phosphate backbone, so the topological constraint is unaffected.However, there being no base at that site, Watson-Crick pairing is impossible.Enforced openings or closures can have a significant effect on the destabilization experienced by other base pairs throughout the domain.For example, the externally enforced separation of a base pair, as occurs at an abasic site, permanently nucleates denaturation at that site, so the large initiation energy a needed to start a run at any other position is not needed there.This increases the probability that additional denaturation will occur in that region over what would be expected in the intact molecule ͓23͔.
The partition function that arises when base pair l is constrained to be open is given by which is a sum only over states where n l ϭ1.Similarly, the partition function with base pair l held closed is ͚ S Q͑n ͒͑ 1Ϫn l ͒e Ϫ␤ ͚ jϭ1 N ͕͑aϩb j ͒n j Ϫan j n jϩ1 ͖ ϭZϪZ͑n l ͒.

͑66͒
If base pairs l and lЈ are both held open, for example, the partition function becomes Z͑n l ,n l Ј ͒ϭ ͚ S Q͑n ͒n l n l Ј e Ϫ␤ ͚ jϭ1 N ͕͑aϩb j ͒n j Ϫan j n jϩ1 ͖ .

͑67͒
The ensemble averages derived previously can all be calculated with these base pairing constraints.For example, the probability n ¯l(lЈ) that site l is separated when site lЈ is held open ͑n l Ј fixed at 1͒ is given by The probability that site l is separated with the base pair at site lЈ held closed is similarly given by For certain averages and under general base pair separation constraints it is necessary to calculate quantities of the general form

͑70͒
This is accomplished by modifying the appropriate transfer matrices, as has been done in Eqs.͑46͒ and ͑54͒ above.

Calculation of destabilization energies
Sites where stress-induced destabilization occurs but is not sufficient to drive denaturation may be important as targets for other molecules, such as helicases, whose activities affect strand separation.If these other molecules can contribute only marginally to the energy needed to denature their target DNA site, they may be able to induce separation only when that site is already partially destabilized.In this way imposed torsional stresses ͑such as those produced by negative superhelicity͒ may facilitate events involving denaturation, even where these stresses do not drive separation to completion.The calculations of destabilization energies developed here are designed to find such partially destabilized regions.
One estimator of the energy required to separate a particular base pair l can be obtained by comparing the usual ensemble average H ¯with the ensemble average H ¯(l) found when base pair n l is held open.The difference ⌬H ¯(l) ϭH ¯(l)ϪH ¯provides a measure of the extent to which base pair l is destabilized by the imposed stresses: the smaller this difference, the more destabilized the base pair ͓28͔.
This effective destabilization energy H ¯(l) must be calculated for each site l.Restoring the subscript referring to twisting assumptions, we have with Z͑n l ,H ͒ϭ ͚ S ͕R ͑ n ͒n l e Ϫ␤ ͚ jϭ1 N ͕͑aϩb j ͒n j Ϫan j n jϩ1 ͖ ͖ϩ ͚ jϭ1 N ͑aϩb j ͒Z ͑n j n l ͒Ϫ ͚ jϭ1 N aZ ͑n j n jϩ1 n l ͒. ͑72͒ The methods to evaluate each term in this expression have been described above.The ⌬H ¯(l)'s are typically less noisy estimators of site-specific destabilization energies than the ⌬G(l)'s of Eq. ͑47͒ ͓28͔; however, they are also computationally more expensive.

Evaluation of an alternate strategy for treating the linking number constraint
As seen from Eq. ͑7͒, we impose an exact constraint on the linking number ␣.Another possible approach could be to use a ''linking number potential'' ͑LNP͒, which we denote by .This strategy would be implemented by retaining ␣ r as an independent variable, using K␣ r 2 /2 in Eqs.͑16͒ and ͑17͒ instead of the rightmost term of Eq. ͑15͒, adding a term Ϫ͓Ϫ(n/A)ϩTϩ␣ r ͔ to the effective Hamiltonians of Eqs.͑16͒ and ͑17͒, and then adjusting until the expectation value of the right side of Eq. ͑7͒ achieves the desired value of ␣.This type of approach, analogous to going for example from the canonical to the grand canonical ensemble, can be effective in cases where the Hamiltonians are homogeneous ͑at least on some length scale͒ in the limit of large systems.One example of such a ''thermodynamic limit'' system is a homogeneous Ising ring of a few thousand sites, where a constraint on total spin and the use of the appropriate mag-netic ''potential'' would give practically indistinguishable results.However, this approach is not well controlled for the systems we treat.
First, this approach is not strictly applicable in our case because the linking number is not an extensive variable in the usual sense.That is, there is no intensive ''linking density'' whose integration over the molecule yields L. To see this, consider two curves.The first is a figure eight with a single contact point, and the second is the same curve after a strand passage through the contact point.As these conformations differ only infinitesimally, all intensive parameters describing them also differ infinitesimally.So the integrals of any intensive quantities, as they are taken over a fixed and finite length, will also differ infinitesimally.But the linking numbers of these configurations differ by 2. It follows that the linking number is not generally computable from an intensive density.Instead it is expressed using a Gaussian double integral.This means it does not depend on strictly local quantities, but rather on how each part of the molecule is positioned relative to every other part ͓47͔.
Second, the thermodynamic limit itself is much more problematic in our model, in part because nonrandom heterogeneity can lead to situations where only a small part or parts of the system are active.As a simple example, inserting 100 AT base pairs into a superhelical circular DNA molecule consisting of 5000 GC base pairs would have a dramatic effect on whether and where denaturation would occur, the property we are interested in.But the addition of 100 sites to a 5 K bp homogeneous Ising ring would in general have a negligible physical effect.This behavior has been observed experimentally: in a circular 4000 base pair plasmid, the removal of a particular 16 consecutive base pairs was shown to dramatically affect when and where local denaturation occurred ͓15͔.In general, it is not clear that the systems we study are in the ''thermodynamic limit'' in the usual sense.
Third, the use of a linking number potential requires taking weighted averages over ranges of linking differences ␣.However, ␣ can only assume values that differ by discrete integers, while use of a LNP places no such constraints on the right side of Eq. ͑7͒.Moreover, when local denaturation first occurs, the case in which we are often most interested, experimental systems ͑and our model͒ are very sensitive to the precise value of the linking number, with small ͑integer-valued͒ changes having large effects on the denaturation behavior.Under these circumstances it is not clear how accurate a weighted average over a continuous range of linking differences would be for our ͑finite͒ systems.
Lastly and perhaps most importantly, as discussed following Eq.͑24͒, direct long-range interactions between all the base pairs are generated in our model when the twist variables are integrated over.However, only nearest-neighbor interactions between base pairs are generated if one uses the LNP approach and integrates over ␣ r and the twist variables.In general, it is not a well controlled strategy to try to calculate the effects of long-range interactions using a model that contains only nearest-neighbor interactions.
Using a ''linking number potential'' approach might lead to accurate results some of the time, perhaps often.However, we do not in general know how well controlled such an approach is for our systems.Indeed, it seems most likely to fail precisely in the cases in which we have the greatest interest.In fact, the only way to reliably test an LNP approach would be to compare with a numerically exact method such as the one presented in this paper.Further, we develop below a fast, approximate but well-controlled and very accurate implementation of the preceding exact method that can simulate problems of biological interest ͑molecules several kilobase pairs long͒ in a few hours on a high-end work station.This makes the implementation of a fast but uncontrolled approximate approach even less important.

Operations count
In this analysis all quantities requiring calculation are sums having the general structure of Eq. ͑29͒: Although the (k)'s generally will differ in the calculation of different quantities, all involve a discrete Fourier transform of Nϩ1 terms ͓see, e.g., Eq. ͑30͔͒.The F(k)'s, which will also typically vary in calculations of different quantities, are expressed as traces of products of N 2ϫ2 transfer matrices.
To describe how these calculations are implemented numerically, we consider three illustrative cases-the calculation of the ensemble average total twist T ¯, of the transition profile ͑which involves evaluating all the n ¯l's, 1рlрN͒, and of the average number of runs r ¯ϭ ͚ jϭ1 N (n ¯jϪn j n jϩ1 ͒.The techniques needed in these cases also apply to all others.We will show how all of these calculations together can be performed in O(N 2 ) steps with O(N) memory.͓A minimal memory of O(N) is required simply for storing the base pair sequence.͔ We consider first the calculation of the partition function Z and the total twist T ¯.In the calculation of Z, the (k) of Eq. ͑73͒ is given by the q(k) of Eq. ͑30͒.In calculating T ϭZ(T )/Z, the (k) for Z(T ) derives from the Fourier transform of the Q T (n) of Eq. ͑57͒.For both Z and Z(T ), F(k) is given by the F(k) in Eqs.͑31͒-͑33͒.
Each F(k) in the sum of Eq. ͑73͒ requires O(N) operations to evaluate, as it involves the multiplication of N 2ϫ2 matrices.Calculating each (k) also requires no more than O(N) operations.As there are Nϩ1 different values of k, a total of O(N 2 ) operations is required to compute Z and T ¯. ͓The prefactor can be reduced somewhat by using the fast Fourier transform, which requires roughly O(N ln N) rather than O(N 2 ) operations to compute all Nϩ1(k)'s; however, O(N 2 ) total operations are still required for the F(k)'s.͔ We now consider the transition profile and the total number of runs r ¯.The N different Z(n l )'s and Z(n l n lϩ1 )'s, necessary to compute the n ¯l's and r ¯, are given by and with F l (k) and F l,lϩ1 (k) given by Eqs.͑45͒ and ͑53͒, respectively.A direct calculation of these 2N quantities would involve O(N 3 ) operations.However, this can be reduced to O(N 2 ) by using the fact that the transfer matrices whose products must be evaluated have a high degree of similarity.Specifically, the matrix products used to evaluate F(k), F l (k), and F l,lϩ1 (k) differ only in the matrix at position l, as was noted previously ͓see Eqs.͑46͒ and ͑54͔͒.
To take advantage of this similarity, we compute two sets of matrix products P l (L) and P l (R) , 1рlрN.P l (L) is the ''left'' product of the transfer matrices from 1 to l, ) is the ''right'' product of the transfer matrices from l to N, Recursive evaluation of all the P l (L) and P l (R) matrices involves O(N) operations, and their storage requires O(N) space.Once these matrices have been calculated, F l (k) may then be evaluated as and F l,lϩ1 (k) as with M l Ј and M l Љ given by Eqs.͑46͒ and ͑54͒.This again requires O(N) operations for all N values of l.Hence, using this approach, all N of the F l (k)'s and F l,lϩ1 (k)'s for a given l may be computed in O(N) time using O(N) memory.Therefore, the separation probabilities n ¯l and average number of runs r ¯can all be calculated in O(N 2 ) time with O(N) memory.This matrix multiplication procedure is numerically stable.
A possible alternative procedure with the same time and memory scaling would be to define Starting with one could recursively calculate all the P l 's: Using the cyclic property of the trace, one then has However, because of the repeated application of both M l 's and (M l ) Ϫ1 's, this alternative procedure is numerically unstable.
It requires an additional O(N 2 ) steps to compute the above quantities for a given set of imposed base pair separations or closures.Hence it requires O(N 3 ) steps and O(N 2 ) memory to calculate all N destabilization energies ⌬H ¯(l), 1рlрN, from Eqs. ͑71͒ and ͑72͒.
One can use the following procedure to reduce the number of operations required by approximately a factor of 2. We will illustrate with the calculation of the partition function Zϭ ͚ kϭ0 N F(k)q(k) since, as discussed above, calculations of all the observables involve similar techniques.
Because m is an integer, it follows from Eq. ͑30͒ that where the star denotes complex conjugation.Similarly, since nϭ ͚ jϭ1 N n j can also assume only integer values.If N is even, we hence have where Re(z) denotes the real part of a complex number z.If N is odd, we obtain Re͕F͑k ͒q͑ k ͖͒.͑88͒ Since only half the k values are now required, the computational time is halved.
As mentioned above, the F(k) terms in Eq. ͑73͒ derive from the chemical properties of denaturation, while the (k)'s are determined by mechanical properties associated with the twisting deformations and residual superhelicity.Thus the same set of F(k)'s can be used with different (k)'s if only twisting parameters are changed ͑and vice versa͒.In particular, once the F(k)'s have been calculated for one value of the linking difference, they do not need to be recalculated to handle additional values.So the incremental cost of treating a range of linking differences is reduced.

Catastrophic cancellation
This algorithm can experience a severe sign cancellation problem when applied to large DNA molecules.A loss of precision occurs when certain summations are performed, because the magnitude of the final sum is much smaller than the magnitude of the largest term in its summand.If the ratio of the total sum to this largest term becomes smaller in magnitude than machine precision, then the final sum will consist only of round-off noise.In our calculations this ratio becomes small exponentially with molecular length.
To illustrate how this problem arises, consider a simple example.Suppose that the energy of initiation a is zero and that the base pair separation energies b j are also zero.Further, assume that Q(n) has the form Q͑n ͒ϭe Ϫn , ͑89͒ with Ͼ0.This simplified Q(n) shares the essential feature with the Q(n)'s of Eqs.͑20͒ and ͑23͒ in that it decays with an exponent that is asymptotically linear in n, for large n.With these assumptions, the partition function Z becomes In our algorithm Z is written in the form of Eq. ͑29͒, where Under the conditions of this example, z(0) may be shown to have the real part of greatest magnitude of all the z(k)'s.

͑95͒
Hence, the ratio of the partition function Z to its largest term z(0) obeys which decays exponentially with system size.For any fixed machine precision, there will be a molecular length N beyond which the calculation of Z will consist entirely of round-off error.Sample calculations have been performed to determine the extent of this problem in realistic cases.Molecules of varying lengths were analyzed, each at a linking difference ␣ϭϪ0.055L0 which corresponds to physiological levels of superhelicity in bacteria.The energy parameters used in these calculations are the ones which have been shown to apply in the environmental conditions of the experimental procedure of Kowalski and co-workers, by which superhelical denaturation is detected ͓16,48͔.The sequences analyzed were the first N base pairs ͑i.e., jϭ1 . . .,,N͒ of the pBR322 DNA molecule, for 100рNр2400.The transition probability p l of each base pair was calculated, both in quadratic precision ͑on a 32 bit machine͒ and using the arbitrary precision FORTRAN package developed by Bailey ͓49͔ with 200 decimal digits of precision.The degradation of precision of the quadratic precision calculation with molecular length was measured by determining the number of decimal digits of agreement between the p l 's calculated each of the two ways, and selecting its minimum value over the sequence analyzed 1рlрN.͑We note that the measured average number of decimal digits of agreement over the entire sequence differed from this minimum value by less than 1% in all calculations.͒ The results of this procedure are shown in Fig. 1.The number of digits of accuracy of the quadratic precision calculation fell from the maximum of 33.7 that is available in the Hewlett-Packard implementation of quadratic precision to zero at a molecular length of Nϭ2400 base pairs.The observed decrease in the number of significant digits was nearly linear with molecular length, as the regression line shows, in qualitative agreement with the exponential form of Eq. ͑96͒.
These results demonstrate the need to implement this algorithm in high precision arithmetic.The calculations needed to analyze a 10 000-bp molecule under these conditions can be estimated from these sample runs to suffer the loss of approximately 140 decimal digits of accuracy due to catastrophic cancellation.Thus high precision implementations using floating point arithmetic with 200 decimal digits of accuracy will generally suffice to analyze biological sequences of this size.Our calculations were implemented using Bailey's multiprecision FORTRAN package MPFUN, which allows the user to specify the level of precision ͓49͔.
Implementing these calculations using multiprecision arithmetic significantly slows their execution speed.The CPU times required in the sample calculations described above are shown in Fig. 2 as functions of molecular length.These computations were performed with 200-decimal-digit accuracy on one R10000 64-bit processor of a Silicon Graphics Power Challenge computer.These execution times grow quadratically with molecular length N, as expected.
As noted above, lower precision calculations suffice for small molecules, while higher precision is required to treat larger ones.If the run time for a particular problem depended strongly on the required precision, an additional scaling with FIG. 1. Stress-induced denaturation is analyzed in molecules of different lengths whose linking differences are chosen so that each has a superhelix density ϭ␣/L 0 ϭϪ0.055, a common physiological value.This analysis is performed using the exact method in both quadratic precision and in high precision with 200 decimal digits of accuracy.The number of significant digits of accuracy of the quadratic precision implementation was assessed by comparing its extent of agreement with the high precision results in each case.The figure plots this accuracy at various sequence lengths as squares, and the fitted regression line is also shown.One sees a rapid, linear, loss in the number of significant digits with sequence length due to catastrophic cancellation.Extrapolation of this regression line allows one to estimate the arithmetic precision needed to achieve a prescribed accuracy in the analysis of a DNA domain of any length.
system size N could be introduced.However, the run time for calculations of up to 1000 decimal digits of accuracy in the multiprecision implementation of Bailey used here is dominated by the overhead from invoking the arbitrary precision subroutines, so that execution speed does not depend significantly on the precision specified ͓49͔.

Accelerated algorithm
Under typical physiological conditions the most frequently occupied states of superhelical DNA denaturation will have a relatively small number of open base pairs.We have taken advantage of this fact to develop a modified algorithm that retains a prescribed degree of accuracy at greatly reduced computational cost.In essence, this confines attention in a controlled way to the terms that dominate the partition function.
Combining Eq. ͑24͒ for the partition function Z with Eq. ͑25͒ for Q(n), rearranging terms, and dropping the subscript referring to different treatments of the twist, we obtain where

͑98͒
The index m in Eqs.͑97͒ and ͑98͒ refers to the number of open base pairs in the states being considered.For each m the set of states summed over in Eq. ͑98͒ are those for which the total number of open base pairs is ͚ jϭ1 N n j ϭm.In what follows we denote by Z (M ) the partial sum of the terms in the partition function up to and including mϭM : To find an upper bound M that suffices to guarantee a specified level of accuracy we proceed as follows.For each value of m, we find a lower bound Z L (m) and an upper bound Z U (m) for the term Z(m): Hence, The expression on the right hand side of this inequality is an upper bound on the fractional accuracy that is sacrificed when the true partition function Z is replaced by Z (M ) , which is the value obtained when states having more than M simultaneously open base pairs are ignored.Suppose that, for a specified ⑀, we can find an M such that Then Eq. ͑103͒ shows that this ⑀ also bounds the error arising when Z is approximated by Z (M ) : We note that the same bound also applies to calculations of other quantities needed to evaluate ensemble averages.For example, consider the quantity Z(n l ) that is defined in Eq. ͑44͒ and used to calculate the probability p l that base pair l is separated.Denote by ͕Z(n l )͖(m) the contribution to Z(n l ) from all states with m open base pairs.Because 0 рn l р1, ͕Z(n l )͖(m)рZ(m) for all m.Therefore

͑109͒
where bϭb min when calculating Z U (m), and bϭb max when calculating Z L (m).One may then compute easily calculable bounds on Z L and Z U by considering bounds on the sum The ratio of successive terms in this sum is which is monotonically decreasing with r.For given m and N, the rϭ1 term f m (1) is the largest of the f m (r)'s when ͑ mϪ1 ͒͑ NϪmϪ1 ͒Ͻ2e ␤a .͑112͒ This will be true for all m when it holds for the m which makes the left hand side of this inequality largest, which is mϭN/2.At Tϭ310 °K with aϭ10.84 kcal/mol, one finds that the above inequality holds for all m whenever N Ͻ18 700 base pairs.Because the ratio f m (rϩ1)/ f m (r) is monotonically decreasing, its largest value is m ϭ f m (2)/ f m (1).This yields the following bounds: Moreover, when e Ϫ␤a (NϪmϪ1)(mϪ1)Ͻ1, one has m Ͻ0.5 for all m, so that the upper bound does not exceed 2 f m (1).Under the above posited conditions this occurs for all m whenever NϽ13 200 base pairs.Also, f m (1)ϭe Ϫ␤a , independent of m.Insertion of these bounds into Eqs.͑101͒ and ͑102͒ gives the upper bound and the lower bound

͑115͒
It requires O(N) operations in total to calculate B L and all the upper bounds B U (M ) (0рMрN).If we can find an M for which B U (M )/B L Ͻ⑀, then Eqs.͑103͒, ͑114͒, and ͑115͒ together show that ⑀ also provides an upper bound on the error that arises when one disregards states with more than M open base pairs.This simplification reduces the operation count from O(N 2 ) to O(M N), which in practice can reduce the computational time by an order of magnitude or more.For example, consider the pBR322 DNA molecule containing 4363 base pairs, short enough for the above bounds to be valid under physiological conditions.This analysis guarantees that an accuracy of at least ⑀ϭ10 Ϫ9 will be achieved when M ϭ442.In practice the actual accuracy may greatly exceed that suggested by the above estimate.
We also note that, in the accelerated algorithm, both the summation variables m and k in Eqs.͑29͒-͑33͒ now take on only the values ͓0, . . ., M͔ rather than ͓0, . . ., N͔.Hence, for a given temperature and base pair sequence, each calculation at an additional linking difference requires only the smaller O(M 2 ) incremental computational time.

Performance of the accelerated algorithm
Sample calculations were performed to evaluate the dependence on M of the execution time and accuracy achieved by this accelerated algorithm.The strand separation behavior was analyzed in pBR322 DNA ͑Nϭ4363 base pairs͒ supercoiled to a linking difference of ␣ϭϪ27 turns.This is the linking difference the molecule is found to have, on average, when it is extracted from bacteria.The energy parameters were assigned the values that apply under the conditions of the nuclease digestion procedure by which superhelical strand separation is experimentally detected ͓16,49͔.
The analysis was performed using the accelerated algorithm with various values for the upper bound M in the range 100рM р200 base pairs.For comparison this transition was also analyzed using the complete exact algorithm ͑i.e., M ϭN͒.In both cases the probability p l of strand separation was calculated for every base pair 1рlрN.For each value of M the accuracy of each p l was calculated as A͑l ͒ϭϪlog 10 ͉p l Ϫp l ͑ M ͉͒, ͑116͒ the number of decimal digits of agreement with the exact value.͓Here p l and p l (M ), respectively, denote the values calculated using the exact and accelerated algorithms, the latter with threshold M.͔ The accuracy of the accelerated algorithm was evaluated by finding both the minimum value of A(l) over the entire sequence and its average value.These results show that the accelerated algorithm achieves very high accuracy at values of M considerably smaller than those estimated in the previous section.More than 18 decimal digits of accuracy are achieved when M ϭ200, which is less than 5% of the number of base pairs in this molecule.Figure 3 also displays the dependence of execution time on M when the calculations are performed on an HP 9000/ 735 computer, which has a RISC-based 32-bit processor.The CPU time required to perform these calculations is seen to increase linearly with M to high accuracy.
The above results show that under reasonable conditions one can retain a very high degree of accuracy with a moderate value of M Ϸ0.05N, reducing the required computation time by an order of magnitude or more.The value of M can be selected using formulas giving rigorous bounds, as was done in the preceding section, or M can be estimated and the accuracy checked by comparing simulation results for different values of M. We find that simulations analyzing DNA molecules having lengths of biological interest ͑several kilobase pairs͒ typically require a few hours on a high-end workstation, which makes the accelerated algorithm a practical method for calculating superhelical strand separation behavior under a wide variety of circumstances.

In vitro experiments
The locations and extents of in vitro superhelical denaturation can be determined experimentally using the mung bean nuclease digestion procedure developed by Kowalski, Natale, and Eddy ͓48͔.This enzyme cuts single strands of DNA but does not cut the duplex.Sites of cutting may then be located by sequencing, and the relative frequencies of cutting at different locations may be determined.The most detailed experimental analysis of superhelical denaturation applied these procedures to the pBR322 DNA molecule (N ϭ4363 bp) ͓48͔.
Sample calculations have been performed on the pBR322 DNA molecule using the exact algorithm developed above.The linking difference was chosen to be ␣ϭϪ27 turns, consistent with physiological values.Copolymeric transition energies were assumed, which ascribe one value b AT to each AT base pair, and another value b GC to each GC base pair.These and all other energy parameters were assigned values that were previously shown to be accurate under Kowalski and Eddy's experimental conditions ͓16͔. Figure 4 shows the results of these calculations.The top portion of Fig. 4 gives the computed transition profile, the graph of the ensemble average probability p l of denaturation versus position l.The bottom graph depicts the variation of the destabilization free energy ⌬G(l) with position, as calculated from p l using Eq.͑47͒.The bars denote the locations where denaturation has been experimentally determined to occur ͓48͔.
Denaturation under these conditions was found experimentally to be confined to two locations.The primary location is between positions 3181 and 3300, coincident with the terminator of the ␤-lactamase gene.The secondary location is between positions 4130 and 4250, at the promoter of the same gene.The amount of denaturation detected at the secondary site was 7% of that found at the primary site.The predictions of the exact method, like those of the previously developed approximate method, are in precise quantitative agreement with these experimental results.Transition is predicted to be confined to these two sites.Moreover, the areas under the transition probability curve in these regions, which give the expected number of denatured base pairs in each, agree with the relative amounts of denaturation experimentally observed there.This shows that the exact method developed here, applied to the model of Eq. ͑17͒ with no adjustable parameters, can provide quantitatively correct predictions of the denaturation behavior of DNA under the conditions of the nuclease digestion procedure by which it is experimentally detected ͓48͔.It also suggests that the sites that are destabilized by superhelical stresses do not occur at random, but rather coincide with specific regulatory regions.

Comparison with in vivo results
Calculations performed using a previously developed, approximate, method have demonstrated close associations between stress-destabilized sites and several specific classes of regulatory regions ͓1,28͔.Experiments have established that stress-induced denaturation does occur in vivo ͓19͔.Here we assess the accuracy of the presently developed technique at predicting in vivo stress-induced denaturation by analyzing the 4-kb region containing the yeast FBP1 gene sequence used in these experiments.We note that this region is not circular in vivo.However, because the superhelical constraint of imposed linking difference is functionally identical in circular and in looped domains, one can treat regions that are not circular by a simple modification of the approach presented above.One simply conceptually closes the region into a circle by connecting its ends with a short run of GC base pairs, and then imposes a linking difference on the resulting domain.If the region were circularized by directly joining its ends instead of connecting them with an insert, one would run the risk of creating a spurious susceptible site in cases where the ends were reasonably AϩT rich.A GϩC-rich insert is chosen because it has a high energy of denaturation, and hence will join the ends without either assisting their destabilization or itself constituting an introduced destabilized site.͓Alternatively, one can join the two ends with a base pair which is constrained to remain closed ͑bonded͒.͔ The results of this analysis, applied to the model of Eq. ͑17͒, are shown in Fig. 5.Here the linking difference selected is that which gives the level of torsional stress found in extracted plasmids.The bar denotes the region where denaturation was experimentally detected in vivo.One sees that this analysis provides a quantitatively precise depiction of the denaturation experienced in this region, even though the in vivo conditions are much more complex than those envisioned in the present model.
To determine what effect the alternative methods of treating the twists j might have on the results, sample calculations using each assumption were performed on this sequence.In case ͑1͒ the j 's are assumed all to have the same value , which equilibrates with the residual superhelicity ␣ r .This was the assumption made in the previously developed, approximate method ͓5͔.In case ͑2͒ the j 's can fluctuate independently.Figure 6 shows the profiles calculated in these two cases.The differences between the two profiles are quite slight, and are confined to the boundaries of the denaturing region.
All the locations that denature in the two sequences analyzed in this section serve important regulatory functions.In the pBR322 plasmid denaturation is confined to two regions, the terminator and the promoter of the amp r gene.In the yeast sequence denaturation occurs only at the terminal region of the FBP1 gene.These and many other results show that sites of stress-induced destabilization are closely associated with several types of regulatory regions ͓1,28,21͔.This suggests that the interplay between base sequence and torsional stress provides a biologically important mechanism for regulatory activity.

DISCUSSION
This paper presents a method for calculating equilibrium local denaturation ͑strand separation͒ properties of superhelical DNA having kilobase lengths and specified sequences.The effective Hamiltonian includes the energies of denaturation of the AT and GC base pairs, the energies associated with the torsional deformations of the denatured regions, and interactions between denaturation and torsional deformations induced by the topological constraint of constant linking number L. The partition function and ensemble averages are calculated in a formally exact manner from the states of the system and their given energies.Through the introduction of auxiliary variables, the calculations can be implemented by an algorithm that is stable and has quadratic complexity with molecular length N. The computations can require relatively long execution times, however, since they must in general be implemented in arbitrary precision.An accelerated algorithm The methods developed here can be applied either to circular or, by a simple modification, to looped domains which are not topologically circular.Comparison with experiments indicates that this method, with no adjustable parameters, can provide quantitatively precise predictions for the locations and extents of superhelical DNA denaturation, both in vitro and in vivo.
Our approach has numerous advantages over the other techniques that have been previously used to treat superhelical denaturation.First, it correctly includes the topological constraint imposed by the domain structure, be it circular or looped, which is the fixing of the linking number L. Second, it permits the two inherently flexible single strands comprising a separated region to twist around each other.In this paper we present two levels of detail with which this twisting can be treated-either as a mechanical equilibrium torsional deformation or as a locally fluctuating quantity.Sample calculations show that these two alternatives give very similar results.Third, the approach correctly includes the exact contributions from all states, weighted according to their Boltzmann factors.As this is the only method able to evaluate the exact equilibrium distribution in polynomial time, it clearly improves on other techniques which either calculate an approximation of, or only sample from, that distribution.
This exact method can also treat many types of interesting cases that the other approximate methods, as presently formulated, cannot handle.These include near-neighbor base pair identity effects and alterations in the energies of base pair separation that result from such events as base methylation, protein or ligand binding, or the presence of pyrimidine dimers or abasic sites.Structural alterations that decrease the energies of denaturation of the base pairs involved have been predicted to significantly affect the transition behavior of stressed DNA ͓50͔.The method can treat cases where one or more of the base pairs is externally constrained to remain open or closed, and is applicable to transitions at any temperature.
The approach presented in this paper can easily be extended to include the possibility of other competing transitions, such as cruciform extrusion at inverted repeat sequences, or Z-DNA or H-DNA formation at sites having the local base sequences required by these alternate conformations.An analysis in which c different conformations compete requires cϫc transfer matrices.However, at present there is no compelling experimental evidence to suggest that transitions to any DNA conformations other than local denaturation serve biological functions; and, in any case, the energetics of these alternative transitions are not now known with sufficient precision to enable quantitatively accurate predictions of multistate competing transitions to be made.
This approach also can be easily extended to include the possibility of a sequence-dependent nucleation energy a ͓32͔.This could be important, for example, when analyzing the effects of the presence of abasic sites on transitions.
Future work will include the application of this method to a variety of DNA sequences for comparison with experimental data.The possibly important effects of various types and locations of defects, as mentioned above, will also be explored.

N
FIG.2.The CPU time required for an exact calculation using Bailey's MPFUN high precision FORTRAN package ͓49͔ with 200 decimal digits of accuracy is plotted as a function of molecular sequence length ͑squares͒.These calculations were performed on a single R10000 64-bit RISC-based Silicon Graphics processor.The curve gives the best quadratic fit to the data points.
Figure 3 plots these two values as functions of the bound M.

FIG. 3 .
FIG.3.The top graph plots the average accuracy ͑squares͒ and the minimum accuracy ͑circles͒ achieved by the accelerated algorithm as functions of the bound M imposed on the number of denatured base pairs in a state.The bottom graph plots the execution time of the accelerated algorithm as a function of M, which is seen to be linear in M to high accuracy.These calculations were performed on the pBR322 DNA sequence referred to in the text, and implemented on a dedicated HP 9000/735 RISC-based work station.

FIG. 5 .
FIG.5.The top graph shows the ensemble average probability of denaturation of each base pair in the 4 kb region of yeast DNA containing the FBP1 gene sequence.A linking difference of Ϫ22 turns was assumed, which gives the level of stress commonly found in DNA that has been extracted from living cells.The lower graph shows a detailed view of the region where denaturation is predicted to occur.In both graphs the bar indicates the region where denaturation is experimentally detected.