Efficient Coding of the Minimum Image Convention

Abstract The computation of distances between interactions sites while observing the minimum image convention is one of the most frequently performed operations in computer simulations of fluids. In this work several ways of encoding the minimum image convention are discussed and their performances on different computers compared. It turns out that the optimal choice for the algorithm depends on the computer type as well as the optimization level of the compiler, and that a well adapted algorithm can achieve a significant reduction of the computation time.


Introduction
One of the most frequently performed calculations in computer simulations of fluids -Monte Carlo as well as molecular dynamics simulations -is the computation of the distance between two particles (or interaction sites), and evidently the efficiency of this computation can have a significant impact on the performance of a simulation program. This distance computation, however, is complicated by the fact that the particle numbers in typical molecular computer simulations are of the order of merely 10 2 -10 4 , and that the "thermodynamic limit" (infinite sample size) is realized approximately by means of the periodic boundary conditions, i.e., by surrounding a simulation ensemble with identical replicas, thus creating an infinite, periodic superlattice of simulation boxes. In such a superlattice, interactions are computed between the nearest replicas of particles only. This so-called minimum image convention requires additional computation steps.
Efficient code implementations of the minimum image convention had already been addressed by a publication in 1998 [1], but as computer architectures as well as compilers have changed over the past decade, it may now be worthwhile to reconsider this problem with modern computers.

The algorithms
The Euclidian distance between a particle at the location r 0 and one at r 1 is with Δx = x 1 − x 0 and similar expressions for the other coordinates. In a system with periodic boundary conditions, the one-dimensional distance corresponds to a set of values Here b denotes the length of the simulation box or, in other words, the lattice constant of the superlattice. The minimum image convention then corresponds to choosing k in such a way that |Δx| becomes minimal: Δ mic x is known as the remainder of Δx with respect to the box length b.
The problem is now that the calculation of the remainder is not part of the instruction set of the X86 family of microprocessors [2], which nowadays seems to dominate the field. 1 Therefore the remainder calculation has to be performed otherwise, and we will explore now some ways to do so.
The best way to calculate remainders may depend on the way how the coordinates of the molecules are stored with respect to the periodic boundary conditions. There are three different concepts: A. For all atoms or interaction sites, absolute coordinates are stored. If, in the course of the simulation, a site moves out of the primary simulation box, it is "folded back" by adding ±b to the relevant coordinate. Consequently, all interaction sites have coordinates within the primary simulation box. This simplifies the calculation of distances, but it makes it awkward to rotate a molecule. B. For the center of each molecule (the center of gravity or simply a selected site), absolute coordinates are stored; for the other sites, relative coordinates with respect to the center are stored. If the center moves out of the primary simulation box, it is folded back.
With this concept, shifting or rotating a molecule is easy. But now the interaction sites of a molecule may lie outside the primary box, even if its center is inside. C. The molecules are never folded back, but can drift away from the primary simulation box unhindered. Consequently, the molecules can move to locations far outside the primary simulation box. With this storage concept it is possible to calculate mean squared displacements and other diffusion-related properties.
In the following sections we will outline some ways to encode the minimum image convention, using C/C++ notation. In the algorithms, box denotes the box length b and dx the raw difference of the coordinates of two sites (Δx), which is then replaced by the remainder. Moreover, we define int k; double box_r = 1.0 / box; // reciprocal box length double box2 = 0.5 * box; double box2_r = 1.0 / box2;

Class A algorithms
If all interaction sites are in the same simulation box, the distance between two arbitrary sites along any coordinate axis cannot exceed the box length, hence −b ≤ Δx ≤ +b. A1: A simple, but efficient way to calculate remainders is then doing an implicit doubleto-integer cast operation, which is usually rather fast ([1], Alg. 2): With the default settings of the compilers, such a cast operation is equivalent to a rounding towards zero.
Of course, any Class B or C algorithm can also be used here, too.

Class B algorithms
If the centers or primary sites of all molecules are in the same box, but the other sites can be away from the centers by a distance L, the distances between two arbitrary sites must be in the range −b − L ≤ Δx ≤ +b + L. We must demand that the algorithm works reliably over this range of distances.
B1: The remainder can then be calculated by an extension of Algorithm A1, namely by doing the "cast" step twice ([1], Alg. 3). The working range of this algorithm is −1.5b ≤ Δx ≤ +1.5b: B2: But also a simple conditional clause can be used ( Here no multiplications are performed. The working range of this algorithm is the same as that of B1.
B3: As the sign of Δx does not matter for the calculation of the Euclidian distance, the previous method can be simplified: Again, the working range of this algorithm is the same as that of B1.
B4: Finally, the problem of dealing with negative distances can be eliminated by adding a multiple of the box length ( [1], Alg. 4). So the following algorithm can be used for the range −3.5b ≤ Δx ≤ +∞: Users should be aware that the Class B algorithms might be problematic in simulations with boxes of variable size (isobaric simulations), where the condition b > 2r can possibly be violated.

Class C algorithms
Algorithms in this class calculate remainders at any distance. The difference between round() and nearbyint() seems to be that the behaviour of the latter can be controlled by some external flags.

Application
The algorithms from the previous section were used to compute a complete site-site distance table for an ensemble of 256 diatomic molecules. The configuration of this ensemble was the result of an N pT Monte Carlo simulation. Within the usual statistical uncertainties, the configuration had neither preferential orientations nor any long-distance order. The simulation program employed the Class C storage concept, i.e., the molecules could move to coordinates outside the primary simulation box. For the test of the Class A and B algorithms, the coordinates of the "escaped" molecules were folded back to the primary box. The computers used for the comparisons were as follows: depends on temperature and system activity, care was taken to run the test programs at a low and constant overall system load. The results are shown in Table 1.
The overall decrease of the CPU time requirements from machine 4 to 1 reflects the progress in microprocessor technology. It is evidently advisable to use modern computers for simulations. Of course, this insight is neither new nor the topic of this work. More suprising is the effect of the optimization level: For some algorithms it is rather small; the performance of algorithm C2, however, is simply disastrous when compiled at level -O3, and good at level -Ofast. This underscores the importance of selecting the correct optimization level of the compiler.
For a comparison of the various algorithms it is perhaps better to look a relative CPU times. Figure 1 contains a logarithmic representation of the relative speeds of the algorithms, t C1 /t i , where t C1 is the CPU time required for the C1 algorithm (which invokes the remainder() library function). Figure 1 shows that the CPU time requirements of the various algorithms depend on the processor architecture, which confirms our earlier results [6]. Some observations can be made: • Of the Class B methods, B3 is best (except for the HP-PA processor). At the highest optimization level, it can even be faster than A1. • C1 is never very good, although on the AMD processors the speed differences are not as pronounced as on the Intel processors. give good performances irrespective of the optimization level. • "Fast math" optimization brings the performances of the different algorithms to a similar level. But even so, significant differences remain.
One should be aware, however, that the "fast math" optimization of the GNU as well as the Intel C++ compiler involves unsafe optimizations which may possibly lead to wrong numerical results [7,8]. No adverse effects were observed for our test programs nor for our simulation programs 3 , but this may be different for other programs.
Note to Fortran users: Most of the C library functions used in this article have Fortran counterparts; type cast operations are handled by modern Fortran compilers implicitly. Therefore most of the algorithms discussed here can easily be translated to Fortran. A Fortran77 version of our program based on Algorithm C3, but using the Fortran library function NINT() to compute the nearest integer (as proposed, for instance, in Algorithm 5 in [9]), was slower than the fastest algorithms in Table 1 by a factor of 1.5-2.4 for X86-based processors, depending on the optimization level. On machine 4, the slowdown amounted to a factor of 30.
We conclude that generally the algorithms C5 and C6 give very good performances for most computers, compilers, and optimization levels. But as microprocessors as well as compilers are undergoing a rapid evolution, programmers of computer simulations should be encouraged to make their own performance tests.
In a full simulation program, the computation of distances takes up a part of the total CPU time only; larger shares of the CPU time are required for the calculation of molecule movements and the evaluation of interaction energies and forces. A comparison of some Monte Carlo simulations for a Lennard-Jones fluid shows, however, that the time savings achieved by choosing an efficient encoding of the minimum image convention can still be significant (Table 2).