Software Open Access
maddyscientist; Mathias Wagner; Evan Weinberg; Alexei Strelchenko; Dean Howarth; Buck Babich; Justin Foley; Alejandro Vaquero; Simone Bacchio; nmrcardoso; Michael Cheng; Balint Joo; windy510; Frank Winter; Bartosz Kostrzewa; chris-schroeder; Carleton DeTar; Robert Maynard; Eloy Romero; Mario Schröck; Xiao-Yong; Matthew R Johnson; Filippo Spiga; Evan Berkowitz; walkloud
Added support for CUDA 10.2: QUDA 1.0.0 is supported on CUDA 7.5-10.2 using either GCC or clang compilers. CUDA 10.x and either GCC >= 6.x or clang >= 6.x are highly recommended.
Significant improvements to the CMake build system and removal of the legacy configure build.
Added more targeted compilation options to constrain which precisions and reconstruct types are compiled. QUDA_PRECISION is a CMake parameter holding a 4-bit number that selects which precisions are enabled, with 1 = quarter, 2 = half, 4 = single and 8 = double; the default is 14, which enables double, single and half precision. QUDA_RECONSTRUCT is a 3-bit number that selects which reconstruct types are enabled, with 1 = reconstruct-8/9, 2 = reconstruct-12/13 and 4 = reconstruct-18; the default is 7, which enables all reconstruct types.
Completely rewrote all dslash kernels using the accessor framework. This dramatically reduces code complexity and improves performance.
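As an illustration of how these bitmask values combine, the following Python sketch decodes a mask into the names it enables. It is not part of QUDA; the helper `decode_mask` and the two lookup tables are hypothetical names introduced here, and only the bit values themselves come from the notes above.

```python
# Bit values as stated in the release notes; the table and function
# names are hypothetical and exist only for this illustration.
PRECISION_BITS = {1: "quarter", 2: "half", 4: "single", 8: "double"}
RECONSTRUCT_BITS = {
    1: "reconstruct-8/9",
    2: "reconstruct-12/13",
    4: "reconstruct-18",
}


def decode_mask(mask, table):
    """Return the names whose bit is set in mask, lowest bit first."""
    max_mask = sum(table)  # all bits set: 15 for precision, 7 for reconstruct
    if not 0 <= mask <= max_mask:
        raise ValueError(f"mask {mask} outside 0..{max_mask}")
    return [name for bit, name in sorted(table.items()) if mask & bit]


print(decode_mask(14, PRECISION_BITS))   # default QUDA_PRECISION
print(decode_mask(7, RECONSTRUCT_BITS))  # default QUDA_RECONSTRUCT
```

For example, a build restricted to single and double precision would use 4 + 8 = 12 for QUDA_PRECISION.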
New physics functionality added: gauge Laplace kernel, Gaussian quark smearing, topological charge density.
QUDA can now be built to use either texture-memory reads or direct memory access (cmake option QUDA_TEX). Textures are on by default, though we note that since the Pascal generation it can be advantageous to disable textures and use direct reads.
QUDA is no longer supported on the Fermi generation of GPUs (sm_20 and sm_21). Compilation and running should still be possible but will require compilation with texture objects disabled.
Added support for quarter precision (QUDA_QUARTER_PRECISION) for the linear operator and associated solvers.
Implemented both CA-CG and CA-GCR communication-avoiding solvers, for use either as stand-alone solvers or as a means to accelerate multigrid.
Continued evolution and optimization of the multigrid framework. Even so, we advise users to use the latest develop branch when using multigrid, since it remains a fast-moving target with continual focus on optimization and improvement.
An implementation of the Thick Restarted Lanczos Method (TRLM) for eigenvector solving of the normal operator.
Lanczos-accelerated multigrid through the use of coarse-grid deflation and / or using singular vectors to define the prolongator.
Removal of the legacy contraction and covariant derivative algorithms, and replacement with accessor-based rewrites.
Improved heavy-quark residual convergence, which ensures correct convergence for MILC heavy-quark observables.
Experimental support for Just-In-Time (JIT) compilation using Jitify.
Significantly improved unit testing framework using ctest.
QUDA can now be built to target Google's address sanitizer (CMAKE_BUILD_TYPE option is SANITIZE) for improved debugging.
QUDA can now download and install the USQCD libraries QMP and QIO automatically as part of the compilation process. To enable this, set the option QUDA_DOWNLOAD_USQCD=ON. As with the Eigen installation, this requires internet access.
QUDA can now download and install the ARPACK library automatically if the QUDA_DOWNLOAD_ARPACK option is enabled.
Updated to CUB 1.8.
Multiple bug fixes and cleanup across the library. Many of these are listed here: https://github.com/lattice/quda/milestone/21?closed=1
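The build options named in these notes are all passed to cmake as cache definitions. The sketch below assembles such a command line in Python. The helper `cmake_flags` is hypothetical and not part of QUDA, the source path is a placeholder, and the ON/OFF values for QUDA_TEX and QUDA_DOWNLOAD_ARPACK are an assumption based on the usual CMake boolean convention (the notes only give an explicit value for QUDA_DOWNLOAD_USQCD).

```python
def cmake_flags(options):
    """Format a dict of CMake cache options as -DNAME=VALUE arguments.

    Hypothetical helper for illustration; booleans are rendered as ON/OFF.
    """
    flags = []
    for name, value in options.items():
        if isinstance(value, bool):
            value = "ON" if value else "OFF"
        flags.append(f"-D{name}={value}")
    return flags


options = {
    "QUDA_PRECISION": 12,          # 4 + 8: single and double only
    "QUDA_RECONSTRUCT": 7,         # default: all reconstruct types
    "QUDA_TEX": False,             # assumed ON/OFF; direct reads instead of textures
    "QUDA_DOWNLOAD_USQCD": True,   # fetch QMP and QIO during the build
    "QUDA_DOWNLOAD_ARPACK": True,  # assumed ON/OFF; fetch ARPACK during the build
}

# "/path/to/quda" is a placeholder for the source directory.
print("cmake " + " ".join(cmake_flags(options)) + " /path/to/quda")
```

Keeping the options in one dictionary makes it easy to see at a glance which precisions, reconstruct types and downloads a given build enables.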