lattice/quda: QUDA v1.0.0

maddyscientist; Mathias Wagner; Evan Weinberg; Alexei Strelchenko; Dean Howarth; Buck Babich; Justin Foley; Alejandro Vaquero; Simone Bacchio; nmrcardoso; Michael Cheng; Balint Joo; windy510; Frank Winter; Bartosz Kostrzewa; chris-schroeder; Carleton DeTar; Robert Maynard; Eloy Romero; Mario Schröck; Xiao-Yong; Matthew R Johnson; Filippo Spiga; Evan Berkowitz; walkloud

Version 1.0.0 - 10 January 2020
  • Add support for CUDA 10.2: QUDA 1.0.0 is supported on CUDA 7.5-10.2 using either GCC or clang compilers. CUDA 10.x and either GCC >= 6.x or clang >= 6.x are highly recommended.

  • Significant improvements to the CMake build system and removal of the legacy configure build.

  • Added more targeted compilation options to constrain which precisions and reconstruct types are compiled. QUDA_PRECISION is a CMake parameter holding a 4-bit number that selects which precisions are enabled, with 1 = quarter, 2 = half, 4 = single and 8 = double; the default is 14, which enables double, single and half precision. QUDA_RECONSTRUCT is a 3-bit number that selects which reconstruct types are enabled, with 1 = reconstruct-8/9, 2 = reconstruct-12/13 and 4 = reconstruct-18; the default is 7, which enables all reconstruct types.

  • Completely rewrote all dslash kernels using the accessor framework. This dramatically reduces code complexity and improves performance.

  • New physics functionality added: gauge Laplace kernel, Gaussian quark smearing, topological charge density.

  • QUDA can now be built to use either texture-memory reads or direct memory access (cmake option QUDA_TEX). Textures are on by default, though we note that since Pascal it can be advantageous to disable textures and use direct reads.

  • QUDA is no longer supported on the Fermi generation of GPUs (sm_20 and sm_21). Compilation and running should still be possible but will require compilation with texture objects disabled.

  • Added support for quarter precision (QUDA_QUARTER_PRECISION) for the linear operator and associated solvers.

  • Implemented both CA-CG and CA-GCR communication-avoiding solvers, for use either as stand-alone solvers or as a means to accelerate multigrid.

  • Continued evolution and optimization of the multigrid framework. Even so, we advise users to use the latest develop branch when using multigrid, since it remains a fast-moving target with continual focus on optimization and improvement.

  • An implementation of the Thick Restarted Lanczos Method (TRLM) for eigenvector solving of the normal operator.

  • Lanczos-accelerated multigrid through the use of coarse-grid deflation and / or using singular vectors to define the prolongator.

  • Removal of the legacy contraction and covariant derivative algorithms, and replacement with accessor-based rewrites.

  • Improved heavy-quark residual convergence, which ensures correct convergence for MILC heavy-quark observables.

  • Experimental support for Just-In-Time (JIT) compilation using Jitify.

  • Significantly improved unit testing framework using ctest.

  • QUDA can now be built to target Google's address sanitizer (CMAKE_BUILD_TYPE option is SANITIZE) for improved debugging.

  • QUDA can now download and install the USQCD libraries QMP and QIO automatically as part of the compilation process. To enable this, set the option QUDA_DOWNLOAD_USQCD=ON. As with the Eigen installation, this requires internet access.

  • QUDA can now download and install the ARPACK library automatically if the QUDA_DOWNLOAD_ARPACK option is enabled.

  • Updated to CUB 1.8.

  • Multiple bug fixes and clean up to the library. Many of these are listed here:

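The QUDA_PRECISION and QUDA_RECONSTRUCT values described above are bitmasks, so it can help to see how a given number maps to the enabled options. The following is a minimal sketch in Python, assuming only the bit assignments stated in these notes; the helper names `decode` and `encode` are hypothetical and are not part of QUDA or its build system.

```python
# Bit assignments as stated in the release notes above.
PRECISION_BITS = {1: "quarter", 2: "half", 4: "single", 8: "double"}
RECONSTRUCT_BITS = {1: "reconstruct-8/9", 2: "reconstruct-12/13", 4: "reconstruct-18"}


def decode(mask, bits):
    """Return the option names whose bit is set in `mask` (hypothetical helper)."""
    if mask < 0 or mask > sum(bits):  # sum(bits) adds the dict keys, i.e. all bits set
        raise ValueError(f"mask {mask} is outside the valid range 0..{sum(bits)}")
    return [name for bit, name in sorted(bits.items()) if mask & bit]


def encode(names, bits):
    """Return the bitmask that enables exactly the given option names (hypothetical helper)."""
    lookup = {name: bit for bit, name in bits.items()}
    mask = 0
    for name in names:
        mask |= lookup[name]  # raises KeyError for an unknown option name
    return mask


if __name__ == "__main__":
    print(decode(14, PRECISION_BITS))    # the default QUDA_PRECISION
    print(decode(7, RECONSTRUCT_BITS))   # the default QUDA_RECONSTRUCT
    print(encode(["single", "double"], PRECISION_BITS))
```

Under these assignments, a build restricted to single and double precision would correspond to a QUDA_PRECISION value of 12, which could be passed to cmake as a cache variable in the usual `-DQUDA_PRECISION=12` form.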
