Software Open Access

lattice/quda: QUDA v1.1.0

maddyscientist; Mathias Wagner; Dean Howarth; Evan Weinberg; Alexei Strelchenko; Jiqun Tu; Buck Babich; Alejandro Vaquero; Balint Joo; Simone Bacchio; Nuno Cardoso; Michael Cheng; Justin Foley; windy510; Frank Winter; Bartosz Kostrzewa; Carleton DeTar; chris-schroeder; Eloy Romero; jcosborn; Robert Maynard; walkloud; Evan Berkowitz; Filippo Spiga; Matthew R Johnson; sunwayihep; Xiao-Yong; Mario Schröck; tsuki


MARC21 XML Export

<?xml version='1.0' encoding='UTF-8'?>
<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nmm##2200000uu#4500</leader>
  <controlfield tag="005">20211029014900.0</controlfield>
  <controlfield tag="001">5610079</controlfield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Mathias Wagner</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">LLNL</subfield>
    <subfield code="a">Dean Howarth</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Evan Weinberg</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">FNAL</subfield>
    <subfield code="a">Alexei Strelchenko</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">NVIDIA</subfield>
    <subfield code="a">Jiqun Tu</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">NVIDIA</subfield>
    <subfield code="a">Buck Babich</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University Of Utah</subfield>
    <subfield code="a">Alejandro Vaquero</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory</subfield>
    <subfield code="a">Balint Joo</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">The Cyprus Institute</subfield>
    <subfield code="a">Simone Bacchio</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">CeFEMA, Departamento de Física, Instituto Superior Técnico, Universidade de Lisboa</subfield>
    <subfield code="a">Nuno Cardoso</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Michael Cheng</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Justin Foley</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">windy510</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Jefferson Lab</subfield>
    <subfield code="a">Frank Winter</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Digital Science Center (DiCe) &amp; High Performance Computing / Analytics Lab (HPC/A), Bonn University</subfield>
    <subfield code="a">Bartosz Kostrzewa</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Utah</subfield>
    <subfield code="a">Carleton DeTar</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">chris-schroeder</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">Jefferson Lab</subfield>
    <subfield code="a">Eloy Romero</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">jcosborn</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">NVIDIA</subfield>
    <subfield code="a">Robert Maynard</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">walkloud</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">University of Maryland</subfield>
    <subfield code="a">Evan Berkowitz</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">NVIDIA</subfield>
    <subfield code="a">Filippo Spiga</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Matthew R Johnson</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">IHEP</subfield>
    <subfield code="a">sunwayihep</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="a">Xiao-Yong</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">INFN Roma Tre</subfield>
    <subfield code="a">Mario Schröck</subfield>
  </datafield>
  <datafield tag="700" ind1=" " ind2=" ">
    <subfield code="u">TokyoTech</subfield>
    <subfield code="a">tsuki</subfield>
  </datafield>
  <datafield tag="856" ind1="4" ind2=" ">
    <subfield code="s">2200254</subfield>
    <subfield code="z">md5:1bbfd66be71db28de90456455ce60669</subfield>
    <subfield code="u">https://zenodo.org/record/5610079/files/lattice/quda-v1.1.0.zip</subfield>
  </datafield>
  <datafield tag="542" ind1=" " ind2=" ">
    <subfield code="l">open</subfield>
  </datafield>
  <datafield tag="260" ind1=" " ind2=" ">
    <subfield code="c">2021-10-28</subfield>
  </datafield>
  <datafield tag="909" ind1="C" ind2="O">
    <subfield code="p">software</subfield>
    <subfield code="o">oai:zenodo.org:5610079</subfield>
  </datafield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">maddyscientist</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">lattice/quda: QUDA v1.1.0</subfield>
  </datafield>
  <datafield tag="540" ind1=" " ind2=" ">
    <subfield code="a">Other (Open)</subfield>
  </datafield>
  <datafield tag="650" ind1="1" ind2="7">
    <subfield code="a">cc-by</subfield>
    <subfield code="2">opendefinition.org</subfield>
  </datafield>
  <datafield tag="520" ind1=" " ind2=" ">
    <subfield code="a">&lt;p&gt;Version 1.1.0 - October 2021&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Add support for NVSHMEM communication for the Dslash operators, for significantly improved strong scaling.  See &lt;a href="https://github.com/lattice/quda/wiki/Multi-GPU-with-NVSHMEM"&gt;https://github.com/lattice/quda/wiki/Multi-GPU-with-NVSHMEM&lt;/a&gt; for more details.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Addition of the MSPCG preconditioned CG solver for Möbius fermions. See &lt;a href="https://github.com/lattice/quda/wiki/The-Multi-Splitting-Preconditioned-Conjugate-Gradient-(MSPCG),-an-application-of-the-additive-Schwarz-Method"&gt;https://github.com/lattice/quda/wiki/The-Multi-Splitting-Preconditioned-Conjugate-Gradient-(MSPCG),-an-application-of-the-additive-Schwarz-Method&lt;/a&gt; for more details.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Addition of the Exact One Flavor Algorithm (EOFA) for Möbius fermions.  See &lt;a href="https://github.com/lattice/quda/wiki/The-Exact-One-Flavor-Algorithm-(EOFA)"&gt;https://github.com/lattice/quda/wiki/The-Exact-One-Flavor-Algorithm-(EOFA)&lt;/a&gt; for more details.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Addition of a fully GPU native Implicitly Restarted Arnoldi eigensolver (as opposed to partially relying on ARPACK).  See &lt;a href="https://github.com/lattice/quda/wiki/QUDA%27s-eigensolvers#implicitly-restarted-arnoldi-eigensolver"&gt;https://github.com/lattice/quda/wiki/QUDA%27s-eigensolvers#implicitly-restarted-arnoldi-eigensolver&lt;/a&gt; for more details.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Significantly reduced latency for reduction kernels through the use of heterogeneous atomics.  Requires CUDA 11.0+.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Addition of support for a split-grid multi-RHS solver.  See &lt;a href="https://github.com/lattice/quda/wiki/Split-Grid"&gt;https://github.com/lattice/quda/wiki/Split-Grid&lt;/a&gt; for more details.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Continued work on enhancing and refining the staggered multigrid algorithm.  The MILC interface can now drive the staggered multigrid solver.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multigrid setup can now use tensor cores on Volta, Turing and Ampere GPUs to accelerate the calculation.  Enable with the
&lt;code&gt;QudaMultigridParam::use_mma&lt;/code&gt; parameter.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Improved support of managed memory through the addition of a prefetch API.  This can dramatically improve the performance of the multigrid setup when oversubscribing the memory.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Improved the performance of using MILC RHMC with QUDA.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add support for a new internal data order FLOAT8.  This is the default data order for nSpin=4 half and quarter precision fields,
though the prior FLOAT4 order can be enabled with the cmake option QUDA_FLOAT8=OFF.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Removal of the singularity from the reconstruct-8 and reconstruct-9 compressed gauge field ordering.  This enables support for free fields with these orderings.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The clover parameter convention has been codified: one can either
1.) pass in QudaInvertParam::kappa and QudaInvertParam::csw separately, and QUDA will infer the necessary clover coefficient, or
2.) pass an explicit value of QudaInvertParam::clover_coeff (e.g. CHROMA's use case) and that will override the above inference.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;QUDA now includes fast-compilation options (QUDA_FAST_COMPILE_DSLASH and QUDA_FAST_COMPILE_REDUCE) which enable much faster build times for development at the expense of reduced performance.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add support for compiling QUDA using clang for both the host and device compiler.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;While the bulk of the work associated with making QUDA portable to different architectures will form the soul of QUDA 2.0, some of the initial refactoring associated with this has been applied.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Significant cleanup of the tests directory to reduce boilerplate.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;General improvements to the cmake build system using modern cmake features.  We now require cmake 3.15.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Extended the ctest list to include some optional benchmarks.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Fix a long-standing issue with multi-node Kepler GPU and Intel dual socket systems.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Improved ASAN integration: SANITIZE builds now work out of the box with no need to set the ASAN_OPTIONS environment variable.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Add support for the extended QIO branch (now required for MILC).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bump QMP version to 2.5.3.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Updated to Eigen 3.3.9.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multiple bug fixes and cleanups to the library.  Many of these are listed here: &lt;a href="https://github.com/lattice/quda/milestone/24?closed=1"&gt;https://github.com/lattice/quda/milestone/24?closed=1&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">url</subfield>
    <subfield code="i">isSupplementTo</subfield>
    <subfield code="a">https://github.com/lattice/quda/tree/v1.1.0</subfield>
  </datafield>
  <datafield tag="773" ind1=" " ind2=" ">
    <subfield code="n">doi</subfield>
    <subfield code="i">isVersionOf</subfield>
    <subfield code="a">10.5281/zenodo.3604375</subfield>
  </datafield>
  <datafield tag="024" ind1=" " ind2=" ">
    <subfield code="a">10.5281/zenodo.5610079</subfield>
    <subfield code="2">doi</subfield>
  </datafield>
  <datafield tag="980" ind1=" " ind2=" ">
    <subfield code="a">software</subfield>
  </datafield>
</record>