Published April 12, 2026 | Version LuxCUDA-v0.3.5

Lux: Explicit Parameterization of Deep Neural Networks in Julia

Description

LuxCUDA-v0.3.5

Diff since LuxCUDA-v0.3.4

Merged pull requests:

  • feat: migrate DDIM to Reactant (#1158) (@avik-pal)
  • feat: precompile common workloads (#1485) (@avik-pal)
  • feat: annotate function defns with @trace (#1486) (@avik-pal)
  • ci(asv): run more benchmarks for ASV (#1487) (@avik-pal)
  • feat: precompile common workloads (try 2) (#1488) (@avik-pal)
  • chore: drop Polyester as a main dependency (#1489) (@avik-pal)
  • ci: fix cache name (#1491) (@avik-pal)
  • feat: use new NCCL version (#1492) (@avik-pal)
  • docs: make it easy to build docs locally (#1493) (@avik-pal)
  • fix(WeightInitializers): overlay backend array for reactant (#1494) (@avik-pal)
  • feat: replace Compat.jl with SciMLPublic.jl for @public macro (#1497) (@Copilot)
  • Add type-stable eltype control to device adaptors with comprehensive testing (#1498) (@Copilot)
  • chore: bump crate-ci/typos from 1.36.2 to 1.36.3 (#1499) (@dependabot[bot])
  • chore: bump crate-ci/typos from 1.36.3 to 1.37.2 (#1500) (@dependabot[bot])
  • CompatHelper: bump compat for BFloat16s to 0.6 for package CIFAR10, (keep existing compat) (#1501) (@github-actions[bot])
  • CompatHelper: bump compat for BFloat16s to 0.6 for package Qwen3, (keep existing compat) (#1502) (@github-actions[bot])
  • CompatHelper: bump compat for JLArrays to 0.3 for package test, (keep existing compat) (#1503) (@github-actions[bot])
  • ci: use 1.11 (#1504) (@avik-pal)
  • feat: JVP and VJP APIs for Reactant (#1506) (@avik-pal)
  • feat: batched_jacobian for Reactant (#1507) (@avik-pal)
  • chore: bump crate-ci/typos from 1.37.2 to 1.38.1 (#1508) (@dependabot[bot])
  • CompatHelper: bump compat for Optimization to 5 for package GravitationalWaveForm, (keep existing compat) (#1510) (@github-actions[bot])
  • CompatHelper: bump compat for Optimization to 5 for package OptimizationIntegration, (keep existing compat) (#1511) (@github-actions[bot])
  • CompatHelper: bump compat for BLISBLAS in [weakdeps] to 0.2 for package LuxLib, (keep existing compat) (#1512) (@github-actions[bot])
  • CompatHelper: bump compat for BLISBLAS to 0.2 for package test, (keep existing compat) (#1513) (@github-actions[bot])
  • feat: move rng to reactant device (#1517) (@avik-pal)
  • fix: donation errors for reactant (#1518) (@avik-pal)
  • feat: allow passing a sync option (#1519) (@avik-pal)
  • chore: bump actions/upload-artifact from 4 to 5 (#1526) (@dependabot[bot])
  • feat: support distributed training via TrainState API (#1529) (@avik-pal)
  • feat: support track numbers via reactant device API (#1533) (@avik-pal)
  • ci: run LuxCore + MLDataDevices testing on 1.12 (#1534) (@avik-pal)
  • feat: update LuxTestUtils to support 1.12 (#1535) (@avik-pal)
  • docs: stop manual specification of precision config (#1536) (@avik-pal)
  • CompatHelper: add new compat entry for TensorBoardLogger at version 0.1 for package DDIM, (keep existing compat) (#1537) (@github-actions[bot])
  • CompatHelper: add new compat entry for ImageShow at version 0.3 for package DDIM, (keep existing compat) (#1538) (@github-actions[bot])
  • CompatHelper: add new compat entry for OhMyThreads at version 0.8 for package DDIM, (keep existing compat) (#1539) (@github-actions[bot])
  • chore: bump crate-ci/typos from 1.38.1 to 1.39.0 (#1541) (@dependabot[bot])
  • refactor: use EnzymeRules.@easy_rule in Lux.jl (#1542) (@avik-pal)
  • Fix identity_init filling entire submatrix instead of diagonal (#1544) (@Copilot)
  • test: use finite differencing for gradient testing for Reactant (#1545) (@avik-pal)
  • feat: more informative error on constructing trainstate with compiled function (#1547) (@avik-pal)
  • fix: minor reactant stuff + docs build (#1548) (@avik-pal)
  • feat: use a caching allocator for GPUArrays workflows (#1549) (@avik-pal)
  • Avoid reconstruction in Internal.unsafe_free! (#1550) (@AntonOresten)
  • test: update tests for enzyme (#1552) (@avik-pal)
  • fix: update how the error message looks (#1553) (@avik-pal)
  • Use |> for moving data to devices (#1559) (@abhro)
  • feat: return sequence properly + checkpointing + mincut (#1561) (@avik-pal)
  • fix(LuxLib): avoid extra copy if input and output are aliased (#1562) (@avik-pal)
  • ci: use dependabot for updating compat entries (#1563) (@avik-pal)
  • chore: bump actions/checkout from 5 to 6 (#1564) (@dependabot[bot])
  • chore: bump crate-ci/typos from 1.39.0 to 1.39.2 (#1565) (@dependabot[bot])
  • Put plot labels within plotting directives (#1566) (@abhro)
  • Remove unnecessary begin...end markers (#1567) (@abhro)
  • ci(docs): update cpu builds to use default gh actions (#1569) (@avik-pal)
  • Indent code blocks to make it part of ordered list (#1570) (@abhro)
  • Fix @example blocks in gpu_management.md (#1571) (@abhro)
  • chore: bump actions/download-artifact from 5 to 6 (#1575) (@dependabot[bot])
  • ci: fix download path for cuda ci (#1576) (@avik-pal)
  • chore: bump crate-ci/typos from 1.39.2 to 1.40.0 (#1578) (@dependabot[bot])
  • test: Metal test now works (#1581) (@avik-pal)
  • Add AbstractChar array support to MLDataDevices (#1582) (@Copilot)
  • test: streamline installing packages in tests (#1583) (@avik-pal)
  • Mark LuxCore imports as public using @public (#1585) (@Copilot)
  • Fix isbits type support for GPU device transfer (#1587) (@Copilot)
  • Add OpenCL support to MLDataDevices (#1590) (@VarLad)
  • docs: Recommend Reactant for AMD GPU support and add Tenstorrent backend (#1593) (@Copilot)
  • docs: fix docs build (#1594) (@avik-pal)
  • ci(docs): run all docs on cpu runners (#1595) (@avik-pal)
  • chore: bump actions/upload-artifact from 5 to 6 (#1598) (@dependabot[bot])
  • chore: bump actions/download-artifact from 6 to 7 (#1599) (@dependabot[bot])
  • chore: bump peter-evans/create-pull-request from 7 to 8 (#1600) (@dependabot[bot])
  • feat: proper annotations for xprof (#1603) (@avik-pal)
  • fix: forwarddiff support for gpu arrays (#1605) (@avik-pal)
  • docs: use new API to export jax models (#1606) (@avik-pal)
  • chore: update NPZ requirement to 0.4.3 in /docs (#1607) (@dependabot[bot])
  • chore: update MLUtils requirement to 0.4.8 in /docs (#1608) (@dependabot[bot])
  • ci: reactant don't preallocate on CI (#1611) (@avik-pal)
  • ci: fix how downgrade ci works (#1612) (@avik-pal)
  • feat: make forwarddiff a weak dependency for luxlib (#1613) (@avik-pal)
  • perf: more benchmarking results (#1614) (@avik-pal)
  • chore: bump crate-ci/typos from 1.40.0 to 1.41.0 (#1616) (@dependabot[bot])
  • fix: sparse arrays support (#1617) (@avik-pal)
  • chore: rename extensions in LuxCore (#1618) (@avik-pal)
  • chore: rename extensions in LuxLib (#1619) (@avik-pal)
  • chore: rename extensions in MLDataDevices & WI (#1620) (@avik-pal)
  • chore: rename extensions in Lux (#1621) (@avik-pal)
  • Allow for empty Chains. (#1623) (@ispielma)
  • fix: circular dependency warning in MLDataDevices (#1624) (@avik-pal)
  • chore: bump crate-ci/typos from 1.41.0 to 1.42.0 (#1625) (@dependabot[bot])
  • fix: explicit imports failures from 0.14.2 (#1626) (@avik-pal)
  • fix: temporarily disable precompile workloads for reactant (#1628) (@avik-pal)
  • feat: various fixups for nicer JETLS interaction (#1630) (@avik-pal)
  • fix: reactant precompilation + throw error (#1631) (@avik-pal)
  • test(LuxLib): migrate testing to v1.12 (#1633) (@avik-pal)
  • test(Lux): migrate testing to v1.12 (#1634) (@avik-pal)
  • fix: update tracing to new Reactant API (#1636) (@avik-pal)
  • feat: parallel precompile for reactant workloads (#1638) (@avik-pal)
  • docs: fix out of bounds size access (#1639) (@avik-pal)
  • fix: skip zygote on 1.12 (#1641) (@avik-pal)
  • chore: bump crate-ci/typos from 1.42.0 to 1.42.1 (#1642) (@dependabot[bot])
  • test: Enzyme.jl on 1.12 (#1644) (@avik-pal)
  • Fixed legend labels for the curves in the PolynomialFitting documentation page (#1645) (@JamieMair)
  • ci: run docs on 1.12 (#1646) (@avik-pal)
  • feat: support AutoReactant (#1647) (@avik-pal)
  • fix: dont rely on auto inference in reactant (#1648) (@avik-pal)
  • fix: bypass reactant in get_device (#1649) (@avik-pal)
  • test: use ParallelTestRunner (#1650) (@avik-pal)
  • chore: update Mooncake requirement from 0.4.148 to 0.4.148, 0.5 (#1651) (@dependabot[bot])
  • chore: update Mooncake requirement from 0.4.138 to 0.4.138, 0.5 in /test (#1652) (@dependabot[bot])
  • chore: bump crate-ci/typos from 1.42.1 to 1.42.3 (#1653) (@dependabot[bot])
  • test(LuxLib): migrate to ParallelTestRunners (#1654) (@avik-pal)
  • fix: missing onehotarrays dispatch for cpu matmul (#1655) (@avik-pal)
  • test(Lux): migrate to ParallelTestRunner (#1656) (@avik-pal)
  • chore: bump crate-ci/typos from 1.42.3 to 1.43.3 (#1659) (@dependabot[bot])
  • chore: update DocumenterVitepress requirement from 0.2 to 0.2, 0.3 in /docs (#1660) (@dependabot[bot])
  • chore: update LuxCUDA requirement to 0.3.4 in /test (#1661) (@dependabot[bot])
  • chore: update cuDNN requirement to 1.4.6 in /test (#1662) (@dependabot[bot])
  • chore: update CUDA requirement to 5.9.6 in /test (#1663) (@dependabot[bot])
  • Mooncake for LuxTestUtils, LuxLib, Lux general testing, current Tests. (#1664) (@AstitvaAggarwal)
  • chore: bump crate-ci/typos from 1.43.3 to 1.43.4 (#1665) (@dependabot[bot])
  • chore: bump crate-ci/typos from 1.43.4 to 1.43.5 (#1667) (@dependabot[bot])
  • chore: bump actions/download-artifact from 7 to 8 (#1670) (@dependabot[bot])
  • chore: bump crate-ci/typos from 1.43.5 to 1.44.0 (#1671) (@dependabot[bot])
  • chore: bump actions/upload-artifact from 6 to 7 (#1672) (@dependabot[bot])
  • chore: bump julia-actions/cache from 2 to 3 (#1675) (@dependabot[bot])
  • fix(MLDataDevices): correct amdgpu_array_adapt signature in AMDGPUExt (#1677) (@Copilot)
  • test: ditch mooncake testing in training API (#1680) (@avik-pal)
  • fix: N-dimensional mask and bias support in ReactantExt MultiHeadAttention (#1682) (@Copilot)
  • test: reenable mooncake testing for training api (#1683) (@avik-pal)
  • fix: correct double activation in cublasLt_fused_dense! generic fallback (#1685) (@Copilot)
  • docs: add missing mooncake_gradient_function docstring to LuxTestUtils API page (#1687) (@Copilot)
  • fix: host memory leak in cublaslt implementation (#1689) (@avik-pal)
  • chore: bump crate-ci/typos from 1.44.0 to 1.45.0 (#1692) (@dependabot[bot])
  • feat: allow new CUDA versions (#1694) (@avik-pal)
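Several of the device-related changes above (notably #1559, which standardizes on `|>` for moving data to devices) exercise the MLDataDevices API. A minimal sketch of that idiom, assuming only MLDataDevices is loaded (with no GPU backend such as CUDA.jl in the session, `gpu_device()` falls back to the CPU device):

```julia
using MLDataDevices

# Pick the best available accelerator; without a loaded GPU backend
# this falls back to CPUDevice().
dev = gpu_device()
cdev = cpu_device()

x = rand(Float32, 3, 4)

# The pipe-operator style adopted in #1559:
x_dev = x |> dev        # host -> device
x_host = x_dev |> cdev  # device -> host
```

The same `|>` pattern applies to whole parameter/state NamedTuples, which is why the docs were migrated to it wholesale.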

Closed issues:

  • Rethinking eltype conversions in Adaptors (#1015)
  • Proper Sparse Interfaces (#1014)
  • Add simple tests for other accelerators (#686)
  • DifferentiationInterface testing (#769)
  • Irregular RAM usage under large amount of epochs on gpu (#872)
  • Warning from LuxLib when using OneHotArrays about Mixed Precision (#1197)
  • Memory leak in Dense layer with CUDA (#1230)
  • CUDA.jl alone cannot trigger automatic GPU backend selection (#1245)
  • Fix remaining CUDA testing (#1457)
  • Relax NCCL dep for testing (#1479)
  • MLDataDevices: Don't force precision reduction to Float32 for CUDA? (#1490)
  • Use SciMLPublic.jl instead of Compat for @public (#1496)
  • Global configuration for setting sync=true in training API (#1509)
  • Invalid buffer donation in new Reactant versions (#1514)
  • Lux.jl and Reactant and StableRNG interaction (#1515)
  • memory leak (?) on AMD MI250X GPUs (#1516)
  • Reactant get_device with sharding throws error inside of MLDataDevices, impossible to use with TrainState API (#1520)
  • Error "failed to run pass manager on module" only on Vector input (#1521)
  • Local MPI rank is always 0 if Ipopt solver is imported before Lux and MPI (#1525)
  • Automatically cache allocations for JuliaGPU workloads (#1527)
  • Relax ForwardDiff version bound? (#1530)
  • Reactant RNG handling broken in latest release (#1531)
  • Towards 1.12 support (#1532)
  • Exporting to Jax manual entry segfaults with recent reactant (#1540)
  • Identity matrix initialization fills all entries with ones (#1543)
  • Embedding Layer results in scalar indexing with Reactant? (#1546)
  • Enzyme Cache Invalidation Failure with v1.10 (#1551)
  • OneHotArrays + Reactant with cross entropy loss (#1556)
  • Overhead of convolution on AMD GPU (#1557)
  • Running Lux and Reactant tests locally (#1577)
  • LuxTestUtils not re-exported (#1579)
  • [MLDataDevices] failure at transferring non-numerical array (#1580)
  • Mark LuxCore imports in Lux as public (#1584)
  • [MLDataDevices] broken support for isbits types movement (#1586)
  • Update exporting to jax example to directly use new export functionality (#1588)
  • Failed to precompile LuxLossFunctionsExt (#1591)
  • Update AMD support documentation to recommend using Reactant (#1592)
  • ERROR: KeyError: key "ReactantCore" not found (#1596)
  • TagBot: Manual intervention needed for releases (#1604)
  • What is wrong with my custom layer? (#1610)
  • [MLDataDevices] AbstractCuSparseArray not defined in CUDA.jl v5.9.6 (#1615)
  • Layer summary page of docs is weirdly formatted (#1627)
  • Precompilation fails when Flux is in the project (#1629)
  • RMSNorm fails to compile with Reactant (#1637)
  • Using RMSNorm silently breaks AD with Reactant and Enzyme (#1640)
  • Reactant compilation for ConvolutionalVAE broken (#1673)
  • [MLDataDevices] error when moving arrays to AMDGPU (#1676)
  • Mooncake is completely busted in recent versions (#1679)
  • Attention assumes 2D mask when using Reactant (#1681)
  • Possible double activation in LuxLib.Impl.cublasLt_fused_dense! (#1684)
  • Documentation build is broken (#1686)
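Issue #1543 and its fix (#1544) concern `identity_init` from WeightInitializers: the matrix was being filled entirely with ones instead of only along the main diagonal. A short sketch of the intended post-fix behavior, assuming WeightInitializers' documented `identity_init([rng], [T], dims...; gain, shift)` signature:

```julia
using WeightInitializers

# After #1544, only the main diagonal is set to `gain` (default 1);
# all off-diagonal entries are zero.
W = identity_init(Float32, 4, 4)

# For non-square shapes, the result is a partial identity: ones on
# the leading diagonal, zeros elsewhere.
W_rect = identity_init(Float32, 4, 6)
```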


Files

LuxDL/Lux.jl-LuxCUDA-v0.3.5.zip (14.9 MB, md5:f2170f392ce154aee758e10cc0fcfa13)
