Published April 12, 2026 | Version LuxCUDA-v0.3.5
Lux: Explicit Parameterization of Deep Neural Networks in Julia
Description
LuxCUDA LuxCUDA-v0.3.5
Merged pull requests:
- feat: migrate DDIM to Reactant (#1158) (@avik-pal)
- feat: precompile common workloads (#1485) (@avik-pal)
- feat: annotate function defns with `@trace` (#1486) (@avik-pal)
- ci(asv): run more benchmarks for ASV (#1487) (@avik-pal)
- feat: precompile common workloads (try 2) (#1488) (@avik-pal)
- chore: drop Polyester as a main dependency (#1489) (@avik-pal)
- ci: fix cache name (#1491) (@avik-pal)
- feat: use new NCCL version (#1492) (@avik-pal)
- docs: make it easy to build docs locally (#1493) (@avik-pal)
- fix(WeightInitializers): overlay backend array for reactant (#1494) (@avik-pal)
- feat: replace Compat.jl with SciMLPublic.jl for @public macro (#1497) (@Copilot)
- Add type-stable eltype control to device adaptors with comprehensive testing (#1498) (@Copilot)
- chore: bump crate-ci/typos from 1.36.2 to 1.36.3 (#1499) (@dependabot[bot])
- chore: bump crate-ci/typos from 1.36.3 to 1.37.2 (#1500) (@dependabot[bot])
- CompatHelper: bump compat for BFloat16s to 0.6 for package CIFAR10, (keep existing compat) (#1501) (@github-actions[bot])
- CompatHelper: bump compat for BFloat16s to 0.6 for package Qwen3, (keep existing compat) (#1502) (@github-actions[bot])
- CompatHelper: bump compat for JLArrays to 0.3 for package test, (keep existing compat) (#1503) (@github-actions[bot])
- ci: use 1.11 (#1504) (@avik-pal)
- feat: JVP and VJP APIs for Reactant (#1506) (@avik-pal)
- feat: batched_jacobian for Reactant (#1507) (@avik-pal)
- chore: bump crate-ci/typos from 1.37.2 to 1.38.1 (#1508) (@dependabot[bot])
- CompatHelper: bump compat for Optimization to 5 for package GravitationalWaveForm, (keep existing compat) (#1510) (@github-actions[bot])
- CompatHelper: bump compat for Optimization to 5 for package OptimizationIntegration, (keep existing compat) (#1511) (@github-actions[bot])
- CompatHelper: bump compat for BLISBLAS in [weakdeps] to 0.2 for package LuxLib, (keep existing compat) (#1512) (@github-actions[bot])
- CompatHelper: bump compat for BLISBLAS to 0.2 for package test, (keep existing compat) (#1513) (@github-actions[bot])
- feat: move rng to reactant device (#1517) (@avik-pal)
- fix: donation errors for reactant (#1518) (@avik-pal)
- feat: allow passing a sync option (#1519) (@avik-pal)
- chore: bump actions/upload-artifact from 4 to 5 (#1526) (@dependabot[bot])
- feat: support distributed training via TrainState API (#1529) (@avik-pal) (see the training sketch after this list)
- feat: support track numbers via reactant device API (#1533) (@avik-pal)
- ci: run LuxCore + MLDataDevices testing on 1.12 (#1534) (@avik-pal)
- feat: update LuxTestUtils to support 1.12 (#1535) (@avik-pal)
- docs: stop manual specification of precision config (#1536) (@avik-pal)
- CompatHelper: add new compat entry for TensorBoardLogger at version 0.1 for package DDIM, (keep existing compat) (#1537) (@github-actions[bot])
- CompatHelper: add new compat entry for ImageShow at version 0.3 for package DDIM, (keep existing compat) (#1538) (@github-actions[bot])
- CompatHelper: add new compat entry for OhMyThreads at version 0.8 for package DDIM, (keep existing compat) (#1539) (@github-actions[bot])
- chore: bump crate-ci/typos from 1.38.1 to 1.39.0 (#1541) (@dependabot[bot])
- refactor: use `EnzymeRules.@easy_rule` in Lux.jl (#1542) (@avik-pal)
- Fix identity_init filling entire submatrix instead of diagonal (#1544) (@Copilot)
- test: use finite differencing for gradient testing for Reactant (#1545) (@avik-pal)
- feat: more informative error on constructing trainstate with compiled function (#1547) (@avik-pal)
- fix: minor reactant stuff + docs build (#1548) (@avik-pal)
- feat: use a caching allocator for GPUArrays workflows (#1549) (@avik-pal)
- Avoid reconstruction in `Internal.unsafe_free!` (#1550) (@AntonOresten)
- test: update tests for enzyme (#1552) (@avik-pal)
- fix: update how the error message looks (#1553) (@avik-pal)
- Use `|>` for moving data to devices (#1559) (@abhro) (see the device-movement sketch after this list)
- feat: return sequence properly + checkpointing + mincut (#1561) (@avik-pal)
- fix(LuxLib): avoid extra copy if input and output are aliased (#1562) (@avik-pal)
- ci: use dependabot for updating compat entries (#1563) (@avik-pal)
- chore: bump actions/checkout from 5 to 6 (#1564) (@dependabot[bot])
- chore: bump crate-ci/typos from 1.39.0 to 1.39.2 (#1565) (@dependabot[bot])
- Put plot labels within plotting directives (#1566) (@abhro)
- Remove unnecessary `begin...end` markers (#1567) (@abhro)
- ci(docs): update cpu builds to use default gh actions (#1569) (@avik-pal)
- Indent code blocks to make it part of ordered list (#1570) (@abhro)
- Fix `@example` blocks in gpu_management.md (#1571) (@abhro)
- chore: bump actions/download-artifact from 5 to 6 (#1575) (@dependabot[bot])
- ci: fix download path for cuda ci (#1576) (@avik-pal)
- chore: bump crate-ci/typos from 1.39.2 to 1.40.0 (#1578) (@dependabot[bot])
- test: Metal test now works (#1581) (@avik-pal)
- Add AbstractChar array support to MLDataDevices (#1582) (@Copilot)
- test: streamline installing packages in tests (#1583) (@avik-pal)
- Mark LuxCore imports as public using @public (#1585) (@Copilot)
- Fix isbits type support for GPU device transfer (#1587) (@Copilot)
- Add OpenCL support to MLDataDevices (#1590) (@VarLad)
- docs: Recommend Reactant for AMD GPU support and add Tenstorrent backend (#1593) (@Copilot)
- docs: fix docs build (#1594) (@avik-pal)
- ci(docs): run all docs on cpu runners (#1595) (@avik-pal)
- chore: bump actions/upload-artifact from 5 to 6 (#1598) (@dependabot[bot])
- chore: bump actions/download-artifact from 6 to 7 (#1599) (@dependabot[bot])
- chore: bump peter-evans/create-pull-request from 7 to 8 (#1600) (@dependabot[bot])
- feat: proper annotations for xprof (#1603) (@avik-pal)
- fix: forwarddiff support for gpu arrays (#1605) (@avik-pal)
- docs: use new API to export jax models (#1606) (@avik-pal)
- chore: update NPZ requirement to 0.4.3 in /docs (#1607) (@dependabot[bot])
- chore: update MLUtils requirement to 0.4.8 in /docs (#1608) (@dependabot[bot])
- ci: reactant don't preallocate on CI (#1611) (@avik-pal)
- ci: fix how downgrade ci works (#1612) (@avik-pal)
- feat: make forwarddiff a weak dependency for luxlib (#1613) (@avik-pal)
- perf: more benchmarking results (#1614) (@avik-pal)
- chore: bump crate-ci/typos from 1.40.0 to 1.41.0 (#1616) (@dependabot[bot])
- fix: sparse arrays support (#1617) (@avik-pal)
- chore: rename extensions in LuxCore (#1618) (@avik-pal)
- chore: rename extensions in LuxLib (#1619) (@avik-pal)
- chore: rename extensions in MLDataDevices & WI (#1620) (@avik-pal)
- chore: rename extensions in Lux (#1621) (@avik-pal)
- Allow for empty Chains. (#1623) (@ispielma)
- fix: circular dependency warning in MLDataDevices (#1624) (@avik-pal)
- chore: bump crate-ci/typos from 1.41.0 to 1.42.0 (#1625) (@dependabot[bot])
- fix: explicit imports failures from 0.14.2 (#1626) (@avik-pal)
- fix: temporarily disable precompile workloads for reactant (#1628) (@avik-pal)
- feat: various fixups for nicer JETLS interaction (#1630) (@avik-pal)
- fix: reactant precompilation + throw error (#1631) (@avik-pal)
- test(LuxLib): migrate testing to v1.12 (#1633) (@avik-pal)
- test(Lux): migrate testing to v1.12 (#1634) (@avik-pal)
- fix: update tracing to new Reactant API (#1636) (@avik-pal)
- feat: parallel precompile for reactant workloads (#1638) (@avik-pal)
- docs: fix out of bounds size access (#1639) (@avik-pal)
- fix: skip zygote on 1.12 (#1641) (@avik-pal)
- chore: bump crate-ci/typos from 1.42.0 to 1.42.1 (#1642) (@dependabot[bot])
- test: Enzyme.jl on 1.12 (#1644) (@avik-pal)
- Fixed legend labels for the curves in the PolynomialFitting documentation page (#1645) (@JamieMair)
- ci: run docs on 1.12 (#1646) (@avik-pal)
- feat: support AutoReactant (#1647) (@avik-pal)
- fix: dont rely on auto inference in reactant (#1648) (@avik-pal)
- fix: bypass reactant in get_device (#1649) (@avik-pal)
- test: use ParallelTestRunner (#1650) (@avik-pal)
- chore: update Mooncake requirement from 0.4.148 to 0.4.148, 0.5 (#1651) (@dependabot[bot])
- chore: update Mooncake requirement from 0.4.138 to 0.4.138, 0.5 in /test (#1652) (@dependabot[bot])
- chore: bump crate-ci/typos from 1.42.1 to 1.42.3 (#1653) (@dependabot[bot])
- test(LuxLib): migrate to ParallelTestRunners (#1654) (@avik-pal)
- fix: missing onehotarrays dispatch for cpu matmul (#1655) (@avik-pal)
- test(Lux): migrate to ParallelTestRunner (#1656) (@avik-pal)
- chore: bump crate-ci/typos from 1.42.3 to 1.43.3 (#1659) (@dependabot[bot])
- chore: update DocumenterVitepress requirement from 0.2 to 0.2, 0.3 in /docs (#1660) (@dependabot[bot])
- chore: update LuxCUDA requirement to 0.3.4 in /test (#1661) (@dependabot[bot])
- chore: update cuDNN requirement to 1.4.6 in /test (#1662) (@dependabot[bot])
- chore: update CUDA requirement to 5.9.6 in /test (#1663) (@dependabot[bot])
- `Mooncake` for `LuxTestUtils`, `LuxLib`, `Lux` general testing, current Tests (#1664) (@AstitvaAggarwal)
- chore: bump crate-ci/typos from 1.43.3 to 1.43.4 (#1665) (@dependabot[bot])
- chore: bump crate-ci/typos from 1.43.4 to 1.43.5 (#1667) (@dependabot[bot])
- chore: bump actions/download-artifact from 7 to 8 (#1670) (@dependabot[bot])
- chore: bump crate-ci/typos from 1.43.5 to 1.44.0 (#1671) (@dependabot[bot])
- chore: bump actions/upload-artifact from 6 to 7 (#1672) (@dependabot[bot])
- chore: bump julia-actions/cache from 2 to 3 (#1675) (@dependabot[bot])
- fix(MLDataDevices): correct `amdgpu_array_adapt` signature in AMDGPUExt (#1677) (@Copilot)
- test: ditch mooncake testing in training API (#1680) (@avik-pal)
- fix: N-dimensional mask and bias support in ReactantExt MultiHeadAttention (#1682) (@Copilot)
- test: reenable mooncake testing for training api (#1683) (@avik-pal)
- fix: correct double activation in `cublasLt_fused_dense!` generic fallback (#1685) (@Copilot)
- docs: add missing `mooncake_gradient_function` docstring to LuxTestUtils API page (#1687) (@Copilot)
- fix: host memory leak in cublaslt implementation (#1689) (@avik-pal)
- chore: bump crate-ci/typos from 1.44.0 to 1.45.0 (#1692) (@dependabot[bot])
- feat: allow new CUDA versions (#1694) (@avik-pal)
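
For orientation on the TrainState-based training flow that #1529 extends to distributed training, here is a minimal single-process sketch using Lux's Training API. The model, loss function, optimizer settings, and data are illustrative assumptions rather than anything taken from the release, and the distributed pieces (NCCL/MPI backends) are omitted.

```julia
using Lux, Optimisers, Random
using ADTypes: AutoZygote
import Zygote  # AD backend used by AutoZygote()

# A small illustrative model; Lux keeps parameters and state explicit.
model = Chain(Dense(4 => 16, relu), Dense(16 => 1))
ps, st = Lux.setup(Random.default_rng(), model)

# TrainState bundles the model, parameters, states, and optimizer state.
ts = Lux.Training.TrainState(model, ps, st, Adam(0.001f0))

# Training loss with the (model, ps, st, data) -> (loss, new_st, stats) shape.
function mse_loss(model, ps, st, (x, y))
    y_pred, st_ = model(x, ps, st)
    return sum(abs2, y_pred .- y) / size(y, 2), st_, NamedTuple()
end

x, y = rand(Float32, 4, 32), rand(Float32, 1, 32)

# One optimization step; in practice this is called once per batch.
grads, loss, stats, ts = Lux.Training.single_train_step!(AutoZygote(), mse_loss, (x, y), ts)
```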
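
And for the `|>` device-movement pattern touched by #1559, this is a small sketch assuming MLDataDevices' `gpu_device()`/`cpu_device()` as re-exported by Lux; the array sizes and the `Dense` layer are made up for illustration, and `gpu_device()` simply falls back to the CPU device when no GPU backend package is loaded.

```julia
using Lux, MLDataDevices, Random

dev  = gpu_device()   # CUDA/AMDGPU/Metal/oneAPI if a backend is loaded, else CPU
cdev = cpu_device()

# Plain arrays move with the |> pipe:
x = rand(Float32, 4, 32)
x_dev  = x |> dev
x_host = x_dev |> cdev

# Parameter/state trees returned by Lux.setup move the same way:
model  = Dense(4 => 2)
ps, st = Lux.setup(Random.default_rng(), model) |> dev
y, st  = model(x_dev, ps, st)
```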
Closed issues:
- Rethinking `eltype` conversions in Adaptors (#1015)
- Proper Sparse Interfaces (#1014)
- Add simple tests for other accelerators (#686)
- DifferentiationInterface testing (#769)
- Irregular RAM usage under large amount of epochs on gpu (#872)
- Warning from LuxLib when using OneHotArrays about Mixed Precision (#1197)
- Memory leak in Dense layer with CUDA (#1230)
- CUDA.jl alone cannot trigger automatic GPU backend selection (#1245)
- Fix remaining CUDA testing (#1457)
- Relax NCCL dep for testing (#1479)
- MLDataDevices: Don't force precision reduction to Float32 for CUDA? (#1490)
- Use `SciMLPublic.jl` instead of `Compat` for `@public` (#1496)
- Global configuration for setting `sync=true` in training API (#1509)
- Invalid buffer donation in new Reactant versions (#1514)
- Lux.jl and Reactant and StableRNG interaction (#1515)
- memory leak (?) on AMD MI250X GPUs (#1516)
- Reactant get_device with sharding throws error inside of MLDataDevices, impossible to use with TrainState API (#1520)
- Error "failed to run pass manager on module" only on Vector input (#1521)
- Local MPI rank is always 0 if `Ipopt` solver is imported before Lux and MPI (#1525)
- Automatically cache allocations for JuliaGPU workloads (#1527)
- Relax ForwardDiff version bound? (#1530)
- Reactant RNG handling broken in latest release (#1531)
- Towards 1.12 support (#1532)
- Exporting to Jax manual entry segfaults with recent reactant (#1540)
- Identity matrix initialization fills all entries with ones (#1543)
- Embedding Layer results in scalar indexing with Reactant? (#1546)
- Enzyme Cache Invalidation Failure with v1.10 (#1551)
- OneHotArrays + Reactant with cross entropy loss (#1556)
- Overhead of convolution on AMD GPU (#1557)
- Running Lux and Reactant tests locally (#1577)
- `LuxTestUtils` not re-exported (#1579)
- [MLDataDevices] failure at transferring non-numerical array (#1580)
- Mark LuxCore imports in Lux as public (#1584)
- [MLDataDevices] broken support for `isbits` types movement (#1586)
- Update exporting to jax example to directly use new export functionality (#1588)
- Failed to precompile LuxLossFunctionsExt (#1591)
- Update AMD support documentation to recommend using Reactant (#1592)
- ERROR: KeyError: key "ReactantCore" not found (#1596)
- TagBot: Manual intervention needed for releases (#1604)
- What is wrong with my custom layer? (#1610)
- [MLDataDevices] `AbstractCuSparseArray` not defined in CUDA.jl v5.9.6 (#1615)
- Layer summary page of docs is weirdly formatted (#1627)
- Precompilation fails when Flux is in the project (#1629)
- RMSNorm fails to compile with Reactant (#1637)
- Using RMSNorm silently breaks AD with Reactant and Enzyme (#1640)
- Reactant compilation for ConvolutionalVAE broken (#1673)
- [MLDataDevices] error when moving arrays to AMDGPU (#1676)
- Mooncake is completely busted in recent versions (#1679)
- Attention assumes 2D mask when using Reactant (#1681)
- Possible double activation in `LuxLib.Impl.cublasLt_fused_dense!` (#1684)
- Documentation build is broken (#1686)
Notes
Files (14.9 MB)
| Name | Size |
|---|---|
| LuxDL/Lux.jl-LuxCUDA-v0.3.5.zip (md5:f2170f392ce154aee758e10cc0fcfa13) | 14.9 MB |
Additional details
Related works
- Is supplement to this software: https://github.com/LuxDL/Lux.jl/tree/LuxCUDA-v0.3.5 (URL)
Software
- Repository URL: https://github.com/LuxDL/Lux.jl