Published April 12, 2026 | Version LuxCUDA-v0.3.5

Lux: Explicit Parameterization of Deep Neural Networks in Julia

Description

LuxCUDA-v0.3.5

Diff since LuxCUDA-v0.3.4

Merged pull requests:

  • feat: migrate DDIM to Reactant (#1158) (@avik-pal)
  • feat: precompile common workloads (#1485) (@avik-pal)
  • feat: annotate function defns with @trace (#1486) (@avik-pal)
  • ci(asv): run more benchmarks for ASV (#1487) (@avik-pal)
  • feat: precompile common workloads (try 2) (#1488) (@avik-pal)
  • chore: drop Polyester as a main dependency (#1489) (@avik-pal)
  • ci: fix cache name (#1491) (@avik-pal)
  • feat: use new NCCL version (#1492) (@avik-pal)
  • docs: make it easy to build docs locally (#1493) (@avik-pal)
  • fix(WeightInitializers): overlay backend array for reactant (#1494) (@avik-pal)
  • feat: replace Compat.jl with SciMLPublic.jl for @public macro (#1497) (@Copilot)
  • Add type-stable eltype control to device adaptors with comprehensive testing (#1498) (@Copilot)
  • chore: bump crate-ci/typos from 1.36.2 to 1.36.3 (#1499) (@dependabot[bot])
  • chore: bump crate-ci/typos from 1.36.3 to 1.37.2 (#1500) (@dependabot[bot])
  • CompatHelper: bump compat for BFloat16s to 0.6 for package CIFAR10, (keep existing compat) (#1501) (@github-actions[bot])
  • CompatHelper: bump compat for BFloat16s to 0.6 for package Qwen3, (keep existing compat) (#1502) (@github-actions[bot])
  • CompatHelper: bump compat for JLArrays to 0.3 for package test, (keep existing compat) (#1503) (@github-actions[bot])
  • ci: use 1.11 (#1504) (@avik-pal)
  • feat: JVP and VJP APIs for Reactant (#1506) (@avik-pal)
  • feat: batched_jacobian for Reactant (#1507) (@avik-pal)
  • chore: bump crate-ci/typos from 1.37.2 to 1.38.1 (#1508) (@dependabot[bot])
  • CompatHelper: bump compat for Optimization to 5 for package GravitationalWaveForm, (keep existing compat) (#1510) (@github-actions[bot])
  • CompatHelper: bump compat for Optimization to 5 for package OptimizationIntegration, (keep existing compat) (#1511) (@github-actions[bot])
  • CompatHelper: bump compat for BLISBLAS in [weakdeps] to 0.2 for package LuxLib, (keep existing compat) (#1512) (@github-actions[bot])
  • CompatHelper: bump compat for BLISBLAS to 0.2 for package test, (keep existing compat) (#1513) (@github-actions[bot])
  • feat: move rng to reactant device (#1517) (@avik-pal)
  • fix: donation errors for reactant (#1518) (@avik-pal)
  • feat: allow passing a sync option (#1519) (@avik-pal)
  • chore: bump actions/upload-artifact from 4 to 5 (#1526) (@dependabot[bot])
  • feat: support distributed training via TrainState API (#1529) (@avik-pal)
  • feat: support track numbers via reactant device API (#1533) (@avik-pal)
  • ci: run LuxCore + MLDataDevices testing on 1.12 (#1534) (@avik-pal)
  • feat: update LuxTestUtils to support 1.12 (#1535) (@avik-pal)
  • docs: stop manual specification of precision config (#1536) (@avik-pal)
  • CompatHelper: add new compat entry for TensorBoardLogger at version 0.1 for package DDIM, (keep existing compat) (#1537) (@github-actions[bot])
  • CompatHelper: add new compat entry for ImageShow at version 0.3 for package DDIM, (keep existing compat) (#1538) (@github-actions[bot])
  • CompatHelper: add new compat entry for OhMyThreads at version 0.8 for package DDIM, (keep existing compat) (#1539) (@github-actions[bot])
  • chore: bump crate-ci/typos from 1.38.1 to 1.39.0 (#1541) (@dependabot[bot])
  • refactor: use EnzymeRules.@easy_rule in Lux.jl (#1542) (@avik-pal)
  • Fix identity_init filling entire submatrix instead of diagonal (#1544) (@Copilot)
  • test: use finite differencing for gradient testing for Reactant (#1545) (@avik-pal)
  • feat: more informative error on constructing trainstate with compiled function (#1547) (@avik-pal)
  • fix: minor reactant stuff + docs build (#1548) (@avik-pal)
  • feat: use a caching allocator for GPUArrays workflows (#1549) (@avik-pal)
  • Avoid reconstruction in Internal.unsafe_free! (#1550) (@AntonOresten)
  • test: update tests for enzyme (#1552) (@avik-pal)
  • fix: update how the error message looks (#1553) (@avik-pal)
  • Use |> for moving data to devices (#1559) (@abhro)
  • feat: return sequence properly + checkpointing + mincut (#1561) (@avik-pal)
  • fix(LuxLib): avoid extra copy if input and output are aliased (#1562) (@avik-pal)
  • ci: use dependabot for updating compat entries (#1563) (@avik-pal)
  • chore: bump actions/checkout from 5 to 6 (#1564) (@dependabot[bot])
  • chore: bump crate-ci/typos from 1.39.0 to 1.39.2 (#1565) (@dependabot[bot])
  • Put plot labels within plotting directives (#1566) (@abhro)
  • Remove unnecessary begin...end markers (#1567) (@abhro)
  • ci(docs): update cpu builds to use default gh actions (#1569) (@avik-pal)
  • Indent code blocks to make it part of ordered list (#1570) (@abhro)
  • Fix @example blocks in gpu_management.md (#1571) (@abhro)
  • chore: bump actions/download-artifact from 5 to 6 (#1575) (@dependabot[bot])
  • ci: fix download path for cuda ci (#1576) (@avik-pal)
  • chore: bump crate-ci/typos from 1.39.2 to 1.40.0 (#1578) (@dependabot[bot])
  • test: Metal test now works (#1581) (@avik-pal)
  • Add AbstractChar array support to MLDataDevices (#1582) (@Copilot)
  • test: streamline installing packages in tests (#1583) (@avik-pal)
  • Mark LuxCore imports as public using @public (#1585) (@Copilot)
  • Fix isbits type support for GPU device transfer (#1587) (@Copilot)
  • Add OpenCL support to MLDataDevices (#1590) (@VarLad)
  • docs: Recommend Reactant for AMD GPU support and add Tenstorrent backend (#1593) (@Copilot)
  • docs: fix docs build (#1594) (@avik-pal)
  • ci(docs): run all docs on cpu runners (#1595) (@avik-pal)
  • chore: bump actions/upload-artifact from 5 to 6 (#1598) (@dependabot[bot])
  • chore: bump actions/download-artifact from 6 to 7 (#1599) (@dependabot[bot])
  • chore: bump peter-evans/create-pull-request from 7 to 8 (#1600) (@dependabot[bot])
  • feat: proper annotations for xprof (#1603) (@avik-pal)
  • fix: forwarddiff support for gpu arrays (#1605) (@avik-pal)
  • docs: use new API to export jax models (#1606) (@avik-pal)
  • chore: update NPZ requirement to 0.4.3 in /docs (#1607) (@dependabot[bot])
  • chore: update MLUtils requirement to 0.4.8 in /docs (#1608) (@dependabot[bot])
  • ci: reactant don't preallocate on CI (#1611) (@avik-pal)
  • ci: fix how downgrade ci works (#1612) (@avik-pal)
  • feat: make forwarddiff a weak dependency for luxlib (#1613) (@avik-pal)
  • perf: more benchmarking results (#1614) (@avik-pal)
  • chore: bump crate-ci/typos from 1.40.0 to 1.41.0 (#1616) (@dependabot[bot])
  • fix: sparse arrays support (#1617) (@avik-pal)
  • chore: rename extensions in LuxCore (#1618) (@avik-pal)
  • chore: rename extensions in LuxLib (#1619) (@avik-pal)
  • chore: rename extensions in MLDataDevices & WI (#1620) (@avik-pal)
  • chore: rename extensions in Lux (#1621) (@avik-pal)
  • Allow for empty Chains. (#1623) (@ispielma)
  • fix: circular dependency warning in MLDataDevices (#1624) (@avik-pal)
  • chore: bump crate-ci/typos from 1.41.0 to 1.42.0 (#1625) (@dependabot[bot])
  • fix: explicit imports failures from 0.14.2 (#1626) (@avik-pal)
  • fix: temporarily disable precompile workloads for reactant (#1628) (@avik-pal)
  • feat: various fixups for nicer JETLS interaction (#1630) (@avik-pal)
  • fix: reactant precompilation + throw error (#1631) (@avik-pal)
  • test(LuxLib): migrate testing to v1.12 (#1633) (@avik-pal)
  • test(Lux): migrate testing to v1.12 (#1634) (@avik-pal)
  • fix: update tracing to new Reactant API (#1636) (@avik-pal)
  • feat: parallel precompile for reactant workloads (#1638) (@avik-pal)
  • docs: fix out of bounds size access (#1639) (@avik-pal)
  • fix: skip zygote on 1.12 (#1641) (@avik-pal)
  • chore: bump crate-ci/typos from 1.42.0 to 1.42.1 (#1642) (@dependabot[bot])
  • test: Enzyme.jl on 1.12 (#1644) (@avik-pal)
  • Fixed legend labels for the curves in the PolynomialFitting documentation page (#1645) (@JamieMair)
  • ci: run docs on 1.12 (#1646) (@avik-pal)
  • feat: support AutoReactant (#1647) (@avik-pal)
  • fix: dont rely on auto inference in reactant (#1648) (@avik-pal)
  • fix: bypass reactant in get_device (#1649) (@avik-pal)
  • test: use ParallelTestRunner (#1650) (@avik-pal)
  • chore: update Mooncake requirement from 0.4.148 to 0.4.148, 0.5 (#1651) (@dependabot[bot])
  • chore: update Mooncake requirement from 0.4.138 to 0.4.138, 0.5 in /test (#1652) (@dependabot[bot])
  • chore: bump crate-ci/typos from 1.42.1 to 1.42.3 (#1653) (@dependabot[bot])
  • test(LuxLib): migrate to ParallelTestRunners (#1654) (@avik-pal)
  • fix: missing onehotarrays dispatch for cpu matmul (#1655) (@avik-pal)
  • test(Lux): migrate to ParallelTestRunner (#1656) (@avik-pal)
  • chore: bump crate-ci/typos from 1.42.3 to 1.43.3 (#1659) (@dependabot[bot])
  • chore: update DocumenterVitepress requirement from 0.2 to 0.2, 0.3 in /docs (#1660) (@dependabot[bot])
  • chore: update LuxCUDA requirement to 0.3.4 in /test (#1661) (@dependabot[bot])
  • chore: update cuDNN requirement to 1.4.6 in /test (#1662) (@dependabot[bot])
  • chore: update CUDA requirement to 5.9.6 in /test (#1663) (@dependabot[bot])
  • Mooncake for LuxTestUtils, LuxLib, Lux general testing, current Tests. (#1664) (@AstitvaAggarwal)
  • chore: bump crate-ci/typos from 1.43.3 to 1.43.4 (#1665) (@dependabot[bot])
  • chore: bump crate-ci/typos from 1.43.4 to 1.43.5 (#1667) (@dependabot[bot])
  • chore: bump actions/download-artifact from 7 to 8 (#1670) (@dependabot[bot])
  • chore: bump crate-ci/typos from 1.43.5 to 1.44.0 (#1671) (@dependabot[bot])
  • chore: bump actions/upload-artifact from 6 to 7 (#1672) (@dependabot[bot])
  • chore: bump julia-actions/cache from 2 to 3 (#1675) (@dependabot[bot])
  • fix(MLDataDevices): correct amdgpu_array_adapt signature in AMDGPUExt (#1677) (@Copilot)
  • test: ditch mooncake testing in training API (#1680) (@avik-pal)
  • fix: N-dimensional mask and bias support in ReactantExt MultiHeadAttention (#1682) (@Copilot)
  • test: reenable mooncake testing for training api (#1683) (@avik-pal)
  • fix: correct double activation in cublasLt_fused_dense! generic fallback (#1685) (@Copilot)
  • docs: add missing mooncake_gradient_function docstring to LuxTestUtils API page (#1687) (@Copilot)
  • fix: host memory leak in cublaslt implementation (#1689) (@avik-pal)
  • chore: bump crate-ci/typos from 1.44.0 to 1.45.0 (#1692) (@dependabot[bot])
  • feat: allow new CUDA versions (#1694) (@avik-pal)
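Several of the device-related changes above (notably #1559, which standardizes on `|>` for moving data to devices) exercise the MLDataDevices API. A minimal sketch of that idiom, assuming only MLDataDevices is loaded (with no GPU backend such as CUDA.jl in the session, `gpu_device()` falls back to the CPU device):

```julia
using MLDataDevices

# Pick the best available accelerator; without a loaded GPU backend
# this falls back to CPUDevice().
dev = gpu_device()
cdev = cpu_device()

x = rand(Float32, 3, 4)

# The pipe-operator style adopted in #1559:
x_dev = x |> dev        # host -> device
x_host = x_dev |> cdev  # device -> host
```

The same `|>` pattern applies to whole parameter/state NamedTuples, which is why the docs were migrated to it wholesale.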

Closed issues:

  • Rethinking eltype conversions in Adaptors (#1015)
  • Proper Sparse Interfaces (#1014)
  • Add simple tests for other accelerators (#686)
  • DifferentiationInterface testing (#769)
  • Irregular RAM usage under large amount of epochs on gpu (#872)
  • Warning from LuxLib when using OneHotArrays about Mixed Precision (#1197)
  • Memory leak in Dense layer with CUDA (#1230)
  • CUDA.jl alone cannot trigger automatic GPU backend selection (#1245)
  • Fix remaining CUDA testing (#1457)
  • Relax NCCL dep for testing (#1479)
  • MLDataDevices: Don't force precision reduction to Float32 for CUDA? (#1490)
  • Use SciMLPublic.jl instead of Compat for @public (#1496)
  • Global configuration for setting sync=true in training API (#1509)
  • Invalid buffer donation in new Reactant versions (#1514)
  • Lux.jl and Reactant and StableRNG interaction (#1515)
  • memory leak (?) on AMD MI250X GPUs (#1516)
  • Reactant get_device with sharding throws error inside of MLDataDevices, impossible to use with TrainState API (#1520)
  • Error "failed to run pass manager on module" only on Vector input (#1521)
  • Local MPI rank is always 0 if Ipopt solver is imported before Lux and MPI (#1525)
  • Automatically cache allocations for JuliaGPU workloads (#1527)
  • Relax ForwardDiff version bound? (#1530)
  • Reactant RNG handling broken in latest release (#1531)
  • Towards 1.12 support (#1532)
  • Exporting to Jax manual entry segfaults with recent reactant (#1540)
  • Identity matrix initialization fills all entries with ones (#1543)
  • Embedding Layer results in scalar indexing with Reactant? (#1546)
  • Enzyme Cache Invalidation Failure with v1.10 (#1551)
  • OneHotArrays + Reactant with cross entropy loss (#1556)
  • Overhead of convolution on AMD GPU (#1557)
  • Running Lux and Reactant tests locally (#1577)
  • LuxTestUtils not re-exported (#1579)
  • [MLDataDevices] failure at transferring non-numerical array (#1580)
  • Mark LuxCore imports in Lux as public (#1584)
  • [MLDataDevices] broken support for isbits types movement (#1586)
  • Update exporting to jax example to directly use new export functionality (#1588)
  • Failed to precompile LuxLossFunctionsExt (#1591)
  • Update AMD support documentation to recommend using Reactant (#1592)
  • ERROR: KeyError: key "ReactantCore" not found (#1596)
  • TagBot: Manual intervention needed for releases (#1604)
  • What is wrong with my custom layer? (#1610)
  • [MLDataDevices] AbstractCuSparseArray not defined in CUDA.jl v5.9.6 (#1615)
  • Layer summary page of docs is weirdly formatted (#1627)
  • Precompilation fails when Flux is in the project (#1629)
  • RMSNorm fails to compile with Reactant (#1637)
  • Using RMSNorm silently breaks AD with Reactant and Enzyme (#1640)
  • Reactant compilation for ConvolutionalVAE broken (#1673)
  • [MLDataDevices] error when moving arrays to AMDGPU (#1676)
  • Mooncake is completely busted in recent versions (#1679)
  • Attention assumes 2D mask when using Reactant (#1681)
  • Possible double activation in LuxLib.Impl.cublasLt_fused_dense! (#1684)
  • Documentation build is broken (#1686)
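Issue #1543 and its fix (#1544) concern `identity_init` from WeightInitializers: the matrix was being filled entirely with ones instead of only along the main diagonal. A short sketch of the intended post-fix behavior, assuming WeightInitializers' documented `identity_init([rng], [T], dims...; gain, shift)` signature:

```julia
using WeightInitializers

# After #1544, only the main diagonal is set to `gain` (default 1);
# all off-diagonal entries are zero.
W = identity_init(Float32, 4, 4)

# For non-square shapes, the result is a partial identity: ones on
# the leading diagonal, zeros elsewhere.
W_rect = identity_init(Float32, 4, 6)
```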


Files

LuxDL/Lux.jl-LuxCUDA-v0.3.5.zip (14.9 MB, md5:f2170f392ce154aee758e10cc0fcfa13)
