Published April 25, 2026 | Version v2.0.0
Software Open

slowkow/harmonypy: v2.0.0

  • 1. Mass General Brigham
  • 2. Broad Institute of MIT and Harvard
  • 3. Seqera

Description

Complete rewrite with a C++ backend (Armadillo + nanobind) that matches the R harmony2 package step by step.

Highlights

  • ~10x faster than v0.1.0: 858k cells in ~36s on Apple M1 Ultra (vs ~340s previously)
  • Matches R harmony2: correlation ≥0.998 across all PCs
  • Minimal dependencies: only numpy at runtime
  • Pre-built wheels for Linux (x86_64, aarch64) and macOS (x86_64, arm64), Python 3.9–3.13
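
The release notes do not say how the "correlation ≥0.998 across all PCs" figure was computed. One plausible reading is a Pearson correlation between matching PC columns of the two corrected embeddings. The sketch below shows that check on synthetic data. The function name `per_pc_correlation`, the cells x PCs layout, and the toy arrays are assumptions for illustration and do not come from harmonypy.

```python
import numpy as np


def per_pc_correlation(x, y):
    """Pearson correlation between matching columns of two (cells x PCs) arrays.

    Hypothetical helper, not part of harmonypy.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("embeddings must have the same shape")
    # Center each PC column, then take the normalized dot product per column.
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    num = (xc * yc).sum(axis=0)
    den = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum(axis=0))
    return num / den


# Synthetic stand-ins for the Python and R outputs (not real data).
rng = np.random.default_rng(0)
emb_a = rng.normal(size=(1000, 5))
emb_b = emb_a + rng.normal(scale=0.01, size=emb_a.shape)

corr = per_pc_correlation(emb_a, emb_b)
all_pcs_match = bool(corr.min() >= 0.998)
```

Taking the minimum over PCs makes the pass criterion "every PC clears the threshold", which is the strictest reading of "across all PCs".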

New

  • C++ backend with BLAS-accelerated dense matrix ops (Accelerate on macOS, OpenBLAS on Linux)
  • Custom scatter/gather kernels replace all sparse matrix operations
  • K-means initialization matches R exactly (Gumbel-max cosine-distance sampling)
  • ncores parameter to control BLAS thread count
  • batch_prop_cutoff parameter for underrepresented batch handling
  • Arrowhead matrix inverse for fast single-covariate correction
  • Accepts pandas DataFrame, dict of arrays, or NumPy array for meta_data
  • C++ progress messages routed through Python logging for proper integration with downstream packages (thanks @yakirr)
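
The notes mention an arrowhead matrix inverse without spelling out the algebra. As a rough illustration, assume the matrix has the form [[a, b^T], [b, diag(d)]], which is what a ridge regression on an intercept plus one one-hot-encoded covariate would produce. Its inverse then follows in closed form from the Schur complement, with no general-purpose factorization. The function name, variable names, and toy numbers below are hypothetical, and this is not the package's C++ code.

```python
import numpy as np


def arrowhead_inverse(a, b, d):
    """Invert [[a, b^T], [b, diag(d)]] in closed form.

    a: scalar, b: vector of length n, d: nonzero diagonal of length n.
    Hypothetical helper for illustration only.
    """
    b = np.asarray(b, dtype=float)
    d = np.asarray(d, dtype=float)
    u = b / d              # D^{-1} b, cheap because D is diagonal
    s = a - b @ u          # Schur complement of D (a scalar)
    n = b.size
    inv = np.empty((n + 1, n + 1))
    inv[0, 0] = 1.0 / s
    inv[0, 1:] = -u / s
    inv[1:, 0] = -u / s
    inv[1:, 1:] = np.diag(1.0 / d) + np.outer(u, u) / s
    return inv


# Toy example: three batches with made-up cell counts and a ridge penalty
# applied to the batch terms only (the intercept is left unpenalized here).
counts = np.array([30.0, 50.0, 20.0])
lam = 1.0
a, b, d = counts.sum(), counts, counts + lam

A = np.zeros((4, 4))
A[0, 0] = a
A[0, 1:] = b
A[1:, 0] = b
A[1:, 1:] = np.diag(d)

A_inv = arrowhead_inverse(a, b, d)
```

The only divisions are by the diagonal entries and by one scalar, so the cost of forming the inverse is dominated by writing out its entries rather than by a dense solve.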

Breaking changes

  • lamb now defaults to automatic estimation (previously fixed at 1). Pass lamb=1 to restore the old behavior.
  • Default parameters changed to match R harmony2: max_iter_kmeans 20→4, epsilon_cluster 1e-5→1e-3, epsilon_harmony 1e-4→1e-2
  • Only numpy required at runtime (previously pandas, scipy, scikit-learn)
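
For code that depends on the pre-2.0 behavior, the old values from the list above can be collected in one place and passed explicitly. This is a minimal sketch: the parameter names and values are taken from this list, while the `LEGACY_KWARGS` dict and the `legacy_kwargs` helper are hypothetical and not part of harmonypy.

```python
# Pre-2.0 values as stated in the "Breaking changes" list.
LEGACY_KWARGS = {
    "lamb": 1,
    "max_iter_kmeans": 20,
    "epsilon_cluster": 1e-5,
    "epsilon_harmony": 1e-4,
}


def legacy_kwargs(**overrides):
    """Return a copy of the pre-2.0 settings, optionally overriding some.

    Hypothetical helper for illustration only.
    """
    return {**LEGACY_KWARGS, **overrides}


kwargs = legacy_kwargs()
```

Assuming the entry point is still `harmonypy.run_harmony(data_mat, meta_data, vars_use, ...)` as in earlier releases, these settings could be passed as `**kwargs`. Check CHANGELOG.md before relying on that signature.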

See CHANGELOG.md for full details.

Files

slowkow/harmonypy-v2.0.0.zip (93.7 MB)
md5:b684ea53d713cf7c5ccb1c467286bbec
