Presentation Open Access

flox: Fast & furious GroupBy reductions with Dask at Pangeo-scale

Cherian, Deepak

The "groupby" or "split-apply-combine" paradigm is ubiquitous in scientific analysis, though it goes by many names, e.g. "binning", "histogramming", "resampling", "compositing", or "climatology reductions". Xarray implements the groupby paradigm through a "GroupBy" object. Historically, the underlying algorithm has not been dask-aware, and it tends to fail disastrously with large, Pangeo-scale distributed workflows. Here I present "flox": a new package that explores effective strategies for groupby reductions at scale with dask. Ongoing work will plug this package into xarray in a backwards-compatible manner, allowing the community to benefit seamlessly from significantly more efficient groupby computations. See https://flox.readthedocs.io for more.
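As a rough illustration of why a groupby reduction can be made chunk-friendly, the sketch below computes a grouped mean in two ways using only NumPy: once over the whole array, and once by accumulating per-chunk sums and counts that are combined at the end. This is not flox's implementation or API. The function names, the integer-label convention, and the `chunk_size` parameter are assumptions made for the example.

```python
import numpy as np


def groupby_mean(values, labels, num_groups):
    """Mean of `values` within each integer label in [0, num_groups)."""
    sums = np.bincount(labels, weights=values, minlength=num_groups)
    counts = np.bincount(labels, minlength=num_groups)
    # Empty groups produce 0/0, which we let become NaN without a warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts


def chunked_groupby_mean(values, labels, num_groups, chunk_size):
    """Same result, built from per-chunk intermediates (sum and count).

    Each chunk is reduced independently; only the small per-group
    intermediates are combined, which is what makes the reduction
    amenable to parallel, out-of-core execution.
    """
    sums = np.zeros(num_groups)
    counts = np.zeros(num_groups, dtype=np.int64)
    for start in range(0, len(values), chunk_size):
        stop = start + chunk_size
        chunk_labels = labels[start:stop]
        sums += np.bincount(
            chunk_labels, weights=values[start:stop], minlength=num_groups
        )
        counts += np.bincount(chunk_labels, minlength=num_groups)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts


# Hypothetical data: 24 "monthly" samples grouped by month of year,
# i.e. a small climatology reduction.
values = np.arange(24, dtype=float)
labels = np.arange(24) % 12

direct = groupby_mean(values, labels, num_groups=12)
chunked = chunked_groupby_mean(values, labels, num_groups=12, chunk_size=5)
```

The mean is carried as a (sum, count) pair because a mean of per-chunk means would be wrong whenever group members are spread unevenly across chunks.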

Files (65.8 MB)
Name Size
Cherian_2021-11-17.mp4 58.1 MB (md5:d5435a8e75714dad50c5114ca3d90f57)
Cherian_2021-11-17.pdf 7.7 MB (md5:8c6514269b2982834fb93f4d534b1df0)


Cite as