Presentation Open Access

flox: Fast & furious GroupBy reductions with Dask at Pangeo-scale

Cherian, Deepak


JSON-LD (schema.org) Export

{
  "description": "<p>The &quot;groupby&quot; or the &quot;split-apply-combine&quot; paradigm is ubiquitous in scientific analysis, though it may be named differently, e.g. &quot;binning&quot;, &quot;histogramming&quot;, &quot;resampling&quot;, &quot;compositing&quot;, or &quot;climatology reductions&quot;. Xarray implements the groupby paradigm through a &quot;GroupBy&quot; object. Historically, the underlying algorithm has not been dask-aware and tends to fail disastrously with large Pangeo-scale distributed workflows.&nbsp;Here I present &quot;flox&quot;: a new package that explores effective strategies for groupby reductions at scale with dask. Ongoing work will plug this package into xarray in a backwards-compatible manner, allowing the community to seamlessly benefit from significantly more efficient groupby computations. See&nbsp;https://flox.readthedocs.io&nbsp;for more.</p>", 
  "license": "https://creativecommons.org/licenses/by/4.0/legalcode", 
  "creator": [
    {
      "affiliation": "NCAR", 
      "@id": "https://orcid.org/0000-0002-6861-8734", 
      "@type": "Person", 
      "name": "Cherian, Deepak"
    }
  ], 
  "url": "https://zenodo.org/record/5772165", 
  "datePublished": "2021-11-17", 
  "keywords": [
    "Pangeo", 
    "Xarray"
  ], 
  "@context": "https://schema.org/", 
  "identifier": "https://doi.org/10.5281/zenodo.5772165", 
  "@id": "https://doi.org/10.5281/zenodo.5772165", 
  "@type": "PresentationDigitalDocument", 
  "name": "flox: Fast & furious GroupBy reductions with Dask at Pangeo-scale"
}
                   All versions   This version
Views              77             77
Downloads          25             25
Data volume        293.4 MB       293.4 MB
Unique views       70             70
Unique downloads   23             23
