PyDAP Revisited: Exploiting OPeNDAP's Data-Proximate Transformation Tools to Accelerate Scientific Workflows
Description
OPeNDAP’s Hyrax Server enables researchers to access and share scientific datasets over the web efficiently, especially for NetCDF and HDF formats. Despite its widespread adoption and advancements like DAP4 and DMR++, the lack of modern client API tools hampers its usability in contemporary scientific workflows. PyDAP, a Python-based implementation of the DAP protocol, addresses this gap by integrating seamlessly with Xarray, allowing Python-native access to OPeNDAP data.
This presentation revisits PyDAP’s modernization, highlighting the challenges of leveraging DAP4’s capabilities and the solutions adopted. Recent and ongoing developments include improved hierarchical data handling, enhanced DMR++ parsing for serverless S3 access, and the adoption of FastAPI to boost PyDAP’s performance. Benchmarks show that constrained requests, which ask the server for only the needed variables and index ranges, speed up remote dataset access by 10-100x. These advancements position PyDAP as a critical tool for enabling efficient, scalable, and cost-effective open science workflows.
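To make the idea of a constrained request concrete, the sketch below builds a DAP4 constraint expression (the `dap4.ce` query parameter) that selects specific variables and index ranges, so the server subsets the data before sending it. This is an illustration only, not code from the presentation: the helper names `dap4_slice` and `constrained_url`, the example URL, and the variable names `/SST` and `/time` are all hypothetical.

```python
def dap4_slice(start, stop, step=1):
    """Format one dimension's index range as a DAP4 hyperslab.

    DAP4 writes ranges as [start:step:stop], with an inclusive stop index.
    """
    if start < 0 or stop < start or step < 1:
        raise ValueError("expected 0 <= start <= stop and step >= 1")
    return f"[{start}:{step}:{stop}]"


def constrained_url(base_url, selections):
    """Append a dap4.ce constraint expression to a dataset URL.

    `selections` maps a fully qualified variable name (e.g. "/SST") to a
    list of (start, stop) or (start, stop, step) tuples, one per dimension.
    An empty list requests the whole variable.
    """
    parts = []
    for name, ranges in selections.items():
        parts.append(name + "".join(dap4_slice(*r) for r in ranges))
    # Multiple variable selections are separated by semicolons.
    return f"{base_url}?dap4.ce={';'.join(parts)}"


# Hypothetical dataset URL and variable names, for illustration only.
url = constrained_url(
    "http://example.org/data.nc",
    {"/SST": [(0, 9), (0, 19, 2)], "/time": []},
)
print(url)
```

A URL built this way transfers only the selected slices rather than the full arrays, which is where a speedup from constrained requests would be expected to come from. Some servers or HTTP clients may require the brackets in the query string to be percent-encoded; that step is omitted here for readability.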
Files
| Name | Size |
|---|---|
| AGU2024PyDAP.pdf (md5:cdef6c671d7eae6942b0fd15c9b6d72f) | 3.5 MB |