Published December 4, 2024 | Version v1
Poster Open

PyDAP Revisited: Exploiting OPeNDAP's Data-Proximate Transformation Tools to Accelerate Scientific Workflows

  • 1. OPeNDAP

Description

OPeNDAP’s Hyrax Server enables researchers to access and share scientific datasets efficiently over the web, especially those in NetCDF and HDF formats. Despite its widespread adoption and advances such as DAP4 and DMR++, a lack of modern client API tools hampers its usability in contemporary scientific workflows. PyDAP, a Python implementation of the Data Access Protocol (DAP), addresses this gap by integrating seamlessly with Xarray, allowing Python-native access to OPeNDAP data.

This presentation revisits PyDAP’s modernization, highlighting the challenges and solutions in leveraging DAP4’s capabilities. Recent and ongoing developments include improved hierarchical data handling, enhanced DMR++ parsing for serverless S3 access, and the adoption of FastAPI to boost PyDAP’s performance. Benchmarks demonstrate significant performance gains from constrained requests, with speedups of 10x to 100x in remote dataset access. These advancements position PyDAP as a critical tool for enabling efficient, scalable, and cost-effective open science workflows.
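To illustrate what a constrained request looks like, the sketch below builds a DAP4 URL whose constraint expression (the `dap4.ce` query parameter) asks the server for only a hyperslab of each variable, so the subsetting happens next to the data. It is a minimal sketch and not taken from the poster: the helper name `dap4_constrained_url`, the server URL, and the variable names are hypothetical, and the brackets are left unencoded for readability.

```python
from typing import Mapping, Sequence


def dap4_constrained_url(base_url: str,
                         selections: Mapping[str, Sequence[slice]]) -> str:
    """Build a DAP4 URL that requests only the given index ranges.

    `selections` maps a variable name to one Python slice per dimension.
    Python slices exclude the stop index, while DAP hyperslabs are written
    [start:stride:last] with an inclusive last index, so each slice is
    converted before being written into the constraint expression.
    """
    parts = []
    for name, slices in selections.items():
        hyperslab = ""
        for s in slices:
            if s.start is None or s.stop is None:
                raise ValueError("each slice needs an explicit start and stop")
            step = 1 if s.step is None else s.step
            if step < 1 or s.start < 0 or s.stop <= s.start:
                raise ValueError(f"unsupported slice for {name!r}: {s}")
            # Last index actually selected by the Python slice (inclusive).
            last = s.start + ((s.stop - 1 - s.start) // step) * step
            hyperslab += f"[{s.start}:{step}:{last}]"
        # DAP4 names variables by their full path from the root group.
        path = name if name.startswith("/") else "/" + name
        parts.append(path + hyperslab)
    # Several variables are separated by ';' inside one constraint expression.
    return f"{base_url}?dap4.ce=" + ";".join(parts)


# Hypothetical server and variable names, for illustration only.
url = dap4_constrained_url(
    "http://example.org/opendap/data.nc",
    {"sst": [slice(0, 10), slice(0, 20, 2)], "time": [slice(0, 10)]},
)
print(url)
```

A URL of this form could then be handed to a DAP4-aware client; with Xarray, for example, a dataset is typically opened through `xr.open_dataset(url, engine="pydap")`, although whether a given server and client version accept the constraint in this exact form is an assumption to verify.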

Files

AGU2024PyDAP.pdf (3.5 MB)
md5:cdef6c671d7eae6942b0fd15c9b6d72f