Published November 7, 2024 | Version v2
Video/Audio Open

Exploring Data at Scale with Arkouda: A Practical Introduction to Scalable Data Science

  • 1. Hewlett Packard Enterprise (United States)

Description

Data scientists can be thought of as modern-day explorers, venturing into the vast unknown of information. One of the biggest challenges they face is the sheer size of the data they encounter: many modern datasets contain terabytes or even petabytes of information and cannot fit in a laptop's memory. Working with such massive data requires specialized tools and techniques to extract meaningful insights. As datasets grow ever larger, data science demands interactivity, so that scientists can learn while working with the data. At the same time, it demands scalability, so that scientists can work with datasets in their entirety. Data scientists have naturally been drawn to Python because it provides interactivity through its read-evaluate-print loop (REPL) and performance through libraries written in other languages, such as C and Fortran. These libraries, however, are typically not designed for HPC and run into problems when attempting to scale. Arkouda fills this gap in the data science landscape: it is both interactive, providing a familiar Python API, and scalable, leveraging a Chapel server on the backend. Arkouda is a framework for scalable Python packages for interactive data science and has applications ranging from oceanography to net flow analysis.

Files

Arkouda final demo cut.mp4 (170.6 MB)
md5:20e2ba32dfe79de5cf4cdacb5c7e2850

Additional details

Dates

Submitted
2024-08-15

Software

Repository URL
https://github.com/bmcdonald3/chapelcon-2024-arkouda
Programming language
Python, Chapel
Development Status
Active