Published September 14, 2022 | Version v1
Presentation Open

Basic Physics Analyses Implemented Using Apache Spark

Creators

  • 1. CERN

Description

Apache Spark is a widely used open-source engine for large-scale data processing. This talk will focus on the use of Spark and its DataFrame API in the context of high-energy physics (HEP). We will go through a few demos of simple, outreach-style analyses implemented using Jupyter notebooks and the Spark Python API (PySpark). We will wrap up with a short discussion of the key features of Spark and its ecosystem that can be useful for physics analysis, and of what still needs improvement.

Files

PyHEP2022_LucaCanali.pdf (976.6 kB)
md5:36afe410efde174932b5ee6b0d6fce33