Published July 15, 2020 | Version v1
Presentation Open

A prototype U.S. CMS analysis facility

Authors/Creators

  • 1. University of Nebraska-Lincoln

Description

In the HL-LHC era, an order-of-magnitude increase in event rates will mean that activities that can be done on a laptop today will require significantly more resources tomorrow. For example, increased dataset volumes mean that users cannot necessarily keep all their data locally on a laptop, so dedicated analysis facilities will be needed. Today, most facilities are batch-oriented, while analysts often want to work interactively when exploring the data; facilities will likely need to provide a hybrid of batch and interactive approaches going forward. U.S. CMS seeks to provide a prototype analysis facility that addresses these challenges during 2020. In this tutorial we describe and demonstrate elements of such a prototype at the University of Nebraska-Lincoln (UNL).

The prototype analysis facility provides services for “low latency columnar analysis”, enabling rapid processing of data in a column-wise fashion. These services, based on Dask and Jupyter notebooks, aim to dramatically lower the time to analysis and provide an easily scalable, user-friendly computational environment that will simplify, facilitate, and accelerate the delivery of HEP results. The facility is built on top of a local Kubernetes cluster and integrates dedicated resources with resources allocated via fair-share through the local HTCondor system. In addition to user-facing interfaces such as Dask, the facility also manages access control through single sign-on, with authentication and authorization for data access. The showcase will include simple HEP analysis examples, managed interactively in a Jupyter notebook, scheduled on Dask workers, and accessing both public and protected data.
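To make the column-wise pattern concrete, the following is a minimal sketch and not the facility's actual code. It uses only the Python standard library, with a thread pool standing in for Dask workers; the toy columns `pt` and `eta`, the selection cuts, and the function names `split_columns`, `process_chunk`, and `run_analysis` are all hypothetical and chosen purely for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import math

# Hypothetical toy "events" stored column-wise: one list per quantity.
columns = {
    "pt":  [25.0, 40.0, 12.5, 60.0, 33.0, 8.0],
    "eta": [0.1, -1.2, 2.0, 0.5, -0.3, 1.1],
}

def split_columns(cols, n_chunks):
    """Split every column into aligned slices so each chunk holds whole events."""
    n = len(next(iter(cols.values())))
    step = max(1, math.ceil(n / n_chunks))
    return [
        {name: values[i:i + step] for name, values in cols.items()}
        for i in range(0, n, step)
    ]

def process_chunk(chunk):
    """Apply an illustrative selection (pt > 20, |eta| < 1.5) to one chunk.

    Returns a partial result: (number of selected events, sum of their pt).
    """
    selected = [
        pt for pt, eta in zip(chunk["pt"], chunk["eta"])
        if pt > 20.0 and abs(eta) < 1.5
    ]
    return len(selected), sum(selected)

def run_analysis(cols, n_chunks=3, max_workers=3):
    """Process chunks concurrently, then merge the partial results."""
    chunks = split_columns(cols, n_chunks)
    # The thread pool is a stand-in for a pool of remote workers.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    n_selected = sum(n for n, _ in partials)
    pt_sum = sum(s for _, s in partials)
    return n_selected, pt_sum

result = run_analysis(columns)
print(result)
```

The structure is the point here: data are partitioned by column chunk, each chunk is processed independently, and small partial results are merged at the end. In a Dask-based setup the same split, map, and merge steps would be submitted to a scheduler rather than to a local thread pool.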

Files

PYHEP2020_OksanaShadura.zip (3.5 MB)
md5:782d85dc0d693f1263892e81cb30bfbf