A prototype U.S. CMS analysis facility
Description
In the HL-LHC era, an order-of-magnitude increase in event rates means that activities that can be done on a laptop today will require significantly more resources tomorrow. For example, larger dataset volumes mean that users can no longer necessarily keep all their data locally on a laptop, so dedicated analysis facilities will be needed. Today, most facilities are batch-oriented, while analysts often want to work interactively when exploring data; going forward, facilities will likely need to provide a hybrid of batch and interactive approaches. U.S. CMS seeks to provide a prototype analysis facility that addresses these challenges during 2020. In this tutorial, we describe and demonstrate elements of such a prototype at the University of Nebraska-Lincoln (UNL).
The prototype analysis facility provides services for "low-latency columnar analysis", enabling rapid, column-wise processing of data. These services, based on Dask and Jupyter notebooks, aim to dramatically reduce analysis turnaround time and to provide an easily scalable, user-friendly computational environment that simplifies, facilitates, and accelerates the delivery of HEP results. The facility is built on top of a local Kubernetes cluster and combines dedicated resources with resources allocated via fair-share through the local HTCondor system. In addition to user-facing interfaces such as Dask, the facility manages access control through single sign-on, as well as authentication and authorization for data access. The showcase will include simple HEP analysis examples that are managed interactively in a Jupyter notebook, scheduled on Dask workers, and access both public and protected data.
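The columnar, map-then-reduce pattern described above can be illustrated with a small sketch. It is not the tutorial's code: it assumes each data partition is a plain dict of equal-length columns (`pt`, `eta`), uses hypothetical selection cuts and binning, and substitutes the standard library's `ThreadPoolExecutor` for Dask workers so the example runs anywhere. In a Dask deployment, the per-chunk function would instead be submitted to workers (for example via `Client.map`), and the column operations would typically be vectorized with NumPy or Awkward Array.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input: each chunk is a dict of equal-length columns,
# standing in for one partition of a columnar dataset.
CHUNKS = [
    {"pt": [12.0, 35.5, 8.2, 51.0], "eta": [0.1, -2.7, 1.2, 0.4]},
    {"pt": [27.3, 19.9, 64.1], "eta": [2.2, -0.8, -1.9]},
]

BIN_WIDTH = 10.0  # GeV; illustrative histogram binning (assumption)


def process_chunk(columns, pt_min=20.0, eta_max=2.4):
    """Select events from whole columns and fill a pt histogram.

    The cut values are hypothetical. Real columnar tools would vectorize
    the mask; here it is built with zip over the two columns.
    """
    pt, eta = columns["pt"], columns["eta"]
    mask = [p > pt_min and abs(e) < eta_max for p, e in zip(pt, eta)]
    selected = [p for p, keep in zip(pt, mask) if keep]
    # Histogram stored as a Counter: bin index -> number of events.
    return Counter(int(p // BIN_WIDTH) for p in selected)


def run(chunks, max_workers=2):
    """Map process_chunk over chunks in parallel, then reduce by summing."""
    # ThreadPoolExecutor is a stand-in for Dask workers in this sketch.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(process_chunk, chunks))
    total = Counter()
    for part in partials:
        total.update(part)  # Counter.update adds counts bin by bin
    return total


if __name__ == "__main__":
    hist = run(CHUNKS)
    for b in sorted(hist):
        lo, hi = b * BIN_WIDTH, (b + 1) * BIN_WIDTH
        print(f"[{lo:.0f}, {hi:.0f}) GeV: {hist[b]}")
```

Because each chunk yields an independent partial histogram and the merge step is a simple sum, the same structure scales from a laptop to many workers without changing the analysis function.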
Files
| Name | Size | MD5 |
|---|---|---|
| PYHEP2020_OksanaShadura.zip | 3.5 MB | 782d85dc0d693f1263892e81cb30bfbf |