Report Open Access

HEP Application Delivery on HPC Resources

Shaffer, Tim; Blomer, Jakob; Ganis, Gerardo

Citation Style Language JSON Export

  "publisher": "Zenodo", 
  "DOI": "10.5281/zenodo.61157", 
  "title": "HEP Application Delivery on HPC Resources", 
  "issued": {
    "date-parts": [
  "abstract": "<p>Project Specification</p>\n\n<p>High-performance computing (HPC) contributes a significant and growing share of resources to high-energy physics (HEP). Individual supercomputers such as Edison or Titan in the U.S. or SuperMUC in Europe deliver a raw performance of the same order of magnitude as the Worldwide LHC Computing Grid. As we have seen with codes from ALICE and ATLAS, it is notoriously difficult to deploy high-energy physics applications on supercomputers, even though they often run a standard Linux on Intel x86_64 CPUs.</p>\n\n<p>The three main problems are:</p>\n\n<p>1. Limited or no Internet access;</p>\n\n<p>2. The lack of privileged local system rights;</p>\n\n<p>3. The concept of cluster submission or whole-node submission of jobs in contrast to single CPU slot submission in HEP.</p>\n\n<p>Generally, the delivery of applications to hardware resources in high-energy physics is done by CernVM-FS [1]. CernVM-FS is optimized for high-throughput resources. Nevertheless, some successful results on HPC resources were achieved using the Parrot system [2], which allows CernVM-FS to be used without special privileges. Building on these results, the project aims to prototype a toolkit for application delivery that seamlessly integrates with HEP experiments&#39; job submission systems, for instance with ALICE AliEn or ATLAS PanDA. The task includes a performance study of the Parrot-induced overhead, which will be used to guide performance tuning for both CernVM-FS and Parrot on typical supercomputers. The project should further deliver a lightweight scheduling shim that translates HEP&rsquo;s job slot allocation to a whole-node or cluster-based allocation. Finally, in order to speed up the turn-around of evaluating new supercomputers, a set of &quot;canary jobs&quot; should be collected that validate HEP codes on new resources.</p>\n\n<p>[1]</p>\n\n<p>[2]</p>\n\n<p>Abstract</p>\n\n<p>On high-performance computing (HPC) resources, users have less control over worker nodes than in the grid. Using HPC resources for high-energy physics applications becomes more complicated because individual nodes often don&#39;t have Internet connectivity or a filesystem configured to use as a local cache. The current solution in CVMFS preloads the cache from a gateway node onto the shared cluster file system. This approach works but does not scale well into large production environments. In this project, we develop an in-memory cache for CVMFS, and assess approaches to running jobs without special privileges on the worker nodes. We propose using Parrot and CVMFS with RAM cache as a viable approach to HEP application delivery on HPC resources.</p>", 
  "author": [
    {
      "given": "Tim", 
      "family": "Shaffer"
    }, 
    {
      "given": "Jakob", 
      "family": "Blomer"
    }, 
    {
      "given": "Gerardo", 
      "family": "Ganis"
    }
  ], 
  "type": "article", 
  "id": "61157"
                  All versions   This version
Views             33             33
Downloads         24             24
Data volume       10.9 MB        10.9 MB
Unique views      32             32
Unique downloads  23             23

