Published October 17, 2022 | Version v1
Report | Open Access

BATCH ANOMALY DETECTION

Creators

Description

The 300,000-CPU-core HTCondor batch farm at CERN provides the computing power for the initial processing of data from the LHC experiments. A computing setup of this scale inevitably produces an abundance of monitoring data, and current monitoring methods cannot be configured in a reasonable amount of time to catch all potential anomalies. Building on previous and ongoing work in the IT department, this project focuses on the HTCondor batch system, using both the base monitoring metrics and the HTCondor job data to evaluate our options for better anomaly detection and handling.

Files

Batch_Anomaly_Detection (1).pdf (1.3 MB)
md5:a9180920541a4a2909d75bbfc312dbfa