Published October 17, 2022 | Version v1
Report
Open
BATCH ANOMALY DETECTION
Creators
Description
The 300,000 CPU-core HTCondor batch farm at CERN provides the computing power for the initial processing of data from the LHC experiments. A computing setup of this scale inevitably produces an abundance of monitoring data, and current monitoring methods cannot be configured in a reasonable amount of time to catch all potential anomalies. Building on previous and ongoing work in the IT department, this project will focus on the HTC batch system, using both the base monitoring metrics and the HTC job data to evaluate options for better anomaly detection and handling.
Files
Name | Size |
---|---|
Batch_Anomaly_Detection (1).pdf (md5:a9180920541a4a2909d75bbfc312dbfa) | 1.3 MB |