Introduction
The CES package contains the new tool CHIP (Central Hint and
Information Processor), which in Run 2 replaces the obsolete Run 1
OnlineRecovery package. CHIP supervises ATLAS data taking, takes
operational decisions and handles abnormal conditions. It automates
procedures and performs advanced recoveries. Furthermore, it
interacts with the Test Management service, which allows it to make
informed decisions based on test results. CHIP is
written in Java and is based on a third-party open-source complex event
processing (CEP) engine.
More information about CHIP including a list of recoveries and
automatic procedures can be found on the
CHIP twiki page.
Core functionality
CHIP decides what to do when applications in the Run Control tree
go into the error state, fail, crash, etc. Normally the action
taken follows the configuration settings of the specific
application (IF-FAILS, IF-DIES, IF-ERROR), with the following
exceptions:
- An application with decision set to RESTART will be restarted
up to a maximum of 5 times in a time interval of 30 minutes.
When the maximum number of restarts is reached, the IF-FAILS
action is taken instead.
- With the configuration setting HANDLE an application-specific
recovery can be defined.
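The RESTART rule above can be illustrated with a minimal Java sketch. This is not CHIP's actual implementation: the class name RestartPolicy, the Action enum and the onFailure method are hypothetical, and the sketch assumes that the 30-minute interval is a sliding window measured back from the current failure, which the text above does not specify.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical restart limiter, for illustration only; not CHIP code. */
class RestartPolicy {
    /** Possible decisions: restart again, or fall back to the IF-FAILS action. */
    enum Action { RESTART, IF_FAILS }

    private final int maxRestarts;
    private final long windowMillis;
    // Timestamps of the restarts granted so far, oldest first.
    private final Deque<Long> restartTimes = new ArrayDeque<>();

    RestartPolicy(int maxRestarts, long windowMillis) {
        this.maxRestarts = maxRestarts;
        this.windowMillis = windowMillis;
    }

    /** Decide what to do for a failure observed at nowMillis. */
    Action onFailure(long nowMillis) {
        // Assumption: sliding window. Forget restarts older than the window.
        while (!restartTimes.isEmpty()
                && nowMillis - restartTimes.peekFirst() >= windowMillis) {
            restartTimes.removeFirst();
        }
        if (restartTimes.size() < maxRestarts) {
            restartTimes.addLast(nowMillis);
            return Action.RESTART;
        }
        // Restart budget exhausted within the window.
        return Action.IF_FAILS;
    }

    public static void main(String[] args) {
        // 5 restarts per 30 minutes, as stated in the text above.
        RestartPolicy policy = new RestartPolicy(5, 30L * 60 * 1000);
        // Simulate one failure per minute.
        for (int i = 1; i <= 6; i++) {
            System.out.println("failure " + i + " -> " + policy.onFailure(i * 60_000L));
        }
    }
}
```

With one failure per minute, the first five failures yield RESTART and the sixth yields IF_FAILS. Passing the timestamp as an argument, rather than reading the system clock inside the method, keeps the sketch easy to exercise with simulated times.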
Automatic procedures and recoveries
CHIP implements various automatic procedures and advanced recovery
mechanisms, which are detailed on the twiki page linked above. The
current release contains the following recoveries:
- Stopless Removal/Recovery
- Module Removal/Recovery
- Resynch
- TTC Restart
- Various HLT recoveries triggered either by crashed HLT
applications or ERS messages
- L1Calo-specific recoveries
- RPC-specific recoveries
- Specific commands sent to the DCMs when ROS and/or SFO
applications are killed
The current release contains the following automatic procedures:
- Switching of the ATLAS reference clock between the LHC clock and
the internal one
- Warm Start/Stop
- Enabling/disabling the BCID check for the BCM detector, taking
into account the ATLAS reference clock
- Switching on/off of the PIXEL's pre-amplifier
- Restarting of specific applications in the EventDisplays
partition when a new ATLAS run is started
New procedures, either recoveries or automatic ones, can be added to
CHIP at the request of the sub-detector communities.
Enabling CHIP metrics
The metrics report of the internal CEP engine can now be switched
on or off at run time by publishing an object of type
EnableCHIPMetricsIs in IS (the default IS server is RunCtrlStatistics).
The metrics reporting interval can be configured as well.
Metrics information is available for both the CEP engine and the
individual statements (the information types are CHIPEngineMetrics
and CHIPStatementMetrics).
List of changes in Jira
Here is a list of patches applied to the last release and ported
to the tdaq-06-00-00 release:
Here is a list of fixes available in tdaq-06-00-00 only:
The threading model has been modified and is described here.