Workflow Automation for Reliable Data Movement and Storage Between ARCHER2 and JASMIN
Authors/Creators
- 1. National Centre for Atmospheric Sciences (NCAS)
Description
Large-scale simulations executed on ARCHER2 produce substantial data volumes that must be transferred, archived and managed reliably across national research facilities. Manual data movement introduces operational risk, delays downstream analysis, and increases the likelihood of storage bottlenecks.
This work presents a production workflow that automates the end-to-end data movement pipeline from ARCHER2 to the JASMIN data facility and onward to long-term Elastic Tape storage. Implemented in Cylc 8, the workflow detects newly generated datasets via external triggers, coordinates secure transfers using Globus, verifies successful archival through the Near-Line Data Store (NLDS), and cleans up intermediate storage only after that verification succeeds.
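The ordering of these stages can be illustrated with a minimal Python sketch. It is not the workflow's actual implementation, which is a Cylc workflow. The function `process_dataset`, the `PipelineResult` class and the three injected callables (`transfer`, `verify_archived`, `cleanup`) are hypothetical names introduced here, standing in for the Globus transfer, the NLDS archival check and the clean-up step. The sketch shows only the safety property described above: intermediate data is removed only after archival has been confirmed.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PipelineResult:
    """Record of what happened to one dataset (hypothetical helper type)."""
    dataset: str
    steps: List[str] = field(default_factory=list)
    cleaned: bool = False


def process_dataset(
    dataset: str,
    transfer: Callable[[str], bool],         # stand-in for the Globus transfer
    verify_archived: Callable[[str], bool],  # stand-in for the NLDS archival check
    cleanup: Callable[[str], None],          # stand-in for intermediate clean-up
) -> PipelineResult:
    """Run transfer -> verify -> cleanup, stopping at the first failed stage."""
    result = PipelineResult(dataset)

    if not transfer(dataset):
        result.steps.append("transfer_failed")
        return result
    result.steps.append("transferred")

    if not verify_archived(dataset):
        # Archival not confirmed: keep the intermediate copy untouched.
        result.steps.append("archive_unverified")
        return result
    result.steps.append("archive_verified")

    cleanup(dataset)
    result.cleaned = True
    result.steps.append("cleaned")
    return result
```

Passing the stages in as callables keeps the sketch independent of any particular transfer or archive client, so the ordering can be exercised without network access.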
Operational resilience is a primary design objective. The workflow incorporates configurable polling intervals, retry mechanisms to tolerate transient network failures, and structured logging to support traceability and debugging. A modular configuration allows the workflow to be reused across modelling activities and adapted to differing data management requirements.
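The retry and logging behaviour can be sketched in Python as follows. This is an illustration of the general technique under stated assumptions, not the workflow's own code: in a Cylc workflow, retries and polling intervals would normally be set in the workflow configuration. The names `run_with_retries` and `log_event`, the fixed delay between attempts, and the choice of `ConnectionError` as the transient failure type are all assumptions made for this example.

```python
import json
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("data_mover")  # hypothetical logger name


def log_event(event: str, **fields) -> str:
    """Emit one structured (JSON) log line and return it."""
    record = json.dumps({"event": event, **fields}, sort_keys=True)
    logger.info(record)
    return record


def run_with_retries(
    action: Callable[[], T],
    max_attempts: int = 3,
    delay_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action`, retrying on ConnectionError with a fixed delay."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            result = action()
        except ConnectionError as exc:
            # Treated as transient here; other exceptions propagate at once.
            log_event("attempt_failed", attempt=attempt, error=str(exc))
            if attempt >= max_attempts:
                raise
            sleep(delay_seconds)
            attempt += 1
        else:
            log_event("attempt_succeeded", attempt=attempt)
            return result
```

The `sleep` parameter is injectable so that the delay can be replaced in tests; each log line is a single JSON object, which keeps the records easy to filter when tracing a failed transfer.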
Automating cross-facility data handling reduces time-to-archive, alleviates storage pressure on ARCHER2, and improves the reliability of scientific data preservation. This work highlights the role of workflow automation in supporting sustainable data management practices for Tier-1 supercomputing environments and provides a practical approach for handling the increasing data demands of contemporary simulation workloads.
Files
| Name | Size | Checksum |
|---|---|---|
| Celebration_of_science_2026Archer2_JC_Bilbao_V1_poster.pdf | 1.0 MB | md5:d7f3f954d54ab26756eacf15c839b1ec |