WLCG/DOMA Data Challenge 2024: Final Report
Contributors
Researchers:
- Arora, Aashay
- Agostini, Federica
- Arsuaga Rios, Maria
- Balcas, Justas
- Balci, Berk
- Barisits, Martin
- Benjamin, Douglas
- Betev, Latchezar
- Carder, Dale
- Chauhan, Rahul
- Christidis, Dimitrios
- Chudoba, Jiri
- Dart, Eli
- Davila, Diego
- Dewhurst, Alastair
- Ellis, Katy
- Fernandez Casani, Alvaro
- Flix Molina, Josep
- Forti, Alessandra
- Gardner, Robert William
- Garonne, Vincent
- Garrido, Borja
- Giacomini, Francesco
- Glushkov, Ivan
- Haen, Christophe
- Hoeft, Bruno
- Lassnig, Mario
- Lehman, Tom
- Litmaath, Maarten
- Luehring, Frederick
- Lukasczyk, Mark
- Manrique, Andres
- Mascetti, Luca
- McKee, Shawn
- Miccoli, Roberta
- Morganti, Lucia
- Murray, Steven
- Musheghyan, Haykuhi
- Nappi, Antonio
- Ozturk, Hasan
- Pacheco Pages, Andres
- Paparrigopoulos, Panos
- Pardi, Silvio
- Paspalaki, Garyfallia
- Patrascoiu, Mihai
- Perez-Calero Yzquierdo, Antonio
- Robinson, Kate
- Rogovskiy, Alexander
- Sapunenko, Vladimir
- Shah, Asif
- Timm, Steven
- Vianello, Enrico
- Vokac, Petr
- Wissing, Christoph
- Yang, Xi
- Zani, Stefano
Description
The WLCG/DOMA Data Challenge 2024 (DC24) was executed to rigorously test the functionalities and
capabilities of the Worldwide LHC Computing Grid (WLCG) in preparation for the High-Luminosity
LHC. DC24 was the second in a series of increasingly demanding challenges and targeted 25% of the expected
HL-LHC throughput. DC24 was structured to stress-test various data transfer tools and
methodologies, optimise network configurations, and investigate potential limitations in our
infrastructure. The challenge set forth aggressive targets: 1.2 Tbps for the minimal model focusing on
Tier-0 Export to the Tier-1 centres, and 2.4 Tbps for the flexible model, including complex experiment
data flows. The primary objectives of DC24 also included validating the scalability and performance
of data management tools like FTS (File Transfer Service) and Rucio, and ensuring robust
authentication mechanisms using tokens. During DC24, the Belle II and DUNE experiments executed
network exercises as well. Although their throughput was orders of magnitude lower than that of the LHC
experiments, many sites and network paths were shared, and no interference was observed.
The challenge yielded numerous significant achievements. The minimal model was easily achieved
thanks to a multi-month preparatory effort involving various ramp-up challenges. The flexible model
was reached during the second half of the challenge and sustained for multiple hours. The challenge
also identified various performance bottlenecks, including issues with token refresh operations and
database overloads. DC24 offered the first opportunity to gain operational experience with
token-based authentication for data transfers: about half of the transfers injected for the challenge
already used tokens. Significant tuning and dynamic adjustments were essential to maintain high
transfer rates during the challenge. While token-based transfers were successfully tested, significant
issues related to token refresh operations led to timeouts and transfer failures, particularly at highly
loaded sites. New network technologies for load balancing, guaranteed bandwidth, and congestion
control were also successfully evaluated.
Most sites neither observed problems with their storage nor suffered from network saturation.
A few sites identified bottlenecks in their local infrastructure and are now in a position to apply
upgrades or tuning of parameters. Overall, the challenge was considered very useful.
Based on the experiences from the previous DC21, the monitoring capabilities were greatly improved
before DC24. New capabilities to tag dataflows on the network were demonstrated and will allow
enhanced monitoring in the future. Some small-scale differences between different monitoring
systems were spotted during the challenge and are subject to further investigation.
DC24 was instrumental in identifying and mitigating various performance bottlenecks, configuration
issues, and operational hurdles. The insights and improvements derived from this challenge are
pivotal for future scalability and operational robustness of our infrastructure. The next Data
Challenge is provisionally planned for autumn 2026, targeting 50% of the HL-LHC traffic.
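The relationship between the stated DC24 targets and the percentages of HL-LHC traffic can be illustrated with a short calculation. The sketch below is only an illustration: it assumes that targets scale linearly with the stated percentage, which the description implies for DC24 (25%) but does not confirm for the next challenge (50%). The function names are hypothetical, and the conversion uses decimal units (1 PB = 10^15 bytes).

```python
# DC24 targets as stated in the description, in terabits per second (Tbps).
DC24_TARGETS_TBPS = {"minimal": 1.2, "flexible": 2.4}
DC24_FRACTION = 0.25  # DC24 targeted 25% of the expected HL-LHC throughput


def scale_targets(targets, from_fraction, to_fraction):
    """Rescale targets between fractions of HL-LHC traffic.

    Assumption: targets scale linearly with the fraction.
    """
    return {
        name: rate / from_fraction * to_fraction
        for name, rate in targets.items()
    }


def tbps_to_pb_per_day(rate_tbps):
    """Convert a sustained rate in Tbps to petabytes per day (decimal units)."""
    bytes_per_second = rate_tbps * 1e12 / 8  # 8 bits per byte
    return bytes_per_second * 86_400 / 1e15  # 86,400 seconds per day


# Implied full-scale HL-LHC rates and a hypothetical 50% challenge.
full_scale = scale_targets(DC24_TARGETS_TBPS, DC24_FRACTION, 1.0)
half_scale = scale_targets(DC24_TARGETS_TBPS, DC24_FRACTION, 0.50)

for label, targets in [("100%", full_scale), ("50%", half_scale)]:
    for model, rate in targets.items():
        print(f"{label} {model}: {rate:.1f} Tbps, "
              f"{tbps_to_pb_per_day(rate):.1f} PB/day if sustained")
```

Under these assumptions, the DC24 minimal-model rate of 1.2 Tbps corresponds to roughly 13 PB per day if sustained around the clock; actual targets for the next challenge are set by the collaboration and may differ from a linear extrapolation.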
Files
WLCG_DOMA Data Challenge 2024 - Final Report.pdf (17.4 MB)
md5:1eb1888c66bef1fe759eda66ebaa7e64