Published September 5, 2025 | Version v1
Report Open

CEPH RGW MULTISITE CONSISTENCY MONITORING SERVICE

  • 1. Silesian University of Technology
  • 2. European Organization for Nuclear Research

Description

The long-term storage and availability of vast datasets, such as those generated by the Large Hadron Collider (LHC), are critical to CERN’s scientific mission. The Ceph distributed storage system, with its RADOS Gateway (RGW) S3-compatible object storage interface, provides a scalable and resilient solution. To ensure high availability and disaster recovery, RGW can be deployed in a multisite replication configuration, for instance between the Meyrin and Prévessin data centers. However, keeping data consistent across geographically distributed sites is a significant challenge: latency, network partitions, or software bugs can cause replication inconsistencies, where data exists at one site but is missing or outdated at another. This project addresses the challenge by developing the Ceph RGW Multisite Consistency Monitor, a tool designed to detect and diagnose replication discrepancies. The tool operates in two distinct modes: a non-intrusive Passive Monitoring Mode that listens to real-time S3 operations via Ceph’s Kafka-based bucket notifications, and an Active Testing Mode that generates a controlled S3 workload (PUT/DELETE operations) to stress-test the replication pipeline and validate consistency under load. The system uses the AWS S3 command-line interface for object manipulation and a high-performance C++ component for real-time Kafka event processing. By comparing the ground truth of performed S3 operations with the stream of replication notifications, the monitor can pinpoint specific inconsistencies, such as missing notifications, extra notifications, and orphaned synchronization events. The tool produces detailed JSON reports and human-readable summaries, giving storage administrators the diagnostics they need to maintain the integrity of CERN’s distributed storage infrastructure.
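
The comparison step described above can be illustrated with a minimal Python sketch. It is not the report's implementation (which, per the description, uses the AWS CLI and a C++ Kafka component); it only shows one plausible way to diff a ground-truth operation log against received notifications. The function name `compare_operations` and the `(event_type, bucket, key)` tuple shape are assumptions made for illustration, and orphaned synchronization events are left out because the description does not define them.

```python
import json
from collections import Counter


def compare_operations(operations, notifications):
    """Diff performed S3 operations against received notifications.

    Both arguments are iterables of (event_type, bucket, key) tuples.
    This tuple shape is hypothetical, chosen only for this sketch.
    """
    expected = Counter(operations)      # ground truth: what was actually performed
    observed = Counter(notifications)   # what the notification stream reported

    # Counter subtraction keeps only positive counts, so duplicates are handled.
    missing = expected - observed       # performed, but never notified
    extra = observed - expected         # notified, but never performed

    return {
        "missing_notifications": sorted(missing.elements()),
        "extra_notifications": sorted(extra.elements()),
    }


if __name__ == "__main__":
    ops = [("PUT", "demo", "a"), ("PUT", "demo", "b"), ("DELETE", "demo", "a")]
    seen = [("PUT", "demo", "a"), ("DELETE", "demo", "a"), ("PUT", "demo", "c")]
    # Emit a JSON report, in the spirit of the reports the description mentions.
    print(json.dumps(compare_operations(ops, seen), indent=2))
```

Using multisets rather than plain sets means a duplicated notification for the same object shows up as an extra entry instead of being silently merged.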

Files

DawidGRABOWSKI-2025SummerStudent-Report.pdf

Files (504.2 kB)

md5:dcb39c010e2ce9976216d06f0815421d