Dataset of Internet RTT latency measurements in Europe
Authors/Creators
Description
This dataset provides real-world Round-Trip Time (RTT) latency measurements collected from a geographically distributed set of probing nodes (Monitors) to destination IP addresses across Europe. It is intended to support machine learning research in IP geolocation, providing curated datasets for both model training and performance evaluation. Measurements were collected using ICMP echo requests during a campaign running from November 27, 2024, to January 30, 2025.
Scenario Description
The measurements were gathered from six virtual machines acting as Monitors, deployed in Microsoft Azure regions across Europe: Madrid (Spain), Dublin (Ireland), Frankfurt (Germany), Warsaw (Poland), Gävle (Sweden), and Milan (Italy). These Monitors probed two categories of destinations:
· Landmarks: Nodes with known and verified geographical coordinates, used for model training.
· Targets: Nodes whose coordinates are also known, used exclusively for validation; their locations are treated as unknown during inference.
The RTT data are structured as fingerprint vectors: each vector contains the latency statistics from all six Monitors to a specific destination IP (landmark or target) in a given measurement instance. For each Monitor, the vector includes three RTT-based features: the mean, the geometric mean, and the standard deviation.
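To illustrate how one fingerprint vector relates to raw probes, the sketch below computes the three per-Monitor statistics from a list of ICMP RTT samples and names them after the dataset columns. It is a sketch under stated assumptions: the record does not say how many echo requests were sent per measurement or which standard-deviation estimator was used, so the population standard deviation is assumed here, and the helper names `rtt_features` and `fingerprint` are hypothetical.

```python
import statistics


def rtt_features(samples_ms):
    """Return (mean, geometric mean, population std) of one Monitor's RTT samples in ms."""
    mean = statistics.fmean(samples_ms)
    geomean = statistics.geometric_mean(samples_ms)  # requires all samples > 0
    std = statistics.pstdev(samples_ms)  # assumption: population, not sample, std
    return mean, geomean, std


def fingerprint(samples_by_monitor):
    """Build one fingerprint row from the RTT samples of Monitors m1..m6, in that order."""
    row = {}
    for i, samples in enumerate(samples_by_monitor, start=1):
        mean, geomean, std = rtt_features(samples)
        row[f"mean_latency_m{i}"] = mean
        row[f"geomean_latency_m{i}"] = geomean
        row[f"std_latency_m{i}"] = std
    return row
```

For example, `fingerprint([[10.0, 12.0]] * 6)` returns 18 entries, with `mean_latency_m1` equal to 11.0 and `std_latency_m1` equal to 1.0.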
Dataset Structure
Each dataset contains multiple rows, where each row represents an RTT fingerprint vector of latency measurements from the six Monitors to a given node.
1) Learning Dataset: Landmark_RTTfingerprint_dataset.csv
- Monitors deployed: 6 (distributed across Microsoft Azure regions in Madrid, Dublin, Frankfurt, Warsaw, Gävle, and Milan).
- Landmarks: nodes with known geographical locations, used for training models.
- Columns:
o measure_id: Unique identifier for each measurement.
o landmark_id: ID of the geolocated node used for training.
o landmark_type: Type of landmark (dns, ripe_anchor, ripe_probe).
o dst_ip: IP address of the landmark node.
o init_time: Timestamp of the measurement.
o country_code_gt: Ground truth country code of the node.
o latitude_gt, longitude_gt: Ground truth geolocation of the landmark node.
o 4h_time_slot, 6h_time_slot: Time window identifiers indicating when the measurement was taken.
o mean_latency_m1 – mean_latency_m6: Mean RTT from each of the 6 Monitors (milliseconds), forming the mean-RTT fingerprint vector.
o geomean_latency_m1 – geomean_latency_m6: Geometric mean RTTs from 6 Monitors (milliseconds).
o std_latency_m1 – std_latency_m6: Standard deviation of RTTs from 6 Monitors (milliseconds).
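A minimal, standard-library sketch for turning the learning dataset into a feature matrix and coordinate labels. It assumes a comma-separated file with a header row that uses exactly the column names listed above and has no empty cells; `load_fingerprints` and `FEATURE_COLUMNS` are hypothetical names chosen for illustration.

```python
import csv

MONITORS = range(1, 7)  # m1..m6

# The 18 fingerprint columns, grouped by statistic, then by Monitor.
FEATURE_COLUMNS = (
    [f"mean_latency_m{i}" for i in MONITORS]
    + [f"geomean_latency_m{i}" for i in MONITORS]
    + [f"std_latency_m{i}" for i in MONITORS]
)


def load_fingerprints(lines):
    """Parse CSV lines into (measure ids, feature vectors, (lat, lon) ground truth)."""
    ids, features, coords = [], [], []
    for row in csv.DictReader(lines):
        ids.append(row["measure_id"])
        # Assumes every latency cell is populated; float("") would raise ValueError.
        features.append([float(row[c]) for c in FEATURE_COLUMNS])
        coords.append((float(row["latitude_gt"]), float(row["longitude_gt"])))
    return ids, features, coords
```

Typical use would be `with open("Landmark_RTTfingerprint_dataset.csv", newline="") as f: ids, X, y = load_fingerprints(f)`. If some rows turn out to contain empty latency cells, they would need to be filtered or imputed before the `float()` conversion.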
2) Validation Dataset: ValidationDataset_RTT_dispersed_EU.csv
- Monitors deployed: 6 (same as Learning Dataset).
- Targets: nodes used to evaluate model performance. Their actual locations are known but treated as unknown during inference.
- Columns:
o measure_id: Unique identifier for each measurement.
o target_id: ID of the target node used for validation.
o target_type: Type of target (dns, ripe_anchor, ripe_probe).
o dst_ip: IP address of the target node.
o init_time: Timestamp of the measurement.
o country_code_gt: Ground truth country code of the node.
o latitude_gt, longitude_gt: Ground truth geolocation of the target node.
o 4h_time_slot, 6h_time_slot: Time window identifiers indicating when the measurement was taken.
o mean_latency_m1 – mean_latency_m6: Mean RTT from each of the 6 Monitors (milliseconds), forming the mean-RTT fingerprint vector.
o geomean_latency_m1 – geomean_latency_m6: Geometric mean RTTs from 6 Monitors (milliseconds).
o std_latency_m1 – std_latency_m6: Standard deviation of RTTs from 6 Monitors (milliseconds).
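Because the targets' ground-truth coordinates are included, predicted locations can be scored by great-circle distance. The sketch below uses a simple 1-nearest-neighbor baseline (predict the coordinates of the landmark whose fingerprint is closest in Euclidean distance) and reports the median haversine error in kilometers. Both the baseline and the median-error metric are illustrative choices, not the evaluation protocol of the dataset authors, and the function names are hypothetical.

```python
import math
import statistics

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_landmark_predict(landmark_X, landmark_coords, target_x):
    """1-NN baseline: return the coordinates of the landmark with the closest fingerprint."""
    best = min(range(len(landmark_X)), key=lambda i: math.dist(landmark_X[i], target_x))
    return landmark_coords[best]


def median_error_km(landmark_X, landmark_coords, target_X, target_coords):
    """Median great-circle error of the 1-NN baseline over all target fingerprints."""
    errors = [
        haversine_km(*nearest_landmark_predict(landmark_X, landmark_coords, x), *gt)
        for x, gt in zip(target_X, target_coords)
    ]
    return statistics.median(errors)
```

Since the mean, geometric-mean, and standard-deviation features all share the millisecond scale, plain Euclidean distance is a reasonable starting point; a real model would likely weight or normalize them.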
Files
Landmark_RTTfingerprint_dataset.csv
Additional details
Dates
- Collected: 2024-11-27/2025-01-30 (dataset generation)