Published January 29, 2026 | Version v1.0.0
Dataset | Open access

Chocolate Cloud Object Storage Transfer Speeds Dataset

  • Chocolate Cloud ApS

Description

Overview

This dataset measures upload and download performance between Fly.io gateway regions (origins) and commercial object storage backends (targets). Each row is one measurement of a single data size, initiated from a Fly.io region against a particular backend. The dataset is intended for studying network performance, latency-sensitive placement, and cross-region transfer behavior.

Some records use 1-byte uploads and downloads to approximate latency: the minimal payload exercises the target service's request path while adding almost no transfer time. At each timestamp, measurements cover the standard sizes (1 byte, 1 MB, 10 MB, 50 MB) plus a few random sizes up to 50 MB. The dataset contains ~900,000 measurements spanning 86 days, from 2024-10-31 to 2025-01-24, with a pause in collection from 2024-11-18 to 2024-12-18. Each measurement is uniquely identified by the combination (timestamp, origin_fly_region, target_backend_id, size_bytes).
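The uniqueness key and the collection window can be checked directly after loading. Below is a minimal sketch with hypothetical helper names. It assumes that rows are dicts of strings, as produced by Python's csv.DictReader, and that timestamps are ISO 8601 strings that Python's datetime.fromisoformat can parse. A trailing "Z" is mapped to +00:00 as a precaution.

```python
from collections import Counter
from datetime import datetime, timezone

# Composite key stated in the dataset overview.
KEY_COLUMNS = ("timestamp", "origin_fly_region", "target_backend_id", "size_bytes")


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp (assumed format) and normalize it to UTC."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value).astimezone(timezone.utc)


def duplicate_keys(rows):
    """Return composite keys that occur more than once; expected to be empty."""
    counts = Counter(tuple(row[col] for col in KEY_COLUMNS) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def measured_dates(rows):
    """Return the sorted UTC calendar dates that have at least one measurement."""
    return sorted({parse_ts(row["timestamp"]).date() for row in rows})
```

Pass a materialized list rather than a live csv.DictReader, because both functions iterate over the rows. Dates between 2024-11-18 and 2024-12-18 should be largely absent from measured_dates.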

CSV Columns

  • timestamp: UTC datetime string for the measurement (timezone-aware, ISO 8601).
  • origin_fly_region: Fly.io gateway region code (3-letter).
  • origin_countrycode: ISO 3166-1 alpha-2 country code (lowercase) for the Fly.io gateway.
  • origin_city: City of the Fly.io gateway.
  • origin_lat: Latitude of the Fly.io gateway.
  • origin_lng: Longitude of the Fly.io gateway.
  • target_backend_id: Internal storage backend ID.
  • target_provider: Cloud provider name.
  • target_region: Cloud provider region.
  • target_countrycode: ISO 3166-1 alpha-2 country code (lowercase) for the backend location.
  • target_city: City of the storage backend.
  • target_timezone: Time zone name for the backend.
  • target_lat: Latitude of the storage backend.
  • target_lng: Longitude of the storage backend.
  • target_local_time: Local time at the target backend for the same instant as timestamp.
  • distance_km: Great-circle distance between origin and target, in kilometers (rounded to an integer).
  • size_bytes: Data size in bytes for the measurement.
  • upload_time_ms: Upload time in milliseconds.
  • download_time_ms: Download time in milliseconds.
  • upload_speed_mbps: Upload speed in megabits per second (2 decimal places).
  • download_speed_mbps: Download speed in megabits per second (2 decimal places).
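The column list does not spell out how distance_km and the speed columns are derived. The sketch below can be used to sanity-check individual rows. It assumes the haversine formula on a spherical Earth with a mean radius of 6,371 km, and it assumes 1 megabit = 10^6 bits. Both constants are assumptions, so small disagreements with the CSV values are possible if the dataset used a different radius or a mebibit convention.

```python
import math

# Assumed mean Earth radius; the constant used to build distance_km is not stated.
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lng1, lat2, lng2):
    """Haversine great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def speed_mbps(size_bytes, time_ms):
    """Throughput in megabits per second, assuming 1 Mbit = 10**6 bits."""
    if time_ms <= 0:
        raise ValueError("time_ms must be positive")
    bits = size_bytes * 8
    seconds = time_ms / 1000
    return round(bits / seconds / 1_000_000, 2)
```

For a given row, round(great_circle_km(origin_lat, origin_lng, target_lat, target_lng)) should come close to distance_km. Likewise, speed_mbps(size_bytes, upload_time_ms) should come close to upload_speed_mbps. Convert the CSV strings with float() or int() first.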

Intended Use Examples

  • Compare upload/download performance across cloud providers and regions for a fixed data size.
  • Identify nearest or best-performing storage backends for a given Fly.io region.
  • Analyze how geographic distance correlates with throughput.
  • Build placement or replication strategies based on observed network performance.
  • Use as input for predictive models of transfer time or throughput.
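For the first two use cases, here is a minimal sketch that ranks backends for one origin region and data size by median download speed. The function best_backend is a hypothetical helper. The median is used here as a robust summary that limits the influence of outliers. A mean or a percentile could be swapped in the same way.

```python
from collections import defaultdict
from statistics import median


def best_backend(rows, origin, size_bytes, metric="download_speed_mbps"):
    """Return (target_backend_id, median metric) for the best backend, or None."""
    samples = defaultdict(list)
    for row in rows:
        # Keep only measurements from the chosen origin at the chosen size.
        if row["origin_fly_region"] == origin and int(row["size_bytes"]) == size_bytes:
            samples[row["target_backend_id"]].append(float(row[metric]))
    if not samples:
        return None
    medians = {backend: median(values) for backend, values in samples.items()}
    best = max(medians, key=medians.get)
    return best, medians[best]
```

Passing metric="upload_speed_mbps" ranks backends by upload speed instead.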

Notes

  • Rows are sorted by timestamp ascending.
  • City names may contain commas and are properly quoted in the CSV.
  • There are no missing values.
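The notes above can be verified while loading the file. Python's standard csv module keeps quoted city names containing commas in a single field. The following is a sketch under that assumption. load_and_check is a hypothetical helper that accepts any iterable of lines, such as an open file. It assumes timestamps that datetime.fromisoformat can parse.

```python
import csv
from datetime import datetime, timezone


def _parse_ts(value):
    """Parse an ISO 8601 timestamp (assumed format) into an aware UTC datetime."""
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value).astimezone(timezone.utc)


def load_and_check(lines):
    """Parse CSV lines and check the 'no missing values' and 'sorted' notes."""
    rows = list(csv.DictReader(lines))
    previous = None
    for i, row in enumerate(rows):
        # DictReader stores surplus fields under the key None and pads short rows with None.
        if None in row or any(value in (None, "") for value in row.values()):
            raise ValueError(f"row {i}: missing or malformed value")
        current = _parse_ts(row["timestamp"])
        if previous is not None and current < previous:
            raise ValueError(f"row {i}: timestamps not in ascending order")
        previous = current
    return rows
```

Typical use, assuming UTF-8 encoding and a hypothetical local filename: `with open("transfer_speeds.csv", newline="", encoding="utf-8") as f: rows = load_and_check(f)`.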

Related ML Models

Models trained on this dataset are published at:

https://zenodo.org/records/18288840

These models predict the transfer time on a route from a specific Fly.io region to a storage backend, for a given time and data size. A separate model is provided for each of six backends, all paired with the Fly.io London (lhr) region.

The target_backend_id column holds an internal unique ID for one region of a commercial cloud storage provider. These IDs match the backend identifiers used in the published models.
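The published models are not described in detail here. The following is therefore only a hypothetical baseline for the same prediction task, not a reproduction of those models. It fits an ordinary least-squares line, time_ms ≈ intercept + slope * size_bytes, to one route (origin region plus target_backend_id) and ignores time of day. The intercept plays a role similar to the 1-byte latency probes. 1 / slope is the marginal throughput in bytes per millisecond.

```python
from statistics import fmean


def fit_route_baseline(rows, origin, backend, time_col="download_time_ms"):
    """Least-squares fit of time_ms ~ intercept + slope * size_bytes for one route."""
    points = [
        (float(r["size_bytes"]), float(r[time_col]))
        for r in rows
        if r["origin_fly_region"] == origin and r["target_backend_id"] == backend
    ]
    if len(points) < 2:
        raise ValueError("need at least two measurements on this route")
    mean_x = fmean(x for x, _ in points)
    mean_y = fmean(y for _, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two distinct data sizes")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope
```

A predicted transfer time for a new size is then intercept + slope * size_bytes, for example on the lhr route to one backend.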

Files

_SAMPLE_chocolate_cloud_object_storage_transfer_speeds copy.csv

Files (176.7 MB total)

  • md5:609cbf542261e3f53bd557538a9a45f2 (4.0 kB)
  • md5:d9bb01ee3997bbb6225971f5b8e29aea (176.7 MB)
  • md5:687f35c2f3b1ff2ec36b05c1e45fe6ec (3.5 kB)
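Each file is listed with an MD5 checksum. Below is a sketch for verifying a download with Python's hashlib. It reads the file in chunks, so the 176.7 MB file is never held in memory at once. md5_hex, md5_file, and matches_listing are hypothetical helper names.

```python
import hashlib


def md5_hex(chunks):
    """MD5 hex digest over an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_file(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in 1 MiB chunks."""
    with open(path, "rb") as f:
        return md5_hex(iter(lambda: f.read(chunk_size), b""))


def matches_listing(digest, listed):
    """Compare a hex digest with a listing entry such as 'md5:609c...'."""
    return digest.lower() == listed.split(":", 1)[-1].lower()
```

For example, compare md5_file(path) with the entry listed for the file of matching size.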

Additional details

Related works

Is referenced by
Journal article: 10.1109/TCC.2023.3287653 (DOI)

Funding

European Commission
MLSysOps - Machine Learning for Autonomic System Operation in the Heterogeneous Edge-Cloud Continuum (grant agreement No. 101092912)