Published June 28, 2024 | Version v2
Dataset Open

ORBITAAL: cOmpRehensive BItcoin daTaset for temporAl grAph anaLysis

  • 1. Laboratoire d'Informatique en Images et Systèmes d'Information
  • 2. Université Claude Bernard Lyon 1

Description

Dataset Construction

This dataset captures the temporal network of Bitcoin (BTC) flows exchanged between entities at the finest available time resolution, expressed as UNIX timestamps. It is built from the blockchain covering the period from January 3, 2009 to January 25, 2021. The blockchain was extracted with the bitcoin-etl Python package (https://github.com/blockchain-etl/bitcoin-etl). The entity-entity network is built by aggregating Bitcoin addresses using the common-input heuristic [1], complemented by the addresses of popular Bitcoin users provided by https://www.walletexplorer.com/.

[1] M. Harrigan and C. Fretter, "The Unreasonable Effectiveness of Address Clustering," 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld), Toulouse, France, 2016, pp. 368-373, doi: 10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.0071.
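The common-input heuristic treats all addresses that appear together as inputs of one transaction as belonging to the same entity. The following is a minimal sketch of that idea using a union-find structure. It is not the authors' pipeline: the input shape (one list of input addresses per transaction) and the names `UnionFind` and `cluster_addresses` are assumptions made for illustration.

```python
from collections import defaultdict


class UnionFind:
    """Minimal disjoint-set structure keyed by address strings."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen addresses as their own root.
        self.parent.setdefault(x, x)
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_addresses(transactions):
    """Merge all input addresses of each transaction into one entity.

    `transactions` is an iterable of lists of input addresses
    (hypothetical shape, chosen for this sketch).
    """
    uf = UnionFind()
    for inputs in transactions:
        if not inputs:
            continue
        first = inputs[0]
        uf.find(first)  # make sure single-input addresses are registered
        for address in inputs[1:]:
            uf.union(first, address)

    clusters = defaultdict(set)
    for address in list(uf.parent):
        clusters[uf.find(address)].add(address)
    return sorted(sorted(members) for members in clusters.values())


# Toy example: A and B co-spend, then B and C co-spend, D spends alone.
example = [["A", "B"], ["B", "C"], ["D"]]
print(cluster_addresses(example))  # [['A', 'B', 'C'], ['D']]
```

Because the merge is transitive, A and C end up in the same entity even though they never appear in the same transaction.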

 

Dataset Description

Bitcoin Activity Temporal Coverage: From 03 January 2009 to 25 January 2021

Overview:

This dataset provides a comprehensive representation of Bitcoin exchanges between entities, from the inception of Bitcoin in January 2009 to January 2021. It offers several temporal resolutions and representations to facilitate the analysis of the Bitcoin transaction network as a temporal graph.

All dates are derived from block UNIX timestamps and expressed in the GMT timezone.
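As an illustration of how a block UNIX timestamp maps to calendar dates in GMT, the sketch below converts a timestamp to a timezone-aware datetime and derives year, month, day, and hour labels. The function name `snapshot_keys` and the label formats are assumptions chosen to resemble the date fields used in the file names; the example timestamp is that of the Bitcoin genesis block.

```python
from datetime import datetime, timezone


def snapshot_keys(unix_ts):
    """Return year/month/day/hour labels for a UNIX timestamp, in UTC (GMT)."""
    moment = datetime.fromtimestamp(unix_ts, tz=timezone.utc)
    return {
        "year": moment.strftime("%Y"),
        "month": moment.strftime("%Y-%m"),
        "day": moment.strftime("%Y-%m-%d"),
        "hour": moment.strftime("%Y-%m-%d-%H"),
    }


GENESIS_TS = 1231006505  # timestamp of the Bitcoin genesis block
print(snapshot_keys(GENESIS_TS))
# {'year': '2009', 'month': '2009-01', 'day': '2009-01-03', 'hour': '2009-01-03-18'}
```

Passing `tz=timezone.utc` explicitly avoids silently using the local timezone of the machine, which would shift transactions near midnight into the wrong day.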

Contents:

The dataset is distributed across seven compressed archives:

All data are stored in the Apache Parquet file format, a columnar storage format optimized for analytical queries. The files can be read with, for example, the pyspark Python package.

  1. orbitaal-stream_graph.tar.gz:

    • The root directory is STREAM_GRAPH/
    • Contains a stream graph representation of Bitcoin exchanges at the finest temporal scale, corresponding to the validation time of each block (approximately 10 minutes apart on average).
    • The stream graph is divided into 13 files, one for each year.
    • The file format is Parquet.
    • File names follow the pattern orbitaal-stream_graph-date-[YYYY]-file-id-[ID].snappy.parquet, where [YYYY] is the corresponding year and [ID] is an integer from 1 to N (the number of files), assigned so that sorting by increasing [ID] is equivalent to sorting by increasing year.
    • These files are in the subdirectory STREAM_GRAPH/EDGES/.
  2. orbitaal-snapshot-all.tar.gz:

    • The root directory is SNAPSHOT/
    • Contains the snapshot network representing all transactions aggregated over the whole dataset period (from Jan. 2009 to Jan. 2021).
    • The file format is Parquet.
    • The file name is orbitaal-snapshot-all.snappy.parquet.
    • This file is in the subdirectory SNAPSHOT/EDGES/ALL/.
  3. orbitaal-snapshot-year.tar.gz:

    • The root directory is SNAPSHOT/
    • Contains the snapshot networks at yearly resolution.
    • The file format is Parquet.
    • File names follow the pattern orbitaal-snapshot-date-[YYYY]-file-id-[ID].snappy.parquet, where [YYYY] is the corresponding year and [ID] is an integer from 1 to N (the number of files), assigned so that sorting by increasing [ID] is equivalent to sorting by increasing year.
    • These files are in the subdirectory SNAPSHOT/EDGES/year/.
  4. orbitaal-snapshot-month.tar.gz:

    • The root directory is SNAPSHOT/
    • Contains the snapshot networks at monthly resolution.
    • The file format is Parquet.
    • File names follow the pattern orbitaal-snapshot-date-[YYYY]-[MM]-file-id-[ID].snappy.parquet, where [YYYY] and [MM] are the corresponding year and month, and [ID] is an integer from 1 to N (the number of files), assigned so that sorting by increasing [ID] is equivalent to sorting by increasing year and month.
    • These files are in the subdirectory SNAPSHOT/EDGES/month/.
  5. orbitaal-snapshot-day.tar.gz:

    • The root directory is SNAPSHOT/
    • Contains the snapshot networks at daily resolution.
    • The file format is Parquet.
    • File names follow the pattern orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-file-id-[ID].snappy.parquet, where [YYYY], [MM], and [DD] are the corresponding year, month, and day, and [ID] is an integer from 1 to N (the number of files), assigned so that sorting by increasing [ID] is equivalent to sorting by increasing year, month, and day.
    • These files are in the subdirectory SNAPSHOT/EDGES/day/.
  6. orbitaal-snapshot-hour.tar.gz:

    • The root directory is SNAPSHOT/
    • Contains the snapshot networks at hourly resolution.
    • The file format is Parquet.
    • File names follow the pattern orbitaal-snapshot-date-[YYYY]-[MM]-[DD]-[hh]-file-id-[ID].snappy.parquet, where [YYYY], [MM], [DD], and [hh] are the corresponding year, month, day, and hour, and [ID] is an integer from 1 to N (the number of files), assigned so that sorting by increasing [ID] is equivalent to sorting by increasing year, month, day, and hour.
    • These files are in the subdirectory SNAPSHOT/EDGES/hour/.
  7. orbitaal-nodetable.tar.gz:

    • The root directory is NODE_TABLE/
    • Contains two files in Parquet format. The first gives information about the nodes present in the stream graphs and snapshots, such as their period of activity and associated global Bitcoin balance. The second contains the list of all associated Bitcoin addresses.
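The naming convention above implies that files should be ordered by the numeric value of [ID], not alphabetically (as text, "10" sorts before "2"). The sketch below parses file names with a regular expression and lists a directory in [ID] order. The helper names `parse_name` and `list_edge_files` are hypothetical, and whether [ID] is zero-padded in the actual archives is not stated in this description, so the code handles both cases.

```python
import re
from pathlib import Path

# Matches the stream graph and dated snapshot patterns described above:
# a 4-digit year followed by up to three 2-digit fields (month, day, hour).
NAME_RE = re.compile(
    r"^orbitaal-(?P<kind>stream_graph|snapshot)-date-"
    r"(?P<date>\d{4}(?:-\d{2}){0,3})-file-id-(?P<id>\d+)\.snappy\.parquet$"
)


def parse_name(filename):
    """Return (kind, date, file_id) for a matching file name, else None."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    return match["kind"], match["date"], int(match["id"])


def list_edge_files(directory):
    """List matching Parquet files in `directory`, sorted by numeric [ID]."""
    entries = []
    for path in Path(directory).iterdir():
        parsed = parse_name(path.name)
        if parsed is not None:
            entries.append((parsed[2], path))
    entries.sort(key=lambda entry: entry[0])
    return [path for _, path in entries]


# Hypothetical file name built from the monthly pattern.
print(parse_name("orbitaal-snapshot-date-2016-07-file-id-91.snappy.parquet"))
# ('snapshot', '2016-07', 91)
```

After extracting an archive, `list_edge_files("SNAPSHOT/EDGES/month")` would return the paths in chronological order, ready to be passed one by one to a Parquet reader such as pyspark.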

 

Small samples in CSV format

  1. orbitaal-stream_graph-2016_07_08.csv and orbitaal-stream_graph-2016_07_09.csv

    • These two CSV files contain the stream graph representation of the days around the Bitcoin halving that occurred in 2016.
  2. orbitaal-snapshot-2016_07_08.csv and orbitaal-snapshot-2016_07_09.csv

    • These two CSV files contain the daily snapshot representation of the days around the Bitcoin halving that occurred in 2016.
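The CSV samples are small enough to inspect before downloading the full archives. The sketch below reads the header and the first few rows with the standard library only. The column names are not listed in this description, so the code discovers them from the file; it assumes a comma-separated file with a header row, and the helper name `peek_csv` is hypothetical.

```python
import csv
from itertools import islice


def peek_csv(path, n_rows=5):
    """Return (column_names, first_rows) of a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(islice(reader, n_rows))
        return reader.fieldnames, rows


if __name__ == "__main__":
    # Path to one of the sample files, assumed to be in the working directory.
    columns, rows = peek_csv("orbitaal-snapshot-2016_07_08.csv")
    print(columns)
    for row in rows:
        print(row)
```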

 

 

 

Files

orbitaal-readme.md

Files (156.9 GB)

Checksum Size
md5:67d8f6412f532968b3da21440afe3e79 24.9 GB
md5:7a47d4f3e69069f9eea8d6e1af1c0a60 10.3 kB
md5:3e3c3597f4fe94ff90e75465d9364c6a 17.2 MB
md5:c8ff2f00175bf9e0f3152dab2a45cb28 15.0 MB
md5:50b87171c9ce206b75f5bc0bafdbecf4 10.1 GB
md5:dffab2760cf7fc26e533f650f1600f2b 24.8 GB
md5:52a1a3043f2af88f1a1285ae8fe8289c 26.9 GB
md5:aad6f5d9902f56adfb34b565ea8f2607 23.0 GB
md5:0ada3f60200e1b042ff9870e45e009d9 23.1 GB
md5:56c3ccd166b5bf9bf42634eac074bdfe 26.3 MB
md5:4dc535c75e6d426476fc986daf5cbf88 23.0 MB
md5:142db3fc29949b8423790f7f882ad7cf 23.9 GB
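Given the size of the archives, it can be worth verifying a download against its MD5 checksum before extracting it. The sketch below hashes a file in chunks so that multi-gigabyte archives do not need to fit in memory. The helper names are hypothetical, and the listing above does not show which checksum belongs to which archive, so the pairing in the usage line is only an example.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a listed value such as 'md5:67d8...'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


if __name__ == "__main__":
    # Hypothetical pairing of archive and checksum, for illustration only.
    ok = matches_checksum(
        "orbitaal-nodetable.tar.gz",
        "md5:67d8f6412f532968b3da21440afe3e79",
    )
    print("checksum ok" if ok else "checksum mismatch")
```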