This is a repository of openly available hypergraph datasets in JSON format, accompanied by documentation that describes each dataset in more detail. The documentation is loosely inspired by Datasheets for Datasets by Gebru et al. Scripts for converting the original datasets are available on the XGI-DATA GitHub page.


Overview of the xgi-data format

The xgi-data format for hypergraph datasets is a JSON object with the following tags:

  • "hypergraph-data": This tag accesses the attributes of the entire hypergraph dataset such as the authors or dataset name.
  • "node-data": This tag accesses the nodes of the hypergraph and their associated properties as a dictionary where the keys are node IDs and the corresponding values are dictionaries. If a node doesn't have any properties, the associated dictionary is empty.
    • "name": This tag accesses the node's name if there is one that is different from the ID specified in the hyperedges.
    • Other tags are user-specified based on the particular attributes provided by the dataset.
  • "edge-data": This tag accesses the hyperedges of the hypergraph and their associated attributes.
    • "name": This tag accesses the edge's name if one is provided.
    • "timestamp": This tag accesses the time associated with the hyperedge if one is given. All times are stored in ISO 8601 format.
    • Other tags are user-specified based on the particular attributes provided by the dataset.
  • "edge-dict": This tag accesses the edge IDs and, for each edge ID, the nodes that participate in that hyperedge.
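
As a rough illustration of the layout described above, the following sketch builds a tiny document in this shape and reads it back with Python's standard json module. The dataset name, the node and edge IDs, and all attribute values are invented for illustration and do not come from any real xgi-data file. Storing each hyperedge's members as a JSON list is also an assumption of this sketch.

```python
import json
from datetime import datetime

# Hypothetical toy document using the four top-level tags described above.
# All names, IDs, and values are made up for illustration.
raw = """
{
  "hypergraph-data": {"name": "toy-example"},
  "node-data": {
    "1": {"name": "Alice"},
    "2": {"name": "Bob"},
    "3": {}
  },
  "edge-data": {
    "0": {"name": "meeting", "timestamp": "2020-01-01T09:00:00"},
    "1": {}
  },
  "edge-dict": {
    "0": ["1", "2", "3"],
    "1": ["2", "3"]
  }
}
"""

data = json.loads(raw)

# Dataset-level attributes
dataset_name = data["hypergraph-data"]["name"]

# Nodes without properties map to an empty dictionary
unnamed_nodes = [n for n, attrs in data["node-data"].items() if "name" not in attrs]

# Hyperedge membership: edge ID -> number of participating nodes
edge_sizes = {e: len(members) for e, members in data["edge-dict"].items()}

# Timestamps are ISO 8601 strings, so the standard library can parse them
timestamps = {
    e: datetime.fromisoformat(attrs["timestamp"])
    for e, attrs in data["edge-data"].items()
    if "timestamp" in attrs
}
```

Real datasets may carry additional user-specified tags under "node-data" and "edge-data", so code that reads them should not assume any particular attribute is present.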

All IDs are strings but can be converted to other types if desired.
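
One possible way to convert the string IDs after parsing is sketched below in plain Python. The edge dictionary here is a made-up example, and the conversion assumes every ID is numeric.

```python
# Hypothetical edge dictionary as it might look after parsing the JSON:
# both edge IDs and node IDs are strings.
edge_dict = {"0": ["1", "2", "3"], "1": ["2", "3"]}

# Convert edge and node IDs to integers. This only works when every ID is
# numeric; int() raises ValueError otherwise.
int_edge_dict = {
    int(edge_id): [int(node_id) for node_id in members]
    for edge_id, members in edge_dict.items()
}
```

Datasets whose IDs are not numeric (for example, names or codes) are better left as strings.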

Datasets available on xgi-data

Currently available datasets are:

  • coauth-mag-geology
  • coauth-mag-history
  • congress-bills
  • contact-high-school
  • contact-primary-school
  • diseasome
  • disgenenet
  • email-enron
  • email-eu
  • hospital-lyon
  • ndc-substances
  • tags-ask-ubuntu
  • tags-math-sx
  • tags-stack-overflow

These datasets can be loaded with XGI using the following lines:

import xgi

H = xgi.load_xgi_data("<dataset_name>")


where <dataset_name> is chosen from the list above.
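
Because loading a dataset may require downloading it, it can be convenient to catch a mistyped name before making the call. The sketch below is one way to do that. The helper load_checked and the constant AVAILABLE_DATASETS are hypothetical names introduced here and are not part of XGI; only xgi.load_xgi_data comes from the instructions above.

```python
# Dataset names copied from the list above.
AVAILABLE_DATASETS = {
    "coauth-mag-geology",
    "coauth-mag-history",
    "congress-bills",
    "contact-high-school",
    "contact-primary-school",
    "diseasome",
    "disgenenet",
    "email-enron",
    "email-eu",
    "hospital-lyon",
    "ndc-substances",
    "tags-ask-ubuntu",
    "tags-math-sx",
    "tags-stack-overflow",
}


def load_checked(dataset_name):
    """Hypothetical helper: validate the name, then call xgi.load_xgi_data."""
    if dataset_name not in AVAILABLE_DATASETS:
        raise ValueError(
            f"Unknown dataset {dataset_name!r}; "
            f"choose one of: {', '.join(sorted(AVAILABLE_DATASETS))}"
        )
    # Imported lazily so the name check itself works without xgi installed.
    import xgi

    return xgi.load_xgi_data(dataset_name)
```

With XGI installed, load_checked("email-enron") would then behave like the direct call, while a misspelled name fails immediately with a message listing the valid choices. The hard-coded set needs updating whenever datasets are added to the list.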


These datasets have been taken from the following sources:


HNDS-I: Using Hypergraphs to Study Spreading Processes in Complex Social Networks
U.S. National Science Foundation