Published August 5, 2025 | Version 1.0.1
Dataset Open

Synthetic Protein Interaction Dataset

  • 1. Heidelberg University

Description

This dataset contains synthetic protein-protein interaction data designed for demonstration, benchmarking, and educational purposes in computational biology and bioinformatics. The data simulates interactions between proteins from multiple species, including mouse (Mus musculus), rat (Rattus norvegicus), and human (Homo sapiens).

File Format

  • Type: Tab-separated values (TSV)
  • Filename: synthetic_protein_interactions.tsv

Columns

Column                 Description
source                 Source protein identifier
target                 Target protein identifier
source_genesymbol      Source gene symbol
target_genesymbol      Target gene symbol
is_directed            Indicates if the interaction is directed (1) or undirected (0)
is_stimulation         Indicates if the interaction is stimulatory (1) or not (0)
is_inhibition          Indicates if the interaction is inhibitory (1) or not (0)
consensus_direction    Consensus on directionality (1 for directed, 0 for undirected)
consensus_stimulation  Consensus on stimulation (1 for stimulatory, 0 for not)
consensus_inhibition   Consensus on inhibition (1 for inhibitory, 0 for not)
type                   Type of interaction (e.g., binding, activation, phosphorylation, inhibition, ubiquitination)
ncbi_tax_id_source     NCBI taxonomy ID for the source protein
entity_type_source     Entity type for the source (e.g., protein)
ncbi_tax_id_target     NCBI taxonomy ID for the target protein
entity_type_target     Entity type for the target (e.g., protein)
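
As an illustration of how the columns above might be consumed, the following is a minimal sketch in Python using only the standard library. It assumes the file has a header row with exactly these column names and that the six flag columns are stored as the literal strings "1" and "0"; neither detail is confirmed beyond the descriptions above. The function name read_interactions and the sample row (identifiers P00001, Q00002, GENE1, GENE2) are hypothetical and not taken from the dataset.

```python
import csv
import io

# Column names as listed in the Columns section.
EXPECTED_COLUMNS = [
    "source", "target", "source_genesymbol", "target_genesymbol",
    "is_directed", "is_stimulation", "is_inhibition",
    "consensus_direction", "consensus_stimulation", "consensus_inhibition",
    "type", "ncbi_tax_id_source", "entity_type_source",
    "ncbi_tax_id_target", "entity_type_target",
]

# Columns described as 1/0 indicators.
FLAG_COLUMNS = [
    "is_directed", "is_stimulation", "is_inhibition",
    "consensus_direction", "consensus_stimulation", "consensus_inhibition",
]


def read_interactions(handle):
    """Read tab-separated rows and convert the 1/0 flags to booleans."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        for name in FLAG_COLUMNS:
            # Assumption: flags are written as the strings "1" and "0".
            row[name] = row[name] == "1"
        rows.append(row)
    return rows


# Hypothetical one-row sample standing in for the real file.
SAMPLE_ROW = [
    "P00001", "Q00002", "GENE1", "GENE2",
    "1", "1", "0", "1", "1", "0",
    "activation", "9606", "protein", "10090", "protein",
]
SAMPLE_TSV = "\t".join(EXPECTED_COLUMNS) + "\n" + "\t".join(SAMPLE_ROW) + "\n"

rows = read_interactions(io.StringIO(SAMPLE_TSV))
```

To read the actual file, the same function could be called on a handle opened with open("synthetic_protein_interactions.tsv", newline="", encoding="utf-8").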

Use Cases

  • Testing and benchmarking graph database import pipelines
  • Educational demonstrations of protein interaction networks
  • Development and validation of bioinformatics tools
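
For the graph-oriented use cases, a rough sketch of turning rows into an in-memory adjacency structure is given below. It is only one possible interpretation: it assumes that is_directed equal to "1" means an edge from source to target only, and that any other value means the edge should be stored in both directions. The function name build_adjacency and the identifiers A, B, and C are hypothetical.

```python
from collections import defaultdict


def build_adjacency(rows):
    """Map each protein identifier to the set of proteins it points to."""
    adjacency = defaultdict(set)
    for row in rows:
        src, dst = row["source"], row["target"]
        adjacency[src].add(dst)
        # Assumption: anything other than "1" is treated as undirected,
        # so the reverse edge is stored as well.
        if row["is_directed"] != "1":
            adjacency[dst].add(src)
    return adjacency


# Hypothetical rows with only the columns this sketch needs.
example_rows = [
    {"source": "A", "target": "B", "is_directed": "1"},
    {"source": "B", "target": "C", "is_directed": "0"},
]

graph = build_adjacency(example_rows)
```

A dictionary of sets keeps the sketch dependency-free; a real import pipeline would more likely hand the same source, target, and is_directed values to the graph database or library under test.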

Notes

  • All data is synthetic and does not represent real biological interactions.
  • The dataset includes a variety of interaction types and cross-species relationships.
  • Gene symbols and protein identifiers are modeled after real-world conventions but are randomly assigned.
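
The cross-species relationships mentioned above could be tallied by comparing the two taxonomy ID columns. The sketch below assumes the IDs appear as plain strings and uses the standard NCBI taxonomy IDs for the three species named in the description (9606 for Homo sapiens, 10090 for Mus musculus, 10116 for Rattus norvegicus). The function name species_pair_counts and the sample rows are hypothetical.

```python
from collections import Counter

# NCBI taxonomy IDs for the species named in the description.
TAXON_NAMES = {
    "9606": "Homo sapiens",
    "10090": "Mus musculus",
    "10116": "Rattus norvegicus",
}


def species_pair_counts(rows):
    """Count interactions per (source species, target species) pair."""
    counts = Counter()
    for row in rows:
        src = row["ncbi_tax_id_source"]
        dst = row["ncbi_tax_id_target"]
        # Fall back to the raw ID if it is not one of the three known species.
        pair = (TAXON_NAMES.get(src, src), TAXON_NAMES.get(dst, dst))
        counts[pair] += 1
    return counts


# Hypothetical rows with only the two taxonomy columns filled in.
example_rows = [
    {"ncbi_tax_id_source": "9606", "ncbi_tax_id_target": "9606"},
    {"ncbi_tax_id_source": "9606", "ncbi_tax_id_target": "10090"},
    {"ncbi_tax_id_source": "10116", "ncbi_tax_id_target": "10090"},
]

pair_counts = species_pair_counts(example_rows)
# Interactions whose source and target species differ.
cross_species_total = sum(n for (src, dst), n in pair_counts.items() if src != dst)
```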

Citation

If you use this dataset, please cite as:

Synthetic Protein Interaction Dataset (v1.0). Generated for demonstration and benchmarking purposes. [DOI 10.5281/zenodo.16745601]

Files

  • Size: 2.0 kB
  • MD5: 155577b25e2e8460a59e8a096875edfa

Additional details

Software

  • Development status: Active