There is a newer version of the record available.

Published July 29, 2023 | Version v1.0.0
Dataset Open

Pharmacogenomics datasets for Ontology Matching

  • 1. Université Côte d'Azur, Inria, CNRS, I3S, France
  • 2. Inria Paris, Centre de Recherche des Cordeliers, Inserm, Université Paris Cité, Sorbonne Université, Paris, France

Description

Pharmacogenomics (PGx for short) knowledge is expressed as n-ary tuples, the so-called "pharmacogenomic relationships", whose components are of three distinct types: drugs, genetic factors, and phenotypes. Tuples are reified as instances of the class ``pgxo:PharmacogenomicRelationship``. The goal of the matching task is to match these tuples (instance matching).
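To illustrate what reification means here, the following minimal sketch models each n-ary tuple as a node typed with ``pgxo:PharmacogenomicRelationship`` and linked to its components. Only the class name comes from the description above; the ``ex:`` identifiers, the ``ex:hasComponent`` property, and the helper functions are hypothetical placeholders, not the actual PGxO vocabulary or the contents of the datasets.

```python
# Triples as (subject, predicate, object) strings in CURIE form.
# Everything prefixed with "ex:" is a made-up placeholder for illustration.
PGX_CLASS = "pgxo:PharmacogenomicRelationship"

TRIPLES = [
    ("ex:rel1", "rdf:type", PGX_CLASS),
    ("ex:rel1", "ex:hasComponent", "ex:drug1"),
    ("ex:rel1", "ex:hasComponent", "ex:geneticFactor1"),
    ("ex:rel1", "ex:hasComponent", "ex:phenotype1"),
    ("ex:rel2", "rdf:type", PGX_CLASS),
    ("ex:rel2", "ex:hasComponent", "ex:drug1"),
    ("ex:rel2", "ex:hasComponent", "ex:phenotype2"),
    ("ex:drug1", "rdf:type", "ex:Drug"),
]


def relationship_instances(triples):
    """Return the reified tuples, i.e. the entities an instance matcher must align."""
    return sorted({s for s, p, o in triples if p == "rdf:type" and o == PGX_CLASS})


def components_of(triples, relationship):
    """Return the components attached to one reified tuple."""
    return sorted(
        o for s, p, o in triples if s == relationship and p == "ex:hasComponent"
    )


if __name__ == "__main__":
    for rel in relationship_instances(TRIPLES):
        print(rel, components_of(TRIPLES, rel))
```

Because each tuple is a node of its own, a matcher compares relationship nodes with one another (instance matching) rather than comparing individual drugs, genetic factors, or phenotypes.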

Motivation: Pharmacogenomic tuples involve drugs, genetic factors, and phenotypes, and state that patients treated with the specified drugs who have the specified genetic factors may experience the given phenotypes. Knowledge in pharmacogenomics is scattered across several resources, e.g., reference databases (PharmGKB) or the biomedical literature. Hence, there is a need to build a consolidated view of the knowledge of this domain by aligning tuples from different sources. See [1] for a detailed motivation and [2] for a detailed task description.

Datasets

We provide different subsets of the alignments available in PGxLOD that have been created with the matching rules described in [3].

Task with 10 % of PGx relationships

  • Alignments: 1092
    • relatedMatch alignments: 66
    • sameAs alignments: 498
    • closeMatch alignments: 53
    • broadMatch alignments: 333
    • narrowMatch alignments: 142
  • Entities to align in source: 2525
  • Triples in source: 816604
  • Entities to align in target: 2518
  • Triples in target: 816859

Task with 50 % of PGx relationships

  • Alignments: 23630
    • relatedMatch alignments: 1245
    • sameAs alignments: 9219
    • closeMatch alignments: 1175
    • broadMatch alignments: 7183
    • narrowMatch alignments: 4808
  • Entities to align in source: 12816
  • Triples in source: 894723
  • Entities to align in target: 12401
  • Triples in target: 889735

Task with 100 % of PGx relationships

  • Alignments: 89926
    • relatedMatch alignments: 4979
    • sameAs alignments: 35499
    • closeMatch alignments: 4603
    • broadMatch alignments: 26135
    • narrowMatch alignments: 18710
  • Entities to align in source: 25406
  • Triples in source: 982548
  • Entities to align in target: 25029
  • Triples in target: 980543
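As a quick sanity check on the statistics above, the sketch below copies the alignment counts of the three tasks into a dictionary and verifies that the per-relation counts add up to each reported total. The dictionary layout and the function names are illustrative choices and do not reflect the file format of the archive.

```python
# Alignment counts copied from the three task listings above.
TASKS = {
    "10%": {"total": 1092, "relatedMatch": 66, "sameAs": 498,
            "closeMatch": 53, "broadMatch": 333, "narrowMatch": 142},
    "50%": {"total": 23630, "relatedMatch": 1245, "sameAs": 9219,
            "closeMatch": 1175, "broadMatch": 7183, "narrowMatch": 4808},
    "100%": {"total": 89926, "relatedMatch": 4979, "sameAs": 35499,
             "closeMatch": 4603, "broadMatch": 26135, "narrowMatch": 18710},
}


def breakdown_sum(stats):
    """Sum the per-relation alignment counts, ignoring the reported total."""
    return sum(count for relation, count in stats.items() if relation != "total")


def share(stats, relation):
    """Fraction of a task's alignments that use the given relation."""
    return stats[relation] / stats["total"]


if __name__ == "__main__":
    for name, stats in TASKS.items():
        consistent = breakdown_sum(stats) == stats["total"]
        print(f"{name}: consistent={consistent}, sameAs share={share(stats, 'sameAs'):.1%}")
```

The same `share` helper can be used to compare how the mix of relation types changes between the smaller and larger tasks.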

References

  1. Pierre Monnin, Joël Legrand, Graziella Husson, Patrice Ringot, Andon Tchechmedjiev, Clément Jonquet, Amedeo Napoli, Adrien Coulet: PGxO and PGxLOD: a reconciliation of pharmacogenomic knowledge of various provenances, enabling further comparison. BMC Bioinformatics 20-S(4): 139:1-139:16 (2019)
  2. Pierre Monnin, Adrien Coulet: Matching pharmacogenomic knowledge: particularities, results, and perspectives. OM@ISWC 2022: 79-83
  3. Pierre Monnin, Miguel Couceiro, Amedeo Napoli, Adrien Coulet: Knowledge-Based Matching of n-ary Tuples. ICCS 2020: 48-56

Files

pharmacogenomics-om-v1.0.0.zip (49.5 MB)
md5:709560077480d850377a906ba68af1d3
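After downloading the archive, its integrity can be checked against the MD5 checksum listed above. The sketch below uses only the Python standard library; the local path is an assumption (the archive saved under its original name in the current directory), and `md5_of_file` is a helper name introduced here for illustration.

```python
import hashlib
from pathlib import Path

# Checksum and file name as given in the file listing of this record.
EXPECTED_MD5 = "709560077480d850377a906ba68af1d3"
# Assumed download location: current working directory.
ARCHIVE = Path("pharmacogenomics-om-v1.0.0.zip")


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    if ARCHIVE.exists():
        matches = md5_of_file(ARCHIVE) == EXPECTED_MD5
        print("checksum OK" if matches else "checksum MISMATCH")
    else:
        print(f"{ARCHIVE} not found; download it first")
```

Reading in chunks keeps memory use small even though the archive is about 49.5 MB.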

Additional details

Related works

Cites
Journal article: 10.1186/s12859-019-2693-9 (DOI)
Conference paper: https://ceur-ws.org/Vol-3324/om2022_STpaper3.pdf (URL)
Conference paper: 10.1007/978-3-030-57855-8_4 (DOI)