Pharmacogenomics datasets for Ontology Matching

Monnin, Pierre (1); Coulet, Adrien (2)

doi:10.5281/zenodo.8195909

Published July 29, 2023 | Version v1.0.0

Dataset Open

1. Université Côte d'Azur, Inria, CNRS, I3S, France
2. Inria Paris, Centre de Recherche des Cordeliers, Inserm, Université Paris Cité, Sorbonne Université, Paris, France

Knowledge in pharmacogenomics (PGx for short) is expressed as n-ary tuples that represent so-called "pharmacogenomic relationships" and relate components of three distinct types: drugs, genetic factors, and phenotypes. Tuples are reified as instances of the class ``pgxo:PharmacogenomicRelationship``. The goal of the matching task is to match these tuples, which makes it an instance matching task.
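To make the idea of reification more concrete, here is a minimal Python sketch that models one relationship as a node typed ``pgxo:PharmacogenomicRelationship`` and linked to its components. Only the class name comes from the description above. The ``ex:`` identifiers, the ``ex:hasComponent`` predicate, and the example components are hypothetical placeholders chosen for illustration; they are not names taken from PGxO or PGxLOD.

```python
from dataclasses import dataclass, field

RDF_TYPE = "rdf:type"
PGX_CLASS = "pgxo:PharmacogenomicRelationship"  # class named in the description
HAS_COMPONENT = "ex:hasComponent"  # hypothetical predicate, not from PGxO


@dataclass
class PgxRelationship:
    """One reified n-ary tuple: a node linked to components of three types."""

    identifier: str
    drugs: set = field(default_factory=set)
    genetic_factors: set = field(default_factory=set)
    phenotypes: set = field(default_factory=set)

    def to_triples(self):
        """Return (subject, predicate, object) triples for this relationship."""
        triples = [(self.identifier, RDF_TYPE, PGX_CLASS)]
        components = self.drugs | self.genetic_factors | self.phenotypes
        for component in sorted(components):
            triples.append((self.identifier, HAS_COMPONENT, component))
        return triples


# Illustrative instance with made-up identifiers.
example = PgxRelationship(
    identifier="ex:rel1",
    drugs={"ex:warfarin"},
    genetic_factors={"ex:CYP2C9"},
    phenotypes={"ex:bleeding"},
)
triples = example.to_triples()
```

Because each tuple becomes a node of its own, matching two tuples amounts to aligning two such nodes, which is why the task is framed as instance matching.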

Motivation: Pharmacogenomic tuples involve drugs, genetic factors, and phenotypes, and state that patients who are treated with the specified drugs and carry the specified genetic factors may experience the given phenotypes. Knowledge in pharmacogenomics is scattered across several resources, e.g., reference databases (PharmGKB) or the biomedical literature. Hence, there is a need to build a consolidated view of the knowledge of this domain by aligning tuples from different sources. See [1] for a detailed motivation and [2] for a detailed task description.

Datasets

We provide different subsets of the alignments available in PGxLOD that have been created with the matching rules described in [3].

Task with 10 % of PGx relationships

Alignments: 1092
- relatedMatch alignments: 66
- sameAs alignments: 498
- closeMatch alignments: 53
- broadMatch alignments: 333
- narrowMatch alignments: 142
Entities to align in source: 2525
Triples in source: 816604
Entities to align in target: 2518
Triples in target: 816859

Task with 50 % of PGx relationships

Alignments: 23630
- relatedMatch alignments: 1245
- sameAs alignments: 9219
- closeMatch alignments: 1175
- broadMatch alignments: 7183
- narrowMatch alignments: 4808
Entities to align in source: 12816
Triples in source: 894723
Entities to align in target: 12401
Triples in target: 889735

Task with 100 % of PGx relationships

Alignments: 89926
- relatedMatch alignments: 4979
- sameAs alignments: 35499
- closeMatch alignments: 4603
- broadMatch alignments: 26135
- narrowMatch alignments: 18710
Entities to align in source: 25406
Triples in source: 982548
Entities to align in target: 25029
Triples in target: 980543
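
The per-type counts listed for the three tasks can be cross-checked against the reported totals. The short Python sketch below copies the figures from the lists above and confirms that, for each task, the five alignment types add up to the total number of alignments. The dictionary layout and function names are illustrative choices and are not part of the dataset.

```python
# Alignment counts copied from the three task descriptions.
TASKS = {
    "10%": {
        "total": 1092,
        "relatedMatch": 66,
        "sameAs": 498,
        "closeMatch": 53,
        "broadMatch": 333,
        "narrowMatch": 142,
    },
    "50%": {
        "total": 23630,
        "relatedMatch": 1245,
        "sameAs": 9219,
        "closeMatch": 1175,
        "broadMatch": 7183,
        "narrowMatch": 4808,
    },
    "100%": {
        "total": 89926,
        "relatedMatch": 4979,
        "sameAs": 35499,
        "closeMatch": 4603,
        "broadMatch": 26135,
        "narrowMatch": 18710,
    },
}


def per_type_total(task):
    """Sum the counts of all alignment types, excluding the reported total."""
    return sum(count for name, count in task.items() if name != "total")


def share(task, relation):
    """Fraction of a task's alignments that use the given relation."""
    return task[relation] / task["total"]


# True when every task's per-type counts add up to its reported total.
consistent = all(per_type_total(task) == task["total"] for task in TASKS.values())
```

The same structure can be used to compare the mix of relations across tasks, for example with `share(TASKS["100%"], "sameAs")`.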

References

[1] Pierre Monnin, Joël Legrand, Graziella Husson, Patrice Ringot, Andon Tchechmedjiev, Clément Jonquet, Amedeo Napoli, Adrien Coulet: PGxO and PGxLOD: a reconciliation of pharmacogenomic knowledge of various provenances, enabling further comparison. BMC Bioinformatics 20-S(4): 139:1-139:16 (2019)
[2] Pierre Monnin, Adrien Coulet: Matching pharmacogenomic knowledge: particularities, results, and perspectives. OM@ISWC 2022: 79-83
[3] Pierre Monnin, Miguel Couceiro, Amedeo Napoli, Adrien Coulet: Knowledge-Based Matching of n-ary Tuples. ICCS 2020: 48-56

Files

Files (49.5 MB)

Name	Size
pharmacogenomics-om-v1.0.0.zip (md5:709560077480d850377a906ba68af1d3)	49.5 MB
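
After downloading the archive, its integrity can be checked against the MD5 checksum listed above. The following is a minimal sketch using only the Python standard library. The function names are illustrative, and the local file path is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path

# Checksum published alongside pharmacogenomics-om-v1.0.0.zip.
EXPECTED_MD5 = "709560077480d850377a906ba68af1d3"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

For example, `archive_is_intact("pharmacogenomics-om-v1.0.0.zip")` should return `True` for an uncorrupted download stored in the current directory.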

Additional details

Cites: Journal article: 10.1186/s12859-019-2693-9 (DOI); Conference paper: https://ceur-ws.org/Vol-3324/om2022_STpaper3.pdf (URL); Conference paper: 10.1007/978-3-030-57855-8_4 (DOI)

