Published October 23, 2023 | Version v1
Other Open

Wikidata for biocuration of cell types in the context of the Human Cell Atlas

  • 1. Interunit Graduate Program in Bioinformatics, University of São Paulo, Brazil
  • 2. College of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
  • 3. Interunit Graduate Program in Bioinformatics, University of São Paulo, São Paulo, Brazil
  • 4. Hospital Israelita Albert Einstein, São Paulo, Brazil

Description

The emergence of the Human Cell Atlas and the prominence of single-cell omics have propelled cell types to the forefront of modern biology. Bioinformaticians now rely heavily on databases that supply the information, especially marker genes, needed to annotate new datasets. Yet despite the pivotal role of cell types, the organization of information about them remains in its infancy. Unlike species and genes, cell types still lack a standardized nomenclature and clear criteria for their delimitation. Most datasets describe cell types in ambiguous natural language. The Cell Ontology has provided unique identifiers for cell types for over two decades, but contributing to it demands advanced skills in GitHub and ontology development, and it currently offers identifiers for fewer than 2800 cell types. In contrast, Wikidata, the open knowledge graph of the Wikimedia Foundation, encompasses over 100 million entities and is increasingly used to integrate biomedical knowledge. It can be browsed and edited through a user-friendly visual interface and well-documented APIs. Following successful integration of data from Gene Ontology, Cellosaurus, Complex Portal, and others, its web-based SPARQL Query Service has become a potent tool for biomedical discovery. This work describes a three-year endeavor to explore Wikidata as a platform for representing information about cell types. Currently, Wikidata hosts identifiers for over 600 cell types, in addition to more than 2600 cross-references to Cell Ontology, 8400 marker genes, 500 links to Wikipedia pages, and 150 links to openly licensed images, all queryable via SPARQL. Its accessible, crowd-sourced infrastructure facilitates rapid biocuration, enhancing coverage and providing a testing ground for the large-scale organization of cell type information.
Wikidata has matured as a platform for cell type information and stands ready to be integrated into workflows involving Cell Ontology and bioinformatics.
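
As an illustration of the SPARQL access mentioned above, the following is a minimal sketch of how cell types and their Cell Ontology cross-references might be retrieved from the Wikidata Query Service. It is not taken from the work itself. The identifiers `Q189118` (assumed to be the "cell type" item) and `P7963` (assumed to be the "Cell Ontology ID" property), the choice of `P31` (instance of) to select cell types, and the helper names `build_query`, `build_request_url`, and `run_query` are all assumptions made for this example and should be checked against Wikidata before use.

```python
import json
import urllib.parse
import urllib.request

# Public endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed identifiers (not stated in the abstract); verify on wikidata.org.
CELL_TYPE_CLASS = "Q189118"  # assumed item for "cell type"
CELL_ONTOLOGY_ID = "P7963"   # assumed property for "Cell Ontology ID"


def build_query(limit=10):
    """Build a SPARQL query listing cell types with a Cell Ontology ID."""
    return f"""
SELECT ?cell ?cellLabel ?clId WHERE {{
  ?cell wdt:P31 wd:{CELL_TYPE_CLASS} ;
        wdt:{CELL_ONTOLOGY_ID} ?clId .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {int(limit)}
""".strip()


def build_request_url(query):
    """Encode the query as a GET URL that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{WDQS_ENDPOINT}?{params}"


def run_query(query):
    """Send the query over the network and flatten each result row to a dict."""
    request = urllib.request.Request(
        build_request_url(query),
        # A descriptive User-Agent is good practice for Wikimedia services.
        headers={"User-Agent": "cell-type-sketch/0.1 (illustrative example)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


if __name__ == "__main__":
    # Requires network access; prints one line per matching cell type.
    for row in run_query(build_query(limit=5)):
        print(row.get("cellLabel"), row.get("clId"))
```

The query string can also be pasted directly into the web interface of the Query Service, which may be more convenient for exploratory biocuration than a script.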

Files

Lubiana, T..pdf (32.0 kB)
md5:265ef77952ce75f174d6fde609e3282f