Wikidata for biocuration of cell types in the context of the Human Cell Atlas
- 1. Interunit Graduate Program in Bioinformatics, University of São Paulo, Brazil
- 2. College of Pharmaceutical Sciences, University of São Paulo, São Paulo, Brazil
- 3. Interunit Graduate Program in Bioinformatics, University of São Paulo, São Paulo, Brazil
- 4. Hospital Israelita Albert Einstein, São Paulo, Brazil
Description
The emergence of the Human Cell Atlas and the rise of single-cell omics have moved cell types to the forefront of modern biology. Bioinformaticians now rely heavily on databases that provide the information, especially markers, needed to annotate new datasets. Despite this central role, the organization of information about cell types remains in its infancy. Unlike species and genes, cell types still lack a standardized nomenclature and clear boundaries for their assignment. Most datasets describe cell types in ambiguous natural language. The Cell Ontology has provided unique identifiers for cell types for over two decades, but contributing to it demands advanced skills in GitHub and ontology development, and it currently offers identifiers for fewer than 2800 cell types. In contrast, Wikidata, the open knowledge graph of the Wikimedia Foundation, holds over 100 million entities and is increasingly used to integrate biomedical knowledge. It can be browsed and edited through a user-friendly visual interface and well-documented APIs. Following successful integration of data from Gene Ontology, Cellosaurus, Complex Portal, and other resources, its web-based SPARQL Query Service has become a powerful tool for biomedical discovery. This work describes a three-year effort to explore Wikidata as a platform for representing information about cell types. Wikidata currently hosts identifiers for over 600 cell types, along with more than 2600 cross-references to Cell Ontology, 8400 marker genes, 500 links to Wikipedia pages, and 150 links to openly licensed images, all queryable via SPARQL. Its accessible, crowd-sourced infrastructure enables rapid biocuration, improving coverage and providing a testing ground for the large-scale organization of cell type information.
Wikidata has matured as a platform for cell type information and stands ready to be integrated into workflows involving Cell Ontology and bioinformatics.
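As an illustration of the SPARQL access mentioned above, the following is a minimal sketch (not the authors' own workflow) of how cell types with a Cell Ontology cross-reference might be listed from the Wikidata Query Service. The identifiers `Q189118` (cell type) and `P7963` (Cell Ontology ID) are assumptions that should be checked on wikidata.org before use; the function names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Public endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed identifiers; verify them on wikidata.org before relying on them.
CELL_TYPE_CLASS = "Q189118"  # assumed: the item for "cell type"
CELL_ONTOLOGY_ID = "P7963"   # assumed: the property "Cell Ontology ID"


def build_query(limit=10):
    """Build a SPARQL query for cell types that carry a Cell Ontology ID."""
    return (
        "SELECT ?cell ?cellLabel ?clId WHERE {\n"
        f"  ?cell wdt:P31 wd:{CELL_TYPE_CLASS} ;\n"
        f"        wdt:{CELL_ONTOLOGY_ID} ?clId .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_url(query):
    """Encode the query as a GET request URL asking for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{ENDPOINT}?{params}"


def run_query(query):
    """Send the query and flatten each result row to a plain dict of values."""
    request = urllib.request.Request(
        build_url(query),
        # Hypothetical User-Agent; replace with your own tool name and contact.
        headers={"User-Agent": "cell-type-example/0.1 (hypothetical contact)"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        data = json.load(response)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


if __name__ == "__main__":
    # Requires network access; prints label and Cell Ontology ID per row.
    for row in run_query(build_query(limit=5)):
        print(row.get("cellLabel"), row.get("clId"))
```

The query is built separately from the network call so that it can be inspected or pasted into the web-based query editor without running any code.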
Files
Name | Size |
---|---|
Lubiana, T..pdf (md5:265ef77952ce75f174d6fde609e3282f) | 32.0 kB |