Construction of a Knowledge Graph for RNA drug analysis
Description
In the past 5 years, a great deal of research has been devoted to the construction of biomedical Knowledge Graphs (KGs) that capture complex, hidden biological and biochemical mechanisms by integrating different data sources. KGs are rapidly being adopted by the pharmaceutical industry to accelerate data-driven drug discovery, since they enable the use of powerful algorithms for applications ranging from prioritizing novel disease targets to predicting previously unknown drug-disease associations. At the same time, the COVID-19 pandemic highlighted the importance of ribonucleic acid (RNA)-based technologies for the development of new vaccines and, more generally, of novel RNA-based therapies across the full spectrum of major human diseases. A KG describing RNA molecules and their interactions with other biomedical entities would therefore be a crucial resource for RNA-drug development. However, although many open data sources report interactions between RNA molecules and other biomedical entities (e.g., drugs, diseases, and genes), a comprehensive and well-described KG containing this information is still lacking.
This thesis project is rooted in the context of the “National Center for Gene Therapy and Drugs based on RNA Technology” funded by the Italian PNRR, and aims to create a novel biomedical KG (named RNA-KG) for representing and inferring experimentally validated biological interactions among (coding and non-coding) RNA molecules. The construction of RNA-KG faces many interoperability issues in acquiring large-scale biological data from biomolecular databanks and bio-ontologies. Starting from the analysis of public data sources containing different kinds of non-coding RNA sequences (and their relationships with other molecules), we have developed a meta-graph that guides the RNA-KG construction by exploiting and integrating biomedical ontologies and relevant data from public databases. Moreover, we propose an initial version of RNA-KG containing around 600K nodes and 6M edges, and we describe its topological characteristics.
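As a rough illustration of how a meta-graph can guide KG construction, the minimal Python sketch below treats the meta-graph as a set of permitted (source type, relation, target type) patterns and keeps only candidate edges that match one of them. It then computes node degrees as a simple topological summary. All node types, relation names, identifiers, and function names here are hypothetical placeholders chosen for the example; they are not taken from the RNA-KG schema or its data sources.

```python
from collections import Counter

# Hypothetical meta-graph: permitted (source type, relation, target type) patterns.
# These types and relation names are illustrative assumptions, not the RNA-KG schema.
META_GRAPH = {
    ("miRNA", "regulates", "gene"),
    ("lncRNA", "interacts_with", "miRNA"),
    ("miRNA", "associated_with", "disease"),
}

# Toy nodes (identifier -> type) and candidate edges; all values are made up.
NODE_TYPES = {
    "mirna_1": "miRNA",
    "lncrna_1": "lncRNA",
    "gene_1": "gene",
    "disease_1": "disease",
}

CANDIDATE_EDGES = [
    ("mirna_1", "regulates", "gene_1"),
    ("lncrna_1", "interacts_with", "mirna_1"),
    ("mirna_1", "associated_with", "disease_1"),
    ("gene_1", "regulates", "mirna_1"),  # not permitted by the toy meta-graph
]


def conforms(edge, node_types, meta_graph):
    """Return True if the edge's type pattern is permitted by the meta-graph."""
    source, relation, target = edge
    pattern = (node_types.get(source), relation, node_types.get(target))
    return pattern in meta_graph


def build_graph(edges, node_types, meta_graph):
    """Keep only the candidate edges that conform to the meta-graph."""
    return [edge for edge in edges if conforms(edge, node_types, meta_graph)]


def degree_counts(edges):
    """Count how many edges touch each node (undirected degree)."""
    degrees = Counter()
    for source, _, target in edges:
        degrees[source] += 1
        degrees[target] += 1
    return degrees


graph_edges = build_graph(CANDIDATE_EDGES, NODE_TYPES, META_GRAPH)
degrees = degree_counts(graph_edges)
print(len(graph_edges), "edges kept; highest-degree node:", degrees.most_common(1))
```

Expressing the meta-graph as plain data keeps the schema separate from the ingestion logic, so in this sketch a new relation type can be admitted by adding one pattern rather than changing code.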
Files
Thesis_Emanuele_CAVALLERI.pdf (26.5 MB)
md5:7d6e87fec40f36529799cbd58c4dd84e
Additional details
Software
- Repository URL: https://github.com/AnacletoLAB/RNA-KG/