Heliophysics ScienceCore curriculum development with emphasis on knowledge representation techniques to increase usability of NASA cloud-based datasets
Contributors
Project member:
Description
This document is a redacted proposal from Polyneme LLC to the NASA SMD call for proposals F.14 Transform to Open Science Training (NNH22ZDA001N-TOPST). This was one of 16 successfully funded proposals under the program.
The central objectives of this proposal are to:
1. Develop showcase notebooks (i.e., Jupyter notebooks) around cloud-based NASA heliophysics and space weather datasets that engender a sense of excitement about open science (methods and outcomes) within the general science community by demonstrating powerful, interactive infrastructure for querying, visualization, investigating data governance and lineage, data lake navigation, and dataset discoverability.
2. Bootstrap a knowledge graph (KG) via coordinated human-expert modeling of taxonomies/ontologies and curation of high-quality metadata that describe said datasets.
3. Advance appreciation – among domain specialists and nonspecialists – of semantic knowledge representation approaches that lower data preparation efforts, enable deeper analyses based on enriching data context, and facilitate gathering a critical mass of domain awareness by interpreting and interlinking data from different sources.
Critical methods and techniques proposed to accomplish the stated objectives are:
1. Literate programming, as realized by the Jupyter project and, in particular, the Jupyter notebook format and the IPython kernel for interactive computing through a web-browser-based interface, as well as mature infrastructure for deploying environments for user sessions such as BinderHub for JupyterHub containers.
2. Resource-oriented (e.g., RESTful) HTTP APIs to query and subset cloud-based datasets for interactive exploration via, e.g., Jupyter notebooks.
3. W3C Semantic Web standards for machine-actionable (FAIR) knowledge representation based on Web technologies. In particular, the Resource Description Framework (RDF) suite of standards for metadata querying (SPARQL), serialization (e.g., Turtle), exchange (e.g., HTTP media types), modeling and inferencing (e.g., RDF entailment), and validation (SHACL), as well as tooling (e.g., the Python RDFLib project) and infrastructure (e.g., open-source RDF graph databases, process-embedded or otherwise).
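To make the triple model behind the third method more concrete, the following is a minimal sketch in plain Python of RDF-style (subject, predicate, object) statements and of triple-pattern matching, the idea that underlies SPARQL basic graph patterns. It deliberately avoids RDFLib and any graph database so that it runs with only the standard library; it is not the tooling the proposal describes. Every identifier here is an assumption made for illustration: the `ex:` prefix, the dataset names, the `ex:measures` and `ex:derivedFrom` predicates, and the `match` and `lineage` helpers are hypothetical and do not come from the proposal.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical dataset metadata; all "ex:" identifiers are invented for this sketch.
TRIPLES: List[Triple] = [
    ("ex:dataset/solar-wind-l2", "rdf:type", "ex:Dataset"),
    ("ex:dataset/solar-wind-l2", "ex:measures", "ex:SolarWindSpeed"),
    ("ex:dataset/solar-wind-l2", "ex:derivedFrom", "ex:dataset/solar-wind-l1"),
    ("ex:dataset/solar-wind-l1", "rdf:type", "ex:Dataset"),
    ("ex:dataset/solar-wind-l1", "ex:derivedFrom", "ex:dataset/solar-wind-l0"),
    ("ex:dataset/solar-wind-l0", "rdf:type", "ex:Dataset"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples that fit the pattern; None acts as a wildcard,
    playing the role of a variable in a SPARQL triple pattern."""
    for subj, pred, obj in triples:
        if (s is None or subj == s) and (p is None or pred == p) and (o is None or obj == o):
            yield (subj, pred, obj)


def lineage(triples: Iterable[Triple], dataset: str) -> List[str]:
    """Follow ex:derivedFrom links from a dataset back toward its origin.

    Simplification: assumes at most one parent per dataset and stops on cycles.
    """
    triples = list(triples)
    chain: List[str] = []
    seen = {dataset}
    current = dataset
    while True:
        parents = sorted(obj for _, _, obj in match(triples, s=current, p="ex:derivedFrom"))
        if not parents or parents[0] in seen:
            return chain
        current = parents[0]
        seen.add(current)
        chain.append(current)


if __name__ == "__main__":
    # "Which resources are datasets?" (roughly: SELECT ?s WHERE { ?s rdf:type ex:Dataset })
    for subj, _, _ in match(TRIPLES, p="rdf:type", o="ex:Dataset"):
        print(subj)
    # "Where did the level-2 product come from?"
    print(lineage(TRIPLES, "ex:dataset/solar-wind-l2"))
```

Because the statements are explicit and individually addressable, a single incorrect link can be inspected and corrected in place. A standards-based stack would replace the list with an RDF graph, the `match` helper with SPARQL queries, and the ad hoc assumptions in `lineage` with constraints that could be checked by SHACL.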
The perceived significance of the proposed work to the objectives of the solicitation and to NASA interests and programs in general:
1. The bootstrapped knowledge graph will empower computer systems that in many aspects surpass the analytical capabilities that a typical human specialist has in their area of study. And unlike neural network models, the KG is human readable and explainable – one can use it as a reference data structure, correct it, govern it, publish it, etc. This is expected to allow more automation in scientific data management and to lower preparation efforts needed for AI projects across NASA that pertain to heliophysics and space weather.
2. By demonstrating the application of machine-actionable knowledge representation via showcase notebooks, the utility of semantic formalisms will be more accessible and concrete to both heliophysics / space weather specialists and to a broader community of nonspecialists who may interface interoperably with this field’s open datasets.
Files
| Name | Size | Checksum |
|---|---|---|
| topst-helio-sciencecore-submitted-sow.pdf | 148.8 kB | md5:d4cef2e5a3d778d64b68041b8e49f499 |