FAIRtracks and Omnipy – FAIRtracks interoperability story
- 1. ELIXIR Norway
- 2. Centre for Bioinformatics, University of Oslo
Description
Presentation of the FAIRtracks project and the Omnipy Python library, given as part of the "ELIXIR Interoperability stories" presentation series for the ELIXIR Interoperability Platform at a meeting on March 7, 2023. FAIRtracks is a metadata model and related service infrastructure developed to help FAIRify genomic annotation datasets. Omnipy was developed in the context of the FAIRtracks project as a means to build scalable and maintainable metadata transformation flows. The presentation focuses particularly on Omnipy, as it is a general-purpose Python library relevant in a range of contexts that need structured functionality for (meta)data wrangling and transformation.
FAIRtracks started as a proof-of-concept implementation study funded by ELIXIR, the pan-European research infrastructure for biological information, and is now recognized as an ELIXIR Recommended Interoperability Resource. Starting from the researchers' point of view, we recognized major practical difficulties in locating genomic annotation/track data relevant to specific analytical contexts, despite major efforts by larger consortia and smaller research projects to make their data public through repositories, data portals, genome browsers and track hubs. We developed the FAIRtracks infrastructure as a proposed solution to these issues. At its core is a set of schemas proposed as a metadata exchange standard. Around this core, we built a set of services and tools, including a central search service, a validation service and a library for building and deploying scalable data flows that continuously transform metadata from various sources.
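Transforming metadata from a source into a common exchange schema amounts to applying a metadata crosswalk: a mapping from source fields to target fields, each with an optional value conversion. The following is a minimal Python sketch, assuming a simple one-to-one field mapping. The field names and the helpers `apply_crosswalk` and `transform_all` are hypothetical; they are not the actual FAIRtracks schema properties or part of Omnipy's API.

```python
from typing import Callable

# Hypothetical crosswalk: source field -> (target field, value converter).
# Field names are illustrative only, NOT actual FAIRtracks schema properties.
Crosswalk = dict[str, tuple[str, Callable[[str], str]]]

CROSSWALK: Crosswalk = {
    "assembly": ("genome_assembly", str.strip),
    "file_url": ("track_url", str.strip),
    "biosample": ("sample_term", lambda v: v.strip().lower()),
}


def apply_crosswalk(record: dict[str, str], crosswalk: Crosswalk) -> dict[str, str]:
    """Map one source record onto target field names.

    Source fields without a crosswalk entry are dropped; crosswalk entries
    whose source field is absent from the record are skipped.
    """
    out: dict[str, str] = {}
    for src_field, (target_field, convert) in crosswalk.items():
        if src_field in record:
            out[target_field] = convert(record[src_field])
    return out


def transform_all(records: list[dict[str, str]], crosswalk: Crosswalk) -> list[dict[str, str]]:
    """Apply the same crosswalk to every record from one source."""
    return [apply_crosswalk(r, crosswalk) for r in records]
```

Keeping the mapping as data rather than code makes each crosswalk easy to inspect and swap per source; a real flow would additionally need to handle one-to-many mappings, ontology terms and nested structures.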
Omnipy is a Python library for type-driven data wrangling and scalable data flow orchestration. It simplifies the creation and deployment of (meta)data transformation processes, with an emphasis on FAIRification, executable metadata crosswalks and data brokering. Omnipy aids generic data tasks such as extraction from multiple sources, (meta)data mapping and systematic data model transformations. Embracing a "parse, don't validate" approach, Omnipy uses Python type hints and pydantic to ensure data consistency and adherence to specific models. Being highly modular, it allows integration of (meta)data transformation steps that depend on software such as pandas and R. The architecture supports pluggable execution engines such as Prefect, a highly interoperable, industry-developed open-source data flow orchestration engine. Envisioned as a community effort, Omnipy aims to offer various import formats and data models (both hierarchical and tabular representations), as well as ontology mapping and semi-automatic data cleaning.
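The "parse, don't validate" idea is that untyped input is converted once, at the boundary, into a typed model; downstream code then receives that model and never needs to re-check the data. The sketch below illustrates the principle using only the standard library (a frozen `dataclass`) so that it is self-contained. The `TrackRecord` model and `parse_track` function are hypothetical and do not show Omnipy's actual API; Omnipy itself builds on pydantic, which performs this kind of coercion declaratively from type hints.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackRecord:
    """Hypothetical target model; fields are illustrative, not Omnipy's or FAIRtracks' API."""
    name: str
    assembly: str
    size_bytes: int


def parse_track(raw: dict[str, object]) -> TrackRecord:
    """Parse untyped input into a TrackRecord, coercing types or raising ValueError.

    Code that receives a TrackRecord can rely on its types and invariants
    without validating the input again.
    """
    try:
        name = str(raw["name"]).strip()
        assembly = str(raw["assembly"]).strip()
        size_bytes = int(raw["size_bytes"])  # e.g. the string "1024" becomes 1024
    except KeyError as exc:
        raise ValueError(f"missing field: {exc.args[0]}") from exc
    except (TypeError, ValueError) as exc:
        raise ValueError(f"invalid field value: {exc}") from exc
    # Enforce invariants once, here, instead of in every consumer.
    if not name:
        raise ValueError("name must be non-empty")
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return TrackRecord(name=name, assembly=assembly, size_bytes=size_bytes)
```

The return type carries the guarantee: a function that accepts `TrackRecord` instead of a raw `dict` cannot be handed unparsed data by mistake.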
FAIRtracks was developed as a collaboration between ELIXIR Norway, ELIXIR Spain and EMBL-EBI, while Omnipy is (for now) mainly funded by ELIXIR Norway and the University of Oslo.
Files
Name | Size | Checksum |
---|---|---|
2023.03 FAIRtracks + Omnipy.pdf | 10.8 MB | md5:f0b4dfa484e0c7ca9820cfcda942c798 |
Additional details
Related works
- Describes: Publication, 10.12688/f1000research.28449.1 (DOI)
Dates
- Other: 2023-03-07 (Presented)