D2KAB project taking off: Data to Knowledge in Agronomy and Biodiversity
Agronomy/agriculture and biodiversity (ag & biodiv) communities face several major societal, economic, and environmental challenges that data science approaches will help address. To achieve their goals, researchers in these communities must be able to rapidly discover, aggregate, integrate, and analyse different types of data and information sources. Semantic technologies, combined with open, FAIR data and services, are one of the answers to the need for fully knowledge-driven and transparent science and innovation. The D2KAB project (www.d2kab.org) aims to create a framework to turn agronomy and biodiversity data into knowledge – semantically described, interoperable, actionable, open – and to investigate the scientific methods and tools to exploit this knowledge for applications in agriculture and biodiversity sciences. This project, funded by the French ANR (2019-2023), will provide the means – ontologies and linked open data – for ag & biodiv to embrace semantic Web technologies in order to produce and exploit FAIR data and services. To do so, D2KAB will develop original methods and algorithms in the following areas: data integration, text mining, semantic annotation, ontology alignment, and linked data exploitation and visualization.
The D2KAB project brings together a unique multidisciplinary consortium of 12 partners to achieve this objective: 2 informatics research units (LIRMM, I3S); 6 INRA/IRSTEA/IRD research units at the interface of computer science and ag & biodiv (URGI, MaIAGE, IATE, DIST, TSCF, DIADE), specialized in agronomy or agriculture; 2 labs in biodiversity and ecosystem research (CEFE, URFM); 1 association of agriculture stakeholders (ACTA); and 1 partnership with the Stanford BMIR department.
Three main goals drive D2KAB’s roadmap:
- To develop state-of-the-art methods and technologies for ontology lifecycle and alignment.
- To build the agronomy, agriculture and biodiversity Linked Open Data cloud.
- To enable new semantically driven agronomy and biodiversity science.
The work starts from the recommendations, already published or in progress, of several RDA WGs and IGs (e.g. Agrisemantics WG, Vocabulary Services IG, Wheat and Rice Data Interoperability WGs, Agricultural Data IG, SHARC IG). Some of the key technological building blocks of D2KAB are: AgroPortal, a reference repository for ontologies and vocabularies in agronomy; AgroLD, a semantic Web knowledge base that integrates agronomic data from public databases including GO associations, Gramene, UniProtKB, and OryGenesDB; Corese, a semantic Web factory that implements the W3C standards RDF, RDFS, OWL-RL and SPARQL, as well as LDScript (a Linked Data Script Language) and STTL (the SPARQL Template Transformation Language for RDF); and Alvis, a text mining tool for the semantic normalisation of free text with ontologies. D2KAB will allow the valorization of ag & biodiv data in real-world applications leading to economic impact, smart agriculture and ecological preservation. Five driving scenarios are planned:
- development of an ontology-based expert system to select food packaging solutions;
- creation of an augmented semantic reader for Plant Health Bulletins;
- advanced integration of textual and experimental data on wheat phenotypes;
- development of new ontologies on plant root traits and extension of the Thesaurus Of Plant Characteristics;
- integration of plant functional biogeography data related to the Mediterranean Basin.
Each of the project scenarios will have a significant impact and produce concrete outcomes for ag & biodiv scientific communities and socio-economic stakeholders in agriculture.