Presentation Open Access
The international wheat community has embraced the omics era and is rapidly producing larger and more heterogeneous datasets to support the breeding of better varieties. Breeding programmes, especially in the pre-breeding space, have encouraged wheat communities to make these datasets more openly available. This is a positive step, but collecting and disseminating data consistently, in a standardised way and with rich metadata, remains difficult because so much of this information is stored in papers and supplementary information. Furthermore, whilst ontologies exist for describing data, e.g. the Environmental Factor Ontology and the Crop Ontology, the use of ontology terms to annotate key development characteristics across disparate data generation methods and growing sites is rarely routine or harmonised. We therefore built Grassroots, an infrastructure that includes portals for delivering large-scale datasets with semantically marked-up metadata to power FAIR data in crop research. As part of the UK Designing Future Wheat (DFW) programme, we generate a variety of data, ranging from field trial experimental information, sequencing data and phenotyping images through to molecular biology data about host and pathogen interactions, nitrogen use efficiency, and other key treatment factors. There is consequently an increasing need to manage this data and its metadata so that it can be disseminated consistently and easily, and integrated with other datasets and within analytical tools and workflows. We chose Frictionless Data as the framework because of its ease of use and open standards, and developed open-source tools for FAIR data sharing that automatically generate Frictionless Data Packages for these datasets on both our Apache/iRODS and CKAN portals.
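To illustrate what a Frictionless Data Package looks like, the following is a minimal sketch in Python that builds a descriptor by hand using only the standard library. It is not the Grassroots tooling: the package name, file path, column names and ontology URL are all hypothetical placeholders. The descriptor keys (`name`, `title`, `resources`, `path`, `schema`, `fields`, `type`, `rdfType`) follow the public Data Package and Table Schema specifications.

```python
import json


def tabular_resource(name, path, fields):
    """Describe one CSV file and the columns it contains."""
    return {
        "name": name,
        "path": path,
        "format": "csv",
        "schema": {"fields": fields},
    }


def build_data_package(name, title, resources):
    """Return a minimal Data Package descriptor as a plain dict."""
    return {
        "name": name,
        "title": title,
        "resources": resources,
    }


# Hypothetical columns for a field trial table. The rdfType value is a
# placeholder URL, not a real ontology term.
fields = [
    {"name": "plot_id", "type": "string"},
    {
        "name": "plant_height_cm",
        "type": "number",
        "rdfType": "https://example.org/ontology/plant_height",
    },
]

package = build_data_package(
    name="example-field-trial",
    title="Example field trial (hypothetical)",
    resources=[tabular_resource("plots", "plots.csv", fields)],
)

# By convention the descriptor is saved as datapackage.json next to the data.
descriptor_json = json.dumps(package, indent=2)
print(descriptor_json)
```

Attaching an `rdfType` to a column is one way a descriptor could carry the kind of ontology annotation discussed above, so that a measurement keeps its semantic meaning when the dataset is moved between portals or tools.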