Project deliverable Open Access
de Villiers, Hendrik; van Zedde, Rick; Barth, Ruud; Pridmore, Tony
Plant phenotyping experiments are generating increasingly vast amounts of data, which quickly raises the question of how to organize it. Choosing a means of organizing such data is non-trivial, as many solutions are possible, and this has led to a lack of standardization in the field. The lack of standardization hampers the use and reuse of data, because users have to write or adapt their own code to account for the particular structure of each new dataset. Furthermore, dataset formats often do not support the detailed recording of metadata such as provenance information, potentially leading to a lack of transparency around the activities and structures within which the dataset was collected.
In this document, we consider the question of dataset organization within the context of plant phenotyping, with a special emphasis on computer vision data. We present a mapping/gapping analysis based on summaries of interviews with the imaging experts involved in setting up or controlling the imaging pipeline within phenotyping research infrastructure in Europe, in order to understand their way of working and their expectations of EMPHASIS-PREP.
Subsequently, we discuss PHIS (Phenotyping Hybrid Information System), a prominent community effort specialized in the organization of plant phenotyping datasets. The discussion covers core semantic web concepts as well as an overview of PHIS’ approach to organizing information. Annexes expand on the information presented in the main matter, allowing the interested reader to learn more and to experiment with PHIS, including by uploading a case study dataset containing hyperspectral image data. In addition, we consider how best to preserve PHIS’ ability to capture provenance metadata when connecting the system to subsequent machine learning workflows (with a special emphasis on deep learning).