Project deliverable Open Access
Candela, Leonardo; Cirillo, Roberto; Coro, Gianpaolo; Lelii, Lucio; Pagano, Pasquale; Panichi, Giancarlo; Scarponi, Paolo; Sinibaldi, Fabio
Deliverable D3.1 “Open Science Data Analytics Technologies” is a deliverable of type Demonstrator, meaning that it takes the form of artefacts (software releases) rather than reports. In particular, the deliverable concerns the software realising the Data Analytics & Processing Layer of AGINFRA+. This software is part of a large software system named gCube (www.gcube-system.org). The gCube system offers a large array of services supporting the entire lifecycle of a research activity (data management and collation, analytics, collaboration, sharing) and the possibility to combine these services in Virtual Research Environments. In the context of AGINFRA PLUS, two gCube components have been primarily exploited, consolidated and enhanced to serve the analytics needs arising from the project use cases. The first is DataMiner, a service enabling its users to perform data analytics tasks by relying on an array of analytics methods and a distributed and heterogeneous computing infrastructure; it is available through a web-based GUI as well as via a web-based API based on the OGC WPS standard. The second is SAI (Statistical Algorithm Importer), a service enabling its users to make their own analytics methods available via the DataMiner service. In addition, the entire analytics solution made available for the AGINFRA PLUS cases relies on (i) a shared workspace realising a cloud-based file manager for managing content of interest and sharing it with co-workers, (ii) a social networking area enabling users to post messages and hold discussions, and (iii) a flexible catalogue for publishing and discovering items of interest, including “research objects” resulting from an analytics task. This technology is deployed in its latest version in every Virtual Research Environment supporting AGINFRA PLUS cases.
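As an illustration of what an OGC WPS-based API implies for a client, the following minimal Python sketch builds WPS 1.0.0 key-value-pair request URLs for the three standard operations (GetCapabilities, DescribeProcess, Execute). The endpoint, the algorithm identifier and the input names are hypothetical placeholders, not values taken from the deliverable, and any authorization a real deployment may require is omitted.

```python
from urllib.parse import urlencode, quote

# Hypothetical endpoint: replace with the WPS URL of an actual service instance.
WPS_ENDPOINT = "https://dataminer.example.org/wps/WebProcessingService"


def wps_url(request, endpoint=WPS_ENDPOINT, **extra):
    """Build a WPS 1.0.0 key-value-pair (KVP) request URL."""
    params = {"service": "WPS", "version": "1.0.0", "request": request}
    params.update(extra)
    # Percent-encode every value so the URL is safe to send as-is.
    return endpoint + "?" + urlencode(params, quote_via=quote)


def execute_url(identifier, inputs, endpoint=WPS_ENDPOINT):
    """Build an Execute URL; inputs become 'name=value' pairs joined by ';'."""
    data_inputs = ";".join(f"{name}={value}" for name, value in inputs.items())
    return wps_url("Execute", endpoint, Identifier=identifier, DataInputs=data_inputs)


if __name__ == "__main__":
    # Identifier and input names below are illustrative assumptions only.
    print(wps_url("GetCapabilities"))
    print(wps_url("DescribeProcess", Identifier="org.example.MyAlgorithm"))
    print(execute_url("org.example.MyAlgorithm", {"InputTable": "data.csv", "K": 3}))
```

A client would typically call GetCapabilities to list the available methods, DescribeProcess to learn the inputs of one method, and Execute to run it; the sketch only constructs the URLs and performs no network access.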
The major enhancements to the technology pertaining to AGINFRA PLUS have been included in three gCube major releases: 4.7 (October 2017), 4.8 (November 2017), and 4.9 (under production). In particular, with these releases a new “black-box” oriented approach (https://wiki.gcubesystem.org/gcube/Statistical_Algorithms_Importer:_Java_Project#Black_Box_Integration) has been conceived and implemented to enable owners and developers of analytics methods to easily integrate their solutions into the DataMiner service. Among the supported black-box types is the one for KNIME workflows, i.e. analytics methods implemented as a KNIME workflow. KNIME is among the key technologies supporting the Food Safety Risk Assessment cases. To enable the execution of KNIME-based black boxes, the distributed computing part of the data analytics platform has been extended to integrate the KNIME execution engine. Other cases rely on the same mechanism to integrate entire applications (WOFOST) as well as Python-based methods.
The deliverable has been created on the public space of the AGINFRA+ Wiki and is accessible through the following link:
D3.1 Open Science Data Analytics Technologies I V2.pdf