Project deliverable Open Access
Hospital, Adam; Montras, Anna; Soiland-Reyes, Stian; Bonvin, Alexandre; Melquiond, Adrien; Gelpí, Josep Lluís; Lezzi, Daniele; Newhouse, Steven; Dianes, Jose A.; Abraham, Mark; Apostolov, Rossen; Ippoliti, Emiliano; Carter, Adam; White, Darren J.
This deliverable describes the state of the art and provides a technological gap analysis of the portable environments for the computing and data resources of BioExcel.
We review the commonly used technologies for computational infrastructures, a selection of workflow managers for computational biology and three important repositories for biomolecular data. We then provide a catalogue of tools that are supported by BioExcel partners, which will become the building blocks used in the pipelines and transversal workflow units of our pilot use cases.
We then describe the seven BioExcel pilot use cases. To help identify potential issues in developing the corresponding pipelines, each use case has been described and analyzed individually, focusing on the set of functionalities (from the tool catalogue and elsewhere) that form a complete workflow. Interoperability between building blocks and data models is explored using workflow diagrams. Finally, we summarize the technological gaps for each use case.
We analyzed the user feedback from WP3 to highlight key focus areas for BioExcel's future work. From the initial WP3 survey, together with previous HADDOCK and GROMACS surveys, we identified three main areas of potential user interest: interoperability, usability and remotely accessible tools. Regarding interoperability, we found that manual interaction needs to be reduced, for instance by incorporating workflow managers to integrate processes and input/output data. Regarding usability, we found that the main codes (GROMACS, HADDOCK and CPMD) could be made easier to use, for example through web portals that provide assistance on how to install and run them or use advanced configuration options. Finally, we found that a large number of users would be interested in using remote tools, although they raised several concerns, namely data privacy, reliability and lack of control.
Based on the analysis of the pilot use cases and the user survey, we present a summary of the identified technological gaps in section 5 “Global observations”.
The final section of the deliverable describes the technology roadmap for the immediate future, presenting how BioExcel will utilize cloud infrastructure, develop workflow building blocks and provide a tool deployment system integrated with EGI and ELIXIR services. The initial setup will consist of the deployment of software blocks that perform the most commonly demanded operations, as identified in the use case analysis. These blocks and workflows will be deployed, tested and verified in the already available Barcelona Supercomputing Center (BSC) cloud infrastructure, and eventually transferred to the BioExcel production portal hosted at the European Bioinformatics Institute (EMBL-EBI).