14 March 2017, CNRS webinar, Grenoble, France (original slides were shared here).
A decade ago my research nearly stalled. I was investigating how to crowdsource performance analysis and optimization of realistic workloads across diverse hardware provided by volunteers, and how to combine it with machine learning. Often, it was simply impossible to reproduce crowdsourced empirical results and build predictive models because software and hardware stacks were continuously changing. Worse still, the lack of realistic workloads and representative data sets in our community severely limited the usefulness of such models.
All these problems motivated me to create a public portal (cTuning.org) to share, validate and reuse workloads, data sets, tools, experimental results, and predictive models while involving the community in this effort. This experience, in turn, helped us initiate the so-called Artifact Evaluation (AE) at ACM conferences on parallel programming, architecture and code generation (ASPLOS, CGO, PPoPP, PACT, SC and MLSys). AE aims to independently validate the experimental results reported in publications and to encourage code and data sharing.
These slides are from my webinar “Enabling open and reproducible research at computer systems conferences: the good, the bad and the ugly” at CNRS Grenoble (14 March 2017). I share my practical experience of organizing Artifact Evaluation over the past years, along with the problems we encountered and possible solutions.
On the one hand, we have received incredible support from the research community, ACM, universities, and companies. We even received a record number of artifact submissions at the CGO/PPoPP'17 AE (27, up from 17 two years earlier), which was sponsored by NVIDIA and the cTuning foundation. We have also introduced Artifact Appendices and co-authored the new ACM Result and Artifact Review and Badging policy now used at Supercomputing.
On the other hand, the use of proprietary benchmarks, rare hardware platforms, and totally ad-hoc scripts to set up, run and process experiments places a huge burden on evaluators. It is simply too difficult and time-consuming to customize and rebuild experimental setups, reuse artifacts and eventually build upon others’ efforts, even though these are the main pillars of open science!
I then present Collective Knowledge (CK), my attempt to introduce a customizable workflow framework with a unified JSON API and a cross-platform package manager that can automate ML and systems R&D and enable live papers while automatically adapting to continuously evolving software and hardware. I also demonstrate a practical CK workflow to collaboratively optimize deep learning across different compilers, libraries, data sets and diverse platforms, from resource-constrained mobile devices to data centers (see our Android app to crowdsource DNN optimization across diverse mobile devices provided by volunteers, and the public repository with results).
Finally, I describe our novel publication model to reproduce results from published papers with the help of the community.
Please feel free to contact me at Grigori.Fursin@cTuning.org if you have any questions or comments! I am looking forward to your feedback!