Published June 28, 2023 | Version v1
Presentation Open

Collective Mind: toward a common language to facilitate reproducible research and technology transfer

  • 1. cTuning foundation
  • 2. MLCommons

Description

 

You can cite this project using the following arXiv paper: https://arxiv.org/abs/2406.16791.

 

This is a keynote presentation from the author of the Collective Mind framework at the 1st ACM conference on reproducibility and replicability (ACM REP'23).

[ Video (ACM YouTube channel) ] [ GitHub project ] [ Related reproducibility initiatives ]

The Collective Mind framework (CM & CM4MLOps) was developed by Grigori Fursin and donated to MLCommons to benefit everyone and to continue its development as a collaborative community initiative.

Abstract

Over the past 10 years, we have considerably improved the reproducibility of experimental results from published papers by introducing the artifact evaluation process with a unified artifact appendix and reproducibility checklists, Jupyter notebooks, containers, and Git repositories. However, our experience reproducing more than 150 papers shows that it can take weeks or even months of painful and repetitive interactions between teams to reproduce artifacts. This effort includes deciphering numerous README files, examining ad-hoc artifacts and containers, and figuring out how to reproduce computational results. Furthermore, snapshot containers make it hard to optimize algorithms' performance, accuracy, power consumption, and operational costs across the diverse and rapidly evolving software, hardware, and data used in the real world.

In this talk, I explain how our practical artifact evaluation experience and the feedback from researchers and evaluators motivated me to develop a simple, intuitive, technology-agnostic, and English-like scripting language called Collective Mind (CM), along with a collection of automation recipes for MLOps, DevOps, and MLPerf (the CM4MLOps repository with CM scripts). CM helps to automatically adapt any given experiment to any software, hardware, and data while automatically generating unified README files and synthesizing modular containers with a unified API.

I donated CM and CM4MLOps to MLCommons as a part of my Collective Knowledge project to continue developing them as a collaborative community initiative. My long-term goal is to help the community facilitate reproducible AI/ML systems research, minimize manual and repetitive benchmarking and optimization efforts, reduce the time and cost of reproducible research, simplify technology transfer to production, and learn how to co-design more efficient and cost-effective AI systems. I also present several recent use cases showing how CM helps MLCommons and the Student Cluster Competition run complex MLPerf benchmarks, and how it supports artifact evaluation at ACM/IEEE conferences to make it easier to reproduce results from research papers. I conclude with our development plans, new challenges, possible solutions, and upcoming reproducibility and optimization challenges powered by the Collective Knowledge Playground and CM: access.cKnowledge.org.

I would like to thank all CK and CM contributors for their help and support since 2014!

Please check this white paper for more details: https://arxiv.org/abs/2406.16791.

Files

presentation.pdf (6.1 MB)
md5:b4167b107b11d737ae75520861b5496f

Additional details

Related works

Is described by
Report: 10.48550/arXiv.2406.16791 (DOI)
Is new version of
Journal: 10.1098/rsta.2020.0211 (DOI)

Software

Repository URL
https://github.com/mlcommons/ck
Development Status
Concept