Presentation Open Access
Slides from csvconf,v6 presentation, "Static datasets aren't enough: where deployed systems differ from research" by Bernease Herman (WhyLabs).
The focus on static datasets in machine learning and AI training fails to translate to how these systems are being deployed in industry. As a result, data scientists and engineers aren't considering how these systems perform in changing, real world environments nor the feedback mechanisms and societal implications that these systems can cause. In the session, we will highlight existing tools that work with dynamic (and perhaps streaming) data. We will suggest some preliminary studies of activities and lessons that may bridge the gap in data science training for realistic data.The goal of the talk is to:
- Point to resources for AI practitioners to engage with dynamic datasets
- Engage in discussion about the impact of feedback loops and other consequences on the real world
- Brainstorm new approaches to teaching skills on dynamic datasets