Published October 25, 2024 | Version v1
Poster | Open Access

A Proposed Approach for Certifying Machine Learning Pipelines Containing Multiple Trust Objectives

Description

We propose to develop methods for certifying Machine Learning (ML) models by optimizing multiple trust and decision objectives. By combining measures of trustworthiness with domain-specific decision objectives, we aim to establish the criteria necessary to meet exacting standards for high-consequence applications of ML in support of national security priorities. In accordance with Executive Order 14110, our objective is to promote the safe, secure, and trustworthy development and use of Artificial Intelligence (AI) by delivering a generalizable process that can readily inform ML certification.

Current credibility assessments for ML are developer-focused, application-specific, and limited to uncertainty quantification (UQ) and validation, which are necessary but insufficient for certification. Whereas ML developers are primarily concerned with various measures of model accuracy, non-ML-expert stakeholders are concerned with real-world quantities such as risk or safety; this suggests that a more holistic technical basis is needed. With multiple objectives, decisions may only be Pareto-optimal: no objective can be improved without making another worse. Designing certification processes with multi-objective design optimization allows the requirements of a specific application to be balanced against one another.
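
As an illustration of the Pareto-optimality concept above, the following minimal Python sketch identifies non-dominated model candidates across several trust objectives; the candidate names, objective names, and scores are hypothetical placeholders rather than results from this work.

    # Sketch: identify Pareto-optimal candidates across trust objectives
    # (all names and scores are hypothetical; higher is assumed better).
    candidates = {
        "model_a": {"accuracy": 0.94, "robustness": 0.61, "fairness": 0.72},
        "model_b": {"accuracy": 0.91, "robustness": 0.78, "fairness": 0.70},
        "model_c": {"accuracy": 0.89, "robustness": 0.75, "fairness": 0.69},
    }

    def dominates(a, b):
        """True if a is at least as good as b on every objective and strictly better on one."""
        return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)

    pareto_front = [
        name for name, scores in candidates.items()
        if not any(dominates(other, scores)
                   for other_name, other in candidates.items() if other_name != name)
    ]
    print(pareto_front)  # ['model_a', 'model_b']; model_c is dominated by model_b

Under such a front, a certification process would weigh the remaining candidates against application-specific requirements rather than a single accuracy score.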

The absence of evidence-based, generalizable approaches to certification restricts our ability to develop and deploy credible, mature ML models. To address this gap, we will operationalize AI risk frameworks, such as the National Institute of Standards and Technology’s AI Risk Management Framework, by synthesizing key Trustworthy AI capabilities such as robustness testing and anomaly detection. To validate and refine our approach, we will conduct collaborative case studies that apply our tools to real-world datasets and scenarios, gathering feedback to ensure their effectiveness and usability.
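
As one example of the kind of evidence such capabilities could feed into a risk framework, the sketch below measures accuracy degradation under input perturbation; it assumes a scikit-learn-style classifier exposing a predict method, and the noise scale is a placeholder, not a requirement from this work.

    import numpy as np

    def robustness_under_noise(model, X, y, sigma=0.05, seed=0):
        """Accuracy drop under Gaussian input perturbation, reportable as risk-framework evidence.

        Assumes `model` exposes a scikit-learn-style predict(); sigma is a placeholder scale.
        """
        rng = np.random.default_rng(seed)
        baseline = float(np.mean(model.predict(X) == y))
        perturbed = float(np.mean(model.predict(X + rng.normal(0.0, sigma, X.shape)) == y))
        return {"baseline_accuracy": baseline,
                "perturbed_accuracy": perturbed,
                "accuracy_drop": baseline - perturbed}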

Our primary research goal is to develop a process for designing certifications of ML trustworthiness, particularly for high-consequence applications. Drawing inspiration from the multi-tiered approach of software testing, which spans unit tests through integration tests, our strategy is to assess trustworthiness throughout the ML development lifecycle and to conduct system-level evaluations of non-functional properties such as safety, fairness, and privacy.
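
As a concrete sketch of the system-level tier, the pytest-style check below evaluates one non-functional property using a group accuracy gap as a simple fairness measure; the data, group labels, and the 0.05 gap ceiling are illustrative placeholders, not requirements from this work.

    import numpy as np

    def max_group_accuracy_gap(y_true, y_pred, groups):
        """Largest pairwise accuracy gap across groups (a simple fairness measure)."""
        accs = [np.mean(y_pred[groups == g] == y_true[groups == g]) for g in np.unique(groups)]
        return float(max(accs) - min(accs))

    def test_system_fairness_gap():
        # System tier: end-to-end predictions must not favor one group over another
        # beyond a placeholder ceiling of 0.05.
        y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
        y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 1])
        groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
        assert max_group_accuracy_gap(y_true, y_pred, groups) <= 0.05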

If successful, our research will position ML to fulfill the requirements of high-consequence domains, as evidenced by a measurable improvement in the reliability properties of our exemplar models.

Files

TrustwrothyML-US-RSE-Poster.pdf (1.9 MB)
md5:71ceffdbe1c9ddb56ff6d74dc62103ca

Additional details

Funding

Sandia National Laboratories

References

  • "Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence," 88 FR 75191 (proposed October 30th, 2023). E.O. 14110, 2023-24283. https://www.federalregister.gov/documents/2023/11/01/2023-24283/safe-secure-and- trustworthy- development-and-use-of-artificial-intelligence.
  • Giray, Görkem. "A software engineering perspective on engineering machine learning systems: State of the art and challenges." Journal of Systems and Software 180 (2021): 111031.
  • Amershi, Saleema, et al. "Software engineering for machine learning: A case study." 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 2019.
  • Lwakatare, Lucy Ellen, et al. "A taxonomy of software engineering challenges for machine learning systems: An empirical investigation." Agile Processes in Software Engineering and Extreme Programming: 20th International Conference, XP 2019, Montréal, QC, Canada, May 21–25, 2019, Proceedings 20. Springer International Publishing, 2019.
  • Carleton, Anita D., Erin Harper, John E. Robert, Mark H. Klein, Dionisio De Niz, Edward Desautels, John B. Goodenough et al. Architecting the Future of Software Engineering: A National Agenda for Software Engineering Research and Development. Carnegie Mellon University, Pittsburgh, PA, 2021.
  • Gezici, Bahar, and Ayça Kolukısa Tarhan. "Systematic literature review on software quality for AI-based software." Empirical Software Engineering 27.3 (2022): 66.
  • Lenarduzzi, Valentina, et al. "Software Quality for AI: Where We Are Now?." Software Quality: Future Perspectives on Software Engineering Quality: 13th International Conference, SWQD 2021, Vienna, Austria, January 19–21, 2021, Proceedings 13. Springer International Publishing, 2021.
  • Lanubile, Filippo, et al. "Towards Productizing AI/ML Models: An Industry Perspective from Data Scientists." 2021 IEEE/ACM 1st Workshop on AI Engineering-Software Engineering for AI (WAIN). IEEE, 2021.
  • Wolf, Christine T., and Drew Paine. "Sensemaking practices in the everyday work of AI/ML software engineering." Proceedings of the IEEE/ACM 42nd International Conference on Software Engineering Workshops. 2020.
  • Ferrario, Andrea, Michele Loi, and Eleonora Viganò. "In AI we trust incrementally: A multi-layer model of trust to analyze human-artificial intelligence interactions." Philosophy & Technology 33 (2020): 523-539.
  • Siau, Keng, and Weiyu Wang. "Building trust in artificial intelligence, machine learning, and robotics." Cutter Business Technology Journal 31.2 (2018): 47-53.
  • Lee, John D., and Katrina A. See. "Trust in automation: Designing for appropriate reliance." Human Factors 46.1 (2004): 50-80.
  • Schaefer, Kristin E., et al. "A meta-analysis of factors influencing the development of trust in automation: Implications for understanding autonomy in future systems." Human Factors 58.3 (2016): 377-400.
  • "Common Criteria for Information Technology Security Evaluation". ISO/IEC 15408-1:2022. https://www.iso.org /standard/72891.html
  • Anisetti, Marco, Claudio A. Ardagna, and Nicola Bena. "Multi-Dimensional Certification of Modern Distributed Systems." IEEE Transactions on Services Computing (2022).
  • NIST. "Artificial Intelligence Risk Management Framework (AI RMF 1.0)." (2023).
  • DOE AITO. "DOE AI Risk Management Playbook." https://www.energy.gov/ai/doe-ai-risk-management-playbook-airmp.