Published May 31, 2024 | Version v1
Journal article | Open Access

Data Poisoning: What is it and how is it being addressed by the leading Gen AI providers?

Description

Data poisoning is a serious threat to machine learning models: malicious actors introduce corrupted inputs into the training data to skew model behavior, potentially leading to biased decision-making and reduced system reliability. Various types of data poisoning attacks exist, including targeted attacks, non-targeted attacks, label poisoning, training data poisoning, model inversion attacks, stealth attacks, and backdoor poisoning. Detecting and mitigating these attacks requires monitoring for model degradation patterns, securing training data, and employing advanced verification methods. Major AI companies such as OpenAI, Microsoft, Google, and Meta have developed protective mechanisms against data poisoning, providing valuable guidance for organizations leveraging AI technologies. Best practices for reducing data poisoning risks include data validation and sanitization, red teaming, secure data handling, negative testing, and benchmark testing. Collaboration among developers, MLOps communities, and security teams is crucial for building robust AI systems, and it demands diligent efforts in data integrity assurance, cross-functional communication, education, and continuous improvement of testing and validation layers. Emphasis on strong defense mechanisms and ongoing innovation will support the growth and safe application of AI across diverse industries. The purpose of this article is to examine these topics in depth and offer guidance for individuals and organizations working with data for machine learning.
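As a concrete illustration of the data validation and sanitization practice mentioned above, the following Python sketch flags potentially label-poisoned training samples by checking each sample's given label against an out-of-fold prediction derived from the rest of the data. This is a minimal, hypothetical example of one validation heuristic, not any provider's actual pipeline; the toy dataset, the k-NN validator, the 5% flip rate, and all thresholds are illustrative assumptions.

```python
# Minimal sketch: flag suspicious (possibly label-poisoned) training samples
# by checking whether each sample's label agrees with an out-of-fold
# prediction from the rest of the data. All choices here are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

# Build a toy binary-classification dataset and poison 5% of the labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
rng = np.random.default_rng(0)
poison_idx = rng.choice(len(y), size=50, replace=False)
y_poisoned = y.copy()
y_poisoned[poison_idx] = 1 - y_poisoned[poison_idx]  # flip binary labels

# Predict each sample's label from the other folds (never from itself),
# then flag samples whose given label disagrees with that prediction.
pred = cross_val_predict(
    KNeighborsClassifier(n_neighbors=15), X, y_poisoned, cv=5
)
flagged = np.where(pred != y_poisoned)[0]

recovered = np.intersect1d(flagged, poison_idx)
print(f"flagged {len(flagged)} samples; "
      f"{len(recovered)} of 50 poisoned samples among them")
```

In practice, flagged samples would be routed to human review or re-labeling rather than silently dropped, since disagreement with neighbors can also indicate hard-but-legitimate examples rather than poisoning.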

Files

EJAET-11-5-105-109.pdf (206.3 kB)
