Proposition of a Novel Type of Attacks Targeting Explainable AI Algorithms in Cybersecurity [preprint]
Description
Artificial intelligence systems, particularly those built on deep neural networks, are increasingly integrated into critical sectors such as healthcare, finance, and security. The growing reliance on these systems underscores the importance of explainable AI (xAI), which aims to make the decision-making processes of AI models transparent. However, the integrity of xAI is challenged by adversarial attacks that manipulate explanations without affecting the model's output. This paper introduces a novel method for generating such attacks, highlighting their potential to mislead operators and compromise the trustworthiness of the overall system. By detailing the susceptibility of xAI methods to these explanation attacks, the study emphasizes the need for robust xAI: ensuring the fidelity of explanations under adversarial conditions is crucial for maintaining the transparency and reliability of AI systems. Through this work, the authors aim to deepen understanding of security vulnerabilities in xAI and to contribute to the development of AI systems that are both transparent and resilient against adversarial threats. This investigation into explanation attacks opens new avenues for strengthening the security protocols surrounding AI systems in sensitive applications.
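The paper itself details the proposed attack; purely as a rough illustration of the general idea (not the authors' method; the toy model, target explanation, and hyperparameters below are all hypothetical), the following sketch shows one generic way an explanation attack can be mounted against a gradient-based saliency explanation: a small input perturbation is optimized so that the saliency map drifts toward an arbitrary target while the model's predicted class is explicitly held fixed.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for a deployed classifier (hypothetical architecture).
# Softplus instead of ReLU: optimizing through a gradient-based
# explanation needs second derivatives, which ReLU lacks almost everywhere.
model = torch.nn.Sequential(
    torch.nn.Linear(20, 32),
    torch.nn.Softplus(),
    torch.nn.Linear(32, 2),
)

def explanation(x):
    """Simple input-gradient saliency: d(top logit)/d(input)."""
    x = x.detach().clone().requires_grad_(True)
    logits = model(x)
    logits[0, logits[0].argmax()].backward()
    return x.grad.detach().squeeze(0)

x = torch.randn(1, 20)
orig_class = model(x).argmax(dim=1).item()
# An arbitrary misleading target map, here just the true saliency rolled over.
target_expl = torch.roll(explanation(x), shifts=10)

delta = torch.zeros_like(x, requires_grad=True)
opt = torch.optim.Adam([delta], lr=5e-3)

for _ in range(500):
    x_adv = x + delta
    logits = model(x_adv)
    # Saliency of the perturbed input; create_graph=True lets the loss
    # be differentiated through the explanation itself.
    grad = torch.autograd.grad(
        logits[0, orig_class], x_adv, create_graph=True
    )[0].squeeze(0)
    loss = (F.mse_loss(grad, target_expl)                          # move explanation
            + F.cross_entropy(logits, torch.tensor([orig_class]))  # keep prediction
            + 0.1 * delta.norm())                                  # keep delta small
    opt.zero_grad()
    loss.backward()
    opt.step()

x_adv = (x + delta).detach()
print("prediction unchanged:", model(x_adv).argmax(dim=1).item() == orig_class)
print("manipulated explanation:", explanation(x_adv))
```

The sketch makes the core point of the abstract concrete: the decision an operator sees is unchanged, while the explanation that is supposed to justify it has been steered toward an attacker-chosen map.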
---
Disclaimer:
This is a preprint version of the article.
The content here is for view-only purposes. This is not the final published version and may differ from the version of record.
Please refer to the official version for citation and authoritative use.
Files
| Name | Size |
|---|---|
| ZENODO__Proposition_of_a_Novel_Type_of_Attacks_Targetting_Explainable_AI_Algorithms_in_Cybersecurity-2.pdf (md5:b7712fb270a89cda0d0dcecf859d599b) | 726.3 kB |
Additional details
Dates
- Issued: 2025-04-01