Published April 21, 2026 | Version v1

mllm-shap: A Shapley Value Explainability Platform for Text-Audio Multimodal Large Language Models

  • 1. Warsaw University of Technology
  • 2. Systems Research Institute

Description

We present mllm-shap, an open-source Python platform for researchers and ML practitioners that extends Shapley value (SV) explainability from text-only large language models to multimodal LLMs (MLLMs) that jointly process text and audio. Building on the token-level SV framework introduced by TokenSHAP, mllm-shap addresses three challenges absent in the text-only setting: (1) modality-aware coalition masking that handles the coexistence of text tokens and dense audio encoder frames within a single input, (2) multi-turn conversation tracking with per-token role and modality metadata, and (3) audio token grouping via phonetic alignment that reduces the coalition space by 10–50×. The platform ships as a pip-installable package implementing five SV estimation strategies – including a Complementary Contributions estimator with Neyman-optimal allocation that outperforms Monte Carlo baselines – together with an interactive web GUI for real-time attribution visualization. To our knowledge, mllm-shap is the first publicly available framework for complete, reproducible SV-based explainability of text-audio MLLMs. The package is MIT-licensed with full source code on GitHub and a demonstration video included as supplementary material.
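The effect of grouping on the coalition space can be illustrated with a small, self-contained sketch. It is not taken from mllm-shap and does not use its API: the names `GROUPS`, `WEIGHTS`, `value`, and `exact_shapley` are hypothetical, and the value function is a toy score standing in for a real model's response under masking. The sketch treats each group (for example, a run of audio frames aligned to one word) as a single player and computes exact Shapley values by enumerating coalitions of groups.

```python
import itertools
import math
from typing import Callable, FrozenSet, List

# Hypothetical toy input: six low-level elements (e.g. text tokens and audio
# frames) merged into three groups. These values are illustrative only.
GROUPS = [[0, 1], [2, 3, 4], [5]]
WEIGHTS = [1.0, 2.0, 0.5, 0.5, 0.5, 3.0]


def value(coalition: FrozenSet[int]) -> float:
    """Toy stand-in for a model score when only `coalition` groups are unmasked."""
    score = sum(WEIGHTS[t] for g in coalition for t in GROUPS[g])
    # An interaction term: groups 0 and 2 are worth more together.
    if 0 in coalition and 2 in coalition:
        score += 1.0
    return score


def exact_shapley(n_players: int,
                  v: Callable[[FrozenSet[int]], float]) -> List[float]:
    """Exact Shapley values by enumerating every coalition of the other players."""
    phi = [0.0] * n_players
    n_fact = math.factorial(n_players)
    for i in range(n_players):
        others = [p for p in range(n_players) if p != i]
        for k in range(len(others) + 1):
            # Shapley weight for coalitions of size k: k! (n - k - 1)! / n!
            weight = math.factorial(k) * math.factorial(n_players - k - 1) / n_fact
            for subset in itertools.combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (v(s | {i}) - v(s))
    return phi


phi = exact_shapley(len(GROUPS), value)

# Grouping shrinks the number of coalitions from 2**6 to 2**3 in this toy case.
n_elements = sum(len(g) for g in GROUPS)
coalitions_ungrouped = 2 ** n_elements
coalitions_grouped = 2 ** len(GROUPS)

print(phi, coalitions_ungrouped, coalitions_grouped)
```

Exact enumeration is only feasible for a handful of players, which is why sampling-based estimators are needed for realistic inputs. In this toy case the attributions sum to the difference between the score of the full input and the score of the fully masked input, which is the efficiency property of Shapley values.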

Files

Association_for_Computational_Linguistics__ACL__conference.pdf

Files (1.6 MB)

Additional details

Software

Repository URL
https://github.com/Pawlo77/MLLM-Shap
Programming language
Python
Development Status
Active