Five Queries Are Enough: Query-Efficient and Surrogate-Free Membership Inference Attacks on RAG via Entailment

Nguyen, Bao

doi:10.5281/zenodo.20335971

Published May 22, 2026 | Version v1

Conference paper | Open access


Nguyen, Bao (Researcher)¹

1. Swinburne University of Technology

This artifact provides the full Python implementation and experimental pipeline for MEntA (Membership Entailment Attack), a query-efficient, surrogate-free membership inference attack on black-box Retrieval-Augmented Generation (RAG) systems, together with the four baseline attacks and the RAG defenses evaluated in the accompanying paper. It includes the following components:

Experiment driver scripts (./scripts/): Container-friendly Bash pipelines that reproduce the paper's experiments end-to-end and write their outputs under ./results/. Specifically:

setup_data.sh — one-off data preparation: downloads BEIR datasets (NFCorpus, SCIDOCS, TREC-COVID), builds FAISS indices, and generates per-document summaries;
run_menta_pipeline.sh — runs the proposed MEntA attack (queries → retrieval → RAG → entailment → evaluation);
run_ia_pipeline.sh — Interrogation Attack (IA-MIA) baseline;
run_s2mia_pipeline.sh — S2-MIA baseline;
run_mba_pipeline.sh — Mask-Based Attack (MBA) baseline;
run_dcmi_pipeline.sh — DCMI baseline;
run_input_detection_defenses.sh — GPT-4 and Mirabel input-detection defenses (run after attack pipelines).

Each pipeline skips stages whose outputs already exist, so interrupted runs can be resumed, and accepts an optional --force flag to re-run completed stages. Output-modification defenses (DP, re-ranking, instruction prompts, paraphrasing) are applied by passing --defense to the scripts that generate RAG outputs (see README.md).
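The skip-unless-forced rule can be illustrated with a short Python sketch. The artifact's own drivers are Bash scripts, so the `run_stages` and `main` helpers, the stage names, and the output file names below are hypothetical; the sketch only shows the rule that a stage runs when its output file is missing or when `--force` is given.

```python
import argparse
import tempfile
from pathlib import Path
from typing import Callable, List, Optional, Sequence, Tuple

# A stage is (name, output file, function that writes that output file).
# All names and paths here are hypothetical placeholders.
Stage = Tuple[str, Path, Callable[[Path], object]]


def run_stages(stages: Sequence[Stage], force: bool = False) -> List[str]:
    """Run stages in order, skipping any whose output exists unless force=True.

    Returns the names of the stages that were actually executed.
    """
    executed = []
    for name, output, write_output in stages:
        if output.exists() and not force:
            print(f"[skip] {name}: {output} already exists")
            continue
        output.parent.mkdir(parents=True, exist_ok=True)
        write_output(output)
        executed.append(name)
    return executed


def main(argv: Optional[Sequence[str]] = None) -> List[str]:
    parser = argparse.ArgumentParser(description="Toy resumable pipeline")
    parser.add_argument("--out", type=Path, default=None,
                        help="output directory (defaults to a fresh temp dir)")
    parser.add_argument("--force", action="store_true",
                        help="re-run stages even if their outputs exist")
    args = parser.parse_args(argv)
    out = args.out if args.out is not None else Path(tempfile.mkdtemp())
    # Each toy stage just writes a placeholder file.
    stages: List[Stage] = [
        ("queries", out / "queries.json", lambda p: p.write_text("[]")),
        ("retrieval", out / "retrieval.json", lambda p: p.write_text("{}")),
        ("evaluation", out / "metrics.json", lambda p: p.write_text("{}")),
    ]
    return run_stages(stages, force=args.force)
```

Calling `main(["--out", "results/demo"])` twice executes all three stages the first time and skips them the second; adding `"--force"` re-runs them. Keying the skip decision on the output file keeps the rule stateless: no separate progress log has to be kept in sync with the results directory.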

Core implementation (./MEntA/, ./IA-MIA/, ./S2-MIA/, ./MBA/, ./DCMI/, ./utils/, ./defense/): Each attack folder implements the same staged pipeline (query generation, dense retrieval, RAG output generation, scoring, and evaluation) through method-specific entry points (e.g., for MEntA: MEntA/generate_queries.py, retrieve.py, generate_rag_output.py, compute_entailment.py, and evaluate.py). ./utils/ contains shared helpers for BEIR downloads, FAISS indexing, HuggingFace model loading, OpenAI batch calls, and NLI splitting; ./defense/ implements the input-detection and query-paraphrase modules.
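To make the scoring stage concrete, the sketch below turns per-query NLI results into one membership decision for a target document. It assumes an NLI model (not shown) has already produced entailment/neutral/contradiction probabilities for each RAG answer against the target document, and that the score is the mean entailment probability compared with a fixed threshold. The mean aggregation, the 0.5 threshold, and the helper names are illustrative assumptions, not a description of compute_entailment.py.

```python
from statistics import mean
from typing import Dict, Sequence

# Each element holds NLI label probabilities for one (target document, RAG answer)
# pair, e.g. {"entailment": 0.9, "neutral": 0.08, "contradiction": 0.02}.
NliResult = Dict[str, float]


def membership_score(per_query: Sequence[NliResult]) -> float:
    """Mean entailment probability over all queries (illustrative aggregation)."""
    if not per_query:
        raise ValueError("need at least one query result")
    return mean(r["entailment"] for r in per_query)


def predict_member(per_query: Sequence[NliResult], threshold: float = 0.5) -> bool:
    """Flag the target document as a member if its score reaches the threshold."""
    return membership_score(per_query) >= threshold


# Five queries about one target document (made-up probabilities).
example = [{"entailment": p} for p in (0.9, 0.8, 0.7, 0.95, 0.6)]
print(round(membership_score(example), 2), predict_member(example))  # prints: 0.79 True
```

The intuition behind an entailment-style score is that answers grounded in a retrieved member document should be entailed by that document, while answers to questions about a non-member document should not.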

Datasets (./data/, populated by setup_data.sh): Preprocessed BEIR corpora for NFCorpus, SCIDOCS, and TREC-COVID with member/non-member splits, FAISS indices built with dense retrievers such as sentence-transformers/all-mpnet-base-v2 and thenlper/gte-large, and the per-document summary files used by MEntA's retrieval boosting. This archive ships with ./data/ already populated, so experiments can run without re-downloading corpora or rebuilding indices. To refresh or extend the datasets (e.g., with additional retrievers), use scripts/setup_data.sh as documented in README.md.
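Dense retrieval over these indices amounts to nearest-neighbour search in embedding space. The pure-Python sketch below ranks documents by cosine similarity, which is what an exact inner-product index such as faiss.IndexFlatIP returns when all vectors are L2-normalized. Whether the shipped indices use that configuration is an assumption here, and the toy 2-D vectors stand in for real embeddings from models like all-mpnet-base-v2.

```python
import math
from typing import List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k(query: Sequence[float], docs: Sequence[Sequence[float]],
          k: int = 5) -> List[Tuple[int, float]]:
    """Return (doc_id, cosine similarity) for the k most similar documents."""
    q = l2_normalize(query)
    scored = [
        (doc_id, sum(a * b for a, b in zip(q, l2_normalize(d))))
        for doc_id, d in enumerate(docs)
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings"; real ones would come from the retriever model.
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print([doc_id for doc_id, _ in top_k([1.0, 0.1], docs, k=2)])  # prints [0, 2]
```

This brute-force loop is linear in corpus size per query; a FAISS index performs the same exact search in optimized native code, which matters for corpora the size of TREC-COVID.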

Runtime environment (./Apptainer.def, requirements.txt, .env.example): An Apptainer/Singularity recipe (CUDA 12.1, Python 3.9), pinned dependencies, and an environment-variable template for reproducible execution.
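By the usual convention, .env.example is a template that is copied to .env and filled in with local settings, plausibly including an OpenAI API key for the batch calls in ./utils/. The variable names below (OPENAI_API_KEY, HF_HOME) are hypothetical; the real ones are listed in .env.example. This standard-library sketch parses simple KEY=VALUE lines; the python-dotenv package's load_dotenv provides the same behaviour with fuller syntax support.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks, comments and malformed lines."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
    return values


def load_env(path: Path = Path(".env"), override: bool = False) -> Dict[str, str]:
    """Copy parsed values into os.environ without clobbering existing variables."""
    values = parse_env(path.read_text())
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

Not overriding existing variables by default lets a value exported in the shell or container take precedence over the file.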

Result artifacts (./results/): Stores per-method, per-dataset outputs (queries, retrieval JSON, RAG answers, entailment/similarity scores, and evaluation metrics). Pipelines skip stages when expected output files already exist.
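The specific evaluation metrics are not listed here. ROC AUC and the true-positive rate at a low false-positive rate are common choices for membership inference, and the sketch below computes both from membership scores and ground-truth labels (1 = member, 0 = non-member). Treating these as the metrics evaluate.py reports is an assumption, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def _split(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Separate scores into members (label 1) and non-members (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one member (1) and one non-member (0)")
    return pos, neg


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random member outscores a random non-member (ties count 0.5)."""
    pos, neg = _split(scores, labels)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[int], max_fpr: float = 0.01) -> float:
    """Best TPR over thresholds t (predict member if score >= t) with FPR <= max_fpr."""
    pos, neg = _split(scores, labels)
    best = 0.0
    for t in sorted(set(scores)):
        fpr = sum(n >= t for n in neg) / len(neg)
        if fpr <= max_fpr:
            best = max(best, sum(p >= t for p in pos) / len(pos))
    return best
```

For example, scores [0.9, 0.8, 0.4, 0.3] with labels [1, 1, 0, 0] give an AUC of 1.0, because every member outscores every non-member. The pairwise loops are quadratic, which is fine for a sketch; a sort-based rank computation scales better.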

For repository layout, configuration, defense options, MEntA ablations (similarity scoring, generic queries, summary-only RAG), and step-by-step reproduction, see README.md in the repository root.

Files

menta-main.zip (210.8 MB), md5:5fc04103c9792d2657c06fb65672762d
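To check that a download of menta-main.zip is intact, its MD5 digest can be compared with the checksum given for the file. This minimal standard-library sketch reads the archive in 1 MiB chunks so the 210.8 MB file never has to be held in memory at once; the function names are illustrative.

```python
import hashlib
from pathlib import Path

# Checksum shown for menta-main.zip in the record's file listing.
EXPECTED_MD5 = "5fc04103c9792d2657c06fb65672762d"


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path = Path("menta-main.zip"), expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 matches the expected hex digest (case-insensitive)."""
    return md5sum(path) == expected.lower()
```

On Linux, running md5sum menta-main.zip from GNU coreutils prints the same digest.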

Additional details

Available: 2026-05-22

Programming language: Python
Development Status: Active

