Conference paper Open Access

EnCoD: Distinguishing Compressed and Encrypted File Fragments

De Gaspari, Fabio; Hitaj, Dorjan; Pagnotta, Giulio; De Carli, Lorenzo; Mancini, Luigi V.

Reliable identification of encrypted file fragments is a requirement for several security applications, including ransomware detection, digital forensics, and traffic analysis. A popular approach consists of estimating high entropy as a proxy for randomness. However, many modern content types (e.g. office documents, media files, etc.) are highly compressed for storage and transmission efficiency. Compression algorithms also output high-entropy data, thus reducing the accuracy of entropy-based encryption detectors.

Over the years, a variety of approaches have been proposed to distinguish encrypted file fragments from high-entropy compressed fragments. However, these approaches are typically only evaluated over a few, selected data types and fragment sizes, which makes a fair assessment of their practical applicability impossible. This paper aims to close this gap by comparing existing statistical tests on a large, standardized dataset. Our results show that current approaches cannot reliably tell apart encryption and compression, even for large fragment sizes. To address this issue, we design EnCoD, a learning-based classifier which can reliably distinguish compressed and encrypted data, starting with fragments as small as 512 bytes. We evaluate EnCoD against current approaches over a large dataset of different data types, showing that it outperforms current state-of-the-art for most considered fragment sizes and data types.

Files (675.6 kB)
Name Size
675.6 kB Download
Views 27
Downloads 38
Data volume 25.7 MB
Unique views 23
Unique downloads 36


Cite as