Published September 25, 2023 | Version v1
Journal article Open

TERSE/PROLIX (TRPX) – a new algorithm for fast and lossless compression and decompression of diffraction and cryo-EM data

  • 1. Biozentrum, University of Basel, Basel, Basel-Stadt, Switzerland
  • 2. Biozentrum, University of Basel, Basel, Basel-Stadt, Switzerland; Laboratory of Nanoscale Biology, Paul Scherrer Institute, Villigen, Switzerland

Description

High-throughput data collection in crystallography poses significant challenges in handling massive amounts of data. Here, TERSE/PROLIX (or TRPX for short) is presented, a novel lossless compression algorithm specifically designed for diffraction data. The algorithm is compared with established lossless compression algorithms implemented in gzip, bzip2, CBF (crystallographic binary file), Zstandard (zstd), LZ4 and HDF5 with gzip, LZF and bitshuffle+LZ4 filters, in terms of compression efficiency and speed, using continuous-rotation electron diffraction data of an inorganic compound and raw cryo-EM data. The results show that TRPX significantly outperforms all these algorithms in terms of speed and compression rate. It was 60 times faster than bzip2 (which achieved a similar compression rate), and more than 3 times faster than LZ4, which was the runner-up in terms of speed, but had a much worse compression rate. TRPX files are byte-order independent and upon compilation the algorithm occupies very little memory. It can therefore be readily implemented in hardware. By providing a tailored solution for diffraction and raw cryo-EM data, TRPX facilitates more efficient data analysis and interpretation while mitigating storage and transmission concerns. The C++20 compression/decompression code, custom TIFF library and an ImageJ/Fiji Java plugin for reading TRPX files are open-sourced on GitHub under the permissive MIT license.
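
The kind of comparison described above (compression ratio and speed, with a lossless round-trip check) can be sketched with general-purpose codecs from the Python standard library. This is only an illustration under stated assumptions: it does not implement or call TRPX, the `synthetic_frame`, `benchmark` and `run` helpers are hypothetical names introduced here, and the synthetic 16-bit frame is a stand-in for real detector data, so its numbers say nothing about the published results.

```python
import bz2
import random
import time
import zlib


def synthetic_frame(n_pixels=256 * 256, seed=0):
    """Hypothetical stand-in for a detector frame: 16-bit little-endian
    counts that are mostly near zero, with a few bright pixels."""
    rng = random.Random(seed)
    out = bytearray()
    for _ in range(n_pixels):
        if rng.random() < 0.99:
            value = rng.randint(0, 3)        # weak background
        else:
            value = rng.randint(100, 60000)  # occasional strong peak
        out += value.to_bytes(2, "little")
    return bytes(out)


def benchmark(name, compress, decompress, data):
    """Time a single compress call and check the round trip is lossless."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    assert decompress(packed) == data, f"{name} round trip was not lossless"
    return {"codec": name, "ratio": len(data) / len(packed), "seconds": elapsed}


def run(data):
    """Benchmark the standard-library codecs on one buffer."""
    codecs = [
        ("zlib (DEFLATE, as used by gzip)", zlib.compress, zlib.decompress),
        ("bzip2", bz2.compress, bz2.decompress),
    ]
    return [benchmark(name, c, d, data) for name, c, d in codecs]


if __name__ == "__main__":
    for row in run(synthetic_frame()):
        print(f"{row['codec']:<32} ratio {row['ratio']:.2f}  "
              f"time {row['seconds'] * 1e3:.1f} ms")
```

A single timed call is noisy; a more careful harness would repeat each measurement, time decompression separately, and use real diffraction or cryo-EM frames.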

Files

lu5031.pdf (387.6 kB)
md5:85987dd1d9cf1efa89cf0da062b706a0

Additional details

Funding

NanED – Electron Nanocrystallography 956099
European Commission
Single molecule electron diffraction 205320_201012
Swiss National Science Foundation