Python file loading time comparison across filetypes

doi:10.5281/zenodo.6411883

Published April 4, 2022 | Version v1

Resource type: Plot | Access: Open

J.W.F. Mes¹

1. Leiden Observatory

A quick test of loading times of various file formats that can be used to store image data in B x N x M shape (B: number of bands, N, M: number of pixels on both axes). The Jupyter Notebook used to obtain these results is provided.

Methods

I used Python 3.6.15 with the following packages:

astropy 4.1
numpy 1.19.5
h5py 3.1.0
hdf5 1.10.6
matplotlib 3.3.4
tqdm 4.62.3
_pickle (version packaged with Python 3.6.15)

The tests were run on Fedora 35 with an Intel® Xeon® W-1250 CPU @ 3.30 GHz × 6 and 16 GB of RAM, with the data stored on 3 disks (HGST WD Ultrastar HUS726T4TALE6L4) in a RAID 5 configuration.

Images are generated with shape (3, 64, 64), with pixel values drawn from a normal distribution. Two tests are run, each on 1000 images in total. In the first test, the images are saved individually in each of the four file formats in question (.fits, .npy, .h5 and .pkl) and then read back one by one, with np.mean() applied to each loaded array (to prevent any memcaching). This yields the first plot. In the second test, the images are saved in batches of 64, so that each file holds 64 images of shape (3, 64, 64). This yields the second plot.
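The procedure described above can be sketched roughly as follows. This is a minimal illustration and not the code from the provided notebook: the function names (make_images, time_per_image), the file naming scheme and the use of time.perf_counter are assumptions made for this example. To keep it dependent on numpy and the standard library only, it covers just .npy and .pkl; .fits and .h5 could be added to the FORMATS table in the same way using astropy.io.fits and h5py.

```python
import pickle
import tempfile
import time
from pathlib import Path

import numpy as np


def make_images(n_images, shape=(3, 64, 64), seed=0):
    """Draw n_images arrays of the given shape from a standard normal distribution."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(n_images, *shape))


def save_npy(path, array):
    np.save(path, array)


def load_npy(path):
    return np.load(path)


def save_pkl(path, array):
    with open(path, "wb") as handle:
        pickle.dump(array, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_pkl(path):
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Hypothetical registry: extension -> (save function, load function).
FORMATS = {".npy": (save_npy, load_npy), ".pkl": (save_pkl, load_pkl)}


def time_per_image(images, batch_size, directory):
    """Return {extension: mean seconds per image} for loading + np.mean.

    batch_size=1 stores each (3, 64, 64) image in its own file; larger values
    store (batch_size, 3, 64, 64) arrays. The last batch may be smaller when
    the number of images is not a multiple of batch_size.
    """
    directory = Path(directory)
    batches = [images[i:i + batch_size] for i in range(0, len(images), batch_size)]
    results = {}
    for ext, (save, load) in FORMATS.items():
        paths = []
        for index, batch in enumerate(batches):
            path = directory / f"batch{batch_size}_{index:05d}{ext}"
            save(path, batch[0] if batch_size == 1 else batch)
            paths.append(path)
        start = time.perf_counter()
        for path in paths:
            np.mean(load(path))  # touch the data so it is actually read
        elapsed = time.perf_counter() - start
        results[ext] = elapsed / len(images)
    return results


if __name__ == "__main__":
    images = make_images(1000)
    with tempfile.TemporaryDirectory() as tmp:
        for batch_size in (1, 64):
            timings = time_per_image(images, batch_size, tmp)
            for ext, seconds in timings.items():
                print(f"batch size {batch_size:>2} {ext}: {seconds * 1e3:.3f} ms per image")
```

Because the files are read immediately after being written, the operating system may serve them from its page cache, so absolute numbers from this sketch are not comparable to the results reported here.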

Results

When saving images individually, the .pkl files are loaded the quickest, at 0.077 ms per file for loading + running np.mean. .npy files take 3.0 times longer, .h5 files 6.1 times longer and .fits files 7.7 times longer. This changes when loading the batched files. Loading + applying np.mean is then fastest with .npy files at 0.034 ms per (3, 64, 64) data unit (the loading time of the batched file divided by the batch size), followed by .fits at 1.4 times longer, .pkl at 2.0 times longer and .h5 at 2.1 times longer.
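To compare the two tests directly, the relative factors above can be converted into approximate absolute times per image. The sketch below does only that arithmetic, using the numbers quoted in this section; the variable names are chosen for this example, and since the factors are rounded, the derived values are estimates rather than measured results.

```python
# Fastest time in each test (ms per image) and the quoted factors relative to it.
single_base_ms = 0.077
single_ratios = {".pkl": 1.0, ".npy": 3.0, ".h5": 6.1, ".fits": 7.7}

batched_base_ms = 0.034
batched_ratios = {".npy": 1.0, ".fits": 1.4, ".pkl": 2.0, ".h5": 2.1}


def absolute_times(base_ms, ratios):
    """Scale the fastest time by each format's quoted factor."""
    return {ext: base_ms * ratio for ext, ratio in ratios.items()}


single_ms = absolute_times(single_base_ms, single_ratios)
batched_ms = absolute_times(batched_base_ms, batched_ratios)

# Estimated gain per image from batching, for each format.
speedup = {ext: single_ms[ext] / batched_ms[ext] for ext in single_ms}

for ext in sorted(speedup, key=speedup.get, reverse=True):
    print(
        f"{ext:>5}: single {single_ms[ext]:.3f} ms, "
        f"batched {batched_ms[ext]:.3f} ms, "
        f"about {speedup[ext]:.1f}x faster when batched"
    )
```

Under these rounded figures, .fits gains the most from batching (on the order of 12 times faster per image), while .pkl gains the least.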

Conclusion

The .fits format, widely used in astronomy, has a long loading time for individual files, most likely due to the overhead of reading the header. It is, however, one of the fastest file formats to load when images are saved in batches. This should therefore be considered when storing large numbers of images.

Files (419.5 kB)

Name	MD5	Size
file_loading_test.ipynb	38841ad0a0cc7d4e2c3b274cbea47fdf	156.4 kB
Filetypes_loading_times_batch_operation.png	c9f5932eed052a3b87b33b06a20cee83	132.7 kB
Filetypes_loading_times_single.png	a38662d88579f160b5f185de1bc1b3fe	130.4 kB
