Published April 4, 2022 | Version v1
Plot Open

Python file loading time comparison across filetypes

Creators

  • 1. Leiden Observatory

Description

A quick test of loading times of various file formats that can be used to store image data in B x N x M shape (B: number of bands, N, M: number of pixels on both axes). The Jupyter Notebook used to obtain these results is provided.

 

Methods

I used Python 3.6.15 with the following packages:

  • astropy 4.1
  • numpy 1.19.5
  • h5py 3.1.0
  • hdf5 1.10.6
  • matplotlib 3.3.4
  • tqdm 4.62.3
  • _pickle (version packaged with Python 3.6.15)

The tests were run on Fedora 35 on a machine with an Intel® Xeon® W-1250 CPU @ 3.30 GHz × 6 and 16 GB of RAM, with the data stored on 3 disks (HGST WD Ultrastar HUS726T4TALE6L4) in a RAID 5 configuration.

 

Images are generated in (3, 64, 64) shape with pixel values drawn from a normal distribution. Two tests are run, both on 1000 images in total. For the first, the images are saved individually in each of the four file formats in question (.fits, .npy, .h5 and .pkl) and then read back one by one, with np.mean() applied to each loaded array (to prevent any memcaching). This yields the first plot. In the second test, the images are saved in batches of 64, so that each file holds 64 images of shape (3, 64, 64). This yields the second plot.
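The procedure above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names (`time_loading`, `FORMATS`, the `save_*`/`load_*` functions), the HDF5 dataset name "data" and the temporary-directory layout are all assumptions. It uses the standard `pickle` module rather than importing `_pickle` directly, and the .h5 and .fits backends are only registered if h5py and astropy are installed.

```python
import os
import pickle
import tempfile
import time

import numpy as np


def save_npy(path, arr):
    np.save(path, arr)


def load_npy(path):
    return np.load(path)


def save_pkl(path, arr):
    with open(path, "wb") as f:
        pickle.dump(arr, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pkl(path):
    with open(path, "rb") as f:
        return pickle.load(f)


# Extension -> (save function, load function). Names are hypothetical.
FORMATS = {".npy": (save_npy, load_npy), ".pkl": (save_pkl, load_pkl)}

try:
    import h5py

    def save_h5(path, arr):
        with h5py.File(path, "w") as f:
            f.create_dataset("data", data=arr)  # dataset name is an assumption

    def load_h5(path):
        with h5py.File(path, "r") as f:
            return f["data"][()]  # read the whole dataset into memory

    FORMATS[".h5"] = (save_h5, load_h5)
except ImportError:
    pass  # h5py not installed: skip the .h5 test

try:
    from astropy.io import fits

    def save_fits(path, arr):
        fits.writeto(path, arr, overwrite=True)

    def load_fits(path):
        return fits.getdata(path)

    FORMATS[".fits"] = (save_fits, load_fits)
except ImportError:
    pass  # astropy not installed: skip the .fits test


def time_loading(n_files=1000, shape=(3, 64, 64), batch_size=1, seed=0):
    """Return the mean load + np.mean time in ms per `shape` unit, per format."""
    rng = np.random.default_rng(seed)
    # One file holds a single image, or a stack of `batch_size` images.
    file_shape = shape if batch_size == 1 else (batch_size, *shape)
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for ext, (save, load) in FORMATS.items():
            paths = [os.path.join(tmp, f"img_{i:04d}{ext}") for i in range(n_files)]
            for path in paths:
                save(path, rng.normal(size=file_shape))
            start = time.perf_counter()
            for path in paths:
                np.mean(load(path))  # forces the data to actually be read
            elapsed = time.perf_counter() - start
            results[ext] = elapsed / (n_files * batch_size) * 1e3
    return results
```

Calling `time_loading()` corresponds to the first test (one image per file), while `time_loading(n_files=16, batch_size=64)` approximates the second; absolute numbers will depend on the hardware, the file system cache and the library versions.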

 

Results

When saving images individually, the .pkl files are loaded the quickest, at 0.077 ms per file for loading + running np.mean. .npy files take 3.0 times longer, .h5 files 6.1 times longer and .fits files 7.7 times longer. This changes when loading the batched files. Loading + applying np.mean is then fastest with .npy files at 0.034 ms per (3, 64, 64) data unit (the loading time of the batched file divided by the batch size), followed by .fits at 1.4 times longer, .pkl at 2.0 times longer and .h5 at 2.1 times longer.
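For a direct comparison between the two tests, the relative factors can be converted back into absolute times. The short sketch below does this arithmetic, assuming that "x times longer" means a multiplicative factor of x relative to the fastest format; the function name `absolute_times` is made up for illustration, and because the quoted factors are rounded, the derived values are approximate.

```python
def absolute_times(fastest_ms, factors):
    """Convert factors relative to the fastest format into times in ms."""
    return {ext: fastest_ms * factor for ext, factor in factors.items()}


# Values quoted in the results: fastest time and relative factors.
individual = absolute_times(0.077, {".pkl": 1.0, ".npy": 3.0, ".h5": 6.1, ".fits": 7.7})
batched = absolute_times(0.034, {".npy": 1.0, ".fits": 1.4, ".pkl": 2.0, ".h5": 2.1})

for ext in individual:
    gain = individual[ext] / batched[ext]  # speed-up per image from batching
    print(f"{ext:>5}: {individual[ext]:.3f} ms -> {batched[ext]:.3f} ms per image ({gain:.1f}x)")
```

Under that assumption, batching brings .fits from roughly 0.59 ms to roughly 0.048 ms per image, while .pkl barely changes (about 0.077 ms to 0.068 ms).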

 

Conclusion

The .fits format, widely used in astronomy, has a long loading time for individual files, most likely due to the overhead caused by reading the header. It is, however, one of the fastest formats to load when images are saved in batches. This should therefore be considered when storing large numbers of images.
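As an illustration of batched storage, the sketch below splits a stack of images into files of at most 64 images each and reads a single image back. It uses .npy only to stay dependency-free; the function names, file naming scheme and the handling of the final partial batch are assumptions, since the description does not say how a remainder was treated (1000 images in batches of 64 would give 15 full files and one file of 40).

```python
import os
import tempfile

import numpy as np


def save_in_batches(images, out_dir, batch_size=64):
    """Write `images` (shape (n, B, N, M)) to .npy files of up to `batch_size` images."""
    paths = []
    for start in range(0, len(images), batch_size):
        path = os.path.join(out_dir, f"batch_{start // batch_size:04d}.npy")
        np.save(path, images[start:start + batch_size])  # last batch may be smaller
        paths.append(path)
    return paths


def load_image(paths, index, batch_size=64):
    """Load the batch file containing image `index` and return that image."""
    batch = np.load(paths[index // batch_size])
    return batch[index % batch_size]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    images = rng.normal(size=(1000, 3, 64, 64))
    with tempfile.TemporaryDirectory() as tmp:
        paths = save_in_batches(images, tmp)
        print(len(paths), load_image(paths, 999).shape)
```

Note that fetching one image this way reads the whole batch file, so batching pays off mainly when images are consumed a batch at a time.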

Files

file_loading_test.ipynb

Files (419.5 kB)

Name Size
md5:38841ad0a0cc7d4e2c3b274cbea47fdf 156.4 kB
md5:c9f5932eed052a3b87b33b06a20cee83 132.7 kB
md5:a38662d88579f160b5f185de1bc1b3fe 130.4 kB