There is a newer version of the record available.

Published November 5, 2020 | Version v0.12.0
Software Open

esowc/ecPoint-Calibrate: v0.12.0

  • 1. Ledger
  • 2. @ecmwf @esowc

Description

Introducing Cheaper mode

This release introduces a new mode called Cheaper, for efficiently accessing large point-data tables. Enabling this mode does not change the results. It works for both ASCII files and the recently introduced Parquet format. It can be toggled from the GUI in the post-processing workflow (disabled by default).

<img width="1139" alt="Screenshot 2020-11-05 at 02 34 45" src="https://user-images.githubusercontent.com/3684187/98186622-79996a80-1f0f-11eb-9294-00e51f923e8c.png">

When Cheaper is activated, the point-data loaders use a modified decision-tree algorithm that lazy-loads only the required columns, as opposed to holding the entire DataFrame in memory. This is particularly useful for very large point-data files that may not fit in the user's RAM. Note that Parquet should perform much better than ASCII in this mode, since it is designed for columnar storage and a single column can be read without scanning the whole file.
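
The idea of loading one column at a time can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual ecPoint-Calibrate loader: the class `LazyColumnTable`, the function `select_rows`, and the column names `cpr`, `tp`, and `fer` are all hypothetical, and the sketch uses only the Python standard library on a CSV file. With pandas, a similar effect could be obtained through the `usecols` argument of `read_csv` or the `columns` argument of `read_parquet`.

```python
import csv
import os
import tempfile


class LazyColumnTable:
    """Hypothetical stand-in for a point-data loader that reads one column at a time."""

    def __init__(self, path):
        self.path = path
        # Only the header is read up front; no data rows are kept in memory.
        with open(path, newline="") as f:
            self.columns = next(csv.reader(f))

    def column(self, name):
        # Re-read the file from disk and keep only the requested column.
        idx = self.columns.index(name)
        with open(self.path, newline="") as f:
            reader = csv.reader(f)
            next(reader)  # skip the header row
            return [float(row[idx]) for row in reader]


def select_rows(table, thresholds):
    """Return indices of rows where low <= value < high holds for every predictor.

    Only one predictor column is loaded at a time, trading extra disk reads
    for a smaller memory footprint.
    """
    if not thresholds:
        raise ValueError("at least one predictor threshold is required")
    mask = None
    for name, (low, high) in thresholds.items():
        current = [low <= v < high for v in table.column(name)]
        mask = current if mask is None else [a and b for a, b in zip(mask, current)]
    return [i for i, keep in enumerate(mask) if keep]


# Small demonstration with made-up values.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "points.csv")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows([
            ["cpr", "tp", "fer"],
            [0.1, 5.0, -0.5],
            [0.6, 12.0, 0.2],
            [0.8, 3.0, 1.1],
        ])
    table = LazyColumnTable(path)
    rows = select_rows(table, {"cpr": (0.5, 1.0), "tp": (0.0, 10.0)})
    print(rows)
```

Each call to `column` goes back to disk, which is why an approach like this uses less memory but runs slower than holding the whole table in RAM.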

⚠️ The modified algorithm has been carefully reviewed for correctness, although there may be small bugs due to the magnitude of the changes.

Performance

Evaluating a decision tree is slower when Cheaper mode is activated. This is due to frequent disk I/O, which is essential for keeping the memory usage minimal.

Memory usage

In my tests with an ASCII point-data table of size 584 MB on disk, the memory consumption of the backend process dropped significantly.

| Cheaper disabled | Cheaper enabled |
| --- | --- |
| 1.8 GB | 380 MB |

Summary of changes

  • Implement cheaper algorithm to evaluate decision tree. (9d9a9c6, fixes #112 and #116)
  • Interpret CSV as ASCII file. (81b1260; fixes #120)
  • Fix sanitization of Docker paths. (maybe fixes #119)
  • Add /tmp, /var/tmp, and /scratch to default volume bindings.
    • Users can now read/write in these directories.
  • Fix a bug in the display of computation logs in the GUI. (d3554b2)

Files

esowc/ecPoint-Calibrate-v0.12.0.zip

Size: 38.2 MB
md5:4e7abd54f195d52bd3536945b9189211

Additional details