There is a newer version of the record available.

Published March 7, 2020 | Version v0.4.1
Software Open

krassowski/data-vault: v0.4.1

  • 1. University of Oxford

Description

IPython magic for simple, organized, compressed and encrypted storage & transfer of files between notebooks.

The %vault magic provides a reproducible caching mechanism for exchanging variables between notebooks. The cache is compressed, persistent and safe.

Unlike the built-in %store magic, the variables are stored in plain sight, in a zipped archive, so that they can be easily accessed for manual inspection or used by other tools.
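
Because the vault is an ordinary ZIP archive, standard tools can read it without data-vault installed. The sketch below uses only the Python standard library. The in-memory archive it builds, the member name `datasets/salaries` and the tab-separated content are hypothetical stand-ins: the exact internal layout and file format used by data-vault are not described here.

```python
import csv
import io
import zipfile

# Build a small stand-in archive in memory (hypothetical layout, not
# necessarily how data-vault arranges its files).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("datasets/salaries", "salary\tcity\n42\tLondon\n17\tTokyo\n")

# Inspect the archive the way an external tool would.
with zipfile.ZipFile(buffer) as archive:
    members = archive.namelist()
    with archive.open("datasets/salaries") as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        rows = list(csv.DictReader(text, delimiter="\t"))

print(members)
print(rows[0])
```

For a real vault, the same `zipfile.ZipFile(...).namelist()` call on the archive path would show which members are actually present.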

Usage demonstration:

Let's open the vault (it will be created if it does not exist yet):

%open_vault -p data/storage.zip

Generate a dummy dataset:

from pandas import DataFrame
from random import choice, randint

cities = ['London', 'Delhi', 'Tokyo', 'Lagos', 'Warsaw', 'Chongqing']
# 10,000 rows, each with a random salary (0-100) and a random city
salaries = DataFrame([
    {'salary': randint(0, 100), 'city': choice(cities)}
    for _ in range(10000)
])

Store a variable in a module

Store the DataFrame in the vault, in a module named datasets:

%vault store salaries in datasets

Stored salaries (None → 40CA7812) at Sunday, 08. Dec 2019 11:58

By default, a short description is printed (including a CRC32 checksum and a timestamp); this can be disabled by passing --timestamp False to the %open_vault magic. Additional information that aids reproducibility is stored in the cell metadata.
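
To illustrate what such a checksum looks like, here is a minimal sketch that computes a CRC32 with Python's standard `zlib` module and formats it as eight hexadecimal digits, like the `40CA7812` shown above. Which bytes data-vault actually feeds into its checksum is not stated here, so the payloads and the helper `crc32_hex` are assumptions made for illustration only.

```python
import zlib

def crc32_hex(payload: bytes) -> str:
    """Return the CRC32 of payload as eight uppercase hex digits."""
    return f"{zlib.crc32(payload) & 0xFFFFFFFF:08X}"

# Hypothetical serialized contents before and after a one-character change.
before = crc32_hex(b"salary\tcity\n42\tLondon\n")
after = crc32_hex(b"salary\tcity\n43\tLondon\n")

# Different content yields a different checksum, which makes it easy to
# spot that a stored variable has changed between runs.
print(before, "→", after)
```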

Import a variable from a module

We can now load the stored DataFrame in another (or the same) notebook:

%vault import salaries from datasets

Imported salaries (40CA7812) at Sunday, 08. Dec 2019 12:02

Thanks to (optional) memory optimizations we saved some RAM (87% compared to the unoptimized pd.read_csv() result). To track how many MB were saved, use the --report_memory_gain setting, which displays memory optimization results below imports, for example:

Reduced memory usage by 87.28%, from 0.79 MB to 0.10 MB.
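
The specific optimizations data-vault applies are not described here. As a rough sketch of how savings of this kind can arise in pandas, the example below downcasts the integer column and converts the repeated city strings to a categorical dtype; the helpers `optimize` and `memory_mb` are hypothetical names, and the exact percentage will vary with the pandas version and the data.

```python
from random import choice, randint, seed

import pandas as pd

seed(0)  # make the dummy data repeatable
cities = ['London', 'Delhi', 'Tokyo', 'Lagos', 'Warsaw', 'Chongqing']
salaries = pd.DataFrame([
    {'salary': randint(0, 100), 'city': choice(cities)}
    for _ in range(10000)
])

def optimize(df):
    """Return a copy with a smaller integer dtype and a categorical city column."""
    result = df.copy()
    # Values fit in 0..100, so the smallest unsigned integer type suffices.
    result['salary'] = pd.to_numeric(result['salary'], downcast='unsigned')
    # Only six distinct cities: store integer codes plus a small lookup table.
    result['city'] = result['city'].astype('category')
    return result

def memory_mb(df):
    """Total memory footprint of a DataFrame in MB, including string contents."""
    return df.memory_usage(deep=True).sum() / 1024 ** 2

optimized = optimize(salaries)
before, after = memory_mb(salaries), memory_mb(optimized)
print(f"Reduced memory usage by {1 - after / before:.2%}, "
      f"from {before:.2f} MB to {after:.2f} MB.")
```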

Files

krassowski/data-vault-v0.4.1.zip (28.2 kB)
md5:40dab86258c231abbf713aab456f24f4
