
Published January 28, 2020 | Version 1.0.0
Software | Open Access

Pooch v1.0.0: A friend to fetch your data files

  • 1. Polar Science Center, University of Washington Applied Physics Lab, USA
  • 2. Department of Earth, Ocean and Ecological Sciences, School of Environmental Sciences, University of Liverpool, UK

Description

Does your Python package include sample datasets? Are you shipping them with the code? Are they getting too big?

Pooch is here to help! It will manage a data registry by downloading your data files from a server only when needed and storing them locally in a data cache (a folder on your computer).
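The download-only-when-needed idea can be sketched with the standard library alone. This is not Pooch's implementation. In Pooch, `pooch.create` sets up the registry and cache, and the resulting object's `fetch` method does the work. The `fetch_cached` function and its `download` callback below are hypothetical names, and the "server" is whatever function you pass in.

```python
from pathlib import Path
from typing import Callable


# Hypothetical helper illustrating the idea; not part of Pooch's API.
def fetch_cached(fname: str, cache_dir: Path, download: Callable[[str, Path], None]) -> Path:
    """Return the local path to ``fname``, downloading it only if it is not cached yet.

    ``download(fname, dest)`` is assumed to write the remote file ``fname`` to ``dest``.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)  # create the cache folder on first use
    local = cache_dir / fname
    if not local.exists():
        # Write to a temporary name first so an interrupted download
        # never leaves a partial file that looks like a valid cache entry.
        tmp = local.with_name(local.name + ".part")
        download(fname, tmp)
        tmp.replace(local)  # move into place only after the download finished
    return local
```

Calling `fetch_cached` a second time with the same file name returns the cached path without invoking `download` again.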

Here are Pooch's main features:

  • Pure Python and minimal dependencies.
  • Download a file only if necessary (it's not in the data cache or needs to be updated).
  • Verify download integrity through SHA256 hashes (also used to check if a file needs to be updated).
  • Designed to be extended: plug in custom download (FTP, scp, etc.) and post-processing (unzip, decompress, rename) functions.
  • Includes utilities to unzip/decompress the data upon download to save loading time.
  • Can handle basic HTTP authentication (for servers that require a login) and print download progress bars.
  • Easily set up an environment variable to override the data cache location.
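Several of the features above fit together roughly as in this stdlib sketch: SHA256 verification, re-downloading a file whose hash no longer matches, an environment variable that overrides the cache folder, and a post-processing hook. It is an illustration under assumptions, not Pooch's code. `fetch_verified`, `resolve_cache_dir`, and the `MYPROJECT_DATA_DIR` variable name are hypothetical. In Pooch the corresponding knobs are arguments such as `registry` and `env` of `pooch.create` and the `processor` argument of `fetch`; see its documentation for the exact signatures.

```python
import hashlib
import os
from pathlib import Path
from typing import Callable, Optional


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Hypothetical: the environment variable name is an assumption for illustration.
def resolve_cache_dir(default: Path, env_var: str = "MYPROJECT_DATA_DIR") -> Path:
    """Use the environment variable's value if it is set, otherwise the default."""
    value = os.environ.get(env_var)
    return Path(value) if value else default


# Hypothetical helper; not part of Pooch's API.
def fetch_verified(
    fname: str,
    known_hash: str,
    cache_dir: Path,
    download: Callable[[str, Path], None],
    processor: Optional[Callable[[Path], Path]] = None,
) -> Path:
    """Download ``fname`` if it is missing or its SHA256 differs from ``known_hash``."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    local = cache_dir / fname
    if not local.exists() or sha256_of(local) != known_hash:
        tmp = local.with_name(local.name + ".part")
        download(fname, tmp)
        actual = sha256_of(tmp)
        if actual != known_hash:
            tmp.unlink()  # discard the corrupted or unexpected download
            raise ValueError(f"SHA256 mismatch for {fname}: expected {known_hash}, got {actual}")
        tmp.replace(local)
    # Optional post-processing step (e.g. unzipping), loosely analogous to Pooch's processors.
    return processor(local) if processor else local
```

A stale copy in the cache is replaced because its hash differs from the expected value. A download whose hash does not match is deleted and reported instead of being silently used.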

Are you a scientist or researcher? Pooch can help you too!

  • Automatically download your data files so you don't have to keep them in your GitHub repository.
  • Make sure everyone running the code has the same version of the data files (enforced through the SHA256 hashes).
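Keeping everyone on the same data version works by storing the expected hashes in a registry. This is commonly a text file with one file name and its SHA256 hash per line, and Pooch can generate one with `pooch.make_registry`. The helpers below, `write_registry` and `read_registry`, are hypothetical and only sketch that format. Check Pooch's documentation for the exact rules it applies when loading a registry file.

```python
import hashlib
from pathlib import Path
from typing import Dict


# Hypothetical helper; reads each file fully into memory, fine for small sample data.
def write_registry(data_dir: Path, output: Path) -> None:
    """Write 'relative/path sha256' lines for every file under data_dir, sorted for stable diffs."""
    lines = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        name = path.relative_to(data_dir).as_posix()  # forward slashes on every OS
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{name} {digest}")
    output.write_text("\n".join(lines) + "\n")


# Hypothetical helper; parses the format written above.
def read_registry(registry: Path) -> Dict[str, str]:
    """Parse the registry into a {filename: sha256} dictionary, skipping blank lines."""
    entries = {}
    for line in registry.read_text().splitlines():
        if line.strip():
            name, digest = line.split()[:2]
            entries[name] = digest
    return entries
```

Committing the registry file alongside the code, rather than the data itself, is what lets every user's download be checked against the same hashes.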

Pooch v0.7.1 was peer-reviewed by the Journal of Open Source Software (JOSS): https://github.com/openjournals/joss-reviews/issues/1943

Documentation: https://www.fatiando.org/pooch

Source code: https://github.com/fatiando/pooch

Part of the Fatiando a Terra project.

Files (215.1 kB)

  • pooch-1.0.0.zip (215.1 kB), md5:d15aae78da5b84eeef8fc0f670ef51af
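To check that a downloaded copy of pooch-1.0.0.zip matches the MD5 checksum listed for this record, the digest can be computed with Python's `hashlib`. This is a minimal sketch. `md5_matches` is a hypothetical helper, and the archive is assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path

# Checksum listed for pooch-1.0.0.zip on this record.
EXPECTED_MD5 = "d15aae78da5b84eeef8fc0f670ef51af"


# Hypothetical helper for illustration.
def md5_matches(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 hex digest equals the expected value."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest() == expected.lower()


if __name__ == "__main__":
    archive = Path("pooch-1.0.0.zip")  # assumed location of the downloaded file
    if archive.exists():
        print("OK" if md5_matches(archive) else "MISMATCH")
    else:
        print(f"{archive} not found")
```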