Allow large and binary data files to “piggyback” on top of your existing repositories. Push and pull large-ish (< 2 GB) data files to and from GitHub repositories as attachments to a GitHub release.
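The push-and-pull workflow summarized above could look roughly like the following. This is a minimal sketch, not the package's definitive usage: it assumes the `pb_new_release()`, `pb_upload()`, and `pb_download()` functions as documented for the package, and the repository name `owner/repo` and tag `v0.0.1` are hypothetical placeholders. The network calls are guarded so the block only contacts GitHub when a token is available.

```r
library(piggyback)

# Hypothetical repository and release tag, for illustration only.
repo <- "owner/repo"
tag  <- "v0.0.1"

# Write a small example file to stand in for a larger data set.
path <- file.path(tempdir(), "mtcars.csv")
write.csv(mtcars, path, row.names = FALSE)

# Creating a release and uploading need a GitHub token with write access
# (assumed here to be supplied via the GITHUB_TOKEN environment variable).
if (nzchar(Sys.getenv("GITHUB_TOKEN"))) {
  # Create the release that the data file will be attached to.
  pb_new_release(repo = repo, tag = tag)

  # Attach the file to that release as an asset.
  pb_upload(path, repo = repo, tag = tag, name = "mtcars.csv")

  # Fetch the asset back from the release into a local directory.
  pb_download("mtcars.csv", dest = tempdir(), repo = repo, tag = tag)
}
```

Because the file travels as a release asset and not through git, a different tag can be used to keep a separate version of the same file.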
Paste the full DESCRIPTION file inside a code block below:
Package: piggyback
Version: 0.0.0.9000
Title: Managing Larger Data on a GitHub Repository
Description: Because larger (> 50 MB) data files cannot easily be committed to git,
    a different approach is required to manage data associated with an analysis in a
    GitHub repository. This package provides a simple work-around by allowing larger
    (up to 2 GB) data files to piggyback on a repository as assets attached to individual
    GitHub releases. These files are not handled by git in any way, but instead are
    uploaded, downloaded, or edited directly by calls through the GitHub API. These
    data files can be versioned manually by creating different releases. This approach
    works equally well with public or private repositories. Data can be uploaded
    and downloaded programmatically from scripts. No authentication is required to
    download data from public repositories.
Authors@R: person("Carl", "Boettiger",
    email = "cboettig@gmail.com",
    role = c("aut", "cre", "cph"),
    comment = c(ORCID = "0000-0002-1642-628X"))
URL: https://github.com/cboettig/piggyback
BugReports: https://github.com/cboettig/piggyback/issues
License: GPL-3
Encoding: UTF-8
LazyData: true
ByteCompile: true
Imports:
    gh,
    httr,
    jsonlite,
    git2r,
    fs,
    usethis,
    crayon,
    clisymbols
Suggests:
    readr,
    covr,
    testthat,
    datasets,
    knitr,
    rmarkdown
VignetteBuilder: knitr
RoxygenNote: 6.0.1.9000
Roxygen: list(markdown = TRUE)
https://github.com/cboettig/piggyback
reproducibility, because access to the data being analyzed is essential for reproducible workflows, and yet there is no good way to provide it in workflows with unpublished data, or in private workflows, once the data are too large for version control (e.g. files > 50 MB).
The target audience is anyone working with data files on GitHub.
datastorr on ropenscilabs is the closest match. It takes a very different approach to essentially the same problem (different from the user's perspective; on the back end, both store data as GitHub assets). The Intro vignette discusses at greater length many of the possible alternative strategies, why I feel they have all fallen short of my needs, and how that led me to create this package.
Confirm each of the following by checking the box. This package:
paper.md matching JOSS’s requirements with a high-level description in the package root or in inst/.
Does R CMD check (or devtools::check()) succeed? Paste and describe any errors or warnings: No errors, notes, or warnings.
[x] Does the package conform to rOpenSci packaging guidelines? Please describe any exceptions:
If this is a resubmission following rejection, please explain the change in circumstances:
If possible, please provide recommendations of reviewers - those with experience with similar packages and/or likely users of your package - and their GitHub user names:
Rich FitzJohn, @richfitz, would be great based on his experience in this area and with datastorr. Jenny Bryan, @Jennybc, since this package makes heavy use of usethis and GitHub interactions.