Published June 23, 2020 | Version 0.13.0

datalad/datalad: 0.13.0 (June 23, 2020)

  • 1. Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Centre Jülich, Jülich, Germany and Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
  • 2. Dartmouth College, Hanover, NH, United States
  • 3. Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Centre Jülich, Jülich, Germany
  • 4. University of Texas at Austin
  • 5. UC Berkeley - UCSF Graduate Program in Bioengineering
  • 6. UC Berkeley
  • 7. Stanford University, Stanford, CA, United States
  • 8. Psychoinformatics Lab, INM-7, Research Centre Juelich

Description

A handful of new commands, including copy-file, push, and create-sibling-ria, along with various fixes and enhancements.

Changes since rc2
  • git-annex-remote-ora has been updated for compatibility with annexremote v1.4.2. #4573

  • A progress bar fix from rc2 led to unintended messages when not attached to a tty. #4575

  • publish is no longer marked as deprecated. #4578

  • push #4620

    • --force no longer takes "no-datatransfer" as a value. There is instead a --data option that takes the values "anything", "nothing", "auto", and "auto-if-wanted". "auto-if-wanted" (the default) results in --auto being added to git annex copy calls if the sibling was configured to prefer content via git annex wanted.
    • The "pushall" and "datatransfer" values of --force have been renamed to "all" and "checkdatapresent", respectively.
    • The --since= option of push now takes '^', not an empty string, to mean "the last known state of the matching branch on the sibling". #4617
  • datalad.get.subdataset-source-candidate-NAME can now include a cost value by prefixing NAME with three digits. #4619

Major refactoring and deprecations
  • The no_annex parameter of create, which is exposed in the Python API but not the command line, is deprecated and will be removed in a later release. Use the new annex argument instead, flipping the value. Command-line callers that use --no-annex are unaffected. #4321

  • datalad add, which was deprecated in 0.12.0, has been removed. #4158 #4319

  • The following GitRepo and AnnexRepo methods have been removed: get_changed_files, get_missing_files, and get_deleted_files. #4169 #4158

  • The get_branch_commits method of GitRepo and AnnexRepo has been renamed to get_branch_commits_. #3834

  • The custom commit method of AnnexRepo has been removed, and AnnexRepo.commit now resolves to the parent method, GitRepo.commit. #4168

  • GitPython's git.repo.base.Repo class is no longer available via the .repo attribute of GitRepo and AnnexRepo. #4172

  • AnnexRepo.get_corresponding_branch now returns None rather than the current branch name when a managed branch is not checked out. #4274

  • The special UUID for git-annex web remotes is now available as datalad.consts.WEB_SPECIAL_REMOTE_UUID. It remains accessible as AnnexRepo.WEB_UUID for compatibility, but new code should use datalad.consts.WEB_SPECIAL_REMOTE_UUID. #4460
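Code that has to run against DataLad releases from both before and after this change can resolve the web remote UUID through a fallback chain. This is a minimal sketch that assumes only the two locations named above; the last-resort literal is git-annex's fixed UUID for its built-in web special remote, and the helper is_web_remote is hypothetical.

```python
try:
    # DataLad >= 0.13.0 exposes the constant in datalad.consts.
    from datalad.consts import WEB_SPECIAL_REMOTE_UUID
except ImportError:
    try:
        # Older releases only provide the class attribute.
        from datalad.support.annexrepo import AnnexRepo
        WEB_SPECIAL_REMOTE_UUID = AnnexRepo.WEB_UUID
    except ImportError:
        # DataLad not installed: use git-annex's fixed UUID for the web remote.
        WEB_SPECIAL_REMOTE_UUID = "00000000-0000-0000-0000-000000000001"


def is_web_remote(uuid: str) -> bool:
    """Hypothetical helper: True if *uuid* identifies git-annex's web special remote."""
    return uuid == WEB_SPECIAL_REMOTE_UUID
```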
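The revised get_corresponding_branch contract (None unless a managed branch is checked out, with GitRepo gaining a variant that always returns None) lets a caller choose a branch without checking which repository class it holds. The sketch below uses hypothetical stand-in classes rather than DataLad's own, and it only recognizes git-annex adjusted branches of the form adjusted/<branch>(<mode>), which is a simplification.

```python
import re
from typing import Optional


class ToyGitRepo:
    """Hypothetical stand-in that mimics only the return-value contract."""

    def __init__(self, active_branch: str):
        self.active_branch = active_branch

    def get_corresponding_branch(self) -> Optional[str]:
        # A plain Git repository never has a managed branch.
        return None


class ToyAnnexRepo(ToyGitRepo):
    # git-annex names adjusted branches like "adjusted/master(unlocked)".
    _ADJUSTED = re.compile(r"^adjusted/(?P<branch>.+)\([^)]*\)$")

    def get_corresponding_branch(self) -> Optional[str]:
        match = self._ADJUSTED.match(self.active_branch)
        # None, not the current branch name, when no managed branch is checked out.
        return match.group("branch") if match else None


def branch_to_push(repo: ToyGitRepo) -> str:
    """Works for either class without an isinstance() check."""
    return repo.get_corresponding_branch() or repo.active_branch
```

For example, branch_to_push(ToyAnnexRepo("adjusted/main(unlocked)")) yields "main", while a ToyGitRepo on "dev" yields "dev".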

Fixes
  • Widespread improvements in functionality and test coverage on Windows and crippled file systems in general. #4057 #4245 #4268 #4276 #4291 #4296 #4301 #4303 #4304 #4305 #4306

  • AnnexRepo.get_size_from_key incorrectly handled file chunks. #4081

  • create-sibling would too readily clobber existing paths when called with --existing=replace. It now gets confirmation from the user before doing so if running interactively and unconditionally aborts when running non-interactively. #4147

  • update #4159

    • queried the incorrect branch configuration when updating non-annex repositories.
    • didn't account for the fact that the local repository can be configured as the upstream "remote" for a branch.
  • When the caller included --bare as a git init option, create crashed while trying to create a bare repository (which is currently unsupported) instead of aborting with an informative error message. #4065

  • The logic for automatically propagating the 'origin' remote when cloning a local source could unintentionally trigger a fetch of a non-local remote. #4196

  • All remaining get_submodules() call sites that relied on the temporary compatibility layer added in v0.12.0 have been updated. #4348

  • The custom result summary renderer for get, which was visible with --output-format=tailored, displayed incorrect and confusing information in some cases. The custom renderer has been removed entirely. #4471

  • The documentation for the Python interface of a command listed an incorrect default when the command overrode the value of command parameters such as result_renderer. #4480
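The new create-sibling behavior for --existing=replace follows a common confirm-or-abort pattern. This is a minimal sketch, assuming that a session counts as interactive when stdin is attached to a terminal; confirm_replace is a hypothetical helper, not DataLad's implementation.

```python
import sys
from typing import Callable, Optional


def confirm_replace(path: str,
                    interactive: Optional[bool] = None,
                    ask: Callable[[str], str] = input) -> bool:
    """Hypothetical helper: decide whether an existing *path* may be replaced."""
    if interactive is None:
        # Assumption: interactive means stdin is a terminal.
        interactive = sys.stdin is not None and sys.stdin.isatty()
    if not interactive:
        # Non-interactive runs abort unconditionally rather than clobbering data.
        raise RuntimeError(f"refusing to replace existing path {path!r} without confirmation")
    answer = ask(f"{path!r} already exists. Replace it? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```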

Enhancements and new features
  • The default result renderer learned to elide a chain of results after seeing ten consecutive results that it considers similar, which improves the display of actions that have many results (e.g., saving hundreds of files). #4337

  • The default result renderer, in addition to the "tailored" result renderer, now triggers a command's custom summary renderer, if it has one. #4338

  • The new command create-sibling-ria provides support for creating a sibling in a RIA store. #4124

  • DataLad ships with a new special remote, git-annex-remote-ora, for interacting with RIA stores, and a new command, export-archive-ora, for exporting an archive from a local annex object store. #4260 #4203

  • The new command push provides an alternative interface to publish for pushing a dataset hierarchy to a sibling. #4206 #4581 #4617 #4620

  • The new command copy-file copies files and associated availability information from one dataset to another. #4430

  • The command examples have been expanded and improved. #4091 #4314 #4464

  • The tooling for linking to the DataLad Handbook from DataLad's documentation has been improved. #4046

  • The --reckless parameter of clone and install learned two new modes:

    • "ephemeral", where the .git/annex/ of the cloned repository is symlinked to the local source repository's. #4099
    • "shared-{group|all|...}" that can be used to set up datasets for collaborative write access. #4324
  • clone

    • learned to handle dataset aliases in RIA stores when given a URL of the form ria+<protocol>://<storelocation>#~<aliasname>. #4459
    • now checks datalad.get.subdataset-source-candidate-NAME to see if NAME starts with three digits, which is taken as a "cost". Sources with lower costs will be tried first. #4619
  • update #4167

    • learned to disallow non-fast-forward updates when ff-only is given to the --merge option.
    • gained a --follow option that controls how --merge behaves, adding support for merging in the revision that is registered in the parent dataset rather than merging in the configured branch from the sibling.
    • now provides a result record for merge events.
  • create-sibling now supports local paths as targets in addition to SSH URLs. #4187

  • siblings now

    • shows a warning if the caller requests to delete a sibling that does not exist. #4257
    • phrases its warning about non-annex repositories in a less alarming way. #4323
  • The rendering of command errors has been improved. #4157

  • save now

    • displays a message to signal that the working tree is clean, making it more obvious that no results being rendered corresponds to a clean state. #4106
    • provides a stronger warning against using --to-git. #4290
  • diff and save learned about scenarios where they could avoid unnecessary and expensive work. #4526 #4544 #4549

  • Calling diff without --recursive but with a path constraint within a subdataset ("<subdataset>/<path>") now traverses into the subdataset, as "<subdataset>/" would, restricting its report to "<subdataset>/<path>". #4235

  • New option datalad.annex.retry controls how many times git-annex will retry on a failed transfer. It defaults to 3 and can be set to 0 to restore the previous behavior. #4382

  • wtf now warns when the specified dataset does not exist. #4331

  • The repr and str output of the dataset and repo classes got a facelift. #4420 #4435 #4439

  • The DataLad Singularity container now comes with p7zip-full.

  • DataLad emits a log message when the current working directory is resolved to a different location due to a symlink. This is now logged at the DEBUG rather than WARNING level, as it typically does not indicate a problem. #4426

  • DataLad now lets the caller know that git annex init is scanning for unlocked files, as this operation can be slow in some repositories. #4316

  • The log_progress helper learned how to set the starting point to a non-zero value and how to update the total of an existing progress bar, two features needed for planned improvements to how some commands display their progress. #4438

  • The ExternalVersions object, which is used to check versions of Python modules and external tools (e.g., git-annex), gained an add method that enables DataLad extensions and other third-party code to include other programs of interest. #4441

  • All of the remaining spots that use GitPython have been rewritten without it. Most notably, this includes rewrites of the clone, fetch, and push methods of GitRepo. #4080 #4087 #4170 #4171 #4175 #4172

  • When GitRepo.commit splits its operation across multiple calls to avoid exceeding the maximum command line length, it now amends the initial commit rather than creating multiple commits. #4156

  • GitRepo gained a get_corresponding_branch method (which always returns None), allowing a caller to invoke the method without needing to check if the underlying repo class is GitRepo or AnnexRepo. #4274

  • A new helper function datalad.core.local.repo.repo_from_path returns a repo class for a specified path. #4273

  • New AnnexRepo method localsync performs a git annex sync that disables external interaction and is particularly useful for propagating changes on an adjusted branch back to the main branch. #4243
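The eliding behavior of the default result renderer can be pictured as run-length grouping of consecutive results. This is a minimal sketch; treating results as "similar" when they share action and status is an assumption, and the render function and its summary wording are hypothetical rather than DataLad's actual output.

```python
from typing import Dict, Iterable, Iterator, Optional, Tuple


def render(results: Iterable[Dict[str, str]], limit: int = 10) -> Iterator[str]:
    """Yield one display line per result, eliding long runs of similar results."""
    run_key: Optional[Tuple[str, str]] = None
    run_len = 0
    for res in results:
        # Assumption: results are "similar" if they share action and status.
        key = (res.get("action", ""), res.get("status", ""))
        if key != run_key:
            # A new run starts: summarize whatever the previous run suppressed.
            if run_len > limit:
                yield f"  ... ({run_len - limit} similar results elided)"
            run_key, run_len = key, 0
        run_len += 1
        if run_len <= limit:
            yield f"{key[0]}({key[1]}): {res.get('path', '')}"
    if run_len > limit:
        yield f"  ... ({run_len - limit} similar results elided)"
```

Saving 25 files would thus print ten lines followed by a single summary line instead of 25 lines.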
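The --data values of push described in the rc2 notes map onto whether, and how, git annex copy is called. This is a sketch under stated assumptions: annex_copy_command is a hypothetical helper, and it assumes "auto" always adds --auto while "anything" never does, which the notes imply but do not spell out.

```python
from typing import List, Optional

DATA_MODES = ("anything", "nothing", "auto", "auto-if-wanted")


def annex_copy_command(sibling: str, data: str = "auto-if-wanted",
                       wanted_configured: bool = False) -> Optional[List[str]]:
    """Hypothetical helper: the git annex copy call a push might issue, or None."""
    if data not in DATA_MODES:
        raise ValueError(f"invalid --data value: {data!r}")
    if data == "nothing":
        return None  # no annexed data is transferred
    cmd = ["git", "annex", "copy", "--to", sibling]
    if data == "auto" or (data == "auto-if-wanted" and wanted_configured):
        # --auto lets git-annex skip files the destination does not need or want.
        cmd.append("--auto")
    return cmd
```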
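The RIA alias URL form that clone now accepts, ria+<protocol>://<storelocation>#~<aliasname>, splits cleanly into a store URL and an alias name. This is a minimal sketch; parse_ria_alias_url and RiaAliasURL are hypothetical, and the sketch deliberately rejects every other fragment form a RIA URL might carry.

```python
from typing import NamedTuple


class RiaAliasURL(NamedTuple):
    store_url: str  # store location without the "ria+" prefix
    alias: str


def parse_ria_alias_url(url: str) -> RiaAliasURL:
    """Split ria+<protocol>://<storelocation>#~<aliasname> into its parts."""
    if not url.startswith("ria+"):
        raise ValueError(f"not a RIA URL: {url!r}")
    store_url, sep, fragment = url[len("ria+"):].partition("#")
    # Only the "#~<aliasname>" fragment is handled here.
    if not sep or not fragment.startswith("~") or len(fragment) == 1:
        raise ValueError(f"no ~<aliasname> fragment in {url!r}")
    return RiaAliasURL(store_url=store_url, alias=fragment[1:])
```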
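The cost prefix on datalad.get.subdataset-source-candidate-NAME amounts to sorting candidates by a leading three-digit number. This is a sketch under assumptions: order_candidates is hypothetical, and the default cost of 500 for names without a numeric prefix is an assumption of this example, not a value stated in these notes.

```python
import re
from typing import Dict, List, Tuple

PREFIX = "datalad.get.subdataset-source-candidate-"
DEFAULT_COST = 500  # assumption for this sketch, not taken from the release notes


def order_candidates(config: Dict[str, str]) -> List[Tuple[int, str, str]]:
    """Return (cost, name, value) tuples for candidate settings, cheapest first."""
    candidates = []
    for key, value in config.items():
        if not key.startswith(PREFIX):
            continue
        name = key[len(PREFIX):]
        match = re.match(r"(\d{3})(.+)", name)
        if match:
            # A leading three-digit run is read as the cost.
            cost, name = int(match.group(1)), match.group(2)
        else:
            cost = DEFAULT_COST
        candidates.append((cost, name, value))
    # Lower cost is tried first; ties fall back to name order.
    return sorted(candidates)
```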
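What update --merge=ff-only refuses can be stated as an ancestry check: the update is allowed only if the local head is already contained in the incoming history. The sketch below models commits as a hypothetical in-memory parent graph; in a real repository, git merge --ff-only or git merge-base --is-ancestor performs this check, and these notes do not say how DataLad does it internally.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # commit -> list of parent commits


def is_ancestor(graph: Graph, ancestor: str, commit: str) -> bool:
    """Return True if *ancestor* is *commit* or reachable through its parents."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return False


def merge_ff_only(graph: Graph, local_head: str, incoming: str) -> str:
    """Return the new head, or raise if only a true merge could combine the histories."""
    if is_ancestor(graph, incoming, local_head):
        return local_head  # already up to date
    if is_ancestor(graph, local_head, incoming):
        return incoming  # fast-forward: just move the branch pointer
    raise RuntimeError(f"{incoming} is not a fast-forward of {local_head}")
```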
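The GitRepo.commit change can be illustrated by greedily chunking the path list under a length budget and amending after the first chunk. This sketch only builds command lists; chunk_paths and commit_commands are hypothetical, the character budget is an arbitrary parameter, and the exact git invocations DataLad issues are not given in these notes.

```python
from typing import List, Sequence


def chunk_paths(paths: Sequence[str], max_chars: int) -> List[List[str]]:
    """Greedily group paths so each group's joined length stays within max_chars."""
    chunks: List[List[str]] = []
    current: List[str] = []
    size = 0
    for path in paths:
        extra = len(path) + 1  # +1 for the separating space
        if current and size + extra > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(path)
        size += extra
    if current:
        chunks.append(current)
    return chunks


def commit_commands(paths: Sequence[str], message: str,
                    max_chars: int = 1000) -> List[List[str]]:
    """One commit for the first chunk; later chunks amend it instead of adding commits."""
    commands = []
    for i, chunk in enumerate(chunk_paths(paths, max_chars)):
        if i == 0:
            commands.append(["git", "commit", "-m", message, "--", *chunk])
        else:
            commands.append(["git", "commit", "--amend", "--no-edit", "--", *chunk])
    return commands
```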

Files

datalad/datalad-0.13.0.zip (1.8 MB, md5:9c14d97b8d8cc5fc3acf3e0bf608ba1f)

Funding

CRCNS US-German Data Sharing: DataGit - converging catalogues, warehouses, and deployment logistics into a federated 'data distribution' (grant 1429999), National Science Foundation