datalad/datalad: First release candidate for 0.14.0 (January 26, 2021)
Creators
- Hanke, Michael (1)
- Halchenko, Yaroslav O. (2)
- Poldrack, Benjamin (3)
- Meyer, Kyle (2)
- Solanky, Debanjum Singh (2)
- Alteva, Gergana
- Gors, Jason (2)
- MacFarlane, Dave
- Olaf Häusler, Christian
- Olson, Taylor
- Waite, Alex (3)
- De La Vega, Alejandro (4)
- Sochat, Vanessa
- Keshavan, Anisha (5)
- Ma, Feilong (2)
- Christian, Horea
- Poelen, Jorrit
- Skytén, Kusti
- Visconti di Oleggio Castello, Matteo (6)
- Hardcastle, Nell
- Stoeter, Torsten
- C Lau, Vicky
- Markiewicz, Christopher J. (7)
- Wagner, Adina S. (8)
- Nichols, B. Nolan (9)
- 1. Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Centre Jülich, Jülich, Germany and Institute of Systems Neuroscience, Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany
- 2. Dartmouth College, Hanover, NH, United States
- 3. Institute of Neuroscience and Medicine, Brain & Behaviour (INM-7), Research Centre Jülich, Jülich, Germany
- 4. University of Texas at Austin
- 5. UC Berkeley - UCSF Graduate Program in Bioengineering
- 6. UC Berkeley
- 7. Stanford University, Stanford, CA, United States
- 8. Psychoinformatics Lab, INM-7, Research Centre Juelich
- 9. Maze Therapeutics, South San Francisco, CA, United States
Description
Major refactoring and deprecations
Git versions below v2.19.1 are no longer supported. #4650
The minimum supported version of Python is now 3.6. #4879
publish is now deprecated in favor of push. It will be removed in the 0.15.0 release at the earliest.
A new command runner was added in v0.13. Functionality related to the old runner has now been removed: `Runner`, `GitRunner`, and `run_gitcommand_on_file_list_chunks` from the `datalad.cmd` module, along with the `datalad.tests.protocolremote`, `datalad.cmd.protocol`, and `datalad.cmd.protocol.prefix` configuration options. #5229
The `--no-storage-sibling` switch of create-sibling-ria is deprecated in favor of `--storage-sibling=off` and will be removed in a later release. #5090
The `get_git_dir` static method of `GitRepo` is deprecated and will be removed in a later release. Use the `dot_git` attribute of an instance instead. #4597
The `ProcessAnnexProgressIndicators` helper from `datalad.support.annexrepo` has been removed. #5259
The `save` argument of install, a no-op since v0.6.0, has been dropped. #5278
The `get_URLS` method of `AnnexCustomRemote` is deprecated and will be removed in a later release. #4955
`ConfigManager.get` now returns a single value rather than a tuple when there are multiple values for the same key, as very few callers correctly accounted for the possibility of a tuple return value. Callers can restore the old behavior by passing `get_all=True`. #4924
In 0.12.0, all of the `assure_*` functions in `datalad.utils` were renamed as `ensure_*`, keeping the old names around as compatibility aliases. The `assure_*` variants are now marked as deprecated and will be removed in a later release. #4908
The `datalad.interface.run` module, which was deprecated in 0.12.0 and kept as a compatibility shim for `datalad.core.local.run`, has been removed. #4583
The `saver` argument of `datalad.core.local.run.run_command`, marked as obsolete in 0.12.0, has been removed. #4583
The `dataset_only` argument of the `ConfigManager` class was deprecated in 0.12 and has now been removed. #4828
The `linux_distribution_name`, `linux_distribution_release`, and `on_debian_wheezy` attributes in `datalad.utils` are no longer set at import time and will be removed in a later release. Use `datalad.utils.get_linux_distribution` instead. #4696
`datalad.distribution.clone`, which was marked as obsolete in v0.12 in favor of `datalad.core.distributed.clone`, has been removed. #4904
`datalad.support.annexrepo.N_AUTO_JOBS`, announced as deprecated in v0.12.6, has been removed. #4904
The `compat` parameter of `GitRepo.get_submodules`, added in v0.12 as a temporary compatibility layer, has been removed. #4904
The long-deprecated (and non-functional) `url` parameter of `GitRepo.__init__` has been removed. #5342
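The `assure_*` deprecation follows a common pattern. The old name is kept as a thin wrapper that emits a `DeprecationWarning` and forwards to the new function. The sketch below shows that pattern with a toy `ensure_list`. It is not DataLad's implementation, the semantics of the real `datalad.utils` helpers may differ, and the `deprecated_alias` helper is hypothetical.

```python
import functools
import warnings


def ensure_list(value):
    """Toy helper: None -> [], list/tuple -> list, anything else -> [value]."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def deprecated_alias(func, old_name):
    """Hypothetical helper: expose `func` under `old_name`, warning on each call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated, use {func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not this wrapper
        )
        return func(*args, **kwargs)

    wrapper.__name__ = old_name
    wrapper.__qualname__ = old_name
    return wrapper


# The old name stays importable, but every call now warns.
assure_list = deprecated_alias(ensure_list, "assure_list")
```

Python hides `DeprecationWarning` by default unless the warning is attributed to code in `__main__`. Callers may therefore need `python -W default` or a test runner to see it.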
Fixes
Cloning onto a system that enters adjusted branches by default (as Windows does) did not properly record the clone URL. #5128
The RIA-specific handling after calling clone was correctly triggered by `ria+http` URLs but not `ria+https` URLs. #4977
The remote calls to `cp` and `chmod` in create-sibling were not portable and failed on macOS. #5108
A more reliable check is now done to decide if the configuration files need to be reloaded. #5276
The internal command runner's handling of the event loop has been improved to play nicer with outside applications and scripts that use asyncio. #5350 #5367
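The changelog does not say how the runner was changed. One common pattern lets a blocking helper coexist with a caller's asyncio loop, and the sketch below shows it. The subprocess runs on a private event loop, and that loop moves to a helper thread if the calling thread is already running a loop. The function `run_command` and its `(returncode, stdout)` result are hypothetical, not DataLad's runner API.

```python
import asyncio
import sys
import threading


async def _run(cmd):
    """Run `cmd` as a subprocess and collect its exit code and stdout."""
    proc = await asyncio.create_subprocess_exec(
        *cmd, stdout=asyncio.subprocess.PIPE
    )
    stdout, _ = await proc.communicate()
    return proc.returncode, stdout


def _run_on_private_loop(cmd):
    # A fresh loop that is never installed as the thread's current loop,
    # so any loop owned by the caller is left untouched.
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(_run(cmd))
    finally:
        loop.close()


def run_command(cmd):
    """Hypothetical blocking runner that tolerates a caller-owned event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread: use a private loop right here.
        return _run_on_private_loop(cmd)

    # A loop is already running in this thread (e.g. in Jupyter), and a second
    # loop cannot run here, so do the work in a helper thread and wait for it.
    box = {}

    def worker():
        try:
            box["result"] = _run_on_private_loop(cmd)
        except BaseException as exc:  # re-raised in the calling thread below
            box["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if "error" in box:
        raise box["error"]
    return box["result"]
```

When it is called from inside a coroutine, this helper still blocks the outer loop while it waits. A fully asynchronous interface would avoid that, at the cost of a different calling convention.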
Enhancements and new features
The subdataset handling for adjusted branches, which is particularly important on Windows where git-annex enters an adjusted branch by default, has been improved. A core piece of the new approach is registering the commit of the primary branch, not its checked-out adjusted branch, in the superdataset. Note: This means that `git status` will always consider a subdataset on an adjusted branch as dirty, while `datalad status` will look more closely and see if the tip of the primary branch matches the registered commit. #5241
create-sibling-github learned how to create private repositories (thanks to Nolan Nichols). #4769
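The comparison that `datalad status` makes can be sketched as follows. git-annex names an adjusted branch after its primary branch (for example `adjusted/master(unlocked)`), and the sketch relies on that convention. The helper names and the `branch_tips` mapping are hypothetical; a real implementation would ask Git for the refs.

```python
import re
from typing import Dict

# git-annex adjusted branch names look like "adjusted/<branch>(<mode>)".
_ADJUSTED = re.compile(r"^adjusted/(?P<primary>.+)\((?P<mode>[^()]+)\)$")


def primary_branch(branch: str) -> str:
    """Map an adjusted branch name to its primary branch; pass others through."""
    match = _ADJUSTED.match(branch)
    return match.group("primary") if match else branch


def subdataset_matches(current_branch: str, branch_tips: Dict[str, str],
                       registered_commit: str) -> bool:
    """Compare the registered commit with the primary branch tip, not with HEAD."""
    return branch_tips.get(primary_branch(current_branch)) == registered_commit
```

A plain `git status` instead compares the checked-out commit (the adjusted branch tip) with the registered one. Because those two commits always differ, it reports the subdataset as modified.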
create-sibling-ria gained a `--storage-sibling` option. When `--storage-sibling=only` is specified, the storage sibling is created without an accompanying Git sibling. This enables using hosts without Git installed for storage. #5090
get, save, and addurls gained support for parallel operations that can be enabled via the `--jobs` command-line option or the new `datalad.runtime.max-jobs` configuration option. #5022
The download machinery (and thus the `datalad` special remote) gained support for a new scheme, `shub://`, which follows the same format used by `singularity run` and friends. In contrast to the short-lived URLs obtained by querying Singularity Hub directly, `shub://` URLs are suitable for registering with git-annex. #4816
A provider is now included for https://registry-1.docker.io URLs. This is useful for storing an image's blobs in a dataset and registering the URLs with git-annex. #5129
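To make the `shub://` format concrete, here is a minimal parser for such references. The grammar is an assumption modeled on Singularity's CLI convention, not taken from DataLad's downloader. It consists of a user, a container name, an optional `:tag` that defaults to `latest`, and an optional `@digest`. The parser does not resolve the reference to a download URL.

```python
import re
from typing import NamedTuple, Optional


class ShubRef(NamedTuple):
    user: str
    name: str
    tag: str
    digest: Optional[str]


# Assumed shape: shub://<user>/<name>[:<tag>][@<digest>]
_SHUB = re.compile(
    r"^shub://(?P<user>[^/]+)/(?P<name>[^/:@]+)"
    r"(?::(?P<tag>[^/@]+))?(?:@(?P<digest>[^/]+))?$"
)


def parse_shub_url(url: str) -> ShubRef:
    """Split a shub:// reference into its parts (hypothetical helper)."""
    match = _SHUB.match(url)
    if match is None:
        raise ValueError(f"not a shub:// URL: {url!r}")
    return ShubRef(
        user=match.group("user"),
        name=match.group("name"),
        tag=match.group("tag") or "latest",  # assumed default tag
        digest=match.group("digest"),
    )
```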
- addurls
  - learned how to read data from standard input. #4669
  - now supports tab-separated input. #4845
  - now lets Python callers pass in a list of records rather than a file name. #5285
  - gained a `--drop-after` switch that signals to drop a file's content after downloading and adding it to the annex. #5081
  - is now able to construct a tree of files from known checksums without downloading content via its new `--key` option. #5184
  - records the URL file in the commit message as provided by the caller rather than using the resolved absolute path. #5091
  - is now speedier. #4867 #5022
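Three of the input changes above can be pictured as one loader: standard input, tab-separated files, and in-memory records from Python callers. The sketch below is hypothetical. `load_records`, its use of `"-"` for standard input, and the `input_type` values are illustrative assumptions, not addurls' actual interface.

```python
import csv
import sys
from typing import IO, Dict, List, Sequence, Union

Source = Union[str, IO[str], Sequence[Dict[str, str]]]


def load_records(source: Source, input_type: str = "csv") -> List[Dict[str, str]]:
    """Hypothetical loader: records from a list, stdin ("-"), a file object, or a path."""
    if isinstance(source, (list, tuple)):
        # Python callers can hand over records directly; copy them as dicts.
        return [dict(record) for record in source]
    delimiters = {"csv": ",", "tsv": "\t"}
    if input_type not in delimiters:
        raise ValueError(f"unsupported input type: {input_type!r}")
    delimiter = delimiters[input_type]
    if source == "-":
        # Read the table from standard input.
        return [dict(row) for row in csv.DictReader(sys.stdin, delimiter=delimiter)]
    if hasattr(source, "read"):
        # An already opened file-like object.
        return [dict(row) for row in csv.DictReader(source, delimiter=delimiter)]
    with open(source, newline="") as handle:
        return [dict(row) for row in csv.DictReader(handle, delimiter=delimiter)]
```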
The `add-readme` command now links to the DataLad handbook rather than http://docs.datalad.org. #4991
DataLad now ships with a module that is capable of installing git-annex via various methods. See `python -m datalad.install -h`. #5098 #5139
New option `datalad.locations.extra-procedures` specifies an additional location that should be searched for procedures. #5156
The class for handling configuration values, `ConfigManager`, now takes a lock before writes to allow for multiple processes to modify the configuration of a dataset. #4829
clone now records the original, unresolved URL for a subdataset under `submodule.<name>.datalad-url` in the parent's .gitmodules, enabling later get calls to use the original URL. This is particularly useful for `ria+` URLs. #5346
Installing a subdataset now uses custom handling rather than calling `git submodule update --init`. This avoids some locking issues when running get in parallel and enables more accurate source URLs to be recorded. #4853
The performance of the subdatasets command has been improved, with substantial speedups for recursive processing of many subdatasets. #4868 #5076
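The sketch below illustrates how the recorded key can be consumed. It reads a `.gitmodules` file and lists candidate source URLs, placing `submodule.<name>.datalad-url` ahead of the plain `url`. That ordering is an assumption, not a description of how get picks a source. Python's `configparser` only approximates Git's config syntax, and `candidate_urls` is a hypothetical helper.

```python
import configparser
from typing import List


def candidate_urls(gitmodules_text: str, name: str) -> List[str]:
    """Return source URLs for submodule `name`, preferring `datalad-url`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(gitmodules_text)
    section = f'submodule "{name}"'
    if not parser.has_section(section):
        raise KeyError(name)
    urls: List[str] = []
    # The original, unresolved URL first; the plain Git URL as a fallback.
    for key in ("datalad-url", "url"):
        value = parser.get(section, key, fallback=None)
        if value and value not in urls:
            urls.append(value)
    return urls
```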
Adding new subdatasets via save has been sped up. #4793
`GitRepo.get_content_info`, a helper that gets triggered by many commands, got faster by tweaking its `git ls-files` call. #5067
wtf now includes credentials-related information (e.g. active backends) in its output. #4982
The `call_git*` methods of `GitRepo` now have a `read_only` parameter. Callers can set this to `True` to promise that the provided command does not write to the repository, bypassing the cost of some checks and locking. #5070
New `call_annex*` methods in the `AnnexRepo` class provide an interface for running git-annex commands similar to that of the `GitRepo.call_git*` methods. #5163
It's now possible to register a custom metadata indexer that is discovered by search and used to generate an index. #4963
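The effect of the `read_only` flag can be shown with a toy class that takes a lock only for commands that may write. `ToyRepo` and its injected `runner` are hypothetical, which lets the example run without Git installed. The real `call_git*` methods may skip other checks as well.

```python
import threading
from typing import Callable, List, Sequence

Runner = Callable[[Sequence[str]], str]


class ToyRepo:
    """Hypothetical stand-in for a repository wrapper (not DataLad's GitRepo)."""

    def __init__(self, runner: Runner) -> None:
        self._runner = runner
        self._lock = threading.Lock()
        self.locked_calls = 0

    def call_git(self, args: List[str], read_only: bool = False) -> str:
        cmd = ["git", *args]
        if read_only:
            # The caller promises the command does not write: skip the lock.
            return self._runner(cmd)
        with self._lock:  # serialize commands that may modify the repository
            self.locked_calls += 1
            return self._runner(cmd)
```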
The `ConfigManager` methods `get`, `getbool`, `getfloat`, and `getint` now return a single value (with the same precedence as `git config --get`) when there are multiple values for the same key (in the non-committed git configuration, if the key is present there, or in the dataset configuration). For `get`, the old behavior can be restored by specifying `get_all=True`. #4924
Command-line scripts are now defined via the `entry_points` argument of `setuptools.setup` instead of the `scripts` argument. #4695
Interactive use of `--help` on the command line now invokes a pager on more systems and installation setups. #5344
The `datalad` special remote now tries to eliminate some unnecessary interactions with git-annex by being smarter about how it queries for URLs associated with a key. #4955
The `GitRepo` class now does a better job of handling bare repositories, a step toward supporting bare repositories in DataLad. #4911
More internal work to move the code base over to the new command runner. #4699 #4855 #4900 #4996 #5002 #5141 #5142 #5229
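Below is a toy model of the new return convention, assuming values are stored in precedence order. The default call returns the last value, as `git config --get` does. `get_all=True` reproduces the old behavior of returning a tuple when a key has several values. `ToyConfig` is hypothetical and ignores the split between Git and dataset configuration described above.

```python
from typing import Dict, List, Optional, Tuple, Union

Value = Union[str, Tuple[str, ...]]


class ToyConfig:
    """Hypothetical model of multi-valued keys (not DataLad's ConfigManager)."""

    def __init__(self) -> None:
        # Values per key in the order they were read; later ones take precedence.
        self._values: Dict[str, List[str]] = {}

    def add(self, key: str, value: str) -> None:
        self._values.setdefault(key, []).append(value)

    def get(self, key: str, default: Optional[str] = None,
            get_all: bool = False) -> Optional[Value]:
        values = self._values.get(key)
        if not values:
            return default
        if get_all:
            # Old behavior: a tuple if the key has several values.
            return tuple(values) if len(values) > 1 else values[0]
        # New default: one value; like `git config --get`, the last one wins.
        return values[-1]
```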
Files
Name | Size | Checksum |
---|---|---|
datalad/datalad-0.14.0rc1.zip | 1.9 MB | md5:16d97ec50ada69e0d1fa597c7f2dca5a |
Additional details
Related works
- Is supplement to
- https://github.com/datalad/datalad/tree/0.14.0rc1 (URL)
Funding
- CRCNS US-German Data Sharing: DataGit - converging catalogues, warehouses, and deployment logistics into a federated 'data distribution' (National Science Foundation, award 1429999)