Abstract
This section describes how to use the DQ2 enduser tools.
dq2-ls [-h/--help | options] PATTERN
List information about datasets matching a given pattern. The pattern may contain wildcards. By default it will use complete and incomplete datasets.
Options:
* --version
Give program version.
* -h, --help
Print online help.
* -v VERSION, --dataset-version=VERSION
Specify dataset version to use. Version 0 points to the latest version.
* -f, --files
List the files in the dataset. Each output line has the form:
[ ] or [X], filename, GUID, checksum, filesize
where [X] means the file is available locally and [ ] means it is not.
* -p, --pfn
Display the PFN (in addition to -f/--files) for the current site. Use it like: -fp
* -P, --pool
Create a PoolFileCatalog.xml for the given pattern
* -r, --replicas
List replicas of the dataset.
* -L LOCAL, --local-site=LOCAL
Override environment variable DQ2_LOCAL_SITE_ID.
* -s SITE, --site=SITE
List datasets in site.
* -n, --number
List number of files in dataset.
* -i, --incomplete
Only use incomplete datasets.
* -c, --complete
Only use complete datasets.
* -d, --debug
Print debug information.
PFNs: Files that are not locally available will not be displayed, since there is no PFN!
Please note that dataset names are case-insensitive.
$ dq2-ls *.mlassnig.dataset.cernprod.*
user.mlassnig.dataset.CERNPROD.2
user.mlassnig.dataset.CERNPROD.1
user.mlassnig.dataset.CERNPROD.44
user.mlassnig.dataset.cernprod.1.001
user.mlassnig.dataset.cernprod.78
user.mlassnig.dataset.cernprod.79
user.mlassnig.dataset.CERNPROD.77
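If you want to pre-filter a list of dataset names in a script the same way, the case-insensitive wildcard match can be approximated as follows. This is only a sketch: the helper name matches_dataset is hypothetical, and the real matching is done by the DQ2 catalogues and may differ in detail.

```python
from fnmatch import fnmatchcase


def matches_dataset(pattern: str, name: str) -> bool:
    """Case-insensitive shell-style wildcard match (illustrative only)."""
    # Lower-case both sides so the comparison ignores case on every platform.
    return fnmatchcase(name.lower(), pattern.lower())


names = [
    "user.mlassnig.dataset.CERNPROD.2",
    "user.mlassnig.dataset.cernprod.78",
    "user.mlassnig.dataset.1",
]
selected = [n for n in names if matches_dataset("*.mlassnig.dataset.cernprod.*", n)]
print(selected)
```

The first two names are selected regardless of their case; the third does not contain the "cernprod" component and is skipped.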
$ dq2-ls -r user.mlassnig.dataset.1
user.mlassnig.dataset.1
INCOMPLETE: AGLT2
COMPLETE: ASGCDISK_V2 TRIUMFDISK BNLDISK PICDISK CNAFDISK RALDISK NDGFT1DISK LYONDISK FZKDISK SARADISK NDGFT1TAPE BNLTAPE TOKYO GLASGOW CYF UVIC HEPHY-UIBK BU CERNPROD SHEF RAL-LCG2_DATADISK RAL-LCG2_MCDISK ROMA1
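A replica listing like the one above can be split into site lists with a few lines of Python. The sketch assumes that each status (COMPLETE, INCOMPLETE) appears on its own line followed by space-separated site names; the function name parse_replicas is hypothetical and the sample is shortened.

```python
def parse_replicas(output: str) -> dict:
    """Split 'dq2-ls -r' style output into site lists keyed by status."""
    replicas = {"COMPLETE": [], "INCOMPLETE": []}
    for line in output.splitlines():
        status, sep, sites = line.partition(":")
        status = status.strip()
        # Only lines of the form "STATUS: site site ..." are of interest.
        if sep and status in replicas:
            replicas[status].extend(sites.split())
    return replicas


# Shortened sample in the assumed layout (one status per line).
sample = """user.mlassnig.dataset.1
INCOMPLETE: AGLT2
COMPLETE: ASGCDISK_V2 TRIUMFDISK BNLDISK
"""
replicas = parse_replicas(sample)
print(replicas["COMPLETE"])
```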
$ dq2-ls -f user.mlassnig.dataset.1
user.mlassnig.dataset.1
[X] dummyfile1 35aeb84a-aeef-41f4-bd2f-616972f4cd69 md5:bf7700bd815231d79ec96613f35e175c 52428800
[X] dummyfile2 0c752981-0432-48e6-84b4-54baa245a61a md5:6b2c90fa80e7160b1ff615c606796acb 52428800
[X] dummyfile3 b712f157-e5da-4946-983e-536a97473652 md5:8e46d1a0ebd2de342df654e18737f849 52428800
total files: 3
local files: 3
total size: 157286400
date: 2007-10-11 12:23:39
A file marked with [X] is available locally. The mark is followed by the filename, GUID, checksum and filesize. Summary information is given at the bottom, including the date of last modification.
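If you need the file listing in a script, the line layout described here (mark, filename, GUID, checksum, filesize) can be parsed as sketched below. The names FileEntry and parse_file_listing are hypothetical, and the sketch assumes one file per output line.

```python
from typing import List, NamedTuple


class FileEntry(NamedTuple):
    local: bool      # True for [X], False for [ ]
    name: str
    guid: str
    checksum: str
    size: int        # filesize as printed by dq2-ls


def parse_file_listing(output: str) -> List[FileEntry]:
    """Collect the per-file lines of 'dq2-ls -f' style output."""
    entries = []
    for line in output.splitlines():
        line = line.strip()
        if not (line.startswith("[X]") or line.startswith("[ ]")):
            continue  # dataset name and summary lines
        fields = line[3:].split()
        if len(fields) != 4:
            continue  # not in the assumed layout, skip it
        name, guid, checksum, size = fields
        entries.append(FileEntry(line.startswith("[X]"), name, guid, checksum, int(size)))
    return entries


sample = """user.mlassnig.dataset.1
[X] dummyfile1 35aeb84a-aeef-41f4-bd2f-616972f4cd69 md5:bf7700bd815231d79ec96613f35e175c 52428800
[X] dummyfile2 0c752981-0432-48e6-84b4-54baa245a61a md5:6b2c90fa80e7160b1ff615c606796acb 52428800
[X] dummyfile3 b712f157-e5da-4946-983e-536a97473652 md5:8e46d1a0ebd2de342df654e18737f849 52428800
total files: 3
"""
entries = parse_file_listing(sample)
print(len(entries), sum(e.size for e in entries))
```

The summed sizes agree with the "total size" line of the example above.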
$ dq2-ls -f -p user.mlassnig.usr.test.2
user.mlassnig.usr.test.2
srm://srm-atlas.cern.ch/castor/cern.ch/grid/atlas/tzero/atlasusertape//user/mlassnig/user.mlassnig.usr.test.2/user.mlassnig.usr.test.2._dummy1
srm://srm-atlas.cern.ch/castor/cern.ch/grid/atlas/tzero/atlasusertape//user/mlassnig/user.mlassnig.usr.test.2/user.mlassnig.usr.test.2._dummy2
total files: 2
total size: 20971520
date: 2008-05-14 08:25:05
With -p, the PFN of each locally available file is printed. Summary information is given at the bottom, with the date of last modification.
$ dq2-ls -P user.mlassnig.usr.test.*
Querying DQ2 central catalogues to resolve datasetname user.mlassnig.usr.test.*
Processing user.mlassnig.usr.test.4 with PoolFileCatalog.xml
Processing user.mlassnig.usr.test.3 with PoolFileCatalog.xml
Processing user.mlassnig.usr.test.2 with PoolFileCatalog.xml
Processing user.mlassnig.usr.test.1 with PoolFileCatalog.xml
This will create a new PoolFileCatalog.xml and add all files from the given dataset pattern to the PoolFileCatalog.xml. The GUIDs will be unique, so there will be no duplicate entries in the catalogue. If the PoolFileCatalog.xml is empty, please check that a local replica is available at your site. If not, you have to explicitly select the site that you want to use with the -L option.
dq2-get [-h/--help | options] PATTERN
Get a dataset to local storage, with the datasetname matching a pattern. The pattern may contain wildcards.
Options:
* --version
Give program version.
* -h, --help
Print online help.
* -v VERSION, --dataset-version=VERSION
Specify dataset version to use. Version 0 points to the latest version.
* -L LOCAL, --local-site=LOCAL
Override environment variable DQ2_LOCAL_SITE_ID.
* -d, --debug
Print debug information.
* -D, --no-directories
Do not create directories for datasets. Instead put all of them in the current directory. Take care, as this might overwrite files.
* -F READFROMFILE, --read-from-file=READFROMFILE
Read datasetnames (one datasetname per line) from a textfile.
* -s REMOTE, --site=REMOTE
Specify remote site to get the dataset from.
* -i, --incomplete
Only use incomplete replicas.
* -c, --complete
Only use complete replicas.
* -n NSAMPLE, --nsamples=NSAMPLE
Specify NSAMPLE number of random files to download from the dataset.
* -f FILES, --files=FILES
Specify a comma-separated list (no blanks) of filenames from the dataset. Wildcards are allowed. Only those files will be downloaded.
* -p PROTOCOL, --protocol=PROTOCOL
Specify a specific protocol to use. Available are: lcg (LCG), srm (SRM), ng (NorduGrid), castor and rfio (CASTOR), dcap (dCache), posix (POSIX), xrd (xrootd)
Use this option with care: the system normally figures out the best protocol by itself.
* -P, --pool-xml
After successfully getting all files, create a PoolFileCatalog.xml with all the files. If you get multiple datasets, all files from all datasets will go to one PoolFileCatalog.xml.
* -S TOSTORAGE, --to-storage=TOSTORAGE
Get the files directly to the specified mass storage directory. Make sure that the directory exists and is writable by your user. You have to give the destination directory as an SRM URL, e.g. srm://srm-atlas.cern.ch:8443/srm/managerv2?SFN=/castor/cern.ch/grid/atlas/user/mlassnig/testdirectory
* -H TODIRECTORY, --to-here=TODIRECTORY
Get everything into this directory, defined by an absolute path. If the directory does not exist, it will be created.
* -V, --skip-check
If you are downloading a dataset more than once, this option will inhibit verification of already downloaded files.
* -t TIMEOUT, --timeout=TIMEOUT
Specify a timeout to wait on stalled transfers in seconds. Default is 60.
* -T THREADS, --threads=THREADS
Specify the number of concurrent dataset/file threads in D,F notation. Default is 3,3 (meaning 3 datasets can be downloaded concurrently with 3 concurrent file transfers per dataset).
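When dq2-get is called from a script, the option values have to follow the formats listed above: a comma-separated file list without blanks for -f, the D,F notation for -T, seconds for -t and an absolute path for -H. The sketch below only assembles an argument list from these documented options; the function name build_dq2_get_args is hypothetical and nothing is executed.

```python
import posixpath
from typing import List, Optional, Sequence, Tuple


def build_dq2_get_args(dataset: str,
                       files: Optional[Sequence[str]] = None,
                       threads: Tuple[int, int] = (3, 3),
                       timeout: int = 60,
                       to_here: Optional[str] = None) -> List[str]:
    """Assemble a dq2-get argument list from the documented options."""
    args = ["dq2-get"]
    if files:
        for name in files:
            # -f takes a comma-separated list without blanks.
            if "," in name or " " in name:
                raise ValueError("file names must not contain commas or blanks")
        args += ["-f", ",".join(files)]
    datasets, transfers = threads
    if datasets < 1 or transfers < 1:
        raise ValueError("thread counts must be positive")
    args += ["-T", "%d,%d" % (datasets, transfers)]  # D,F notation
    args += ["-t", str(timeout)]                     # stalled-transfer timeout in seconds
    if to_here is not None:
        if not posixpath.isabs(to_here):
            raise ValueError("-H expects an absolute path")
        args += ["-H", to_here]
    args.append(dataset)
    return args


args = build_dq2_get_args("user.mlassnig.dataset.1", files=["*2"],
                          threads=(2, 4), timeout=120, to_here="/tmp/mlassnig")
print(args)
```

Handing such a list to subprocess.run without a shell has the side effect that wildcards such as *2 reach dq2-get unexpanded.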
This is the preferred way to get datasets. The system will figure out everything on its own and you don't have to do anything. Note that you can get different output from the transfer-commands depending on the locality of your shell and the dataset.
$ dq2-get user.mlassnig.dataset.1
Querying DQ2 central catalogues to resolve datasetname user.mlassnig.dataset.1
Datasets found: 1
user.mlassnig.dataset.1: Querying DQ2 central catalogues for replicas...
Querying DQ2 central catalogues for files in dataset...
user.mlassnig.dataset.1: Using site CERNPROD
user.mlassnig.dataset.1: Querying local file catalogue of site CERNPROD...
user.mlassnig.dataset.1/dummyfile3: Getting metadata for srm://srm.cern.ch:8443/castor/cern.ch/grid/atlas/dq2/user/user.mlassnig.dataset.1/dummyfile3
user.mlassnig.dataset.1/dummyfile2: Getting metadata for srm://srm.cern.ch:8443/castor/cern.ch/grid/atlas/dq2/user/user.mlassnig.dataset.1/dummyfile2
user.mlassnig.dataset.1/dummyfile1: Getting metadata for srm://srm.cern.ch:8443/castor/cern.ch/grid/atlas/dq2/user/user.mlassnig.dataset.1/dummyfile1
user.mlassnig.dataset.1/dummyfile1: is cached at source.
user.mlassnig.dataset.1/dummyfile1: Starting transfer: rfcp /castor/cern.ch/grid/atlas/dq2/user/user.mlassnig.dataset.1/dummyfile1 /tmp/mlassnig/user.mlassnig.dataset.1/dummyfile1
user.mlassnig.dataset.1/dummyfile3: is cached at source.
user.mlassnig.dataset.1/dummyfile3: Starting transfer: rfcp /castor/cern.ch/grid/atlas/dq2/user/user.mlassnig.dataset.1/dummyfile3 /tmp/mlassnig/user.mlassnig.dataset.1/dummyfile3
user.mlassnig.dataset.1/dummyfile2: is cached at source.
user.mlassnig.dataset.1/dummyfile2: Starting transfer: rfcp /castor/cern.ch/grid/atlas/dq2/user/user.mlassnig.dataset.1/dummyfile2 /tmp/mlassnig/user.mlassnig.dataset.1/dummyfile2
user.mlassnig.dataset.1/dummyfile3: 3014656/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 0/52428800 transferred
user.mlassnig.dataset.1/dummyfile1: 32768000/52428800 transferred
user.mlassnig.dataset.1/dummyfile3: 35520512/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 38141952/52428800 transferred
user.mlassnig.dataset.1/dummyfile1: 52428800/52428800 transferred
user.mlassnig.dataset.1/dummyfile3: 52428800/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 52428800/52428800 transferred
user.mlassnig.dataset.1/dummyfile1: validated
user.mlassnig.dataset.1/dummyfile2: validated
user.mlassnig.dataset.1/dummyfile3: validated
Finished
You can see the output for all concurrent transfers. If something is stuck, please see the FAQ for more information.
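To keep an eye on long transfers from a script, the "N/M transferred" progress lines can be reduced to a percentage per file. This is a sketch under the assumption that every progress report is printed on its own line in the form shown in the example; parse_progress and latest_progress are hypothetical names.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# "<dataset>/<file>: <done>/<total> transferred"
PROGRESS = re.compile(r"^(?P<path>\S+): (?P<done>\d+)/(?P<total>\d+) transferred$")


def parse_progress(line: str) -> Optional[Tuple[str, int, int]]:
    """Return (path, bytes done, bytes total) or None for other lines."""
    match = PROGRESS.match(line.strip())
    if match is None:
        return None
    return match["path"], int(match["done"]), int(match["total"])


def latest_progress(lines: Iterable[str]) -> Dict[str, float]:
    """Keep the most recent percentage reported for each file."""
    latest = {}
    for line in lines:
        parsed = parse_progress(line)
        if parsed is not None:
            path, done, total = parsed
            latest[path] = 100.0 * done / total if total else 0.0
    return latest


log = [
    "user.mlassnig.dataset.1/dummyfile3: 3014656/52428800 transferred",
    "user.mlassnig.dataset.1/dummyfile1: 32768000/52428800 transferred",
    "user.mlassnig.dataset.1/dummyfile1: 52428800/52428800 transferred",
    "user.mlassnig.dataset.1/dummyfile1: validated",
]
progress = latest_progress(log)
print(progress)
```

A file whose percentage stops changing for longer than the timeout given with -t is a candidate for the stalled transfers mentioned above.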
This will download all files ending with "2" from the dataset.
$ dq2-get -f *2 user.mlassnig.dataset.1
Querying DQ2 central catalogues to resolve datasetname user.mlassnig.dataset.1
Datasets found: 1
user.mlassnig.dataset.1: Querying DQ2 central catalogues for replicas...
Querying DQ2 central catalogues for files in dataset...
user.mlassnig.dataset.1: Using site CERNPROD
user.mlassnig.dataset.1: Querying local file catalogue of site CERNPROD...
user.mlassnig.dataset.1/dummyfile2: Getting metadata for srm://srm.cern.ch:8443/castor/cern.ch/grid/atlas/dq2/user/user.mlassnig.dataset.1/dummyfile2
user.mlassnig.dataset.1/dummyfile2: is cached at source.
user.mlassnig.dataset.1/dummyfile2: Starting transfer: rfcp /castor/cern.ch/grid/atlas/dq2/user/user.mlassnig.dataset.1/dummyfile2 /tmp/mlassnig/user.mlassnig.dataset.1/dummyfile2
user.mlassnig.dataset.1/dummyfile2: 10485760/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 52428800/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: validated
Finished
This will download the dataset into a directory on mass storage. This can be any path that is addressable by SRM, so you have to give the destination path as an SRM URL. Make sure that the directory exists and is writable by you; if it is not, the transfer will fail. Also take care of shell wildcard expansion: characters like ? might get expanded by your shell, so put the SRM string in quotes ("...").
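If the command line is generated by a script, the quoting can be left to Python's standard shlex module. The sketch below uses the SRM URL from the -S option description as sample data; note that shlex.quote produces single quotes, which protect the ? from the shell just as double quotes do.

```python
import shlex

# Sample destination taken from the -S option description.
srm_url = ("srm://srm-atlas.cern.ch:8443/srm/managerv2"
           "?SFN=/castor/cern.ch/grid/atlas/user/mlassnig/testdirectory")
dataset = "user.mlassnig.usr.test.2"

# shlex.quote wraps the URL in quotes because it contains the shell
# wildcard character "?"; the plain dataset name is left untouched.
command = " ".join(["dq2-get", "-S", shlex.quote(srm_url), dataset])
print(command)
```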
The following code snippet shows how to create a directory on CASTOR at CERN in a private area (unmanaged by DQ2), make it group writable (mandatory for CASTOR at CERN) and then dq2-get a dataset into it.
$ rfmkdir /castor/cern.ch/grid/atlas/users/mlassnig/tutorial
$ rfchmod 775 /castor/cern.ch/grid/atlas/users/mlassnig/tutorial
$ nsls -l /castor/cern.ch/grid/atlas/users/mlassnig/
drwxrwxr-x 0 mlassnig zp 0 Jul 02 11:20 tutorial
$ dq2-get -S "srm://srm-atlas.cern.ch:8443/castor/cern.ch/grid/atlas/users/mlassnig/put_tutorial" user.mlassnig.usr.test.2
Querying DQ2 central catalogues to resolve datasetname user.mlassnig.usr.test.2
Datasets found: 1
user.mlassnig.usr.test.2: Querying DQ2 central catalogues for replicas...
Querying DQ2 central catalogues for files in dataset...
user.mlassnig.usr.test.2: Using site CERN-PROD_USERTAPE
user.mlassnig.usr.test.2: Querying local file catalogue of site CERN-PROD_USERTAPE...
user.mlassnig.usr.test.2/user.mlassnig.usr.test.2._dummy1: Getting metadata for srm://srm-atlas.cern.ch:8443/srm/managerv2?SFN=/castor/cern.ch/grid/atlas/tzero/atlasusertape//user/mlassnig/user.mlassnig.usr.test.2/user.mlassnig.usr.test.2._dummy1
user.mlassnig.usr.test.2/user.mlassnig.usr.test.2._dummy2: Getting metadata for srm://srm-atlas.cern.ch:8443/srm/managerv2?SFN=/castor/cern.ch/grid/atlas/tzero/atlasusertape//user/mlassnig/user.mlassnig.usr.test.2/user.mlassnig.usr.test.2._dummy2
user.mlassnig.usr.test.2/user.mlassnig.usr.test.2._dummy2: is cached at source.
user.mlassnig.usr.test.2/user.mlassnig.usr.test.2._dummy2: Starting transfer: lcg-cp -v --vo atlas srm://srm-atlas.cern.ch:8443/srm/managerv2?SFN=/castor/cern.ch/grid/atlas/tzero/atlasusertape//user/mlassnig/user.mlassnig.usr.test.2/user.mlassnig.usr.test.2._dummy2 srm://srm-atlas.cern.ch:8443/castor/cern.ch/grid/atlas/users/mlassnig/tutorial//user.mlassnig.usr.test.2._dummy2
user.mlassnig.usr.test.2/user.mlassnig.usr.test.2._dummy1: is cached at source.
user.mlassnig.usr.test.2/user.mlassnig.usr.test.2._dummy1: Starting transfer: lcg-cp -v --vo atlas srm://srm-atlas.cern.ch:8443/srm/managerv2?SFN=/castor/cern.ch/grid/atlas/tzero/atlasusertape//user/mlassnig/user.mlassnig.usr.test.2/user.mlassnig.usr.test.2._dummy1 srm://srm-atlas.cern.ch:8443/castor/cern.ch/grid/atlas/users/mlassnig/tutorial//user.mlassnig.usr.test.2._dummy1
user.mlassnig.usr.test.2/user.mlassnig.usr.test.2._dummy2: directly put to mass storage, skipping validation
user.mlassnig.usr.test.2/user.mlassnig.usr.test.2._dummy1: directly put to mass storage, skipping validation
Finished
$ nsls -l /castor/cern.ch/grid/atlas/users/mlassnig/tutorial
-rwxrwxr-- 1 atlas004 zp 10485760 Jul 02 11:23 user.mlassnig.usr.test.2._dummy1
-rwxrwxr-- 1 atlas004 zp 10485760 Jul 02 11:23 user.mlassnig.usr.test.2._dummy2
If there are no files in the directory after this operation, check your permissions on the directory. For CASTOR at CERN this means making the directory group writable (as shown in the example above)!
This will force the system to download all files ending in "2" of the dataset from the NorduGrid Tier1. Please note that manually selecting a remote site is discouraged, as it causes load spikes at that site; it is highly recommended to let dq2-get figure out which remote site to use.
$ dq2-get -s NDGF-T1_DATADISK -f *2 user.mlassnig.dataset.1
Querying DQ2 central catalogues to resolve datasetname user.mlassnig.dataset.1
Datasets found: 1
user.mlassnig.dataset.1: Querying DQ2 central catalogues for replicas...
user.mlassnig.dataset.1: Using complete replica at given site
Querying DQ2 central catalogues for files in dataset...
user.mlassnig.dataset.1: Using site NDGF-T1_DATADISK
user.mlassnig.dataset.1: Querying local file catalogue of site NDGF-T1_DATADISK...
user.mlassnig.dataset.1/dummyfile2: Getting metadata for srm://srm.ndgf.org:8443/pnfs/ndgf.org/data/atlas/disk/user/user.mlassnig.dataset.1/dummyfile2
user.mlassnig.dataset.1/dummyfile2: is cached at source.
user.mlassnig.dataset.1/dummyfile2: Starting transfer: lcg-cp -v --vo atlas srm://srm.ndgf.org:8443/pnfs/ndgf.org/data/atlas/disk/user/user.mlassnig.dataset.1/dummyfile2 file:///tmp/mlassnig/user.mlassnig.dataset.1/dummyfile2
user.mlassnig.dataset.1/dummyfile2: 0/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 2097152/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 5242880/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 8388608/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 10485760/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 13631488/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 16777216/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 19922944/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 22425600/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 25165824/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 28311552/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 31457280/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 34603008/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 36831232/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 39845888/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 42991616/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 46137344/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 49283072/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 51380224/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: 52428800/52428800 transferred
user.mlassnig.dataset.1/dummyfile2: validated
Finished
As you can see, the system automatically figured out how to get the file from the remote site.
dq2-put [-h/--help | options] DATASET
Create a dataset from directories or files on your local disk (this will put the files on a mass storage system and register them in the local catalogues). If a file has '.pool.root' in its name, the GUID will be extracted from the file using Athena.
Options:
* --version
Give program version.
* -h, --help
Print online help.
* -d, --debug
Print debug information.
* -a, --automatic
Automatic mode. Does not ask for confirmation.
* -L LOCAL, --local-site=LOCAL
Override environment variable DQ2_LOCAL_SITE_ID.
* -l, --long-surls
Instead of registering compact SURLs to the catalogues, include the port and the srm server string as well.
* -f FILES, --files=FILES
Specify a comma-separated list (no blanks) of filenames from the dataset. Wildcards are allowed. Only those files will be put.
* -C, --do-not-close
Do not close the dataset when the operation finishes. Take care, as this leaves the dataset marked as incomplete.
* -F, --freeze
When the operation finishes successfully, freeze the dataset. Take care, as you cannot edit the dataset after that.
* -p PROTOCOL, --protocol=PROTOCOL
Specify a specific protocol to use. Available are: lcg (LCG), srm (SRM), ng (NorduGrid), castor and rfio (CASTOR), dcap (dCache), posix (POSIX), xrd (xrootd).
Use this option with care: the system normally figures out the best protocol by itself.
* -D DATASETSOURCE, --dataset-source=DATASETSOURCE
Source dataset containing the files for the dataset. If the source is a directory, use -s/--source instead.
Automatic file selection via LFNs and option -f is supported.
* -s SOURCE, --source=SOURCE
Gives the name of the source directory to use. If this is not given, the files in the current directory will be put.
Attention:
If you want to register pool files, you need to have Athena set up correctly!
You have a directory of files on your local disk and you want to create a dataset out of those.
directoryWithFiles/dummyfile1
                  /dummyfile2
You execute:
dq2-put -s directoryWithFiles user.mlassnig.my.new.dataset.1
dq2-put will ask you the following, unless you provide the -a/--automatic switch:
user.mlassnig.my.new.dataset.1 -- confirm creating dataset (2 files):
user.mlassnig.my.new.dataset.1._dummyfile1
user.mlassnig.my.new.dataset.1._dummyfile2
Confirm (y/n)?
Now you can confirm, and the files will be put on your local mass storage and registered in the LFC and in the DQ2 central catalogues. You will see some output on the progress of the operation:
Confirmed
nsmkdir -p /castor/cern.ch/grid/atlas/dq2/user/mlassnig/user.mlassnig.my.new.dataset.1: starting
nsmkdir -p /castor/cern.ch/grid/atlas/dq2/user/mlassnig/user.mlassnig.my.new.dataset.1: done
rfcp /tmp/mlassnig/directoryWithFiles/dummyfile1 /castor/cern.ch/grid/atlas/dq2/user/mlassnig/user.mlassnig.my.new.dataset.1/user.mlassnig.my.new.dataset.1._dummyfile1: starting
rfcp /tmp/mlassnig/directoryWithFiles/dummyfile2 /castor/cern.ch/grid/atlas/dq2/user/mlassnig/user.mlassnig.my.new.dataset.1/user.mlassnig.my.new.dataset.1._dummyfile2: starting
rfcp /tmp/mlassnig/directoryWithFiles/dummyfile2 /castor/cern.ch/grid/atlas/dq2/user/mlassnig/user.mlassnig.my.new.dataset.1/user.mlassnig.my.new.dataset.1._dummyfile2: done
rfcp /tmp/mlassnig/directoryWithFiles/dummyfile1 /castor/cern.ch/grid/atlas/dq2/user/mlassnig/user.mlassnig.my.new.dataset.1/user.mlassnig.my.new.dataset.1._dummyfile1: done
user.mlassnig.my.new.dataset.1._dummyfile1: verifying
user.mlassnig.my.new.dataset.1._dummyfile2: verifying
user.mlassnig.my.new.dataset.1._dummyfile1: registered in LFC (2e4031dd-2769-4a0a-83c4-f884538bdab0)
user.mlassnig.my.new.dataset.1._dummyfile2: registered in LFC (d5d4ed72-b54f-48a5-bca6-1480129bd272)
Dataset user.mlassnig.my.new.dataset.1 registered with version 1 at location CERNPROD with status CLOSED
Finished
If you add a file to the source directory and put the directory again, this will create a new version of the dataset containing the added file. For example, we add dummyfile3:
directoryWithFiles/dummyfile1
                  /dummyfile2
                  /dummyfile3
Executing dq2-put -s directoryWithFiles user.mlassnig.my.new.dataset.1 will look through changed files and update accordingly:
Dataset user.mlassnig.my.new.dataset.1 exists already, confirm to create new version 2 and add files (y/n)? y
user.mlassnig.my.new.dataset.1 -- confirm versioning dataset (1 file):
user.mlassnig.my.new.dataset.1._dummyfile3
Confirm (y/n)?
If you confirm, the files will get added to local storage, the LFC and the existing dataset:
Confirmed
nsmkdir -p /castor/cern.ch/grid/atlas/dq2/user/mlassnig/user.mlassnig.my.new.dataset.1: starting
nsmkdir -p /castor/cern.ch/grid/atlas/dq2/user/mlassnig/user.mlassnig.my.new.dataset.1: done
rfcp /tmp/mlassnig/directoryWithFiles/dummyfile3 /castor/cern.ch/grid/atlas/dq2/user/mlassnig/user.mlassnig.my.new.dataset.1/user.mlassnig.my.new.dataset.1._dummyfile3: starting
rfcp /tmp/mlassnig/directoryWithFiles/dummyfile3 /castor/cern.ch/grid/atlas/dq2/user/mlassnig/user.mlassnig.my.new.dataset.1/user.mlassnig.my.new.dataset.1._dummyfile3: done
user.mlassnig.my.new.dataset.1._dummyfile3: verifying
user.mlassnig.my.new.dataset.1._dummyfile3: registered in LFC (15938689-4856-4bed-8452-4ea5fdba44af)
Dataset user.mlassnig.my.new.dataset.1 registered with version 2 at location CERNPROD with status CLOSED
Finished
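If you want to keep track of the GUIDs assigned during a dq2-put, the "registered in LFC" lines of the output can be collected as sketched below. The sketch assumes one message per line in the form shown in the examples; the function name registered_guids is hypothetical.

```python
import re
from typing import Dict

# "<LFN>: registered in LFC (<GUID>)"
REGISTERED = re.compile(
    r"^(?P<lfn>\S+): registered in LFC \((?P<guid>[0-9a-fA-F-]{36})\)$")


def registered_guids(log: str) -> Dict[str, str]:
    """Map each registered LFN to the GUID reported by dq2-put."""
    guids = {}
    for line in log.splitlines():
        match = REGISTERED.match(line.strip())
        if match is not None:
            guids[match["lfn"]] = match["guid"]
    return guids


log = """user.mlassnig.my.new.dataset.1._dummyfile1: verifying
user.mlassnig.my.new.dataset.1._dummyfile1: registered in LFC (2e4031dd-2769-4a0a-83c4-f884538bdab0)
user.mlassnig.my.new.dataset.1._dummyfile3: registered in LFC (15938689-4856-4bed-8452-4ea5fdba44af)
Finished
"""
guids = registered_guids(log)
print(guids)
```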