Abstract
This section presents frequently asked questions (and their answers).
Q: What is the site identification (DQ2_LOCAL_SITE_ID) for?
A: We need a consistent view of all available sites to determine the best sources and protocols for a transfer. If you set the wrong site ID, dq2-get may perform abysmally and dq2-put will create the dataset in the wrong place. You can list all available sites with the dq2-sources command. Note that a few site names, such as CERN, are compound names that include all of their subsites (for example CERN-PROD_USERDISK). Never use a compound site for dq2-put operations.
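As an illustration of the compound-site rule, a client could refuse compound names before running a dq2-put. The sketch below is hypothetical: the COMPOUND_SITES table and the helper check_put_site are not part of DQ2, and a real client would build the table from the dq2-sources listing. Only the names CERN and CERN-PROD_USERDISK come from the answer above.

```python
# Hypothetical table of compound site names and their subsites. A real client
# would build this from dq2-sources; the FAQ names only one of CERN's subsites.
COMPOUND_SITES = {
    "CERN": {"CERN-PROD_USERDISK"},
}


def check_put_site(site_id: str) -> str:
    """Return site_id if it may be used with dq2-put, else raise ValueError."""
    subsites = COMPOUND_SITES.get(site_id)
    if subsites:
        raise ValueError(
            f"{site_id} is a compound site; use one of its subsites instead, "
            f"e.g. {sorted(subsites)[0]}"
        )
    return site_id
```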
Q: SRM metadata requests seem to hang or fail completely or return wrong values.
A: You are not using a VOMS-enabled proxy. Make sure you create your proxy with "voms-proxy-init -voms atlas" and not with grid-proxy-init.
Q: When I dq2-get a dataset, the files are transferred but then get deleted. What happened?
A: The downloaded files did not match the expected file size and/or checksum, so they were deleted. Please check your local storage and your network connection.
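A minimal sketch of the kind of check that leads to the deletion, assuming the expected size and an Adler-32 checksum are known for each file. The checksum type and the helper names adler32_of and verify_or_delete are assumptions for illustration, not dq2-get's actual code.

```python
import os
import zlib


def adler32_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute an Adler-32 checksum as an 8-digit lowercase hex string."""
    value = 1  # Adler-32 starting value
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            value = zlib.adler32(chunk, value)
    return f"{value & 0xffffffff:08x}"


def verify_or_delete(path: str, expected_size: int, expected_adler32: str) -> bool:
    """Keep the file only if size and (assumed) Adler-32 checksum both match."""
    ok = (
        os.path.getsize(path) == expected_size  # cheap check first
        and adler32_of(path) == expected_adler32.lower()
    )
    if not ok:
        os.remove(path)  # mismatching download is discarded
    return ok
```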
Q: I cannot create a dataset because it tells me that the dataset already exists. I checked but could not find it.
A: The dataset was deleted at some point. To guarantee full consistency, DQ2 does not allow a dataset name to be reused, even after the dataset has been deleted.
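The naming rule behaves like a registry that keeps a tombstone for every deleted dataset. This toy model (the class DatasetNameRegistry is hypothetical, not part of DQ2) shows why a deleted name can no longer be found but still cannot be reused.

```python
from typing import Set


class DatasetNameRegistry:
    """Toy model: dataset names stay reserved after deletion (hypothetical)."""

    def __init__(self) -> None:
        self._active: Set[str] = set()
        self._deleted: Set[str] = set()  # tombstones for deleted datasets

    def create(self, name: str) -> None:
        # A name clashes with both live and deleted datasets.
        if name in self._active or name in self._deleted:
            raise ValueError(f"dataset {name!r} already exists")
        self._active.add(name)

    def delete(self, name: str) -> None:
        self._active.remove(name)  # raises KeyError for an unknown name
        self._deleted.add(name)    # keep the name reserved

    def is_visible(self, name: str) -> bool:
        """What a lookup would show: only live datasets."""
        return name in self._active
```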
Q: The tools always say that I am at CERN-PROD_something when this is not the case.
A: You have to set the name of the site you consider local, either permanently through the environment variable DQ2_LOCAL_SITE_ID or per invocation with the -L option of the tool.
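The two ways of setting the local site can be sketched as a small lookup. The precedence shown (an explicit -L value overrides DQ2_LOCAL_SITE_ID) is an assumption, as is the helper name resolve_local_site; a result of None stands for the case where neither is set and the tool falls back to its own guess, which is presumably where an unwanted CERN-PROD_something site comes from.

```python
import os
from typing import Mapping, Optional


def resolve_local_site(cli_site: Optional[str] = None,
                       env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Pick the local site id (hypothetical helper).

    Assumed precedence: an explicit -L value wins over DQ2_LOCAL_SITE_ID.
    None means neither is set and the tool would fall back to its own guess.
    """
    if cli_site:
        return cli_site
    if env is None:
        env = os.environ
    return env.get("DQ2_LOCAL_SITE_ID") or None
```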
Q: dq2-get says 'starting transfer' but then nothing is happening.
A: Once dq2-get says 'starting transfer', the transfer command has been launched successfully. If the progress display does not start, the transfer command is still waiting for the remote site to respond. Progress is shown only once the first byte has been written to local disk, and the display updates only when it detects a change in the size of the local file. If it hangs for a long time and you are absolutely sure you are not waiting for a tape stage-in, you can cancel it with SIGINT (CTRL-C).
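The progress behaviour described above can be approximated by polling the local file. In this sketch the transfer is assumed to run in a separate process that writes directly to path, and is_done is a hypothetical callback reporting whether that process has exited; poll_progress is an illustrative name. Nothing is reported until the first byte is on disk, and a value is yielded only when the size changes.

```python
import os
import time
from typing import Callable, Iterator, Optional


def poll_progress(path: str, is_done: Callable[[], bool],
                  interval: float = 1.0) -> Iterator[int]:
    """Yield the local file size each time it changes (hypothetical helper).

    Nothing is yielded until the file exists and holds at least one byte.
    """
    last_size: Optional[int] = None
    while True:
        done = is_done()  # sample first so the final size is still reported
        size = os.path.getsize(path) if os.path.exists(path) else 0
        if size > 0 and size != last_size:
            last_size = size
            yield size
        if done:
            return
        time.sleep(interval)
```

A caller would print each yielded size; while the remote site has not yet sent anything, the loop simply waits without printing.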
Q: I wanted to get 'filename' but I see something like 'filename__DQ2-1192282702'. Is it the same file?
A: Someone outside the DQ2 infrastructure intervened manually and registered a filename that was already in use in the local file catalogues. The site services resolved the consistency problem by appending a timestamp to the name, so your data does not get lost. The extension is removed automatically once the file has been downloaded.
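Removing the extension after download amounts to stripping a trailing "__DQ2-" followed by digits. The regular expression below is an assumption inferred from the single example above (filename__DQ2-1192282702), and original_name is a hypothetical helper, not DQ2 code.

```python
import re

# Assumed suffix format: "__DQ2-" followed by a numeric timestamp at the end.
_DQ2_SUFFIX = re.compile(r"__DQ2-\d+$")


def original_name(name: str) -> str:
    """Return the filename without a trailing __DQ2-<timestamp> suffix."""
    return _DQ2_SUFFIX.sub("", name)
```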
Q: What is the algorithm to select a site to transfer from?
A: For now we only have a simple algorithm. First, we check whether there is a local replica; if there is, we use it. Otherwise, we collect the remote replicas on DISK sites and shuffle that set randomly, then collect the remote replicas on all other sites and shuffle that set as well. Finally, we take the first element of the combined list of sites. The random shuffle spreads transfers evenly, on average, across the available sites. Future improvements will be based on usage statistics.
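The selection order described above can be sketched as follows. How DQ2 classifies a site as DISK is not described here, so disk_sites is passed in as a plain set, and the combined list is assumed to put the DISK sites first; choose_source and its parameters are hypothetical names.

```python
import random
from typing import AbstractSet, Optional, Sequence


def choose_source(local_site: str, replica_sites: Sequence[str],
                  disk_sites: AbstractSet[str],
                  rng: Optional[random.Random] = None) -> Optional[str]:
    """Pick a source site: local first, then shuffled DISK, then shuffled others."""
    if rng is None:
        rng = random.Random()
    if local_site in replica_sites:
        return local_site  # a local replica always wins
    disk = [s for s in replica_sites if s in disk_sites]
    other = [s for s in replica_sites if s not in disk_sites]
    rng.shuffle(disk)   # spread load across DISK sites
    rng.shuffle(other)  # and across the remaining sites
    candidates = disk + other
    return candidates[0] if candidates else None
```

Because the shuffled DISK list comes first in this sketch, a non-DISK site is chosen only when no DISK replica exists.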
Q: After some data has been transferred, the transfer grinds to a halt and does not continue.
A: This is a problem with the external transfer tools, and there is not much we can do about it. Restart dq2-get: files that were already transferred are validated with a deep check, and only the missing files are transferred again. Since most of the external tools do not support resumable transfers, each file that did not finish has to be restarted from the beginning.
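Restarting therefore amounts to checking each expected file and re-queuing the ones that fail. In this sketch, passes_deep_check is a hypothetical callback standing in for dq2-get's deep check, and plan_restart is an illustrative name; deleting the partial copy before re-fetching it from byte 0 is an assumption that reflects the lack of resume support.

```python
import os
from typing import Callable, Iterable, List


def plan_restart(paths: Iterable[str],
                 passes_deep_check: Callable[[str], bool]) -> List[str]:
    """Return the files that must be transferred again (hypothetical helper)."""
    todo = []
    for path in paths:
        if os.path.exists(path):
            if passes_deep_check(path):
                continue  # complete and valid: nothing to do
            os.remove(path)  # partial copy: no resume, so start over from byte 0
        todo.append(path)
    return todo
```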
Q: It seems that SRM does not return any values.
A: Most probably the remote SRM endpoint is currently down. There is nothing we can do about that.
Q: I am trying to get a dataset from a gsiftp endpoint, and although the tool finishes, no files have been downloaded.
A: Please run the copy command manually (it is printed in the tool log). If you see the message "Gridmap lookup failure", contact the site administrator and ask them to add your certificate DN to the gridmap file.
Q: How does the -T/--threads option in dq2-get work?
A: You can specify the number of dataset threads and file threads, as in "--threads=2,5". This means that at most 2 datasets are processed concurrently, and within each dataset at most 5 files are processed concurrently, so this setting spawns 2*5 = 10 worker threads. The default is 3,3. Remember that more threads do not necessarily mean better performance; as a rule of thumb, do not spawn more than numberOfCPUs*5 threads in total.
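The arithmetic behind the -T/--threads option can be sketched as follows. The helpers parse_threads, total_workers and within_rule_of_thumb are hypothetical; only the "DATASETS,FILES" format, the 3,3 default and the numberOfCPUs*5 guideline come from the answer above.

```python
import os
from typing import Optional, Tuple

DEFAULT_THREADS = (3, 3)  # (dataset threads, file threads), the documented default


def parse_threads(spec: str) -> Tuple[int, int]:
    """Parse a "DATASETS,FILES" value such as "2,5" (hypothetical parser)."""
    parts = spec.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 'DATASETS,FILES', got {spec!r}")
    datasets, files = (int(p) for p in parts)
    if datasets < 1 or files < 1:
        raise ValueError("thread counts must be positive")
    return datasets, files


def total_workers(datasets: int, files: int) -> int:
    """Upper bound on concurrent worker threads."""
    return datasets * files


def within_rule_of_thumb(total: int, cpus: Optional[int] = None) -> bool:
    """True if total does not exceed numberOfCPUs * 5."""
    if cpus is None:
        cpus = os.cpu_count() or 1
    return total <= cpus * 5
```

For example, "2,5" gives 10 workers, which stays within the guideline on a 2-CPU machine (limit 10), while the default 3,3 gives 9.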