This script allows for simple downloading of the VAST Pilot survey from Dropbox.
Features:
- State which Epochs are available.
- Flexible download of data according to the user's needs.
This script requires you to have a Dropbox App 'access token'. You obtain one by creating an 'app' on your Dropbox account and then generating an OAuth token for that app.
This tutorial shows you how to obtain one: http://99rabbits.com/get-dropbox-access-token/. Make sure you select the Full Dropbox
option in the access section.
Otherwise, the requirements installed from the main repo cover all the Python dependencies.
You also need to know the shared Dropbox URL of the Pilot survey and the password.
usage: get_vast_pilot_dbx.py [-h] [--dropbox-config DROPBOX_CONFIG]
[--output OUTPUT] [--available-epochs]
[--available-files AVAILABLE_FILES]
[--get-available-files] [--download]
[--find-fields-input FIND_FIELDS_INPUT]
[--user-files-list USER_FILES_LIST]
[--only-epochs ONLY_EPOCHS]
[--only-fields ONLY_FIELDS] [--stokes STOKES]
[--skip-xml] [--skip-txt] [--skip-qc]
[--skip-components] [--skip-islands]
[--skip-field-images] [--skip-bkg-images]
[--skip-rms-images] [--skip-all-images]
[--combined-only] [--tile-only] [--overwrite]
[--dry-run] [--debug]
[--write-template-dropbox-config]
[--legacy-download LEGACY_DOWNLOAD] [--include-legacy]
[--max-retries MAX_RETRIES]
[--download-threads DOWNLOAD_THREADS]
optional arguments:
-h, --help show this help message and exit
--dropbox-config DROPBOX_CONFIG
Dropbox config file to be read in containing the shared
url, password and access token. A template can be generated
using '--write-template-dropbox-config'. (default:
dropbox.cfg)
--output OUTPUT Name of the local output directory where files will be
saved (default: vast_dropbox)
--available-epochs Print out what Epochs are available. (default: False)
--available-files AVAILABLE_FILES
Provide a file containing the files available on Dropbox to
download. Use this option only if you wish to override
the built-in list of Dropbox files that is already
provided. (default: None)
--get-available-files
Generate the list of available files on the shared folder.
The list will be saved to a file. (default: False)
--download Download data according to the filter options entered
(default: None)
--find-fields-input FIND_FIELDS_INPUT
Input of fields to fetch (can be obtained from
'find_sources.py'). (default: None)
--user-files-list USER_FILES_LIST
Input of files to fetch. (default: None)
--only-epochs ONLY_EPOCHS
Only download files from the selected epochs. Enter as a
list with no spaces, e.g. '1,2,4x'. If nothing is entered
then all epochs are fetched. The current epochs are: 1, 2,
3x, 4x, 5x, 6x, 7x, 8, 9, 10x, 11x. (default: None)
--only-fields ONLY_FIELDS
Only download files from the selected fields. Enter as a
list with no spaces, e.g. 'VAST_0012+00A,VAST_0012-06A'. If
nothing is entered then all fields are fetched. (default:
None)
--stokes STOKES Select which Stokes data products are to be downloaded
Enter as a list separated by a comma with no space, e.g.
'I,V' (default: None)
--skip-xml Do not download XML files. (default: False)
--skip-txt Do not download txt files. (default: False)
--skip-qc Do not download the QC plots. (default: False)
--skip-components Do not download components selavy files. (default: False)
--skip-islands Do not download island selavy files. (default: False)
--skip-field-images Do not download field images. (default: False)
--skip-bkg-images Do not download background images. (default: False)
--skip-rms-images Do not download rms images. (default: False)
--skip-all-images Only download non-image data products. (default: False)
--combined-only Only download the combined products. (default: False)
--tile-only Only download the tile products. (default: False)
--overwrite Overwrite any files that already exist in the output
directory. If overwrite is not selected, integrity checking
will still be performed on the existing files and if the
check fails, the file will be re-downloaded. (default:
False)
--dry-run Only print files that will be downloaded, without
downloading them. (default: False)
--debug Set logging level to debug. (default: False)
--write-template-dropbox-config
Create a template dropbox config file. (default: False)
--legacy-download LEGACY_DOWNLOAD
Select the legacy version to download from. Enter with the
included 'v', e.g. 'v0.6'. Using this option will only
download the legacy data, no other data shall be
downloaded. (default: None)
--include-legacy Include the 'LEGACY' directory when searching through
files. Only valid when using the '--get-available-files'
option. (default: False)
--max-retries MAX_RETRIES
How many times to attempt to retry a failed download
(default: 2)
--download-threads DOWNLOAD_THREADS
How many parallel downloads to attempt. EXPERIMENTAL! See
the VASTDROPBOX.md file for full information. (default: 1)
To run, the script needs a Dropbox configuration file, which by default is assumed to be named 'dropbox.cfg'. Create a text file in the following format and enter the respective values:
[dropbox]
shared_url = ENTER_URL
password = ENTER_PASSWORD
access_token = ENTER_ACCESS_TOKEN
Double check that the password is correct before running! Because of the many calls, a wrong password can lead to the link being locked for a period of time.
There is no need to put quotes around the strings. A template can be generated by using:
get_vast_pilot_dbx.py --write-template-dropbox-config
Use the option --dropbox-config if your config file is named something different than the default.
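Because a wrong password can lock the shared link, it can be worth checking the config file before running the script. The following is a minimal sketch of such a check, not the script's own code: the function name read_dropbox_config is hypothetical, and it assumes the file follows the [dropbox] layout shown above and that unfilled template values still start with ENTER_.

```python
import configparser

REQUIRED_KEYS = ("shared_url", "password", "access_token")


def read_dropbox_config(path="dropbox.cfg"):
    """Return the [dropbox] values as a dict, raising if any are missing or unfilled."""
    # interpolation=None so that '%' characters in URLs or passwords are kept verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):
        raise FileNotFoundError(f"Config file not found: {path}")
    if not parser.has_section("dropbox"):
        raise ValueError("Missing [dropbox] section")

    section = parser["dropbox"]
    # A key is a problem if it is absent, empty, or still a template placeholder.
    problems = [
        key
        for key in REQUIRED_KEYS
        if not section.get(key, "").strip() or section[key].startswith("ENTER_")
    ]
    if problems:
        raise ValueError(f"Unset config values: {', '.join(problems)}")
    return {key: section[key].strip() for key in REQUIRED_KEYS}
```

This only confirms that the values are present; it cannot tell whether the password itself is correct.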
A log file will be saved for every run of the script.
There are 2 main ways in which the script is intended to be used:
- Download according to filter flags - Use options such as --only-epochs, --only-fields and the other filter options to download the data you want.
- Easy downloading of required fields using --find-fields-input - Directly uses the output from find_sources.py --find-fields to auto-fill the --only-fields option. Other flags also apply.
Data will only download when the --download option is provided. This can be used in combination with --dry-run to see exactly what files will be downloaded before starting the download process.
Note: As of vast-tools v1.2.0, the module comes with a packaged list of Dropbox files of the latest release, so users are no longer required to either fetch, or let the script fetch, a list of available files or provide one.
There are a few other options that are present but are now mostly considered legacy, as they should not be required often, if at all:
- --available-epochs will only display the currently released epochs. Nothing will be downloaded.
- --get-available-files will generate a complete list of all the files available in the Dropbox folder. This is a legacy option at this point which shouldn't be needed. The module has an inbuilt file list that is kept up to date, and the flexibility in downloading means users no longer need to build their own list of files.
- --user-files-list defines a text file that contains the files you wish to download. Usually used in combination with the previous option.
Take note of the --overwrite option. By default this is set to False, such that the script will skip files already present in the output directory. Using this option will download all files and overwrite any existing files if they are already present.
The script will check the downloaded file checksum against the correct checksum stored in its own data file. It should also catch exceptions when downloads time out or there are network issues. In each case, if there is a problem the file will be remembered as failed, and when the main download loop has finished, it will attempt again to download any failed files. You can set how many times it retries using the --max-retries option.
Check the log output for any files that failed even after retries!
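The check-then-retry behaviour described above can be illustrated with a short sketch. This is not the script's implementation: the function names, the fetch callable and the use of SHA-256 are all assumptions made for the example (the checksum algorithm used by the script is not stated here).

```python
import hashlib


def sha256_of(data):
    """Hex digest of the downloaded bytes (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(data).hexdigest()


def download_with_retries(paths, fetch, expected, max_retries=2):
    """Download each path, verify its checksum, and retry failures.

    fetch(path) -> bytes is a hypothetical download callable.
    expected maps each path to its known-good hex digest.
    Returns (downloaded, failed): a dict of path -> bytes and a list of paths
    that still failed after all retries.
    """
    downloaded = {}
    pending = list(paths)
    # One main pass plus up to max_retries extra passes over the failures.
    for _ in range(max_retries + 1):
        failed = []
        for path in pending:
            try:
                data = fetch(path)
            except OSError:  # covers timeouts and connection errors
                failed.append(path)
                continue
            if sha256_of(data) != expected[path]:
                failed.append(path)  # integrity check failed, try again later
                continue
            downloaded[path] = data
        pending = failed
        if not pending:
            break
    return downloaded, pending
```

Anything left in the second return value corresponds to the files you would expect to see reported as failed in the log.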
Note the following options in the Dropbox script:
--only-epochs Only download data of the requested epochs. (default: all epochs)
--only-fields Only download data of the requested fields. (default: all fields)
--stokes Only download selected Stokes. (default: all stokes)
--skip-xml Do not download XML files. (default: False)
--skip-txt Do not download txt files. (default: False)
--skip-qc Do not download the QC plots. (default: False)
--skip-components Do not download selavy component files. (default: False)
--skip-islands Do not download selavy island files. (default: False)
--skip-field-images Do not download field images. (default: False)
--skip-bkg-images Do not download background images. (default: False)
--skip-rms-images Do not download rms images. (default: False)
--skip-all-images Only download non-image data products. (default: False)
--combined-only Only download the combined products. (default: False)
--tile-only Only download the tile products. (default: False)
You can use these flags to only obtain the bits of the data you need (see examples below).
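To give a feel for what the filter flags select, here is a rough sketch that applies the same kind of filtering to a list of Dropbox paths. It is an illustration only and not how the script is implemented: the function names are hypothetical, and the path layout (/EPOCHnn/COMBINED/STOKESx_IMAGES/FIELD.EPOCHnn.x.fits) is assumed from the example paths used elsewhere in this document, including the guess that epoch '4x' maps to a directory named EPOCH04x.

```python
def epoch_dir(epoch):
    """Map '1' -> 'EPOCH01' and '4x' -> 'EPOCH04x' (the 'x' form is an assumption)."""
    digits = epoch.rstrip("x")
    suffix = epoch[len(digits):]
    return f"EPOCH{int(digits):02d}{suffix}"


def filter_files(paths, only_epochs=None, only_fields=None, stokes=None,
                 combined_only=False):
    """Keep only the paths matching the comma-separated filter values."""
    epochs = {epoch_dir(e) for e in only_epochs.split(",")} if only_epochs else None
    fields = set(only_fields.split(",")) if only_fields else None
    stokes_dirs = tuple(f"STOKES{s.upper()}" for s in stokes.split(",")) if stokes else None

    kept = []
    for path in paths:
        parts = path.strip("/").split("/")
        if epochs and parts[0] not in epochs:
            continue
        if combined_only and "COMBINED" not in parts:
            continue
        if stokes_dirs and not any(p.startswith(stokes_dirs) for p in parts):
            continue
        # The field name is the first dot-separated token of the file name.
        if fields and parts[-1].split(".")[0] not in fields:
            continue
        kept.append(path)
    return kept
```

For example, filter_files(paths, only_epochs="1", stokes="I") would keep only Stokes I products under EPOCH01.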
Note: The filtering does not quite work on the quality control plots due to their slightly different naming scheme. This is hoped to be addressed in a future update, but if you wish to view the plots then the suggested method is to filter out everything apart from the QC plots and download these.
Note: It's now easier to perform custom queries using the flags as presented above, so supplying your own list of files (--user-files-list) should not be needed often, if at all.
When supplying a list of files it needs to follow the directory structure of the Dropbox. It also needs to explicitly state the files, i.e. you cannot use wildcards (sorry, it's a limitation of using Dropbox this way).
For example, if I wanted to download a set of STOKES I COMBINED images from EPOCH01, the file would be:
/EPOCH01/COMBINED/STOKESI_IMAGES/VAST_0918+00A.EPOCH01.I.fits
/EPOCH01/COMBINED/STOKESI_IMAGES/VAST_1739-25A.EPOCH01.I.fits
/EPOCH01/COMBINED/STOKESI_IMAGES/VAST_1753-18A.EPOCH01.I.fits
/EPOCH01/COMBINED/STOKESI_IMAGES/VAST_0943+00A.EPOCH01.I.fits
/EPOCH01/COMBINED/STOKESI_IMAGES/VAST_0216+00A.EPOCH01.I.fits
/EPOCH01/COMBINED/STOKESI_IMAGES/VAST_2143-06A.EPOCH01.I.fits
/EPOCH01/COMBINED/STOKESI_IMAGES/VAST_2208-06A.EPOCH01.I.fits
Note the leading /, which is also needed.
I recommend you either run get_vast_pilot_dbx.py --get-available-files or grab the file list directly from the Dropbox repository, and use this output to build your request (warning: --get-available-files will take a while to run, up to 30 mins with legacy).
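If you do build a list by hand, a small helper can generate the paths and catch the two mistakes mentioned above (a missing leading / and wildcards). This is a sketch under assumptions: the function names are hypothetical, and the path template is inferred from the EPOCH01 combined Stokes I example, so other products may be laid out differently.

```python
from pathlib import Path

WILDCARD_CHARS = set("*?[]")


def combined_image_paths(fields, epoch="EPOCH01", stokes="I"):
    """Build Dropbox paths for combined images, following the example layout."""
    return [
        f"/{epoch}/COMBINED/STOKES{stokes}_IMAGES/{field}.{epoch}.{stokes}.fits"
        for field in fields
    ]


def write_files_list(paths, out_file="to_download.txt"):
    """Validate the paths and write them one per line."""
    for path in paths:
        if not path.startswith("/"):
            raise ValueError(f"Path needs a leading '/': {path}")
        if WILDCARD_CHARS & set(path):
            raise ValueError(f"Wildcards are not supported: {path}")
    Path(out_file).write_text("\n".join(paths) + "\n")
    return out_file
```

For instance, write_files_list(combined_image_paths(["VAST_0918+00A", "VAST_1739-25A"])) writes the first two lines of the example list above to to_download.txt.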
There is an experimental option, --download-threads. This allows multiple Dropbox download commands to be launched in parallel to speed up the download of large requests. However, this mode is considered experimental and logging is not set up for use with this mode. Warning level messages will be printed to the terminal, but you will not receive any feedback on the download until it completes (see issue #141 on GitHub). Integrity checking is still performed.
Do not use a high number of parallel downloads; we suggest no more than 6. If you are only downloading a small number of files, it is recommended not to use this mode.
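As a rough picture of what a parallel download mode involves, the sketch below runs a download callable over several paths with a thread pool and caps the worker count at the suggested maximum of 6. It is not the script's code: download_parallel and fetch are hypothetical names, and the cap is applied here purely to mirror the advice above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_SUGGESTED_THREADS = 6


def download_parallel(paths, fetch, threads=1):
    """Run fetch(path) for every path using a bounded pool of worker threads.

    Returns (results, errors): dicts keyed by path holding either the return
    value of fetch or the exception it raised.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # Never use more workers than suggested, or more than there are files.
    workers = min(threads, MAX_SUGGESTED_THREADS, max(len(paths), 1))

    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # record the failure, keep the others going
                errors[path] = exc
    return results, errors
```

Collecting the errors per path, rather than stopping at the first one, keeps a single failed file from aborting the rest of the request.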
Example
get_vast_pilot_dbx.py --download --only-epochs 1 --output VAST_DOWNLOAD --download-threads 4
To download data from a specific legacy version you can use the --legacy-download option, which takes an argument of the legacy version you wish to download from in the form of the directory name on Dropbox (we suggest browsing the Dropbox folder via a browser to check the versions). For example, to limit the download to v0.6 legacy data, the input would be:
get_vast_pilot_dbx.py --download --output VAST_DOWNLOAD --legacy-download v0.6 <use normal filter flags here>
The above would limit the download request to only use the data that is present in the /LEGACY/v0.6/ directory.
Below are examples of how to download the data with different scenarios in mind. Remember you can add --dry-run to your command to see exactly what will be downloaded without actually downloading.
To download the entire release structure:
get_vast_pilot_dbx.py --download --output VAST_DOWNLOAD
This will place all the files in the directory VAST_DOWNLOAD. As of data release v1.0, the total size stands at 8.0 TB. To clarify, this will not download the legacy directory.
Using epoch 01 as an example:
get_vast_pilot_dbx.py --download --only-epochs 1 --output VAST_DOWNLOAD
This will place the EPOCH01 directory in VAST_DOWNLOAD.
Scenario:
- Download fields VAST_0918+00A and VAST_1739-25A.
- From epochs 1, 2, 8 and 9.
- Combined data products only.
- Stokes I and V.
- Field images and selavy components only (txt files).
- No quality control plots.
get_vast_pilot_dbx.py --download --output VAST_DOWNLOAD --only-epochs 1,2,8,9 --only-fields VAST_0918+00A,VAST_1739-25A --combined-only --stokes I,V --skip-bkg-images --skip-rms-images --skip-islands --skip-xml --skip-qc
This will place the relevant files in the directory VAST_DOWNLOAD.
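If you run scenarios like this one from your own Python code, it may help to assemble the command programmatically so that a --dry-run can be issued before the real download. The helper below is a hypothetical sketch: build_command is not part of vast-tools, and it only uses flags listed in the usage above.

```python
VALID_SKIPS = {
    "xml", "txt", "qc", "components", "islands",
    "field-images", "bkg-images", "rms-images", "all-images",
}


def build_command(output, only_epochs=None, only_fields=None, stokes=None,
                  skip=(), combined_only=False, dry_run=False):
    """Return the argument list for a get_vast_pilot_dbx.py download request."""
    cmd = ["get_vast_pilot_dbx.py", "--download", "--output", output]
    if only_epochs:
        # List values are passed comma-separated with no spaces.
        cmd += ["--only-epochs", ",".join(str(e) for e in only_epochs)]
    if only_fields:
        cmd += ["--only-fields", ",".join(only_fields)]
    if combined_only:
        cmd.append("--combined-only")
    if stokes:
        cmd += ["--stokes", ",".join(stokes)]
    for name in skip:
        if name not in VALID_SKIPS:
            raise ValueError(f"Unknown skip option: {name}")
        cmd.append(f"--skip-{name}")
    if dry_run:
        cmd.append("--dry-run")
    return cmd
```

The returned list can be passed to subprocess.run, first with dry_run=True to inspect what would be fetched and then without it to start the download.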
- Run find_sources.py for your sources and you will obtain an output like so:
ra,dec,name,sbid,field_name
321.749583333333,-44.2686111111111,Q 2123-4429B,9673,VAST_2112-43A
348.945,-59.0544444444444,ESO 148-IG02,9673,VAST_2256-56A
Tip: The input here is only looking for the field_name column. So it's also possible to pass a CSV file with just that column header and the fields you want.
- Pass this output to get_vast_pilot_dbx.py to download only these fields. In addition, in this example we assume that we only want the combined Stokes I field and rms images, and the selavy component files (txt format only). The command for this becomes:
get_vast_pilot_dbx.py --download --find-fields-input find-fields-output.csv --output VAST_DOWNLOAD --stokes I --skip-xml --skip-bkg-images --skip-qc --skip-islands --combined-only
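Since only the field_name column of the find_sources.py output is used, the equivalent --only-fields value can be worked out with a few lines of Python. This is an illustrative sketch rather than the script's own parsing: fields_from_csv is a hypothetical name, and it assumes a comma-separated file with a header row as in the output shown above.

```python
import csv


def fields_from_csv(path):
    """Return the unique field names from the 'field_name' column, in file order."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or "field_name" not in reader.fieldnames:
            raise ValueError("CSV file has no 'field_name' column")
        # dict.fromkeys removes duplicates while preserving the first-seen order.
        unique = dict.fromkeys(
            row["field_name"].strip() for row in reader if row["field_name"]
        )
    return list(unique)
```

Joining the result with commas, for example ",".join(fields_from_csv("find-fields-output.csv")), gives a string in the format that --only-fields expects.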
- Create the text file containing the files, e.g. to_download.txt:
/EPOCH01/COMBINED/STOKESI_IMAGES/VAST_0918+00A.EPOCH01.I.fits
/EPOCH01/COMBINED/STOKESI_IMAGES/VAST_1739-25A.EPOCH01.I.fits
/EPOCH01/COMBINED/STOKESI_IMAGES/VAST_1753-18A.EPOCH01.I.fits
/EPOCH01/COMBINED/STOKESI_IMAGES/VAST_0943+00A.EPOCH01.I.fits
/EPOCH01/COMBINED/STOKESI_IMAGES/VAST_0216+00A.EPOCH01.I.fits
/EPOCH01/COMBINED/STOKESI_IMAGES/VAST_2143-06A.EPOCH01.I.fits
/EPOCH01/COMBINED/STOKESI_IMAGES/VAST_2208-06A.EPOCH01.I.fits
- Then run with:
get_vast_pilot_dbx.py --download --user-files-list to_download.txt --output VAST_DOWNLOAD
This will place these files in VAST_DOWNLOAD. The directory structure will be mimicked. You can still apply flags to this method if you want to filter your own list.