If you find this repo useful for your work, kindly cite and star it.
This simple script downloads data from the HMP portal. Although tools such as the HMP client or portal client exist for downloading data from this website, they did not work for me for some files, so I wrote this script as a workaround. You can use it when the main tools do not work for you.
Note: So far, this script supports only valid HTTPS links (not Amazon S3 or FTP links).
First, after you get your manifest file, put it in the same directory as the script and run this Python 3 script:
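To illustrate the HTTPS-only restriction, here is a minimal sketch of how links could be filtered by scheme. It is not taken from `download_urls.py`; the function names are hypothetical, and it assumes a manifest field may hold several comma-separated links.

```python
from urllib.parse import urlparse


def is_supported_url(url: str) -> bool:
    """Return True only for HTTPS links; s3:// and ftp:// links are rejected."""
    return urlparse(url.strip()).scheme.lower() == "https"


def pick_https_url(urls_field: str):
    """Return the first HTTPS link in a comma-separated field, or None.

    Assumption: several links for one file are separated by commas.
    """
    for url in urls_field.split(","):
        if is_supported_url(url):
            return url.strip()
    return None
```

A row whose field contains no HTTPS link would return `None` and could be skipped or reported as failed.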
python3 download_urls.py -i example_manifest.tsv
As dependencies, you need to have (via pip3 or conda) pandas, argparse and get.
To check whether the HTTPS links in your manifest are still valid, pick a few of them at random and try one of two options.
1- Manually, on the website itself, as in the example: try the manual download button for an individual file. If it works, that is a good sign.
2- From your TSV file: copy a link (https://.............bz2) and paste it into your browser. If the download starts, that is a good sign.
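The same spot check can be sketched in Python using only the standard library. This is an illustration, not part of the script: the column name `urls`, the function names, and the use of a `HEAD` request are all assumptions (some servers may not answer `HEAD`, in which case the browser test above is more reliable).

```python
import csv
import random
import urllib.request


def sample_urls(manifest_path, k=3, column="urls", seed=None):
    """Pick up to k links at random from a tab-separated manifest.

    Assumption: the manifest has a header row and the links live in `column`.
    """
    with open(manifest_path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    urls = [row[column] for row in rows if row.get(column)]
    rng = random.Random(seed)
    return rng.sample(urls, min(k, len(urls)))


def url_is_reachable(url, opener=urllib.request.urlopen, timeout=30):
    """Send a HEAD request and report whether the server answers with 200."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # URLError and HTTPError are subclasses of OSError;
        # ValueError covers malformed links.
        return False
```

For example, `[url_is_reachable(u) for u in sample_urls("example_manifest.tsv")]` would give one `True`/`False` per sampled link.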
As output, your manifest is split into two files: a successful manifest and a failed manifest (listing the samples that were not downloaded).
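A rough sketch of how such a split could work is shown below. It is not the actual code of `download_urls.py`: the function names, the `urls` column, and the `download` callable are assumptions made for illustration, and the standard `csv` module is used here instead of pandas to keep the example dependency-free.

```python
import csv


def split_manifest(manifest_path, download, column="urls"):
    """Try to download every row; return (fieldnames, succeeded, failed).

    `download` is any callable that takes a link and raises on failure
    (hypothetical; plug in your own download function).
    """
    with open(manifest_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        fieldnames = reader.fieldnames or []
        rows = list(reader)

    succeeded, failed = [], []
    for row in rows:
        try:
            download(row[column])
            succeeded.append(row)
        except Exception:
            # Any error counts as a failed download for this row.
            failed.append(row)
    return fieldnames, succeeded, failed


def write_manifest(path, fieldnames, rows):
    """Write rows back out in the same tab-separated layout as the input."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
```

The failed manifest keeps the same columns as the input, so it can be passed back to the script for another attempt.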
If anything is unclear, contact me here or directly via email: [email protected]
This tool aims to help others. Kindly cite my GitHub page!