Arvados sequence uploader and analyzer for MRSA project
To get started, install the uploader and then run the main.py script in the uploader directory.
- Download. Clone the GitHub repository using the following command:
git clone https://github.com/bio-ontology-research-group/mrsa-sequences.git
- Prepare your system. Make sure you have Python 3 and the system packages needed to install modules such as pycurl and pyopenssl. On Ubuntu 18.04, you can run:
sudo apt update
sudo apt install -y virtualenv git libcurl4-openssl-dev build-essential python3-dev libssl-dev
- Create and enter your virtualenv. Go to the cloned repository directory (mrsa-sequences), then create and activate a virtualenv:
virtualenv --python python3 venv
. venv/bin/activate
Note that you need to repeat the . venv/bin/activate step from this directory whenever you want to use the installed tool.
- Install the dependencies. Once the virtualenv is set up, install the dependencies:
pip install -r requirements.txt
- Test the tool. Try running:
python uploader/main.py --help
- Set the Arvados API token. Before uploading sequence files, set your Arvados API token in the environment variable ARVADOS_API_TOKEN. It will look something like the following:
export ARVADOS_API_TOKEN=2jv9346o396exampledonotuseexampledonotuseexes7j1ld
You can find your Arvados token under the "Current token" link in the user profile menu of the Arvados web portal.
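Before uploading, it can help to confirm that the preparation and token steps above took effect. The sketch below is a hypothetical preflight check, not part of the uploader. It assumes pycurl and pyopenssl are importable as pycurl and OpenSSL once requirements.txt is installed, and it only checks that ARVADOS_API_TOKEN is non-empty, not that the token is valid. The helper names missing_modules and arvados_token are made up for illustration.

```python
import importlib.util
import os


def missing_modules(names):
    """Return the module names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def arvados_token(env=None):
    """Return ARVADOS_API_TOKEN from the environment, or raise if it is unset or empty.

    Hypothetical helper: it does not contact Arvados, so an expired or
    mistyped token will still pass this check.
    """
    env = os.environ if env is None else env
    token = env.get("ARVADOS_API_TOKEN", "").strip()
    if not token:
        raise RuntimeError("ARVADOS_API_TOKEN is not set; export it before uploading")
    return token


if __name__ == "__main__":
    # Assumption: pycurl imports as "pycurl" and pyopenssl imports as "OpenSSL".
    missing = missing_modules(["pycurl", "OpenSSL"])
    if missing:
        print("Missing modules (try pip install -r requirements.txt):", ", ".join(missing))
    try:
        arvados_token()
        print("ARVADOS_API_TOKEN is set")
    except RuntimeError as err:
        print(err)
```

Run it inside the activated virtualenv so it checks the same Python environment the uploader will use.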
Run the uploader with gzipped FASTA or FASTQ read files and an accompanying metadata file in YAML format:
python uploader/main.py reads1.fastq.gz reads2.fastq.gz metadata.yaml
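If you script uploads, a small wrapper can catch common input mistakes before the upload starts. This is a minimal sketch under stated assumptions: build_upload_command and is_gzipped are hypothetical helpers, the gzip check only looks at the two-byte gzip signature (it does not validate FASTA or FASTQ content), and the command line mirrors the example above, with the read files first and the YAML metadata file last.

```python
import subprocess
import sys
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # every gzip file starts with these two bytes


def is_gzipped(path):
    """Return True if the file at `path` starts with the gzip signature."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def build_upload_command(reads, metadata, main_script="uploader/main.py"):
    """Validate the inputs and return the argv list for the uploader (hypothetical helper)."""
    if not reads:
        raise ValueError("at least one read file is required")
    for read_file in reads:
        if not is_gzipped(read_file):
            raise ValueError(f"{read_file} is not gzip-compressed")
    if not Path(metadata).is_file():
        raise ValueError(f"metadata file {metadata} does not exist")
    # Same argument order as the documented command: reads first, metadata last.
    return [sys.executable, main_script, *map(str, reads), str(metadata)]


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python upload_wrapper.py reads1.fastq.gz reads2.fastq.gz metadata.yaml
    command = build_upload_command(sys.argv[1:-1], sys.argv[-1])
    subprocess.run(command, check=True)
```

The script name upload_wrapper.py is hypothetical; run it from the repository root with the virtualenv active so that sys.executable points at the virtualenv's Python and uploader/main.py resolves correctly.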
Example files are available on the MRSA web uploader. Here are the links to the example files:
- Example FASTQ read file 1
- Example FASTQ read file 2
- Example metadata file
Once the sequences are uploaded, you can check the status of the job in the state.json file.
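The fields inside state.json are not documented here, so the sketch below makes no assumptions about them: it loads the file from the current directory (an assumption about where the uploader writes it) and prints each top-level key and value. read_state and summarize_state are hypothetical helpers, and the code assumes the file holds a JSON object.

```python
import json
from pathlib import Path


def read_state(path="state.json"):
    """Load the uploader's state file and return its contents (assumed to be a JSON object)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_state(state):
    """Return 'key: value' lines for the top-level entries, sorted by key."""
    return [f"{key}: {value}" for key, value in sorted(state.items())]


if __name__ == "__main__" and Path("state.json").exists():
    for line in summarize_state(read_state()):
        print(line)
```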