Updated README
Nam-phuong Nguyen committed Mar 25, 2019
1 parent 04205c2 commit 8b6770a
Showing 2 changed files with 131 additions and 58 deletions.
169 changes: 115 additions & 54 deletions README.md
@@ -4,77 +4,90 @@ ViFi is a tool for detecting viral integration and fusion mRNA sequences from Ne

ViFi is currently in alpha testing and is constantly undergoing revisions. High on the priority list are an easier installation process and an improved user interface. Please report any problems/bugs to Nam Nguyen ([email protected]) so that ViFi can be improved and problems can be quickly corrected.

## UPDATE

## Installation:
We provide instructions for installing ViFi on Linux below.
Due to major incompatibilities between versions of Pysam, Samtools, and Python, as well as software compatibility issues across platforms, we highly recommend that users discontinue use of the Python version of ViFi and instead use the Dockerized version. The Dockerized version is platform independent and requires only Python (either version 2.7 or 3.0) and Docker; no other software package is needed. We outline below how to set up, install, and run the Dockerized version.

In addition, we include a [Tutorial] for all the different options within ViFi below. We will include instructions
on how to run ViFi from the source code, but again, we strongly discourage this usage.

## Installation of ViFi for use in Docker

We provide instructions for preparing ViFi for use with Docker below. If Perl is installed,
the setup_linux_mac.sh script can be run to perform steps 3-7 automatically. Note that ViFi requires a large
amount of disk space to set up and run (10 Gb) due to the large size of the initial reference repositories.
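
Since setup needs roughly 10 Gb of free space, it may help to check the target directory before downloading anything. The following is a minimal sketch, not part of ViFi: `has_free_space` is a hypothetical helper name, and it assumes a POSIX `df` and `awk` are available.

```shell
# Hypothetical helper (not part of ViFi): succeed only if DIR has at least
# MIN_GB gigabytes (counted as GiB) of free space.
# Usage: has_free_space DIR MIN_GB
has_free_space() {
    dir=$1
    min_gb=$2
    # df -Pk prints one unwrapped line per filesystem in 1024-byte blocks;
    # the fourth field of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    min_kb=$((min_gb * 1024 * 1024))
    [ "$avail_kb" -ge "$min_kb" ]
}
```

For example, `has_free_space "$PWD" 10 || echo "Less than 10 Gb free in $PWD" >&2` would print a warning before the setup script is started.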

1. Install Dependencies:
1. Python (2.7 or 3.0; instructions for 2.7 are shown)
2. Docker (https://docs.docker.com/install/)

2. Download and run setup_linux_mac.sh (if Perl is installed and you are on a Mac/Linux system).
Running this script will automatically download ViFi from GitHub, download
the data repositories from Google Drive, pull the latest ViFi Docker image, set all the
environment variables for ViFi, build the BWA index for
hg19+HPV via Docker, and run a test of ViFi via Docker. The full
set of tests can take up to an hour to complete. Make sure you have at least 10 Gb of free space for the process to complete.

1. ViFi download (if you have not already cloned this source code):
```
wget https://raw.githubusercontent.com/namphuon/ViFi/master/setup_linux_mac.sh
sh setup_linux_mac.sh
```

Steps 3-7 are only necessary if Perl is not installed on the machine or if you are on a Windows machine. If Perl is installed, setup_linux_mac.sh can be run to set up ViFi automatically (see Step 2).

3. Clone the ViFi repository
```
git clone https://github.com/namphuon/ViFi.git
```
2. Install Dependencies:
1. Python 2.7
```
sudo dnf install python2
```
2. Pysam version 0.9.0 or higher (https://github.com/pysam-developers/pysam):
```
sudo pip install pysam
```
3. Samtools 1.3.1 or higher (www.htslib.org/)
```
sudo apt-get install samtools
```
4. BWA 0.7.15 or higher (bio-bwa.sourceforge.net/)
```
sudo apt-get install bwa
```
5. Install HMMER v3.1b2 and have it on the path (http://hmmer.org/)
```
sudo apt-get install hmmer
```
3. Set the ViFi directory and include the python source to your Python path

4. Set the ViFi directory and include the python source to your Python path
```
echo export VIFI_DIR=/path/to/ViFi >> ~/.bashrc
echo export PYTHONPATH=/path/to/ViFi:/path/to/ViFi/src:$PYTHONPATH >> ~/.bashrc
```
4. Download the data repositories:
5. Download the data repositories:
While we include some annotations, we are unable to host some large files in the git repository. These may be downloaded from https://drive.google.com/open?id=0ByYcg0axX7udUDRxcTdZZkg0X1k. Thanks to Peter Ulz for noticing an incorrect link earlier.
```
tar zxf data_repo.tar.gz
echo export AA_DATA_REPO=$PWD/data_repo >> ~/.bashrc
source ~/.bashrc
```
5. Download the HMM models:
6. Download the HMM models:
We have pre-built HMM models for HPV and HBV. They can be downloaded from https://drive.google.com/open?id=0Bzp6XgpBhhghSTNMd3RWS2VsVXM.
```
unzip data.zip
echo export REFERENCE_REPO=$PWD/data >> ~/.bashrc
```
6. Build a BWA index on the reference sequences from human+viral sequences:
7. Build a BWA index on the reference sequences from human+viral sequences:
We show an example of building an index of human+viral sequences using Hg19 and **HPV** and **HBV** below. However
any reference organism+viral family could be used.
```
cat $AA_DATA_REPO//hg19/hg19full.fa $REFERENCE_REPO/hpv/hpv.unaligned.fas > $REFERENCE_REPO/hpv/hg19_hpv.fas
bwa index $REFERENCE_REPO/hpv/hg19_hpv.fas
cat $AA_DATA_REPO//hg19/hg19full.fa $REFERENCE_REPO/hbv/hbv.unaligned.fas > $REFERENCE_REPO/hbv/hg19_hbv.fas
bwa index $REFERENCE_REPO/hbv/hg19_hbv.fas
```
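
Because any reference organism and viral family can be combined in the same way, the concatenation step can be wrapped in a small function. This is only a sketch under stated assumptions: `combine_reference` is a hypothetical helper that is not shipped with ViFi, and it only checks that both inputs exist and are non-empty before concatenating them.

```shell
# Hypothetical helper (not part of ViFi): concatenate a host FASTA and a
# viral FASTA into one combined reference file.
# Usage: combine_reference HOST_FASTA VIRAL_FASTA OUT_FASTA
combine_reference() {
    host_fa=$1
    viral_fa=$2
    out_fa=$3
    # Refuse to write a reference if either input is missing or empty.
    for f in "$host_fa" "$viral_fa"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty FASTA: $f" >&2
            return 1
        fi
    done
    cat "$host_fa" "$viral_fa" > "$out_fa"
}
```

With the paths used above, `combine_reference "$AA_DATA_REPO/hg19/hg19full.fa" "$REFERENCE_REPO/hbv/hbv.unaligned.fas" "$REFERENCE_REPO/hbv/hg19_hbv.fas" && bwa index "$REFERENCE_REPO/hbv/hg19_hbv.fas"` would only build the index when the concatenation succeeded.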
## Running ViFi
```

We show the most basic example of running ViFi below. This version assumes that the user has
followed all the previous steps. More advanced options, such as using a customized reference organism/viral
family, are described in the [Advanced Notes](#advanced_notes) section.
## Dockerized ViFi (RECOMMENDED)

We have also created a Dockerized version of ViFi to make it easier to run (see the previous section for installation and setup). To get the latest version of the Dockerized ViFi, run:
```
python run_vifi.py -f <input_R1.fq.gz> -r <input_R2.fq.gz> -o <output_dir>
docker pull namphuon/vifi
```

Note that this version defaults to searching for **HPV**. To search for HBV, run the following command.
To run the dockerized version of ViFi, first create the data repositories as above, including setting the environmental variables. Next, run the following command:

`python $VIFI_DIR/scripts/run_vifi.py -f <READ1> -r <READ2> --docker`

where <READ1> and <READ2> are the FASTQ files (gzipped or unzipped). Note that the $VIFI_DIR, $AA_DATA_REPO and $REFERENCE_REPO variables must be set in order for the script to find the necessary files.

Example (assuming that $VIFI_DIR is set):

```
python run_vifi.py -f <input_R1.fq.gz> -r <input_R2.fq.gz> -o <output_dir> -v hbv
python $VIFI_DIR/scripts/run_vifi.py -f $VIFI_DIR/test/data/test_R1.fq.gz -r $VIFI_DIR/test/data/test_R2.fq.gz --docker
```
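
Since the script cannot find its files unless $VIFI_DIR, $AA_DATA_REPO and $REFERENCE_REPO are set, a quick pre-flight check can catch a missing variable before a long run starts. The function below is a sketch only: `check_vifi_env` is a hypothetical name, and the check that each variable points to an existing directory is an assumption about how the variables are used, not something ViFi itself enforces.

```shell
# Hypothetical helper (not part of ViFi): verify that the three variables
# the README requires are set and point to existing directories.
check_vifi_env() {
    missing=0
    for name in VIFI_DIR AA_DATA_REPO REFERENCE_REPO; do
        # Indirect lookup: copy the value of the variable named by $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            missing=1
        elif [ ! -d "$value" ]; then
            echo "$name does not point to a directory: $value" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_vifi_env && python $VIFI_DIR/scripts/run_vifi.py -f <READ1> -r <READ2> --docker` would then start ViFi only when all three variables look usable.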


## ViFi Output

The output of ViFi is the list of read clusters discovered. For each read cluster, the relaxed, stringent, and exact (if split reads are present) ranges are reported, as well as the names of the reads in the cluster.
@@ -138,28 +151,76 @@ Finally, ViFi outputs several working files that can be deleted after a run. Th
7. \<prefix\>.fixed.trans.bam - A BAM file created by merging 6. and any human/viral paired end reads discovered by running the viral HMMs on 3.
8. \<prefix\>.fixed.trans.cs.bam - A coordinate sorted BAM file of 7.
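
As the working files can be deleted after a run, one cautious approach is to list them first and delete only after reviewing the list. The sketch below covers only the two BAM files named in items 7 and 8; `list_working_bams` is a hypothetical helper, and the output directory and prefix are whatever was used for the run.

```shell
# Hypothetical helper (not part of ViFi): print the working BAM files from
# items 7 and 8 that exist for a given output directory and prefix.
# Usage: list_working_bams OUTPUT_DIR PREFIX
list_working_bams() {
    out_dir=$1
    prefix=$2
    for suffix in fixed.trans.bam fixed.trans.cs.bam; do
        f="$out_dir/$prefix.$suffix"
        # Only report files that are actually present.
        if [ -f "$f" ]; then
            echo "$f"
        fi
    done
    return 0
}
```

The function only prints paths and never removes anything, so the deletion itself stays a separate, deliberate step.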

## Dockerized ViFi

We have also created a Dockerized version of ViFi to make it easier to run. The Docker version of ViFi can be obtained
by installing Docker (https://www.docker.com/) and running the following command:

docker pull namphuon/vifi

To run the dockerized version of ViFi, first create the data repositories as above, including setting the environmental variables.
Next, run the following script in the ViFi scripts directory:

`docker_vifi.sh <INPUT_DIR> <READ1> <READ2> <OUTPUT> <CPUS>`
## Installation (Deprecated):
We provide instructions for installing ViFi on Linux below.

where <INPUT_DIR> is the directory containing the <READ1> and <READ2> files, and <CPUS> is the number of
CPUs to use. Note that the full path must
be given for the input and output directory, and the $AA_DATA_REPO and $REFERENCE_REPO variables must be
set in order for the script to find the necessary files.
1. ViFi download (if you have not already cloned this source code):
```
git clone https://github.com/namphuon/ViFi.git
```
2. Install Dependencies:
1. Python 2.7
```
sudo dnf install python2
```
2. Pysam version 0.9.0 or higher (https://github.com/pysam-developers/pysam):
```
sudo pip install pysam
```
3. Samtools 1.3.1 or higher (www.htslib.org/)
```
sudo apt-get install samtools
```
4. BWA 0.7.15 or higher (bio-bwa.sourceforge.net/)
```
sudo apt-get install bwa
```
5. Install HMMER v3.1b2 and have it on the path (http://hmmer.org/)
```
sudo apt-get install hmmer
```
3. Set the ViFi directory and include the python source to your Python path
```
echo export VIFI_DIR=/path/to/ViFi >> ~/.bashrc
echo export PYTHONPATH=/path/to/ViFi:/path/to/ViFi/src:$PYTHONPATH >> ~/.bashrc
```
4. Download the data repositories:
While we include some annotations, we are unable to host some large files in the git repository. These may be downloaded from https://drive.google.com/open?id=0ByYcg0axX7udUDRxcTdZZkg0X1k. Thanks to Peter Ulz for noticing an incorrect link earlier.
```
tar zxf data_repo.tar.gz
echo export AA_DATA_REPO=$PWD/data_repo >> ~/.bashrc
source ~/.bashrc
```
5. Download the HMM models:
We have pre-built HMM models for HPV and HBV. They can be downloaded from https://drive.google.com/open?id=0Bzp6XgpBhhghSTNMd3RWS2VsVXM.
```
unzip data.zip
echo export REFERENCE_REPO=$PWD/data >> ~/.bashrc
```
6. Build a BWA index on the reference sequences from human+viral sequences:
We show an example of building an index of human+viral sequences using Hg19 and **HPV** and **HBV** below. However
any reference organism+viral family could be used.
```
cat $AA_DATA_REPO//hg19/hg19full.fa $REFERENCE_REPO/hpv/hpv.unaligned.fas > $REFERENCE_REPO/hpv/hg19_hpv.fas
bwa index $REFERENCE_REPO/hpv/hg19_hpv.fas
Example:
cat $AA_DATA_REPO//hg19/hg19full.fa $REFERENCE_REPO/hbv/hbv.unaligned.fas > $REFERENCE_REPO/hbv/hg19_hbv.fas
bwa index $REFERENCE_REPO/hbv/hg19_hbv.fas
```
## Running ViFi (Deprecated)

If /home/input/ contains read1.fastq.gz and read2.fastq.gz, then
We show the most basic example of running ViFi below. This version assumes that the user has
followed all the previous steps. More advanced options, such as using a customized reference organism/viral
family, are described in the [Advanced Notes](#advanced_notes) section.
```
python run_vifi.py -f <input_R1.fq.gz> -r <input_R2.fq.gz> -o <output_dir>
```

sh docker_vifi.sh /home/input read1.fastq.gz read2.fastq.gz /home/output/ 2
Note that this version defaults to searching for **HPV**. To search for HBV, run the following command.
```
python run_vifi.py -f <input_R1.fq.gz> -r <input_R2.fq.gz> -o <output_dir> -v hbv
```

## References
1. Nguyen ND, Deshpande V, Luebeck J, Mischel PS, Bafna V (2018) ViFi: accurate detection of viral integration and mRNA fusion reveals indiscriminate and unregulated transcription in proximal genomic regions in cervical cancer. Nucleic Acids Res (April):1–17.
20 changes: 16 additions & 4 deletions setup_linux_mac.sh
@@ -4,29 +4,41 @@ cd ViFi
VIFI_DIR=`pwd`

#Get data repos
echo "Downloading the data_repo"
wget https://raw.githubusercontent.com/circulosmeos/gdown.pl/master/gdown.pl
perl gdown.pl "https://drive.google.com/open?id=0ByYcg0axX7udUDRxcTdZZkg0X1k" data_repo.tar.gz
tar zxf data_repo.tar.gz
rm data_repo.tar.gz
echo "Downloading the HMM models"
perl gdown.pl "https://drive.google.com/open?id=0Bzp6XgpBhhghSTNMd3RWS2VsVXM" data.zip
unzip data.zip
rm data.zip

#Set up environmental variables
echo "Set environmental variables"
echo export VIFI_DIR=$VIFI_DIR >> ~/.bashrc
echo export AA_DATA_REPO=$PWD/data_repo >> ~/.bashrc
echo export REFERENCE_REPO=$PWD/data >> ~/.bashrc

VIFI_DIR=$VIFI_DIR
AA_DATA_REPO=$PWD/data_repo
REFERENCE_REPO=$PWD/data

source ~/.bashrc

#Set up reference for alignment
cat $AA_DATA_REPO//hg19/hg19full.fa $REFERENCE_REPO/hpv/hpv.unaligned.fas > $REFERENCE_REPO/hpv/hg19_hpv.fas

#Pull the Docker file
echo "Getting the dockerized version of ViFi"
docker pull docker.io/namphuon/vifi

docker run -v $REFERENCE_REPO/hpv/:/home/hpv/ docker.io/namphuon/vifi bwa index /home/hpv/hg19_hpv.fa
#Set up reference for alignment
echo "Building the hg19+HPV reference"
cat $AA_DATA_REPO//hg19/hg19full.fa $REFERENCE_REPO/hpv/hpv.unaligned.fas > $REFERENCE_REPO/hpv/hg19_hpv.fas
docker run -v $REFERENCE_REPO/hpv/:/home/hpv/ docker.io/namphuon/vifi bwa index /home/hpv/hg19_hpv.fas

#Build reduced list of HMMs for testing
echo "Running test for ViFi"
ls $VIFI_DIR/data/hpv/hmms/hmmbuild.[0-9].hmm > $VIFI_DIR/data/hpv/hmms/hmms.txt
source ~/.bashrc

#Run ViFi under docker mode on test dataset on reduced HMM list set
python $VIFI_DIR/scripts/run_vifi.py --cpus 2 --hmm_list $VIFI_DIR/data/hpv/hmms/hmms.txt -f $VIFI_DIR/test/data/test_R1.fq.gz -r $VIFI_DIR/test/data/test_R2.fq.gz -o $VIFI_DIR/tmp/docker/ --docker
