INTERCHANGE

INTERCHANGE (horIzoNtal TransfER CHAracterization in Non-assembled GEnome) is a pipeline for detecting horizontal transfers in non-assembled genomes by characterizing reads from regions conserved between two species.

Requirements

Python 3.8 with the following packages:

# Use the python3.8 form of each command if several versions of Python are installed in parallel
python3 -m pip install biopython # default Python 3
python3.8 -m pip install biopython # specifically Python 3.8
python3 -m pip install pandas # default Python 3
python3.8 -m pip install pandas # specifically Python 3.8
python3 -m pip install numpy # default Python 3
python3.8 -m pip install numpy # specifically Python 3.8
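
Once the packages are installed, a quick check can confirm that the interpreter you plan to use actually sees them. The snippet below is a minimal sketch and not part of INTERCHANGE; the helper name `missing_packages` is hypothetical. Note that biopython is imported under the module name `Bio`.

```python
import importlib.util

# pip name -> import name of the packages listed above
# (biopython is imported as "Bio").
REQUIRED_MODULES = {"biopython": "Bio", "pandas": "pandas", "numpy": "numpy"}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the pip names of packages whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in modules.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required Python packages are installed.")
```

Run it with the same interpreter you will use for the pipeline (for example `python3.8`), since each Python installation has its own set of packages.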

Pigz version 2.4
- A tool for gzip that exploits multiple processors and multiple cores to the hilt when compressing data
```
# Installation with Ubuntu/Debian
sudo apt-get install pigz
```

GenomeTools version 1.6.1

The GenomeTools genome analysis system is a free collection of bioinformatics tools

# Installation with Ubuntu/Debian
sudo apt-get install genometools

# Or download GenomeTools version 1.6.1
wget http://genometools.org/pub/genometools-1.6.1.tar.gz
# decompress
tar zxvf genometools-1.6.1.tar.gz
cd genometools-1.6.1
# compile, then install (default prefix: /usr/local)
make
sudo make install

PRINSEQ LITE version 0.20.4

PRINSEQ is a tool to preprocess genomic or metagenomic sequence data in FASTA or FASTQ format
The lite version is a standalone perl script (prinseq-lite.pl) that does not require any non-core perl modules for processing

# download PRINSEQ LITE version 0.20.4
wget https://sourceforge.net/projects/prinseq/files/standalone/prinseq-lite-0.20.4.tar.gz
# decompress
tar zxvf prinseq-lite-0.20.4.tar.gz
# make it executable
cd prinseq-lite-0.20.4
chmod +x prinseq-lite.pl

SPAdes version 3.15.2

SPAdes is an assembly toolkit containing various assembly pipelines

# download SPAdes version 3.15.2
wget http://cab.spbu.ru/files/release3.15.2/SPAdes-3.15.2-Linux.tar.gz
# decompress
tar zxvf SPAdes-3.15.2-Linux.tar.gz

DIAMOND version 0.9.29

DIAMOND is a sequence aligner for protein and translated DNA searches with high performance analysis of big sequence data

# Installation with Ubuntu/Debian
sudo apt-get install diamond-aligner

# Or download a DIAMOND release (note: this URL fetches v2.0.9, not 0.9.29)
wget http://github.com/bbuchfink/diamond/releases/download/v2.0.9/diamond-linux64.tar.gz
# decompress
tar zxvf diamond-linux64.tar.gz

ncbi blast+

A suite of command-line tools to run BLAST

# Installation with Ubuntu/Debian
sudo apt-get install ncbi-blast+

# Or download BLAST version 2.9.0
wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.9.0/ncbi-blast-2.9.0+-x64-linux.tar.gz
# decompress
tar -zxvf ncbi-blast-2.9.0+-x64-linux.tar.gz
# add BLAST location to system PATH (adjust to where you extracted the archive)
export PATH=$HOME/tools/BLAST/ncbi-blast-2.9.0+/bin:$PATH

minimap2

Minimap2 is a sequence alignment program that aligns DNA or mRNA sequences against a large reference database.

# Installation with Ubuntu/Debian
sudo apt-get install minimap2

# Or download and decompress the precompiled Minimap2 binary
curl -L https://github.com/lh3/minimap2/releases/download/v2.20/minimap2-2.20_x64-linux.tar.bz2 | tar -jxvf -
# run the binary to check that it works
./minimap2-2.20_x64-linux/minimap2

Samtools Version: 1.10

Samtools is a suite of programs for interacting with high-throughput sequencing data

# Installation with Ubuntu/Debian
sudo apt-get install samtools

Or download a Samtools release and build it from source:

cd samtools-1.x
./configure --prefix=/where/to/install
make
make install

The pipeline has not been tested with other versions of the above programs. Newer versions will probably work, provided the options used by the pipeline still exist.
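
To see which of these tools are reachable from your shell, something like the following sketch can help. It is not part of INTERCHANGE, and the executable names (`gt` for GenomeTools, `spades.py`, `blastn`, and so on) are assumptions based on how each tool is usually distributed. It only inspects `PATH`; INTERCHANGE itself reads tool locations from dependencies_paths.txt, so a tool reported as missing here can still be used if you give its full path in that file.

```python
import shutil

# Executable names are assumptions based on each tool's usual distribution.
EXECUTABLES = [
    "pigz",
    "gt",               # GenomeTools
    "prinseq-lite.pl",  # PRINSEQ lite
    "spades.py",        # SPAdes
    "diamond",
    "blastn",           # one of the BLAST+ programs
    "minimap2",
    "samtools",
]


def find_tools(names=EXECUTABLES):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name:16s} {path if path else 'NOT FOUND on PATH'}")
```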

Hardware requirements: this pipeline was developed for Linux/Unix operating systems.

With the test dataset, we used:
- x86-64 CPUs
- 32 GB of system memory
- >= 200 GB of free hard drive space (depending on the genome sizes of the analyzed species)
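
The disk space figure above can be checked before launching a run. The sketch below is an illustration only; the function names and the choice of 1 GB = 10^9 bytes are assumptions, and 200 GB is simply the value quoted for the test dataset.

```python
import shutil


def free_space_gb(path="."):
    """Free space, in GB (10**9 bytes), on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_enough_space(path=".", required_gb=200):
    """True if the filesystem holding `path` has at least `required_gb` GB free."""
    return free_space_gb(path) >= required_gb


if __name__ == "__main__":
    print(f"Free space here: {free_space_gb('.'):.1f} GB")
    print("Enough for the test dataset:", has_enough_space("."))
```

Point `path` at the directory you intend to pass to `-o`, since that is where the results are written.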

Users' Guide

Installation

In a bash-compatible terminal:

# download INTERCHANGE version 1.0
wget https://github.com/emaubin/INTERCHANGE/archive/V.1.0.zip
# decompress
unzip V.1.0.zip

Usage

Start by filling in dependencies_paths.txt with the paths to each tool and database used, as indicated in the file.

Then run the [step]_param.py scripts in the order given by their step numbers. Adapting this pipeline to other datasets or hardware configurations, or automating all procedures, requires modifications to the code.

Before running each step, complete the input file: a table describing your input data, as in the example file fastq_tab.csv. For each species tag, we recommend a name without spaces or special characters such as underscores or dashes.
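
The tag recommendation can be checked automatically before a run. This is a minimal sketch, assuming that "no spaces or special characters" means ASCII letters and digits only; the function names and the example tags are hypothetical and do not come from fastq_tab.csv.

```python
import re

# Assumption: a safe tag contains only ASCII letters and digits.
TAG_PATTERN = re.compile(r"[A-Za-z0-9]+")


def is_valid_tag(tag):
    """True if the tag has no spaces, underscores, dashes or other special characters."""
    return TAG_PATTERN.fullmatch(tag) is not None


def invalid_tags(tags):
    """Return the tags that do not follow the recommendation, in input order."""
    return [tag for tag in tags if not is_valid_tag(tag)]


if __name__ == "__main__":
    # Hypothetical tags, for illustration only.
    print(invalid_tags(["SpeciesA", "species B", "species_C", "species-D"]))
```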

To see the arguments and options for each step, use the help of each script as follows:

python3 ~/INTERCHANGE-V.1.0/scripts/1.Genome_format/format_param.py -h

# Help message for Step 1
"""
usage: python3 format_param.py -i -p

Script to prepare and format input data for INTERCHANGE

Positional arguments:
  -i TABLE             Input file containing Table of species.
  -p PATHS             File of tools paths.

Settings:
  -t THREAD            Number of CPU for gzip/gunzip. Default [2]

Output options:
  -o OUTPUT_DIRECTORY  Output directory for INTERCHANGE results. Default: /INTERCHANGE_results in current directory

Other:
  -h, --help           Show this help message and exit.
  -v, --version        Show program's version number and exit.
"""

Example

Here is an example of the full sequence of commands to run:

### Step 1
python3 ~/INTERCHANGE-V.1.0/scripts/1.Genome_format/format_param.py -p ~/INTERCHANGE-V.1.0/dependencies_paths.txt -i fastq_tab.csv -o ~/HT_test_pipeline

### Step 2
python3 ~/INTERCHANGE-V.1.0/scripts/2.Index/index_parameters.py -p ~/INTERCHANGE-V.1.0/dependencies_paths.txt -o ~/HT_test_pipeline

### Step 3
python3 ~/INTERCHANGE-V.1.0/scripts/3.Search_identical_kmers/identical_kmers_param.py -p ~/INTERCHANGE-V.1.0/dependencies_paths.txt -o ~/HT_test_pipeline

### Step 4
python3 ~/INTERCHANGE-V.1.0/scripts/4.Assembly/assembly_param.py -p ~/INTERCHANGE-V.1.0/dependencies_paths.txt -o ~/HT_test_pipeline

### Step 5
python3 ~/INTERCHANGE-V.1.0/scripts/5.Annotation/annotation_param.py -p ~/INTERCHANGE-V.1.0/dependencies_paths.txt -o ~/HT_test_pipeline

### Step 6
python3 ~/INTERCHANGE-V.1.0/scripts/6.Homologous_scaffolds/homologous_scfd_param.py -p ~/INTERCHANGE-V.1.0/dependencies_paths.txt -o ~/HT_test_pipeline

### Step 7
python3 ~/INTERCHANGE-V.1.0/scripts/7.Annotation_table/annotation_table.py -i fastq_tab.csv -p ~/INTERCHANGE-V.1.0/dependencies_paths.txt -o ~/HT_test_pipeline

### Step 8
python3 ~/INTERCHANGE-V.1.0/scripts/8.Busco_genes/busco_identification_param.py -i fastq.tab_2.csv -p ~/INTERCHANGE-V.1.0/dependencies_paths.txt -o ~/HT_test_pipeline

### Step 9
python3 ~/INTERCHANGE-V.1.0/scripts/9_High_similarity/high_similarity_param.py -p ~/INTERCHANGE-V.1.0/dependencies_paths.txt -o ~/HT_test_pipeline
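
The nine commands above all share the same `-p` and `-o` values, so they can be generated from a single list. The following is a sketch of one way to do that, not a script shipped with INTERCHANGE: `build_commands` and `run_pipeline` are hypothetical helpers, and the script paths are copied from the example commands. It prints the commands by default and only executes them when `dry_run=False`.

```python
import subprocess
from pathlib import Path

# (script path inside the INTERCHANGE directory, which table it takes with -i)
STEPS = [
    ("scripts/1.Genome_format/format_param.py", "table"),
    ("scripts/2.Index/index_parameters.py", None),
    ("scripts/3.Search_identical_kmers/identical_kmers_param.py", None),
    ("scripts/4.Assembly/assembly_param.py", None),
    ("scripts/5.Annotation/annotation_param.py", None),
    ("scripts/6.Homologous_scaffolds/homologous_scfd_param.py", None),
    ("scripts/7.Annotation_table/annotation_table.py", "table"),
    ("scripts/8.Busco_genes/busco_identification_param.py", "busco_table"),
    ("scripts/9_High_similarity/high_similarity_param.py", None),
]


def build_commands(root, out_dir, table, busco_table):
    """Return one argument list per step, in execution order."""
    root = Path(root)
    tables = {"table": table, "busco_table": busco_table}
    commands = []
    for script, table_key in STEPS:
        cmd = ["python3", str(root / script)]
        if table_key is not None:
            cmd += ["-i", str(tables[table_key])]
        cmd += ["-p", str(root / "dependencies_paths.txt"), "-o", str(out_dir)]
        commands.append(cmd)
    return commands


def run_pipeline(commands, dry_run=True):
    """Print each command; when dry_run is False, also run it and stop on failure."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    home = Path.home()
    run_pipeline(
        build_commands(
            home / "INTERCHANGE-V.1.0",
            home / "HT_test_pipeline",
            "fastq_tab.csv",     # table used by steps 1 and 7
            "fastq.tab_2.csv",   # table used by step 8
        )
    )
```

Because `check=True` raises on a non-zero exit status, a failing step stops the run instead of feeding incomplete results to the next step.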

Output

INTERCHANGE writes its results to an output directory named 'HS_candidates/', which contains two tables (GENE_HSvalidation.txt and TE_HSvalidation.txt) and two FASTA files (GENE_HSvalidation.fa and TE_HSvalidation.fa) containing the scaffolds shared between the two species that pass the high similarity criterion.

Here is an example of the table content (column headers):

Species1  Species2  GENE/TE_name  PID  ID_scaffold_sp1  ID_scaffold_sp2  High_similarity_value
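
For a quick look at these tables from Python, the standard `csv` module is enough. The sketch below rests on two unverified assumptions: that the files are tab-separated, and that their first line is a header containing the `Species1` and `Species2` column names shown above. Adjust `delimiter` if your files differ. The function names are hypothetical.

```python
import csv
from collections import Counter


def read_hs_table(lines, delimiter="\t"):
    """Parse a validation table (open file or list of lines) into a list of dicts.

    Assumes the first line is a header row with the column names.
    """
    return list(csv.DictReader(lines, delimiter=delimiter))


def count_by_species_pair(rows):
    """Count candidate rows for each (Species1, Species2) pair."""
    return Counter((row["Species1"], row["Species2"]) for row in rows)
```

A typical use would be to open `HS_candidates/GENE_HSvalidation.txt` with `open(path, newline="")`, pass the file object to `read_hs_table`, and print the result of `count_by_species_pair` to see how many candidates each species pair has.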

Citation

If you use INTERCHANGE in your work, please cite our paper:

Aubin et al., 2023. Genome-wide analysis of horizontal transfer in non-model wild species from a natural ecosystem reveals new insights into genetic exchange in plants. PLoS Genet, 19(10):e1010964.
