MERGE

MERGE is a method that combines direct coupling analysis and machine learning to predict a protein's fitness from its sequence. It requires a binary parameter file generated by plmc and a set of variant-fitness pairs.

Usage

The most important steps for model construction are briefly described below. Step-by-step instructions are given here. To generate a model of the fitness landscape of a protein and explore it in silico, the following files are required:

protein sequence in fasta format
variant-fitness pairs in csv format
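
As a rough illustration of these two inputs, the sketch below reads a FASTA record and a CSV file of variant-fitness pairs using only the Python standard library. The column names `variant` and `fitness`, the substitution notation (e.g. `K2A`), and all values are assumptions made for illustration; they are not taken from MERGE, so check the example referenced below for the layout it actually expects.

```python
import csv
import io


def read_fasta(handle):
    """Return (header, sequence) for the first record in a FASTA file."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # stop at the second record
            header = line[1:]
        elif header is not None:
            chunks.append(line)
    return header, "".join(chunks)


def read_variant_fitness(handle, variant_col="variant", fitness_col="fitness"):
    """Return a list of (variant, fitness) tuples from a CSV file.

    The column names are hypothetical defaults, not MERGE's required layout.
    """
    reader = csv.DictReader(handle)
    return [(row[variant_col], float(row[fitness_col])) for row in reader]


# Made-up toy data, for illustration only.
fasta_text = ">wild_type\nMKTAY\nIAKQR\n"
csv_text = "variant,fitness\nK2A,0.8\nT3S,1.2\n"

header, sequence = read_fasta(io.StringIO(fasta_text))
pairs = read_variant_fitness(io.StringIO(csv_text))
print(header, sequence, pairs)
```

With real data, pass open file handles instead of the `io.StringIO` objects.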

Generate a multiple sequence alignment (MSA) using jackhmmer

To generate a multiple sequence alignment, the target sequence must be provided in fasta format and the inclusion threshold (--incT) must be set.

jackhmmer [-options] <seqfile> <seqdb>
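
For scripted runs, the call above can be assembled in Python. This is a minimal sketch: `--incT`, `-N`, `--cpu`, `--noali` and `-A` are jackhmmer options, whereas the file names, the threshold value, and the helper `build_jackhmmer_command` are placeholders chosen for illustration and should be adapted to your target.

```python
import shutil
import subprocess
from pathlib import Path


def build_jackhmmer_command(seqfile, seqdb, inc_t, sto_out, iterations=5, cpus=4):
    """Assemble a jackhmmer call that saves the final alignment in Stockholm format."""
    return [
        "jackhmmer",
        "--incT", str(inc_t),   # bit-score inclusion threshold
        "-N", str(iterations),  # maximum number of search iterations
        "--cpu", str(cpus),     # number of worker threads
        "--noali",              # omit alignments from the text output
        "-A", sto_out,          # save the alignment of included hits
        seqfile,
        seqdb,
    ]


# Placeholder file names and threshold (assumptions, not recommended values).
cmd = build_jackhmmer_command("target.fasta", "uniref100.fasta", 100, "target.sto")
print(" ".join(cmd))

# Only run the search if jackhmmer is on PATH and both input files exist.
inputs_exist = Path("target.fasta").is_file() and Path("uniref100.fasta").is_file()
if shutil.which("jackhmmer") is not None and inputs_exist:
    subprocess.run(cmd, check=True)
```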

Post-process the MSA

Next, the MSA is post-processed by

excluding all positions where the wild-type sequence has a gap,
excluding all positions that contain more than 30 % gaps,
excluding all sequences that contain more than 50 % gaps.

The script sto2a2m.py can be found here.

python sto2a2m.py -sto <stoFile>
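
The three filtering rules above can be sketched in a few lines of Python. This is not the code of sto2a2m.py: the function names, the assumption that the wild type is the first sequence, the order in which the rules are applied, and the choice to measure sequence gaps after the positions have been removed are all assumptions made for illustration.

```python
GAP_CHARS = "-."


def gap_fraction(chars):
    """Fraction of gap characters in a sequence or alignment column."""
    if not chars:
        return 0.0
    return sum(c in GAP_CHARS for c in chars) / len(chars)


def postprocess_msa(sequences, max_col_gaps=0.3, max_seq_gaps=0.5):
    """Filter an alignment given as equal-length strings; sequences[0] is the wild type."""
    wild_type = sequences[0]

    # 1. Drop positions where the wild type has a gap.
    keep = [i for i in range(len(wild_type)) if wild_type[i] not in GAP_CHARS]

    # 2. Drop positions with more than max_col_gaps gaps across all sequences.
    keep = [i for i in keep if gap_fraction([s[i] for s in sequences]) <= max_col_gaps]

    trimmed = ["".join(s[i] for i in keep) for s in sequences]

    # 3. Drop sequences with more than max_seq_gaps gaps in the remaining positions.
    return [s for s in trimmed if gap_fraction(s) <= max_seq_gaps]


# Made-up toy alignment; the first sequence plays the role of the wild type.
msa = ["MK-TAY", "MKATAY", "M--T-Y", "-----Y", "MK-TA-"]
filtered = postprocess_msa(msa)
print(filtered)
```

In this toy alignment, the wild-type gap column and two gap-rich columns are removed, after which the fourth sequence exceeds the 50 % gap limit and is dropped.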

Infer parameters for the Potts model using PLMC

Once the a2m file is generated, the parameters of the statistical model are inferred.

plmc [options] alignmentfile
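
For orientation, a Potts model assigns each sequence a statistical energy built from site-specific fields h_i and pairwise couplings J_ij, which are the kind of parameters inferred in this step. The sketch below evaluates that sum for toy parameters. The two-letter alphabet, the numbers, and the function name are invented for illustration, and this is not how plmc or MERGE store or read their parameters.

```python
def potts_energy(sequence, fields, couplings, alphabet):
    """Sum of fields h_i(a_i) and couplings J_ij(a_i, a_j) over all pairs i < j."""
    idx = [alphabet.index(a) for a in sequence]
    energy = sum(fields[i][a] for i, a in enumerate(idx))
    for i in range(len(idx)):
        for j in range(i + 1, len(idx)):
            energy += couplings[(i, j)][idx[i]][idx[j]]
    return energy


# Toy parameters for a three-position sequence over a two-letter alphabet.
alphabet = "AB"
fields = [[0.5, -0.5], [0.0, 1.0], [0.25, 0.0]]
couplings = {
    (0, 1): [[1.0, 0.0], [0.0, 1.0]],
    (0, 2): [[0.0, 0.0], [0.0, 0.0]],
    (1, 2): [[0.5, -0.5], [-0.5, 0.5]],
}

e_wild_type = potts_energy("AAA", fields, couplings, alphabet)
e_variant = potts_energy("ABA", fields, couplings, alphabet)
delta = e_variant - e_wild_type
print(e_wild_type, e_variant, delta)
```

In approaches based on direct coupling analysis, the energy difference between a variant and the wild type is commonly used as a score for the effect of the substitution.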

Construct and explore the model of the fitness landscape using MERGE

Finally, a model of the fitness landscape is generated. See the example for details on how to use MERGE.

Prerequisites

1. Get the UniRef100 database

Download the latest version of UniRef100 (this can take a while; the file is larger than 100 GB)

wget https://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref100/uniref100.fasta.gz

Unzip the file to get a fasta file

gzip -d uniref100.fasta.gz

For further information see uniprot help or here

2. Installing HMMER

Download the tarball

wget http://eddylab.org/software/hmmer/hmmer.tar.gz

Unpack the tarball

tar zxf hmmer.tar.gz

Enter the directory 'hmmer-3.4'

cd hmmer-3.4

Set the installation path (adjust "/your/install/path" accordingly!)

./configure --prefix /your/install/path

Build HMMER

make

Run self tests (optional)

make check

Install programs and man pages

make install

Add executable to PATH for session (adjust "/your/install/path" accordingly!)

export PATH="/your/install/path/bin:$PATH"

or permanently (adjust "/your/install/path" accordingly!)

echo 'export PATH="/your/install/path/bin:$PATH"' >> ~/.bashrc

Exit the directory 'hmmer-3.4'

cd ..

For further information see hmmer documentation

3. Installing PLMC

Clone the plmc repository

git clone https://github.com/debbiemarkslab/plmc.git

Enter the directory 'plmc'

cd plmc

Build with GCC and OpenMP to enable multicore parallelism

make all-openmp

Add executable to PATH for session (adjust "/your/install/path" accordingly!)

export PATH="/your/install/path/bin:$PATH"

or permanently (adjust "/your/install/path" accordingly!)

echo 'export PATH="/your/install/path/bin:$PATH"' >> ~/.bashrc

Exit the directory 'plmc'

cd ..

For further information see plmc repository
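
Once both tools are installed, a quick check that the executables are visible on PATH can save debugging later. The sketch below uses only `shutil.which` from the standard library; the helper name `find_tools` is hypothetical, and the executable names `jackhmmer` and `plmc` are assumed to match your builds.

```python
import shutil


def find_tools(names=("jackhmmer", "plmc")):
    """Map each executable name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for tool, path in find_tools().items():
    print(f"{tool}: {path or 'not found on PATH'}")
```

If a tool is reported as not found, re-check the `export PATH=...` line used above and open a new shell session.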

4. MERGE

Clone the MERGE repository

git clone https://github.com/amillig/MERGE.git

Enter the MERGE directory

cd MERGE

Install the dependencies

pip install -r requirements.txt

Import MERGE as a module in Python

import merge

References

“Combining evolutionary probability and machine learning enables data-driven protein engineering with minimized experimental effort” by Alexander-Maurice Illig, Niklas E. Siedhoff, Mehdi D. Davari*, and Ulrich Schwaneberg*

Author

MERGE was developed and written by Alexander-Maurice Illig at RWTH Aachen University.

Credits

MERGE uses binary parameter files generated with plmc, which was written by John Ingraham.

License
