
agitter/papers_for_protein_design_using_DL

 
 


List of papers about Protein Design using Deep Learning

This repository is inspired by the remarkable work of Kevin Kaichuang Yang and their outstanding project Machine-learning-for-proteins. We have established this repository to provide a specialized and focused platform for the field of Deep Learning for Protein Design, a rapidly advancing domain in computational biology.

Contributions and suggestions are warmly welcome!
Community Values, Guiding Principles, and Commitments for the Responsible Development of AI for Protein Design: details

Papers last week, updated on 2024.07.20:



deep learning for protein design

0) Benchmarks and datasets
Sequence datasets • Structure datasets • Public database • Similar list • Guides
1) Reviews and surveys
De novo design • Antibody design • Peptide design • Binder design • Enzyme design
2) Model-based design
trRosetta-based • AlphaFold2-based • DMPfold2-based • CM-Align • MSA transformer-based • DeepAb-based • TRFold2-based • GPT-based • ESM-based • Sampling-algorithms
3) Function to Scaffold
GAN-based • VAE-based • DAE-based • MLP-based • Diffusion-based • RL-based • Flow-based
4) Scaffold to Sequence
Review • MLP-based • VAE-based • LSTM-based • CNN-based • GNN-based • GAN-based • Transformer-based • ResNet-based • Diffusion-based • Bayesian method • Flow-based
5) Function to Sequence
CNN-based • VAE-based • GAN-based • Transformer-based • Bayesian method • Reinforcement Learning • Flow-based • RNN-based • LSTM-based • Autoregressive • Boltzmann machine • Diffusion-based • GNN-based • Score-based
6) Function to Structure
LSTM-based • Diffusion-based • RoseTTAFold-based • CNN-based • GNN-based • Transformer-based • MLP-based • Flow-based
7) Other
Effects of mutations & Fitness Landscape • Protein Language Model & Representation Learning • Molecular Design Model


0. Benchmarks and datasets

0.1 Sequence Datasets

FLIP: Benchmark tasks in fitness landscape inference for proteins
Christian Dallago, Jody Mou, Kadina E Johnston, Bruce Wittmann, Nick Bhattacharya, Samuel Goldman, Ali Madani, Kevin K Yang
NeurIPS 2021 Datasets and Benchmarks Track/bioRxiv 2021 • website • code • Supplementary

A Benchmark Framework for Evaluating Structure-to-Sequence Models for Protein Design
Jeffrey Chan, Seyone Chithrananda, David Brookes, Sam Sinai
Paper unavailable at Machine Learning in Structural Biology Workshop 2022

PDBench: Evaluating Computational Methods for Protein-Sequence Design
Leonardo V Castorina, Rokas Petrenas, Kartic Subr, Christopher W Wood
Bioinformatics, 2023, btad027 • code

Benchmarking deep generative models for diverse antibody sequence design
Igor Melnyk, Payel Das, Vijil Chenthamarakshan, Aurelie Lozano
arXiv:2111.06801

The Protein Engineering Tournament: An Open Science Benchmark for Protein Modeling and Design
Chase Armer, Hassan Kane, Dana Cortade, Dave Estell, Adil Yusuf, Radhakrishna Sanka, Henning Redestig, TJ Brunette, Pete Kelly, Erika DeBenedictis
arXiv:2309.09955

Computational Scoring and Experimental Evaluation of Enzymes Generated by Neural Networks
Sean R. Johnson, Xiaozhi Fu, Sandra Viknander, Clara Goldin, Sarah Monaco, Aleksej Zelezniak, Kevin K. Yang
bioRxiv (2023) • code

FLOP: Tasks for Fitness Landscapes Of Protein Wildtypes
Peter Mørch Groth, Richard Michael, Jesper Salomon, Pengfei Tian, Wouter Boomsma
bioRxiv 2023.06.21.545880 • code

ProteinGym: Large-Scale Benchmarks for Protein Design and Fitness Prediction
Pascal Notin, Aaron W Kollasch, Daniel Ritter, Lood van Niekerk, Steffanie Paul, Hansen Spinner, Nathan Rollins, Ada Shaw, Ruben Weitzman, Jonathan Frazer, Mafalda Dias, Dinko Franceschi, Rose Orenbuch, Yarin Gal, Debora S Marks
bioRxiv 2023.12.07.570727 • code

0.2 Structure Datasets

AlphaDesign: A graph protein design method and benchmark on AlphaFoldDB
Zhangyang Gao, Cheng Tan, Stan Z. Li
arxiv (2022)

SidechainNet: An All-Atom Protein Structure Dataset for Machine Learning
Jonathan E. King, David Ryan Koes
arXiv • github::sidechainnet

TDC maintains a resource list that currently contains 22 tasks (and their datasets) related to small molecules and macromolecules, including PPI, DDI and so on. MoleculeNet published a small-molecule benchmark four years ago.

In terms of datasets and benchmarks, protein design is far less mature than drug discovery (Papers with Code drug discovery benchmarks). (An evaluation of deep learning methods for protein design, especially deep generative models, may be worth adding.)
Difficulties and opportunities always coexist. We are happy to see the work of Christian Dallago, Jody Mou, Kadina E. Johnston, Bruce J. Wittmann, Nicholas Bhattacharya, Samuel Goldman, Ali Madani, Kevin K. Yang and Zhangyang Gao, Cheng Tan, Stan Z. Li.

Sampling of structure and sequence space of small protein folds
Thomas W. Linsky, Kyle Noble, Autumn R. Tobin, Rachel Crow, Lauren Carter, Jeffrey L. Urbauer, David Baker & Eva-Maria Strauch
Nat Commun 13, 7151 (2022) • code • Supplementary

OpenProteinSet: Training data for structural biology at scale
Gustaf Ahdritz, Nazim Bouatta, Sachin Kadyan, Lukas Jarosch, Daniel Berenberg, Ian Fisk, Andrew M. Watkins, Stephen Ra, Richard Bonneau, Mohammed AlQuraishi
arXiv:2308.05326 • OpenFold

ProteinInvBench: Benchmarking Protein Design on Diverse Tasks, Models, and Metrics
Zhangyang Gao, Cheng Tan, Yijie Zhang, Xingran Chen, Stan Z. Li
GitHub

PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design
Chuanrui Wang, Bozitao Zhong, Zuobai Zhang, Narendra Chaudhary, Sanchit Misra, Jian Tang
arXiv preprint arXiv:2312.00080 (2023) • code

Scaffold-Lab: Critical Evaluation and Ranking of Protein Backbone Generation Methods in A Unified Framework
Zhuoqi Zheng, Bo Zhang, Bozitao Zhong, Kexin Liu, Jinyu Yu, Zhengxin Li, JunJie Zhu, Ting Wei, Hai-Feng Chen
bioRxiv 2024.02.10.579743 • code • Supplementary

0.3 Databases

A list of suggested protein databases; more lists are available at CNCB.

0.3.1 Sequence Database

  1. UniProt
  2. DisProt
  3. MobiDB
  4. Peptipedia

0.3.2 Structure Database

Database Description
PDB The Protein Data Bank (PDB) is a database of 3D structural data of large biological molecules, such as proteins and nucleic acids. These data are gathered using experimental methods such as X-ray crystallography, NMR spectroscopy, or cryo-electron microscopy.
AlphaFoldDB AlphaFoldDB is a database of protein structure predictions produced by DeepMind's AlphaFold system. It provides highly accurate predictions of protein 3D structures.
PDBbind PDBbind is a comprehensive collection of the binding data of all types of biomolecular complexes in the PDB database. It is primarily used for the development and validation of computational methods for predicting molecular interactions.
AB-Bind AB-Bind is a database for antibody binding affinity data. It offers a curated set of experimental binding data and corresponding antibody-protein complex structures.
AntigenDB AntigenDB is a manually curated database of experimentally verified antigens that includes detailed information about the antigen, the source organism, and the associated antibodies.
CAMEO CAMEO (Continuous Automated Model EvaluatiOn) is a project for the automated evaluation of methods predicting macromolecular structure. It continuously assesses the performance of automated protein structure prediction servers.
CAPRI The Critical Assessment of PRediction of Interactions (CAPRI) is a community-wide experiment to evaluate protein-protein interaction prediction methods.
PIFACE PIFACE is a web server for the prediction of protein-protein interactions. It identifies potential interaction interfaces on protein surfaces.
SAbDab The Structural Antibody Database (SAbDab) is an automatically updated resource for the structural information of antibodies from the PDB. It allows for easy access to curated, annotated, and classified antibody structures.
SKEMPI v2.0 SKEMPI 2.0 is a database of experimental measurements of the change in binding free energy caused by mutations in protein-protein complexes.
ProtCAD ProtCAD is a suite of tools for the design and engineering of novel protein structures, sequences, and functions. It allows users to build and manipulate complex protein structures, generate and evaluate sequence libraries, and simulate mutational effects.

0.4 Similar List

Some similar GitHub lists that include papers about protein design using deep learning:

  1. design_tools
  2. awesome-AI-based-protein-design
  3. ProteinStructureWithDL
  4. List of available bioinformatic tools and services

0.5 Guides

Guides/Tutorials for beginners on GitHub:

  1. how_to_create_a_protein
  2. protein-design-tutorials

1. Reviews

1.1 De novo protein design

Protein design: from computer models to artificial intelligence
Antonella Paladino, Filippo Marchetti, Silvia Rinaldi, Giorgio Colombo
Wiley Interdisciplinary Reviews: Computational Molecular Science 7.5 (2017): e1318

Advances in protein structure prediction and design
Kuhlman B., Bradley P.
Nat Rev Mol Cell Biol 20, 681-697 (2019)

Deep learning in protein structural modeling and design
Wenhao Gao, Sai Pooja Mahajan, Jeremias Sulam, and Jeffrey J. Gray
Patterns 1.9 • 2020

100th anniversary of macromolecular science viewpoint: Data-driven protein design
Ferguson, Andrew L., and Rama Ranganathan.
ACS Macro Letters 10.3 (2021)

Artificial intelligence in early drug discovery enabling precision medicine
Fabio Boniolo, Emilio Dorigatti, Alexander J. Ohnmacht, Dieter Saur, Benjamin Schubert, and Michael P. Menden
Expert Opinion on Drug Discovery 16.9 (2021)

Protein design with deep learning
Defresne, Marianne, Sophie Barbe, and Thomas Schiex.
International Journal of Molecular Sciences 22.21 (2021)

Protein sequence design with deep generative models
Zachary Wu, Kadina E. Johnston, Frances H. Arnold, Kevin K. Yang
Current Opinion in Chemical Biology 65 • note • 2021

Structure-based protein design with deep learning
Ovchinnikov, Sergey, and Po-Ssu Huang.
Current Opinion in Chemical Biology 65 • note • 2021

Deep learning techniques have significantly impacted protein structure prediction and protein design
Pearce, Robin, and Yang Zhang.
Current opinion in structural biology 68 (2021)

Recent advances in de novo protein design: Principles, methods, and applications
Pan, Xingjie, and Tanja Kortemme.
Journal of Biological Chemistry 296 (2021)

Protein design via deep learning
Wenze Ding, Kenta Nakai, Haipeng Gong
Briefings in Bioinformatics • 25 March 2022

Deep generative modeling for protein design
Strokach, Alexey, and Philip M. Kim.
Current Opinion in Structural Biology • 2022

Dawn of a new era for membrane protein design
Sowlati-Hashjin, Shahin, Aanshi Gandhi, and Michael Garton
BioDesign Research (2022)

Deep learning approaches for conformational flexibility and switching properties in protein design
Rudden, Lucas SP, Mahdi Hijazi, and Patrick Barth
Frontiers in Molecular Biosciences

Computational protein design with evolutionary-based and physics-inspired modeling: current and future synergies
Cyril Malbranke, David Bikard, Simona Cocco, Rémi Monasson, Jérôme Tubiana
arXiv:2208.13616v2

From sequence to function through structure: deep learning for protein design
Noelia Ferruz, Michael Heinzinger, Mehmet Akdel, Alexander Goncearenco, Luca Naef, Christian Dallago
bioRxiv 2022.08.31.505981/Computational and Structural Biotechnology Journal Volume 21, 2023 • Supplementary • accompanying list

Computational protein design with data-driven approaches: Recent developments and perspectives
Liu H, Chen Q.
WIREs Comput Mol Sci. 2022. e1646

Understanding by design: Implementing deep learning from protein structure prediction to protein design
Gao, Yuanxu, Jiangshan Zhan, and Albert CH Yu.
MedComm-Future Medicine 1.2 (2022): e22

Diffusion Models in Bioinformatics: A New Wave of Deep Learning Revolution in Action
Zhiye Guo, Jian Liu, Yanli Wang, Mengrui Chen, Duolin Wang, Dong Xu, Jianlin Cheng
arXiv:2302.10907

Machine learning for evolutionary-based and physics-inspired protein design: Current and future synergies
Cyril Malbranke, David Bikard, Simona Cocco, Rémi Monasson, Jérôme Tubiana
Current Opinion in Structural Biology

De novo design of polyhedral protein assemblies: before and after the AI revolution
Bhoomika Basu Mallik, Jenna Stanislaw, Tharindu Madhusankha Alawathurage, and Alena Khmelinskaia
ChemBioChem 2023, e202300117

Research progress of artificial intelligence in protein design
CHEN Zhihang, JI Menglin, QI Yifei
Synthetic Biology Journal (2023)

A Survey on Graph Diffusion Models: Generative AI in Science for Molecule, Protein and Material
Mengchun Zhang, Maryam Qamar, Taegoo Kang, Yuna Jung, Chenshuang Zhang, Sung-Ho Bae, Chaoning Zhang
https://arxiv.org/abs/2304.01565

Exploring the Protein Sequence Space with Global Generative Models
Sergio Romero-Romero, Sebastian Lindner, Noelia Ferruz
arXiv:2305.01941

The Era of Machine Learning for Protein Design, Summarized in Four Key Methods
LucianoSphere
Towards Data Science

Is novelty predictable?
Clara Fannjiang, Jennifer Listgarten
arXiv:2306.00872

Computational protein design - where it goes?
Xu Binbin, Chen Yingjun and Xue Weiwei
Current Medicinal Chemistry 2023

How can the protein design community best support biologists who want to harness AI tools for protein structure prediction and design?
Birte Höcker, Peilong Lu, Anum Glasgow, Debora S. Marks, Pranam Chatterjee, Joanna S.G. Slusky, Ora Schueler-Furman, Po-Ssu Huang
Cell Systems 14.8 (2023)

Creation of de novo designed nanopores (De novo 設計ナノポアの創製)
Ai Niitsu (新津藍)
Seibutsu-kogaku Kaishi (生物工学会誌) 101.8 (2023)

Generative artificial intelligence for de novo protein design
Adam Winnifrith, Carlos Outeiral, Brian Hie
arXiv:2310.09685

Generative models for protein sequence modeling: recent advances and future directions
Mehrsa Mardikoraem, Zirui Wang, Nathaniel Pascual, Daniel Woldring
Briefings in Bioinformatics

A new age in protein design empowered by deep learning
Hamed Khakzad, Ilia Igashov, Arne Schneuing, Casper Goverde, Michael Bronstein, Bruno Correia
Cell Systems, Volume 14, Issue 11

Deep learning for protein structure prediction and design—progress and applications
Jürgen Jänes and Pedro Beltrao
Mol Syst Biol (2024)

De novo protein design—From new structures to programmable functions
Kortemme, Tanja.
Cell 187.3 (2024)

Generative models for protein structures and sequences
Hsu, C., Fannjiang, C. & Listgarten, J.
Nat Biotechnol 42, 196–199 (2024)

What does it take for an ‘AlphaFold Moment’ in functional protein engineering and design?
Roberto A. Chica & Noelia Ferruz
Nat Biotechnol 42, 173–174 (2024)

Protein design: the experts speak
Doerr, A.
Nat Biotechnol 42, 175–178 (2024)

Machine learning for functional protein design
Pascal Notin, Nathan Rollins, Yarin Gal, Chris Sander & Debora Marks
Nat Biotechnol 42, 216–228 (2024)

Sparks of function by de novo protein design
Chu, A.E., Lu, T. & Huang, PS.
Nat Biotechnol 42, 203–215 (2024) • poster

A Survey of Generative AI for De Novo Drug Design: New Frontiers in Molecule and Protein Generation
Xiangru Tang, Howard Dai, Elizabeth Knight, Fang Wu, Yunyang Li, Tianxiao Li, Mark Gerstein
arXiv:2402.08703

Security challenges by AI-assisted protein design
Philip Hunter
EMBO Rep (2024)

Opportunities and challenges in design and optimization of protein function
Dina Listov, Casper A. Goverde, Bruno E. Correia & Sarel Jacob Fleishman
Nat Rev Mol Cell Biol (2024)

The State-of-the-Art Overview to Application of Deep Learning in Accurate Protein Design and Structure Prediction
Saber Saharkhiz, Mehrnaz Mostafavi, Amin Birashk, Shiva Karimian, Shayan Khalilollah, Sohrab Jaferian, Yalda Yazdani, Iraj Alipourfard, Yun Suk Huh, Marzieh Ramezani Farani & Reza Akhavan-Sigari
Top Curr Chem (Z) 382, 23 (2024)

Computational methods for protein design
Noelia Ferruz, Amelie Stein
Protein Engineering, Design and Selection, Volume 37, 2024

1.2 Antibody design

A review of deep learning methods for antibodies
Jordan Graves, Jacob Byerly, Eduardo Priego, Naren Makkapati, S. Vince Parish, Brenda Medellin and Monica Berrondo
Antibodies 9.2 (2020)

Progress and challenges for the machine learning-based design of fit-for-purpose monoclonal antibodies
Rahmad Akbar, Habib Bashour, Puneet Rawat, Philippe A. Robert, Eva Smorodina, Tudor-Stefan Cotet, Karine Flem-Karlsen, Robert Frank, Brij Bhushan Mehta, Mai Ha Vu, Talip Zengin, Jose Gutierrez-Marcos, Fridtjof Lund-Johansen, Jan Terje Andersen, and Victor Greiff
Mabs. Vol. 14. No. 1. Taylor & Francis, 2022

Advances in computational structure-based antibody design
Hummer, Alissa M., Brennan Abanades, and Charlotte M. Deane.
Current Opinion in Structural Biology 74 (2022)

Computational and artificial intelligence-based methods for antibody development
Jisun Kim, Matthew McFee, Qiao Fang, Osama Abdin, Philip M. Kim
Trends in Pharmacological Sciences (2023)

Leveraging deep learning to improve vaccine design
Hederman AP, Ackerman ME
Trends in immunology (2023)

In Silico Approaches to Deliver Better Antibodies by Design: The Past, the Present and the Future
Andreas Evers, Shipra Malhotra, Vanita D. Sood
arXiv:2305.07488

AI Models for Protein Design are Driving Antibody Engineering
Michael Chungyoun, Jeffrey J. Gray
Current Opinion in Biomedical Engineering (2023): 100473

Computational Methods in Immunology and Vaccinology: Design and Development of Antibodies and Immunogens
Federica Guarra and Giorgio Colombo
Journal of Chemical Theory and Computation (2023)

Simplifying complex antibody engineering using machine learning
Makowski, Emily K., Hsin-Ting Chen, and Peter M. Tessier.
Cell Systems 14.8 (2023)/2022 AIChE Annual Meeting. AIChE, 2022.

AI driven B-cell Immunotherapy Design
Bruna Moreira da Silva, David B. Ascher, Nicholas Geard, Douglas E. V. Pires
arXiv:2309.01122

Best practices for machine learning in antibody discovery and development
Leonard Wossnig, Norbert Furtmann, Andrew Buchanan, Sandeep Kumar, Victor Greiff
arXiv:2312.08470/Drug Discovery Today (2024)

Next generation of multispecific antibody engineering
Daniel Keri, Matt Walker, Isha Singh, Kyle Nishikawa, Fernando Garces
Antibody Therapeutics (2023): tbad027

A primer on ML in antibody engineering
Abhishaike Mahajan
Substack • blog

Antibody design using deep learning: from sequence and structure design to affinity maturation
Sara Joubbi, Alessio Micheli, Paolo Milazzo, Giuseppe Maccari, Giorgio Ciano, Dario Cardamone, Duccio Medini
Briefings in Bioinformatics, Volume 25, Issue 4, July 2024, bbae307

1.3 Peptide design

Deep generative models for peptide design
Wan, Fangping, Daphne Kontogiorgos-Heintz, and Cesar de la Fuente-Nunez
Digital Discovery (2022)

Design of protein segments and peptides for binding to protein targets
Gupta, Suchetana, Noora Azadvari, and Parisa Hosseinzadeh.
BioDesign Research 2022 (2022)

Revolutionizing peptide-based drug discovery: Advances in the post-AlphaFold era
Liwei Chang, Arup Mondal, Bhumika Singh, Yisel Martínez-Noa, Alberto Perez
Wiley Interdisciplinary Reviews: Computational Molecular Science

Peptide-based drug discovery through artificial intelligence: towards an autonomous design of therapeutic peptides
Montserrat Goles, Anamaría Daza, Gabriel Cabas-Mora, Lindybeth Sarmiento-Varón, Julieta Sepúlveda-Yañez, Hoda Anvari-Kazemabad, Mehdi D Davari, Roberto Uribe-Paredes, Álvaro Olivera-Nappa, Marcelo A Navarrete, David Medina-Ortiz
Briefings in Bioinformatics 25.4 (2024)

1.4 Binder design

Improving de novo Protein Binder Design with Deep Learning
Nathaniel Bennett, Brian Coventry, Inna Goreshnik, Buwei Huang, Aza Allen, Dionne Vafeados, Ying Po Peng, Justas Dauparas, Minkyung Baek, Lance Stewart, Frank DiMaio, Steven De Munck, Savvas Savvides, David Baker
bioRxiv 2022.06.15.495993/Nat Commun 14, 2625 (2023) • code • news

1.5 Enzyme design

A review of enzyme design in catalytic stability by artificial intelligence
Yongfan Ming, Wenkang Wang, Rui Yin, Min Zeng, Li Tang, Shizhe Tang, Min Li
Briefings in Bioinformatics, 2023

Application of "foldability" in intelligent enzyme engineering and design: take AlphaFold2 for example
MENG Qiaozhen, GUO Fei
Synthetic Biology Journal (2023)

AlphaFold2 and Deep Learning for Elucidating Enzyme Conformational Flexibility and Its Application for Design
Casadevall, Guillem, Cristina Duran, and Sílvia Osuna.
JACS Au (2023)

Accelerating Biocatalysis Discovery with Machine Learning: A Paradigm Shift in Enzyme Engineering, Discovery, and Design
Braun Markus, Gruber Christian C, Krassnigg Andreas, Kummer Arkadij, Lutz Stefan, Oberdorfer Gustav, Siirola Elina, and Snajdrova Radka
ACS Catal. 2023

Building Enzymes through Design and Evolution
Hossack, Euan J., Florence J. Hardy, and Anthony P. Green.
ACS Catalysis 13.19 (2023)

Advances in generative modeling methods and datasets to design novel enzymes for renewable chemicals and fuels
Rana A Barghout, Zhiqing Xu, Siddharth Betala, Radhakrishnan Mahadevan
Current Opinion in Biotechnology, Volume 84, 2023

Opportunities and Challenges for Machine Learning-Assisted Enzyme Engineering
Jason Yang, Francesca-Zhoufan Li, Frances H. Arnold
ACS Central Science (2024)

Navigating the landscape of enzyme design: from molecular simulations to machine learning
Jiahui Zhou, Meilan Huang
Chemical Society Reviews (2024)

2. Model-based design

Invert trained models with iterative optimization algorithms to design sequences. Inverting a structure prediction model in this way is known as hallucination.
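As a rough illustration of model inversion, the sketch below runs a Metropolis-style search over sequences against a scoring function. Everything here is an assumption made for illustration: `toy_model_score` is a hypothetical stand-in for a trained predictor (it only rewards an alternating hydrophobic/polar pattern), and the mutation and acceptance rules are one common choice, not the procedure of any specific paper listed in this section.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWC")


def toy_model_score(seq: str) -> float:
    """Hypothetical stand-in for a trained model; NOT a real structure predictor.

    Returns the fraction of positions that follow an alternating
    hydrophobic (even index) / polar (odd index) pattern.
    """
    hits = sum((aa in HYDROPHOBIC) == (i % 2 == 0) for i, aa in enumerate(seq))
    return hits / len(seq)


def hallucinate(length=30, steps=2000, temperature=0.05, seed=0):
    """Search sequence space by repeatedly mutating one position and
    accepting or rejecting the change based on the model score."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    score = toy_model_score("".join(seq))
    best_seq, best_score = "".join(seq), score

    for _ in range(steps):
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)  # propose a single-site mutation
        new_score = toy_model_score("".join(seq))

        # Metropolis rule: always accept improvements, sometimes accept
        # worse sequences so the search can escape local optima.
        accept = new_score >= score or rng.random() < math.exp(
            (new_score - score) / temperature
        )
        if accept:
            score = new_score
            if score > best_score:
                best_seq, best_score = "".join(seq), score
        else:
            seq[pos] = old  # revert the rejected mutation

    return best_seq, best_score


if __name__ == "__main__":
    designed, value = hallucinate()
    print(designed, round(value, 3))
```

In an actual hallucination setup, the scoring function would be replaced by a loss computed from a trained network's outputs, and gradient-based updates are often used instead of, or alongside, random mutations.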

2.1 trRosetta-based

Design of proteins presenting discontinuous functional sites using deep learning
Doug Tischer, Sidney Lisanza, Jue Wang, Runze Dong, Ivan Anishchenko, Lukas F. Milles, Sergey Ovchinnikov, David Baker
bioRxiv (2020)

Fast differentiable DNA and protein sequence optimization for molecular design
Linder, Johannes, and Georg Seelig.
arXiv preprint arXiv:2005.11275 (2020)

De novo protein design by deep network hallucination
Ivan Anishchenko, Samuel J. Pellock, Tamuka M. Chidyausiku, Theresa A. Ramelot, Sergey Ovchinnikov, Jingzhou Hao, Khushboo Bafna, Christoffer Norn, Alex Kang, Asim K. Bera, Frank DiMaio, Lauren Carter, Cameron M. Chow, Gaetano T. Montelione & David Baker
Nature (2021) • code • trRosetta

Protein sequence design by conformational landscape optimization
Christoffer Norn, Basile I. M. Wicky, David Juergens, and Sergey Ovchinnikov
Proceedings of the National Academy of Sciences 118.11 (2021) • code

De novo design of small beta barrel proteins
David E. Kim, Davin R. Jensen, David Feldman, Doug Tischer, Ayesha Saleem, Cameron M. Chow, Xinting Li, Lauren Carter, Lukas Milles, Hannah Nguyen, Alex Kang, Asim K. Bera, Francis C. Peterson, Brian F. Volkman, Sergey Ovchinnikov, David Baker
PNAS (2023), e2207974120 • code

Exploring "dark matter" protein folds using deep learning
Zander Harteveld, Alexandra Van Hall-Beauvais, Irina Morozova, Joshua Southern, Casper Alexander Goverde, Sandrine Georgeon, Stephane Rosset, Andreas Loukas, Pierre Vandergheynst, Michael Bronstein, Bruno Correia
bioRxiv 2023.08.30.555621 • Supplementary • code

Carving out a Glycoside Hydrolase Active Site for Incorporation into a New Protein Scaffold Using Deep Network Hallucination
Anders Lønstrup Hansen, Frederik Friis Theisen, Ramon Crehuet, Enrique Marcos, Nushin Aghajari, and Martin Willemoës
ACS Synth. Biol. 2024

2.2 AlphaFold2-based

End-to-end learning of multiple sequence alignments with differentiable Smith-Waterman
Petti, Samantha, Bhattacharya, Nicholas, Rao, Roshan, Dauparas, Justas, Thomas, Neil, Zhou, Juannan, Rush, Alexander M, Koo, Peter K, Ovchinnikov, Sergey
bioRxiv (2021)/Bioinformatics, 2022, btac724 • ColabDesign, SMURF, AF2 back propagation • our notes1, notes2 • lecture1, lecture2 • Discord

AlphaDesign: A de novo protein design framework based on AlphaFold
Jendrusch, Michael, Jan O. Korbel, and S. Kashif Sadiq.
bioRxiv (2021)

Using AlphaFold for Rapid and Accurate Fixed Backbone Protein Design
Moffat, Lewis, Joe G. Greener, and David T. Jones.
bioRxiv (2021)

State-of-the-art estimation of protein model accuracy using AlphaFold
James P. Roney, Sergey Ovchinnikov
bioRxiv 2022.03.11.484043/Physical Review Letters 129.23 (2022) • code

Solubility-aware protein binding peptide design using AlphaFold
Takatsugu Kosugi, Masahito Ohue
bioRxiv 2022.05.14.491955/Biomedicines 10.7 (2022) • Supplemental Materials • code

Hallucinating protein assemblies
Basile I M Wicky, Lukas F Milles, Alexis Courbet, Robert J Ragotte, Justas Dauparas, Elias Kinfu, Sam Tipps, Ryan D Kibler, Minkyung Baek, Frank DiMaio, Xinting Li, Lauren Carter, Alex Kang, Hannah Nguyen, Asim K Bera, David Baker
bioRxiv 2022.06.09.493773/Science (2022) • related slides • our notes • news

EvoBind: in silico directed evolution of peptide binders with AlphaFold
Patrick Bryant, Arne Elofsson
bioRxiv 2022.07.23.501214 • code

Hallucination of closed repeat proteins containing central pockets
Linna An, Derrick R Hicks, Dmitri Zorine, Justas Dauparas, Basile I. M. Wicky, Lukas F Milles, Alexis Courbet, Asim K. Bera, Hannah Nguyen, Alex Kang, Lauren Carter, David Baker
bioRxiv 2022.09.01.506251/Nat Struct Mol Biol 30, 1755-1760 (2023) • Supplementary data

Predicting the structure of large protein complexes using AlphaFold and Monte Carlo tree search
Patrick Bryant, Gabriele Pozzati, Wensi Zhu, Aditi Shenoy, Petras Kundrotas & Arne Elofsson
Nature communications 13.1 (2022) • gitlab, github • Supplementary data1, Supplementary data2

De novo protein design by inversion of the AlphaFold structure prediction network
Casper Goverde, Benedict Wolf, Hamed Khakzad, Stephane Rosset, Bruno E Correia
bioRxiv 2022.12.13.520346 • code • lecture1 • lecture2

Code of OpenComplex
Jingcheng, Yu and Zhaoming, Chen and Zhaoqun, Li and Mingliang, Zeng and Wenjun, Lin and He, Huang and Qiwei, Ye
code

Efficient and scalable de novo protein design using a relaxed sequence space
Christopher Josef Frank, Ali Khoshouei, Yosta de Stigter, Dominik Schiewitz, Shihao Feng, Sergey Ovchinnikov, Hendrik Dietz
bioRxiv 2023.02.24.529906 • code

Cyclic peptide structure prediction and design using AlphaFold
Stephen A. Rettie, Katelyn V. Campbell, Asim K. Bera, Alex Kang, Simon Kozlov, Joshmyn De La Cruz, Victor Adebomi, Guangfeng Zhou, Frank DiMaio, Sergey Ovchinnikov, Gaurav Bhardwaj
bioRxiv • Code • Supplementary

De novo design of luciferases using deep learning
Andy Hsien-Wei Yeh, Christoffer Norn, Yakov Kipnis, Doug Tischer, Samuel J. Pellock, Declan Evans, Pengchen Ma, Gyu Rie Lee, Jason Z. Zhang, Ivan Anishchenko, Brian Coventry, Longxing Cao, Justas Dauparas, Samer Halabiya, Michelle DeWitt, Lauren Carter, K. N. Houk & David Baker
Nature • Code • Supplementary Materials

In silico evolution of protein binders with deep learning models for structure prediction and sequence design
Odessa J Goudy, Amrita Nallathambi, Tomoaki Kinjo, Nicholas Randolph, Brian Kuhlman
bioRxiv 2023.05.03.539278 • Supplementary • code

Computational design of soluble analogues of integral membrane protein structures
Casper Alexander Goverde, Martin Pacesa, Lars Jeremy Dornfeld, Sandrine Georgeon, Stephane Rosset, Justas Dauparas, Christian Shellhaas, Simon Kozlov, David Baker, Sergey Ovchinnikov, Bruno Correia
bioRxiv 2023.05.09.540044/Nature (2024) • code • Supplementary

Antibody Complementarity-Determining Region Sequence Design using AlphaFold2 and Binding Affinity Prediction Model
Takafumi Ueki, Masahito Ohue
bioRxiv 2023.06.02.543382

Context-Dependent Design of Induced-fit Enzymes using Deep Learning Generates Well Expressed, Thermally Stable and Active Enzymes
Lior Zimmerman, Noga Alon, Itay Levin, Anna Koganitsky, Nufar Shpigel, Chen Brestel, Gideon David Lapidoth
bioRxiv 2023.07.27.550799 • Supplementary

Highly accurate and robust protein sequence design with CarbonDesign/Accurate and robust protein sequence design with CarbonDesign
Milong Ren, Chungong Yu, Dongbo Bu, Haicang Zhang
bioRxiv 2023.08.07.552204/Nat Mach Intell 6, 536–547 (2024) • code

Design of Cyclic Peptides Targeting Protein-Protein Interactions using AlphaFold
Takatsugu Kosugi, Masahito Ohue
bioRxiv 2023.08.20.554056 • Supplementary • code

MetaPPI: In Silico Screen for Novel CRBN-based Substrates
neoxbio
website • news • masif-based • commercial

AlphaFold Distillation for Protein Design
Anonymous
ICLR 2024 under review • code

High-throughput computational discovery of inhibitory protein fragments with AlphaFold
Andrew Savinov, Sebastian Swanson, Amy E. Keating, Gene-Wei Li
bioRxiv 2023.12.19.572389 • code

An integrative approach to protein sequence design through multiobjective optimization
Lu Hong, Tanja Kortemme
bioRxiv 2024.03.01.582670/PLOS Computational Biology 20(7) • code • Supplementary

Protein Design Using Structure-Prediction Networks: AlphaFold and RoseTTAFold as Protein Structure Foundation Models
Jue Wang, Joseph L. Watson and Sidney L. Lisanza
Cold Spring Harbor Perspectives in Biology (2024)

Context-dependent design of induced-fit enzymes using deep learning generates well-expressed, thermally stable and active enzymes
Lior Zimmerman, Noga Alon, Itay Levin, and Gideon D. Lapidoth
Proceedings of the National Academy of Sciences 121.11 (2024)

Design of Repeat Alpha-Beta Proteins with Capping Helices
Dmitri Zorine, David Baker
bioRxiv 2024.06.15.590358 • code

Design of linear and cyclic peptide binders of different lengths only from a protein target sequence
Qiuzhen Li, Efstathios Nikolaos Vlachos, Patrick Bryant
bioRxiv 2024.06.20.599739 • code • Supplementary

2.3 DMPfold2-based

Design in the DARK: Learning Deep Generative Models for De Novo Protein Design
Moffat, Lewis, Shaun M. Kandathil, and David T. Jones.
bioRxiv (2022) • DMPfold2

2.4 CM-Align

AutoFoldFinder: An Automated Adaptive Optimization Toolkit for De Novo Protein Fold Design
Shuhao Zhang, Youjun Xu, Jianfeng Pei, Luhua Lai
NeurIPS 2021

2.5 MSA-transformer-based

Protein language models trained on multiple sequence alignments learn phylogenetic relationships
Damiano Sgarbossa, Umberto Lupo, Anne-Florence Bitbol
arXiv preprint arXiv:2203.15465 (2022)/bioRxiv 2022.04.14.488405

EvoOpt: an MSA-guided, fully unsupervised sequence optimization pipeline for protein design
Hideki Yamaguchi, Yutaka Saito
NeurIPS 2022

Generative power of a protein language model trained on multiple sequence alignments
Sgarbossa, Damiano, Umberto Lupo, and Anne-Florence Bitbol
eLife 12 (2023): e79854 • code

2.6 DeepAb-based

Towards deep learning models for target-specific antibody design
Sai Pooja Mahajan, Jeffrey Ruffolo, Rahel Frick, Jeffrey J. Gray
Biophysical Journal 121.3 (2022) • DeepAb • lecture

Hallucinating structure-conditioned antibody libraries for target-specific binders
Sai Pooja Mahajan, Jeffrey A Ruffolo, Rahel Frick, Jeffrey J. Gray
bioRxiv 2022.06.06.494991/Front. Immunol. 13:999034 • Supplementary • code

2.7 TRFold2-based

News of TRDesign
TIANRANG XLab
paper unavailable • slides • website • commercial • news

2.8 GPT-based

Multi-segment preserving sampling for deep manifold sampler
Daniel Berenberg, Jae Hyeon Lee, Simon Kelow, Ji Won Park, Andrew Watkins, Vladimir Gligorijević, Richard Bonneau, Stephen Ra, Kyunghyun Cho
arXiv preprint arXiv:2205.04259 (2022)

Preference optimization of protein language models as a multi-objective binder design paradigm
Pouria Mistani, Venkatesh Mysore
arXiv:2403.04187

HMAMP: Hypervolume-Driven Multi-Objective Antimicrobial Peptides Design
Li Wang, Yiping Li, Xiangzheng Fu, Xiucai Ye, Junfeng Shi, Gary G. Yen, Xiangxiang Zeng
arXiv:2405.00753

2.9 ESM-based

Generating novel protein sequences using Gibbs sampling of masked language models
Sean R. Johnson, Sarah Monaco, Kenneth Massie, Zaid Syed
bioRxiv 2021.01.26.428322 • code

A high-level programming language for generative protein design
Brian Hie, Salvatore Candido, Zeming Lin, Ori Kabeli, Roshan Rao, Nikita Smetanin, Tom Sercu, Alexander Rives
bioRxiv 2022.12.21.521526

Language models generalize beyond natural proteins
Robert Verkuil, Ori Kabeli, Yilun Du, Basile IM Wicky, Lukas F Milles, Justas Dauparas, David Baker, Sergey Ovchinnikov, Tom Sercu, Alexander Rives
bioRxiv 2022.12.21.521521

ESMFold Hallucinates Native-Like Protein Sequences
Jeliazko R Jeliazkov, Diego del Alamo, Joel D Karpiak
bioRxiv 2023.05.23.541774

Protein Language Model Supervised Precise and Efficient Protein Backbone Design Method
Bo Zhang, Kexin Liu, Zhuoqi Zheng, Yunfeiyang Liu, Junxi Mu, Ting Wei, Hai-Feng Chen
bioRxiv 2023.10.26.564121 • code • Supplementary

Unexplored regions of the protein sequence-structure map revealed at scale by a library of foldtuned language models
Arjuna M. Subramanian, Matt Thomson
bioRxiv 2023.12.22.573145

Computational scoring and experimental evaluation of enzymes generated by neural networks
Sean R. Johnson, Xiaozhi Fu, Sandra Viknander, Clara Goldin, Sarah Monaco, Aleksej Zelezniak & Kevin K. Yang
Nature Biotechnology (2024) • code

2.10 Sampling-algorithms

AdaLead: A simple and robust adaptive greedy search algorithm for sequence design
Sam Sinai, Richard Wang, Alexander Whatley, Stewart Slocum, Elina Locane, Eric D. Kelsic
arXiv preprint arXiv:2010.02141 (2020) • code

Autofocused oracles for model-based design
Fannjiang, Clara, and Jennifer Listgarten.
Advances in Neural Information Processing Systems 33 (2020)

An Efficient MCMC Approach to Energy Function Optimization in Protein Structure Prediction
Lakshmi A. Ghantasala, Risi Jaiswal, Supriyo Datta
arXiv:2211.03193

Plug & Play Directed Evolution of Proteins with Gradient-based Discrete MCMC
Patrick Emami, Aidan Perreault, Jeffrey Law, David Biagioni, Peter St. John
NeurIPS 2022/arXiv:2212.09925

Importance Weighted Expectation-Maximization for Protein Sequence Design
Zhenqiao Song, Lei Li
arXiv:2305.00386 • Supplementary

Simultaneous enhancement of multiple functional properties using evolution-informed protein design
Benjamin Fram, Ian Truebridge, Yang Su, Adam J. Riesselman, John B. Ingraham, Alessandro Passera, Eve Napier, Nicole N. Thadani, Samuel Lim, Kristen Roberts, Gurleen Kaur, Michael Stiffler, Debora S. Marks, Christopher D. Bahl, Amir R. Khan, Chris Sander, Nicholas P. Gauthier
bioRxiv (2023): 2023-05

Optimizing protein fitness using Gibbs sampling with Graph-based Smoothing
Andrew Kirjner, Jason Yim, Raman Samusevich, Tommi Jaakkola, Regina Barzilay, Ila Fiete
arXiv:2307.00494 • code

3. Function to Scaffold

These models generate a protein backbone (scaffold/template), represented as Cartesian coordinates, contact maps, distance maps, or φ & ψ torsion angles.
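As a rough illustration of how these backbone representations relate to one another, the sketch below derives a distance map and a contact map from Cα Cartesian coordinates. It is not taken from any paper in this list: the function names, the toy coordinates, and the 8 Å contact cutoff are assumptions chosen only for illustration.

```python
import math

def distance_map(ca_coords):
    """Pairwise Euclidean distances (in Å) between Cα atoms."""
    return [[math.dist(a, b) for b in ca_coords] for a in ca_coords]

def contact_map(dmap, cutoff=8.0):
    """Binary map: 1 where two residues lie within `cutoff` Å (assumed threshold)."""
    return [[1 if d <= cutoff else 0 for d in row] for row in dmap]

# Hypothetical Cα coordinates for a four-residue fragment (made-up values).
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

dmap = distance_map(ca)   # L x L matrix of distances
cmap = contact_map(dmap)  # L x L matrix of 0/1 contacts
```

Both maps are invariant to rotation and translation of the structure, which is one reason some generative models work with them instead of raw coordinates.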

3.1 GAN-based

Generative modeling for protein structures
Anand, Namrata, and Po-Ssu Huang.
NeurIPS 2018

Fully differentiable full-atom protein backbone generation
Anand, Namrata, Raphael Eguchi, and Po-Ssu Huang.
OpenReview ICLR 2019 workshop DeepGenStruct • without code

RamaNet: Computational de novo helical protein backbone design using a long short-term memory generative neural network
Sabban, Sari, and Mikhail Markovsky.
F1000Research 9 (2020) • code • pyRosetta • tensorflow • maximizing the fluorescence of a protein

A Generative Model for Creating Path Delineated Helical Proteins
Nicholas B. Woodall, Ryan Kibler, Basile Wicky, Brian Coventry
bioRxiv 2023.05.24.542095 • code

3.2 VAE-based

Conditioning by adaptive sampling for robust design
Brookes, David, Hahnbeom Park, and Jennifer Listgarten.
International conference on machine learning. PMLR, 2019 • without code

IG-VAE: generative modeling of immunoglobulin proteins by direct 3D coordinate generation
Raphael R. Eguchi, Christian A. Choe, Po-Ssu Huang
bioRxiv (2020) • without code

Generating tertiary protein structures via an interpretative variational autoencoder
Xiaojie Guo, Yuanqi Du, Sivani Tadepalli, Liang Zhao, Amarda Shehu
arXiv preprint arXiv:2004.07119 (2020) • code not available

Deep sharpening of topological features for de novo protein design
Zander Harteveld, Joshua Southern, Michaël Defferrard, Andreas Loukas, Pierre Vandergheynst, Michael Bronstein, Bruno Correia
ICLR2022 Machine Learning for Drug Discovery. 2022 • code not available

End-to-End deep structure generative model for protein design
Boqiao Lai, Matthew McPartlon, Jinbo Xu
bioRxiv 2022.07.09.499440

Deep Generative Design of Epitope-Specific Binding Proteins by Latent Conformation Optimization
Raphael R Eguchi, Christian A Choe, Udit Parekh, Irene S Khalek, Michael D Ward, Neha Vithani, Gregory R Bowman, Joseph G Jardine, Po-Ssu Huang
bioRxiv 2022.12.22.521698

3.3 DAE-based

Function-guided protein design by deep manifold sampling
Vladimir Gligorijevic, Stephen Ra, Daniel Berenberg, Richard Bonneau, Kyunghyun Cho
NeurIPS 2021 • without code

3.4 MLP-based

A backbone-centred energy function of neural networks for protein design
Bin Huang, Yang Xu, Xiuhong Hu, Yongrui Liu, Shanhui Liao, Jiahai Zhang, Chengdong Huang, Jingjun Hong, Quan Chen & Haiyan Liu
Nature (2022) • code

De novo Design of Cavity-Containing Proteins with a Backbone-Centered Neural Network Energy Function
Yang Xu, Xiuhong Hu, Chenchen Wang, Yongrui Liu, Quan Chen, Haiyan Liu
Structure (2024)

3.5 Diffusion-based

Diffusion probabilistic modeling of protein backbones in 3D for the motif-scaffolding problem
Brian L. Trippe, Jason Yim, Doug Tischer, Tamara Broderick, David Baker, Regina Barzilay, Tommi Jaakkola
arXiv:2206.04119/NeurIPS 2022/ICLR 2023 • poster • Supplementary • code

ProteinSGM: Score-based generative modeling for de novo protein design
Jin Sub Lee, Philip M Kim
bioRxiv 2022.07.13.499967/Nat Comput Sci (2023) • code

Protein structure generation via folding diffusion
Kevin E. Wu, Kevin K. Yang, Rianne van den Berg, James Y. Zou, Alex X. Lu, Ava P. Amini
arXiv:2209.15611/Nat Commun 15, 1059 (2024) • code

Generating Novel, Designable, and Diverse Protein Structures by Equivariantly Diffusing Oriented Residue Clouds
Yeqing Lin, Mohammed AlQuraishi
arXiv:2301.12485v3 • code • news

SE(3) diffusion model with application to protein backbone generation
Jason Yim, Brian L. Trippe, Valentin De Bortoli, Emile Mathieu, Arnaud Doucet, Regina Barzilay, Tommi Jaakkola
arXiv:2302.02277/ICLR 2023 • code • Supplementary

A Latent Diffusion Model for Protein Structure Generation
Cong Fu, Keqiang Yan, Limei Wang, Wing Yee Au, Michael McThrow, Tao Komikado, Koji Maruhashi, Kanji Uchino, Xiaoning Qian, Shuiwang Ji
arXiv:2305.04120

Practical and Asymptotically Exact Conditional Sampling in Diffusion Models
Luhuan Wu, Brian L. Trippe, Christian A. Naesseth, David M. Blei, John P. Cunningham
arXiv:2306.17775 • code

Dynamics-Informed Protein Design with Structure Conditioning
Simon V. Mathis, Urszula Julia Komorowska, Mateja Jamnik, Pietro Liò
WCB ICML 2023/ICLR 2024 under review

ForceGen: End-to-end de novo protein generation based on nonlinear mechanical unfolding responses using a protein language diffusion model
Bo Ni, David L. Kaplan, Markus J. Buehler
arXiv:2310.10605/Science Advances 10.6 (2024) • Supplementary • code

DiffSDS: A geometric sequence diffusion model for protein backbone inpainting
Anonymous
ICLR 2024 under review/arXiv:2301.09642

A framework for conditional diffusion modelling with applications in motif scaffolding for protein design
Kieran Didi, Francisco Vargas, Simon V Mathis, Vincent Dutordoir, Emile Mathieu, Urszula J Komorowska, Pietro Liò
arXiv:2312.09236

TopoDiff: Improving Protein Backbone Generation with Topology-aware Latent Encoding
Yuyang Zhang, Zihui (Zinnia) Ma, Haipeng Gong
bioRxiv 2023.12.13.571602

Improved motif-scaffolding with SE(3) flow matching
Jason Yim, Andrew Campbell, Emile Mathieu, Andrew Y. K. Foong, Michael Gastegger, José Jiménez-Luna, Sarah Lewis, Victor Garcia Satorras, Bastiaan S. Veeling, Frank Noé, Regina Barzilay, Tommi S. Jaakkola
arXiv:2401.04082 • code

DiffTopo: Fold exploration using coarse grained protein topology representations
Yangyang Miao, Bruno Correia
bioRxiv 2024.02.01.578456/ICLR 2024

Diffusion models in protein structure and docking
Jason Yim, Hannes Stärk, Gabriele Corso, Bowen Jing, Regina Barzilay, Tommi S. Jaakkola
Wiley Interdisciplinary Reviews: Computational Molecular Science 14.2 (2024) • review

De novo antibody design with SE(3) diffusion
Daniel Cutting, Frédéric A. Dreyer, David Errington, Constantin Schneider, Charlotte M. Deane
arXiv:2405.07622

Out of Many, One: Designing and Scaffolding Proteins at the Scale of the Structural Universe with Genie 2
Yeqing Lin, Minji Lee, Zhao Zhang, Mohammed AlQuraishi
arXiv:2405.15489 • code • news

Diffuse StructGen-1 (DSG-1)
the Diffuse team
technical appendix • commercial

3.6 RL-based

Top-down design of protein nanomaterials with reinforcement learning
Isaac D Lutz, Shunzhi Wang, Christoffer Norn, Andrew J Borst, Yan Ting Zhao, Annie Dosey, Longxing Cao, Zhe Li, Minkyung Baek, Neil P King, Hannele Ruohola-Baker, David Baker
bioRxiv 2022.09.25.509419/Science 380, 266-273 (2023) • code, code2

Model-based reinforcement learning for protein backbone design
Frederic Renard, Cyprien Courtot, Alfredo Reichlin, Oliver Bent
arXiv:2405.01983

3.7 Flow-based

SE(3)-Stochastic Flow Matching for Protein Backbone Generation
Avishek Joey Bose, Tara Akhound-Sadegh, Kilian Fatras, Guillaume Huguet, Jarrid Rector-Brooks, Cheng-Hao Liu, Andrei Cristian Nica, Maksym Korablyov, Michael Bronstein, Alexander Tong
arXiv:2310.02391/ICLR 2024

Fast protein backbone generation with SE(3) flow matching
Jason Yim, Andrew Campbell, Andrew Y. K. Foong, Michael Gastegger, José Jiménez-Luna, Sarah Lewis, Victor Garcia Satorras, Bastiaan S. Veeling, Regina Barzilay, Tommi Jaakkola, Frank Noé
arXiv:2310.05297 • code

Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation
Guillaume Huguet, James Vuckovic, Kilian Fatras, Eric Thibodeau-Laufer, Pablo Lemos, Riashat Islam, Cheng-Hao Liu, Jarrid Rector-Brooks, Tara Akhound-Sadegh, Michael Bronstein, Alexander Tong, Avishek Joey Bose
arXiv:2405.20313 • website

4. Scaffold to Sequence

These models identify an amino acid sequence for a given backbone/scaffold/template under constraints such as torsion angles (φ & ψ), backbone angles (θ & τ), backbone dihedrals (φ, ψ & ω), backbone atoms (Cα, N, C & O), Cα−Cα distances, and unit direction vectors of Cα−Cα, Cα−N & Cα−C (a.k.a. inverse folding). Referred from here. Energy-based models are also included for the task of rotamer conformation (χ angles or atom coordinates) recovery.
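To make the geometric inputs above concrete, here is a minimal sketch of how a backbone dihedral and a unit direction vector can be computed from atom coordinates. It is a generic illustration rather than the featurization of any listed method; the helper names and toy coordinates are hypothetical. With the standard atom ordering, φ uses C(i−1), N(i), Cα(i), C(i); ψ uses N(i), Cα(i), C(i), N(i+1); and ω uses Cα(i), C(i), N(i+1), Cα(i+1).

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scale(a, s):
    return tuple(x * s for x in a)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(v):
    """Normalize a vector to unit length (e.g. a Cα−Cα direction)."""
    return scale(v, 1.0 / math.sqrt(dot(v, v)))

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four consecutive atoms."""
    b0 = sub(p0, p1)
    b1 = unit(sub(p2, p1))          # central bond direction
    b2 = sub(p3, p2)
    # Remove the components along the central bond, leaving in-plane projections.
    v = sub(b0, scale(b1, dot(b0, b1)))
    w = sub(b2, scale(b1, dot(b2, b1)))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))

# Toy coordinates (made up); for φ, pass C(i-1), N(i), Cα(i), C(i) instead.
p0, p1, p2, p3 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)
angle = dihedral(p0, p1, p2, p3)
direction = unit(sub(p2, p1))
```

Inverse folding models typically feed such angles (often as sine/cosine pairs) and direction vectors into the network as per-residue or per-edge features.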

4.0 Review

Protein sequence design on given backbones with deep learning
Yufeng Liu, Haiyan Liu
Protein Engineering, Design and Selection, 2023

Multi-indicator comparative evaluation for deep Learning-Based protein sequence design methods
Jinyu Yu, Junxi Mu, Ting Wei, Hai-Feng Chen
Bioinformatics, 2024, btae037

Generative AI for Controllable Protein Sequence Design: A Survey
Yiheng Zhu, Zitai Kong, Jialu Wu, Weize Liu, Yuqiang Han, Mingze Yin, Hongxia Xu, Chang-Yu Hsieh, Tingjun Hou
arXiv:2402.10516

4.1 MLP-based

3D representations of amino acids-applications to protein sequence comparison and classification
Li, Jie, and Patrice Koehl.
Computational and Structural Biotechnology Journal 11.18 (2014)

Direct prediction of profiles of sequences compatible with a protein structure by neural networks with fragment-based local and energy-based nonlocal profiles
Zhixiu Li, Yuedong Yang, Eshel Faraggi, Jian Zhan, Yaoqi Zhou
Proteins: Structure, Function, and Bioinformatics 82.10 (2014) • code unavailable

SPIN2: Predicting sequence profiles from protein structures using deep neural networks
James O'Connell, Zhixiu Li, Jack Hanson, Rhys Heffernan, James Lyons, Kuldip Paliwal, Abdollah Dehzangi, Yuedong Yang, Yaoqi Zhou
Proteins: Structure, Function, and Bioinformatics 86.6 (2018) • code unavailable

Computational protein design with deep learning neural networks
Jingxue Wang, Huali Cao, John Z. H. Zhang & Yifei Qi
Scientific reports 8.1 (2018) • code unavailable

Ligand-aware protein sequence design using protein self contacts
Jody Mou, Benjamin Fry, Chun-Chen Yao, Nicholas Polizzi
NeurIPS 2022

SeqPredNN: a neural network that generates protein sequences that fold into specified tertiary structures
Lategan, F. Adriaan, Caroline Schreiber, and Hugh G. Patterton.
BMC Bioinformatics 24.1 (2023) • code

4.2 VAE-based

Design of metalloproteins and novel protein folds using variational autoencoders
Greener, Joe G., Lewis Moffat, and David T. Jones.
Scientific reports 8.1 (2018)

4.3 LSTM-based

To improve protein sequence profile prediction through image captioning on pairwise residue distance map
Sheng Chen, Zhe Sun, Lihua Lin, Zifeng Liu, Xun Liu, Yutian Chong, Yutong Lu, Huiying Zhao, and Yuedong Yang
Journal of Chemical Information and Modeling 60.1 (2019) • SPROF

Deep learning of Protein Sequence Design of Protein-protein Interactions
Syrlybaeva, Raulia, and Eva-Maria Strauch.
bioRxiv (2022)/Bioinformatics, 2022, btac733 • Supplementary • code

4.4 CNN-based

A structure-based deep learning framework for protein engineering
Raghav Shroff, Austin W. Cole, Barrett R. Morrow, Daniel J. Diaz, Isaac Donnell, Jimmy Gollihar, Andrew D. Ellington, Ross Thyer
bioRxiv (2019)

ProDCoNN: Protein design using a convolutional neural network
Yuan Zhang, Yang Chen, Chenran Wang, Chun-Chao Lo, Xiuwen Liu, Wei Wu, Jinfeng Zhang
Proteins: Structure, Function, and Bioinformatics 88.7 (2020) • code unavailable

Protein sequence design with a learned potential
Namrata Anand, Raphael Eguchi, Irimpan I. Mathews, Carla P. Perez, Alexander Derry, Russ B. Altman & Po-Ssu Huang
Nature Communications (2022) • code

TIMED-Design: Flexible and Accessible Protein Sequence Design with Convolutional Neural Networks
Leonardo V Castorina, Suleyman Mert Ünal, Kartic Subr, Christopher W Wood
Protein Engineering, Design and Selection, 2024 • code • website

Biosensor and machine learning-aided engineering of an amaryllidaceae enzyme
Simon d’Oelsnitz, Daniel J. Diaz, Wantae Kim, Daniel J. Acosta, Tyler L. Dangerfield, Mason W. Schechter, Matthew B. Minus, James R. Howard, Hannah Do, James M. Loy, Hal S. Alper, Y. Jessie Zhang & Andrew D. Ellington
Nature Communications 15.1 (2024) • code1, code2

4.5 GNN-based

Learning from protein structure with geometric vector perceptrons
Bowen Jing, Stephan Eismann, Patricia Suriana, Raphael J.L. Townshend, Ron Dror
arXiv preprint arXiv:2009.01411 (2020)/ICLR (2021) • GVP

Fast and flexible protein design using deep graph neural networks
Alexey Strokach, David Becerra, Carles Corbi-Verge, Albert Perez-Riba, Philip M. Kim
Cell Systems (2020) • code::ProteinSolver

Mimetic Neural Networks: A unified framework for Protein Design and Folding
Moshe Eliasof, Tue Boesen, Eldad Haber, Chen Keasar, Eran Treister
arXiv:2102.03881/Front. Bioinform. 2:715006

TERMinator: A Neural Framework for Structure-Based Protein Design using Tertiary Repeating Motifs
Alex J. Li, Vikram Sundar, Gevorg Grigoryan, Amy E. Keating
NeurIPS 2021 / arXiv (2022)

A neural network model for prediction of amino-acid probability from a protein backbone structure
Shintaro Minami, Koya Sakuma, Naoya Kobayashi
Unpublished (June 2021) • GCNdesign

XENet: Using a new graph convolution to accelerate the timeline for protein design on quantum computers
Jack B Maguire, Daniele Grattarola, Vikram Khipple Mulligan, Eugene Klyshko, Hans Melo
PLoS computational biology 17.9 (2021)

AlphaDesign: A graph protein design method and benchmark on AlphaFoldDB
Gao, Zhangyang, Cheng Tan, and Stan Li.
arXiv preprint arXiv:2202.01079 (2022) • code

Generative De Novo Protein Design with Global Context
Cheng Tan, Zhangyang Gao, Jun Xia, and Stan Z. Li
arXiv • Apr 2022 • code

Masked inverse folding with sequence transfer for protein representation learning
Kevin K Yang, Hugh Yeh, Niccolò Zanichelli
bioRxiv 2022.05.25.493516/Protein Engineering, Design and Selection 36 (2023) • code • model

Robust deep learning based protein sequence design using ProteinMPNN
Justas Dauparas, Ivan Anishchenko, Nathaniel Bennett, Hua Bai, Robert J. Ragotte, Lukas F. Milles, Basile I. M. Wicky, Alexis Courbet, Robbert J. de Haas, Neville Bethel, Philip J. Y. Leung, Timothy F. Huddy, Sam Pellock, Doug Tischer, Frederick Chan, Brian Koepnick, Hannah Nguyen, Alex Kang, Banumathi Sankaran, Asim Bera, Neil P. King, David Baker
bioRxiv 2022.06.03.494563/Science (2022) • code • hugging face • lecture • colab(in_jax) • ProteinMPNN+ESMFold

Antibody-Antigen Docking and Design via Hierarchical Equivariant Refinement
Jin, Wengong, Regina Barzilay, and Tommi Jaakkola.
arXiv preprint arXiv:2207.06616 (2022)/International Conference on Machine Learning. PMLR, 2022 • code • poster

Neural Network-Derived Potts Models for Structure-Based Protein Design using Backbone Atomic Coordinates and Tertiary Motifs
Alex J. Li, Mindren Lu, Israel Desta, Vikram Sundar, Gevorg Grigoryan, and Amy E. Keating
bioRxiv 2022.08.02.501736/Protein Science, 32(2)

SE(3) Equivalent Graph Attention Network as an Energy-Based Model for Protein Side Chain Conformation
Deqin Liu, Sheng Chen, Shuangjia Zheng, Sen Zhang, Yuedong Yang
bioRxiv 2022.09.05.506704 • code

PiFold: Toward effective and efficient protein inverse folding
Zhangyang Gao, Cheng Tan, Stan Z. Li
arXiv:2209.12643v2/ICLR 2023 • github

Protein Sequence Design by Entropy-based Iterative Refinement
Xinyi Zhou, Guangyong Chen, Junjie Ye, Ercheng Wang, Jun Zhang, Cong Mao, Zhanwei Li, Jianye Hao, Xingxu Huang, Jin Tang, Pheng Ann Heng
bioRxiv 2023.02.04.527099

Lightweight Contrastive Protein Structure-Sequence Transformation
Jiangbin Zheng, Ge Wang, Yufei Huang, Bozhen Hu, Siyuan Li, Cheng Tan, Xinwen Fan, Stan Z. Li
arXiv:2303.11783

Modeling Protein Structure Using Geometric Vector Field Networks
Weian Mao, Muzhi Zhu, Hao Chen, Chunhua Shen
bioRxiv 2023.05.07.539736

Knowledge-Design: Pushing the Limit of Protein Design via Knowledge Refinement
Zhangyang Gao, Cheng Tan, Stan Z. Li
arXiv:2305.15151/ICLR under review • code

SPIN-CGNN: Improved fixed backbone protein design with contact map-based graph construction and contact graph neural network
Xing Zhang, Hongmei Yin, Fei Ling, Jian Zhan, Yaoqi Zhou
bioRxiv 2023.07.07.548080/PLOS Computational Biology • code

ZetaDesign: an end-to-end deep learning method for protein sequence design and side-chain packing
Junyu Yan and others
Briefings in Bioinformatics, 2023 • code

Contextual protein encodings from equivariant graph transformers
Sai Pooja Mahajan, Jeffrey A. Ruffolo, Jeffrey J. Gray
bioRxiv 2023.07.15.549154 • code

Robust Design of Effective Allosteric Activators for Rsp5 E3 Ligase Using the Machine Learning Tool ProteinMPNN
Hsi-Wen Kao, Wei-Lin Lu, Meng-Ru Ho, Yu-Fong Lin, Yun-Jung Hsieh, Tzu-Ping Ko, Shang-Te Danny Hsu, and Kuen-Phon Wu
ACS Synthetic Biology (2023) • Supplementary

Rapid and automated design of two-component protein nanomaterials using ProteinMPNN
Robbert J. de Haas, Natalie Brunette, Alex Goodson, Justas Dauparas, Sue Y. Yi, Erin C. Yang, Quinton Dowling, Hannah Nguyen, Alex Kang, Asim K. Bera, Banumathi Sankaran, Renko de Vries, David Baker, Neil P. King
bioRxiv 2023.08.04.551935/Proceedings of the National Academy of Sciences 121(13) • Supplementary • data

Rationally seeded computational protein design
Katherine I. Albanese, Rokas Petrenas, Fabio Pirro, Elise A. Naudin, Ufuk Borucu, William M. Dawson, D. Arne Scott, Graham J. Leggett, Orion D. Weiner, Thomas A. A. Oliver, Derek N. Woolfson
bioRxiv 2023.08.25.554789 • code

Computational design of sequence-specific DNA-binding proteins
Cameron J Glasscock, Robert Pecoraro, Ryan McHugh, Lindsey A. Doyle, Wei Chen, Olivier Boivin, Beau Lonnquist, Emily Na, Yuliya Politanska, Hugh K Haddox, David Cox, Christoffer Norn, Brian Coventry, Inna Goreshnik, Dionne Vafeados, Gyu Rie Lee, Raluca Gordan, Barry L Stoddard, Frank DiMaio, David Baker
bioRxiv 2023.09.20.558720 • Supplementary

Improving protein expression, stability, and function with ProteinMPNN
Kiera H. Sumida, Reyes Núñez Franco, Indrek Kalvet, Samuel J. Pellock, Basile I. M. Wicky, Lukas F. Milles, Justas Dauparas, Jue Wang, Yakov Kipnis, Noel Jameson, Alex Kang, Joshmyn De La Cruz, Banumathi Sankaran, Asim K Bera, Gonzalo Jimenez Oses, David Baker
bioRxiv 2023.10.03.560713/J. Am. Chem. Soc. 2024 • Supplementary

A Suite of Designed Protein Cages Using Machine Learning Algorithms and Protein Fragment-Based Protocols
Kyle Meador, Roger Castells-Graells, Roman Aguirre, Michael R. Sawaya, Mark A. Arbing, Trent Sherman, Chethaka Senarathne, Todd O. Yeates
bioRxiv 2023.10.09.561468 • code • colab

PROTEIN DESIGNER BASED ON SEQUENCE PROFILE USING ULTRAFAST SHAPE RECOGNITION
Anonymous
ICLR 2024 under review

Inverse folding for antibody sequence design using deep learning
Frédéric A. Dreyer, Daniel Cutting, Constantin Schneider, Henry Kenlay, Charlotte M. Deane
arXiv:2310.19513

ProRefiner: an entropy-based refining strategy for inverse protein folding with global graph attention
Xinyi Zhou, Guangyong Chen, Junjie Ye, Ercheng Wang, Jun Zhang, Cong Mao, Zhanwei Li, Jianye Hao, Xingxu Huang, Jin Tang, Pheng Ann Heng
Nature Communications • Supplementary • code

Engineered immunogens to elicit antibodies against conserved coronavirus epitopes
A. Brenda Kapingidza, Daniel J. Marston, Caitlin Harris, Daniel Wrapp, Kaitlyn Winters, Dieter Mielke, Lu Xiaozhi, Qi Yin, Andrew Foulger, Rob Parks, Maggie Barr, Amanda Newman, Alexandra Schäfer, Amanda Eaton, Justine Mae Flores, Austin Harner, Nicholas J. Catanzaro Jr., Michael L. Mallory, Melissa D. Mattocks, Christopher Beverly, Brianna Rhodes, Katayoun Mansouri, Elizabeth Van Itallie, Pranay Vure, Brooke Dunn, Taylor Keyes, Sherry Stanfield-Oakley, Christopher W. Woods, Elizabeth A. Petzold, Emmanuel B. Walter, Kevin Wiehe, Robert J. Edwards, David C. Montefiori, Guido Ferrari, Ralph Baric, Derek W. Cain, Kevin O. Saunders, Barton F. Haynes & Mihai L. Azoitei
Nat Commun 14, 7897 (2023) • code

DNDesign: Enhancing Physical Understanding of Protein Inverse Folding Model via Denoising
Youhan Lee, Jaehoon Kim
bioRxiv 2023.12.05.570298

In vitro validated antibody design against multiple therapeutic antigens using generative inverse folding
Amir Shanehsazzadeh, Julian Alverio, George Kasun, Simon Levine, Jibran A Khan, Chelsea Chung, Nicolas Diaz, Breanna K Luton, Ysis Tarter, Cailen McCloskey, Katherine B Bateman, Hayley Carter, Dalton Chapman, Rebecca Consbruck, Alec Jaeger, Christa Kohnert, Gaelin Kopec-Belliveau, John M Sutton, Zheyuan Guo, Gustavo Canales, Kai Ejan, Emily Marsh, Alyssa Ruelos, Rylee Ripley, Brooke Stoddard, Rodante Caguiat, Kyra Chapman, Matthew Saunders, Jared Sharp, Douglas Ganini da Silva, Audree Feltner, Jake Ripley, Megan E Bryant, Danni Castillo, Joshua Meier, Christian M Stegmann, Katherine Moran, Christine Lemke, Shaheed Abdulhaqq, Lillian R Klug, Sharrol Bachas
bioRxiv 2023.12.08.570889

SPDesign: protein sequence designer based on structural sequence profile using ultrafast shape recognition
Hui Wang, Dong Liu, Kailong Zhao, Yajun Wang, Guijun Zhang
bioRxiv 2023.12.14.571651/Briefings in Bioinformatics 25.3 (2024): bbae146 • website

De novo design of diverse small molecule binders and sensors using Shape Complementary Pseudocycles
Linna An, Meerit Said, Long Tran, Sagardip Majumder, Inna Goreshnik, Gyu Rie Lee, David Juergens, Justas Dauparas, Ivan Anishchenko, Brian Coventry, Asim K Bera, Alex Kang, Paul M Levine, Valentina Alvarez, Arvindd Pillai, Christoffer Norn, David Feldman, Dmitri Zorine, Derrick R Hicks, Xinting Li, Mariana Garcia Sanchez, Dionne K Vafeados, Patrick J Salveson, Anastassia A Vorobieva, David Baker
bioRxiv 2023.12.20.572602/Science 385, 276-282 (2024) • code1, code2, code3

Atomic context-conditioned protein sequence design using LigandMPNN
Justas Dauparas, Gyu Rie Lee, Robert Pecoraro, Linna An, Ivan Anishchenko, Cameron Glasscock, D. Baker
bioRxiv 2023.12.22.573103 • code

Structure-conditioned masked language models for protein sequence design generalize beyond the native sequence space
Deniz Akpinaroglu, Kosuke Seki, Amy Guo, Eleanor Zhu, Mark J. S. Kelly, Tanja Kortemme
bioRxiv 2023.12.15.571823 • code

ProteinMPNN Recovers Complex Sequence Properties of Transmembrane β-Barrels
Marissa D Dolorfino, Anastassia A Vorobieva
bioRxiv 2024.01.16.575764 • code

DIProT: A deep learning based interactive toolkit for efficient and effective Protein design
He, Jieling, Wenxu Wu, and Xiaowo Wang.
Synthetic and Systems Biotechnology (2024)

Blueprinting extendable nanomaterials with standardized protein blocks
Timothy F. Huddy, Yang Hsia, Ryan D. Kibler, Jinwei Xu, Neville Bethel, Deepesh Nagarajan, Rachel Redler, Philip J. Y. Leung, Connor Weidle, Alexis Courbet, Erin C. Yang, Asim K. Bera, Nicolas Coudray, S. John Calise, Fatima A. Davila-Hernandez, Hannah L. Han, Kenneth D. Carr, Zhe Li, Ryan McHugh, Gabriella Reggiano, Alex Kang, Banumathi Sankaran, Miles S. Dickinson, Brian Coventry, T. J. Brunette, Yulai Liu, Justas Dauparas, Andrew J. Borst, Damian Ekiert, Justin M. Kollman, Gira Bhabha & David Baker
Nature (2024) • RosettaScripts

All-atom protein sequence design based on geometric deep learning
Jiale Liu, Zheng Guo, Changsheng Zhang, Luhua Lai
bioRxiv 2024.03.18.585651 • code

Graphormer supervised de novo protein design method and function validation
Junxi Mu, Zhengxin Li, Bo Zhang, Qi Zhang, Jamshed Iqbal, Abdul Wadood, Ting Wei, Yan Feng, Hai-Feng Chen
Briefings in Bioinformatics 25.3 (2024): bbae135 • code

The Damietta Server: a comprehensive protein design toolkit
Iwan Grin, Kateryna Maksymenko, Tobias Wörtwein, Mohammad ElGamacy
Nucleic Acids Research, 2024, gkae297 • website • ProteinMPNN-based • news, news2

Exploring the Potential of Structure-Based Deep Learning Approaches for T cell Receptor Design
Helder V. Ribeiro-Filho, Gabriel E. Jara, João V. S. Guerra, Melyssa Cheung, Nathaniel R. Felbinger, José G. C. Pereira, Brian G. Pierce, Paulo S. Lopes-de-Oliveira
bioRxiv 2024.04.19.590222 • code, code2

SurfPro: Functional Protein Design Based on Continuous Surface
Zhenqiao Song, Tinglin Huang, Lei Li, Wengong Jin
arXiv:2405.06693 • ProteinMPNN-based

Computational Design of Myoglobin-based Carbene Transferases for Monoterpene Derivatization
Yiyang Sun, Yinian Tang, Jing Zhou, Bingchen Guo, Feiyan Yuan, Bo Yao, Yang Yu, Chun Li
Biochemical and Biophysical Research Communications (2024) • code • LigandMPNN-based

UniIF: Unified Molecule Inverse Folding
Zhangyang Gao, Jue Wang, Cheng Tan, Lirong Wu, Yufei Huang, Siyuan Li, Zhirui Ye, Stan Z. Li
arXiv:2405.18968

Integrating MHC Class I visibility targets into the ProteinMPNN protein design process
Hans-Christof Gasser, Diego A. Oyarzún, Javier Antonio Alfaro, Ajitha Rajan
bioRxiv 2024.06.04.597365

A Top-Down Design Approach for Generating a Peptide PROTAC Drug Targeting Androgen Receptor for Androgenetic Alopecia Therapy
Bohan Ma, Donghua Liu, Zhe Wang, Dize Zhang, Yanlin Jian, Kun Zhang, Tianyang Zhou, Yibo Gao, Yizeng Fan, Jian Ma, Yang Gao, Yule Chen, Si Chen, Jing Liu, Xiang Li, and Lei Li
Journal of Medicinal Chemistry (2024)

Improving Inverse Folding models at Protein Stability Prediction without additional Training or Data
Oliver Dutton, Sandro Bottaro, Michele Invernizzi, Istvan Redl, Albert Chung, Carlo Fisicaro, Fabio Airoldi, Stefano Ruschetta, Louie Henderson, Benjamin MJ Owens, Patrik Foerch, Kamil Tamiola
bioRxiv 2024.06.15.599145 • ProteinMPNN/ESMIF-based

4.6 GAN-based

De novo protein design for novel folds using guided conditional Wasserstein generative adversarial networks
Mostafa Karimi, Shaowen Zhu, Yue Cao, Yang Shen
Journal of Chemical Information and Modeling 60.12 (2020) • gcWGAN

HelixGAN: A bidirectional Generative Adversarial Network with search in latent space for generation under constraints
Xuezhi Xie, Philip M. Kim
Machine Learning for Structural Biology Workshop, NeurIPS 2021/Bioinformatics, 2023, btad036 • code

4.7 Transformer-based

Generative models for graph-based protein design
John Ingraham, Vikas K Garg, Regina Barzilay, Tommi Jaakkola
NeurIPS 2019 • GraphTrans

Fold2Seq: A Joint Sequence (1D)-Fold (3D) Embedding-based Generative Model for Protein Design
Yue Cao, Payel Das, Vijil Chenthamarakshan, Pin-Yu Chen, Igor Melnyk, Yang Shen
International Conference on Machine Learning. PMLR, 2021

Rotamer-Free Protein Sequence Design Based on Deep Learning and Self-Consistency
Yufeng Liu, Lu Zhang, Weilun Wang, Min Zhu, Chenchen Wang, Fudong Li, Jiahai Zhang, Houqiang Li, Quan Chen & Haiyan Liu
Nature Portfolio (2022)/Nature Computational Science (2022) • Supplementary • Comment • code

A Deep SE(3)-Equivariant Model for Learning Inverse Protein Folding
Matthew McPartlon, Ben Lai, Jinbo Xu
bioRxiv (2022)

Learning inverse folding from millions of predicted structures
Chloe Hsu, Robert Verkuil, Jason Liu, Zeming Lin, Brian Hie, Tom Sercu, Adam Lerer, Alexander Rives
bioRxiv (2022) • esm

Breaking boundaries in protein design with a new AI model that understands interactions with any kind of molecule
LucianoSphere
Towards Data Science

Accurate and efficient protein sequence design through learning concise local environment of residues
Bin Huang, Tingwen Fan, Kaiyue Wang, Haicang Zhang, Chungong Yu, Shuyu Nie, Yangshuo Qi, Wei-Mou Zheng, Jian Han, Zheng Fan, Shiwei Sun, Sheng Ye, Huaiyi Yang, Dongbo Bu
bioRxiv (2022)/Bioinformatics 39.3 (2023) • Supplementary • website • code

PeTriBERT : Augmenting BERT with tridimensional encoding for inverse protein folding and design
Baldwin Dumortier, Antoine Liutkus, Clément Carré, Gabriel Krouk
bioRxiv 2022.08.10.503344

Evolutionary-scale prediction of atomic level protein structure with a language model
Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Nikita Smetanin, Robert Verkuil, Ori Kabeli, Yaniv Shmueli, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Salvatore Candido, Alexander Rives
bioRxiv 2022.07.20.500902 • blog • github

Structure-informed Language Models Are Protein Designers
Zaixiang Zheng, Yifan Deng, Dongyu Xue, Yi Zhou, Fei Ye, Quanquan Gu
arXiv:2302.01649 • code::ByProt

Incorporating Pre-training Paradigm for Antibody Sequence-Structure Co-design
Kaiyuan Gao, Lijun Wu, Jinhua Zhu, Tianbo Peng, Yingce Xia, Liang He, Shufang Xie, Tao Qin, Haiguang Liu, Kun He, Tie-Yan Liu
arXiv:2211.08406 • code

A Text-guided Protein Design Framework
Shengchao Liu, Yutao Zhu, Jiarui Lu, Zhao Xu, Weili Nie, Anthony Gitter, Chaowei Xiao, Jian Tang, Hongyu Guo, Anima Anandkumar
arXiv:2302.04611 • code

An end-to-end deep learning method for protein side-chain packing and inverse folding
McPartlon, Matthew, and Jinbo Xu
Proceedings of the National Academy of Sciences 120.23 (2023) • code • Supplementary

Context-aware geometric deep learning for protein sequence design
Lucien Krapp, Fernando Meireles, Luciano Abriata, Matteo Dal Peraro
bioRxiv 2023.06.19.545381 • code

De Novo Generation and Prioritization of Target-Binding Peptide Motifs from Sequence Alone
Suhaas Bhat, Kalyan Palepu, Vivian Yudistyra, Lauren Hong, Venkata Srikar Kavirayuni, Tianlai Chen, Lin Zhao, Tian Wang, Sophia Vincoff, Pranam Chatterjee
bioRxiv 2023.06.26.546591 • code • colab • Supplementary

ProstT5: Bilingual Language Model for Protein Sequence and Structure
Michael Heinzinger, Konstantin Weissenow, Joaquin Gomez Sanchez, Adrian Henkel, Martin Steinegger, Burkhard Rost
bioRxiv 2023.07.23.550085 • Supplementary • code

De novo Protein Sequence Design Based on Deep Learning and Validation on CalB Hydrolase
Junxi Mu, ZhengXin Li, Bo Zhang, Qi Zhang, Jamshed Iqbal, Abdul Wadood, Ting Wei, Yan Feng, Haifeng Chen
bioRxiv 2023.08.01.551444 • code

Invariant point message passing for protein side chain packing and design
Nicholas Z Randolph, Brian Kuhlman
bioRxiv 2023.08.03.551328 • code

Atom-by-atom protein generation and beyond with language models
Daniel Flam-Shepherd, Kevin Zhu, Alán Aspuru-Guzik
arXiv:2308.09482

SaProt: Protein Language Modeling with Structure-aware Vocabulary
Jin Su, Chenchen Han, Yuyang Zhou, Junjie Shan, Xibin Zhou, Fajie Yuan
bioRxiv 2023.10.01.560349 • code

AntiFold: Improved antibody structure design using inverse folding
Magnus Høie, Alissa Hummer, Tobias Olsen, Morten Nielsen, Charlotte Deane
GenBio@NeurIPS2023 Spotlight • code • colab

MMDesign: Multi-Modality Transfer Learning for Generative Protein Design
Jiangbin Zheng, Siyuan Li, Yufei Huang, Zhangyang Gao, Cheng Tan, Bozhen Hu, Jun Xia, Ge Wang, Stan Z. Li
arXiv preprint arXiv:2312.06297 (2023)

ShapeProt: Top-down Protein Design with 3D Protein Shape Generative Model
Lee, Youhan, and Jaehoon Kim.
bioRxiv (2023): 2023-12

X-LoRA: Mixture of Low-Rank Adapter Experts, a Flexible Framework for Large Language Models with Applications in Protein Mechanics and Design
Eric L. Buehler, Markus J. Buehler
arXiv:2402.07148 • code • Model & weights

AntiFold: Improved antibody structure-based design using inverse folding
Magnus Haraldson Høie, Alissa Hummer, Tobias H. Olsen, Broncio Aguilar-Sanjuan, Morten Nielsen, Charlotte M. Deane
arXiv:2405.03370 • code • website • ESM-IF-based

Protein Design with StructureGPT: a Deep Learning Model for Protein Structure-to-Sequence Translation
Nicanor Zalba Sr., Pablo Ursua-Medrano Sr., Humberto Bustince Sr.
bioRxiv 2024.06.03.597105 • code • Supplementary

4.8 ResNet-based

DenseCPD: improving the accuracy of neural-network-based computational protein sequence design with DenseNet
Qi, Yifei, and John ZH Zhang.
Journal of chemical information and modeling 60.3 (2020) • code unavailable

4.9 Diffusion-based

De novo protein backbone generation based on diffusion with structured priors and adversarial training
Yufeng Liu, Linghui Chen, Haiyan Liu
bioRxiv 2022.12.17.520847

Generative design of de novo proteins based on secondary-structure constraints using an attention-based diffusion model
Bo Ni, David L. Kaplan, Markus J. Buehler
Chem (2023) • code • news

Graph Denoising Diffusion for Inverse Protein Folding
Kai Yi, Bingxin Zhou, Yiqing Shen, Pietro Liò, Yu Guang Wang
arXiv:2306.16819 • code

Conditional Protein Denoising Diffusion Generates Programmable Endonucleases
Bingxin Zhou, Lirong Zheng, Banghao Wu, Kai Yi, Bozitao Zhong, Pietro Lio, Liang Hong
bioRxiv 2023.08.10.552783

Diffusion in a quantized vector space generates non-idealized protein structures and predicts conformational distributions
Liu Haiyan, Liu Yufeng, Chen Linghui
bioRxiv 2023.11.18.567666

Fast non-autoregressive inverse folding with discrete diffusion
John J. Yang, Jason Yim, Regina Barzilay, Tommi Jaakkola
arXiv:2312.02447 • code

Diffusion Language Models Are Versatile Protein Learners
Xinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, Quanquan Gu
arXiv:2402.18567

LéxFusion
Levinthal
paper not available • news • commercial

4.10 Bayesian-based

Inverse Protein Folding Using Deep Bayesian Optimization
Natalie Maus, Yimeng Zeng, Daniel Allen Anderson, Phillip Maffettone, Aaron Solomon, Peyton Greenside, Osbert Bastani, Jacob R. Gardner
arXiv:2305.18089 • code

4.11 Flow-based

Harmonic Self-Conditioned Flow Matching for Multi-Ligand Docking and Binding Site Design
Hannes Stärk, Bowen Jing, Regina Barzilay, Tommi Jaakkola
arXiv:2310.05764 • code

5. Function to Sequence

These models generate sequences from a desired function.

5.1 CNN-based

Antibody complementarity determining region design using high-capacity machine learning
Ge Liu, Haoyang Zeng, Jonas Mueller, Brandon Carter, Ziheng Wang, Jonas Schilz, Geraldine Horny, Michael E Birnbaum, Stefan Ewert, David K Gifford
Bioinformatics 36.7 (2020): 2126-2133 • code

Protein design and variant prediction using autoregressive generative models
Jung-Eun Shin, Adam J. Riesselman, Aaron W. Kollasch, Conor McMahon, Elana Simon, Chris Sander, Aashish Manglik, Andrew C. Kruse & Debora S. Marks
Nature communications 12.1 (2021) • code::SeqDesign • mutation effect prediction • sequence generation • April 2021

Optimization of therapeutic antibodies by predicting antigen specificity from antibody sequence via deep learning
Derek M. Mason, Simon Friedensohn, Cédric R. Weber, Christian Jordi, Bastian Wagner, Simon M. Meng, Roy A. Ehling, Lucia Bonati, Jan Dahinden, Pablo Gainza, Bruno E. Correia & Sai T. Reddy
Nature Biomedical Engineering 5.6 (2021) • code

Accelerated Engineering of ELP‐based Materials through Hybrid Biomimetic‐De Novo Predictive Molecular Design
Timo Laakko, Antti Korkealaakso, Burcu Firatligil Yildirir, Piotr Batys, Ville Liljeström, Ari Hokkanen, Nonappa, Merja Penttilä, Anssi Laukkanen, Ali Miserez, Caj Södergård, Pezhman Mohammadi
Advanced Materials (2024)

5.2 VAE-based

Machine learning-aided design and screening of an emergent protein function in synthetic cells
Shunshi Kohyama, Béla P. Frohn, Leon Babl & Petra Schwille
Nature Communications 15, 2010 (2024) • code

Variational auto-encoding of protein sequences
Sam Sinai, Eric Kelsic, George M. Church, Martin A. Nowak
arXiv preprint arXiv:1712.03346 (2017)

Design by adaptive sampling
Brookes, David H., and Jennifer Listgarten.
arXiv preprint arXiv:1810.03714 (2018)

PepCVAE: Semi-supervised targeted design of antimicrobial peptide sequences
Payel Das, Kahini Wadhawan, Oscar Chang, Tom Sercu, Cicero Dos Santos, Matthew Riemer, Vijil Chenthamarakshan, Inkit Padhi, Aleksandra Mojsilovic
arXiv preprint arXiv:1810.07743 (2018)

Deep generative models for T cell receptor protein sequences
Kristian Davidsen, Branden J Olson, William S DeWitt III, Jean Feng, Elias Harkins, Philip Bradley, Frederick A Matsen IV
eLife 8 (2019)

How to hallucinate functional proteins
Costello, Zak, and Hector Garcia Martin.
arXiv preprint arXiv:1903.00458 (2019)

Convergent selection in antibody repertoires is revealed by deep learning
Simon Friedensohn, Daniel Neumeier, Tarik A Khan, Lucia Csepregi, Cristina Parola, Arthur R Gorter de Vries, Lena Erlach, Derek M Mason, Sai T Reddy
bioRxiv (2020) • Supplementary • code available after publication

Variational autoencoder for generation of antimicrobial peptides
Dean, Scott N., and Scott A. Walper.
ACS omega 5.33 (2020)

Generating functional protein variants with variational autoencoders
Alex Hawkins-Hooker, Florence Depardieu, Sebastien Baur, Guillaume Couairon, Arthur Chen, David Bikard
PLoS computational biology 17.2 (2021)

Accelerated antimicrobial discovery via deep generative models and molecular dynamics simulations
Payel Das, Tom Sercu, Kahini Wadhawan, Inkit Padhi, Sebastian Gehrmann, Flaviu Cipcigan, Vijil Chenthamarakshan, Hendrik Strobelt, Cicero dos Santos, Pin-Yu Chen, Yi Yan Yang, Jeremy P. K. Tan, James Hedrick, Jason Crain & Aleksandra Mojsilovic
Nature Biomedical Engineering 5.6 (2021)

Deep generative models create new and diverse protein structures
Zeming Lin, Tom Sercu, Yann LeCun, Alexander Rives
NeurIPS 2021

PepVAE: variational autoencoder framework for antimicrobial peptide generation and activity prediction
Scott N. Dean, Jerome Anthony E. Alvarez, Dan Zabetakis, Scott A. Walper, and Anthony P. Malanoski
Frontiers in microbiology 12 (2021) • code • Supplementary

HydrAMP: a deep generative model for antimicrobial peptide discovery
Paulina Szymczak, Marcin Możejko, Tomasz Grzegorzek, Marta Bauer, Damian Neubauer, Michał Michalski, Jacek Sroka, Piotr Setny, Wojciech Kamysz, Ewa Szczurek
bioRxiv (2022) • code

Therapeutic enzyme engineering using a generative neural network
Andrew Giessel, Athanasios Dousis, Kanchana Ravichandran, Kevin Smith, Sreyoshi Sur, Iain McFadyen, Wei Zheng & Stuart Licht
Scientific Reports 12.1 (2022)

GM-Pep: A High Efficiency Strategy to De Novo Design Functional Peptide Sequences
Qushuo Chen, Changyan Yang, Yihao Xie, Yuqiang Wang, Xiaoxu Li, Kairong Wang, Jinqi Huang, and Wenjin Yan
Journal of Chemical Information and Modeling (2022) • code

Mean Dimension of Generative Models for Protein Sequences
Christoph Feinauer, Emanuele Borgonovo
bioRxiv 2022.12.12.520028 • code

Prediction of designer-recombinases for DNA editing with generative deep learning
Lukas Theo Schmitt, Maciej Paszkowski-Rogacz, Florian Jug & Frank Buchholz
Nat Commun 13, 7966 (2022) • code • Supplementary

ProT-VAE: Protein Transformer Variational AutoEncoder for Functional Protein Design
Emre Sevgen, Joshua Moller, Adrian Lange, John Parker, Sean Quigley, Jeff Mayer, Poonam Srivastava, Sitaram Gayatri, David Hosfield, Maria Korshunova, Micha Livne, Michelle Gill, Rama Ranganathan, Anthony B Costa, Andrew L Ferguson
bioRxiv 2023.01.23.525232

Target specific peptide design using latent space approximate trajectory collector
Tong Lin, Sijie Chen, Ruchira Basu, Dehua Pei, Xiaolin Cheng, Levent Burak Kara
arXiv:2302.01435

Deep-learning generative models enable design of synthetic orthologs of a signaling protein
Xinran Lian, Niksa Praljak, Andrew L. Ferguson, Rama Ranganathan
Biophysical Journal 122.3 (2023): 311a

Designing a protein with emergent function by combined in silico, in vitro and in vivo screening
Shunshi Kohyama, Bela Paul Frohn, Leon Babl, Petra Schwille
bioRxiv 2023.02.16.528840 • Supplementary

ProteinVAE: Variational AutoEncoder for Translational Protein Design
Suyue Lyu, Shahin Sowlati-Hashjin, Michael Garton
bioRxiv 2023.03.04.531110/Nat Mach Intell (2024) • Supplementary • code

ProtWave-VAE: Integrating autoregressive sampling with latent-based inference for data-driven protein design
Niksa Praljak, Xinran Lian, Rama Ranganathan, Andrew Ferguson
bioRxiv 2023.04.23.537971 • Supplementary • code

Designing meaningful continuous representations of T cell receptor sequences with deep generative models
Allen Y. Leary, Darius Scott, Namita T. Gupta, Janelle C. Waite, Dimitris Skokos, Gurinder S. Atwal, Peter G. Hawkins
bioRxiv 2023.06.17.545423 • code

Utility of language model and physics-based approaches in modifying MHC Class-I immune-visibility for the design of vaccines and therapeutics
Hans-Christof Gasser, Diego Oyarzun, Ajitha Rajan, Javier Alfaro
bioRxiv 2023.07.10.548300

Cell-free biosynthesis combined with deep learning accelerates de novo-development of antimicrobial peptides
Amir Pandi, David Adam, Amir Zare, Van Tuan Trinh, Stefan L. Schaefer, Marie Burt, Björn Klabunde, Elizaveta Bobkova, Manish Kushwaha, Yeganeh Foroughijabbari, Peter Braun, Christoph Spahn, Christian Preußer, Elke Pogge von Strandmann, Helge B. Bode, Heiner von Buttlar, Wilhelm Bertrams, Anna Lena Jung, Frank Abendroth, Bernd Schmeck, Gerhard Hummer, Olalla Vázquez & Tobias J. Erb
Nature Communications 14.1 (2023) • code

Design of target specific peptide inhibitors using generative deep learning and molecular dynamics simulations
Sijie Chen, Tong Lin, Ruchira Basu, Jeremy Ritchey, Shen Wang, Yichuan Luo, Xingcan Li, Dehua Pei, Levent Burak Kara & Xiaolin Cheng
Nat Commun 15, 1611 (2024) • code

5.3 GAN-based

Feedback GAN for DNA optimizes protein functions
Gupta, Anvita, and James Zou.
Nature Machine Intelligence 1.2 (2019) • code

Generating protein sequences from antibiotic resistance genes data using Generative Adversarial Networks
Chhibbar, Prabal, and Arpit Joshi.
arXiv preprint arXiv:1904.13240 (2019)

ProGAN: Protein solubility generative adversarial nets for data augmentation in DNN framework
Xi Han, Liheng Zhang, Kang Zhou, Xiaonan Wang
Computers & Chemical Engineering 131 (2019)

GANDALF: Peptide Generation for Drug Design using Sequential and Structural Generative Adversarial Networks
Rossetto, Allison, and Wenjin Zhou.
Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. 2020

Designing feature-controlled humanoid antibody discovery libraries using generative adversarial networks
Tileli Amimeur, Jeremy M. Shaver, Randal R. Ketchem, J. Alex Taylor, Rutilio H. Clark, Josh Smith, Danielle Van Citters, Christine C. Siska, Pauline Smidt, Megan Sprague, Bruce A. Kerwin, Dean Pettit
bioRxiv (2020)

Generating ampicillin-level antimicrobial peptides with activity-aware generative adversarial networks
Andrejs Tucs, Duy Phuoc Tran, Akiko Yumoto, Yoshihiro Ito, Takanori Uzawa, and Koji Tsuda
ACS omega 5.36 (2020) • code

Conditional Generative Modeling for De Novo Protein Design with Hierarchical Functions
Kucera, Tim, Matteo Togninalli, and Laetitia Meng-Papaxanthos
bioRxiv (2021)/Bioinformatics 38.13 (2022) • code

Expanding functional protein sequence spaces using generative adversarial networks
Donatas Repecka, Vykintas Jauniskis, Laurynas Karpus, Elzbieta Rembeza, Irmantas Rokaitis, Jan Zrimec, Simona Poviloniene, Audrius Laurynenas, Sandra Viknander, Wissam Abuajwa, Otto Savolainen, Rolandas Meskys, Martin K. M. Engqvist & Aleksej Zelezniak
Nature Machine Intelligence 3.4 (2021) • code

A Generative Approach toward Precision Antimicrobial Peptide Design
Jonathon B. Ferrell, Jacob M. Remington, Colin M. Van Oort, Mona Sharafi, Reem Aboushousha, Yvonne Janssen-Heininger, Severin T. Schneebeli, Matthew J. Wargo, Safwan Wshah, Jianing Li
bioRxiv (2021) • code

AMPGAN v2: Machine Learning-Guided Design of Antimicrobial Peptides
Colin M. Van Oort, Jonathon B. Ferrell, Jacob M. Remington, Safwan Wshah, and Jianing Li
Journal of chemical information and modeling 61.5 (2021)

DeepImmuno: deep learning-empowered prediction and generation of immunogenic peptides for T-cell immunity
Guangyuan Li, Balaji Iyer, V B Surya Prasath, Yizhao Ni, Nathan Salomonis
Briefings in bioinformatics 22.6 (2021) • code • web

PandoraGAN: Generating antiviral peptides using Generative Adversarial Network
Shraddha Surana, Pooja Arora, Divye Singh, Deepti Sahasrabuddhe, Jayaraman Valadi
bioRxiv (2021)

Feedback-AVPGAN: Feedback-guided generative adversarial network for generating antiviral peptides
Kano Hasegawa, Yoshitaka Moriwaki, Tohru Terada, Cao Wei, and Kentaro Shimizu
Journal of Bioinformatics and Computational Biology (2022) • code

Designing antimicrobial peptides using deep learning and molecular dynamic simulations
Qiushi Cao, Cheng Ge, Xuejie Wang, Peta J Harvey, Zixuan Zhang, Yuan Ma, Xianghong Wang, Xinying Jia, Mehdi Mobli, David J Craik, Tao Jiang, Jinbo Yang, Zhiqiang Wei, Yan Wang, Shan Chang, Rilei Yu
Briefings in Bioinformatics (2023)

Generative β-Hairpin Design Using a Residue-Based Physicochemical Property Landscape
Vardhan Satalkar, Gemechis D. Degaga, Wei Li, Yui Tik Pang, Andrew C. McShan, James C. Gumbart, Julie C. Mitchell, Matthew P. Torres
Biophysical Journal (2024) • code

De Novo Antimicrobial Peptide Design with Feedback Generative Adversarial Networks
Michaela Areti Zervou, Effrosyni Doutsi, Yannis Pantazis, Panagiotis Tsakalides
International Journal of Molecular Sciences 25.10 (2024) • code

5.4 Transformer-based

Includes protein large language models (pLLMs) and autoregressive language models.

ProGen: Language modeling for protein generation / Large language models generate functional protein sequences across diverse families
Ali Madani, Bryan McCann, Nikhil Naik, Nitish Shirish Keskar, Namrata Anand, Raphael R. Eguchi, Po-Ssu Huang, Richard Socher
arXiv preprint arXiv:2004.03497 (2020)/Nat Biotechnol (2023) • ProGen, CTRL

Signal peptides generated by attention-based neural networks
Zachary Wu, Kevin K. Yang, Michael J. Liszka, Alycia Lee, Alina Batzilla, David Wernick, David P. Weiner, and Frances H. Arnold
ACS Synthetic Biology 9.8 (2020)

ProtTrans: towards cracking the language of Life's code through self-supervised deep learning and high performance computing
Ahmed Elnaggar, Michael Heinzinger, Christian Dallago, Ghalia Rehawi, Yu Wang, Llion Jones, Tom Gibbs, Tamas Feher, Christoph Angerer, Martin Steinegger, Debsindhu Bhowmik, and Burkhard Rost
arXiv preprint arXiv:2007.06225 (2020) • code

Generative Language Modeling for Antibody Design
Shuai, Richard W., Jeffrey A. Ruffolo, and Jeffrey J. Gray.
bioRxiv (2021)/Cell Systems • Supplementary • code

Deep neural language modeling enables functional protein generation across families
Ali Madani, Ben Krause, Eric R. Greene, Subu Subramanian, Benjamin P. Mohr, James M. Holton, Jose Luis Olmos Jr., Caiming Xiong, Zachary Z. Sun, Richard Socher, James S. Fraser, Nikhil Naik
bioRxiv (2021)

Protein sequence sampling and prediction from structural data
Gabriel A. Orellana, Javier Caceres-Delpiano, Roberto Ibañez, Michael P. Dunne, Leonardo Alvarez
bioRxiv 2021.09.06.459171

Transformer-based protein generation with regularized latent space optimization
Egbert Castro, Abhinav Godavarthi, Julian Rubinfien, Kevin Givechian, Dhananjay Bhaskar & Smita Krishnaswamy
Nat Mach Intell (2022)/arXiv:2201.09948 • code

BioPhi: A platform for antibody design, humanization, and humanness evaluation based on natural antibody repertoires and deep learning
David Prihoda, Jad Maamary, Andrew Waight, Veronica Juan, Laurence Fayadat-Dilman, Daniel Svozil, Danny A. Bitton
mAbs. Vol. 14. No. 1. Taylor & Francis, 2022

Guided Generative Protein Design using Regularized Transformers
Egbert Castro, Abhinav Godavarthi, Julian Rubinfien, Kevin B. Givechian, Dhananjay Bhaskar, Smita Krishnaswamy
arXiv preprint arXiv:2201.09948 (2022)

Towards Controllable Protein Design with Conditional Transformers
Noelia Ferruz, Birte Höcker
arXiv preprint arXiv:2201.07338 (2022)/Nature Machine Intelligence (2022) • review of Heading 5.4

ProtGPT2 is a deep unsupervised language model for protein design
Noelia Ferruz, Steffen Schmidt, Birte Höcker
bioRxiv/Nature Communications • model::huggingface • datasets::huggingface • lecture • research highlights • news

Few Shot Protein Generation
Ram, Soumya, and Tristan Bepler.
arXiv preprint arXiv:2204.01168 (2022)

RITA: a Study on Scaling Up Generative Protein Sequence Models
Daniel Hesslow, Niccolò Zanichelli, Pascal Notin, Iacopo Poli, Debora Marks
arXiv preprint arXiv:2205.05789 (2022) • code

ProGen2: Exploring the Boundaries of Protein Language Models
Erik Nijkamp, Jeffrey Ruffolo, Eli N. Weinstein, Nikhil Naik, Ali Madani
arXiv:2206.13517 • code • guide

AbLang: an antibody language model for completing antibody sequences
Tobias H Olsen, Iain H Moal, Charlotte M Deane
Bioinformatics Advances, Volume 2, Issue 1, 2022, vbac046

Reprogramming Pretrained Language Models for Antibody Sequence Infilling
Igor Melnyk, Vijil Chenthamarakshan, Pin-Yu Chen, Payel Das, Amit Dhurandhar, Inkit Padhi, Devleena Das
arXiv:2210.07144 • code

AbBERT: Learning Antibody Humanness via Masked Language Modeling
Denis Vashchenko, Sam Nguyen, Andre Goncalves, Felipe Leno da Silva, Brenden Petersen, Thomas Desautels, Daniel Faissol
bioRxiv 2022.08.02.502236

Accelerating Antibody Design with Active Learning
Seung-woo Seo, Min Woo Kwak, Eunji Kang, Chaeun Kim, Eunyoung Park, Tae Hyun Kang, Jinhan Kim
bioRxiv 2022.09.12.507690

Reprogramming Large Pretrained Language Models for Antibody Sequence Infilling
Igor Melnyk, Vijil Chenthamarakshan, Pin-Yu Chen, Payel Das, Amit Dhurandhar, Inkit Padhi, Devleena Das
ICLR 2023/arXiv:2210.07144

Machine Learning Optimization of Candidate Antibodies Yields Highly Diverse Sub-nanomolar Affinity Antibody Libraries
Lin Li, Esther Gupta, John Spaeth, Leslie Shing, Rafael Jaimes, Rajmonda Sulo Caceres, Tristan Bepler, Matthew E. Walsh
bioRxiv 2022.10.07.502662 • Supplementary • code will be available

ZymCTRL: a conditional language model for the controllable generation of artificial enzymes
Noelia Ferruz
NeurIPS 2022/bioRxiv 2024.05.03.592223 • hugging face • poster

Generative Antibody Design for Complementary Chain Pairing Sequences through Encoder-Decoder Language Model
Chu, Simon, and Kathy Wei.
NeurIPS 2023 Generative AI and Biology (GenBio) Workshop. 2023/arXiv:2301.02748

Unlocking de novo antibody design with generative artificial intelligence
Amir Shanehsazzadeh, Matt McPartlon, George Kasun, Andrea K. Steiger, John M. Sutton, Edriss Yassine, Cailen McCloskey, Robel Haile, Richard Shuai, Julian Alverio, Goran Rakocevic, Simon Levine, Jovan Cejovic, Jahir M. Gutierrez, Alex Morehead, Oleksii Dubrovskyi, Chelsea Chung, Breanna K. Luton, Nicolas Diaz, Christa Kohnert, Rebecca Consbruck, Hayley Carter, Chase LaCombe, Itti Bist, Phetsamay Vilaychack, Zahra Anderson, Lichen Xiu, Paul Bringas, Kimberly Alarcon, Bailey Knight, Macey Radach, Katherine Bateman, Gaelin Kopec-Belliveau, Dalton Chapman, Joshua Bennett, Abigail B. Ventura, Gustavo M. Canales, Muttappa Gowda, Kerianne A. Jackson, Rodante Caguiat, Amber Brown, Douglas Ganini da Silva, Zheyuan Guo, Shaheed Abdulhaqq, Lillian R. Klug, Miles Gander, Engin Yapici, Joshua Meier, Sharrol Bachas
bioRxiv (2023): 2023-01 • data • news • blog • commercial

A universal deep-learning model for zinc finger design enables transcription factor reprogramming
David M. Ichikawa, Osama Abdin, Nader Alerasool, Manjunatha Kogenaru, April L. Mueller, Han Wen, David O. Giganti, Gregory W. Goldberg, Samantha Adams, Jeffrey M. Spencer, Rozita Razavi, Satra Nim, Hong Zheng, Courtney Gionco, Finnegan T. Clark, Alexey Strokach, Timothy R. Hughes, Timothee Lionnet, Mikko Taipale, Philip M. Kim & Marcus B. Noyes
Nat Biotechnol (2023)

XuperNovo®/ProteinGPT
XtalPi
news • news2 • website • commercial

Evaluating Prompt Tuning for Conditional Protein Sequence Generation
Andrea Nathansen, Kevin Klein, Bernhard Y. Renard, Melania Nowicka, Jakub M. Bartoszewicz
bioRxiv 2023.02.28.530492 • code

AB-Gen: Antibody Library Design with Generative Pre-trained Transformer and Deep Reinforcement Learning
Xiaopeng Xu, Tiantian Xu, Juexiao Zhou, Xingyu Liao, Ruochi Zhang, Yu Wang, Lu Zhang, Xin Gao
bioRxiv 2023.03.17.533102 • code • Supplementary • data

Unsupervised cross-domain translation via deep learning and adversarial attention neural networks and application to music-inspired protein designs
Buehler, Markus J.
Patterns 4.3 (2023) • code

ProtFIM: Fill-in-Middle Protein Sequence Design via Protein Language Models
Lee, Youhan, and Hasun Yu.
arXiv preprint arXiv:2303.16452 (2023)/ICLR 2023

REXzyme: A Translation Machine for the Generation of New-to-Nature Enzymes
Sebastian Lindner, Michael Heinzinger, Noelia Ferruz
paper coming soon • hugging face

MPM4: AI Text2Protein Breakthrough Tackles the Molecule Programming Challenge
310.ai
news • repo • commercial

xTrimoPGLM: Unified 100B-Scale Pre-trained Transformer for Deciphering the Language of Protein
Bo Chen, Xingyi Cheng, Li-ao Gengyang, Shen Li, Xin Zeng, Boyan Wang, Gong Jing, Chiming Liu, Aohan Zeng, Yuxiao Dong, Jie Tang, Le Song
bioRxiv 2023.07.05.547496 • news • website • commercial

TULIP - a Transformer based Unsupervised Language model for Interacting Peptides and T-cell receptors that generalizes to unseen epitopes
Barthelemy Meynard-Piganeau, Christoph Feinauer, Martin Weigt, Aleksandra M Walczak, Thierry Mora
bioRxiv 2023.07.19.549669 • code

Efficient and accurate sequence generation with small-scale protein language models
Yaiza Serrano, Sergi Roda, Victor Guallar, Alexis Molina
bioRxiv 2023.08.04.551626

Improving Antibody Affinity Using Laboratory Data with Language Model Guided Design
Ben Krause, Subu Subramanian, Tom Yuan, Marisa Yang, Aaron Sato, Nikhil Naik
bioRxiv 2023.09.13.557505

PepMLM: Target Sequence-Conditioned Generation of Peptide Binders via Masked Language Modeling
Tianlai Chen, Sarah Pertsemlidis, Rio Watson, Venkata Srikar Kavirayuni, Ashley Hsu, Pranay Vure, Rishab Pulugurta, Sophia Vincoff, Lauren Hong, Tian Wang, Vivian Yudistyra, Elena Haarer, Lin Zhao, Pranam Chatterjee
arXiv:2310.03842 • code

De novo generation of antibody CDRH3 with a pre-trained generative large language model
HaoHuai He, Bing He, Lei Guan, Yu Zhao, Guanxing Chen, Qingge Zhu, Calvin Yu-Chian Chen, Ting Li, Jianhua Yao
bioRxiv 2023.10.17.562827 • code • data

NL2ProGPT: Taming Large Language Model for Conversational Protein Design
Anonymous
ICLR 2024 under review

SaLT&PepPr is an interface-predicting language model for designing peptide-guided protein degraders
Garyk Brixi, Tianzheng Ye, Lauren Hong, Tian Wang, Connor Monticello, Natalia Lopez-Barbosa, Sophia Vincoff, Vivian Yudistyra, Lin Zhao, Elena Haarer, Tianlai Chen, Sarah Pertsemlidis, Kalyan Palepu, Suhaas Bhat, Jayani Christopher, Xinning Li, Tong Liu, Sue Zhang, Lillian Petersen, Matthew P. DeLisa & Pranam Chatterjee
Commun Biol 6, 1081 (2023) • code

Binary Discriminator Facilitates GPT-based Protein Design
Zishuo Zeng, Rufang Xu, Jin Guo, Xiaozhou Luo
bioRxiv 2023.11.20.567789 • code • Supplementary

ProteinNPT: Improving Protein Property Prediction and Design with Non-Parametric Transformers
Pascal Notin, Ruben Weitzman, Debora S Marks, Yarin Gal
bioRxiv 2023.12.06.570473 • code

The promises of large language models for protein design and modeling
Giorgio Valentini, Dario Malchiodi, Jessica Gliozzo, Marco Mesiti, Mauricio Soto-Gomez, Alberto Cabri, Justin Reese, Elena Casiraghi, and Peter N. Robinson
Frontiers in Bioinformatics 3 (2023)

Conversational Drug Editing Using Retrieval and Domain Feedback
Shengchao Liu, Jiongxiao Wang, Yijin Yang, Chengpeng Wang, Ling Liu, Hongyu Guo, Chaowei Xiao
ICLR (2024) • code • website

ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning
Alireza Ghafarollahi, Markus J. Buehler
arXiv:2402.04268 • code

Designing proteins with language models
Ruffolo, J.A., Madani, A.
Nat Biotechnol 42, 200–202 (2024) • review

ProLLaMA: A Protein Large Language Model for Multi-Task Protein Language Processing
Liuzhenghao Lv, Zongying Lin, Hao Li, Yuyang Liu, Jiaxi Cui, Calvin Yu-Chian Chen, Li Yuan, Yonghong Tian
arXiv:2402.16445 • code

Combining machine learning with structure-based protein design to predict and engineer post-translational modifications of proteins
Moritz Ertelt, Vikram Khipple Mulligan, Jack B. Maguire, Sergey Lyskov, Rocco Moretti, Torben Schiffner, Jens Meiler, Clara T. Schoeder
PLOS Computational Biology 20(3): e1011939 • code

Combining Rosetta Sequence Design with Protein Language Model Predictions Using Evolutionary Scale Modeling (ESM) as Restraint
Moritz Ertelt, Jens Meiler, and Clara T. Schoeder
ACS Synth. Biol. 2024 • code

Design of Antigen-Specific Antibody CDRH3 Sequences Using AI and Germline-Based Templates
Toma M. Marinov, Alexandra A. Abu-Shmais, Alexis K. Janke, Ivelin S. Georgiev
bioRxiv 2024.03.22.586241

Design of highly functional genome editors by modeling the universe of CRISPR-Cas sequences
Jeffrey A. Ruffolo, Stephen Nayfach, Joseph Gallagher, Aadyot Bhatnagar, Joel Beazer, Riffat Hussain, Jordan Russ, Jennifer Yip, Emily Hill, Martin Pacesa, Alexander J. Meeske, Peter Cameron, Ali Madani
bioRxiv 2024.04.22.590591 • code

Functional Protein Design with Local Domain Alignment
Chaohao Yuan, Songyou Li, Geyan Ye, Yikun Zhang, Long-Kai Huang, Wenbing Huang, Wei Liu, Jianhua Yao, Yu Rong
arXiv:2404.16866

The Continuous Language of Protein Structure
Lukas Billera, Anton Oresten, Aron Stålmarck, Kenta Sato, Mateusz Kaduk, Ben Murrell
bioRxiv 2024.05.11.593685 • code

Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates
Zhenqiao Song, Yunlong Zhao, Wenxian Shi, Wengong Jin, Yang Yang, Lei Li
arXiv:2405.08205/ICML 2024 • code

A generative foundation model for antibody sequence understanding
Justin Barton, Aretas Gaspariunas, David A Yadin, Jorge Dias, Francesca L Nice, Danielle H Minns, Olivia Snudden, Chelsea Povall, Sara Valle Tomas, Harry Dobson, James HR Farmery, Jinwoo Leem, Jacob D Galson
bioRxiv 2024.05.22.594943 • huggingface

Decoupled Sequence and Structure Generation for Realistic Antibody Design
Nayoung Kim, Minsu Kim, Sungsoo Ahn, Jinkyoo Park
arXiv:2402.05982 • code

MoFormer: Multi-objective Antimicrobial Peptide Generation Based on Conditional Transformer Joint Multi-modal Fusion Descriptor
Li Wang, Xiangzheng Fu, Jiahao Yang, Xinyi Zhang, Xiucai Ye, Yiping Liu, Tetsuya Sakurai, Xiangxiang Zeng
arXiv:2406.00735

HELM-GPT: de novo macrocyclic peptide design using generative pre-trained transformer
Xiaopeng Xu, Chencheng Xu, Wenjia He, Lesong Wei, Haoyang Li, Juexiao Zhou, Ruochi Zhang, Yu Wang, Yuanpeng Xiong, Xin Gao
Bioinformatics (2024): btae364 • code

Unifying Sequences, Structures, and Descriptions for Any-to-Any Protein Generation with the Large Multimodal Model HelixProtX
Zhiyuan Chen, Tianhao Chen, Chenggang Xie, Yang Xue, Xiaonan Zhang, Jingbo Zhou, Xiaomin Fang
arXiv:2407.09274 • code

A foundation model approach to guide antimicrobial peptide design in the era of artificial intelligence driven scientific discovery
Jike Wang, Jianwen Feng, Yu Kang, Peichen Pan, Jingxuan Ge, Yan Wang, Mingyang Wang, Zhenxing Wu, Xingcai Zhang, Jiameng Yu, Xujun Zhang, Tianyue Wang, Lirong Wen, Guangning Yan, Yafeng Deng, Hui Shi, Chang-Yu Hsieh, Zhihui Jiang, Tingjun Hou
arXiv:2407.12296 • code

5.5 Bayesian-based

Optimistic Games for Combinatorial Bayesian Optimization with Applications to Protein Design
Melis Ilayda Bal, Pier Giuseppe Sessa, Mojmir Mutny, Andreas Krause
NeurIPS 2023 Workshop on Adaptive Experimental Design and Active Learning in the Real World, 2023

Discovering de novo peptide substrates for enzymes using machine learning
Lorillee Tallorin, JiaLei Wang, Woojoo E. Kim, Swagat Sahu, Nicolas M. Kosa, Pu Yang, Matthew Thompson, Michael K. Gilson, Peter I. Frazier, Michael D. Burkart & Nathan C. Gianneschi
Nature communications 9.1 (2018) • code

Biological Sequences Design using Batched Bayesian Optimization
David Belanger, Suhani Vora, Zelda Mariet, Ramya Deshpande, David Dohan, Christof Angermueller, Kevin Murphy, Olivier Chapelle, Lucy Colwell
Machine Learning and the Physical Sciences Workshop (NeurIPS 2019)

Lattice protein design using Bayesian learning
Takahashi, Tomoei, George Chikenji, and Kei Tokita.
arXiv:2003.06601/Physical Review E 104.1 (2021): 014404

Now What Sequence? Pre-trained Ensembles for Bayesian Optimization of Protein Sequences
Ziyue Yang, Katarina A Milas, Andrew D White
bioRxiv 2022.08.05.502972 • code • Supplementary • Colab

AntBO: Towards Real-World Automated Antibody Design with Combinatorial Bayesian Optimisation
Asif Khan, Alexander I. Cowen-Rivers, Antoine Grosnit, Derrick-Goh-Xin Deik, Philippe A. Robert, Victor Greiff, Eva Smorodina, Puneet Rawat, Kamil Dreczkowski, Rahmad Akbar, Rasul Tutunov, Dany Bou-Ammar, Jun Wang, Amos Storkey, Haitham Bou-Ammar
arXiv preprint (2022)/Cell Reports Methods (2023): 100374

Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders
Samuel Stanton, Wesley Maddox, Nate Gruver, Phillip Maffettone, Emily Delaney, Peyton Greenside, Andrew Gordon Wilson
ICML 2022 • code

Statistical Mechanics of Protein Design
Takahashi, Tomoei, George Chikenji, and Kei Tokita.
arXiv preprint arXiv:2205.03696 (2022)

PropertyDAG: Multi-objective Bayesian optimization of partially ordered, mixed-variable properties for biological sequence design
Ji Won Park, Samuel Stanton, Saeed Saremi, Andrew Watkins, Henri Dwyer, Vladimir Gligorijevic, Richard Bonneau, Stephen Ra, Kyunghyun Cho
arXiv:2210.04096

A probabilistic view of protein stability, conformational specificity, and design
Jacob A. Stern, Tyler J. Free, Kimberlee L. Stern, Spencer Gardiner, Nicholas A. Dalley, Bradley C. Bundy, Joshua L. Price, David Wingate, Dennis Della Corte
bioRxiv 2022.12.28.521825 • Supplementary

Design of antimicrobial peptides containing non-proteinogenic amino acids using multi-objective Bayesian optimisation
Murakami Y, Ishida S, Demizu Y, Terayama K.
ChemRxiv. Cambridge: Cambridge Open Engage; 2023 • code

Vaxformer: Antigenicity-controlled Transformer for Vaccine Design Against SARS-CoV-2
Aryo Pradipta Gema, Michał Kobiela, Achille Fraisse, Ajitha Rajan, Diego A. Oyarzún, Javier Antonio Alfaro
arXiv:2305.11194 • code

Sample-efficient Antibody Design through Protein Language Model for Risk-aware Batch Bayesian Optimization
Yanzheng Wang, Boyue Wang, Tianyu Shi, Jie Fu, Yi Zhou, Zhizhuo Zhang
bioRxiv 2023.11.06.565922

Integrating Protein Structure Prediction and Bayesian Optimization for Peptide Design
Negin Manshour, Fei He, Duolin Wang, Dong Xu
NeurIPS 2023 Generative AI and Biology (GenBio) Workshop. 2023

5.6 RL-based

Model-based reinforcement learning for biological sequence design
Christof Angermueller, David Dohan, David Belanger, Ramya Deshpande, Kevin Murphy, Lucy Colwell
International conference on learning representations. 2019

Structured Q-learning For Antibody Design
Alexander I. Cowen-Rivers, Philip John Gorinski, Aivar Sootla, Asif Khan, Liu Furui, Jun Wang, Jan Peters, Haitham Bou Ammar
arXiv preprint arXiv:2209.04698 (2022)

Protein Sequence Design in a Latent Space via Model-based Reinforcement Learning
Minji Lee, Luiz Felipe Vecchietti, Hyunkyu Jung, Hyunjoo Ro, Ho Min Kim, Meeyoung Cha
ICLR 2023/NeurIPS 2022 • Supplementary

Designing Biological Sequences via Meta-Reinforcement Learning and Bayesian Optimization
Leo Feng, Padideh Nouri, Aneri Muni, Yoshua Bengio, Pierre-Luc Bacon
arXiv:2209.06259/NeurIPS 2022poster

Self-play reinforcement learning guides protein engineering
Yi Wang, Hui Tang, Lichao Huang, Lulu Pan, Lixiang Yang, Huanming Yang, Feng Mu & Meng Yang
Nature Machine Intelligence (2023)code

Curiosity Driven Protein Sequence Generation via Reinforcement Learning
Anonymous
ICLR 2024 under review

Stable Online and Offline Reinforcement Learning for Antibody CDRH3 Design
Yannick Vogt, Mehdi Naouar, Maria Kalweit, Christoph Cornelius Miething, Justus Duyster, Roland Mertelsmann, Gabriel Kalweit, Joschka Boedecker
arXiv:2401.05341

Peptide Vaccine Design by Evolutionary Multi-Objective Optimization
Dan-Xuan Liu, Yi-Heng Xu, Chao Qian
arXiv:2406.05743

Reinforcement Learning for Sequence Design Leveraging Protein Language Models
Jithendaraa Subramanian, Shivakanth Sujit, Niloy Irtisam, Umong Sain, Derek Nowrouzezahrai, Samira Ebrahimi Kahou, Riashat Islam
arXiv:2407.03154

5.7 Flow-based

Biological Sequence Design with GFlowNets
Moksh Jain, Emmanuel Bengio, Alex Hernandez-Garcia, Jarrid Rector-Brooks, Bonaventure F. P. Dossou, Chanakya Ekbote, Jie Fu, Tianyu Zhang, Michael Kilgour, Dinghuai Zhang, Lena Simine, Payel Das, Yoshua Bengio
arXiv preprint arXiv:2203.04115 (2022)lecture

5.8 RNN-based

Deep learning to design nuclear-targeting abiotic miniproteins
Carly K. Schissel, Somesh Mohapatra, Justin M. Wolfe, Colin M. Fadzen, Kamela Bellovoda, Chia-Ling Wu, Jenna A. Wood, Annika B. Malmberg, Andrei Loas, Rafael Gómez-Bombarelli & Bradley L. Pentelute
Nature Chemistry 13.10 (2021)code

Recurrent neural network model for constructive peptide design
Müller, Alex T., Jan A. Hiss, and Gisbert Schneider.
Journal of chemical information and modeling 58.2 (2018)

Machine learning designs non-hemolytic antimicrobial peptides
Alice Capecchi, Xingguang Cai, Hippolyte Personne, Thilo Köhler, Christian van Delden, and Jean-Louis Reymond
Chemical Science 12.26 (2021)

Using molecular dynamics simulations to prioritize and understand AI-generated cell penetrating peptides
Duy Phuoc Tran, Seiichi Tada, Akiko Yumoto, Akio Kitao, Yoshihiro Ito, Takanori Uzawa & Koji Tsuda
Scientific reports 11.1 (2021)

De novo antioxidant peptide design via machine learning and DFT studies
Parsa Hesamzadeh, Abdolvahab Seif, Kazem Mahmoudzadeh, Mokhtar Ganjali Koli, Amrollah Mostafazadeh, Kosar Nayeri, Zohreh Mirjafary & Hamid Saeidian
Scientific Reports 14.1 (2024)code

5.9 LSTM-based

Computational antimicrobial peptide design and evaluation against multidrug-resistant clinical isolates of bacteria
Deepesh Nagarajan, Tushar Nagarajan, Natasha Roy, Omkar Kulkarni, Sathyabaarathi Ravichandran, Madhulika Mishra, Dipshikha Chakravortty, Nagasuma Chandra
Journal of Biological Chemistry 293.10 (2018)

Deep learning enables the design of functional de novo antimicrobial proteins
Javier Caceres-Delpiano, Roberto Ibañez, Patricio Alegre, Cynthia Sanhueza, Romualdo Paz-Fiblas, Simon Correa, Pedro Retamal, Juan Cristóbal Jiménez, Leonardo Álvarez
bioRxiv (2020)

ECNet is an evolutionary context-integrated deep learning framework for protein engineering
Yunan Luo, Guangde Jiang, Tianhao Yu, Yang Liu, Lam Vo, Hantian Ding, Yufeng Su, Wesley Wei Qian, Huimin Zhao & Jian Peng
Nature communications 12.1 (2021)

Deep learning for novel antimicrobial peptide design
Wang, Christina, Sam Garlick, and Mire Zloh.
Biomolecules 11.3 (2021)

Antibody design using LSTM based deep generative model from phage display library for affinity maturation
Koichiro Saka, Taro Kakuzaki, Shoichi Metsugi, Daiki Kashiwagi, Kenji Yoshida, Manabu Wada, Hiroyuki Tsunoda & Reiji Teramoto
Scientific reports 11.1 (2021)

In silico proof of principle of machine learning-based antibody design at unconstrained scale
Akbar, Rahmad, et al.
Mabs. Vol. 14. No. 1. Taylor & Francis, 2022code

Large-scale design and refinement of stable proteins using sequence-only models
Jedediah M. Singer, Scott Novotney, Devin Strickland, Hugh K. Haddox, Nicholas Leiby, Gabriel J. Rocklin, Cameron M. Chow, Anindya Roy, Asim K. Bera, Francis C. Motta, Longxing Cao, Eva-Maria Strauch, Tamuka M. Chidyausiku, Alex Ford, Ethan Ho, Alexander Zaitzeff, Craig O. Mackenzie, Hamed Eramian, Frank DiMaio, Gevorg Grigoryan, Matthew Vaughn, Lance J. Stewart, David Baker, Eric Klavins
PloS one 17.3 (2022)code

Deep-learning based bioactive therapeutic peptides generation and screening
Haiping Zhang, Konda Mani Saravanan, Yanjie Wei, Yang Jiao, Yang Yang, Yi Pan, Xuli Wu, John Z.H. Zhang
bioRxiv 2022.11.14.516530codeSupplementary

Deep-learning based bioactive peptides generation and screening against Xanthine oxidase
Haiping Zhang, Konda Mani Saravanan, John Z.H. Zhang, Xuli Wu
bioRxiv 2023.01.11.523536

Deep Learning-Based Bioactive Therapeutic Peptide Generation and Screening
Haiping Zhang, Konda Mani Saravanan, Yanjie Wei, Yang Jiao, Yang Yang, Yi Pan, Xuli Wu, and John Z. H. Zhang
Journal of Chemical Information and Modeling 63.3 (2023)code

5.10 Autoregressive-models

Efficient generative modeling of protein sequences using simple autoregressive models
Jeanne Trinquier, Guido Uguzzoni, Andrea Pagnani, Francesco Zamponi & Martin Weigt
Nature communications 12.1 (2021): 1-11code

Conformal prediction for the design problem
Clara Fannjiang, Stephen Bates, Anastasios N. Angelopoulos, Jennifer Listgarten, Michael I. Jordan
arXiv:2202.03613v4code

5.11 Boltzmann-machine-based

How pairwise coevolutionary models capture the collective residue variability in proteins?
Figliuzzi, Matteo, Pierre Barrat-Charlaix, and Martin Weigt.
Molecular biology and evolution 35.4 (2018): 1018-1027code

A Pareto-optimal compositional energy-based model for sampling and optimization of protein sequences
Nataša Tagasovska, Nathan C. Frey, Andreas Loukas, Isidro Hötzel, Julien Lafrance-Vanasse, Ryan Lewis Kelly, Yan Wu, Arvind Rajpal, Richard Bonneau, Kyunghyun Cho, Stephen Ra, Vladimir Gligorijević
arXiv:2210.10838slides

Computational design of novel Cas9 PAM-interacting domains using evolution-based modelling and structural quality assessment
Cyril Malbranke, William Rostain, Florence Depardieu, Simona Cocco, Remi Monasson, David Bikard
bioRxiv 2023.03.20.533501codeSupplementary

Protein Discovery with Discrete Walk-Jump Sampling
Nathan C. Frey, Daniel Berenberg, Karina Zadorozhny, Joseph Kleinhenz, Julien Lafrance-Vanasse, Isidro Hotzel, Yan Wu, Stephen Ra, Richard Bonneau, Kyunghyun Cho, Andreas Loukas, Vladimir Gligorijevic, Saeed Saremi
arXiv:2306.12360/ICLR 2024 under reviewcodelecture

5.12 Diffusion-based

denoising-diffusion-protein-sequence
Zhangzhi Peng
Paper unavailable • github

Protein Design with Guided Discrete Diffusion
Nate Gruver, Samuel Stanton, Nathan C. Frey, Tim G. J. Rudner, Isidro Hotzel, Julien Lafrance-Vanasse, Arvind Rajpal, Kyunghyun Cho, Andrew Gordon Wilson
arXiv:2305.20009codelecture

PRO-LDM: Protein Sequence Generation with Conditional Latent Diffusion Models
Zixuan Jiang, Sitao Zhang, Rundong Huang, Shaoxun Mo, Letao Zhu, Peiheng Li, Ziyi Zhang, Xi Chen, Yunfei Long, Renjing Xu, Rui Qing
bioRxiv 2023.08.22.554145Supplementary

Protein generation with evolutionary diffusion: sequence is all you need
Sarah Alamdari, Nitya Thakkar, Rianne van den Berg, Alex Xijie Lu, Nicolo Fusi, Ava Pardis Amini, Kevin K Yang
bioRxiv 2023.09.11.556673codedatalecture

AntiBARTy Diffusion for Property Guided Antibody Design
Jordan Venderley
arXiv:2309.13129

ProT-Diff: A Modularized and Efficient Approach to De Novo Generation of Antimicrobial Peptide Sequences through Integration of Protein Language Model and Diffusion Model
Xue-Fei Wang, Jing-Ya Tang, Han Liang, Jing Sun, Sonam Dorje, Bo Peng, Xu-Wo Ji, Zhe Li, Xian-En Zhang, Dian-Bing Wang
bioRxiv 2024.02.22.581480Supplementary

TaxDiff: Taxonomic-Guided Diffusion Model for Protein Sequence Generation
Lin Zongying, Li Hao, Lv Liuzhenghao, Lin Bin, Zhang Junwu, Chen Calvin Yu-Chian, Yuan Li, Tian Yonghong
arXiv:2402.17156code

Diffusion on language model embeddings for protein sequence generation
Viacheslav Meshchaninov, Pavel Strashnov, Andrey Shevtsov, Fedor Nikolaev, Nikita Ivanisenko, Olga Kardymon, Dmitry Vetrov
arXiv:2403.03726

AMP-Diffusion: Integrating Latent Diffusion with Protein Language Models for Antimicrobial Peptide Generation
Tianlai Chen, Pranay Vure, Rishab Pulugurta, Pranam Chatterjee
bioRxiv 2024.03.03.583201

Atomically accurate de novo design of single-domain antibodies
Nathaniel R. Bennett, Joseph L. Watson, Robert J. Ragotte, Andrew J. Borst, DeJenae L. See, Connor Weidle, Riti Biswas, Ellen L. Shrock, Philip J. Y. Leung, Buwei Huang, Inna Goreshnik, Russell Ault, Kenneth D. Carr, Benedikt Singer, Cameron Criswell, Dionne Vafeados, Mariana Garcia Sanchez, Ho Min Kim, Susana Vazquez Torres, Sidney Chan, David Baker
bioRxiv 2024.03.14.585103Supplementary

Complex-based Ligand-Binding Proteins Redesign by Equivariant Diffusion-based Generative Models
Viet Thanh Duy Nguyen, Nhan Nguyen, Truong Son Hy
bioRxiv 2024.04.17.589997code

Context-Guided Diffusion for Out-of-Distribution Molecular and Protein Design
Leo Klarner, Tim G. J. Rudner, Garrett M. Morris, Charlotte M. Deane, Yee Whye Teh
arXiv:2407.11942code

Secondary Structure-Guided Novel Protein Sequence Generation with Latent Graph Diffusion
Yutong Hu, Yang Tan, Andi Han, Lirong Zheng, Liang Hong, Bingxin Zhou
arXiv:2407.07443code

5.13 GNN-based

Generative Pretrained Autoregressive Transformer Graph Neural Network applied to the Analysis and Discovery of Novel Proteins
Markus J. Buehler
arXiv:2305.04934code

5.14 Score-based

Microdroplet screening rapidly profiles a biocatalyst to enable its AI-assisted engineering
Maximilian Gantz, Simon V. Mathis, Friederike E. H. Nintzel, Paul J. Zurek, Tanja Knaus, Elie Patel, Daniel Boros, Friedrich-Maximilian Weberling, Matthew R. A. Kenneth, Oskar J. Klein, Elliot J. Medcalf, Jacob Moss, Michael Herger, Tomasz S. Kaminski, Francesco G. Mutti, Pietro Lio, Florian Hollfelder
bioRxiv (2024.04.08)

Bootstrapped Training of Score-Conditioned Generator for Offline Design of Biological Sequences
Minsu Kim, Federico Berto, Sungsoo Ahn, Jinkyoo Park
arXiv:2306.03111code

6. Function to Structure

These models generate protein structures (including side chains) from an expected function, or recover part of a protein structure (a.k.a. inpainting).

6.1 LSTM-based

One-sided design of protein-protein interaction motifs using deep learning
Syrlybaeva, Raulia, and Eva-Maria Strauch.
bioRxiv (2022)codeour noteslecture

6.2 Diffusion-based

Protein Structure and Sequence Generation with Equivariant Denoising Diffusion Probabilistic Models
Namrata Anand, Tudor Achim
GitHub (2022)/arXiv (2022)our noteslecture

Antigen-Specific Antibody Design and Optimization with Diffusion-Based Generative Models for Protein Structures
Shitong Luo, Yufeng Su, Xingang Peng, Sheng Wang, Jian Peng, Jianzhu Ma
bioRxiv 2022.07.10.499510/ICML (2023)codehugging face

Illuminating protein space with a programmable generative model
John Ingraham, Max Baranov, Zak Costello, Vincent Frappier, Ahmed Ismail, Shan Tie, Wujie Wang, Vincent Xue, Fritz Obermeyer, Andrew Beam, Gevorg Grigoryan
Generate Biomedicines Preprint/bioRxiv 2022.12.01.518682/Nature (2023)websitenewscodecolab • commercial

Physics-Inspired Protein Encoder Pre-Training via Siamese Sequence-Structure Diffusion Trajectory Prediction
Zuobai Zhang, Minghao Xu, Aurélie Lozano, Vijil Chenthamarakshan, Payel Das, Jian Tang
arXiv:2301.12068code

TRDiffusion
TIANRANG XLab
newswebsite • commercial

An all-atom protein generative model
Alexander E Chu, Lucy Cheng, Gina El Nesr, Minkai Xu, Po-Ssu Huang
bioRxiv 2023.05.24.542194code

DiffPack: A Torsional Diffusion Model for Autoregressive Protein Side-Chain Packing
Yangtian Zhang, Zuobai Zhang, Bozitao Zhong, Sanchit Misra, Jian Tang
arxiv 2023.06.01code

AbDiffuser: Full-Atom Generation of In-Vitro Functioning Antibodies
Karolis Martinkus, Jan Ludwiczak, Kyunghyun Cho, Wei-Ching Lian, Julien Lafrance-Vanasse, Isidro Hotzel, Arvind Rajpal, Yan Wu, Richard Bonneau, Vladimir Gligorijevic, Andreas Loukas
arXiv:2308.05027lecture

Generative Diffusion Models for Antibody Design, Docking, and Optimization
Zhangzhi Peng, Chenchen Han, Xiaohan Wang, Dapeng Li, Fajie Yuan
bioRxiv 2023.09.25.559190codewebsite

Bridging Sequence and Structure: Latent Diffusion for Conditional Protein Generation
Anonymous
ICLR 2024 under review

Guiding diffusion models for antibody sequence and structure co-design with developability properties
Amelia Villegas-Morcillo, Jana M. Weber, Marcel J.T. Reinders
bioRxiv 2023.11.22.568230/NeurIPS 2023 Generative AI and Biology Workshopcode

A Multi-Modal Contrastive Diffusion Model for Therapeutic Peptide Generation
Yongkang Wang, Xuan Liu, Feng Huang, Zhankun Xiong, Wen Zhang
arXiv:2312.15665code

Towards Joint Sequence-Structure Generation of Nucleic Acid and Protein Complexes with SE(3)-Discrete Diffusion
Alex Morehead, Jeffrey Ruffolo, Aadyot Bhatnagar, Ali Madani
arXiv:2401.06151code

Proteus: exploring protein structure generation for enhanced designability and efficiency
Chentong Wang, Yannan Qu, Zhangzhi Peng, Yukai Wang, Hongli Zhu, Dachuan Chen, Longxing Cao
bioRxiv 2024.02.10.579791code

Full-Atom Peptide Design with Geometric Latent Diffusion
Xiangzhe Kong, Wenbing Huang, Yang Liu
arXiv:2402.13555

A Hybrid Diffusion Model for Stable, Affinity-Driven, Receptor-Aware Peptide Generation
R Vishva Saravanan, Soham Choudhuri, Bhaswar Ghosh
bioRxiv 2024.03.14.584934codedataset

Antigen-Specific Antibody Design via Direct Energy-based Preference Optimization
Xiangxin Zhou, Dongyu Xue, Ruizhe Chen, Zaixiang Zheng, Liang Wang, Quanquan Gu
arXiv:2403.16576

HelixDiff, a Score-Based Diffusion Model for Generating All-Atom α-Helical Structures
Xuezhi Xie, Pedro A Valiente, Jisun Kim, and Philip M Kim
ACS Central Science (2024)code

Combining transformer and 3DCNN models to achieve co-design of structures and sequences of antibodies in a diffusional manner
Yue Hu, Feng Tao, Jun Wen Lan, Jing Zhang
bioRxiv 2024.04.25.587828code

Target-Specific De Novo Peptide Binder Design with DiffPepBuilder
Fanhao Wang, Yuzhe Wang, Laiyi Feng, Changsheng Zhang, Luhua Lai
arXiv:2405.00128code

Improving Antibody Design with Force-Guided Sampling in Diffusion Models
Paulina Kulytė, Francisco Vargas, Simon Valentin Mathis, Yu Guang Wang, José Miguel Hernández-Lobato, Pietro Liò
arXiv:2406.05832

Antibody Design Using a Score-based Diffusion Model Guided by Evolutionary, Physical and Geometric Constraints
Tian Zhu, Milong Ren, Haicang Zhang
ICML 2024code

6.3 RoseTTAFold-based

Deep learning methods for designing proteins scaffolding functional sites / Scaffolding protein functional sites using deep learning
Jue Wang, Sidney Lisanza, David Juergens, Doug Tischer, Ivan Anishchenko, Minkyung Baek, Joseph L. Watson, Jung Ho Chun, Lukas F. Milles, Justas Dauparas, Marc Expòsit, Wei Yang, Amijai Saragovi, Sergey Ovchinnikov, David Baker
bioRxiv(2021)/Science(2022)RFDesignour noteslectureRoseTTAFoldSupplementary, Other Supplementary

Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models / De novo design of protein structure and function with RFdiffusion
Joseph L. Watson, David Juergens, Nathaniel R. Bennett, Brian L. Trippe, Jason Yim, Helen E. Eisenach, Woody Ahern, Andrew J. Borst, Robert J. Ragotte, Lukas F. Milles, Basile I. M. Wicky, Nikita Hanikel, Samuel J. Pellock, Alexis Courbet, William Sheffler, Jue Wang, Preetham Venkatesh, Isaac Sappington, Susana Vázquez Torres, Anna Lauko, Valentin De Bortoli, Emile Mathieu, Regina Barzilay, Tommi S. Jaakkola, Frank DiMaio, Minkyung Baek, David Baker
Bakerlab Preprint/bioRxiv 2022.12.09.519842/Nature (2023)news, news2, news3Supplementarylecture, lecture2RFdiffusion:code, Colabblog

De novo design of high-affinity protein binders to bioactive helical peptides
Susana Vázquez Torres, Philip J. Y. Leung, Isaac D. Lutz, Preetham Venkatesh, Joseph L Watson, Fabian Hink, Huu-Hien Huynh, Andy Hsien-Wei Yeh, David Juergens, Nathaniel R. Bennett, Andrew N. Hoofnagle, Eric Huang, Michael J. MacCoss, Marc Expòsit, Gyu Rie Lee, Elif Nihal Korkmaz, Jeff Nivala, Lance Stewart, Joseph M. Rodgers, David Baker
bioRxiv 2022.12.10.519862/Nature (2023)Supplementary

Joint Generation of Protein Sequence and Structure with RoseTTAFold Sequence Space Diffusion
Sidney Lyayuga Lisanza, Jacob Merle Gershon, Sam Wayne Kenmore Tipps, Lucas Arnoldt, Samuel Hendel, Jeremiah Nelson Sims, Xinting Li, David Baker
bioRxiv 2023.05.08.539766codehugging facelecture

The structural landscape of the immunoglobulin fold by large-scale de novo design
Jorge Roel-Touris, Lourdes Carcelen, Enrique Marcos
bioRxiv 2023.10.03.560637/Protein Science (2024)Supplementarycodedata

Generalized Biomolecular Modeling and Design with RoseTTAFold All-Atom
Rohith Krishna, Jue Wang, Woody Ahern, Pascal Sturmfels, Preetham Venkatesh, Indrek Kalvet, Gyu Rie Lee, Felix S Morey-Burrows, Ivan Anishchenko, Ian R Humphreys, Ryan McHugh, Dionne Vafeados, Xinting Li, George A Sutherland, Andrew Hitchcock, C Neil Hunter, Minkyung Baek, Frank DiMaio, David Baker
bioRxiv 2023.10.09.561603/ScienceSupplementarycode

Amalga: Designable Protein Backbone Generation with Folding and Inverse Folding Guidance
Shugao Chen, Ziyao Li, Xiangxiang Zeng, Guolin Ke
bioRxiv 2023.11.07.565939

Accurate single domain scaffolding of three non-overlapping protein epitopes using deep learning
Karla M Castro, Joseph L Watson, Jue Wang, Joshua Southern, Reyhaneh Ayardulabi, Sandrine Georgeon, Stephane Rosset, David Baker, Bruno E Correia
bioRxiv 2024.05.07.592871Supplementary

Diversifying de novo TIM barrels by hallucination
Beck, Julian, Sooruban Shanmugaratnam, and Birte Höcker.
Protein Science 33.6 (2024)

De novo designed proteins neutralize lethal snake venom toxins
Susana Vázquez Torres, Melisa Benard Valle, Stephen P. Mackessy, Stefanie K. Menzies, Nicholas R. Casewell, Shirin Ahmadi, Nick J. Burlet, Edin Muratspahić, Isaac Sappington, Max D. Overath, Esperanza Rivera-de-Torre, Jann Ledergerber, Andreas H. Laustsen, Kim Boddum, Asim K. Bera, Alex Kang, Evans Brackenbrough, Iara A. Cardoso, Edouard P. Crittenden, Rebecca J. Edge, Justin Decarreau, Robert J. Ragotte, Arvind S. Pillai, Mohamad Abedi, Hannah L. Han, Stacey R. Gerben, Analisa Murray, Rebecca Skotheim, Lynda Stuart, Lance Stewart, Thomas J.A. Fryer, Timothy P. Jenkins, David Baker
PREPRINT (Version 1) available at Research Square

Controlling semiconductor growth with structured de novo protein interfaces
Amijai Saragovi, Harley Pyles, Paul Kwon, Nikita Hanikel, Fátima A. Dávila-Hernández, Asim K. Bera, Alex Kang, Evans Brackenbrough, Dionne K. Vafeados, Aza Allen, Lance Stewart, David Baker
bioRxiv 2024.06.24.600095Supplementary

Diffusing protein binders to intrinsically disordered proteins
Caixuan Liu, Kejia Wu, Hojun Choi, Hannah Han, Xueli Zhang, Joseph L Watson, Sara Shijo, Asim K Bera, Alex Kang, Evans Brackenbrough, Brian Coventry, Derrick R Hick, Andrew N Hoofnagle, Ping Zhu, Xingting Li, Justin Decarreau, Stacey R Gerben, Wei Yang, Xinru Wang, Mila Lamp, Analisa Murray, Magnus Bauer, David Baker
bioRxiv 2024.07.16.603789Supplementary

6.4 CNN-based

De Novo Design of Site-specific Protein Binders Using Surface Fingerprints
Pablo Gainza, Sarah Wehrle, Alexandra Van Hall-Beauvais, Anthony Marchand, Andreas Scheck, Zander Harteveld, Stephen Buckley, Dongchun Ni, Shuguang Tan, Freyr Sverrisson, Casper Goverde, Priscilla Turelli, Charlène Raclot, Alexandra Teslenko, Martin Pacesa, Stéphane Rosset, Sandrine Georgeon, Jane Marsden, Aaron Petruzzella, Kefang Liu, Zepeng Xu, Yan Chai, Pu Han, George F. Gao, Elisa Oricchio, Beat Fierz, Didier Trono, Henning Stahlberg, Michael Bronstein, Bruno E. Correia
Protein Science 30.CONF (2021)/bioRxiv (2022)/Nature (2023)Supplementarymasif_seedmasiflecture

Targeting protein-ligand neosurfaces using a generalizable deep learning approach
Anthony Marchand, Stephen Buckley, Arne Schneuing, Martin Pacesa, Pablo Gainza, Evgenia Elizarova, Rebecca Manuela Neeser, Pao-Wan Lee, Luc Reymond, Maddalena Elia, Leo Scheller, Sandrine Georgeon, Joseph Schmidt, Philippe Schwaller, Sebastian Josef Maerkl, Michael Bronstein, Bruno Emmanuel Correia
bioRxiv 2024.03.25.585721Supplementarycode

6.5 GNN-based

Iterative refinement graph neural network for antibody sequence-structure co-design
Wengong Jin, Jeremy Wohlwend, Regina Barzilay, Tommi Jaakkola
arXiv preprint arXiv:2110.04624 (2021)RefineGNNlecture1, lecture2

Antibody Complementarity Determining Regions (CDRs) design using Constrained Energy Model
Fu, Tianfan, and Jimeng Sun.
Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2022code

Conditional Antibody Design as 3D Equivariant Graph Translation
Xiangzhe Kong, Wenbing Huang, Yang Liu
ICLR 2023/arXiv:2208.06073

End-to-End Full-Atom Antibody Design
Xiangzhe Kong, Wenbing Huang, Yang Liu
arXiv:2302.00203code

AbODE: Ab Initio Antibody Design using Conjoined ODEs
Yogesh Verma, Markus Heinonen, Vikas Garg
arXiv:2306.01005

Joint Design of Protein Sequence and Structure based on Motifs
Zhenqiao Song, Yunlong Zhao, Yufei Song, Wenxian Shi, Yang Yang, Lei Li
arXiv:2310.02546

De novo protein design using geometric vector field networks
Weian Mao, Muzhi Zhu, Zheng Sun, Shuaike Shen, Lin Yuanbo Wu, Hao Chen, Chunhua Shen
arXiv:2310.11802/ICLR 2024 under review

A Survey of Geometric Graph Neural Networks: Data Structures, Models and Applications
Jiaqi Han, Jiacheng Cen, Liming Wu, Zongzhao Li, Xiangzhe Kong, Rui Jiao, Ziyang Yu, Tingyang Xu, Fandi Wu, Zihe Wang, Hongteng Xu, Zhewei Wei, Yang Liu, Yu Rong, Wenbing Huang
arXiv:2403.00485 • review

GeoAB: Towards Realistic Antibody Design and Reliable Affinity Maturation
Haitao LIN, Lirong Wu, Huang Yufei, Yunfan Liu, Odin Zhang, Yuanqing Zhou, Rui Sun, Stan Z Li
bioRxiv 2024.05.15.594274code

Topological Neural Networks go Persistent, Equivariant, and Continuous
Yogesh Verma, Amauri H Souza, Vikas Garg
arXiv:2406.03164code

6.6 Transformer-based

Protein Sequence and Structure Co-Design with Equivariant Translation
Chence Shi, Chuanrui Wang, Jiarui Lu, Bozitao Zhong, Jian Tang
arXiv:2210.08761/ICLR 2023Supplementarycode

Deep Learning for Flexible and Site-Specific Protein Docking and Design
Matt McPartlon, Jinbo Xu
bioRxiv 2023.04.01.535079code

Full-Atom Protein Pocket Design via Iterative Refinement
Zaixi Zhang, Zepu Lu, Zhongkai Hao, Marinka Zitnik, Qi Liu
arXiv:2310.02553code

Functional Geometry Guided Protein Sequence and Backbone Structure Co-Design
Anonymous
ICLR 2024 under review

Fast and accurate modeling and design of antibody-antigen complex using tFold
Fandi Wu, Yu Zhao, Jiaxiang Wu, Biaobin Jiang, Bing He, Longkai Huang, Chenchen Qin, Fan Yang, Ningqiao Huang, Yang Xiao, Rubo Wang, Huaxian Jia, Yu Rong, Yuyi Liu, Houtim Lai, Tingyang Xu, Wei Liu, Peilin Zhao, Jianhua Yao
bioRxiv 2024.02.05.578892website

PocketGen: Generating Full-Atom Ligand-Binding Protein Pockets
Zhang Zaixi, Wanxiang Shen, Qi Liu, Marinka Zitnik
bioRxiv 2024.02.25.581968codewebsite

Simulating 500 million years of evolution with a language model
Thomas Hayes, Roshan Rao, Halil Akin, Nicholas James Sofroniew, Deniz Oktay, Zeming Lin, Robert Verkuil, Vincent Quy Tran, Jonathan Deaton, Marius Wiggert, Rohil Badkundri, Irhum Shafkat, Jun Gong, Alexander Derry, Raul Santiago Molina, Neil Thomas, Yousuf Khan, Chetan Mishra, Carolyn Kim, Liam J. Bartie, Patrick D. Hsu, Tom Sercu, Salvatore Candido, Alexander Rives
preprint/bioRxiv 2024.07.01.600583websitecodecolabnews

6.7 MLP-based

Protein Complex Invariant Embedding with Cross-Gate MLP is A One-Shot Antibody Designer
Cheng Tan, Zhangyang Gao, Stan Z. Li
arXiv:2305.09480

6.8 Flow-based

Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design
Andrew Campbell, Jason Yim, Regina Barzilay, Tom Rainforth, Tommi Jaakkola
arXiv:2402.04997codelecture

PPFlow: Target-Aware Peptide Design with Torsional Flow Matching
Haitao Lin, Odin Zhang, Huifeng Zhao, Dejun Jiang, Lirong Wu, Zicheng Liu, Yufei Huang, Stan Z. Li
bioRxiv 2024.03.07.583831/arXiv:2405.06642Supplementary

Full-Atom Peptide Design based on Multi-modal Flow Matching
Jiahan Li, Chaoran Cheng, Zuofan Wu, Ruihan Guo, Shitong Luo, Zhizhou Ren, Jian Peng, Jianzhu Ma
arXiv:2406.00735code

AntibodyFlow: Normalizing Flow Model for Designing Antibody Complementarity-Determining Regions
Bohao Xu, Yanbo Wang, Wenyu Chen, Shimin Shan
arXiv:2406.13162

7. Other tasks

7.1 Effects of mutation & Fitness Landscape

Deep generative models of genetic variation capture the effects of mutations
Adam J. Riesselman, John B. Ingraham & Debora S. Marks
Nature Methodscode::DeepSequence • Oct 2018

Deciphering protein evolution and fitness landscapes with latent space models
Xinqiang Ding, Zhengting Zou & Charles L. Brooks III
Nature Communicationscode::PEVAE • Dec 2019

Is transfer learning necessary for protein landscape prediction?
Shanehsazzadeh, Amir, David Belanger, and David Dohan.
arXiv preprint arXiv:2011.03443 (2020)

Epistatic Net allows the sparse spectral regularization of deep neural networks for inferring fitness functions
Amirali Aghazadeh, Hunter Nisonoff, Orhan Ocal, David H. Brookes, Yijie Huang, O. Ozan Koyluoglu, Jennifer Listgarten & Kannan Ramchandran
Nature Communicationscode • Sep 2021

The generative capacity of probabilistic protein sequence models
Francisco McGee, Sandro Hauri, Quentin Novinger, Slobodan Vucetic, Ronald M. Levy, Vincenzo Carnevale & Allan Haldane
Nature Communicationscode::generation_capacity_metricscode::sVAE • Nov 2021

Learning the local landscape of protein structures with convolutional neural networks
Anastasiya V. Kulikova, Daniel J. Diaz, James M. Loy, Andrew D. Ellington & Claus O. Wilke
Journal of Biological Physics 47.4 (2021)

Learning Protein Fitness Models from Evolutionary and Assay-labeled Data
Chloe Hsu, Hunter Nisonoff, Clara Fannjiang, Jennifer Listgarten
Nature Biotechnology (2022)Supplementary Informationcode

Proximal Exploration for Model-guided Protein Sequence Design
Zhizhou Ren, Jiahan Li, Fan Ding, Yuan Zhou, Jianzhu Ma, Jian Peng
BioRxiv (2022)code • commercial

Efficient evolution of human antibodies from general protein language models and sequence information alone
Brian L. Hie, Duo Xu, Varun R. Shanker, Theodora U.J. Bruun, Payton A. Weidenbacher, Shaogeng Tang, Peter S. Kim
bioRxiv (2022)code

Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval
Notin, P., Dias, M., Frazer, J., Marchena-Hurtado, J., Gomez, A., Marks, D.S., Gal, Y.
ICML (2022)/arXiv:2205.13760codehugging face

Protein engineering via Bayesian optimization-guided evolutionary algorithm and robotic experiments
Ruyun Hu, Lihao Fu, Yongcan Chen, Junyu Chen, Yu Qiao, Tong Si
bioRxiv 2022.08.11.503535

Antibody optimization enabled by artificial intelligence predictions of binding affinity and naturalness
Sharrol Bachas, Goran Rakocevic, David Spencer, Anand V. Sastry, Robel Haile, John M. Sutton, George Kasun, Andrew Stachyra, Jahir M. Gutierrez, Edriss Yassine, Borka Medjo, Vincent Blay, Christa Kohnert, Jennifer T. Stanton, Alexander Brown, Nebojsa Tijanic, Cailen McCloskey, Rebecca Viazzo, Rebecca Consbruck, Hayley Carter, Simon Levine, Shaheed Abdulhaqq, Jacob Shaul, Abigail B. Ventura, Randal S. Olson, Engin Yapici, Joshua Meier, Sean McClain, Matthew Weinstock, Gregory Hannum, Ariel Schwartz, Miles Gander, Roberto Spreafico
bioRxiv 2022.08.16.504181poster

Construction of a Deep Neural Network Energy Function for Protein Physics
Yang, Huan, Zhaoping Xiong, and Francesco Zonta
Journal of Chemical Theory and Computation (2022)

Inferring protein fitness landscapes from laboratory evolution experiments
Sameer D’Costa, Emily C. Hinds, Chase R. Freschlin, Hyebin Song, Philip A. Romero
bioRxiv 2022.09.01.506224Supplementary

BayeStab: Predicting Effects of Mutations on Protein Stability with Uncertainty Quantification
Shuyu Wang, Hongzhou Tang, Yuliang Zhao, Lei Zuo
Protein Science (2022)codewebsite

Tuned Fitness Landscapes for Benchmarking Model-Guided Protein Design
Neil Thomas, Atish Agarwala, David Belanger, Yun S. Song, Lucy Colwell
bioRxiv 2022.10.28.514293code

Protein design using structure-based residue preferences
David Ding, Ada Y Shaw, Sam Sinai, Nathan J Rollins, Noam Prywes, David Savage, Michael T Laub, Debora S Marks
bioRxiv 2022.10.31.514613code

Accurate Mutation Effect Prediction using RoseTTAFold
Sanaa Mansoor, Minkyung Baek, David Juergens, Joseph L Watson, David Baker
bioRxiv 2022.11.04.515218

Learning the shape of protein micro-environments with a holographic convolutional neural network
Michael N. Pun, Andrew Ivanov, Quinn Bellamy, Zachary Montague, Colin LaMont, Philip Bradley, Jakub Otwinowski, Armita Nourmohammad
bioRxiv (2022)code

Infer global, predict local: quantity-quality trade-off in protein fitness predictions from sequence data
Lorenzo Posani, Francesca Rizzato, Rémi Monasson, Simona Cocco
bioRxiv 2022.12.12.520004

Validation of de novo designed water-soluble and transmembrane proteins by in silico folding and melting
Alvaro Martin, Carolin Berner, Sergey Ovchinnikov, Anastassia Andreevna Vorobieva
bioRxiv 2023.06.06.543955colab

PoET: A generative model of protein families as sequences-of-sequences
Timothy F. Truong Jr, Tristan Bepler
arXiv:2306.06156code

Rapid protein stability prediction using deep learning representations
Lasse M Blaabjerg, Maher M Kassem, Lydia L Good, Nicolas Jonsson, Matteo Cagiada, Kristoffer E Johansson, Wouter Boomsma, Amelie Stein, Kresten Lindorff-Larsen
eLife 12:e82593code

A general Temperature-Guided Language model to engineer enhanced Stability and Activity in Proteins
Pan Tan, Mingchen Li, Yuanxi Yu, Fan Jiang, Lirong Zheng, Banghao Wu, Xinyu Sun, Liqi Kang, Jie Song, Liang Zhang, Yi Xiong, Wanli Ouyang, Zhiqiang Hu, Guisheng Fan, Yufeng Pei, Liang Hong
arXiv:2307.12682

Transfer learning to leverage larger datasets for improved prediction of protein stability changes
Henry Dieckhaus, Michael Brocidiacono, Nicholas Randolph, Brian Kuhlman
bioRxiv 2023.07.27.550881codeSupplementary

Structure-based self-supervised learning enables ultrafast prediction of stability changes upon mutation at the protein universe scale
Jinyuan Sun, Tong Zhu, Yinglu Cui, Bian Wu
bioRxiv 2023.08.09.552725code

Boosting AND/OR-Based Computational Protein Design: Dynamic Heuristics and Generalizable UFO
Bobak Pezeshki, Radu Marinescu, Alexander Ihler, Rina Dechter
arXiv:2309.00408

Zero-shot Mutation Effect Prediction on Protein Stability and Function using RoseTTAFold
Sanaa Mansoor, Minkyung Baek, David Juergens, Joseph L. Watson, David Baker
Protein Sciencedissertation

Accurate proteome-wide missense variant effect prediction with AlphaMissense
Jun Cheng, Guido Novati, Joshua Pan, Clare Bycroft, Akvile Žemgulyte, Taylor Applebaum, Alexander Pritzel, Lai Hong Wong, Michal Zielinski, Tobias Sargeant, Rosalia G. Schneider, Andrew W. Senior, John Jumper, Demis Hassabis, Pushmeet Kohli, Žiga Avsec
Science, eadg7492 • DOI:10.1126/science.adg7492codedata

What makes the effect of protein mutations difficult to predict?
Floris Julian van der Flier, Dave Estell, Sina Pricelius, Lydia Dankmeyer, Sander van Stigt Thans, Harm Mulder, Rei Otsuka, Frits Goedegebuur, Laurens Lammerts, Diego Staphorst, Aalt D.J. van Dijk, Dick de Ridder, Henning Redestig
bioRxiv 2023.09.25.559319code

Fast, accurate ranking of engineered proteins by target binding propensity using structure modeling
Xiaozhe Ding, Xinhong Chen, Erin E. Sullivan, Timothy F. Shay, Viviana Gradinaru
bioRxiv 2023.01.11.523680/Molecular Therapy (2024)codecolab

Neural network extrapolation to distant regions of the protein fitness landscape
Sarah A Fahlberg, Chase R Freschlin, Pete Heinzelman, Philip A Romero
bioRxiv 2023.11.08.566287Supplementary

Accelerating protein engineering with fitness landscape modeling and reinforcement learning
Haoran Sun, Liang He, Pan Deng, Guoqing Liu, Haiguang Liu, Chuan Cao, Fusong Ju, Lijun Wu, Tao Qin, Tie-Yan Liu
bioRxiv 2023.11.16.565910

Protein Design by Directed Evolution Guided by Large Language Models
Trong Thanh Tran, Truong Son Hy
bioRxiv 2023.11.29.568945 • Supplementary • code

High-throughput ML-guided design of diverse single-domain antibodies against SARS-CoV-2
Christof Angermueller, Zelda Marie, Benjamin Jester, Emily Engelhart, Ryan Emerson, Babak Alipanahi, Zachary Ryan McCaw, Jim Roberts, Randolph M Lopez, David Younger, Lucy Colwell
bioRxiv 2023.12.01.569227

Efficiently Predicting Protein Stability Changes Upon Single-point Mutation with Large Language Models
Yijie Zhang, Zhangyang Gao, Cheng Tan, Stan Z. Li
arXiv preprint arXiv:2312.04019 (2023)

DSMBind: SE(3) denoising score matching for unsupervised binding energy prediction and nanobody design
Wengong Jin, Xun Chen, Amrita Vetticaden, Siranush Sarzikova, Raktima Raychowdhury, Caroline Uhler, Nir Hacohen
bioRxiv 2023.12.10.570461 • Supplementary1 • Supplementary2

Inverse folding of protein complexes with a structure-informed language model enables unsupervised antibody evolution
Varun R. Shanker, Theodora U.J. Bruun, Brian L. Hie, Peter S. Kim
bioRxiv 2023.12.19.572475

EvolMPNN: Predicting Mutational Effect on Homologous Proteins by Evolution Encoding
Zhiqiang Zhong, Davide Mottin
arXiv:2402.13418

Generating mutants of monotone affinity towards stronger protein complexes through adversarial learning
Tian Lan, Shuquan Su, Pengyao Ping, Gyorgy Hutvagner, Tao Liu, Yi Pan & Jinyan Li
Nat Mach Intell 6, 315–325 (2024) • code

Latent-based Directed Evolution accelerated by Gradient Ascent for Protein Sequence Design
Nhat Khang Ngo, Thanh V. T. Tran, Viet Thanh Duy Nguyen, Truong Son Hy
bioRxiv 2024.04.13.589381 • code

AAVDiff: Experimental Validation of Enhanced Viability and Diversity in Recombinant Adeno-Associated Virus (AAV) Capsids through Diffusion Generation
Lijun Liu, Jiali Yang, Jianfei Song, Xinglin Yang, Lele Niu, Zeqi Cai, Hui Shi, Tingjun Hou, Chang-yu Hsieh, Weiran Shen, Yafeng Deng
arXiv:2404.10573

Protein engineering with lightweight graph denoising neural networks
Bingxin Zhou, Lirong Zheng, Banghao Wu, Yang Tan, Outongyi Lv, Kai Yi, Guisheng Fan, and Liang Hong
Journal of Chemical Information and Modeling (2024) • code

VespaG: Expert-guided protein Language Models enable accurate and blazingly fast fitness prediction
Celine Marquet, Julius Schlensok, Marina Abakarova, Burkhard Rost, Elodie Laine
bioRxiv 2024.04.24.590982 • code

Interface design of SARS-CoV-2 symmetrical nsp7 dimer and machine learning-guided nsp7 sequence prediction reveals physicochemical properties and hotspots for nsp7 stability, adaptation, and therapeutic design
Amar Jeet Yadav, Shivank Kumar, Shweata Maurya, Khushboo Bhagat, and Aditya K. Padhi
Physical Chemistry Chemical Physics (2024)

Aligning protein generative models with experimental fitness via Direct Preference Optimization
Talal Widatalla, Rafael Rafailov, Brian Hie
bioRxiv 2024.05.20.595026 • code

ProBASS – a language model with sequence and structural features for predicting the effect of mutations on binding affinity
Sagara N.S. Gurusinghe, Yibing Wu, William DeGrado, Julia M. Shifman
bioRxiv 2024.06.21.600041 • code

Unsupervised evolution of protein and antibody complexes with a structure-informed language model
Varun R. Shanker, Theodora U. J. Bruun, Brian L. Hie, Peter S. Kim
Science 385, 46-53 (2024) • code

Enhancing efficiency of protein language models with minimal wet-lab data through few-shot learning
Ziyi Zhou, Liang Zhang, Yuanxi Yu, Banghao Wu, Mingchen Li, Liang Hong & Pan Tan
Nat Commun 15, 5566 (2024) • code

Rapid protein evolution by few-shot learning with a protein language model
Kaiyi Jiang, Zhaoqing Yan, Matteo Di Bernardo, Samantha R. Sgrizzi, Lukas Villiger, Alisan Kayabolen, Byungji Kim, Josephine K. Carscadden, Masahiro Hiraizumi, Hiroshi Nishimasu, Jonathan S. Gootenberg, Omar O. Abudayyeh
bioRxiv 2024.07.17.604015 • code1, code2

7.2 Protein Language Models (pLM) and representation learning

For a more detailed protein representation learning list, see:
Lirong Wu's awesome-protein-representation-learning

Unified rational protein engineering with sequence-based deep representation learning
Ethan C. Alley, Grigory Khimulya, Surojit Biswas, Mohammed AlQuraishi & George M. Church
Nature methods 16.12 (2019)

Protein Structure Representation Learning by Geometric Pretraining
Zuobai Zhang, Minghao Xu, Arian Jamasb, Vijil Chenthamarakshan, Aurelie Lozano, Payel Das, Jian Tang
arXiv • Jan 2022

Evolutionary velocity with protein language models
Brian L. Hie, Kevin K. Yang, and Peter S. Kim
bioRxiv

Advancing protein language models with linguistics: a roadmap for improved interpretability
Mai Ha Vu, Rahmad Akbar, Philippe A. Robert, Bartlomiej Swiatczak, Victor Greiff, Geir Kjetil Sandve, Dag Trygve Truslew Haug
arXiv:2207.00982

Deciphering the language of antibodies using self-supervised learning
Jinwoo Leem, Laura S. Mitchell, James H.R. Farmery, Justin Barton, Jacob D. Galson
Patterns (2022): 100513 • code

On Pre-training Language Model for Antibody
Anonymous (paper under double-blind review)
ICLR 2023 • Supplementary

Antibody Representation Learning for Drug Discovery
Lin Li, Esther Gupta, John Spaeth, Leslie Shing, Tristan Bepler, Rajmonda Sulo Caceres
arXiv:2210.02881

Learning Complete Protein Representation by Deep Coupling of Sequence and Structure
Bozhen Hu, Cheng Tan, Jun Xia, Jiangbin Zheng, Yufei Huang, Lirong Wu, Yue Liu, Yongjie Xu, Stan Z. Li
bioRxiv 2023.07.05.547769

Leveraging Ancestral Sequence Reconstruction for Protein Representation Learning
D. S. Matthews, M. A. Spence, A. C. Mater, J. Nichols, S. B. Pulsford, M. Sandhu, J. A. Kaczmarski, C. M. Miton, N. Tokuriki, C. J. Jackson
bioRxiv 2023.12.20.572683 • code

Protein language models are biased by unequal sequence sampling across the tree of life
Frances Ding, Jacob Steinhardt
bioRxiv 2024.03.07.584001

InstructPLM: Aligning Protein Language Models to Follow Protein Structure Instructions
Jiezhong Qiu, Junde Xu, Jie Hu, Hanqun Cao, Liya Hou, Zijun Gao, Xinyi Zhou, Anni Li, Xiujuan Li, Bin Cui, Fei Yang, Shuang Peng, Ning Sun, Fangyu Wang, Aimin Pan, Jie Tang, Jieping Ye, Junyang Lin, Jin Tang, Xingxu Huang, Pheng Ann Heng, Guangyong Chen
bioRxiv 2024.04.17.589642

7.3 Molecular Design Models

Unlike the function-scaffold-sequence paradigm in protein design, most deep learning-based molecular design models operate at one of three levels: atom-based, fragment-based, or reaction-based. They can be categorized as gradient optimization or optimized sampling (gradient-free). Click here for a detailed review.
To learn from a wider variety of generative models for design, the recent molecular design models recommended here may be helpful and may even be transferable to protein design. More paper lists:

  1. CondaPereira's GitHub repo: Essay_For_Molecular_Generation.
  2. AspirinCode's papers-for-molecular-design-using-DL
  3. Alex Morehead's awesome-molecular-generation

7.3.1 Gradient optimization

Differentiable scaffolding tree for molecular optimization
Fu, T., Gao, W., Xiao, C., Yasonik, J., Coley, C. W., & Sun, J.
arXiv preprint arXiv:2109.10469 • code • Sept 21

Equivariant Energy-Guided SDE for Inverse Molecular Design
Fan Bao, Min Zhao, Zhongkai Hao, Peiyao Li, Chongxuan Li, Jun Zhu
arXiv:2209.15408

Equivariant Shape-Conditioned Generation of 3D Molecules for Ligand-Based Drug Design
Keir Adams, Connor W. Coley
arXiv:2210.04893 • code

Structure-based Drug Design with Equivariant Diffusion Models
Arne Schneuing, Yuanqi Du, Charles Harris, Arian Jamasb, Ilia Igashov, Weitao Du, Tom Blundell, Pietro Lió, Carla Gomes, Max Welling, Michael Bronstein, Bruno Correia
NeurIPS 2022/arXiv:2210.13695 • code

7.3.2 Optimized sampling

Generating 3D Molecules for Target Protein Binding
Meng Liu, Youzhi Luo, Kanji Uchino, Koji Maruhashi, Shuiwang Ji
International Conference on Machine Learning 39 (2022) • GraphBP

Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets
Peng, Xingang, et al.
International Conference on Machine Learning 39 (2022) • code

Reinforced Genetic Algorithm for Structure-based Drug Design
Fu, Tianfan, et al.
arXiv preprint arXiv:2211.16508 (2022)/ICML22 • code • website

Molecule Generation For Target Protein Binding with Structural Motifs
Zhang, Zaixi, et al.
International Conference on Learning Representations 11 (2023) • code

3D Equivariant Diffusion for Target-Aware Molecule Generation and Affinity Prediction
Guan, Jiaqi, et al.
International Conference on Learning Representations 11 (2023) • code
