
⚗️ MolGen

Domain-Agnostic Molecular Generation with Self-feedback

📃 Paper • 🤗 Model • 🔬 Space

🔔 News

2023-6 We open-source KnowLM, a knowledgeable LLM framework with pre-training and instruction fine-tuning code (supporting multi-machine, multi-GPU setups).
2023-6 We release Mol-Instructions, a large-scale biomolecule instruction dataset for large language models.
2023-5 We propose Knowledge graph-enhanced molecular contrAstive learning with fuNctional prOmpt (KANO), published in Nature Machine Intelligence, which exploits fundamental domain knowledge in both pre-training and fine-tuning.
2023-4 We provide an NLP for Science paper list at https://github.com/zjunlp/NLP4Science_Papers.
2023-3 We release our pre-trained and fine-tuned models on 🤗 Hugging Face: MolGen-large and MolGen-large-opt.
2023-2 We provide a demo on 🤗 Hugging Face at Space.

📕 Requirements

To run the code, configure the dependencies by recreating our conda environment (replace `your_env_name` with a name of your choice):

```
conda env create -f MolGen/environment.yml -n your_env_name
```

and then activate it:

```
conda activate your_env_name
```

📚 Resource Download

You can download the pre-trained model via link1 and the fine-tuned models via link2.

Moreover, the dataset used for downstream tasks can be found here.

The expected file structure is:

```
moldata
├── checkpoint
│   ├── molgen.pkl              # pre-trained model
│   ├── syn_qed_model.pkl       # fine-tuned model for QED optimization on synthetic data
│   ├── syn_plogp_model.pkl     # fine-tuned model for p-logP optimization on synthetic data
│   ├── np_qed_model.pkl        # fine-tuned model for QED optimization on natural product data
│   └── np_plogp_model.pkl      # fine-tuned model for p-logP optimization on natural product data
├── finetune
│   ├── np_test.csv             # natural product test data
│   ├── np_train.csv            # natural product train data
│   ├── plogp_test.csv          # synthetic test data for p-logP optimization
│   ├── qed_test.csv            # synthetic test data for QED optimization
│   └── zinc250k.csv            # synthetic train data
├── generate                    # generate molecules
├── output                      # molecule candidates
└── vocab_list
    └── zinc.npy                # SELFIES alphabet
```
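Before running anything, it can help to confirm that the downloads landed where the scripts expect them. The following is a minimal sketch, assuming the layout shown above; `check_moldata` is a hypothetical helper, not a script shipped with MolGen, and it treats `generate` and `output` as directories.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which files from the expected moldata/ tree are missing.
# The file list mirrors the tree above; adjust it if you only need some checkpoints.

check_moldata() {
  local root="${1:-moldata}"
  local missing=0
  local f d
  for f in \
    checkpoint/molgen.pkl \
    checkpoint/syn_qed_model.pkl \
    checkpoint/syn_plogp_model.pkl \
    checkpoint/np_qed_model.pkl \
    checkpoint/np_plogp_model.pkl \
    finetune/np_test.csv \
    finetune/np_train.csv \
    finetune/plogp_test.csv \
    finetune/qed_test.csv \
    finetune/zinc250k.csv \
    vocab_list/zinc.npy; do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $root/$f"
      missing=$((missing + 1))
    fi
  done
  # Assumption: generate/ and output/ are directories that only need to exist.
  for d in generate output; do
    if [ ! -d "$root/$d" ]; then
      echo "missing dir: $root/$d"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ] && echo "moldata layout OK"
  # Exit status is the number of missing entries (0 means everything is in place).
  return "$missing"
}
```

For example, `check_moldata moldata` run from the repository root prints `moldata layout OK` when every entry is present, and one `missing:` line per absent entry otherwise.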

🚀 How to run

Fine-tune
- First, preprocess the fine-tuning dataset by generating candidate molecules with our pre-trained model. The preprocessed data will be stored in the `output` folder.
```
    cd MolGen
    bash preprocess.sh
```
- Then fine-tune the model with the self-feedback paradigm. The fine-tuned model will be stored in the `checkpoint` folder.
```
    bash finetune.sh
```
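The two fine-tuning stages must run in order, since fine-tuning consumes the candidates produced by preprocessing. A minimal sketch of chaining them so the second stage is skipped if the first fails, assuming both scripts live in the `MolGen/` directory as the commands above suggest; `run_finetune` is a hypothetical wrapper, not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run preprocess.sh, then finetune.sh, stopping on the first failure.
# It only calls the two scripts named in this README and assumes they sit in MolGen/.

run_finetune() {
  local molgen_dir="${1:-MolGen}"
  # Run in a subshell so the cd does not change the caller's working directory.
  (
    cd "$molgen_dir" || exit 1
    echo "[1/2] preprocessing: generating candidate molecules"
    bash preprocess.sh || { echo "preprocess.sh failed" >&2; exit 1; }
    echo "[2/2] fine-tuning with self-feedback"
    bash finetune.sh || { echo "finetune.sh failed" >&2; exit 1; }
  )
}
```

Calling `run_finetune MolGen` from the repository root is equivalent to the two steps above, except that a failed preprocessing run no longer proceeds to fine-tuning on stale or missing candidates.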
Generate

To generate molecules, run the script below. Set `checkpoint_path` to choose between the pre-trained model and one of the fine-tuned models.
```
cd MolGen
bash generate.sh
```
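Since `checkpoint_path` selects which model generates molecules, a small lookup from a short name to the checkpoint files listed under `moldata/checkpoint` can reduce path typos. This is a minimal sketch: `pick_checkpoint` and its short names are hypothetical, and how `generate.sh` actually consumes `checkpoint_path` should be checked in the script itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a short model name to a checkpoint file from moldata/checkpoint.
# The file names come from the expected file structure; the short names are illustrative.

pick_checkpoint() {
  local name="$1" root="${2:-moldata/checkpoint}"
  case "$name" in
    pretrained) echo "$root/molgen.pkl" ;;
    syn_qed)    echo "$root/syn_qed_model.pkl" ;;
    syn_plogp)  echo "$root/syn_plogp_model.pkl" ;;
    np_qed)     echo "$root/np_qed_model.pkl" ;;
    np_plogp)   echo "$root/np_plogp_model.pkl" ;;
    *) echo "unknown model: $name" >&2; return 1 ;;
  esac
}
```

For instance, from inside `MolGen/`, `pick_checkpoint syn_qed ../moldata/checkpoint` prints `../moldata/checkpoint/syn_qed_model.pkl`, which can then be used as the `checkpoint_path` value.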

🥽 Experiments

We conduct experiments on well-known benchmarks to confirm MolGen's optimization capabilities, encompassing penalized logP, QED, and molecular docking properties. For detailed experimental settings and analysis, please refer to our paper.

Distribution Learning

Targeted Molecule Discovery

Constrained Molecular Optimization

Citation

If you use or extend our work, please cite the paper as follows:

```
@article{fang2023molecular,
  title={Molecular Language Model as Multi-task Generator},
  author={Fang, Yin and Zhang, Ningyu and Chen, Zhuo and Fan, Xiaohui and Chen, Huajun},
  journal={arXiv preprint arXiv:2301.11259},
  year={2023}
}
```
