
Cleaned Enzeptional #232

Closed · wants to merge 12 commits
Conversation

yvesnana (Contributor)

  • Cleaned Enzeptional
  • Refactored both Processing and Core files

@cla-bot added the cla-signed (CLA has been signed) label on Nov 22, 2023
@yvesnana requested a review from @drugilsberg on Nov 22, 2023 at 16:01
@drugilsberg (Contributor) left a comment:

We are getting there. Please address all remaining comments; it should be straightforward. The CI is failing on black, so make sure the styling is applied before committing.
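For reference, a typical way to run black locally before committing (the `src/` path is an assumption about the repository layout):

```bash
pip install black
black src/          # apply the formatting
black --check src/  # verify, as the CI does
```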

Comment on lines +1 to +24
<!--
MIT License

Copyright (c) 2023 GT4SD team

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
-->


We do not license README.md files.

Suggested change: remove the license header block.


Move this README.md into a dedicated examples folder where we show how to use the framework. You can get inspiration from here: https://github.com/GT4SD/gt4sd-core/tree/main/examples/regression_transformer
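For example (all paths are assumptions, mirroring the layout of the linked regression_transformer example):

```bash
mkdir -p examples/enzeptional
git mv src/gt4sd/frameworks/enzeptional/README.md examples/enzeptional/README.md
```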

Comment on lines +30 to +43
## Requirements
- Python 3.6 or higher
- PyTorch
- Hugging Face's Transformers
- TAPE (Tasks Assessing Protein Embeddings)
- NumPy
- Joblib
- Logging module
- xgboost (optional)

## Installation
Ensure all required libraries are installed. You can install them using pip:
```bash
pip install torch transformers numpy joblib xgboost
```

No need; this needs to be covered by the toolkit installer.

Suggested change: remove the Requirements and Installation sections.

Comment on lines +61 to +137
### Example Usage
```python
# NOTE: import paths are assumed from this PR's processing and core modules
from gt4sd.frameworks.enzeptional.processing import HFandTAPEModelUtility
from gt4sd.frameworks.enzeptional.core import SequenceMutator, ProteinSequenceOptimizer

# Set up model paths
language_model_path = "Rostlab/prot_bert"
tokenizer_path = "Rostlab/prot_bert"
unmasking_model_path = "Rostlab/prot_bert"
chem_model_path = "Rostlab/prot_bert"
chem_tokenizer_path = "Rostlab/prot_bert"

protein_model = HFandTAPEModelUtility(
    embedding_model_path=language_model_path,
    tokenizer_path=tokenizer_path,
)

# Mutation configuration
mutation_config = {
    "type": "language-modeling",
    "embedding_model_path": language_model_path,
    "tokenizer_path": tokenizer_path,
    "unmasking_model_path": unmasking_model_path,
}

# Define parameters
intervals = [[5, 10], [20, 25]]
batch_size = 5
top_k = 3
substrate_smiles = "CCCO"  # Replace with the actual substrate SMILES
product_smiles = "CCCO"  # Replace with the actual product SMILES

# Initialize the sequence mutator
sample_sequence = "WLSNIDMILRSPYSHTGAVLIYKQPDNNEDNIHPSSSMYFDANILIEALSKALVP"
mutator = SequenceMutator(sequence=sample_sequence, mutation_config=mutation_config)

# Initialize the protein sequence optimizer
optimizer = ProteinSequenceOptimizer(
    sequence=sample_sequence,
    protein_model=protein_model,
    substrate_smiles=substrate_smiles,
    product_smiles=product_smiles,
    chem_model_path=chem_model_path,
    chem_tokenizer_path=chem_tokenizer_path,
    mutator=mutator,
    intervals=intervals,
    batch_size=batch_size,
    top_k=top_k,
    selection_ratio=0.5,
    perform_crossover=True,
    crossover_type="single_point",
    concat_order=["substrate", "sequence", "product"],
)

# Run the optimization
optimized_sequences, iteration_info = optimizer.optimize(
    num_iterations=5,
    num_sequences=50,
    num_mutations=5,
    time_budget=3600,
)

# Output results
for i in optimized_sequences:
    seq = i["sequence"]
    score = i["score"]
    print(f"Sequence: {seq}, Score: {score}")

print(iteration_info)
```

## Customization
- Modify `intervals` to specify mutation regions in the sequence.
- Adjust `batch_size`, `top_k`, `selection_ratio`, and `crossover_type` for different optimization strategies; see the sketch after this list.
- Change `concat_order` to alter the order of sequence, substrate, and product in the final embedding for scoring.
- Use `time_budget` to set a maximum time limit for each optimization iteration.
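
As referenced above, a sketch of a variant configuration exercising these knobs (names reused from the example; the `two_point` crossover value and all parameter values are illustrative assumptions):

```python
optimizer = ProteinSequenceOptimizer(
    sequence=sample_sequence,
    protein_model=protein_model,
    substrate_smiles=substrate_smiles,
    product_smiles=product_smiles,
    chem_model_path=chem_model_path,
    chem_tokenizer_path=chem_tokenizer_path,
    mutator=mutator,
    intervals=[[0, 15]],         # restrict mutations to the first 15 residues
    batch_size=10,
    top_k=5,
    selection_ratio=0.25,        # keep only the top quarter of candidates
    perform_crossover=True,
    crossover_type="two_point",  # assumption: supported alongside "single_point"
    concat_order=["sequence", "substrate", "product"],
)
```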

## Notes
- Ensure the paths to the models and tokenizers are correctly set.
- The script is designed for flexibility and can be adapted to different models and optimization strategies.
- For extensive usage, consider parallelizing or distributing the computation, especially for large-scale optimizations; a sketch follows this list.
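
On the parallelization note, a minimal joblib sketch (joblib is already in the requirements above; `run_one_optimization` is a hypothetical wrapper around the example):

```python
from joblib import Parallel, delayed

def run_one_optimization(seed: int):
    # Hypothetical wrapper: build the mutator and optimizer as in the
    # example above, then return optimizer.optimize(...) for this seed.
    ...

# Run four independent optimizations in parallel processes.
results = Parallel(n_jobs=4)(delayed(run_one_optimization)(s) for s in range(4))
```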


Consider moving the Python snippet into a dedicated example after moving this.

Comment on lines +135 to +137
```python
sequence (str): The original sequence to be mutated.
num_mutations (int): The number of mutations to introduce.
intervals (List[List[int]]): Intervals within the sequence
```

Do not type arguments in docstrings, only return types; follow the library standard.
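
A sketch of the corrected form for the snippet above (the function name and signature are hypothetical; the convention shown, argument types only in the signature plus a typed Returns block, is assumed to be the library standard):

```python
from typing import List

def mutate_sequence(
    sequence: str, num_mutations: int, intervals: List[List[int]]
) -> str:
    """Mutates a sequence within the given intervals.

    Args:
        sequence: the original sequence to be mutated.
        num_mutations: the number of mutations to introduce.
        intervals: intervals within the sequence eligible for mutation.

    Returns:
        str: the mutated sequence.
    """
    ...
```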

Comment on lines +88 to +102
```python
def get_device(device: Optional[Union[torch.device, str]] = None) -> torch.device:
    """
    Determines the appropriate torch device for computations.

    Args:
        device (Optional[Union[torch.device, str]]): The desired device
            (e.g., 'cpu' or 'cuda:0'). If None, automatically selects the device.

    Returns:
        torch.device: The determined torch device for computations.
    """
    return torch.device(
        "cuda:0" if torch.cuda.is_available() and device != "cpu" else "cpu"
    )
```
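
For reference, hypothetical usage of the helper above, illustrating its fallback logic:

```python
import torch

device = get_device()         # cuda:0 when a GPU is available, cpu otherwise
cpu_only = get_device("cpu")  # always cpu, even when a GPU is present
tensor = torch.zeros(3).to(device)
```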

```
@@ -84,6 +84,9 @@ gt4sd =
    training_pipelines/tests/*json
```

We need to add all enzeptional dependencies to install_requires without version pins, and update the requirements files with the corresponding versions, to ensure the needed packages are installed.
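
A sketch of what that could look like, with dependency names taken from the README's requirements list (the exact setup.cfg layout and the `tape-proteins` package name are assumptions); the requirements file would then pin each entry to a concrete version:

```ini
# setup.cfg (sketch): unpinned enzeptional dependencies
[options]
install_requires =
    joblib
    numpy
    tape-proteins
    torch
    transformers
    xgboost
```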

@yvesnana closed this on Mar 4, 2024
Labels: cla-signed (CLA has been signed)

3 participants