Fine-tuning the Galactica models on chemistry data from PubChem.
- Python 3.11
- conda
conda env create -n ChemLactica -f environment.yml
conda activate ChemLactica
The training script is train.py, which can be run from the command line using the following syntax:
python train.py --model_type galactica/125m --training_data_dir .small_data/train --valid_data_dir .small_data/valid --max_steps 128 --eval_steps 64 --track --eval_accumulation_steps 8
Here's what these arguments do:
--model_type <model_name>
- type of model to train, one of galactica/125m, galactica/1.3B, galactica/20B
--training_data_dir <dir>
- directory containing training data
--valid_data_dir <dir>
- directory containing validation data
--max_steps <n>
- maximum number of steps to run training for
--eval_steps <n>
- the interval at which to run evaluation
--track
- whether to track model checkpoints
--eval_accumulation_steps <n>
- the number of evaluation steps after which prediction tensors are moved from GPU to CPU (specified to avoid OOM errors)
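For reference, here is a minimal sketch of how this command-line interface could be wired up with argparse. This is a hypothetical reconstruction, not the actual train.py: the flag names match the list above, but the choices, defaults, and required-ness are assumptions.

```python
# Hypothetical sketch of train.py's argument parsing (assumed, not the real source).
import argparse

def parse_args():
    parser = argparse.ArgumentParser(
        description="Fine-tune a Galactica model on PubChem chemistry data."
    )
    parser.add_argument("--model_type", required=True,
                        choices=["galactica/125m", "galactica/1.3B", "galactica/20B"],
                        help="type of model to train")
    parser.add_argument("--training_data_dir", required=True,
                        help="directory containing training data")
    parser.add_argument("--valid_data_dir", required=True,
                        help="directory containing validation data")
    parser.add_argument("--max_steps", type=int, required=True,
                        help="maximum number of training steps")
    parser.add_argument("--eval_steps", type=int, required=True,
                        help="interval (in steps) at which to run evaluation")
    parser.add_argument("--track", action="store_true",
                        help="track model checkpoints")
    parser.add_argument("--eval_accumulation_steps", type=int, default=None,
                        help="move prediction tensors from GPU to CPU every N "
                             "evaluation steps to avoid OOM errors")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    print(args)
```

With a parser like this, python train.py --help would print the same argument summary as the list above.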
The test that runs a small model with the same architecture as Galactica on a small dataset is located at /tests/precommit_test.py and can be invoked as follows:
python -m unittest precommit_test.py
This test is also run as part of the CI pipeline on the main branch on a public GitHub runner.
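To illustrate the idea behind such a smoke test, here is a hedged sketch of what a precommit test like this might look like: it builds a tiny, randomly initialized model from the OPT architecture family (which Galactica is based on in Hugging Face transformers) and checks that a forward pass with labels produces a finite loss. The config values and test body are illustrative assumptions, not the repository's actual test.

```python
# Illustrative sketch of a precommit smoke test (assumed, not the real precommit_test.py).
import unittest
import torch
from transformers import OPTConfig, OPTForCausalLM

class PrecommitTest(unittest.TestCase):
    def test_tiny_model_forward(self):
        # Tiny config with the same architecture family as Galactica (OPT),
        # small enough to run quickly on CPU.
        config = OPTConfig(
            vocab_size=128,
            hidden_size=32,
            num_hidden_layers=2,
            num_attention_heads=2,
            ffn_dim=64,
            max_position_embeddings=64,
        )
        model = OPTForCausalLM(config)
        input_ids = torch.randint(0, config.vocab_size, (1, 16))
        outputs = model(input_ids=input_ids, labels=input_ids)
        # If the forward pass works end to end, the loss is a finite scalar.
        self.assertTrue(torch.isfinite(outputs.loss))

if __name__ == "__main__":
    unittest.main()
```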