ChemLactica

Table of contents

  • Description
  • Prerequisites
  • Installation
  • Usage
  • Tests

Description

Fine-tuning the Galactica and Gemma models on chemistry data from PubChem so that they operate on SMILES strings. The fine-tuned models integrate into a molecular optimization algorithm.
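The base Galactica checkpoints are published on the Hugging Face Hub (e.g. facebook/galactica-125m). As a minimal sketch of the starting point for fine-tuning (not this repository's actual loading code, which may differ):

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the public 125M-parameter Galactica checkpoint as the base model.
tokenizer = AutoTokenizer.from_pretrained("facebook/galactica-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/galactica-125m")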

Prerequisites

  • Python 3.11
  • conda

Installation

conda env create -n chemlactica -f environment.yml
conda activate chemlactica
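After activation, you can confirm that the environment provides the expected interpreter (Python 3.11, per the prerequisites above):

python --version  # should print Python 3.11.x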

Usage

Training

The script for training the model is train.py, which can be run from the command line with the following syntax:

python train.py --model_type galactica/125m --training_data_dir .small_data/train --valid_data_dir .small_data/valid --max_steps 128 --eval_steps 64 --track --eval_accumulation_steps 8

Here's what these arguments do:

  • --model_type <model_name> - type of model to train, one of galactica/125m, galactica/1.3B, galactica/20B
  • --training_data_dir - directory containing training data
  • --valid_data_dir - directory containing validation data
  • --max_steps - maximum number of steps to run training for
  • --eval_steps - the interval at which to run evaluation
  • --track - whether or not to track model checkpoints
  • --eval_accumulation_steps - the number of evaluation steps after which prediction tensors are moved from the GPU to the CPU (specified to avoid OOM errors)
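For reference, here is a minimal sketch of how these flags could be declared with argparse. The names mirror the list above, but the actual definitions in train.py may differ:

import argparse

parser = argparse.ArgumentParser(description="Fine-tune a Galactica model on chemistry data.")
parser.add_argument("--model_type", choices=["galactica/125m", "galactica/1.3B", "galactica/20B"],
                    help="type of model to train")
parser.add_argument("--training_data_dir", help="directory containing training data")
parser.add_argument("--valid_data_dir", help="directory containing validation data")
parser.add_argument("--max_steps", type=int, help="maximum number of training steps")
parser.add_argument("--eval_steps", type=int, help="interval at which to run evaluation")
parser.add_argument("--track", action="store_true", help="track model checkpoints")
parser.add_argument("--eval_accumulation_steps", type=int,
                    help="move prediction tensors from GPU to CPU every N evaluation steps")
args = parser.parse_args()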

Tests

The test that runs a small model with the same architecture as Galactica on a small dataset is located at /tests/precommit_test.py and can be run (from the tests directory) as follows:

python -m unittest precommit_test.py

This test is also run as part of the CI pipeline on the main branch, on a public GitHub runner.
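The actual test lives in /tests/precommit_test.py; as a rough sketch, a precommit test of this kind could simply drive the documented training CLI on the small dataset and assert a clean exit (the class and method names here are illustrative, and the flag values are taken from the Usage section above):

import subprocess
import unittest

class PrecommitTest(unittest.TestCase):
    def test_small_model_trains(self):
        # Run the documented training CLI on the small dataset
        # for a handful of steps and check for a clean exit.
        result = subprocess.run(
            [
                "python", "train.py",
                "--model_type", "galactica/125m",
                "--training_data_dir", ".small_data/train",
                "--valid_data_dir", ".small_data/valid",
                "--max_steps", "128",
                "--eval_steps", "64",
                "--eval_accumulation_steps", "8",
            ],
            capture_output=True, text=True,
        )
        self.assertEqual(result.returncode, 0, msg=result.stderr)

if __name__ == "__main__":
    unittest.main()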
