ProtoCode: Large Language Model pipeline for enhanced human and machine interpretable protocol resource construction

ProtoCode can standardize protocols by reading either input text or a literature URL. To initiate protocol standardization in ProtoCode, users submit the URL of the literature describing the protocol in natural text and choose the protocol type for extraction from a dropdown menu. Upon submission, ProtoCode first performs a screening analysis to identify the text region corresponding to the protocol. Users can correct any mis-annotation by highlighting the region of interest. Next, ProtoCode performs data extraction with a protocol-specific fine-tuned LLM. The extracted data, in JSONL format, can be converted into standardized natural-language text. Moreover, if the protocol data contains information on equipment settings and/or programs, users can select outputs for operating experimental equipment.

ProtoCode embraces a full-stack framework with four layers:

Data source layer: This layer collects data from different sources.
Protocol Extraction layer: This layer retrieves the relevant content from article URLs, which is particularly helpful for reducing the number of input tokens passed to the model.
LLM layer: This layer processes the input protocols, ensuring the model’s relevance and accuracy through fine-tuning and cross-validation.
Application layer: This layer covers practical applications, highlighting the potential of ProtoCode across different functions.
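To make the idea of "reducing the input tokens" concrete, here is a minimal, hypothetical sketch of keyword-based screening: only paragraphs that mention enough protocol keywords are kept before text is sent to a model. The function name `screen_paragraphs`, the `min_hits` threshold, and the keyword list are illustrative assumptions, not ProtoCode's actual screening algorithm.

```python
import re


def screen_paragraphs(text: str, keywords: list, min_hits: int = 1) -> str:
    """Keep only paragraphs that mention at least `min_hits` of the keywords.

    Hypothetical illustration of a screening step that shrinks the text
    passed to an LLM; this is NOT ProtoCode's actual algorithm.
    """
    # Split on blank lines to get paragraphs, dropping empty fragments.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    lowered = [k.lower() for k in keywords]
    kept = []
    for para in paragraphs:
        # Count how many distinct keywords occur (case-insensitive substring match).
        hits = sum(1 for k in lowered if k in para.lower())
        if hits >= min_hits:
            kept.append(para)
    return "\n\n".join(kept)


if __name__ == "__main__":
    article = (
        "Introduction. PCR is widely used.\n\n"
        "Methods. Samples were amplified with primers at 95 C for 30 s, "
        "then annealed at 60 C.\n\n"
        "Discussion. Our results suggest further work."
    )
    # Hypothetical keyword list for a PCR-style protocol.
    print(screen_paragraphs(article, ["primers", "annealed", "amplified"], min_hits=2))
```

Only the Methods paragraph survives here, so the downstream model sees a fraction of the original text. Raising `min_hits` or using more keywords trades recall for a smaller input.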

ProtoCode Framework: Large Language Model for enhanced human and machine interpretable protocol resource construction

Installation

Ensure you have the required dependencies installed by running:

pip install -r requirements.txt

Make sure you have Conda installed on your system before proceeding. If you don't have Conda installed, you can download it from the official Conda website: https://docs.conda.io/en/latest/miniconda.html

conda env create -f environment.yml
conda activate name_of_your_environment

Part I: Content_Extraction

ProtoCode_Content_Extraction extracts content from a URL and saves it.

Content Configuration

Select the full-text bioRxiv URL for the paper.
Paste the URL into the url field under [input_link].
The result is saved to the path given by out_path under [output_path].
The number of keywords selected affects the quality of the extracted protocol.

* File Name: content_config

content_config is the config file for paper extraction.

Required setting

URL for the article

[input_link]
url =

Optional setting

Output path - output_path
Number of keywords - num_keywords (default: 4)

[output_path]
out_path = ./protocols/protocol.csv

[num_keywords]
num_keywords =
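The settings above can be read with Python's standard `configparser`. The sketch below is an assumption about how a script might consume `content_config.ini`, not the notebook's own code; the helper name `load_content_config` is hypothetical, and filling blank fields with `./protocols/protocol.csv` and 4 mirrors the values documented here.

```python
import configparser


def load_content_config(path: str = "content_config.ini") -> dict:
    """Read content_config.ini using the sections and keys shown above.

    Hypothetical helper. Blank fields fall back to the documented defaults
    (out_path ./protocols/protocol.csv, num_keywords 4); that fallback
    behaviour is an assumption about how the notebook treats empty values.
    """
    # interpolation=None so URLs containing '%' (e.g. %20) are read verbatim.
    config = configparser.ConfigParser(interpolation=None)
    if not config.read(path):
        raise FileNotFoundError(path)

    url = config.get("input_link", "url", fallback="").strip()
    if not url:
        raise ValueError("[input_link] url is required")

    out_path = config.get("output_path", "out_path", fallback="").strip()
    raw_num = config.get("num_keywords", "num_keywords", fallback="").strip()
    return {
        "url": url,
        "out_path": out_path or "./protocols/protocol.csv",
        "num_keywords": int(raw_num) if raw_num else 4,
    }


if __name__ == "__main__":
    print(load_content_config())
```

Disabling interpolation is a deliberate choice: the default `BasicInterpolation` treats `%` as special and would raise on percent-encoded URLs.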

Part II: Application

ProtoCode_Application provides the following functions:

Read extracted content or custom input
Obtain output from the fine-tuned model
Convert the output to robot language

Application Configuration

Use the extracted protocol as input via input_path under [input_file].
If its quality is not satisfactory, extract the content manually and paste it into the content field under [input_content].
Set model_num under [model_selection] to a number from 1 to 5 to select one of the five fine-tuned models.
The result is saved to the directory given by output_path under [output_path].

* File Name: application_config

application_config is the config file for robot-language conversion.

Required setting:

Use extracted content - input_file
Or customized content - input_content
OpenAI API key - openai_key

[input_file]
input_path = ./protocols/protocol.csv

[input_content]
content =

[openai_key]
key =

Optional setting:

Choice of fine-tuned model - model_selection
Output path - output_path

[model_selection]
model_num = 1

[output_path]
output_path = ./output_language/
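A minimal sketch of reading `application_config.ini` and validating it against the rules above. The helper name `load_application_config` is hypothetical, and so is the choice that non-empty `[input_content]` takes precedence over `[input_file]`; the README only says the pasted content is the fallback when extraction quality is poor.

```python
import configparser


def load_application_config(path: str = "application_config.ini") -> dict:
    """Read and validate application_config.ini (hypothetical helper)."""
    config = configparser.ConfigParser(interpolation=None)
    if not config.read(path):
        raise FileNotFoundError(path)

    content = config.get("input_content", "content", fallback="").strip()
    input_path = config.get("input_file", "input_path", fallback="").strip()
    # Assumption: manually pasted content overrides the extracted protocol file.
    if content:
        source = ("content", content)
    elif input_path:
        source = ("file", input_path)
    else:
        raise ValueError("set [input_file] input_path or [input_content] content")

    key = config.get("openai_key", "key", fallback="").strip()
    if not key:
        raise ValueError("[openai_key] key is required")

    # Blank model_num falls back to 1, the value shown in the sample config.
    raw_num = config.get("model_selection", "model_num", fallback="").strip()
    model_num = int(raw_num) if raw_num else 1
    if not 1 <= model_num <= 5:
        raise ValueError("model_num must be between 1 and 5")

    output_path = config.get("output_path", "output_path", fallback="").strip()
    return {
        "source": source,
        "key": key,
        "model_num": model_num,
        "output_path": output_path or "./output_language/",
    }
```

Failing early on a missing key or an out-of-range `model_num` surfaces configuration mistakes before any API call is made.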

Error.log

This file records any errors that occur during any step of the pipeline.
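One way to get this behaviour with Python's standard `logging` module is sketched below. It is an assumption, not the notebooks' actual code: the logger name `protocode`, the format string, and the `run_step` wrapper are hypothetical.

```python
import logging

# Hypothetical setup: append error records to error.log in the working directory.
logger = logging.getLogger("protocode")
handler = logging.FileHandler("error.log")  # default mode "a" appends
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s [%(name)s] %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.ERROR)


def run_step(name, func, *args, **kwargs):
    """Run one pipeline step; log any exception with its traceback, then re-raise."""
    try:
        return func(*args, **kwargs)
    except Exception:
        # logger.exception logs at ERROR level and includes the traceback.
        logger.exception("step %s failed", name)
        raise
```

Re-raising after logging keeps the failure visible in the notebook while still leaving a persistent record in error.log.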

Citing ProtoCode

@article{,
  title={},
  author={Shuo Jiang and Daniel Evans-Yamamoto and Dennis Bersenev and Sucheendra K. Palaniappan and Ayako Yachie-Kinoshita},
  journal={},
  year={2023}
}
