Deno(TypeScript) BPE Encoder Decoder for GPT-2 / GPT-3

optionsx/GPT3-Tokenizer

 
 

Deno Port of GPT-3-Encoder

BPE Encoder Decoder for GPT-2 / GPT-3

About

I needed a GPT tokenizer for a personal project. All the other projects I tried (gpt_2_3_tokenizer, clip_bpe) had an issue:
they would break when encoding the word "constructor".
So I ported a working gpt-3-encoder module from Node.js (JavaScript) to Deno (TypeScript) and reworked the internals a bit.
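
The source does not say why the other tokenizers failed on "constructor". One common cause of this kind of bug in JavaScript, shown here purely as an assumption, is using a plain object as a lookup table: every plain object inherits a `constructor` property, so the word appears to be "already cached". The names `objectCache`, `mapCache`, `inObjectCache` and `inMapCache` below are hypothetical and not taken from this repository or from the projects mentioned above.

```typescript
// Hypothetical word cache implemented two ways (illustration only).
const objectCache: Record<string, string> = {};
const mapCache = new Map<string, string>();

// Plain-object lookup: inherited properties such as "constructor" leak through.
function inObjectCache(word: string): boolean {
  return objectCache[word] !== undefined;
}

// Map lookup: only keys that were explicitly set are found.
function inMapCache(word: string): boolean {
  return mapCache.has(word);
}

console.log(inObjectCache("constructor")); // true, although nothing was stored
console.log(inMapCache("constructor")); // false
console.log(inObjectCache("heval")); // false
```

If this is indeed the failure mode, a `Map` (or an object created with `Object.create(null)`) avoids it, because neither exposes inherited keys.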

Usage

deno 1.30.2
v8 10.9.194.5
typescript 4.9.4

// `deno run --allow-read --allow-write example.ts`
import { decode, encode } from "https://deno.land/x/gpt/mod.ts";

const str = "encode: biji heval, constructor";
const encoded = encode(str);
console.log("tokenized result: ", encoded);

console.log("We can look at each token and what it represents");
for (const token of encoded) {
  console.log({ token, string: decode([token]) });
}

const decoded = decode(encoded);
console.log("decoded:", decoded);
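
A quick way to check a tokenizer against words like "constructor" is a round-trip test: encoding and then decoding should give back the original string. The sketch below is self-contained, so it uses a toy codec (`toyEncode`, `toyDecode`) as a stand-in. `roundTrips` and the toy codec are hypothetical helpers, and the signatures assume that `encode` returns an array of numbers and `decode` accepts one, as the usage example suggests.

```typescript
type Encode = (text: string) => number[];
type Decode = (tokens: number[]) => string;

// Returns true when decoding the encoded text reproduces the input exactly.
function roundTrips(text: string, encode: Encode, decode: Decode): boolean {
  return decode(encode(text)) === text;
}

// Toy stand-in codec (one token per Unicode code point), NOT the real BPE.
const toyEncode: Encode = (text) =>
  Array.from(text, (ch) => ch.codePointAt(0)!);
const toyDecode: Decode = (tokens) => String.fromCodePoint(...tokens);

// Words that collide with Object.prototype property names are useful probes.
for (const sample of ["constructor", "toString", "biji heval"]) {
  console.log(sample, roundTrips(sample, toyEncode, toyDecode));
}
```

To test the real module, pass the imported `encode` and `decode` in place of the toy functions.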
