tiktoken is a BPE tokeniser for use with OpenAI's models, forked from the original tiktoken library to provide NPM bindings for Node and other JS runtimes.
import assert from "node:assert";
import { get_encoding, encoding_for_model } from "@dqbd/tiktoken";
const enc = get_encoding("gpt2");
assert(
new TextDecoder().decode(enc.decode(enc.encode("hello world"))) ===
"hello world"
);
// To get the tokeniser corresponding to a specific model in the OpenAI API:
const enc = encoding_for_model("text-davinci-003");
The open source version of tiktoken
can be installed from PyPI:
npm install @dqbd/tiktoken