A tool to visualise open-source LLM tokenizers.
To launch the frontend locally, run the following commands:
cd frontend/
npm install
npm run dev
and you should be able to access the website at http://localhost:3000/trailtoken
To run the backend locally, execute:
cd backend/
pip install -r requirements.txt
python src/main.py
and you should be able to make requests to the API at http://127.0.0.1:5000/. In particular, requests to tokenize text can be made to http://127.0.0.1:5000/tokenize. The body of the request has the following structure:
{
"tokenizer_name": string,
"input_text": string
}
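As a rough sketch, the Python snippet below sends such a request using only the standard library. The README does not state the HTTP method or the response format, so the code assumes the endpoint accepts a POST request with a JSON body and replies with JSON. The helper names `build_tokenize_request` and `tokenize` are hypothetical and are not part of the trailtoken codebase.

```python
import json
import urllib.request
from typing import Any

# Local backend address, as given in this README.
BASE_URL = "http://127.0.0.1:5000"


def build_tokenize_request(
    tokenizer_name: str, input_text: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a request to /tokenize carrying the documented JSON body.

    Assumption: the endpoint expects a POST with a JSON payload.
    """
    payload = {"tokenizer_name": tokenizer_name, "input_text": input_text}
    return urllib.request.Request(
        f"{base_url}/tokenize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokenize(
    tokenizer_name: str,
    input_text: str,
    base_url: str = BASE_URL,
    timeout: float = 10.0,
) -> Any:
    """Send the request to a running backend and decode the reply.

    Assumption: the response body is JSON; its exact shape is not documented here.
    """
    request = build_tokenize_request(tokenizer_name, input_text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, a call such as `tokenize("gpt2", "Hello, world!")` would send the request. Here "gpt2" is only a placeholder; substitute a tokenizer name the backend actually supports. Building the request separately from sending it keeps the payload easy to inspect without a live server.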
Other useful frontend commands are:
npm run lint # lint the code
npm run build # build the site
npm run start # serve the built site
Check backend test coverage (from within backend/) with:
pytest --cov-report=term-missing:skip-covered --cov=src/
Inspired by Andrej Karpathy's video on tokenization and a similar tool for visualising OpenAI tokenizers.
You can cite this work using the following BibTeX entry:
@misc{trailtoken2024,
author = {Lopata, Laurynas and Macijauskas, Augustas},
title = {trailtoken: {O}pen {S}ource {LLM} {T}okenizer {V}isualisation {T}ool},
year = {2024},
howpublished = {\url{https://augustasmacijauskas.github.io/trailtoken/}},
note = {Accessed: 2024-04-18}
}