trailtoken

A tool to visualise open-source LLM tokenizers.

Usage

To launch the frontend locally, run the following commands:

cd frontend/
npm install
npm run dev

and you should be able to access the website at http://localhost:3000/trailtoken

To run the backend locally, execute:

cd backend/
pip install -r requirements.txt
python src/main.py

and you should be able to make requests to http://127.0.0.1:5000/. In particular, a request to tokenize text can be made to http://127.0.0.1:5000/tokenize. The body of the request has the following structure:

{
  "tokenizer_name": string,
  "input_text": string
}
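
As a minimal sketch of how such a request could be built from Python, the snippet below uses only the standard library. It assumes the endpoint accepts a JSON body sent with POST, which this README does not state. The helper names `build_tokenize_request` and `send_tokenize_request` are hypothetical, and the response is returned as raw text because its format is not documented here.

```python
import json
import urllib.request

# Endpoint taken from the README above.
TOKENIZE_URL = "http://127.0.0.1:5000/tokenize"


def build_tokenize_request(tokenizer_name: str, input_text: str) -> urllib.request.Request:
    """Build (but do not send) a JSON request for the /tokenize endpoint."""
    payload = {"tokenizer_name": tokenizer_name, "input_text": input_text}
    return urllib.request.Request(
        TOKENIZE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the HTTP method is not stated in the README
    )


def send_tokenize_request(tokenizer_name: str, input_text: str) -> str:
    """Send the request to a locally running backend and return the raw response text."""
    request = build_tokenize_request(tokenizer_name, input_text)
    with urllib.request.urlopen(request) as response:
        # The response schema is not documented here, so it is returned undecoded.
        return response.read().decode("utf-8")
```

With the backend running, calling `send_tokenize_request("gpt2", "Hello, world!")` would send the request; "gpt2" is only a placeholder, so substitute a tokenizer name the backend actually supports.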

Other useful frontend commands are:

npm run lint  # for linting
npm run build # build the site
npm run start # run the built site

Tests

Check backend test coverage with:

pytest --cov-report=term-missing:skip-covered --cov=src/

Acknowledgements

Inspired by Andrej Karpathy's video on tokenization and a similar tool for visualising OpenAI tokenizers.

Cite

You can cite this work using the following BibTeX entry:

@misc{trailtoken2024,
  author = {Lopata, Laurynas and Macijauskas, Augustas},
  title = {trailtoken: {O}pen {S}ource {LLM} {T}okenizer {V}isualisation {T}ool},
  year = {2024},
  howpublished = {\url{https://augustasmacijauskas.github.io/trailtoken/}},
  note = {Accessed: 2024-04-18}
}
