AugustasMacijauskas/trailtoken

trailtoken

A tool to visualise open-source LLM tokenizers.

Usage

To launch the frontend locally, run the following commands:

cd frontend/
npm install
npm run dev

and you should be able to access the website at http://localhost:3000/trailtoken.

To run the backend locally, execute:

cd backend/
pip install -r requirements.txt
python src/main.py

and you should be able to make requests to http://127.0.0.1:5000/. In particular, text can be tokenized by sending a request to http://127.0.0.1:5000/tokenize. The request body has the following structure:

{
  "tokenizer_name": string,
  "input_text": string
}
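As a rough illustration, a client for the tokenize endpoint could look like the sketch below. It uses only the Python standard library and assumes that the endpoint accepts a POST request with a JSON body. This README documents the body's structure but not the HTTP method or the shape of the response, so both are assumptions here. The helper names `make_tokenize_request` and `tokenize` are hypothetical and not part of the repository.

```python
import json
import urllib.request

# Default address of the locally running backend, as given in this README.
BASE_URL = "http://127.0.0.1:5000"


def make_tokenize_request(tokenizer_name: str, input_text: str,
                          base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a request for the /tokenize endpoint (hypothetical helper).

    Assumption: the endpoint expects POST with a JSON body containing
    "tokenizer_name" and "input_text", both strings.
    """
    if not isinstance(tokenizer_name, str) or not isinstance(input_text, str):
        raise TypeError("tokenizer_name and input_text must both be strings")
    body = {"tokenizer_name": tokenizer_name, "input_text": input_text}
    return urllib.request.Request(
        url=f"{base_url}/tokenize",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokenize(tokenizer_name: str, input_text: str, base_url: str = BASE_URL):
    """Send the request and decode the JSON reply.

    The structure of the reply is not documented here, so it is returned as-is.
    """
    request = make_tokenize_request(tokenizer_name, input_text, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the backend running, a call such as `tokenize("gpt2", "Hello, world!")` would send the request. Here `"gpt2"` is only a placeholder, so substitute a tokenizer name the backend actually supports. Building the request separately from sending it keeps the payload easy to inspect without a live server.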

Other useful frontend commands (run from frontend/) are:

npm run lint  # lint the code
npm run build # build the site
npm run start # serve the built site

Tests

Check backend test coverage by running the following from the backend/ directory:

pytest --cov-report=term-missing:skip-covered --cov=src/

Acknowledgements

Inspired by Andrej Karpathy's video on tokenization and a similar tool for visualising OpenAI tokenizers.

Cite

You can cite this work using the following BibTeX entry:

@misc{trailtoken2024,
  author = {Lopata, Laurynas and Macijauskas, Augustas},
  title = {trailtoken: {O}pen {S}ource {LLM} {T}okenizer {V}isualisation {T}ool},
  year = {2024},
  howpublished = {\url{https://augustasmacijauskas.github.io/trailtoken/}},
  note = {Accessed: 2024-04-18}
}

