Commit

added sharptoken as example
dmitry-brazhenko committed Mar 28, 2023
1 parent ebfdfe3 commit be1f118
Showing 1 changed file with 4 additions and 19 deletions.
23 changes: 4 additions & 19 deletions examples/How_to_count_tokens_with_tiktoken.ipynb
@@ -1,7 +1,6 @@
{
"cells": [
{
- "attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -35,8 +34,9 @@
"\n",
"## Tokenizer libraries by language\n",
"\n",
- "For `cl100k_base` and `p50k_base` encodings, `tiktoken` is the only tokenizer available as of March 2023.\n",
+ "For `cl100k_base` and `p50k_base` encodings:\n",
"- Python: [tiktoken](https://github.com/openai/tiktoken/blob/main/README.md)\n",
+ "- .NET / C#: [SharpToken](https://github.com/dmitry-brazhenko/SharpToken)\n",
"\n",
"For `r50k_base` (`gpt2`) encodings, tokenizers are available in many languages.\n",
"- Python: [tiktoken](https://github.com/openai/tiktoken/blob/main/README.md) (or alternatively [GPT2TokenizerFast](https://huggingface.co/docs/transformers/model_doc/gpt2#transformers.GPT2TokenizerFast))\n",
@@ -54,7 +54,6 @@
]
},
{
- "attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -88,7 +87,6 @@
]
},
{
- "attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -105,7 +103,6 @@
]
},
{
- "attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -126,7 +123,6 @@
]
},
{
- "attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -143,7 +139,6 @@
]
},
{
- "attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -152,7 +147,6 @@
]
},
{
- "attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -180,7 +174,6 @@
]
},
{
- "attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -221,15 +214,13 @@
]
},
{
- "attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Turn tokens into text with `encoding.decode()`"
]
},
{
- "attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -257,15 +248,13 @@
]
},
{
- "attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Warning: although `.decode()` can be applied to single tokens, beware that it can be lossy for tokens that aren't on utf-8 boundaries."
]
},
{
- "attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -293,15 +282,13 @@
]
},
{
- "attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"(The `b` in front of the strings indicates that the strings are byte strings.)"
]
},
{
- "attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -424,7 +411,6 @@
]
},
{
- "attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
@@ -549,7 +535,7 @@
],
"metadata": {
"kernelspec": {
- "display_name": "openai",
+ "display_name": "Python 3",
"language": "python",
"name": "python3"
},
@@ -563,9 +549,8 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.9.9"
+ "version": "3.7.3"
},
- "orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"
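
Note on the notebook's warning that `.decode()` can be lossy for tokens that aren't on UTF-8 boundaries: the effect can be reproduced with plain Python bytes, without any tokenizer library. The sketch below is illustrative only. The two-way byte split stands in for a hypothetical pair of tokens and is not taken from any real encoding; the `errors="replace"` mode is assumed here to mirror how a lenient decoder would treat an incomplete byte sequence.

```python
# Illustration only: raw UTF-8 bytes, no tokenizer involved.
text = "café"
raw = text.encode("utf-8")  # b'caf\xc3\xa9' ("é" takes two bytes)

# Hypothetical split: pretend a tokenizer cut the two-byte "é" in half.
token_bytes = [raw[:4], raw[4:]]  # [b'caf\xc3', b'\xa9']

# Decoding each piece on its own replaces every incomplete sequence
# with U+FFFD (the replacement character), so the "é" is lost.
piecewise = "".join(b.decode("utf-8", errors="replace") for b in token_bytes)

# Joining the bytes first and decoding once recovers the original text.
joined = b"".join(token_bytes).decode("utf-8", errors="replace")

print(repr(piecewise))
print(repr(joined))
```

The practical takeaway matches the notebook text: decode a whole token sequence at once, and work with bytes rather than strings when individual tokens have to be inspected.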
