added sharptoken as exampel

leebox20 · Mar 28, 2023 · be1f118 · be1f118
1 parent ebfdfe3
commit be1f118
Showing 1 changed file with 4 additions and 19 deletions.
diff --git a/examples/How_to_count_tokens_with_tiktoken.ipynb b/examples/How_to_count_tokens_with_tiktoken.ipynb
@@ -1,7 +1,6 @@
 {
  "cells": [
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -35,8 +34,9 @@
     "\n",
     "## Tokenizer libraries by language\n",
     "\n",
-    "For `cl100k_base` and `p50k_base` encodings, `tiktoken` is the only tokenizer available as of March 2023.\n",
+    "For `cl100k_base` and `p50k_base` encodings:\n",
     "- Python: [tiktoken](https://github.com/openai/tiktoken/blob/main/README.md)\n",
+    "- .NET / C#: [SharpToken](https://github.com/dmitry-brazhenko/SharpToken)\n",
     "\n",
     "For `r50k_base` (`gpt2`) encodings, tokenizers are available in many languages.\n",
     "- Python: [tiktoken](https://github.com/openai/tiktoken/blob/main/README.md) (or alternatively [GPT2TokenizerFast](https://huggingface.co/docs/transformers/model_doc/gpt2#transformers.GPT2TokenizerFast))\n",
@@ -54,7 +54,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -88,7 +87,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -105,7 +103,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -126,7 +123,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -143,7 +139,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -152,7 +147,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -180,7 +174,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -221,15 +214,13 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## 4. Turn tokens into text with `encoding.decode()`"
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -257,15 +248,13 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "Warning: although `.decode()` can be applied to single tokens, beware that it can be lossy for tokens that aren't on utf-8 boundaries."
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -293,15 +282,13 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "(The `b` in front of the strings indicates that the strings are byte strings.)"
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -424,7 +411,6 @@
    ]
   },
   {
-   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -549,7 +535,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "openai",
+   "display_name": "Python 3",
    "language": "python",
    "name": "python3"
   },
@@ -563,9 +549,8 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.9"
+   "version": "3.7.3"
   },
-  "orig_nbformat": 4,
   "vscode": {
    "interpreter": {
     "hash": "365536dcbde60510dc9073d6b991cd35db2d9bac356a11f5b64279a5e6708b97"