Unknown encoding gpt2 #301

Closed
aryagxr opened this issue May 22, 2024 · 1 comment

aryagxr commented May 22, 2024

This is the code I'm trying to run:

tokenizer = tiktoken.get_encoding("gpt2")

and this is the error I get:

{
	"name": "ValueError",
	"message": "Unknown encoding gpt2. Plugins found: ['tiktoken_ext.openai_public']",
	"stack": "---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[21], line 4
      1 import tiktoken
----> 4 tokenizer = tiktoken.get_encoding(\"gpt2\")

File ~/anaconda3/envs/ml-dl/lib/python3.12/site-packages/tiktoken/registry.py:68, in get_encoding(encoding_name)
     65     assert ENCODING_CONSTRUCTORS is not None
     67 if encoding_name not in ENCODING_CONSTRUCTORS:
---> 68     raise ValueError(
     69         f\"Unknown encoding {encoding_name}. Plugins found: {_available_plugin_modules()}\"
     70     )
     72 constructor = ENCODING_CONSTRUCTORS[encoding_name]
     73 enc = Encoding(**constructor())

ValueError: Unknown encoding gpt2. Plugins found: ['tiktoken_ext.openai_public']"
}

From issue #51, here is the full log:

Python 3.12.3
Linux-6.5.0-25-generic-x86_64-with-glibc2.35
Requirement already satisfied: wheel in ./env/lib/python3.12/site-packages (0.43.0)
Requirement already satisfied: tiktoken in ./env/lib/python3.12/site-packages (0.7.0)
Requirement already satisfied: regex>=2022.1.18 in ./env/lib/python3.12/site-packages (from tiktoken) (2024.5.15)
Requirement already satisfied: requests>=2.26.0 in ./env/lib/python3.12/site-packages (from tiktoken) (2.32.2)
Requirement already satisfied: charset-normalizer<4,>=2 in ./env/lib/python3.12/site-packages (from requests>=2.26.0->tiktoken) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in ./env/lib/python3.12/site-packages (from requests>=2.26.0->tiktoken) (3.7)
Requirement already satisfied: urllib3<3,>=1.21.1 in ./env/lib/python3.12/site-packages (from requests>=2.26.0->tiktoken) (2.2.1)
Requirement already satisfied: certifi>=2017.4.17 in ./env/lib/python3.12/site-packages (from requests>=2.26.0->tiktoken) (2024.2.2)
<Encoding 'gpt2'>
['wheel-0.43.0.dist-info', 'pip-24.0.dist-info', 'certifi-2024.2.2.dist-info', 'urllib3', 'tiktoken_ext', 'idna', 'regex-2024.5.15.dist-info', 'urllib3-2.2.1.dist-info', 'wheel', 'tiktoken-0.7.0.dist-info', 'requests-2.32.2.dist-info', 'idna-3.7.dist-info', 'requests', 'regex', 'certifi', 'tiktoken', 'pip', 'charset_normalizer-3.3.2.dist-info', 'charset_normalizer']
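
One detail worth noting in the output above: the traceback points at ~/anaconda3/envs/ml-dl/lib/python3.12/site-packages, while the pip log resolves packages in ./env/lib/python3.12/site-packages, so the two runs may not be using the same environment. Below is a minimal sketch for checking this from inside the notebook, using only the standard library. The helper name where_is is made up for illustration.

```python
import importlib.util
import sys


def where_is(package_name):
    """Report which interpreter is running and where a package would load from."""
    try:
        spec = importlib.util.find_spec(package_name)
    except (ImportError, ValueError):
        # Raised when a parent package is missing or a module has a broken spec
        spec = None
    return {
        "interpreter": sys.executable,
        "prefix": sys.prefix,
        # origin is None if the package is missing or is a namespace package
        "origin": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for name in ("tiktoken", "tiktoken_ext.openai_public"):
        print(name, where_is(name))
```

If the interpreter or origin printed in the notebook differs from the environment where pip reported tiktoken 0.7.0, the notebook kernel is probably not the environment that was tested.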

Here is what I tried (solutions from #63):

  • Clearing the cache:
import os
import shutil
import tempfile

# Set the environment variable to prevent caching
os.environ['TIKTOKEN_CACHE_DIR'] = ""

# Identify the cache directory
cache_dir = os.path.join(tempfile.gettempdir(), "data-gym-cache")
print(f"Cache directory: {cache_dir}")

# Check if the cache directory exists and clear it if it does
if os.path.exists(cache_dir):
    print("Cache directory found. Attempting to clear it...")
    shutil.rmtree(cache_dir)
    print("Cache directory cleared.")
else:
    print("Cache directory does not exist.")
  • Reinstalling tiktoken

Any ideas on how I could fix this error?
Thanks in advance!

aryagxr changed the title from "Unknown encoding gpt-2" to "Unknown encoding gpt2" on May 22, 2024

aryagxr commented May 27, 2024

Closing this issue. Fixed the error after reinstalling tiktoken in a new conda environment.
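
For anyone landing here with the same error: the message lists tiktoken_ext.openai_public as a found plugin, yet "gpt2" was not registered. That would be consistent with a stale or mixed install in that environment, though this is an assumption and the thread does not confirm the root cause. Below is a hedged sketch for inspecting what the plugin actually exposes before recreating the environment. It assumes the plugin module defines an ENCODING_CONSTRUCTORS mapping (the name that appears in the registry code in the traceback), and the helper name inspect_plugin is hypothetical.

```python
import importlib


def inspect_plugin(encoding_name="gpt2", module_name="tiktoken_ext.openai_public"):
    """Import the plugin module and list the encodings it registers."""
    report = {"plugin_path": None, "encodings": None, "error": None}
    try:
        plugin = importlib.import_module(module_name)
    except Exception as exc:
        # Covers a missing package as well as a plugin that fails on import
        report["error"] = f"{type(exc).__name__}: {exc}"
        return report

    report["plugin_path"] = getattr(plugin, "__file__", None)
    # Assumption: the plugin exposes its constructors under this attribute
    constructors = getattr(plugin, "ENCODING_CONSTRUCTORS", None)
    if constructors is None:
        report["error"] = "plugin has no ENCODING_CONSTRUCTORS attribute"
        return report

    report["encodings"] = sorted(constructors)
    if encoding_name not in constructors:
        report["error"] = f"{encoding_name!r} is not registered by {module_name}"
    return report


if __name__ == "__main__":
    print(inspect_plugin("gpt2"))
```

If plugin_path points somewhere unexpected, or the encodings list is missing "gpt2", a clean environment (as done here) is likely the simplest fix.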

aryagxr closed this as completed on May 27, 2024