Number of tokens (757) exceeded maximum context length (512). #2

Open
datacrud8 opened this issue Sep 12, 2023 · 4 comments

Comments

@datacrud8

Hi, I'm trying to build this app locally with the same model, llama-2-7b-chat.ggmlv3.q8_0.bin.
When I run the app, the UI shows a random-looking message like the one you demonstrated, but the console prints the messages below:

Number of tokens (755) exceeded maximum context length (512).
Number of tokens (756) exceeded maximum context length (512).
Number of tokens (757) exceeded maximum context length (512).

So I increased max_new_tokens to 2048, increased n_ctx, and added truncate=True; none of them fixed the issue.
I changed the model as well; still the same problem.

Do you know of any solution for this?
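For context, the overflow usually comes from the retrieval side: the retrieved document chunks plus the prompt template are what actually get sent to the model, so the chunk size set at ingestion matters as much as the LLM settings. A minimal sketch of shrinking the chunks, assuming ingest.py splits documents with LangChain's RecursiveCharacterTextSplitter (the splitter choice and the numbers here are assumptions, not the repo's exact code):

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Smaller chunks keep the stuffed prompt under a 512-token context window.
# chunk_size is measured in characters, not tokens; tune it for your documents.
# 'documents' is assumed to be the list returned by the repo's document loader.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=30,
)
texts = text_splitter.split_documents(documents)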

@ctxwing

ctxwing commented Oct 25, 2023

I got exactly the same error as @datacrud8.

Did anyone manage to solve it? Thanks in advance.

$ chainlit run main.py -w
2023-10-25 19:38:13 - Loaded .env file
2023-10-25 19:38:22 - Your app is available at http://localhost:8000
2023-10-25 19:38:51 - Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2
2023-10-25 19:38:54 - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
2023-10-25 19:39:06 - Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2
2023-10-25 19:39:07 - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
Batches: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00, 2.15s/it]
2023-10-25 19:39:20 - 4 changes detected
Batches: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 6.11it/s]
2023-10-25 19:41:53 - Number of tokens (513) exceeded maximum context length (512).
2023-10-25 19:41:53 - Number of tokens (514) exceeded maximum context length (512).
2023-10-25 19:41:54 - Number of tokens (515) exceeded maximum context length (512).
2023-10-25 19:41:54 - Number of tokens (516) exceeded maximum context length (512).
2023-10-25 19:41:55 - Number of tokens (517) exceeded maximum context length (512).
2023-10-25 19:41:55 - Number of tokens (518) exceeded maximum context length (512).

@sudarshan-koirala
Owner

Hello, can you try a different embeddings model, for example hkunlp/instructor-large, in the ingest.py file?

@ctxwing

ctxwing commented Oct 27, 2023

@sudarshan-koirala First of all, thanks for the answer to my question.
I changed this:
model_name="sentence-transformers/all-MiniLM-L6-v2" ... result is case [A]

to the models below, including the one you recommended, referring to https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models:

  • hkunlp/instructor-large ... result is case [B]
  • sentence-transformers/multi-qa-MiniLM-L6-cos-v1 ... result is case [A]

but had no luck.
Did I miss something?

huggingface_embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",  # <-- [A] this causes "exceeded maximum context length (512)"
    model_kwargs={"device": "cpu"},
)

huggingface_embeddings = HuggingFaceEmbeddings(
    model_name="hkunlp/instructor-large",  # <-- [B] changed to this; throws the error below
    model_kwargs={"device": "cpu"},
)

$ time python ingest.py
Downloading (…)c7233/.gitattributes: 100%|████████████████████████████████████████████████| 1.48k/1.48k [00:00<00:00, 3.60MB/s]
Downloading (…)_Pooling/config.json: 100%|█████████████████████████████████████████████████████| 270/270 [00:00<00:00, 716kB/s]
Downloading (…)/2_Dense/config.json: 100%|█████████████████████████████████████████████████████| 116/116 [00:00<00:00, 291kB/s]
Downloading pytorch_model.bin: 100%|██████████████████████████████████████████████████████| 3.15M/3.15M [00:00<00:00, 11.1MB/s]
Downloading (…)9fb15c7233/README.md: 100%|█████████████████████████████████████████████████| 66.3k/66.3k [00:00<00:00, 338kB/s]
Downloading (…)b15c7233/config.json: 100%|████████████████████████████████████████████████| 1.53k/1.53k [00:00<00:00, 4.31MB/s]
Downloading (…)ce_transformers.json: 100%|█████████████████████████████████████████████████████| 122/122 [00:00<00:00, 358kB/s]
Downloading pytorch_model.bin: 100%|██████████████████████████████████████████████████████| 1.34G/1.34G [01:56<00:00, 11.5MB/s]
Downloading (…)nce_bert_config.json: 100%|███████████████████████████████████████████████████| 53.0/53.0 [00:00<00:00, 157kB/s]
Downloading (…)cial_tokens_map.json: 100%|████████████████████████████████████████████████| 2.20k/2.20k [00:00<00:00, 6.51MB/s]
Downloading spiece.model: 100%|█████████████████████████████████████████████████████████████| 792k/792k [00:00<00:00, 11.9MB/s]
Downloading (…)c7233/tokenizer.json: 100%|████████████████████████████████████████████████| 2.42M/2.42M [00:01<00:00, 2.36MB/s]
Downloading (…)okenizer_config.json: 100%|████████████████████████████████████████████████| 2.41k/2.41k [00:00<00:00, 7.06MB/s]
Downloading (…)15c7233/modules.json: 100%|████████████████████████████████████████████████████| 461/461 [00:00<00:00, 1.37MB/s]
Traceback (most recent call last):
  File "/home/ctxwing/docker-ctx/lancer/basic/llama2-chat-with-documents/ingest.py", line 82, in <module>
    create_vector_database()
  File "/home/ctxwing/docker-ctx/lancer/basic/llama2-chat-with-documents/ingest.py", line 59, in create_vector_database
    huggingface_embeddings = HuggingFaceEmbeddings(
                             ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ctxwing/anaconda3/envs/py311-chainlit/lib/python3.11/site-packages/langchain/embeddings/huggingface.py", line 66, in __init__
    self.client = sentence_transformers.SentenceTransformer(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ctxwing/anaconda3/envs/py311-chainlit/lib/python3.11/site-packages/sentence_transformers/SentenceTransformer.py", line 95, in __init__
    modules = self._load_sbert_model(model_path)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ctxwing/anaconda3/envs/py311-chainlit/lib/python3.11/site-packages/sentence_transformers/SentenceTransformer.py", line 840, in _load_sbert_model
    module = module_class.load(os.path.join(model_path, module_config['path']))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ctxwing/anaconda3/envs/py311-chainlit/lib/python3.11/site-packages/sentence_transformers/models/Pooling.py", line 120, in load
    return Pooling(**config)
           ^^^^^^^^^^^^^^^^^
TypeError: Pooling.__init__() got an unexpected keyword argument 'pooling_mode_weightedmean_tokens'

real 2m12.113s
user 0m14.215s
sys 0m9.627s
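(Side note on the case [B] traceback: Pooling.__init__() rejecting pooling_mode_weightedmean_tokens usually means the installed sentence-transformers version is too old for the config that hkunlp/instructor-large ships, so upgrading that package is one route. Instructor models are also normally loaded through LangChain's HuggingFaceInstructEmbeddings wrapper rather than HuggingFaceEmbeddings. A minimal sketch, assuming the InstructorEmbedding package is installed:

# pip install InstructorEmbedding sentence-transformers
from langchain.embeddings import HuggingFaceInstructEmbeddings

# Instructor models need the instructor-specific wrapper, not HuggingFaceEmbeddings.
huggingface_embeddings = HuggingFaceInstructEmbeddings(
    model_name="hkunlp/instructor-large",
    model_kwargs={"device": "cpu"},
))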

@sny-verma

As per the newer updates, define the LLM like this:

from langchain.llms import CTransformers

llm = CTransformers(
    model=model_path,
    model_type=model_type,
    config={
        "max_new_tokens": 1024,   # cap on generated tokens
        "temperature": 0.7,
        "context_length": 4096,   # raises the context window above the 512 default
    },
)
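Here context_length raises the model's context window (the 512 ceiling in the error messages), while max_new_tokens only caps how much text is generated, which is why tweaking max_new_tokens alone did not help. A sketch of wiring this LLM into the retrieval chain, assuming the app uses LangChain's RetrievalQA with a "stuff" chain (the vector_store variable and the k value are assumptions):

from langchain.chains import RetrievalQA

# Retrieving fewer chunks (smaller k) also helps keep the prompt inside the window.
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vector_store.as_retriever(search_kwargs={"k": 2}),
)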
