Number of tokens (757) exceeded maximum context length (512). #2

Open
datacrud8 opened this issue Sep 12, 2023 · 4 comments

Comments

@datacrud8

Hi, I'm trying to build this app locally with the same model, llama-2-7b-chat.ggmlv3.q8_0.bin.
When I run the app, the UI shows a random-looking message like the one you demonstrated, but the console prints the messages below:

Number of tokens (755) exceeded maximum context length (512).
Number of tokens (756) exceeded maximum context length (512).
Number of tokens (757) exceeded maximum context length (512).

So I increased max_new_tokens to 2048, increased n_ctx, and added truncate=True; none of them fixed the issue.
I changed the model as well; still the same problem.

Do you know of any solution for this?
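For context, the overflow usually comes from the retrieval side: the retrieved document chunks plus the prompt template are what actually get sent to the model, so the chunk size set at ingestion matters as much as the LLM settings. A minimal sketch of shrinking the chunks, assuming ingest.py splits documents with LangChain's RecursiveCharacterTextSplitter (the splitter choice and the numbers here are assumptions, not the repo's exact code):

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Smaller chunks keep the stuffed prompt under a 512-token context window.
# chunk_size is measured in characters, not tokens; tune it for your documents.
# 'documents' is assumed to be the list returned by the repo's document loader.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=30,
)
texts = text_splitter.split_documents(documents)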

@ctxwing

ctxwing commented Oct 25, 2023

I got exactly the same error as @datacrud8.

Did anyone manage to solve it? Thanks in advance.

$ chainlit run main.py -w
2023-10-25 19:38:13 - Loaded .env file
2023-10-25 19:38:22 - Your app is available at http://localhost:8000
2023-10-25 19:38:51 - Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2
2023-10-25 19:38:54 - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
2023-10-25 19:39:06 - Load pretrained SentenceTransformer: sentence-transformers/all-MiniLM-L6-v2
2023-10-25 19:39:07 - Anonymized telemetry enabled. See https://docs.trychroma.com/telemetry for more information.
Batches: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00, 2.15s/it]
2023-10-25 19:39:20 - 4 changes detected
Batches: 100%|██████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 6.11it/s]
2023-10-25 19:41:53 - Number of tokens (513) exceeded maximum context length (512).
2023-10-25 19:41:53 - Number of tokens (514) exceeded maximum context length (512).
2023-10-25 19:41:54 - Number of tokens (515) exceeded maximum context length (512).
2023-10-25 19:41:54 - Number of tokens (516) exceeded maximum context length (512).
2023-10-25 19:41:55 - Number of tokens (517) exceeded maximum context length (512).
2023-10-25 19:41:55 - Number of tokens (518) exceeded maximum context length (512).

@sudarshan-koirala
Owner

Hello, can you try a different embeddings model, for example hkunlp/instructor-large, in the ingest.py file?

@ctxwing

ctxwing commented Oct 27, 2023

@sudarshan-koirala First of all, thanks for the answer to my question.
I changed this:
model_name="sentence-transformers/all-MiniLM-L6-v2" ... result is case [A]

to the models below, including the one you recommended, referring to https://www.sbert.net/docs/pretrained_models.html#multi-lingual-models:

  • hkunlp/instructor-large ... result is case [B]
  • sentence-transformers/multi-qa-MiniLM-L6-cos-v1 ... result is case [A]

but had no luck.
Did I miss something?

huggingface_embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",  # <-- [A] this causes "exceeded maximum context length (512)"
    model_kwargs={"device": "cpu"},
)

huggingface_embeddings = HuggingFaceEmbeddings(
    model_name="hkunlp/instructor-large",  # <-- [B] changed to this; throws the error below
    model_kwargs={"device": "cpu"},
)

$ time python ingest.py
Downloading (…)c7233/.gitattributes: 100%|████████████████████████████████████████████████| 1.48k/1.48k [00:00<00:00, 3.60MB/s]
Downloading (…)_Pooling/config.json: 100%|█████████████████████████████████████████████████████| 270/270 [00:00<00:00, 716kB/s]
Downloading (…)/2_Dense/config.json: 100%|█████████████████████████████████████████████████████| 116/116 [00:00<00:00, 291kB/s]
Downloading pytorch_model.bin: 100%|██████████████████████████████████████████████████████| 3.15M/3.15M [00:00<00:00, 11.1MB/s]
Downloading (…)9fb15c7233/README.md: 100%|█████████████████████████████████████████████████| 66.3k/66.3k [00:00<00:00, 338kB/s]
Downloading (…)b15c7233/config.json: 100%|████████████████████████████████████████████████| 1.53k/1.53k [00:00<00:00, 4.31MB/s]
Downloading (…)ce_transformers.json: 100%|█████████████████████████████████████████████████████| 122/122 [00:00<00:00, 358kB/s]
Downloading pytorch_model.bin: 100%|██████████████████████████████████████████████████████| 1.34G/1.34G [01:56<00:00, 11.5MB/s]
Downloading (…)nce_bert_config.json: 100%|███████████████████████████████████████████████████| 53.0/53.0 [00:00<00:00, 157kB/s]
Downloading (…)cial_tokens_map.json: 100%|████████████████████████████████████████████████| 2.20k/2.20k [00:00<00:00, 6.51MB/s]
Downloading spiece.model: 100%|█████████████████████████████████████████████████████████████| 792k/792k [00:00<00:00, 11.9MB/s]
Downloading (…)c7233/tokenizer.json: 100%|████████████████████████████████████████████████| 2.42M/2.42M [00:01<00:00, 2.36MB/s]
Downloading (…)okenizer_config.json: 100%|████████████████████████████████████████████████| 2.41k/2.41k [00:00<00:00, 7.06MB/s]
Downloading (…)15c7233/modules.json: 100%|████████████████████████████████████████████████████| 461/461 [00:00<00:00, 1.37MB/s]
Traceback (most recent call last):
  File "/home/ctxwing/docker-ctx/lancer/basic/llama2-chat-with-documents/ingest.py", line 82, in <module>
    create_vector_database()
  File "/home/ctxwing/docker-ctx/lancer/basic/llama2-chat-with-documents/ingest.py", line 59, in create_vector_database
    huggingface_embeddings = HuggingFaceEmbeddings(
                             ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ctxwing/anaconda3/envs/py311-chainlit/lib/python3.11/site-packages/langchain/embeddings/huggingface.py", line 66, in __init__
    self.client = sentence_transformers.SentenceTransformer(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ctxwing/anaconda3/envs/py311-chainlit/lib/python3.11/site-packages/sentence_transformers/SentenceTransformer.py", line 95, in __init__
    modules = self._load_sbert_model(model_path)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ctxwing/anaconda3/envs/py311-chainlit/lib/python3.11/site-packages/sentence_transformers/SentenceTransformer.py", line 840, in _load_sbert_model
    module = module_class.load(os.path.join(model_path, module_config['path']))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ctxwing/anaconda3/envs/py311-chainlit/lib/python3.11/site-packages/sentence_transformers/models/Pooling.py", line 120, in load
    return Pooling(**config)
           ^^^^^^^^^^^^^^^^^
TypeError: Pooling.__init__() got an unexpected keyword argument 'pooling_mode_weightedmean_tokens'

real 2m12.113s
user 0m14.215s
sys 0m9.627s
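(Side note on the case [B] traceback: Pooling.__init__() rejecting pooling_mode_weightedmean_tokens usually means the installed sentence-transformers version is too old for the config that hkunlp/instructor-large ships, so upgrading that package is one route. Instructor models are also normally loaded through LangChain's HuggingFaceInstructEmbeddings wrapper rather than HuggingFaceEmbeddings. A minimal sketch, assuming the InstructorEmbedding package is installed:

# pip install InstructorEmbedding sentence-transformers
from langchain.embeddings import HuggingFaceInstructEmbeddings

# Instructor models need the instructor-specific wrapper, not HuggingFaceEmbeddings.
huggingface_embeddings = HuggingFaceInstructEmbeddings(
    model_name="hkunlp/instructor-large",
    model_kwargs={"device": "cpu"},
))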

@sny-verma

As per the newer updates, define the LLM like this:

from langchain.llms import CTransformers

llm = CTransformers(
    model=model_path,
    model_type=model_type,
    config={
        "max_new_tokens": 1024,   # cap on generated tokens
        "temperature": 0.7,
        "context_length": 4096,   # raises the context window above the 512 default
    },
)
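Here context_length raises the model's context window (the 512 ceiling in the error messages), while max_new_tokens only caps how much text is generated, which is why tweaking max_new_tokens alone did not help. A sketch of wiring this LLM into the retrieval chain, assuming the app uses LangChain's RetrievalQA with a "stuff" chain (the vector_store variable and the k value are assumptions):

from langchain.chains import RetrievalQA

# Retrieving fewer chunks (smaller k) also helps keep the prompt inside the window.
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vector_store.as_retriever(search_kwargs={"k": 2}),
)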
