Commit 96cee4f

Explicitly clear the kv cache each time we eval tokens to match n_past. (nomic-ai#1808)
1 parent 2d56671 commit 96cee4f

1 file changed (+2 −0):

gpt4all-backend/llamamodel.cpp
@@ -298,6 +298,8 @@ LLModel::Token LLamaModel::sampleToken(PromptContext &promptCtx) const
 
 bool LLamaModel::evalTokens(PromptContext &ctx, const std::vector<int32_t> &tokens) const
 {
+    llama_kv_cache_seq_rm(d_ptr->ctx, 0, ctx.n_past, -1);
+
     llama_batch batch = llama_batch_init(tokens.size(), 0, 1);
 
     batch.n_tokens = tokens.size();
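
The added call truncates the KV cache for sequence 0 at positions >= ctx.n_past (passing -1 as the end position means "to the end of the sequence"), so the cache can never hold stale entries beyond the point the caller believes has been evaluated. Below is a minimal sketch of how an evalTokens-style routine might combine that truncation with batch decoding in the llama.cpp C API of this era. Only the llama_kv_cache_seq_rm call and the llama_batch_init / n_tokens lines mirror the diff; the eval_tokens wrapper and the batch-filling loop are illustrative assumptions, not the project's actual implementation.

// Minimal sketch, assuming the llama.cpp C API circa this commit.
// Only the llama_kv_cache_seq_rm call and the llama_batch_init /
// n_tokens lines come from the diff; the rest is illustrative.
#include <cstdint>
#include <vector>
#include "llama.h"

// Hypothetical helper: evaluate `tokens` on sequence 0, continuing at n_past.
static bool eval_tokens(llama_context *ctx, int32_t n_past,
                        const std::vector<int32_t> &tokens)
{
    // Remove cached entries for sequence 0 at positions [n_past, end),
    // so the KV cache agrees with n_past before decoding.
    llama_kv_cache_seq_rm(ctx, 0, n_past, -1);

    llama_batch batch = llama_batch_init(tokens.size(), 0, 1);
    batch.n_tokens = tokens.size();

    for (int32_t i = 0; i < batch.n_tokens; ++i) {
        batch.token   [i] = tokens[i];
        batch.pos     [i] = n_past + i;  // positions continue from n_past
        batch.n_seq_id[i] = 1;
        batch.seq_id  [i][0] = 0;        // everything lives on sequence 0
        batch.logits  [i] = false;
    }
    batch.logits[batch.n_tokens - 1] = true;  // logits for the last token only

    const bool ok = llama_decode(ctx, batch) == 0;
    llama_batch_free(batch);
    return ok;
}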
