I think this is a very rough estimate; the actual value should depend on the batch size, sequence length, and embedding size (or hidden-layer dimension).
For example, a 13B model has 40 layers; with a sequence length of 4096, an embedding size of 8192, and a batch size of 1, it needs 1 (batch size) * 8192 (embedding size) * 2 (bytes, FP16) * 4096 (sequence length) * 40 (layers) ≈ 2560 MB, i.e. about 0.625 MB per token.
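For reference, here is a minimal Python sketch reproducing the same back-of-the-envelope arithmetic; the function name is hypothetical and the dimensions (40 layers, 8192 embedding size, 4096 sequence length) are just the illustrative values above, not values read from any particular model config.

```python
def cache_bytes(batch_size, embedding_size, bytes_per_elem, seq_len, n_layers):
    """Rough estimate: one embedding-sized FP16 vector cached per layer per token."""
    return batch_size * embedding_size * bytes_per_elem * seq_len * n_layers

total = cache_bytes(batch_size=1, embedding_size=8192, bytes_per_elem=2,
                    seq_len=4096, n_layers=40)
print(f"total     ~ {total / 2**20:.0f} MB")          # ~2560 MB
print(f"per token ~ {total / 4096 / 2**20:.3f} MB")   # ~0.625 MB
```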
I am finding the figure of 1 MB of GPU RAM per token during inference a bit hard to understand; it's also not what I am seeing in practice.
Any insight into how this number was computed?