For interactive visualization of the results, save the linked file as HTML on your machine and open it in a browser.
Model: h2oai/h2ogpt-4096-llama2-7b-chat (transformers)

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | CPU | 1215.52 | 1.17546 | |
| 8 | CPU | 1216.98 | 1.17641 | |
| 4 | CPU | 1217.17 | 1.16575 | |


| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 31.8619 | 41.9433 | |
| 16 | 1 x NVIDIA GeForce RTX 4090 (24564 MiB) | 32.2947 | 40.9252 | |
| 16 | 1 x NVIDIA A100-SXM4-80GB (81920 MiB) | 37.1139 | 32.4529 | |
| 16 | 1 x NVIDIA RTX A6000 (46068 MiB) | 47.0375 | 29.8526 | |
| 16 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | 67.9752 | 18.0571 | |
| 8 | 1 x NVIDIA GeForce RTX 4090 (24564 MiB) | 114.622 | 9.21246 | |
| 8 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 94.1774 | 8.95532 | |
| 8 | 1 x NVIDIA A100-SXM4-80GB (81920 MiB) | 181.246 | 7.47991 | |
| 8 | 1 x NVIDIA RTX A6000 (46068 MiB) | 148.616 | 6.61984 | |
| 8 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | 185.146 | 4.35807 | |
| 4 | 1 x NVIDIA GeForce RTX 4090 (24564 MiB) | 39.544 | 32.571 | |
| 4 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 42.8067 | 32.3408 | |
| 4 | 1 x NVIDIA A100-SXM4-80GB (81920 MiB) | 53.3973 | 23.3267 | |
| 4 | 1 x NVIDIA RTX A6000 (46068 MiB) | 61.5241 | 22.8456 | |
| 4 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | 90.5194 | 14.9456 | |


| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 2 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 32.1395 | 40.3871 | |
| 16 | 2 x NVIDIA A100-SXM4-80GB (81920 MiB) | 39.9269 | 32.248 | |
| 16 | 2 x NVIDIA RTX A6000 (46068 MiB) | 47.4105 | 28.8472 | |
| 16 | 2 x NVIDIA GeForce RTX 3090 (24576 MiB) | 71.4808 | 17.7518 | |
| 8 | 2 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 94.9813 | 9.03765 | |
| 8 | 2 x NVIDIA A100-SXM4-80GB (81920 MiB) | 178.2 | 7.55443 | |
| 8 | 2 x NVIDIA RTX A6000 (46068 MiB) | 152.544 | 6.43862 | |
| 8 | 2 x NVIDIA GeForce RTX 3090 (24576 MiB) | 186.884 | 4.35012 | |
| 4 | 2 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 43.235 | 32.0566 | |
| 4 | 2 x NVIDIA A100-SXM4-80GB (81920 MiB) | 57.0808 | 22.6791 | |
| 4 | 2 x NVIDIA RTX A6000 (46068 MiB) | 64.6442 | 21.972 | |
| 4 | 2 x NVIDIA GeForce RTX 3090 (24576 MiB) | 94.5099 | 14.6162 | |


| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 4 x NVIDIA A100-SXM4-80GB (81920 MiB) | 42.3398 | 30.2181 | |
| 16 | 4 x NVIDIA RTX A6000 (46068 MiB) | 49.089 | 27.7344 | |
| 8 | 4 x NVIDIA A100-SXM4-80GB (81920 MiB) | 180.534 | 7.53804 | |
| 8 | 4 x NVIDIA RTX A6000 (46068 MiB) | 153.411 | 6.46469 | |
| 4 | 4 x NVIDIA A100-SXM4-80GB (81920 MiB) | 58.6287 | 21.9123 | |
| 4 | 4 x NVIDIA RTX A6000 (46068 MiB) | 66.4926 | 21.409 | |


| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 8 x NVIDIA A100-SXM4-80GB (81920 MiB) | 40.4986 | 30.5489 | |
| 8 | 8 x NVIDIA A100-SXM4-80GB (81920 MiB) | 186.713 | 7.23498 | |
| 4 | 8 x NVIDIA A100-SXM4-80GB (81920 MiB) | 60.1828 | 21.9172 | |

Model: h2oai/h2ogpt-4096-llama2-13b-chat (transformers)

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 52.4984 | 26.2487 | |
| 16 | 1 x NVIDIA A100-SXM4-80GB (81920 MiB) | 49.7972 | 24.9301 | |
| 16 | 1 x NVIDIA RTX A6000 (46068 MiB) | 71.9114 | 18.4362 | |
| 16 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | nan | nan | OOM |
| 16 | 1 x NVIDIA GeForce RTX 4090 (24564 MiB) | nan | nan | OOM |
| 8 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 168.967 | 7.67522 | |
| 8 | 1 x NVIDIA GeForce RTX 4090 (24564 MiB) | 185.442 | 6.0205 | |
| 8 | 1 x NVIDIA A100-SXM4-80GB (81920 MiB) | 174.458 | 5.69269 | |
| 8 | 1 x NVIDIA RTX A6000 (46068 MiB) | 193.993 | 5.56359 | |
| 8 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | 280.467 | 3.75936 | |
| 4 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 45.3051 | 20.4771 | |
| 4 | 1 x NVIDIA GeForce RTX 4090 (24564 MiB) | 68.0646 | 16.1241 | |
| 4 | 1 x NVIDIA RTX A6000 (46068 MiB) | 81.1389 | 15.6933 | |
| 4 | 1 x NVIDIA A100-SXM4-80GB (81920 MiB) | 74.271 | 15.0868 | |
| 4 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | 96.6189 | 9.77255 | |


| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 2 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 51.6428 | 26.1842 | |
| 16 | 2 x NVIDIA A100-SXM4-80GB (81920 MiB) | 51.299 | 24.8757 | |
| 16 | 2 x NVIDIA RTX A6000 (46068 MiB) | 72.8565 | 18.2039 | |
| 16 | 2 x NVIDIA GeForce RTX 3090 (24576 MiB) | 89.5996 | 12.8295 | |
| 8 | 2 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 167.523 | 7.82793 | |
| 8 | 2 x NVIDIA RTX A6000 (46068 MiB) | 195.929 | 5.51238 | |
| 8 | 2 x NVIDIA A100-SXM4-80GB (81920 MiB) | 180.781 | 5.43787 | |
| 8 | 2 x NVIDIA GeForce RTX 3090 (24576 MiB) | 280.831 | 3.72157 | |
| 4 | 2 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 47.1425 | 19.9791 | |
| 4 | 2 x NVIDIA RTX A6000 (46068 MiB) | 84.5776 | 15.1326 | |
| 4 | 2 x NVIDIA A100-SXM4-80GB (81920 MiB) | 79.9461 | 14.3455 | |
| 4 | 2 x NVIDIA GeForce RTX 3090 (24576 MiB) | 98.4705 | 9.68779 | |


| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 4 x NVIDIA A100-SXM4-80GB (81920 MiB) | 55.3779 | 21.7073 | |
| 16 | 4 x NVIDIA RTX A6000 (46068 MiB) | 74.4377 | 17.8537 | |
| 8 | 4 x NVIDIA A100-SXM4-80GB (81920 MiB) | 179.505 | 5.45185 | |
| 8 | 4 x NVIDIA RTX A6000 (46068 MiB) | 199.799 | 5.39725 | |
| 4 | 4 x NVIDIA RTX A6000 (46068 MiB) | 87.6579 | 14.6779 | |
| 4 | 4 x NVIDIA A100-SXM4-80GB (81920 MiB) | 78.9061 | 14.6754 | |


| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 8 x NVIDIA A100-SXM4-80GB (81920 MiB) | 55.3965 | 22.302 | |
| 8 | 8 x NVIDIA A100-SXM4-80GB (81920 MiB) | 185.328 | 5.38647 | |
| 4 | 8 x NVIDIA A100-SXM4-80GB (81920 MiB) | 83.0479 | 13.969 | |

Model: h2oai/h2ogpt-4096-llama2-70b-chat (transformers)

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | nan | nan | OOM |
| 16 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | nan | nan | OOM |
| 16 | 1 x NVIDIA A100-SXM4-80GB (81920 MiB) | nan | nan | OOM |
| 16 | 1 x NVIDIA RTX A6000 (46068 MiB) | nan | nan | OOM |
| 8 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | nan | nan | OOM |
| 8 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | nan | nan | OOM |
| 8 | 1 x NVIDIA RTX A6000 (46068 MiB) | nan | nan | OOM |
| 4 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 122.132 | 10.6495 | |
| 4 | 1 x NVIDIA RTX A6000 (46068 MiB) | 165.058 | 6.94248 | |
| 4 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | nan | nan | OOM |


| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 2 x NVIDIA RTX A6000 (46068 MiB) | nan | nan | OOM |
| 8 | 2 x NVIDIA RTX A6000 (46068 MiB) | 410.069 | 2.25687 | |
| 4 | 2 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 120.538 | 10.5008 | |
| 4 | 2 x NVIDIA RTX A6000 (46068 MiB) | 171.744 | 6.71342 | |
| 4 | 2 x NVIDIA GeForce RTX 3090 (24576 MiB) | nan | nan | OOM |


| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 4 x NVIDIA RTX A6000 (46068 MiB) | 267.056 | 4.24242 | |
| 8 | 4 x NVIDIA RTX A6000 (46068 MiB) | 413.957 | 2.22551 | |
| 4 | 4 x NVIDIA RTX A6000 (46068 MiB) | 175.491 | 6.5798 | |

Backend: text-generation-inference
For interactive visualization of the results, save the linked file as HTML on your machine and open it in a browser.
Model: h2oai/h2ogpt-4096-llama2-7b-chat (text-generation-inference)

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 39.0155 | 55.2139 | |
| 16 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | 29.129 | 45.9535 | |
| 16 | 1 x NVIDIA GeForce RTX 4090 (24564 MiB) | 24.3988 | 44.5878 | |
| 16 | 1 x NVIDIA A100-SXM4-80GB (81920 MiB) | 39.2697 | 30.3068 | |
| 16 | 1 x NVIDIA RTX A6000 (46068 MiB) | 40.3622 | 29.9724 | |


| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 2 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 7.63612 | 71.7881 | |
| 16 | 2 x NVIDIA RTX A6000 (46068 MiB) | 41.0461 | 30.3726 | |
| 16 | 2 x NVIDIA A100-SXM4-80GB (81920 MiB) | 41.0245 | 29.36 | |


| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 4 x NVIDIA RTX A6000 (46068 MiB) | 42.8377 | 29.388 | |
| 16 | 4 x NVIDIA A100-SXM4-80GB (81920 MiB) | 41.0995 | 28.4403 | |


| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 8 x NVIDIA A100-SXM4-80GB (81920 MiB) | 42.8594 | 27.8644 | |

Model: h2oai/h2ogpt-4096-llama2-13b-chat (text-generation-inference)

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 1 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 21.7823 | 33.7132 | |
| 16 | 1 x NVIDIA A100-SXM4-80GB (81920 MiB) | 51.8428 | 19.083 | |
| 16 | 1 x NVIDIA GeForce RTX 3090 (24576 MiB) | nan | nan | OOM |
| 16 | 1 x NVIDIA RTX A6000 (46068 MiB) | nan | nan | OOM |


| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 2 x NVIDIA RTX 6000 Ada Generation (49140 MiB) | 10.8242 | 57.8237 | |
| 16 | 2 x NVIDIA GeForce RTX 3090 (24576 MiB) | 42.2111 | 31.4247 | |
| 16 | 2 x NVIDIA A100-SXM4-80GB (81920 MiB) | 53.3837 | 22.223 | |
| 16 | 2 x NVIDIA RTX A6000 (46068 MiB) | 64.782 | 21.3549 | |


| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 4 x NVIDIA A100-SXM4-80GB (81920 MiB) | 52.7912 | 21.3862 | |
| 16 | 4 x NVIDIA RTX A6000 (46068 MiB) | 66.5247 | 20.777 | |


| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 8 x NVIDIA A100-SXM4-80GB (81920 MiB) | 56.3847 | 20.3764 | |

Model: h2oai/h2ogpt-4096-llama2-70b-chat (text-generation-inference)

| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 4 x NVIDIA A100-SXM4-80GB (81920 MiB) | 131.453 | 9.61851 | |
| 16 | 4 x NVIDIA RTX A6000 (46068 MiB) | nan | nan | OOM |


| bits | gpus | summarization time [sec] | generation speed [tokens/sec] | exception |
|---|---|---|---|---|
| 16 | 8 x NVIDIA A100-SXM4-80GB (81920 MiB) | 133.53 | 9.53011 | |

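
The reported figures can be compared across bit widths and backends by normalising generation speed against a chosen baseline row. The snippet below is a minimal sketch of that arithmetic, not part of the benchmark code: the `Result` class and the `relative_speed` helper are hypothetical names introduced here for illustration, and the four rows are copied from the 7B results for 1 x NVIDIA GeForce RTX 4090.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One benchmark row (hypothetical container, not from the benchmark code)."""
    backend: str
    bits: int
    summarization_time_sec: float
    generation_speed_tok_per_sec: float


# Rows copied from the 7B results for 1 x NVIDIA GeForce RTX 4090 (24564 MiB).
ROWS = [
    Result("transformers", 16, 32.2947, 40.9252),
    Result("transformers", 8, 114.622, 9.21246),
    Result("transformers", 4, 39.544, 32.571),
    Result("text-generation-inference", 16, 24.3988, 44.5878),
]


def relative_speed(rows, baseline):
    """Map (backend, bits) to generation speed divided by the baseline's speed."""
    base = baseline.generation_speed_tok_per_sec
    return {
        (r.backend, r.bits): r.generation_speed_tok_per_sec / base
        for r in rows
    }


if __name__ == "__main__":
    # Use the 16-bit transformers run as the baseline (ratio 1.0).
    for key, ratio in relative_speed(ROWS, ROWS[0]).items():
        print(key, round(ratio, 2))
```

A ratio below 1.0 means the configuration generates tokens more slowly than the baseline; rows marked OOM have no measurements and should be left out before calling the helper.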