Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
docs: Initial commit of TensorRT-LLM Blog post (janhq#2428)
Showing 20 changed files with 168 additions and 52 deletions.
File renamed without changes.
@@ -0,0 +1,116 @@
---
title: Jan now supports TensorRT-LLM
description: Jan has added support for Nvidia's TensorRT-LLM, a hardware-optimized LLM inference engine that runs very fast on Nvidia GPUs
tags: [Nvidia, TensorRT-LLM]
---

Jan now supports [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) as an alternative inference engine. TensorRT-LLM is a hardware-optimized LLM inference engine that compiles models to [run extremely fast on Nvidia GPUs](https://blogs.nvidia.com/blog/tensorrt-llm-windows-stable-diffusion-rtx/).

- The [TensorRT-LLM Extension](/guides/providers/tensorrt-llm) is available in the [0.4.9 release](https://github.com/janhq/jan/releases/tag/v0.4.9)
- Currently available only for Windows

We've made a few TensorRT-LLM models available in the Jan Hub for download:

- TinyLlama-1.1b
- Mistral 7b
- TinyJensen-1.1b, which is trained on Jensen Huang's 👀

## What is TensorRT-LLM?

Please read our [TensorRT-LLM Guide](/guides/providers/tensorrt-llm).

TensorRT-LLM is mainly used on datacenter-grade GPUs, where it reaches speeds on the order of [10,000 tokens/s](https://nvidia.github.io/TensorRT-LLM/blogs/H100vsA100.html).

## Performance Benchmarks

We were curious to see how TensorRT-LLM would perform on consumer-grade GPUs, since that is what most of Jan's users have.

- We compared TensorRT-LLM against llama.cpp, our default inference engine, on the following GPUs:

| NVIDIA GPU | Architecture | VRAM (GB) | CUDA Cores | Tensor Cores | Memory Bus Width (bit) | Memory Bandwidth (GB/s) |
| ---------- | ------------ | --------- | ---------- | ------------ | ---------------------- | ----------------------- |
| RTX 4090 | Ada | 24 | 16,384 | 512 | 384 | ~1000 |
| RTX 3090 | Ampere | 24 | 10,496 | 328 | 384 | 935.8 |
| RTX 4060 | Ada | 8 | 3,072 | 96 | 128 | 272 |

> We tested with a batch size of 1, an input length of 2048 tokens, and an output length of 512 tokens, as this is a common use case. Each test was run 5 times and the results were averaged.
> We used Windows Task Manager (and `nvidia-smi`/`htop` on Linux) to collect per-process CPU, memory, and NVIDIA GPU metrics.
> We closed all other user applications and ran only the Jan app with Nitro TensorRT-LLM, or NVIDIA's Python benchmark script.
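The run-5-times-and-average procedure in the notes above can be sketched in Python. This is a minimal illustration, not Jan's or NVIDIA's benchmark script: `generate` is a hypothetical callable standing in for one batch-size-1 request to the engine under test, assumed to return the number of tokens it produced.

```python
import time
from statistics import mean
from typing import Callable, List

# Hypothetical engine call: (prompt, max_new_tokens) -> number of tokens generated.
GenerateFn = Callable[[str, int], int]


def measure_throughput(
    generate: GenerateFn,
    prompt: str,
    max_new_tokens: int = 512,
    runs: int = 5,
) -> float:
    """Return the mean throughput in tokens/s over `runs` sequential requests.

    Each request is timed end to end, so prompt processing (e.g. a 2048-token
    input) is included in the elapsed time, not just token generation.
    """
    rates: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        n_tokens = generate(prompt, max_new_tokens)
        elapsed = time.perf_counter() - start
        rates.append(n_tokens / elapsed)
    return mean(rates)
```

The post does not say whether prompt processing is counted in the reported throughput, so treat end-to-end timing here as one reasonable choice rather than the method behind the tables.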
### RTX 4090 on Windows PC

- CPU: Intel 13th series
- GPU: NVIDIA GPU 4090 (Ada - sm 89)
- RAM: 120GB
- OS: Windows

#### TinyLlama-1.1b q4

| Metrics | GGUF (using the GPU) | TensorRT-LLM |
| --------------------- | -------------------- | ------------ |
| Throughput (tokens/s) | 104 | ✅ 131 |
| VRAM Used (GB) | 2.1 | 😱 21.5 |
| RAM Used (GB) | 0.3 | 😱 15 |
| Disk Size (GB) | 4.07 | 4.07 |

#### Mistral-7b int4

| Metrics | GGUF (using the GPU) | TensorRT-LLM |
| --------------------- | -------------------- | ------------ |
| Throughput (tokens/s) | 80 | ✅ 97.9 |
| VRAM Used (GB) | 2.1 | 😱 23.5 |
| RAM Used (GB) | 0.3 | 😱 15 |
| Disk Size (GB) | 4.07 | 4.07 |

### RTX 3090 on Windows PC

- CPU: Intel 13th series
- GPU: NVIDIA GPU 3090 (Ampere - sm 86)
- RAM: 64GB
- OS: Windows

#### TinyLlama-1.1b q4

| Metrics | GGUF (using the GPU) | TensorRT-LLM |
| --------------------- | -------------------- | ------------ |
| Throughput (tokens/s) | 131.28 | ✅ 194 |
| VRAM Used (GB) | 2.1 | 😱 21.5 |
| RAM Used (GB) | 0.3 | 😱 15 |
| Disk Size (GB) | 4.07 | 4.07 |

#### Mistral-7b int4

| Metrics | GGUF (using the GPU) | TensorRT-LLM |
| --------------------- | -------------------- | ------------ |
| Throughput (tokens/s) | 88 | ✅ 137 |
| VRAM Used (GB) | 6.0 | 😱 23.8 |
| RAM Used (GB) | 0.3 | 😱 25 |
| Disk Size (GB) | 4.07 | 4.07 |

### RTX 4060 on Windows Laptop

- Manufacturer: Acer Nitro 16 Phenix
- CPU: Ryzen 7000
- RAM: 16GB
- GPU: NVIDIA Laptop GPU 4060 (Ada)

#### TinyLlama-1.1b q4

| Metrics | GGUF (using the GPU) | TensorRT-LLM |
| --------------------- | -------------------- | ------------ |
| Throughput (tokens/s) | 65 | ❌ 41 |
| VRAM Used (GB) | 2.1 | 😱 7.6 |
| RAM Used (GB) | 0.3 | 😱 7.2 |
| Disk Size (GB) | 4.07 | 4.07 |

#### Mistral-7b int4

| Metrics | GGUF (using the GPU) | TensorRT-LLM |
| --------------------- | -------------------- | ------------ |
| Throughput (tokens/s) | 22 | ❌ 19 |
| VRAM Used (GB) | 2.1 | 😱 7.7 |
| RAM Used (GB) | 0.3 | 😱 13.5 |
| Disk Size (GB) | 4.07 | 4.07 |
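To compare the six runs at a glance, the Python snippet below divides TensorRT-LLM throughput by GGUF throughput, using the numbers from the tables above. `RESULTS` and `speedup` are illustrative names for this sketch, not part of Jan.

```python
from typing import Dict, Tuple

# Throughput in tokens/s copied from the tables above: (GGUF on GPU, TensorRT-LLM).
RESULTS: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("RTX 4090", "TinyLlama-1.1b q4"): (104, 131),
    ("RTX 4090", "Mistral-7b int4"): (80, 97.9),
    ("RTX 3090", "TinyLlama-1.1b q4"): (131.28, 194),
    ("RTX 3090", "Mistral-7b int4"): (88, 137),
    ("RTX 4060 Laptop", "TinyLlama-1.1b q4"): (65, 41),
    ("RTX 4060 Laptop", "Mistral-7b int4"): (22, 19),
}


def speedup(gguf_tps: float, trt_tps: float) -> float:
    """Ratio of TensorRT-LLM to GGUF throughput; above 1.0 means TensorRT-LLM was faster."""
    return trt_tps / gguf_tps


if __name__ == "__main__":
    for (gpu, model), (gguf_tps, trt_tps) in RESULTS.items():
        print(f"{gpu:<16} {model:<18} {speedup(gguf_tps, trt_tps):.2f}x")
```

This works out to roughly 1.22x to 1.56x on the RTX 4090 and 3090, and 0.63x to 0.86x on the RTX 4060 laptop.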
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 17
+sidebar_position: 18
 slug: /changelog/changelog-v0.2.0
 ---
 # v0.2.0
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 14
+sidebar_position: 15
 slug: /changelog/changelog-v0.2.3
 ---
 # v0.2.3
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 13
+sidebar_position: 14
 slug: /changelog/changelog-v0.3.0
 ---
 # v0.3.0
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 12
+sidebar_position: 13
 slug: /changelog/changelog-v0.3.1
 ---
 # v0.3.1
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 11
+sidebar_position: 12
 slug: /changelog/changelog-v0.3.2
 ---
 # v0.3.2
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 10
+sidebar_position: 11
 slug: /changelog/changelog-v0.3.3
 ---
 # v0.3.3
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 9
+sidebar_position: 10
 slug: /changelog/changelog-v0.4.0
 ---
 # v0.4.0
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 8
+sidebar_position: 9
 slug: /changelog/changelog-v0.4.1
 ---
 # v0.4.1
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 7
+sidebar_position: 8
 slug: /changelog/changelog-v0.4.2
 ---
 # v0.4.2
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 6
+sidebar_position: 7
 slug: /changelog/changelog-v0.4.3
 ---
 # v0.4.3
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 5
+sidebar_position: 6
 slug: /changelog/changelog-v0.4.4
 ---
 # v0.4.4
@@ -1,5 +1,5 @@
 ---
-sidebar_position: 4
+sidebar_position: 5
 slug: /changelog/changelog-v0.4.5
 ---
 # v0.4.5