-
fastapi_llm_infer_astreaming Public
Asynchronous streaming inference for LLMs (OpenAI, NVIDIA NIM, NAVER HyperClova) using FastAPI.
Python Updated Nov 28, 2024
-
tensorrt_examples Public
Some TensorRT conversion examples for different kinds of neural network models
-
riva_demo Public
NVIDIA Riva SDK demonstration for the Feb 2022, 2023 Developer Meetup
-
FasterTransformer Public
Forked from NVIDIA/FasterTransformer
Transformer-related optimization, including BERT, GPT
C++ Apache License 2.0 Updated Dec 14, 2022
-
cudnn_mnist Public
cuDNN/cuBLAS implementation of a basic convolutional neural network architecture on the MNIST dataset
-
TensorRT Public
Forked from NVIDIA/TensorRT
TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.
C++ Apache License 2.0 Updated Nov 3, 2022
-
Some Triton Python client examples
-
Megatron-DeepSpeed-Slurm Public
Execute Megatron-DeepSpeed using Slurm for multi-node distributed training
-
dask-mnmg Public
Run RAPIDS Dask cuML clustering algorithms with multiple GPUs on multiple nodes.