Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python 358 44 Updated Sep 11, 2024

mit-han-lab / qserve

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Python 473 28 Updated Nov 9, 2024

confident-ai / deepeval

The LLM Evaluation Framework

Python 4,163 341 Updated Jan 2, 2025

onnx / tutorials

Tutorials for creating and using ONNX models

Jupyter Notebook 3,416 634 Updated Jul 15, 2024

NVIDIA / nvbench

CUDA Kernel Benchmarking Library

Cuda 540 69 Updated Nov 20, 2024

microsoft / superbenchmark

A validation and profiling tool for AI infrastructure

Python 284 60 Updated Dec 12, 2024

tjunlp-lab / Awesome-LLMs-Evaluation-Papers

The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey.

727 47 Updated May 8, 2024

datawhalechina / thorough-pytorch

PyTorch introductory tutorial; read online at: https://datawhalechina.github.io/thorough-pytorch/

Jupyter Notebook 2,704 428 Updated Oct 30, 2024

onnx / onnx

Open standard for machine learning interoperability

Python 18,178 3,691 Updated Jan 3, 2025

mit-han-lab / llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 2,637 218 Updated Dec 20, 2024

IST-DASLab / QUIK

Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024

C++ 175 13 Updated Apr 16, 2024

jenkinsci / jenkins

Jenkins automation server

Java 23,426 8,859 Updated Jan 3, 2025

pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 85,486 23,015 Updated Jan 3, 2025

numba / numba

NumPy aware dynamic Python compiler using LLVM

Python 10,092 1,136 Updated Dec 17, 2024

inducer / pycuda

CUDA integration for Python, plus shiny features

Python 1,890 291 Updated Nov 5, 2024

SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

C++ 8,039 418 Updated Sep 6, 2024

ModelTC / lightllm

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 2,723 217 Updated Jan 3, 2025

woshidandan / TANet-image-aesthetics-and-quality-assessment

🔥[IJCAI 2022, Official Code] for paper "Rethinking Image Aesthetics Assessment: Models, Datasets and Benchmarks". Official Weights and Demos provided. The first aesthetics assessment dataset, algorithm, and benchmark for multi-theme scenarios.

Python 301 19 Updated Nov 25, 2024

antgroup / glake

GLake: optimizing GPU memory management and IO transmission.

Python 406 35 Updated Nov 27, 2024

kubernetes / dashboard

General-purpose web UI for Kubernetes clusters

Go 14,573 4,175 Updated Jan 2, 2025

triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.

Python 8,537 1,506 Updated Jan 3, 2025

PaddlePaddle / VisualDL

Deep Learning Visualization Toolkit (PaddlePaddle deep learning visualization tool)

HTML 4,798 630 Updated Dec 11, 2024

microsoft / promptflow

Build high-quality LLM apps - from prototyping, testing to production deployment and monitoring.

Python 9,717 901 Updated Jan 3, 2025

is_leaner iamWHTWD

Stars

chaosblade-io / chaosblade

neuralmagic / guidellm

abhibambhaniya / GenZ-LLM-Analyzer

onnx / models

noamgat / lm-format-enforcer

AIPHES / emnlp19-moverscore

Tiiiger / bert_score

hahnyuan / LLM-Viewer