
PPLNN


Overview

PPLNN, short for "PPLNN is a Primitive Library for Neural Network", is a high-performance deep-learning inference engine. It runs a wide range of ONNX models and offers enhanced support for OpenMMLab.


Important Notice

  • PMX was changed to OPMX on 25/04/2024.
  • ChatGLM1 will not be supported in OPMX.
  • All LLM models must be converted (or simply rename pmx_params.json to opmx_params.json) and exported again.
  • The old code is available at llm_v1.
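
For a model that was exported under the old naming, the rename option mentioned in the notice could look like the sketch below. Only the two file names come from the notice; the function name `rename_pmx_params` and the directory `./exported_model` are hypothetical, and the notice still asks for the model to be exported again afterwards.

```shell
# Rename pmx_params.json to opmx_params.json inside one model directory.
# The function name and its directory argument are illustrative only.
rename_pmx_params() {
    dir="$1"
    # Act only when the old file exists and the new name is still free.
    if [ -f "$dir/pmx_params.json" ] && [ ! -e "$dir/opmx_params.json" ]; then
        mv "$dir/pmx_params.json" "$dir/opmx_params.json"
    fi
}

# Example call on a hypothetical export directory:
# rename_pmx_params ./exported_model
```

The function leaves the directory untouched when there is nothing to rename, so it can be run repeatedly.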

Known Issues

  • NCCL issue on some devices: L40S and H800 have been reported to hit illegal memory access errors in NCCL AllReduce. As a workaround, try turning off the NCCL Simple protocol by setting the environment variable NCCL_PROTO=^Simple.
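
A minimal sketch of applying this workaround in a POSIX shell is shown below. The quoting is a precaution rather than part of the original recommendation, and the job that follows is whatever multi-GPU command you normally run.

```shell
# Turn off the NCCL "Simple" protocol for processes started from this shell.
# The leading ^ tells NCCL to exclude the listed protocol. The single quotes
# keep shells that treat ^ specially from interpreting it.
export NCCL_PROTO='^Simple'

# Launch the multi-GPU job from the same shell so it inherits the variable.
```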

LLM Features

  • New LLM Engine (Overview)
  • Flash Attention
  • Split-k Attention (similar to Flash Decoding)
  • Group-query Attention
  • Dynamic Batching (also called Continuous Batching or In-flight Batching)
  • Tensor Parallelism
  • Graph Optimization
  • INT8 groupwise KV Cache (numerical accuracy is very close to FP16🚀)
  • INT8 per-token, per-channel Quantization (W8A8)

LLM Model Zoo

Hello, world!

  • Installing prerequisites:

    • On Debian or Ubuntu:
        apt-get install build-essential cmake git python3 python3-dev
    • On RedHat or CentOS:
        yum install gcc gcc-c++ cmake3 make git python3 python3-devel
  • Cloning source code:
        git clone https://github.com/openppl-public/ppl.nn.git
  • Building from source:
        cd ppl.nn
        ./build.sh -DPPLNN_USE_X86_64=ON -DPPLNN_ENABLE_PYTHON_API=ON
  • Running python demo:
        PYTHONPATH=./pplnn-build/install/lib python3 ./tools/pplnn.py --use-x86 --onnx-model tests/testdata/conv.onnx
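
Before running the build, it can help to confirm that the prerequisite tools are actually on PATH. The check below is a small sketch and not part of the project; the helper name `missing_tools` is hypothetical, and the tool names are inferred from the package lists above.

```shell
# Print the name of every given tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tool names inferred from the package lists above. On RedHat or CentOS the
# cmake3 package may install the binary as cmake3 rather than cmake.
missing=$(missing_tools gcc g++ make cmake git python3)
if [ -n "$missing" ]; then
    echo "Missing build tools:" $missing
else
    echo "All build tools found."
fi
```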

Refer to Documents for more details.

Documents

Contact Us

Questions, reports, and suggestions are welcome through GitHub Issues!

  • WeChat Official Account: OpenPPL
  • QQ Group: 627853444

Contributions

This project uses the Contributor Covenant as its code of conduct. Contributions are highly appreciated.

Acknowledgements

License

This project is distributed under the Apache License, Version 2.0.
