Acceleration

Hardware and software acceleration for LLM training and inference.

Papers

2023

(2023-02) High-throughput Generative Inference of Large Language Models with a single GPU. Ying Sheng et al. Paper | Github

Useful Resources