wenxcs-msft

Popular repositories

  1. flash-attention-rocm Public archive

    Forked from ROCm/flash-attention

    Fast and memory-efficient exact attention

    C++ 1

  2. tvm.dx Public archive

    TVM with DirectX support

    Python 1

  3. DirectX-Headers Public archive

    Forked from microsoft/DirectX-Headers

    Official DirectX headers available under an open source license

    C

  4. DirectXShaderCompiler Public archive

    Forked from microsoft/DirectXShaderCompiler

    This repo hosts the source for the DirectX Shader Compiler which is based on LLVM/Clang.

    C++

  5. dxpy Public archive

    DirectX Python Runtime

    C++

  6. deort Public archive

    Python

Repositories

Showing 10 of 16 repositories
  • public_access_scripts
    Shell 0 Apache-2.0 0 1 0 Updated Feb 5, 2025
  • ads-triton-server-third_party Public Forked from JiushengChen/third_party

    Third-party source packages that are modified for use in Triton.

    C 0 BSD-3-Clause 60 0 0 Updated Jan 28, 2025
  • ads-triton-server Public Forked from JiushengChen/triton-server

    The Triton Inference Server provides an optimized cloud and edge inferencing solution.

    Python 0 BSD-3-Clause 1,549 0 1 Updated Jan 28, 2025
  • vllm-xx Public Forked from xiaoxiawu-microsoft/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python 0 Apache-2.0 5,977 0 8 Updated Sep 26, 2024
  • vllm-pr Public Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python 0 Apache-2.0 5,977 0 0 Updated Jul 10, 2024
  • foundation-model-stack
    Python 0 Apache-2.0 62 0 0 Updated May 10, 2024
  • vllm Public Forked from vllm-project/vllm

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python 0 Apache-2.0 5,977 0 0 Updated May 8, 2024
  • ads-TensorRT-LLM Public Forked from JiushengChen/TensorRT-LLM

    TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

    C++ 0 Apache-2.0 1,126 0 0 Updated Jan 26, 2024
  • WizardLM Public archive Forked from nlpxucan/WizardLM

    WizardLM: Empowering Large Pre-Trained Language Models to Follow Complex Instructions

    Python 0 781 0 0 Updated May 12, 2023
  • azure-openai-proxy Public archive Forked from diemus/azure-openai-proxy

    A proxy for Azure OpenAI API that can convert an OpenAI request into an Azure OpenAI request.

    Go 0 MIT 70 0 0 Updated Apr 7, 2023

People

This organization has no public members. You must be a member to see who’s a part of this organization.
