This project implements a lazy tracing just-in-time (JIT) compiler targeting GPUs (via CUDA 10+ and NVIDIA PTX) and CPUs (via LLVM 7+ IR). *Lazy* refers to its behavior of capturing operations performed in C or C++ while attempting to postpone the associated computation for as long as possible. Eventually, this is no longer possible, at which point the system generates an efficient kernel containing the queued computation that is evaluated on either the CPU or GPU.
Enoki-JIT can be used by itself or as a component of the larger Enoki library, which additionally provides multidimensional arrays, automatic differentiation, and a large library of mathematical functions.
This project has almost no dependencies: it can be compiled without CUDA or LLVM actually being present on the system (it will attempt to find them at runtime). The library is implemented in C++11 but exposes all functionality through a C99-compatible interface.
Two header files, `enoki-jit/cuda.h` and `enoki-jit/llvm.h`, provide convenient C++ wrappers with operator overloading that build on the C-level API (`enoki-jit/jit.h`). Here is a brief example of how these can be used:

```cpp
#include <enoki-jit/cuda.h>

using Bool   = CUDAArray<bool>;
using Float  = CUDAArray<float>;
using UInt32 = CUDAArray<uint32_t>;

// [0, 0.01, 0.02, ..., 1]
Float x = linspace<Float>(0, 1, 101);

// [0, 2, 4, 6, ..., 98]
UInt32 index = arange<UInt32>(50) * 2;

// Scatter/gather operations are available
Float y = gather(x, index);

// Comparisons produce mask arrays
Bool mask = y < .5f;

// Ternary operator
Float z = select(mask, sqrt(y), 1.f / y);

printf("Value: %s\n", z.str());
```
Running this program will trigger two kernel launches. The first generates the `x` array (size 101) when it is accessed by the `gather()` operation, and the second generates `z` (size 50) when it is printed in the last line. Both correspond to points during the execution where evaluation could no longer be postponed.
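
To make the tracing behavior more concrete, here is a minimal sketch using only the wrapper API shown above; nothing in it launches a kernel until the array contents are actually needed:

```cpp
#include <enoki-jit/cuda.h>

using Float = CUDAArray<float>;

// None of these lines launch a kernel: the operations are only
// recorded symbolically.
Float a = linspace<Float>(0, 1, 4); // [0, 1/3, 2/3, 1]
Float b = sqrt(a) + 1.f;            // still unevaluated
Float c = b * b;                    // still unevaluated

// Accessing the contents forces evaluation: a single fused kernel
// computes 'c' (including the sqrt and arithmetic above).
printf("c = %s\n", c.str());
```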
Simply changing the first lines to
```cpp
#include <enoki-jit/llvm.h>

using Bool   = LLVMArray<bool>;
using Float  = LLVMArray<float>;
using UInt32 = LLVMArray<uint32_t>;
```
switches to the functionally equivalent LLVM backend. By default, the LLVM backend parallelizes execution via a built-in thread pool, enabling usage that is very similar to the CUDA variant: a single thread issues computation that is then processed in parallel by all cores of the system.
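
Since both wrapper types expose the same interface, it is also natural to write code that is generic over the backend. The following sketch illustrates the pattern; the `smoothstep` helper and the idea of templating over the array type are illustrative and not part of the library itself:

```cpp
#include <enoki-jit/llvm.h>

// Hypothetical generic helper: CUDAArray<float> and LLVMArray<float>
// share the same interface, so the same traced computation can be
// instantiated for either backend.
template <typename Float>
Float smoothstep(const Float &x) {
    return x * x * (3.f - 2.f * x);
}

using Float = LLVMArray<float>;

Float x = linspace<Float>(0, 1, 11);
printf("%s\n", smoothstep(x).str()); // runs in parallel on all CPU cores
```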
- Cross-platform: runs on Linux, macOS, and Windows.
- Kernels are cached and reused when the same computation is encountered again. Caching is done both in memory and on disk (`~/.enoki` on Linux and macOS, `~/AppData/Local/Temp/enoki` on Windows).
- The internals of the JIT compiler rely heavily on hash table lookups (to keep track of variables) and string concatenation (to merge IR fragments into full kernels), and both of these steps are highly optimized. This means that the overhead of generating kernel IR code is minimal (only a few μs); most time is spent either executing kernels or compiling from IR to machine code when a kernel is encountered for the first time.
- Supports parallel kernel execution on multiple devices (JITing from several CPU threads, or running kernels on multiple GPUs).
- The LLVM backend automatically targets the vector instruction sets supported by the host machine (e.g. AVX/AVX2, or AVX512 if available).
- The library provides an asynchronous memory allocator, which allocates and releases memory in the execution stream of a device that runs asynchronously with respect to the host CPU. Kernels frequently request and release large memory buffers, and both operations tend to be very costly; for this reason, memory allocations are also cached and reused.
- Provides a variety of parallel reductions for convenience (see the sketch below).
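
As a rough illustration of the reduction interface, here is a minimal sketch. The name `hsum` (horizontal sum) is borrowed from the parent Enoki library and is an assumption here; the exact identifier exposed by the enoki-jit wrappers may differ:

```cpp
#include <enoki-jit/llvm.h>

using Float = LLVMArray<float>;

Float x = linspace<Float>(0, 1, 1000);

// Hypothetical reduction name, borrowed from the parent Enoki library:
// sums all entries of the (lazily evaluated) expression x*x in parallel.
Float total = hsum(x * x);

printf("sum = %s\n", total.str());
```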