Explain the occupancy API in the introduction tutorial.
[skip tests] [skip benchmarks]
maleadt committed Oct 4, 2021
1 parent ec19e4d commit 179498a
Showing 1 changed file with 45 additions and 0 deletions.
45 changes: 45 additions & 0 deletions docs/src/tutorials/introduction.jl
Expand Up @@ -365,6 +365,51 @@ end
# ```


# In the previous example, the number of threads was hard-coded to 256. This is not ideal:
# using more threads generally improves performance, but the maximum number of threads you
# can launch depends on both your GPU and the kernel. To select an appropriate number of
# threads automatically, it is recommended to use the launch configuration API. This API
# takes a compiled (but not yet launched) kernel and returns a named tuple containing an upper
# bound on the number of threads and the minimum number of blocks required to fully saturate
# the GPU:

kernel = @cuda launch=false gpu_add3!(y_d, x_d)
config = launch_configuration(kernel.fun)
threads = min(N, config.threads)
blocks = cld(N, threads)
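
# As a rough illustration of the arithmetic above, the following CPU-only sketch uses a
# hypothetical helper, `launch_dims`, which is not part of CUDA.jl. The limit of 1024 threads
# is an assumed value; the actual `config.threads` depends on your GPU and kernel:

# ```julia
# # Hypothetical helper: split `n` elements over blocks of at most `max_threads` threads.
# function launch_dims(n, max_threads)
#     threads = min(n, max_threads)   # never launch more threads than elements
#     blocks = cld(n, threads)        # ceiling division, so every element is covered
#     return (; threads, blocks)
# end
#
# launch_dims(2^20, 1024)   # (threads = 1024, blocks = 1024)
# launch_dims(1000, 1024)   # (threads = 1000, blocks = 1)
# launch_dims(1500, 1024)   # (threads = 1024, blocks = 2); the second block is partly idle
# ```

# Because `cld` rounds up, `threads * blocks` can exceed the number of elements, so the
# kernel itself has to make sure it never indexes past the end of the array.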

# The compiled kernel is callable, and we can pass the computed launch configuration as
# keyword arguments:

fill!(y_d, 2)
kernel(y_d, x_d; threads, blocks)
@test all(Array(y_d) .== 3.0f0)

# Now let's benchmark this:

function bench_gpu4!(y, x)
    kernel = @cuda launch=false gpu_add3!(y, x)
    config = launch_configuration(kernel.fun)
    threads = min(length(y), config.threads)
    blocks = cld(length(y), threads)

    CUDA.@sync begin
        kernel(y, x; threads, blocks)
    end
end

# ```julia
# @btime bench_gpu4!($y_d, $x_d)
# ```

# ```
# 70.826 μs (99 allocations: 3.44 KiB)
# ```

# Performance is comparable to the hard-coded version. It is slightly slower because the
# occupancy API is queried on every call, but that overhead will not matter with more
# complex kernels.
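
# If that overhead does matter, one option is to compile the kernel and query the launch
# configuration once, then reuse both for every launch. The sketch below assumes the array
# types and sizes stay the same between calls; `bench_gpu5!` is a hypothetical name that is
# not part of this tutorial, and no timings are claimed for it:

# ```julia
# kernel = @cuda launch=false gpu_add3!(y_d, x_d)
# config = launch_configuration(kernel.fun)
# threads = min(length(y_d), config.threads)
# blocks = cld(length(y_d), threads)
#
# # Only the launch itself remains inside the benchmarked function.
# function bench_gpu5!(kernel, y, x, threads, blocks)
#     CUDA.@sync begin
#         kernel(y, x; threads, blocks)
#     end
# end
#
# bench_gpu5!(kernel, y_d, x_d, threads, blocks)
# ```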


# ### Printing

# When debugging, it's not uncommon to want to print some values. This is achieved with
