RadX (VK-1.1.109)

You asked: how fast can a radix sort be on NVIDIA Turing GPUs?

GPU sorting shaders, separated out of the vRt project into a dedicated library. Optimized for modern GPUs and written for the Vulkan API (GLSL).

What we want to do

Optimize sorting for NVIDIA RTX GPUs (and probably Volta GPUs)
Remove outdated and bloated code
Add new experimental features without a rendering backend
In the future, add support for other architectures (Radeon VII, Navi, Ampere)
Add support for Intel UHD Graphics 630 (if we have time)
CUDA Compute Capability 7.5 interoperability

On average, sorts 1 billion or more uint32_t elements per second (tested on an RTX 2070; achievable on an RTX 2070 Super)
Outperforms parallel std::sort by up to 40x (Intel Core i7-8700K)
Performance tested on Windows 10 (Insiders) with Visual Studio 2019
Can be built with GCC 8 on Linux (tested on Ubuntu 18.10)
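
For anyone checking sorted output against std::sort, a small CPU reference can help. Below is a minimal sketch of a least-significant-digit radix sort over uint32_t keys, using 8 bits per pass. It is not taken from the RadX shaders and says nothing about how they are organized. The names radix_sort_u32, matches_std_sort and self_check are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU reference only (assumption: 8-bit digits, 4 passes over 32-bit keys).
// This illustrates the general LSD radix sort idea, not the RadX GPU code.
void radix_sort_u32(std::vector<std::uint32_t>& keys) {
    std::vector<std::uint32_t> scratch(keys.size());
    for (unsigned shift = 0; shift < 32; shift += 8) {
        // 1. Histogram: count how many keys fall into each of the 256 digit buckets.
        std::array<std::size_t, 256> count{};
        for (std::uint32_t k : keys) {
            ++count[(k >> shift) & 0xFFu];
        }
        // 2. Exclusive prefix sum: turn counts into starting offsets per bucket.
        std::size_t offset = 0;
        for (std::size_t& c : count) {
            const std::size_t n = c;
            c = offset;
            offset += n;
        }
        // 3. Stable scatter: keys with equal digits keep their relative order.
        for (std::uint32_t k : keys) {
            scratch[count[(k >> shift) & 0xFFu]++] = k;
        }
        // After an even number of passes the result is back in `keys`.
        keys.swap(scratch);
    }
}

// Hypothetical helper: compares the reference radix sort with std::sort.
bool matches_std_sort(std::vector<std::uint32_t> data) {
    std::vector<std::uint32_t> expected = data;
    std::sort(expected.begin(), expected.end());
    radix_sort_u32(data);
    return data == expected;
}

// Hypothetical self-check on a few hand-picked keys, including duplicates
// and values that differ only in the high bytes.
void self_check() {
    const std::vector<std::uint32_t> sample = {
        5u, 1u, 4294967295u, 0u, 256u, 255u, 65536u, 5u, 16777216u};
    assert(matches_std_sort(sample));
}
```

Calling self_check() from a test program, or passing a buffer read back from the GPU to a comparison like matches_std_sort, gives a quick correctness check that is independent of timing.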
