CuPy, CUDA Bilinear Interpolation

Ultra-fast bilinear interpolation for image resizing with CUDA.

lerp.py : Concept and reference code (*single-threaded, may take a while to run).
resize_ker.cu : CUDA test case in C.
resize.py : CuPy example.
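
To make the concept behind lerp.py concrete, here is a minimal single-threaded sketch of bilinear interpolation in pure Python. It is not the repository's code: the names `lerp` and `bilinear_resize` and the half-pixel-centre coordinate mapping are assumptions chosen for illustration, and the actual script may map coordinates differently.

```python
def lerp(a, b, t):
    """Linear interpolation between a and b, with t in [0, 1]."""
    return a + (b - a) * t


def bilinear_resize(src, dst_h, dst_w):
    """Resize a 2-D grayscale image (list of rows) to dst_h x dst_w.

    Hypothetical helper: uses the half-pixel-centre convention (an assumption).
    """
    src_h, src_w = len(src), len(src[0])
    scale_y = src_h / dst_h
    scale_x = src_w / dst_w
    dst = []
    for dy in range(dst_h):
        # Map the destination pixel centre back to source coordinates,
        # clamped so that the neighbours stay inside the image.
        sy = min(max((dy + 0.5) * scale_y - 0.5, 0.0), src_h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, src_h - 1)
        ty = sy - y0
        row = []
        for dx in range(dst_w):
            sx = min(max((dx + 0.5) * scale_x - 0.5, 0.0), src_w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, src_w - 1)
            tx = sx - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = lerp(src[y0][x0], src[y0][x1], tx)
            bottom = lerp(src[y1][x0], src[y1][x1], tx)
            row.append(lerp(top, bottom, ty))
        dst.append(row)
    return dst


if __name__ == "__main__":
    image = [[0.0, 10.0], [20.0, 30.0]]
    for out_row in bilinear_resize(image, 4, 4):
        print(out_row)
```

Each output pixel is computed independently of the others, which is why the same arithmetic maps naturally onto one CUDA thread per output pixel.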

(*PyCUDA (deprecated) is no longer supported; use CuPy instead.)

Requirements:

GPU (compute capability 3.0 or above; tested on 7.5)

CUDA driver

Docker and nvidia-docker

Pros:

Supports batched images.
No shared object (.so) or .dll binary files.
Just install CuPy and use it.
Compatible with the NumPy library.
Pass the GPU array to TensorRT directly.
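
As an illustration of the batch support and NumPy compatibility listed above, the following is a vectorized sketch written against the NumPy API. It is not the repository's kernel: `resize_batch`, the `xp` parameter, the (N, H, W, C) layout, and the half-pixel-centre mapping are all assumptions. Because CuPy mirrors most of the NumPy API, passing `xp=cupy` with CuPy arrays should also work, though that path is untested here.

```python
import numpy as np


def resize_batch(batch, dst_h, dst_w, xp=np):
    """Bilinear-resize a batch shaped (N, H, W, C) to (N, dst_h, dst_w, C).

    Hypothetical helper; `xp` is the array module (numpy by default).
    Returns a floating-point array.
    """
    _, src_h, src_w, _ = batch.shape
    # Source coordinates of every destination pixel centre
    # (half-pixel-centre convention, an assumption), clamped to the image.
    sy = xp.clip((xp.arange(dst_h) + 0.5) * (src_h / dst_h) - 0.5, 0, src_h - 1)
    sx = xp.clip((xp.arange(dst_w) + 0.5) * (src_w / dst_w) - 0.5, 0, src_w - 1)
    y0 = xp.floor(sy).astype(xp.int64)
    x0 = xp.floor(sx).astype(xp.int64)
    y1 = xp.minimum(y0 + 1, src_h - 1)
    x1 = xp.minimum(x0 + 1, src_w - 1)
    # Fractional weights, shaped to broadcast over (N, dst_h, dst_w, C).
    ty = (sy - y0)[None, :, None, None]
    tx = (sx - x0)[None, None, :, None]
    img = batch.astype(xp.float32)
    rows0 = img[:, y0]  # (N, dst_h, W, C): upper neighbour rows
    rows1 = img[:, y1]  # (N, dst_h, W, C): lower neighbour rows
    # Interpolate along x on both rows, then along y.
    top = rows0[:, :, x0] * (1 - tx) + rows0[:, :, x1] * tx
    bottom = rows1[:, :, x0] * (1 - tx) + rows1[:, :, x1] * tx
    return top * (1 - ty) + bottom * ty


if __name__ == "__main__":
    images = np.arange(2 * 2 * 2 * 3).reshape(2, 2, 2, 3)
    print(resize_batch(images, 4, 4).shape)
```

All images in the batch share the same index and weight arrays, so the whole batch is resized with a handful of array operations rather than a Python loop per image.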

Cons:

Still requires an understanding of CUDA programming.
SourceModule has to be written in CUDA C, including all CUDA kernels and device code.

Quick Start

# Pull docker image
docker run -it --runtime=nvidia royinx/cuda_resize bash

# For Cupy implementation
python3 resize.py

# For concept
python3 lerp.py

# For CUDA kernel testing
nvcc resize_free.cu -o resize_free.o && ./resize_free.o

# For benchmarking
wget http://images.cocodataset.org/zips/val2017.zip
unzip val2017.zip
python3 benchmark.py

Build

git clone https://github.com/royinx/CUDA_Resize.git
cd CUDA_Resize
docker build -t lerp_cuda .
docker run -it --runtime=nvidia -v ${PWD}:/py -w /py lerp_cuda bash

Advanced Metrics

docker run -it --privileged --runtime=nvidia -p 20072:22 -v ${PWD}:/py -w /py lerp_cuda bash
sh -c 'echo 1 >/proc/sys/kernel/perf_event_paranoid'
nvcc resize_free.cu -o resize_free.o
nsys profile ./resize_free.o

ncu -o metrics /bin/python3 resize.py  > profile_log
ncu -o metrics /bin/python3 resize.py

Remark: The development platform is defined in dockerfile.opencv, which includes OpenCV in C for debugging.

The functions work well in the pycuda container, so you don't need to build OpenCV.

Benchmark

2080ti

ratio = 2080ti (ms) / Ryzen 2700x (ms)

time (us/img)

shared memory

(Deprecated) [w/o smem] AWS g4dn.xlarge (Tesla T4)

ratio = T4 (ms) per img / Xeon Platinum 8259CL (ms) per img

(ms) per img on T4
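
The per-image time and the GPU/CPU ratio used in the headings above can be computed along these lines. This is a sketch only, not the logic of benchmark.py: `time_per_image_us` and `ratio` are hypothetical helpers, and no measured numbers are implied.

```python
import time


def time_per_image_us(fn, images, repeats=3):
    """Best-of-`repeats` wall-clock time of fn(images), in microseconds per image.

    Hypothetical helper. When timing GPU code, fn should block until the
    device has finished (for example by synchronizing the stream), because
    kernel launches return before the work is done.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(images)
        best = min(best, time.perf_counter() - start)
    return best / len(images) * 1e6


def ratio(gpu_time, cpu_time):
    """GPU time divided by CPU time, in the same unit; below 1 means the GPU is faster."""
    return gpu_time / cpu_time


if __name__ == "__main__":
    dummy_images = list(range(100))
    elapsed = time_per_image_us(lambda imgs: [i * 2 for i in imgs], dummy_images)
    print(f"{elapsed:.3f} us/img")
```

Taking the best of several runs reduces the influence of one-off costs such as kernel compilation or cache warm-up on the first call.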
