Tags: Budhyant/CUDA.jl

v2.6.1

## CUDA v2.6.1

[Diff since v2.6.0](JuliaGPU/CUDA.jl@v2.6.0...v2.6.1)


**Closed issues:**
- CUDA 11.2 (JuliaGPU#601)
- LLVM not found (JuliaGPU#681)

**Merged pull requests:**
- Automatic task-based concurrency using local streams (JuliaGPU#662) (@maleadt)
- add beta keyword to conv (JuliaGPU#672) (@jw3126)
- Protect the kernel closure from GC collection. (JuliaGPU#674) (@maleadt)
- Track external globals, use it to avoid needless exception flags (JuliaGPU#675) (@maleadt)
- Adapt to GPUCompiler changes. (JuliaGPU#676) (@maleadt)
- Minor improvements (JuliaGPU#677) (@maleadt)
- CompatHelper: add new compat entry for "Memoize" at version "0.4" (JuliaGPU#678) (@github-actions[bot])
- Use released GPUCompiler. (JuliaGPU#683) (@maleadt)
- v2.6.1 (JuliaGPU#684) (@maleadt)

v2.6.0

## CUDA v2.6.0

[Diff since v2.5.0](JuliaGPU/CUDA.jl@v2.5.0...v2.6.0)


**Closed issues:**
- Invalid results due to shared memory + multiple function exits (?) mysteriously solved by @cuprintf (JuliaGPU#43)
- NVML-related segfault on Windows (JuliaGPU#610)
- @cuda with config keyword sometimes allocate lots of memory (JuliaGPU#643)
- Can someone with push access run the TagBot workflow? (JuliaGPU#644)
- Taking gradient with Flux results in NaNs when using CUDA arrays but not when using CPU arrays (JuliaGPU#657)
- Broadcasting fails in a special case (JuliaGPU#658)
- view causes KeyError in alias (JuliaGPU#661)
- PTXCompilerTarget error when creating a CuArray with Float64 (JuliaGPU#664)
- Complex dot product performance of CuArrays and of StructArrays of CuArrays (JuliaGPU#667)
- could not load cublas64_11.dll (JuliaGPU#670)

**Merged pull requests:**
- CUDA quicksort (JuliaGPU#431) (@xaellison)
- Bump Reexport to 1.0 (JuliaGPU#640) (@DhairyaLGandhi)
- Use newer NVML initialization method. (JuliaGPU#641) (@maleadt)
- README: add some information on viewing capabilities of your devices (JuliaGPU#642) (@DilumAluthge)
- Remove duplicate functions. (JuliaGPU#645) (@maleadt)
- Use released version of Adapt.jl (JuliaGPU#646) (@maleadt)
- Simplify list of tests to skip. (JuliaGPU#647) (@maleadt)
- Use a test-specific Project.toml. (JuliaGPU#648) (@maleadt)
- Use raw output for CUBLAS log message. (JuliaGPU#649) (@maleadt)
- Close the async condition used to call host functions. (JuliaGPU#650) (@maleadt)
- Backports for Julia 1.5 / CUDA 2.4 (JuliaGPU#651) (@maleadt)
- Allow running benchmarks outside of the master branch on other systems. (JuliaGPU#652) (@maleadt)
- Bump GPUCompiler. (JuliaGPU#653) (@maleadt)
- Reuse the compiler when generating SASS code. (JuliaGPU#654) (@maleadt)
- Run the tests from the current directory. (JuliaGPU#655) (@maleadt)
- Configure the PTX GPUCompiler codegen quirks. (JuliaGPU#656) (@maleadt)
- Update manifest (JuliaGPU#660) (@github-actions[bot])
- Support view on unmanaged arrays. (JuliaGPU#663) (@maleadt)
- Retry CuModule creation when OOM. (JuliaGPU#665) (@maleadt)
- Make fill async. (JuliaGPU#669) (@maleadt)
- Fix version lookups. (JuliaGPU#671) (@maleadt)
- Update manifest (JuliaGPU#673) (@github-actions[bot])

v2.4.1

## CUDA v2.4.1

[Diff since v2.4.0](JuliaGPU/CUDA.jl@v2.4.0...v2.4.1)


**Closed issues:**
- `cudaconvert` for closures (JuliaGPU#67)
- Invalid results due to shared memory + multiple function exits (?) mysteriously solved by @cuprintf (JuliaGPU#43)
- NVML-related segfault on Windows (JuliaGPU#610)
- Update Reexport compat (JuliaGPU#629)
- Incomplete CUDA device attributes list (JuliaGPU#637)
- @cuda with config keyword sometimes allocate lots of memory (JuliaGPU#643)
- Can someone with push access run the TagBot workflow? (JuliaGPU#644)
- Taking gradient with Flux results in NaNs when using CUDA arrays but not when using CPU arrays (JuliaGPU#657)
- Broadcasting fails in a special case (JuliaGPU#658)
- view causes KeyError in alias (JuliaGPU#661)
- PTXCompilerTarget error when creating a CuArray with Float64 (JuliaGPU#664)
- Complex dot product performance of CuArrays and of StructArrays of CuArrays (JuliaGPU#667)

**Merged pull requests:**
- CUDA quicksort (JuliaGPU#431) (@xaellison)
- cudaconvert captured values in closures. (JuliaGPU#625) (@maleadt)
- CompatHelper: only instantiate `/Manifest.toml` (the manifest file in the root of the repository) (JuliaGPU#631) (@DilumAluthge)
- CompatHelper: bump compat for "Reexport" to "1.0" (JuliaGPU#633) (@github-actions[bot])
- CompatHelper: bump compat for "AbstractFFTs" to "1.0" (JuliaGPU#634) (@github-actions[bot])
- Update wrappers (JuliaGPU#638) (@maleadt)
- Bump artifacts for Windows/Julia 1.6 compatibility. (JuliaGPU#639) (@maleadt)
- Bump Reexport to 1.0 (JuliaGPU#640) (@DhairyaLGandhi)
- Use newer NVML initialization method. (JuliaGPU#641) (@maleadt)
- README: add some information on viewing capabilities of your devices (JuliaGPU#642) (@DilumAluthge)
- Remove duplicate functions. (JuliaGPU#645) (@maleadt)
- Use released version of Adapt.jl (JuliaGPU#646) (@maleadt)
- Simplify list of tests to skip. (JuliaGPU#647) (@maleadt)
- Use a test-specific Project.toml. (JuliaGPU#648) (@maleadt)
- Use raw output for CUBLAS log message. (JuliaGPU#649) (@maleadt)
- Close the async condition used to call host functions. (JuliaGPU#650) (@maleadt)
- Backports for Julia 1.5 / CUDA 2.4 (JuliaGPU#651) (@maleadt)
- Allow running benchmarks outside of the master branch on other systems. (JuliaGPU#652) (@maleadt)
- Bump GPUCompiler. (JuliaGPU#653) (@maleadt)
- Reuse the compiler when generating SASS code. (JuliaGPU#654) (@maleadt)
- Run the tests from the current directory. (JuliaGPU#655) (@maleadt)
- Configure the PTX GPUCompiler codegen quirks. (JuliaGPU#656) (@maleadt)
- Update manifest (JuliaGPU#660) (@github-actions[bot])
- Support view on unmanaged arrays. (JuliaGPU#663) (@maleadt)
- Retry CuModule creation when OOM. (JuliaGPU#665) (@maleadt)
- Make fill async. (JuliaGPU#669) (@maleadt)

v2.5.0

## CUDA v2.5.0

[Diff since v2.4.0](JuliaGPU/CUDA.jl@v2.4.0...v2.5.0)

v2.4.0

## CUDA v2.4.0

[Diff since v2.3.0](JuliaGPU/CUDA.jl@v2.3.0...v2.4.0)


**Closed issues:**
- cublasXtStrmm test failures on Windows 10 Julia 1.1 (JuliaGPU#124)
- CUSPARSE tests broken (JuliaGPU#259)
- Make @cuda return a kernel object (JuliaGPU#341)
- Depend on CompilerSupportLibraries (JuliaGPU#359)
- CUBLAS and exceptions test failures on Windows (JuliaGPU#536)
- argmax(::CuArray) returns nothing with NaN-values (JuliaGPU#553)
- Multiple @cuDynamicSharedMem in kernel causes unexpected behavior (JuliaGPU#555)
- Illegal memory access with atomic shared memory (JuliaGPU#558)
- CUDA.sqrt will not found symbol "__nv_sqrt" (JuliaGPU#559)
- Exception with CUDA.exp (JuliaGPU#561)
- Use LazyArtifacts instead of Pkg (JuliaGPU#570)
- Test runner: early bail out (JuliaGPU#578)
- memory reporting issue (JuliaGPU#579)
- c[3:4]=0 leads to exception (JuliaGPU#580)
- Add math ops (including broadcast) for half types (JuliaGPU#581)
- Dot product of Array and CuArray fails with CPU address error. (JuliaGPU#586)
- Support for CUDA-capable GPU with compute capability 4.0 like GTX 1080 (JuliaGPU#587)
- mapreducedim! not threadsafe (JuliaGPU#588)
- Allow separate directories for cuda and cudnn (JuliaGPU#590)
- Difficulties installing CUDA on Julia 1.6.0. (JuliaGPU#591)
- Bug in Initialisation Error (JuliaGPU#603)
- CUDA.jl initialisation fails after suspending Ubuntu 20.04 with CUDA 11.2 (JuliaGPU#605)
- CUDA 11.2 CUBLASError and "CUDA.jl does not yet support CUDA with nvdisasm 11.2.67" (JuliaGPU#607)
- This intrinsic must be compiled to be called (JuliaGPU#611)
- OpenGL interop (JuliaGPU#612)
- Add support for CuFFT callback functions (JuliaGPU#614)
- I can’t multiply a CSR sparse matrix anymore (JuliaGPU#615)
- Julia version requirement (JuliaGPU#619)

**Merged pull requests:**
- Support all combinations of datatypes and transposes/adjoints in LinearAlgebra (JuliaGPU#535) (@cqql)
- Use structs for texture intrinsic return types. (JuliaGPU#554) (@maleadt)
- Backport some 1.6 fixes (JuliaGPU#557) (@maleadt)
- Update manifest (JuliaGPU#560) (@github-actions[bot])
- Correct dims error (JuliaGPU#562) (@DhairyaLGandhi)
- Lock `_shmem_cb` (JuliaGPU#564) (@vchuravy)
- Move to Julia 1.6 (JuliaGPU#566) (@maleadt)
- Adapt to JuliaLang/julia#38487. (JuliaGPU#568) (@maleadt)
- Support for 'delayed kernels' (JuliaGPU#569) (@maleadt)
- Run cuda-memcheck as part of CI (JuliaGPU#571) (@maleadt)
- Use at-sync instead of calls to synchronize in tests. (JuliaGPU#572) (@maleadt)
- Update artifacts to include cuda-memcheck (JuliaGPU#573) (@maleadt)
- Use LazyArtifacts instead of Pkg. (JuliaGPU#574) (@maleadt)
- Improve LinearAlgebra impl methods for triangular types (JuliaGPU#575) (@maleadt)
- New findmin/max implementation using single-pass reduction (JuliaGPU#576) (@maleadt)
- Fix synchronization before testing cublasXt calls. (JuliaGPU#577) (@maleadt)
- Fix used memory reporting. (JuliaGPU#582) (@maleadt)
- Implement Statistics.varm/stdm instead of Statistics._var (JuliaGPU#583) (@sdewaele)
- Test for JuliaGPU#558. (JuliaGPU#584) (@maleadt)
- Add a quick failure option to the test runner. (JuliaGPU#585) (@maleadt)
- Add lock around `cfunction` lookup (JuliaGPU#589) (@vchuravy)
- Catch all initialization errors. (JuliaGPU#593) (@maleadt)
- Update dependencies. (JuliaGPU#596) (@maleadt)
- Fix wrong initialisation error message (JuliaGPU#604) (@qin-yu)
- Fixes wrong spacing in docstring admonition (JuliaGPU#608) (@navidcy)
- Fix broadcasting with Base.angle (JuliaGPU#618) (@marius311)
- Test with the 1.6 nightly, not 1.7. (JuliaGPU#620) (@maleadt)
- Wrap cudaGL.h (JuliaGPU#621) (@maleadt)
- Initial compatibility with CUDA 11.2. (JuliaGPU#622) (@maleadt)
- 1.5 compatibility release (JuliaGPU#623) (@maleadt)
- Add CUDA 11.2 artifacts. (JuliaGPU#624) (@maleadt)

v2.3.0

## CUDA v2.3.0

[Diff since v2.2.1](JuliaGPU/CUDA.jl@v2.2.1...v2.3.0)


**Closed issues:**
- Misaligned address on load from `Const` (JuliaGPU#548)

**Merged pull requests:**
- Allow `PermutedDimsArray` in `gemm_strided_batched` (JuliaGPU#539) (@mcabbott)
- Fix broken checkbounds for CuSparseMatrixCSR and tests (JuliaGPU#545) (@achuchmala)
- Emphasize rebooting option. (JuliaGPU#547) (@xanfus)
- fix address calculation for ldg (JuliaGPU#549) (@vchuravy)
- Don't use explicit per-stream threads. (JuliaGPU#551) (@maleadt)

v2.2.1

Bump version again.

By having skipped CI on the previous tagged commit, we didn't get documentation.
[skip tests]

v2.2.0

## CUDA v2.2.0

[Diff since v2.1.0](JuliaGPU/CUDA.jl@v2.1.0...v2.2.0)


**Closed issues:**
- cudnn missing after downloading artifact (JuliaGPU#521)
- Downloading artifact: CUDA110 when using DiffEqFlux (JuliaGPU#542)

**Merged pull requests:**
- Update manifest (JuliaGPU#520) (@github-actions[bot])
- Try out Buildkite. (JuliaGPU#522) (@maleadt)
- Update manifest (JuliaGPU#529) (@github-actions[bot])
- Support for / Upgrade to CUDA 11.1 update 1. (JuliaGPU#530) (@maleadt)
- Fix and test svd! (JuliaGPU#531) (@maleadt)
- Move more CI to Buildkite. (JuliaGPU#532) (@maleadt)
- Use type symbols to generate wrapper methods (JuliaGPU#534) (@cqql)
- Fully move to Buildkite. (JuliaGPU#537) (@maleadt)
- Add unit_diag option for sv2! functions (JuliaGPU#540) (@amontoison)
- Documentation fixes (JuliaGPU#543) (@maleadt)

v2.1.0

## CUDA v2.1.0

[Diff since v2.0.2](JuliaGPU/CUDA.jl@v2.0.2...v2.1.0)


**Closed issues:**
- CUDNN convolution with Float16 always returns zeros (JuliaGPU#92)
- axp(b)y! and mul! (scalar multiplication) with mixed argument types (JuliaGPU#144)
- Dispatching to generic matmul instead of CUBLAS (JuliaGPU#164)
- Support for Ints and Float16? (JuliaGPU#165)
- Subarrays/views support (JuliaGPU#172)
- Easy way to pick among multiple GPUs (JuliaGPU#174)
- More prominently document JULIA_CUDA_USE_BINARYBUILDER (JuliaGPU#204)
- ERROR_COOPERATIVE_LAUNCH_TOO_LARGE during tests (JuliaGPU#247)
- Pkg.test error for cutensor test on Windows (JuliaGPU#422)
- Runtime build improvements (JuliaGPU#456)
- Fusing Wrappers (JuliaGPU#467)
- Could not find nvToolsExt (libnvToolsExt.dylib.1.0 or libnvToolsExt.dylib.1) in /Users/imac/.julia/artifacts/b502baf54095dff4a69fd6aba8667124583f6929/lib (JuliaGPU#482)
- mapreduce assumes commutative op (JuliaGPU#484)
- SubArray Broadcast Bug in 2.0 (JuliaGPU#488)
- Nested SubArray Scalar Indexing (JuliaGPU#490)
- Sparse matrix * view(vector) regression in 2.0 (JuliaGPU#493)
- Error transforming a reshaped 0-dimensional GPU array to a CPU array (JuliaGPU#494)
- test cuda FAILURE (JuliaGPU#496)
- Reshaped CuArray is not DenseCuArray (JuliaGPU#511)
- assignment failure when using array slicing. (JuliaGPU#516)

**Merged pull requests:**
- Use the correct CUDNN scaling parameter type. (JuliaGPU#454) (@maleadt)
- Fix versioned dylib discovery. (JuliaGPU#486) (@maleadt)
- Move inv from GPUArrays. (JuliaGPU#487) (@maleadt)
- Use dense array types in sparse wrappers. (JuliaGPU#495) (@maleadt)
- Update manifest (JuliaGPU#497) (@github-actions[bot])
- Revert array wrapper union changes (JuliaGPU#498) (@maleadt)
- Clean-up pointer field. (JuliaGPU#499) (@maleadt)
- mapreduce: change iteration for compatibility with non-commutative operators. (JuliaGPU#500) (@maleadt)
- Use versioned libcuda (JuliaGPU#502) (@maleadt)
- Dynamically choose versioned libcuda (JuliaGPU#503) (@mustafaquraish)
- Update multigpu.md (JuliaGPU#504) (@efmanu)
- Upgrade artifacts for CUDA 11 compatibility. (JuliaGPU#506) (@maleadt)
- Update dependencies. (JuliaGPU#507) (@maleadt)
- Convert unsigned short ints to Cint for printf. (JuliaGPU#508) (@maleadt)
- Update manifest (JuliaGPU#510) (@github-actions[bot])
- Fix reshape with missing dimensions. (JuliaGPU#512) (@maleadt)
- Don't return a pointer from 'alias'. (JuliaGPU#513) (@maleadt)
- Add some docs (JuliaGPU#514) (@maleadt)
- Fix CUDNN-optimized activation broadcasts (JuliaGPU#515) (@maleadt)
- Fix cooperative launch test. (JuliaGPU#517) (@maleadt)
- Fixes for Windows (JuliaGPU#518) (@maleadt)
- CUTENSOR fixes on Windows (JuliaGPU#519) (@maleadt)

v2.0.2

## CUDA v2.0.2

[Diff since v2.0.1](JuliaGPU/CUDA.jl@v2.0.1...v2.0.2)


**Closed issues:**
- cu() behavior for complex floating point numbers (JuliaGPU#91)
- Error when following example on using multiple GPUs on multiple processes (JuliaGPU#468)
- MacOS without nvidia GPU is trying to download CUDA111 on julia nightly (JuliaGPU#469)
- Drop BinaryProvider? (JuliaGPU#474)
- Latest version of master doesn't work on Windows (JuliaGPU#477)
- `sum(CUDA.rand(3,3))` broken (JuliaGPU#480)
- copyto!() between cpu and gpu with subarrays (JuliaGPU#491)

**Merged pull requests:**
- Adapt to GPUCompiler changes. (JuliaGPU#458) (@maleadt)
- Fix initialization of global state (JuliaGPU#471) (@maleadt)
- Remove 'view' implementation. (JuliaGPU#472) (@maleadt)
- Workaround new artifact"" eagerness that prevents loading on unsupported platforms (JuliaGPU#473) (@ianshmean)
- Remove BinaryProvider dep. (JuliaGPU#475) (@maleadt)
- typo: libcuda.dll -> libcuda.so on Linux (JuliaGPU#476) (@Alexander-Barth)
- NFC array simplifications. (JuliaGPU#481) (@maleadt)
- Update manifest (JuliaGPU#485) (@github-actions[bot])
- Convert AbstractArray{ComplexF64} to CuArray{ComplexF32} by default (JuliaGPU#489) (@pabloferz)