Tags: Budhyant/CUDA.jl
## CUDA v2.6.1

[Diff since v2.6.0](JuliaGPU/CUDA.jl@v2.6.0...v2.6.1)

**Closed issues:**
- CUDA 11.2 (JuliaGPU#601)
- LLVM not found (JuliaGPU#681)

**Merged pull requests:**
- Automatic task-based concurrency using local streams (JuliaGPU#662) (@maleadt)
- add beta keyword to conv (JuliaGPU#672) (@jw3126)
- Protect the kernel closure from GC collection. (JuliaGPU#674) (@maleadt)
- Track external globals, use it to avoid needless exception flags (JuliaGPU#675) (@maleadt)
- Adapt to GPUCompiler changes. (JuliaGPU#676) (@maleadt)
- Minor improvements (JuliaGPU#677) (@maleadt)
- CompatHelper: add new compat entry for "Memoize" at version "0.4" (JuliaGPU#678) (@github-actions[bot])
- Use released GPUCompiler. (JuliaGPU#683) (@maleadt)
- v2.6.1 (JuliaGPU#684) (@maleadt)
## CUDA v2.6.0

[Diff since v2.5.0](JuliaGPU/CUDA.jl@v2.5.0...v2.6.0)

**Closed issues:**
- Invalid results due to shared memory + multiple function exits (?) mysteriously solved by @cuprintf (JuliaGPU#43)
- NVML-related segfault on Windows (JuliaGPU#610)
- @cuda with config keyword sometimes allocate lots of memory (JuliaGPU#643)
- Can someone with push access run the TagBot workflow? (JuliaGPU#644)
- Taking gradient with Flux results in NaNs when using CUDA arrays but not when using CPU arrays (JuliaGPU#657)
- Broadcasting fails in a special case (JuliaGPU#658)
- view causes KeyError in alias (JuliaGPU#661)
- PTXCompilerTarget error when creating a CuArray with Float64 (JuliaGPU#664)
- Complex dot product performance of CuArrays and of StructArrays of CuArrays (JuliaGPU#667)
- could not load cublas64_11.dll (JuliaGPU#670)

**Merged pull requests:**
- CUDA quicksort (JuliaGPU#431) (@xaellison)
- Bump Reexport to 1.0 (JuliaGPU#640) (@DhairyaLGandhi)
- Use newer NVML initialization method. (JuliaGPU#641) (@maleadt)
- README: add some information on viewing capabilities of your devices (JuliaGPU#642) (@DilumAluthge)
- Remove duplicate functions. (JuliaGPU#645) (@maleadt)
- Use released version of Adapt.jl (JuliaGPU#646) (@maleadt)
- Simplify list of tests to skip. (JuliaGPU#647) (@maleadt)
- Use a test-specific Project.toml. (JuliaGPU#648) (@maleadt)
- Use raw output for CUBLAS log message. (JuliaGPU#649) (@maleadt)
- Close the async condition used to call host functions. (JuliaGPU#650) (@maleadt)
- Backports for Julia 1.5 / CUDA 2.4 (JuliaGPU#651) (@maleadt)
- Allow running benchmarks outside of the master branch on other systems. (JuliaGPU#652) (@maleadt)
- Bump GPUCompiler. (JuliaGPU#653) (@maleadt)
- Reuse the compiler when generating SASS code. (JuliaGPU#654) (@maleadt)
- Run the tests from the current directory. (JuliaGPU#655) (@maleadt)
- Configure the PTX GPUCompiler codegen quirks. (JuliaGPU#656) (@maleadt)
- Update manifest (JuliaGPU#660) (@github-actions[bot])
- Support view on unmanaged arrays. (JuliaGPU#663) (@maleadt)
- Retry CuModule creation when OOM. (JuliaGPU#665) (@maleadt)
- Make fill async. (JuliaGPU#669) (@maleadt)
- Fix version lookups. (JuliaGPU#671) (@maleadt)
- Update manifest (JuliaGPU#673) (@github-actions[bot])
## CUDA v2.4.1

[Diff since v2.4.0](JuliaGPU/CUDA.jl@v2.4.0...v2.4.1)

**Closed issues:**
- `cudaconvert` for closures (JuliaGPU#67)
- Invalid results due to shared memory + multiple function exits (?) mysteriously solved by @cuprintf (JuliaGPU#43)
- NVML-related segfault on Windows (JuliaGPU#610)
- Update Reexport compat (JuliaGPU#629)
- Incomplete CUDA device attributes list (JuliaGPU#637)
- @cuda with config keyword sometimes allocate lots of memory (JuliaGPU#643)
- Can someone with push access run the TagBot workflow? (JuliaGPU#644)
- Taking gradient with Flux results in NaNs when using CUDA arrays but not when using CPU arrays (JuliaGPU#657)
- Broadcasting fails in a special case (JuliaGPU#658)
- view causes KeyError in alias (JuliaGPU#661)
- PTXCompilerTarget error when creating a CuArray with Float64 (JuliaGPU#664)
- Complex dot product performance of CuArrays and of StructArrays of CuArrays (JuliaGPU#667)

**Merged pull requests:**
- CUDA quicksort (JuliaGPU#431) (@xaellison)
- cudaconvert captured values in closures. (JuliaGPU#625) (@maleadt)
- CompatHelper: only instantiate `/Manifest.toml` (the manifest file in the root of the repository) (JuliaGPU#631) (@DilumAluthge)
- CompatHelper: bump compat for "Reexport" to "1.0" (JuliaGPU#633) (@github-actions[bot])
- CompatHelper: bump compat for "AbstractFFTs" to "1.0" (JuliaGPU#634) (@github-actions[bot])
- Update wrappers (JuliaGPU#638) (@maleadt)
- Bump artifacts for Windows/Julia 1.6 compatibility. (JuliaGPU#639) (@maleadt)
- Bump Reexport to 1.0 (JuliaGPU#640) (@DhairyaLGandhi)
- Use newer NVML initialization method. (JuliaGPU#641) (@maleadt)
- README: add some information on viewing capabilities of your devices (JuliaGPU#642) (@DilumAluthge)
- Remove duplicate functions. (JuliaGPU#645) (@maleadt)
- Use released version of Adapt.jl (JuliaGPU#646) (@maleadt)
- Simplify list of tests to skip. (JuliaGPU#647) (@maleadt)
- Use a test-specific Project.toml. (JuliaGPU#648) (@maleadt)
- Use raw output for CUBLAS log message. (JuliaGPU#649) (@maleadt)
- Close the async condition used to call host functions. (JuliaGPU#650) (@maleadt)
- Backports for Julia 1.5 / CUDA 2.4 (JuliaGPU#651) (@maleadt)
- Allow running benchmarks outside of the master branch on other systems. (JuliaGPU#652) (@maleadt)
- Bump GPUCompiler. (JuliaGPU#653) (@maleadt)
- Reuse the compiler when generating SASS code. (JuliaGPU#654) (@maleadt)
- Run the tests from the current directory. (JuliaGPU#655) (@maleadt)
- Configure the PTX GPUCompiler codegen quirks. (JuliaGPU#656) (@maleadt)
- Update manifest (JuliaGPU#660) (@github-actions[bot])
- Support view on unmanaged arrays. (JuliaGPU#663) (@maleadt)
- Retry CuModule creation when OOM. (JuliaGPU#665) (@maleadt)
- Make fill async. (JuliaGPU#669) (@maleadt)
## CUDA v2.5.0

[Diff since v2.4.0](JuliaGPU/CUDA.jl@v2.4.0...v2.5.0)
## CUDA v2.4.0

[Diff since v2.3.0](JuliaGPU/CUDA.jl@v2.3.0...v2.4.0)

**Closed issues:**
- cublasXtStrmm test failures on Windows 10 Julia 1.1 (JuliaGPU#124)
- CUSPARSE tests broken (JuliaGPU#259)
- Make @cuda return a kernel object (JuliaGPU#341)
- Depend on CompilerSupportLibraries (JuliaGPU#359)
- CUBLAS and exceptions test failures on Windows (JuliaGPU#536)
- argmax(::CuArray) returns nothing with NaN-values (JuliaGPU#553)
- Multiple @cuDynamicSharedMem in kernel causes unexpected behavior (JuliaGPU#555)
- Illegal memory access with atomic shared memory (JuliaGPU#558)
- CUDA.sqrt will not found symbol "__nv_sqrt" (JuliaGPU#559)
- Exception with CUDA.exp (JuliaGPU#561)
- Use LazyArtifacts instead of Pkg (JuliaGPU#570)
- Test runner: early bail out (JuliaGPU#578)
- memory reporting issue (JuliaGPU#579)
- c[3:4]=0 leads to exception (JuliaGPU#580)
- Add math ops (including broadcast) for half types (JuliaGPU#581)
- Dot product of Array and CuArray fails with CPU address error. (JuliaGPU#586)
- Support for CUDA-capable GPU with compute capability 4.0 like GTX 1080 (JuliaGPU#587)
- mapreducedim! not threadsafe (JuliaGPU#588)
- Allow separate directories for cuda and cudnn (JuliaGPU#590)
- Difficulties installing CUDA on Julia 1.6.0 . (JuliaGPU#591)
- Bug in Initialisation Error (JuliaGPU#603)
- CUDA.jl initialisation fails after suspending Ubuntu 20.04 with CUDA 11.2 (JuliaGPU#605)
- CUDA 11.2 CUBLASError and "CUDA.jl does not yet support CUDA with nvdisasm 11.2.67" (JuliaGPU#607)
- This intrinsic must be compiled to be called (JuliaGPU#611)
- OpenGL interop (JuliaGPU#612)
- Add support for CuFFT callback functions (JuliaGPU#614)
- I can’t multiply a CSR sparse matrix anymore (JuliaGPU#615)
- Julia version requirement (JuliaGPU#619)

**Merged pull requests:**
- Support all combinations of datatypes and transposes/adjoints in LinearAlgebra (JuliaGPU#535) (@cqql)
- Use structs for texture intrinsic return types. (JuliaGPU#554) (@maleadt)
- Backport some 1.6 fixes (JuliaGPU#557) (@maleadt)
- Update manifest (JuliaGPU#560) (@github-actions[bot])
- Correct dims error (JuliaGPU#562) (@DhairyaLGandhi)
- Lock `_shmem_cb` (JuliaGPU#564) (@vchuravy)
- Move to Julia 1.6 (JuliaGPU#566) (@maleadt)
- Adapt to JuliaLang/julia#38487. (JuliaGPU#568) (@maleadt)
- Support for 'delayed kernels' (JuliaGPU#569) (@maleadt)
- Run cuda-memcheck as part of CI (JuliaGPU#571) (@maleadt)
- Use at-sync instead of calls to synchronize in tests. (JuliaGPU#572) (@maleadt)
- Update artifacts to include cuda-memcheck (JuliaGPU#573) (@maleadt)
- Use LazyArtifacts instead of Pkg. (JuliaGPU#574) (@maleadt)
- Improve LinearAlgebra impl methods for triangular types (JuliaGPU#575) (@maleadt)
- New findmin/max implementation using single-pass reduction (JuliaGPU#576) (@maleadt)
- Fix synchronization before testing cublasXt calls. (JuliaGPU#577) (@maleadt)
- Fix used memory reporting. (JuliaGPU#582) (@maleadt)
- Implement Statistics.varm/stdm instead of Statistics._var (JuliaGPU#583) (@sdewaele)
- Test for JuliaGPU#558. (JuliaGPU#584) (@maleadt)
- Add a quick failure option to the test runner. (JuliaGPU#585) (@maleadt)
- Add lock around `cfunction` lookup (JuliaGPU#589) (@vchuravy)
- Catch all initialization errors. (JuliaGPU#593) (@maleadt)
- Update dependencies. (JuliaGPU#596) (@maleadt)
- Fix wrong initialisation error message (JuliaGPU#604) (@qin-yu)
- Fixes wrong spacing in docstring admonition (JuliaGPU#608) (@navidcy)
- Fix broadcasting with Base.angle (JuliaGPU#618) (@marius311)
- Test with the 1.6 nightly, not 1.7. (JuliaGPU#620) (@maleadt)
- Wrap cudaGL.h (JuliaGPU#621) (@maleadt)
- Initial compatibility with CUDA 11.2. (JuliaGPU#622) (@maleadt)
- 1.5 compatibility release (JuliaGPU#623) (@maleadt)
- Add CUDA 11.2 artifacts. (JuliaGPU#624) (@maleadt)
## CUDA v2.3.0

[Diff since v2.2.1](JuliaGPU/CUDA.jl@v2.2.1...v2.3.0)

**Closed issues:**
- Misaligned address on load from `Const` (JuliaGPU#548)

**Merged pull requests:**
- Allow `PermutedDimsArray` in `gemm_strided_batched` (JuliaGPU#539) (@mcabbott)
- Fix broken checkbounds for CuSparseMatrixCSR and tests (JuliaGPU#545) (@achuchmala)
- Emphasize rebooting option. (JuliaGPU#547) (@xanfus)
- fix address calculation for ldg (JuliaGPU#549) (@vchuravy)
- Don't use explicit per-stream threads. (JuliaGPU#551) (@maleadt)
## CUDA v2.2.0

[Diff since v2.1.0](JuliaGPU/CUDA.jl@v2.1.0...v2.2.0)

**Closed issues:**
- cudnn missing after downloading artifact (JuliaGPU#521)
- Downloading artifact: CUDA110 when using DiffEqFlux (JuliaGPU#542)

**Merged pull requests:**
- Update manifest (JuliaGPU#520) (@github-actions[bot])
- Try out Buildkite. (JuliaGPU#522) (@maleadt)
- Update manifest (JuliaGPU#529) (@github-actions[bot])
- Support for / Upgrade to CUDA 11.1 update 1. (JuliaGPU#530) (@maleadt)
- Fix and test svd! (JuliaGPU#531) (@maleadt)
- Move more CI to Buildkite. (JuliaGPU#532) (@maleadt)
- Use type symbols to generate wrapper methods (JuliaGPU#534) (@cqql)
- Fully move to Buildkite. (JuliaGPU#537) (@maleadt)
- Add unit_diag option for sv2! functions (JuliaGPU#540) (@amontoison)
- Documentation fixes (JuliaGPU#543) (@maleadt)
## CUDA v2.1.0

[Diff since v2.0.2](JuliaGPU/CUDA.jl@v2.0.2...v2.1.0)

**Closed issues:**
- CUDNN convolution with Float16 always returns zeros (JuliaGPU#92)
- axp(b)y! and mul! (scalar multiplication) with mixed argument types (JuliaGPU#144)
- Dispatching to generic matmul instead of CUBLAS (JuliaGPU#164)
- Support for Ints and Float16? (JuliaGPU#165)
- Subarrays/views support (JuliaGPU#172)
- Easy way to pick among multiple GPUs (JuliaGPU#174)
- More prominently document JULIA_CUDA_USE_BINARYBUILDER (JuliaGPU#204)
- ERROR_COOPERATIVE_LAUNCH_TOO_LARGE during tests (JuliaGPU#247)
- Pkg.test error for cutensor test on Windows (JuliaGPU#422)
- Runtime build improvements (JuliaGPU#456)
- Fusing Wrappers (JuliaGPU#467)
- Could not find nvToolsExt (libnvToolsExt.dylib.1.0 or libnvToolsExt.dylib.1) in /Users/imac/.julia/artifacts/b502baf54095dff4a69fd6aba8667124583f6929/lib (JuliaGPU#482)
- mapreduce assumes commutative op (JuliaGPU#484)
- SubArray Broadcast Bug in 2.0 (JuliaGPU#488)
- Nested SubArray Scalar Indexing (JuliaGPU#490)
- Sparse matrix * view(vector) regression in 2.0 (JuliaGPU#493)
- Error transforming a reshaped 0-dimentional GPU array to a CPU array (JuliaGPU#494)
- test cuda FAILURE (JuliaGPU#496)
- Reshaped CuArray is not DenseCuArray (JuliaGPU#511)
- assignment failure when using array slicing. (JuliaGPU#516)

**Merged pull requests:**
- Use the correct CUDNN scaling parameter type. (JuliaGPU#454) (@maleadt)
- Fix versioned dylib discovery. (JuliaGPU#486) (@maleadt)
- Move inv from GPUArrays. (JuliaGPU#487) (@maleadt)
- Use dense array types in sparse wrappers. (JuliaGPU#495) (@maleadt)
- Update manifest (JuliaGPU#497) (@github-actions[bot])
- Revert array wrapper union changes (JuliaGPU#498) (@maleadt)
- Clean-up pointer field. (JuliaGPU#499) (@maleadt)
- mapreduce: change iteration for compatibility with non-commutative operators. (JuliaGPU#500) (@maleadt)
- Use versioned libcuda (JuliaGPU#502) (@maleadt)
- Dynamically choose versioned libcuda (JuliaGPU#503) (@mustafaquraish)
- Update multigpu.md (JuliaGPU#504) (@efmanu)
- Upgrade artifacts for CUDA 11 compatibility. (JuliaGPU#506) (@maleadt)
- Update dependencies. (JuliaGPU#507) (@maleadt)
- Convert unsigned short ints to Cint for printf. (JuliaGPU#508) (@maleadt)
- Update manifest (JuliaGPU#510) (@github-actions[bot])
- Fix reshape with missing dimensions. (JuliaGPU#512) (@maleadt)
- Don't return a pointer from 'alias'. (JuliaGPU#513) (@maleadt)
- Add some docs (JuliaGPU#514) (@maleadt)
- Fix CUDNN-optimized activation broadcasts (JuliaGPU#515) (@maleadt)
- Fix cooperative launch test. (JuliaGPU#517) (@maleadt)
- Fixes for Windows (JuliaGPU#518) (@maleadt)
- CUTENSOR fixes on Windows (JuliaGPU#519) (@maleadt)
## CUDA v2.0.2

[Diff since v2.0.1](JuliaGPU/CUDA.jl@v2.0.1...v2.0.2)

**Closed issues:**
- cu() behavior for complex floating point numbers (JuliaGPU#91)
- Error when following example on using multiple GPUs on multiple processes (JuliaGPU#468)
- MacOS without nvidia GPU is trying to download CUDA111 on julia nightly (JuliaGPU#469)
- Drop BinaryProvider? (JuliaGPU#474)
- Latest version of master doesn't work on Windows (JuliaGPU#477)
- `sum(CUDA.rand(3,3))` broken (JuliaGPU#480)
- copyto!() between cpu and gpu with subarrays (JuliaGPU#491)

**Merged pull requests:**
- Adapt to GPUCompiler changes. (JuliaGPU#458) (@maleadt)
- Fix initialization of global state (JuliaGPU#471) (@maleadt)
- Remove 'view' implementation. (JuliaGPU#472) (@maleadt)
- Workaround new artifact"" eagerness that prevents loading on unsupported platforms (JuliaGPU#473) (@ianshmean)
- Remove BinaryProvider dep. (JuliaGPU#475) (@maleadt)
- typo: libcuda.dll -> libcuda.so on Linux (JuliaGPU#476) (@Alexander-Barth)
- NFC array simplifications. (JuliaGPU#481) (@maleadt)
- Update manifest (JuliaGPU#485) (@github-actions[bot])
- Convert AbstractArray{ComplexF64} to CuArray{ComplexF32} by default (JuliaGPU#489) (@pabloferz)