Tags: frapac/CUDA.jl

v3.11.0

## CUDA v3.11.0

[Diff since v3.10.1](JuliaGPU/CUDA.jl@v3.10.1...v3.11.0)


**Closed issues:**
- CUSPARSE: Diagonal + CSC/CSR gives dense array (JuliaGPU#1469)
- CUBLAS: Multiplication of `UpperTriangular`/`LowerTriangular` not supported (JuliaGPU#1486)
- CUTENSOR tests consume lots of memory, breaking other tests (JuliaGPU#1501)
- CUFFT doesn't work for ComplexF64 C2C in-place (JuliaGPU#1519)
- Inconsistency of `==` and `isequal` for `CuArray` (JuliaGPU#1524)
- Setting CUDA seed the first time changes Random's RNG non-deterministically (JuliaGPU#1526)
- Undefined exported symbols (JuliaGPU#1527)
- Could not load library libLLVMExtra-14.dll (JuliaGPU#1535)
- Add an `rrule` for `cholesky` to `CUDA.jl` (JuliaGPU#1541)

**Merged pull requests:**
- specialize +/- op for sparse diag (JuliaGPU#1514) (@Roger-luo)
- Make sure instantiating RNGs doesn't affect the global CPU RNG. (JuliaGPU#1530) (@maleadt)
- Update manifest (JuliaGPU#1531) (@github-actions[bot])
- `ldiv!` for LU Decomposition (JuliaGPU#1532) (@SBuercklin)
- Lower dmax for contraction tests (JuliaGPU#1534) (@kshyatt)
- Fix convolution algorithm search (JuliaGPU#1536) (@maxfreu)
- Update manifest (JuliaGPU#1537) (@github-actions[bot])
- add specializations for some triangular-triangular multiplications (JuliaGPU#1538) (@Red-Portal)
- Add a utility to download artifacts without a functional driver. (JuliaGPU#1539) (@maleadt)
- Update manifest (JuliaGPU#1543) (@github-actions[bot])
- Explicit tests for type conversion (JuliaGPU#1544) (@kshyatt)
- Remove unused exports. (JuliaGPU#1545) (@maleadt)

v3.10.1

## CUDA v3.10.1

[Diff since v3.10.0](JuliaGPU/CUDA.jl@v3.10.0...v3.10.1)


**Closed issues:**
- Overflow in `randn` using CUDA.jl's native RNG (JuliaGPU#1464)
- Segmentation fault with pre-compiled library importing CUDA (JuliaGPU#1465)
- Julia freezes when using Polynomials with CuArray (JuliaGPU#1497)
- Launch overhead regression (JuliaGPU#1503)
- CUSOLVER: Matrix division requires identical types (JuliaGPU#1512)
- Incorrect distribution for complex standard normals when using `CUDA.default_rng()` (JuliaGPU#1515)
- loggamma (JuliaGPU#1528)

**Merged pull requests:**
- CUSPARSE: Support mixed type mv (JuliaGPU#1475) (@Roger-luo)
- Add method for LinearAlgebra.opnorm2 (JuliaGPU#1516) (@danielwe)
- Promote to common eltype in matrix division (JuliaGPU#1517) (@danielwe)
- Fix Box-Muller transformation for complex eltypes (JuliaGPU#1518) (@danielwe)
- Update manifest (JuliaGPU#1521) (@github-actions[bot])
- Use at-dispose for LLVM.jl resource cleanup. (JuliaGPU#1523) (@maleadt)
- loggamma (JuliaGPU#1529) (@cossio)

v3.10.0

## CUDA v3.10.0

[Diff since v3.9.1](JuliaGPU/CUDA.jl@v3.9.1...v3.10.0)


**Closed issues:**
- `Error while freeing DeviceBuffer`-warning when using multiple GPUs (JuliaGPU#1454)
- CUDNN cache locking prevents finalizers resulting in OOMs (JuliaGPU#1461)
- EOFError from pool_cleanup when closing REPL (JuliaGPU#1495)
- TypeError in compiler with custom kernel (JuliaGPU#1496)

**Merged pull requests:**
- expose sparse mv/mm algo selection (JuliaGPU#1201) (@Roger-luo)
- Always inspect the task-local context when verifying before freeing. (JuliaGPU#1462) (@maleadt)
- support sparse opnorm (JuliaGPU#1466) (@Roger-luo)
- Move CUSTATEVEC and CUTENSORNET into lib/ (JuliaGPU#1478) (@vchuravy)
- Adapt to GPUCompiler 0.15 changes (JuliaGPU#1488) (@maleadt)
- Limit time held by CUDNN locks. (JuliaGPU#1491) (@maleadt)
- Docstring for `cu` (JuliaGPU#1493) (@mcabbott)
- Update manifest (JuliaGPU#1499) (@github-actions[bot])
- Silence EOFError in pool_cleanup (JuliaGPU#1502) (@Octogonapus)
- Adapt to GPUCompiler changes (JuliaGPU#1504) (@maleadt)
- Fixes for CUSPARSE 11.7.1. (JuliaGPU#1505) (@maleadt)
- Update artifacts (JuliaGPU#1507) (@maleadt)
- Update manifest (JuliaGPU#1509) (@github-actions[bot])
- Add a new cache for HostKernel objects. (JuliaGPU#1510) (@maleadt)

v3.9.1

## CUDA v3.9.1

[Diff since v3.9.0](JuliaGPU/CUDA.jl@v3.9.0...v3.9.1)


**Closed issues:**
- Issue with copy_cublasfloat (JuliaGPU#1476)
- Errors when broadcasting random number generators (JuliaGPU#1480)
- CPU version of linear algebra routine is dispatched when using `Zygote.gradient` (JuliaGPU#1481)
- `scan!` fails on vectors of structs (JuliaGPU#1482)
- InexactError when getting CUDA version info (JuliaGPU#1489)

**Merged pull requests:**
- Allow more integer argument types for byte_perm (JuliaGPU#1420) (@eschnett)
- support CuSparseMatrix(::Diagonal) (JuliaGPU#1470) (@Roger-luo)
- Don't emit debug info until the next CUDA version. (JuliaGPU#1473) (@maleadt)
- Update manifest (JuliaGPU#1474) (@github-actions[bot])
- Update manifest (JuliaGPU#1479) (@github-actions[bot])
- fix unsafe_wrap docstring and widen signature (JuliaGPU#1483) (@piever)
- Update manifest (JuliaGPU#1484) (@github-actions[bot])
- Check whether cudaRuntimeGetVersion succeeded. (JuliaGPU#1490) (@maleadt)
- Update manifest (JuliaGPU#1494) (@github-actions[bot])
- Fix JuliaGPU#1476: Allow any container in copy_cublasfloat (JuliaGPU#1498) (@danielwe)

v3.9.0

## CUDA v3.9.0

[Diff since v3.8.5](JuliaGPU/CUDA.jl@v3.8.5...v3.9.0)


**Closed issues:**
- Tests for showing (JuliaGPU#35)
- Support LU factorizations (JuliaGPU#1193)
- Int8 WMMA not working in 3.8.4 and 3.8.5 despite merged PR. Add more unit tests? (JuliaGPU#1442)
- Optional CPU kernel call with @cuda (JuliaGPU#1443)
- Add library/artifact management for NCCL (JuliaGPU#1446)
- permutedims returns a lowertriangular matrix (JuliaGPU#1451)
- New broadcast corrupts memory? (JuliaGPU#1457)
- norm does not dispatch on CuSparseMatrixCSC (JuliaGPU#1460)
- scalar * sparse multiplication (JuliaGPU#1468)

**Merged pull requests:**
- CUTENSOR: axpy! and axpby! not mutating fixed (JuliaGPU#1416) (@yapanuwan)
- Initial wrap of cuquantum (JuliaGPU#1437) (@kshyatt)
- CompatHelper: bump compat for "GPUCompiler" to "0.14" (JuliaGPU#1441) (@github-actions[bot])
- Fix return type of nrm2 for ComplexF16 (JuliaGPU#1444) (@danielwe)
- Use a build matrix. (JuliaGPU#1445) (@maleadt)
- Update manifest (JuliaGPU#1447) (@github-actions[bot])
- Rework factorizations (JuliaGPU#1449) (@maleadt)
- Add NCCL binaries. (JuliaGPU#1450) (@maleadt)
- Support general eltypes in matrix division and SVD (JuliaGPU#1453) (@danielwe)
- Update manifest (JuliaGPU#1456) (@github-actions[bot])
- Look at more environment variables to find nsys. (JuliaGPU#1459) (@maleadt)
- Fixes for 1.8 (JuliaGPU#1463) (@maleadt)

v3.8.5

## CUDA v3.8.5

[Diff since v3.8.4](JuliaGPU/CUDA.jl@v3.8.4...v3.8.5)



**Merged pull requests:**
- Update manifest (JuliaGPU#1440) (@github-actions[bot])

v3.8.4

## CUDA v3.8.4

[Diff since v3.8.3](JuliaGPU/CUDA.jl@v3.8.3...v3.8.4)


**Closed issues:**
- sparse-sparse and sparse-constant multiplication lose sparsity (output dense matrix) (JuliaGPU#1264)
- LLVMExtra fails to load on Julia 1.8 and PPC (JuliaGPU#1387)
- compute-sanitizer CUDA_ERROR_INVALID_VALUE on CUDA.jl 3.0+ (JuliaGPU#1415)
- `@cudnnDescriptor` is not threadsafe (JuliaGPU#1421)
- Precompilation of CUDA 3.8.3 broken on 1.7.1 due to changes in Random123.jl (JuliaGPU#1422)
- OOM error should include memory status (JuliaGPU#1427)
- WMMA kernel works with Julia 1.7.2 but fails with `illegal memory access` for Julia 1.8.0-beta1 (JuliaGPU#1431)
- Non Int64 local memory size leads to dynamic function invocation (JuliaGPU#1434)
- "initialization" test failing (JuliaGPU#1435)
- cuda with julia 1.8 not working on windows (working fine(?) on wsl2) (JuliaGPU#1436)

**Merged pull requests:**
- Add Int8 WMMA Support (JuliaGPU#1119) (@max-Hawkins)
- Wrap generic sparse-sparse GEMM (JuliaGPU#1285) (@kshyatt)
- Fix sparse COO to CSR conversion. (JuliaGPU#1412) (@maleadt)
- Drop support for CUDA 10.1 and below (JuliaGPU#1414) (@maleadt)
- Update manifest (JuliaGPU#1417) (@github-actions[bot])
- Report the OOM memory status at the time of the error. (JuliaGPU#1428) (@maleadt)
- Lock CUDNN descriptor cache lookups. (JuliaGPU#1430) (@maleadt)
- Switch to new LLVM context management for 1.9 compatibility. (JuliaGPU#1432) (@maleadt)
- Update manifest (JuliaGPU#1433) (@github-actions[bot])
- Backports for 3.8.4 (JuliaGPU#1438) (@maleadt)

v3.8.3

## CUDA v3.8.3

[Diff since v3.8.2](JuliaGPU/CUDA.jl@v3.8.2...v3.8.3)


**Closed issues:**
- Sparse matrix addition not working (JuliaGPU#528)
- Native implementation of sparse arrays (JuliaGPU#829)
- CUSPARSE: Adding a value to the diagonal (JuliaGPU#1372)
- Conversion by `cu` casts Float64 to Float32 but not Int64 to Int32 (JuliaGPU#1388)
- `CUDA.math_mode!(...; precision)` option not working (JuliaGPU#1392)
- `cuIpcGetMemHandle` failure resulting in CUDA-aware MPI to fail (JuliaGPU#1398)
- axpby! support for BFloat16 (JuliaGPU#1399)
- CUSPARSE does not support integer matrices, breaks printing (JuliaGPU#1402)
- `sparse(I, J, V)` doesn't support unsorted inputs (JuliaGPU#1407)

**Merged pull requests:**
- General purpose broadcast for sparse CSR matrices. (JuliaGPU#1380) (@maleadt)
- Update manifest (JuliaGPU#1389) (@github-actions[bot])
- Implement sparse operations with UniformScaling using broadcast. (JuliaGPU#1390) (@maleadt)
- Prevent toplevel compilation. (JuliaGPU#1391) (@maleadt)
- Fix and test math precision. (JuliaGPU#1394) (@maleadt)
- Bump artifacts (JuliaGPU#1397) (@maleadt)
- support BFloat16 for atomic_cas (JuliaGPU#1400) (@bjarthur)
- Implement sparse broadcasting with CSC matrices. (JuliaGPU#1401) (@maleadt)
- Always report issues with discovering CUDA. (JuliaGPU#1404) (@maleadt)
- Fix sparse 1-argument broadcast output type. (JuliaGPU#1405) (@maleadt)
- CUSPARSE BSR improvements (JuliaGPU#1409) (@maleadt)
- Support limited sparse integer arrays by bitcasting to floating point. (JuliaGPU#1410) (@maleadt)
- Support using sparse with unsorted inputs. (JuliaGPU#1411) (@maleadt)
- Backports for 3.8.3 (JuliaGPU#1413) (@maleadt)

v3.8.2

## CUDA v3.8.2

[Diff since v3.8.1](JuliaGPU/CUDA.jl@v3.8.1...v3.8.2)


**Closed issues:**
- CuSparseMatrixCSC missing lu and interactions with UniformScaling (JuliaGPU#79)
- CUSPARSE typo (JuliaGPU#1231)
- similar(A::CuSparse,eltype) returns an Array (JuliaGPU#1316)
- "errormonitor" undefined in julia1.6 (JuliaGPU#1375)
- Pool free can switch tasks (JuliaGPU#1384)

**Merged pull requests:**
- Define a compatibility shim for errormonitor (JuliaGPU#1378) (@vchuravy)
- Backport JuliaGPU#1361 to 3.8 (JuliaGPU#1379) (@vchuravy)
- Backports for 3.8.2 (JuliaGPU#1381) (@maleadt)
- Remove broken errormonitor implementation, just don't use it on 1.6. (JuliaGPU#1382) (@maleadt)
- Memory pool improvements (JuliaGPU#1383) (@maleadt)

v3.8.1

## CUDA v3.8.1

[Diff since v3.8.0](JuliaGPU/CUDA.jl@v3.8.0...v3.8.1)


**Closed issues:**
- `one(::CuMatrix)` result on cpu (JuliaGPU#142)
- Broadcasted setindex! triggers scalar setindex! (JuliaGPU#101)
- OutOfGPUMemoryError With Available Memory (JuliaGPU#1346)
- Distributions.jl with CuArrays (JuliaGPU#1347)
- Views of Flux OneHotArrays (JuliaGPU#1349)
- synchronize(blocking = false) hangs in julia 1.7 eventually (JuliaGPU#1350)
- unsupported call through a literal pointer (call to log1pf) on Julia 1.6.5 (JuliaGPU#1352)
- SpecialFunctions ^1.8 compat entry? (JuliaGPU#1354)
- Performance deprecation using `^` on Float32 (JuliaGPU#1358)
- Method definition setindex!(LinearAlgebra.Diagonal{T, V} ... overwritten in module CUDA (JuliaGPU#1364)
- [PackageCompiler] Segmentation fault with CUDA.jl in multiversioning (JuliaGPU#1365)
- Vectors in customary structs make julia stuck (JuliaGPU#1366)
- sparseCSC-dense matrix multiplication yields unstable results (JuliaGPU#1368)
- UndefVarError: parameters not defined on Windows10 (JuliaGPU#1371)

**Merged pull requests:**
- Optimize memoization helpers. (JuliaGPU#1345) (@maleadt)
- Update manifest (JuliaGPU#1348) (@github-actions[bot])
- Update manifest (JuliaGPU#1355) (@github-actions[bot])
- Fastmath improvements (JuliaGPU#1356) (@maleadt)
- Make the default pool visible when doing P2P (JuliaGPU#1357) (@maleadt)
- Fix resize of empty arrays. (JuliaGPU#1359) (@maleadt)
- CUSPARSE: add COO ctors and similar with eltype. (JuliaGPU#1360) (@maleadt)
- Add device_override for SpecialFunctions.gamma (JuliaGPU#1361) (@vchuravy)
- Implement (limited) broadcast of sparse arrays (JuliaGPU#1367) (@maleadt)
- Make nonblocking synchronization robust to errors. (JuliaGPU#1369) (@maleadt)
- Update manifest (JuliaGPU#1370) (@github-actions[bot])
- Backports for 3.8.1 (JuliaGPU#1374) (@maleadt)