Pulse · triton-lang/triton · GitHub

February 14, 2025 – February 21, 2025

Overview

50 Active pull requests

12 Active issues

34 Pull requests merged by 19 people

[BACKEND] Revert smem layout heuristic added in PR#5924
#5983 merged Feb 21, 2025
[LLVM] Bump to llvm/llvm-project@c78cb3028363
#5981 merged Feb 21, 2025
[Blackwell] Fallback to MMAv2 for numWarps other than 4 or 8
#5978 merged Feb 21, 2025
[AMD] Rework MFMA intrinsic mapping queries
#5937 merged Feb 20, 2025
Propagate DotOp thru Join & improve shmem load into LinearEnc
#5924 merged Feb 20, 2025
[Backend] Try to fix infinite loop in membar
#5973 merged Feb 20, 2025
[Analysis] Use verify-diagnostics for print-based tests (NFC)
#5970 merged Feb 20, 2025
[BACKEND] Enable generic reduction on all layouts
#5962 merged Feb 20, 2025
[AMD]Support Scale is None in DotScaledOp in gfx950
#5931 merged Feb 19, 2025
[AMD] Update smem size for cdna4
#5964 merged Feb 19, 2025
[BACKEND] enable lld
#5907 merged Feb 19, 2025
Remove git commit hash in wheel name when building from release branch
#5953 merged Feb 19, 2025
[Frontend] Support returning tensor descriptor from functions
#5958 merged Feb 19, 2025
[AMD]Make AMDGPUAccelerateMatmul depend on TritonAMDGPUDialect
#5959 merged Feb 18, 2025
[NFC][BACKEND] Simplify reduce helpers
#5954 merged Feb 18, 2025
[AMD]Fix an error in the cache modifier bit setting
#5948 merged Feb 18, 2025
[AMD] Skip scalar and 1D tensor load for sinkSecondLoad
#5955 merged Feb 18, 2025
[AMD] Fix loop trip count for scf.while in ConvertToBufferOps
#5952 merged Feb 18, 2025
[RELAND] [BC breaking] [FRONTEND] Throw an error when we would downcast an integral constant to a dtype it does not fit in (#5866)
#5926 merged Feb 18, 2025
[TritonGPU] Fix crash in Accelerate matmul
#5949 merged Feb 18, 2025
[BACKEND] Fix dereference nullptr
#5944 merged Feb 17, 2025
[AMD] Revert using llvm.intr.masked.{load|store}
#5913 merged Feb 17, 2025
[AMD-Pipeline] Add multi-stage global/local prefetch
#5353 merged Feb 17, 2025
[Blackwell] Fix test_pipeliner.py breakage
#5940 merged Feb 17, 2025
[FRONTEND] Cache and annotate the TRITON_F32_DEFAULT env variable
#5942 merged Feb 17, 2025
[AMD] Fix failing tests due to mid-air collision
#5943 merged Feb 17, 2025
[AMD] Improve ConvertToBufferOps with range analysis
#5563 merged Feb 16, 2025
[NFC] Remove duplicate test parameters for test_dot
#5938 merged Feb 16, 2025
Fix incorrect kernel compilation in batched matmul (#5620)
#5936 merged Feb 16, 2025
[BACKEND] Add arith::CeilFloorDivExpandOpsPatterns
#5934 merged Feb 16, 2025
[AMD] Add MLIR Remark Messages when the Ping Pong Scheduler Succeeds
#5914 merged Feb 16, 2025
[LAYOUTS] Make operator* associative and dimension-order-preserving
#5928 merged Feb 15, 2025
[AMD] Fix buffer cache modifier test index out of range
#5904 merged Feb 15, 2025
[FRONTEND] Fix default values of tl.range
#5932 merged Feb 14, 2025

16 Pull requests opened by 11 people

[Blackwell] Support narrower TMEM messages and shapes
#5945 opened Feb 17, 2025
[Blackwell] Propagate TMA attributes from MMA operand
#5947 opened Feb 18, 2025
typeConverter to llvm support addressSpace attribute
#5951 opened Feb 18, 2025
[PROTON-DEV] proton dialect to protongpu dialect lowering
#5956 opened Feb 18, 2025
[python][compiler] Implement CompilationListener to report compile times
#5957 opened Feb 18, 2025
[AMD] Turn buffer ops support on by default
#5960 opened Feb 18, 2025
[FRONTEND] [BC Breaking] Require global variables to be instantiated as constexpr ob…
#5961 opened Feb 19, 2025
[Backend] Plumb `ttg.warp_specialize` through LLVM lowering
#5963 opened Feb 19, 2025
[AMD] replace `rocm_lld` with lld API call
#5966 opened Feb 19, 2025
[WIP][DNR] Codegen for `ttg.warp_specialize`
#5968 opened Feb 20, 2025
[backend] Update LLVM version to https://github.com/llvm/llvm-project/commit/386af4a5c64ab75eaee2448dc38f2e34a40bfed0
#5974 opened Feb 20, 2025
[AMD] [DEBUG] Added LLVM Debug messages for when the pingpong scheduler fails
#5975 opened Feb 20, 2025
[AMD][NFC] refactor RangeAnalysis
#5977 opened Feb 20, 2025
[AMD] Remove non-linear-layout-based local load pattern
#5979 opened Feb 21, 2025
[LAYOUTS] Allow DistributedEncoding attributes to override get[Total]ElemsPerThread()
#5980 opened Feb 21, 2025
[Interface] Add dot interface methods to get A/B tensor
#5984 opened Feb 21, 2025

3 Issues closed by 3 people

Setting `TRITON_F32_DEFAULT` does not trigger recompilation
#5941 closed Feb 17, 2025
tl.dot with batched (3D) input is working only in emulation mode when os.environ['TRITON_INTERPRET'] = '1' is set.
#5620 closed Feb 16, 2025
I couldn't find this package in PIP... why...
#5935 closed Feb 15, 2025

9 Issues opened by 9 people

Addition Incorrect
#5972 opened Feb 20, 2025
Bug in tutorials/06-fused-attention.py: test_op assertion fails for specific input.
#5971 opened Feb 20, 2025
Nightly install
#5967 opened Feb 19, 2025
errors introduced by scalars in Interpreter mode
#5965 opened Feb 19, 2025
Does Triton support new features of Blackwell for RTX5090 and 5080?
#5950 opened Feb 18, 2025
Cache Modifier '.cs' Not Supported for LOAD
#5946 opened Feb 18, 2025
Triton kernel not compiling with multiple threads and GPUs
#5933 opened Feb 15, 2025
Upstream LLVM SLP vectorizer change requires the correct triple
#5930 opened Feb 14, 2025
fatal : Unsupported .version 8.6; current version is '8.5'
#5929 opened Feb 14, 2025

22 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

[AMD] [FrontEnd] Optimize is_within_2gb and only enable with buffer ops supported
#5898 commented on Feb 18, 2025 • 8 new comments
[AMD][Atomic] Fix fp16 atomic operation
#5839 commented on Feb 20, 2025 • 2 new comments
Try Blackwell CI
#5922 commented on Feb 20, 2025 • 0 new comments
[PIPELINE] Refactor loop lowering.
#5918 commented on Feb 14, 2025 • 0 new comments
cache: add the triton version to the json metadata
#5912 commented on Feb 20, 2025 • 0 new comments
Protect autotuner with synchronization
#5893 commented on Feb 15, 2025 • 0 new comments
Add triton 3.13t builds - DO NOT Merge
#5455 commented on Feb 18, 2025 • 0 new comments
[NVIDIA][Backend] fix the wrong comment
#5305 commented on Feb 15, 2025 • 0 new comments
Is Triton unable to install in python 3.10 versions?
#1057 commented on Feb 21, 2025 • 0 new comments
IndexError: map::at with RTX 2080Ti
#4813 commented on Feb 20, 2025 • 0 new comments
FP8 GEMM implemented with triton is slower on Ada (SM89)
#5583 commented on Feb 20, 2025 • 0 new comments
Gather does not work if index is much longer than value
#5836 commented on Feb 20, 2025 • 0 new comments
SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
#5919 commented on Feb 19, 2025 • 0 new comments
`tl.cumsum(i1)` computes `tl.cumsum_xor`
#5897 commented on Feb 19, 2025 • 0 new comments
No wheels for arm
#5561 commented on Feb 19, 2025 • 0 new comments
Wrong source read by triton.jit in some python versions
#1589 commented on Feb 19, 2025 • 0 new comments
Triton import is broken in Python 3.7 in triton 2.0.0.post1
#1727 commented on Feb 18, 2025 • 0 new comments
Potential Bug in **_attn_fwd_tma** Function
#5816 commented on Feb 17, 2025 • 0 new comments
Accessing slices of a tensor
#656 commented on Feb 17, 2025 • 0 new comments
Questions about the tutorial fused-attention
#3700 commented on Feb 15, 2025 • 0 new comments
Back-to-back BMMs failed with Triton nightly
#5424 commented on Feb 15, 2025 • 0 new comments
Is there a plan to support Windows?
#1640 commented on Feb 14, 2025 • 0 new comments