Insights: triton-lang/triton
Overview
34 Pull requests merged by 19 people
-
[BACKEND] Revert smem layout heuristic added in PR#5924
#5983 merged
Feb 21, 2025 -
[LLVM] Bump to llvm/llvm-project@c78cb3028363
#5981 merged
Feb 21, 2025 -
[Blackwell] Fallback to MMAv2 for numWarps other than 4 or 8
#5978 merged
Feb 21, 2025 -
[AMD] Rework MFMA intrinsic mapping queries
#5937 merged
Feb 20, 2025 -
Propagate DotOp thru Join & improve shmem load into LinearEnc
#5924 merged
Feb 20, 2025 -
[Backend] Try to fix infinite loop in membar
#5973 merged
Feb 20, 2025 -
[Analysis] Use `verify-diagnostics` for print-based tests (NFC)
#5970 merged
Feb 20, 2025 -
[BACKEND] Enable generic reduction on all layouts
#5962 merged
Feb 20, 2025 -
[AMD]Support Scale is None in DotScaledOp in gfx950
#5931 merged
Feb 19, 2025 -
[AMD] Update smem size for cdna4
#5964 merged
Feb 19, 2025 -
[BACKEND] enable lld
#5907 merged
Feb 19, 2025 -
Remove git commit hash in wheel name when building from release branch
#5953 merged
Feb 19, 2025 -
[Frontend] Support returning tensor descriptor from functions
#5958 merged
Feb 19, 2025 -
[AMD]Make AMDGPUAccelerateMatmul depend on TritonAMDGPUDialect
#5959 merged
Feb 18, 2025 -
[NFC][BACKEND] Simplify reduce helpers
#5954 merged
Feb 18, 2025 -
[AMD]Fix an error in the cache modifier bit setting
#5948 merged
Feb 18, 2025 -
[AMD] Skip scalar and 1D tensor load for sinkSecondLoad
#5955 merged
Feb 18, 2025 -
[AMD] Fix loop trip count for scf.while in ConvertToBufferOps
#5952 merged
Feb 18, 2025 -
[TritonGPU] Fix crash in Accelerate matmul
#5949 merged
Feb 18, 2025 -
[BACKEND] Fix dereference nullptr
#5944 merged
Feb 17, 2025 -
[AMD] Revert using llvm.intr.masked.{load|store}
#5913 merged
Feb 17, 2025 -
[AMD-Pipeline] Add multi-stage global/local prefetch
#5353 merged
Feb 17, 2025 -
[Blackwell] Fix `test_pipeliner.py` breakage
#5940 merged
Feb 17, 2025 -
[FRONTEND] Cache and annotate the `TRITON_F32_DEFAULT` env variable
#5942 merged
Feb 17, 2025 -
[AMD] Fix failing tests due to mid-air collision
#5943 merged
Feb 17, 2025 -
[AMD] Improve ConvertToBufferOps with range analysis
#5563 merged
Feb 16, 2025 -
[NFC] Remove duplicate test parameters for `test_dot`
#5938 merged
Feb 16, 2025 -
Fix incorrect kernel compilation in batched matmul (#5620)
#5936 merged
Feb 16, 2025 -
[BACKEND] Add arith::CeilFloorDivExpandOpsPatterns
#5934 merged
Feb 16, 2025 -
[AMD] Add MLIR Remark Messages when the Ping Pong Scheduler Succeeds
#5914 merged
Feb 16, 2025 -
[LAYOUTS] Make operator* associative and dimension-order-preserving
#5928 merged
Feb 15, 2025 -
[AMD] Fix buffer cache modifier test index out of range
#5904 merged
Feb 15, 2025 -
[FRONTEND] Fix default values of `tl.range`
#5932 merged
Feb 14, 2025
16 Pull requests opened by 11 people
-
[Blackwell] Support narrower TMEM messages and shapes
#5945 opened
Feb 17, 2025 -
[Blackwell] Propagate TMA attributes from MMA operand
#5947 opened
Feb 18, 2025 -
typeConverter to llvm support addressSpace attribute
#5951 opened
Feb 18, 2025 -
[PROTON-DEV] proton dialect to protongpu dialect lowering
#5956 opened
Feb 18, 2025 -
[python][compiler] Implement CompilationListener to report compile times
#5957 opened
Feb 18, 2025 -
[AMD] Turn buffer ops support on by default
#5960 opened
Feb 18, 2025 -
[FRONTEND] [BC Breaking] Require global variables to be insantiated as constexpr ob…
#5961 opened
Feb 19, 2025 -
[Backend] Plumb `ttg.warp_specialize` through LLVM lowering
#5963 opened
Feb 19, 2025 -
[AMD] replace `rocm_lld` with lld API call
#5966 opened
Feb 19, 2025 -
[WIP][DNR] Codegen for `ttg.warp_specialize`
#5968 opened
Feb 20, 2025 -
[AMD] [DEBUG] Added LLVM Debug messages for when the pingpong scheduler fails
#5975 opened
Feb 20, 2025 -
[AMD][NFC] refactor RangeAnalysis
#5977 opened
Feb 20, 2025 -
[AMD] Remove non-linear-layout-based local load pattern
#5979 opened
Feb 21, 2025 -
[LAYOUTS] Allow DistributedEncoding attributes to override get[Total]ElemsPerThread()
#5980 opened
Feb 21, 2025 -
[Interface] Add dot interface methods to get A/B tensor
#5984 opened
Feb 21, 2025
3 Issues closed by 3 people
-
Setting `TRITON_F32_DEFAULT` does not trigger recompilation
#5941 closed
Feb 17, 2025 -
I couldn't find this package in PIP... why...
#5935 closed
Feb 15, 2025
9 Issues opened by 9 people
-
Addition Incorrect
#5972 opened
Feb 20, 2025 -
Bug in tutorials/06-fused-attention.py: test_op assertion fails for specific input.
#5971 opened
Feb 20, 2025 -
Nightly install
#5967 opened
Feb 19, 2025 -
errors introduced by scalars in Interpreter mode
#5965 opened
Feb 19, 2025 -
Dose Triton supports new features of Blackwell for RTX5090 and 5080?
#5950 opened
Feb 18, 2025 -
Cache Modifier '.cs' Not Supported for LOAD
#5946 opened
Feb 18, 2025 -
Triton kernel not compiling with multiple threads and GPUs
#5933 opened
Feb 15, 2025 -
Upstream LLVM SLP vectorizer change requires the correct triple
#5930 opened
Feb 14, 2025 -
fatal : Unsupported .version 8.6; current version is '8.5'
#5929 opened
Feb 14, 2025
22 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[AMD] [FrontEnd] Optimize is_within_2gb and only enable with buffer ops supported
#5898 commented on
Feb 18, 2025 • 8 new comments -
[AMD][Atomic] Fix fp16 atomic operation
#5839 commented on
Feb 20, 2025 • 2 new comments -
Try Blackwell CI
#5922 commented on
Feb 20, 2025 • 0 new comments -
[PIPELINE] Refactor loop lowering.
#5918 commented on
Feb 14, 2025 • 0 new comments -
cache: add the triton version to the json metadata
#5912 commented on
Feb 20, 2025 • 0 new comments -
Protect autotuner with synchronization
#5893 commented on
Feb 15, 2025 • 0 new comments -
Add triton 3.13t builds - DO NOT Merge
#5455 commented on
Feb 18, 2025 • 0 new comments -
[NVIDIA][Backend] fix the wrong comment
#5305 commented on
Feb 15, 2025 • 0 new comments -
Is Triton unable to install in python 3.10 versions?
#1057 commented on
Feb 21, 2025 • 0 new comments -
IndexError: map::at with RTX 2080Ti
#4813 commented on
Feb 20, 2025 • 0 new comments -
FP8 GEMM implemented with triton is slower on Ada (SM89)
#5583 commented on
Feb 20, 2025 • 0 new comments -
Gather does not work if index is much longer than value
#5836 commented on
Feb 20, 2025 • 0 new comments -
SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats
#5919 commented on
Feb 19, 2025 • 0 new comments -
`tl.cumsum(i1)` computes `tl.cumsum_xor`
#5897 commented on
Feb 19, 2025 • 0 new comments -
No wheels for arm
#5561 commented on
Feb 19, 2025 • 0 new comments -
Wrong source read by triton.jit in some python versions
#1589 commented on
Feb 19, 2025 • 0 new comments -
Triton import is broken in Python 3.7 in triton 2.0.0.post1
#1727 commented on
Feb 18, 2025 • 0 new comments -
Potential Bug in **_attn_fwd_tma** Function
#5816 commented on
Feb 17, 2025 • 0 new comments -
Accessing slices of a tensor
#656 commented on
Feb 17, 2025 • 0 new comments -
Questions about the tutorial fused-attention
#3700 commented on
Feb 15, 2025 • 0 new comments -
Back-to-back BMMs failed with Triton nightly
#5424 commented on
Feb 15, 2025 • 0 new comments -
Is there a plan to support Windows?
#1640 commented on
Feb 14, 2025 • 0 new comments