Insights: tenstorrent/tt-metal
Overview
4 Releases published by 1 person
-
v0.56.0-rc10
published
Feb 5, 2025 -
v0.56.0-rc16
published
Feb 8, 2025 -
v0.56.0-rc21
published
Feb 12, 2025 -
v0.56.0-rc24
published
Feb 12, 2025
127 Pull requests merged by 52 people
-
#17768: Documentation update for Batch Normalization
#17818 merged
Feb 12, 2025 -
Fix the version tag in python wheel
#17830 merged
Feb 12, 2025 -
#17731: generate gtest testcase xml and upload as artifacts during cpp/sd unit test workflows
#17732 merged
Feb 12, 2025 -
Add HF_MODEL to load models directly from huggingface
#17801 merged
Feb 12, 2025 -
[skip ci] Update t3000-nightly-tests-impl.yaml
#17778 merged
Feb 11, 2025 -
[skip ci] Update metal-api-surface workflow
#17823 merged
Feb 11, 2025 -
#17433: Part 1 of Versioned Documentation PR - Checking links
#17810 merged
Feb 11, 2025 -
#17811: Change job_success criteria so skipped jobs are not failing jobs
#17819 merged
Feb 11, 2025 -
#0: Make DispatchQueryManager::get_dispatch_core thread-safe
#17816 merged
Feb 11, 2025 -
LightMetal - SetRuntimeArgsUint32VecPerCore Trace + Replay support (some TTNN ops use) (Issue #17779)
#17780 merged
Feb 11, 2025 -
Fix non-deterministic hangs caused by MeshDevice trace replay
#17696 merged
Feb 11, 2025 -
Refactoring same definitions
#17747 merged
Feb 11, 2025 -
LightMetal - Store Program obj by id instead of ptr at capture time (Issue #17761)
#17762 merged
Feb 11, 2025 -
Additional EDM fabric optimizations (mix of low level and experimental flow control protocol trimming)
#17749 merged
Feb 10, 2025 -
All gather async llama ci
#17746 merged
Feb 10, 2025 -
[skip ci] Fix L2 workflow and add matmul nightly tests
#17802 merged
Feb 10, 2025 -
#17060: Flip TT_ASSERT to TT_FATAL for sharding validation
#17799 merged
Feb 10, 2025 -
Add more async slots for dispatch
#17742 merged
Feb 10, 2025 -
Update perf and latest features for llm models (Feb 10)
#17798 merged
Feb 10, 2025 -
Disable `ShardOrientation.COL_MAJOR` test cases for `ttnn.upsample`
#17796 merged
Feb 10, 2025 -
Split `command_queue_interface.hpp` into header and implementation
#17789 merged
Feb 10, 2025 -
Add Mistral-Small-24B-Instruct-2501 support
#17794 merged
Feb 10, 2025 -
#14596: new sfpi release
#17602 merged
Feb 10, 2025 -
[skip ci] Show All Post Commit Status Badge from main on README.md
#17783 merged
Feb 10, 2025 -
#17134: Add remaining SD unit tests
#17736 merged
Feb 10, 2025 -
Fix incorrect tracer error when fast runtime mode is enabled.
#17776 merged
Feb 10, 2025 -
#17758: Update Batch Norm Inference mode kernels
#17733 merged
Feb 10, 2025 -
[Fabric] Support for routing planes
#17777 merged
Feb 10, 2025 -
#17559: Update logit op
#17586 merged
Feb 10, 2025 -
[skip ci] Update README.md
#17716 merged
Feb 9, 2025 -
[UMD] Change logical to translated mapping to new API
#17674 merged
Feb 9, 2025 -
#17768: Float32 support for Inference mode in Batch Norm
#17587 merged
Feb 9, 2025 -
Quick fix for single card device perf
#17752 merged
Feb 8, 2025 -
#17737: move matmul sd tests to nightly and adjust matmul test dimens…
#17743 merged
Feb 8, 2025 -
#15496 Change Tensor serialization to serialize TensorSpec with flatbuffer
#17748 merged
Feb 8, 2025 -
Update get_dispatch_core() for unused TG MMIO dispatch cores
#17756 merged
Feb 8, 2025 -
[TT-Train] Updated cmake for tt_stl
#17753 merged
Feb 8, 2025 -
Automatically generate an overload w/o QueueId
#17640 merged
Feb 8, 2025 -
#16733: binary pow sfpu operation
#17228 merged
Feb 8, 2025 -
[Old-llama70b-vLLM] Remove 2x4 device assertion since t3k mesh now opens with 1x8
#17740 merged
Feb 7, 2025 -
#10718: Fix produce_data workflow crash when job log not found
#17738 merged
Feb 7, 2025 -
Fix undefined QueueId in ttnn events
#17739 merged
Feb 7, 2025 -
Run clang-tidy scan in a single container
#17734 merged
Feb 7, 2025 -
Use the same linker preference in all toolchains
#17735 merged
Feb 7, 2025 -
[UMD] Remove virtual_to_umd_coord_mapping_
#17678 merged
Feb 7, 2025 -
#0: skip credit handshake when no words have been received.
#17691 merged
Feb 7, 2025 -
#0: Fix failing Llama TG tests by preserving old behavior for ShardTensor
#17693 merged
Feb 7, 2025 -
Make QueueId a strong type
#17637 merged
Feb 7, 2025 -
#17128 Advanced programming example vecadd_multi_core
#17129 merged
Feb 7, 2025 -
Multi MeshCQ and MeshEvents API Bringup
#17582 merged
Feb 7, 2025 -
#0: Fix Llama3 RoPE eager test regression
#17728 merged
Feb 7, 2025 -
#17246: Fixing invalid test in ccl
#17727 merged
Feb 7, 2025 -
[UMD] Use new CoreCoord api for eth cores
#17642 merged
Feb 7, 2025 -
#17134: Add SD cross attn down block ut
#17712 merged
Feb 7, 2025 -
#0: remove duplicate header
#17722 merged
Feb 7, 2025 -
#0: fix golden functions for conv and matmul
#17592 merged
Feb 7, 2025 -
optimize edm fabric packet header structure
#17579 merged
Feb 7, 2025 -
#0: Update SD device perf margin to match other models
#17658 merged
Feb 7, 2025 -
Removed workaround for blackhole alignment
#17710 merged
Feb 7, 2025 -
Update README.md
#17715 merged
Feb 7, 2025 -
Add HF model support inc. DS-R1-Distill, Qwen needs yarn support
#17421 merged
Feb 7, 2025 -
Add torch tensor cache to conv2d unit tests to speedup test execution
#17708 merged
Feb 7, 2025 -
#17134: Add SD down block unit test
#17653 merged
Feb 7, 2025 -
#6539: (#7749 #3176 #4514 #5145 #3601 #3602 #6947) Fix multiple unit and sweep tests
#16850 merged
Feb 7, 2025 -
[TT-Train] fp32 turned off for softmax
#17683 merged
Feb 7, 2025 -
#0: Fix ordering of devices after reshape applied on MeshDevice
#17646 merged
Feb 7, 2025 -
#17477: Move SmallVector and ShapeBase to re-use within Metal
#17669 merged
Feb 7, 2025 -
Bump min CMake version
#17680 merged
Feb 7, 2025 -
#0: Fix ActiveEthTestWatcherSanitizeMailboxWrite after recent validat…
#17697 merged
Feb 7, 2025 -
Update sweeps setup for i2s/s2i and also rule out cases which lead to divide by 0.
#17611 merged
Feb 7, 2025 -
Add support for fabric 1D mcast
#17498 merged
Feb 7, 2025 -
These deps are required for TT-Train
#17690 merged
Feb 7, 2025 -
[skip ci] Add new nightly testing workflow
#17695 merged
Feb 7, 2025 -
#17215: Add write/read APIs for TTNN tensors allocated on mesh buffer
#17513 merged
Feb 7, 2025 -
Remove old trace_tests from tt_eager
#17626 merged
Feb 6, 2025 -
[tt-train] Add patch command in CPM
#17689 merged
Feb 6, 2025 -
Enable transaction IDs on the write side (receiver channel) of EDM fabric kernel
#17631 merged
Feb 6, 2025 -
Eth ubenchmark with bi-dir plus transaction id
#17635 merged
Feb 6, 2025 -
Split header and implementation for dispatch_core_manager.hpp
#17632 merged
Feb 6, 2025 -
#0: Fix noc sanitize mailbox validation for idle erisc
#17652 merged
Feb 6, 2025 -
#0: Add test_pgm_dispatch tests with long-running kernels.
#17608 merged
Feb 6, 2025 -
#16935: Added unary broadcast api
#17479 merged
Feb 6, 2025 -
#17623: Delete artifact when one of the steps fails.
#17666 merged
Feb 6, 2025 -
fixed 3 link all gather sdpa specialized shape and added workaround for nd pcc
#17670 merged
Feb 6, 2025 -
Hopefully fix tt-train build flow
#17644 merged
Feb 6, 2025 -
#17375: Adding sharded addrgen to ccl folders
#17052 merged
Feb 6, 2025 -
Run the entire build job in a container
#17634 merged
Feb 6, 2025 -
Afuller/cut out umd headers from jit build
#17663 merged
Feb 6, 2025 -
hot_fix: fix invalid test config and delete stale/invalid kernel asserts
#17665 merged
Feb 6, 2025 -
produce_data workflow: only call gh api for the current attempt and not attempts before it
#17657 merged
Feb 6, 2025 -
#0: Remove large shapes from single device trace tests to cut down APC time
#17606 merged
Feb 6, 2025 -
Ncvetkovic/17132 max pool then pack untilize
#17486 merged
Feb 6, 2025 -
Add new buckets for common infra errors
#17612 merged
Feb 6, 2025 -
#17134: Fix SD cross attn upblock unit test
#17604 merged
Feb 6, 2025 -
Reduce amount of conv2d test
#17610 merged
Feb 6, 2025 -
Remove shared ptrs from async reduce scatter
#17609 merged
Feb 6, 2025 -
[Fabric] Add controller kernel to sync the producers
#17629 merged
Feb 6, 2025 -
Remove fetch_boost.cmake
#17336 merged
Feb 6, 2025 -
#13628: Update logit logic
#17467 merged
Feb 6, 2025 -
Use a hardcoded name because GHA doesn't evaluate variables when it s…
#17636 merged
Feb 6, 2025 -
[skip ci] Add flatbuffers codeowners
#17468 merged
Feb 6, 2025 -
#0: Fix act_c_num_blocks calculation for Conv2D Width Sharded
#17585 merged
Feb 6, 2025 -
Uniquely identify this job so we can enforce it
#17624 merged
Feb 6, 2025 -
#0: Rename overloads `Tensor::to` to `Tensor::to_layout` / `Tensor::to_device`
#17519 merged
Feb 5, 2025 -
Adding Expand and Re-architect Repeat
#17270 merged
Feb 5, 2025 -
LightMetal - Initial Replay infra/library for LightMetalBinary and standalone runner (#17039)
#17524 merged
Feb 5, 2025 -
Remove update_dispatch_cores_for_multi_cq_eth_dispatch() from device
#17574 merged
Feb 5, 2025 -
Add support for parallelization along the width for untilize with unpadding
#17538 merged
Feb 5, 2025 -
[skip ci] Delete models/bringup_testing/Tutorial_Adding_a_Model.md
#17617 merged
Feb 5, 2025 -
[skip ci] Update README.md, add SD3.5_m_512
#17621 merged
Feb 5, 2025 -
Build Wheel In the same job as the C++
#17581 merged
Feb 5, 2025 -
Revamp Dockerfiles
#17403 merged
Feb 5, 2025 -
#15450: Remove default values from circular buffer parameters in LLK compute APIs: Eltwise Binary
#16501 merged
Feb 5, 2025 -
Minimalistic All Gather Async
#17007 merged
Feb 5, 2025 -
Add per core kernel stats and first start to last start
#17591 merged
Feb 5, 2025 -
LightMetal - Initial Host API + Device Capture infra/library and unit tests (#17039)
#17514 merged
Feb 5, 2025 -
Use hw auto increment registers to keep track of available buffer space.
#17580 merged
Feb 5, 2025 -
BH post commit profiler regression
#17550 merged
Feb 5, 2025 -
[tt-train] Enable tensor parallel for MNIST
#17506 merged
Feb 5, 2025 -
Tag Docker images with sha1
#17516 merged
Feb 5, 2025 -
#13609: Uplift dram and l1 allocators to use dram/l1 specific alignment
#17122 merged
Feb 5, 2025 -
Resnet50 update for Blackhole + workarounds
#17058 merged
Feb 5, 2025 -
Fix regression from commit #5444b3c8
#17596 merged
Feb 5, 2025 -
Handle case where performance report only has unary ops
#17451 merged
Feb 5, 2025 -
#13541: Conv2d use L1 estimates for auto-sharding
#17485 merged
Feb 5, 2025 -
[UMD] Move to .get_cores API
#17543 merged
Feb 5, 2025
59 Pull requests opened by 41 people
-
[sweep] tt smi improvement
#17593 opened
Feb 5, 2025 -
Implement tiled concatenation for height sharded tensors
#17614 opened
Feb 5, 2025 -
Slice improvements
#17615 opened
Feb 5, 2025 -
[UMD] Switch pci_cores and dram_cores to CoreCoord api.
#17620 opened
Feb 5, 2025 -
Use CPM_USE_LOCAL_PACKAGES to get dependencies from Docker container
#17627 opened
Feb 5, 2025 -
[TT-Train] Clip norm fix for ddp
#17628 opened
Feb 5, 2025 -
Stop copying packet header when contiguous
#17630 opened
Feb 5, 2025 -
[UMD] Remove usage of outdated UMD apis
#17645 opened
Feb 6, 2025 -
Use weight_width_sliced boolean to determine if Block Sharded
#17649 opened
Feb 6, 2025 -
Extend llama sharded all gather for LN
#17650 opened
Feb 6, 2025 -
#0: Add trace_2cqs performant for SqueezeBERT model
#17655 opened
Feb 6, 2025 -
Add ttnn support for Yolov8x on N300 with Demo.
#17659 opened
Feb 6, 2025 -
Replace List Mesh to Tensor
#17667 opened
Feb 6, 2025 -
[tt-train] Fix timing of grad sync and clipping in NanoGPT in the presence of grad accumulation
#17672 opened
Feb 6, 2025 -
#13385: ttnn.round direct kernel implementation
#17673 opened
Feb 6, 2025 -
#8865: Update changed ttnn ops in dispatch time profiling infra
#17675 opened
Feb 6, 2025 -
Aliu/integrate fabric to metal
#17676 opened
Feb 6, 2025 -
Add support for page size > max prefetch cmd size for interleaved buffers
#17677 opened
Feb 6, 2025 -
#17682 Improve eltwise binary ng test coverage
#17684 opened
Feb 6, 2025 -
#17167: Remove build APIs from device
#17685 opened
Feb 6, 2025 -
#16364: update DispatchMemMap address calculation
#17688 opened
Feb 6, 2025 -
#17094: fill implicit pad sharded using the new shardedAddrGen
#17692 opened
Feb 6, 2025 -
First package (TT-Metalium runtime)
#17694 opened
Feb 7, 2025 -
SD3.5-medium-512-spacelike model
#17698 opened
Feb 7, 2025 -
[UMD] Remove a couple of leftover usages of old soc descriptor API
#17707 opened
Feb 7, 2025 -
Add split scatter implementation
#17709 opened
Feb 7, 2025 -
Lower alignment requirements for shallow conv2d
#17711 opened
Feb 7, 2025 -
#17679: Remove conv tt eager tests
#17717 opened
Feb 7, 2025 -
Update BERT Tiny device perf metrics
#17719 opened
Feb 7, 2025 -
Replacing L1 base address increment instructions with CFGSHIFTMASK
#17723 opened
Feb 7, 2025 -
feat: hypot device op support
#17726 opened
Feb 7, 2025 -
[skip ci] Reduce number of builds per APC job
#17744 opened
Feb 7, 2025 -
#17477: Introduce ND coordinate system for TT-distributed
#17745 opened
Feb 7, 2025 -
Disable test_data_parallel_falcon_mlp until it is fixed
#17750 opened
Feb 8, 2025 -
Add support for both 20.04 and 22.04 in package and release workflow
#17755 opened
Feb 8, 2025 -
#0: Disable ttnn::experimental::view path for multi device storage types
#17760 opened
Feb 8, 2025 -
Add support for height sharded and tiled inputs in `ttnn.concat`
#17764 opened
Feb 8, 2025 -
#0: uplifting diffuser pkg for sd35, and add protobuf pkg
#17765 opened
Feb 9, 2025 -
Mbahnas/update mod owners
#17766 opened
Feb 9, 2025 -
#17768: Float32 support for Training mode in Batch Norm
#17769 opened
Feb 9, 2025 -
[Draft] Fix nanogpt training
#17772 opened
Feb 9, 2025 -
Add pcc checks to conv2d sweep local runs
#17774 opened
Feb 9, 2025 -
Fix I2S alignment issue on BH
#17775 opened
Feb 9, 2025 -
[Draft] Refactor Maxpool2d tests
#17782 opened
Feb 10, 2025 -
#0: Yolov7 ttnn implementation & demo
#17786 opened
Feb 10, 2025 -
#17758: Update Batch Norm Running stats kernel for training mode
#17788 opened
Feb 10, 2025 -
Parallelization over last two dims for tilize/untilize with padding
#17790 opened
Feb 10, 2025 -
Add support for minimal all_reduce for Llama shapes
#17792 opened
Feb 10, 2025 -
#0: add .ttinsn to sfpi
#17800 opened
Feb 10, 2025 -
Manage NCRISC IRAM transfer on NCRISC on Wormhole
#17805 opened
Feb 10, 2025 -
Optimised GCD implementation for int32 using pure SFPU
#17807 opened
Feb 10, 2025 -
Remove `tt_cluster.hpp` from public API
#17813 opened
Feb 11, 2025 -
#0: [tt-train] DRAFT add new ttml ops
#17814 opened
Feb 11, 2025 -
#0: increase test vc/mux demux thresholds
#17815 opened
Feb 11, 2025 -
fix to_layout sharded bug
#17820 opened
Feb 11, 2025 -
[tt-train] Add bf16 support
#17821 opened
Feb 11, 2025 -
#17687: Add data_type checker
#17828 opened
Feb 12, 2025 -
[skip ci] Dockerize tt-train cpp tests workflow
#17834 opened
Feb 12, 2025
69 Issues closed by 36 people
-
Yolov8m - Model Card
#17704 closed
Feb 12, 2025 -
Cleanup: Minimize public dependencies
#13499 closed
Feb 12, 2025 -
All Test Cases Not Running for T3K Nightly
#17702 closed
Feb 11, 2025 -
tt-metal, tt-umd, and tt-train are bootstrapping an environment variable CPM_SOURCE_CACHE
#17093 closed
Feb 11, 2025 -
build toolchain w/ RAW hazard workaround
#14596 closed
Feb 11, 2025 -
Fix local gnu toolchain testsuite failures
#16603 closed
Feb 11, 2025 -
Add ttnn::pad signature which accepts padding pairs and no queue_id
#17388 closed
Feb 11, 2025 -
Conv ops/models
#17247 closed
Feb 10, 2025 -
Reshape
#17310 closed
Feb 10, 2025 -
Flip sharding checks in tensor layout and tensor spec to TT_FATAL
#17060 closed
Feb 10, 2025 -
allocator uses 32B alignment for both DRAM and L1
#13609 closed
Feb 10, 2025 -
watcher: check that a barrier won't stall at the end of a kernel run
#15265 closed
Feb 10, 2025 -
SD unit tests (wormhole)
#17134 closed
Feb 10, 2025 -
[Bug Report] Incorrect tracer error message when enable_fast_runtime_mode=true
#17773 closed
Feb 10, 2025 -
Update _logit to use tensor-scalar binary op overloads where possible
#17559 closed
Feb 10, 2025 -
[Bug Report] BinaryNg forces the input scalar to have the same data format as the input tensor
#17681 closed
Feb 9, 2025 -
Batch normalization support
#12253 closed
Feb 9, 2025 -
ttnn.pow returns wrong values with exponent tensor of shape larger than [32,32]
#16733 closed
Feb 8, 2025 -
Sfpu kernels clean up
#5424 closed
Feb 7, 2025 -
EDM Fabric: Merge command type and noc command type in packet header.
#17429 closed
Feb 7, 2025 -
Improve consistency in how we pass around dynamic array of elements
#17544 closed
Feb 7, 2025 -
Disable shallow conv for blackhole
#17224 closed
Feb 7, 2025 -
PCC error on conv2d blackhole tests
#17226 closed
Feb 7, 2025 -
Alignment issue when the channel size = 16
#16964 closed
Feb 6, 2025 -
Deprecated operations
#17622 closed
Feb 6, 2025 -
Port Tilize/Untilize CPP Unit Tests to TTNN
#17323 closed
Feb 6, 2025 -
Port Padding CPP Unit Tests to TTNN
#17324 closed
Feb 6, 2025 -
Remove and Port Eager CPP TM Unit Tests to TTNN
#17321 closed
Feb 6, 2025 -
Improve comparison function to better handle inf values
#5815 closed
Feb 6, 2025 -
Add the ability to debug yaml based sweep tests
#6104 closed
Feb 6, 2025 -
Add more test combinations to tt_lib sweeps
#6873 closed
Feb 6, 2025 -
Check parameters for the ttnn operations
#7292 closed
Feb 6, 2025 -
Improve sweep infra so that each input can have different rank
#7539 closed
Feb 6, 2025 -
Move ttnn dependency out of tests/tt_eager
#8429 closed
Feb 6, 2025 -
Test ttnn ops with dimensions not aligned to 32x32 and Tile layout
#8596 closed
Feb 6, 2025 -
Update ttnn sweeps to have not aligned shapes with tile sizes
#8754 closed
Feb 6, 2025 -
Create ttnn.bcast sweeps
#12316 closed
Feb 6, 2025 -
[Bug Report] install_dependencies.sh script was unable to install dependencies on Ubuntu 22.04
#17613 closed
Feb 6, 2025 -
Unary broadcasting of a single tile with its first scalar/row/column
#16935 closed
Feb 6, 2025 -
Github-Pages artifact fetch failed
#17623 closed
Feb 6, 2025 -
Adding a sharding address generator
#17375 closed
Feb 6, 2025 -
[Bug Report] Resnet50 accuracy is way off on Wormhole
#16895 closed
Feb 6, 2025 -
Incorrect pack_untilize after ttnn.max_pool2d
#17132 closed
Feb 6, 2025 -
[Bug Report] PackUntilizeDstTinyTile breaks ReduceHW
#12772 closed
Feb 6, 2025 -
[Feature Request] ...
#17639 closed
Feb 6, 2025 -
ttnn.logit operation fails with low PCC when eps == 0 and input type is bfloat16
#13628 closed
Feb 6, 2025 -
Boost fetching from CPM is janky
#17441 closed
Feb 6, 2025 -
PCC error for Conv2d is seen for short sweep for Blackhole for width sharding scheme.
#17239 closed
Feb 6, 2025 -
Dockerfiles are janky
#17464 closed
Feb 6, 2025 -
[Feature Request] Add Light Metal capture/replay initial changes to tt-metal for some workloads
#17039 closed
Feb 5, 2025 -
RepeatOp produces Index is out of bounds for the rank error for 1D Tensor
#16698 closed
Feb 5, 2025 -
[Bug Report] Repeat doesn't work as expected
#14443 closed
Feb 5, 2025 -
Repeat operation fails for some combinations of shapes / repeat dimensions
#16701 closed
Feb 5, 2025 -
ttnn.repeat doesn't support broadcasting
#17243 closed
Feb 5, 2025 -
Remove update_dispatch_cores_for_multi_cq_eth_dispatch() from device
#17290 closed
Feb 5, 2025 -
Blackhole CNN bringup
#11124 closed
Feb 5, 2025 -
Close loop between install_dependencies.sh and requirements.txt / Dockerfile
#15034 closed
Feb 5, 2025 -
Add worker related apis to MeshDevice
#17203 closed
Feb 5, 2025 -
Enable Resnet50 on Blackhole
#16705 closed
Feb 5, 2025 -
CPM Timeout issues
#17319 closed
Feb 5, 2025 -
Implement auto shard selection for conv2d
#13541 closed
Feb 5, 2025 -
Stable diffusion is broken in "Nightly model and ttnn tests"
#17483 closed
Feb 5, 2025 -
[Bug Report] Packing from fp32 dst to bfloat16 CB always ceiling
#17018 closed
Feb 5, 2025
83 Issues opened by 49 people
-
Dockerize tt-train cpp-tests
#17833 opened
Feb 12, 2025 -
Device Initialization Failure – Firmware Initialization Error
#17832 opened
Feb 12, 2025 -
[Feature Request] Support using a subset of available devices in a system
#17831 opened
Feb 12, 2025 -
Binary_ng ops survey tests
#17829 opened
Feb 12, 2025 -
Build tt-train workflow
#17827 opened
Feb 12, 2025 -
ResNet N300 Benchmark Batch Size 32 Parallel Processing Not Supported
#17826 opened
Feb 12, 2025 -
tt_stl should be a standalone header only library
#17825 opened
Feb 12, 2025 -
Minimize tt_metal API surface area
#17824 opened
Feb 11, 2025 -
Connect Releases to a green CI Run
#17822 opened
Feb 11, 2025 -
[Bug Report] bad_optional_access error when adding new Op
#17817 opened
Feb 11, 2025 -
Request for large monitor to display Metal CI stats in TO & SC
#17812 opened
Feb 11, 2025 -
Skipped Build steps are showing up in Superset as indistinguishable from Failed Build steps
#17811 opened
Feb 11, 2025 -
Llama3.2-11b-vision perf is not being checked in CI or uploaded to Superset
#17809 opened
Feb 10, 2025 -
Llama3.2-11b-vision-n300-bs16 perf regression (27% for decode) between Jan 30 and Feb 3
#17808 opened
Feb 10, 2025 -
[Feature Request] Re-enable sharding checks for test_vector_conversion.cpp.
#17806 opened
Feb 10, 2025 -
CCL operations with replica_groups=1
#17804 opened
Feb 10, 2025 -
TT_ASSERT output is ambiguous
#17803 opened
Feb 10, 2025 -
New model support in Llama3 codebase [CI and README]
#17797 opened
Feb 10, 2025 -
Re-enable COL_MAJOR test cases for `ttnn.upsample`
#17795 opened
Feb 10, 2025 -
Can SDPA be further optimized?
#17793 opened
Feb 10, 2025 -
Llama3 demo multi-device tests hanging with trace
#17791 opened
Feb 10, 2025 -
[Bug Report] Regression: Out-of-memory in ttnn.conv2d
#17787 opened
Feb 10, 2025 -
Swin_S Model Card
#17785 opened
Feb 10, 2025 -
ttnn.hardtanh low PCC for a specific input used in mobilenetv1_100.ra4_e3600_r224_in1k
#17784 opened
Feb 10, 2025 -
Add LightMetal capture + replay support for more host_api.hpp APIs
#17779 opened
Feb 10, 2025 -
[Bug Report] ttnn.gcd doesn't support int32
#17771 opened
Feb 9, 2025 -
Add support for UINT32 for ttnn.copy/typecast
#17770 opened
Feb 9, 2025 -
Fp32 support for batch norm
#17768 opened
Feb 9, 2025 -
superset post commit per test data needs test file column and test count
#17767 opened
Feb 9, 2025 -
TTNN app leaves global system-wide no-cleared semaphore on exit. Severe. Need resolution ASAP.
#17763 opened
Feb 8, 2025 -
Binary INT Ops in Catch22
#17759 opened
Feb 8, 2025 -
Optimization of Batch Norm
#17758 opened
Feb 8, 2025 -
Prototype unpack tilize that can be fused with matmul
#17757 opened
Feb 8, 2025 -
Unpredicted order of DevicePool destructor call
#17754 opened
Feb 8, 2025 -
Investigate tiny tile matmul pytest failure with watcher on
#17751 opened
Feb 8, 2025 -
[Bug Report] Performance regression in ttnn.add
#17741 opened
Feb 7, 2025 -
Move some matmul tests to new location and adjust some dimensions
#17737 opened
Feb 7, 2025 -
Upload gtest data to superset (cpp unit tests and sd unit tests)
#17731 opened
Feb 7, 2025 -
Minor typo fixes in tech reports.
#17730 opened
Feb 7, 2025 -
Extend hal to note which riscs use IRAM, clean up hard-coded ncrisc checks
#17729 opened
Feb 7, 2025 -
Support Stable Diffusion 1.4 on Blackhole
#17725 opened
Feb 7, 2025 -
ttnn.tanh low PCC for a specific input used in albert_v2_base for masked lm
#17721 opened
Feb 7, 2025 -
[Bug Report] ReshapeViewOperation invoke with MemoryConfig
#17720 opened
Feb 7, 2025 -
Yolov9 - Bring Up
#17718 opened
Feb 7, 2025 -
[Bug Report] Automatically pad EmbeddingBackward index tensor
#17714 opened
Feb 7, 2025 -
[Bug Report] Invalid rounding with reduce_tile REDUCE_SCALAR
#17713 opened
Feb 7, 2025 -
[Bug Report] to_layout with non-full grid shard when utilizing to DRAM stays in L1
#17706 opened
Feb 7, 2025 -
[Bug Report] to_layout with non-full grid shard data corrupted when untilizing and unpadding
#17705 opened
Feb 7, 2025 -
Yolov8x - Model Card
#17703 opened
Feb 7, 2025 -
permuting [1,N] shape to [N,1] gives garbage values
#17701 opened
Feb 7, 2025 -
YOLOv9c - Model Card
#17700 opened
Feb 7, 2025 -
Compile FW and Kernels with -ffunction-sections, -fdata-sections, and -Wl,-gc-sections
#17699 opened
Feb 7, 2025 -
[Bug Report] `ttnn.add` doesn't work as expected for `ttnn.uint8`
#17687 opened
Feb 6, 2025 -
Improve eltwise binary ng test coverage
#17682 opened
Feb 6, 2025 -
Remove conv tests from tt-eager
#17679 opened
Feb 6, 2025 -
[Async CCL] send_payload_flush_non_blocking_from_address causes nd pcc
#17671 opened
Feb 6, 2025 -
Maxpool2d only outputs ROW_MAJOR tensors, need TILE
#17664 opened
Feb 6, 2025 -
[Bug Report] Conv2d accuracy issues on resnet50 conv (wormhole), caused by split reader
#17662 opened
Feb 6, 2025 -
[Feature Request] Support passing 4 values to padding for Conv2dOp
#17656 opened
Feb 6, 2025 -
Implement perf with trace_2cqs for SqueezeBERT on n300
#17654 opened
Feb 6, 2025 -
SD - Refactor model to use model config
#17651 opened
Feb 6, 2025 -
[Bug Report] Transposed conv2d PCC failures
#17647 opened
Feb 6, 2025 -
[Bug Report] ttnn.embedding returns incorrect result for tiled input
#17643 opened
Feb 6, 2025 -
Address llk_pack_untilize_hw_configure_disaggregated
#17641 opened
Feb 6, 2025 -
[Bug Report] test_conv_prepare_weights_and_biases assertion fails
#17638 opened
Feb 6, 2025 -
Surface data about the build steps
#17633 opened
Feb 6, 2025 -
Release Docker Image is 7GB
#17618 opened
Feb 5, 2025 -
Tensor's to_layout, pad, unpad APIs should behave identically to their TTNN Ops counterparts
#17616 opened
Feb 5, 2025 -
Collect singletons used in Metal codebase under a global "context" object
#17607 opened
Feb 5, 2025 -
[blackhole] intermittent hang on CommandQueueProgramFixture.TensixTestRandomizedProgram
#17605 opened
Feb 5, 2025 -
WEKA/NFS Mounting Issues
#17603 opened
Feb 5, 2025 -
[Bug Report] pytest tests/ttnn/distributed/test_data_parallel_example_TG.py failed to work
#17595 opened
Feb 5, 2025 -
ttnn.moreh_cumsum causes low accuracy on Bloom model
#17594 opened
Feb 5, 2025 -
[GH Permissions] Access to metalium-developers
#17590 opened
Feb 5, 2025 -
ttnn.gt fails when operands need broadcasting
#17589 opened
Feb 5, 2025 -
Access to metalium-developers
#17588 opened
Feb 5, 2025
97 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[tt-train] Add RMSNorm module
#16991 commented on
Feb 11, 2025 • 12 new comments -
New `split` based on `slice`
#17461 commented on
Feb 9, 2025 • 8 new comments -
#15450: Remove default values from circular buffer parameters in LLK compute APIs: Matmul
#16571 commented on
Feb 7, 2025 • 4 new comments -
Add avg_pool2d with kernel size support
#14268 commented on
Feb 6, 2025 • 3 new comments -
Allow the user to select the version of the docs
#17434 commented on
Feb 11, 2025 • 2 new comments -
#16174: Support for int32 subtraction for WHB0 and BH
#17359 commented on
Feb 5, 2025 • 1 new comment -
#15450: Remove default values from circular buffer parameters in LLK compute APIs: Docs
#17567 commented on
Feb 7, 2025 • 0 new comments -
Create knowledge sharing doc to explain Python packaging and wheel setup
#12707 commented on
Feb 12, 2025 • 0 new comments -
Build on Ubuntu 22.04
#14390 commented on
Feb 12, 2025 • 0 new comments -
Use a stable serialization format for caching tensors on disk
#16067 commented on
Feb 11, 2025 • 0 new comments -
Clean Device init APIs
#17209 commented on
Feb 11, 2025 • 0 new comments -
[Feature Request] Prebuilt `*.deb` packages and PPA for TT software
#7915 commented on
Feb 11, 2025 • 0 new comments -
Remove obsolete dependencies and try moving remaining ones into CPM
#9407 commented on
Feb 11, 2025 • 0 new comments -
Breakout/Optimize Perf Microbenchmark Tests
#16774 commented on
Feb 11, 2025 • 0 new comments -
Anaconda Support
#13734 commented on
Feb 11, 2025 • 0 new comments -
implement rand for WH/BH
#14597 commented on
Feb 11, 2025 • 0 new comments -
investigate unsigned comparisons
#14598 commented on
Feb 11, 2025 • 0 new comments -
ttnn.fmod unary low PCC when scalar is between -0.003 and 0.003
#17362 commented on
Feb 11, 2025 • 0 new comments -
ttnn.remainder unary low PCC when scalar is between -0.003 and 0.003
#17361 commented on
Feb 11, 2025 • 0 new comments -
Yolov11 - Model card
#13772 commented on
Feb 11, 2025 • 0 new comments -
Unit tests and models fail new TT_FATAL validation for sharding
#16948 commented on
Feb 10, 2025 • 0 new comments -
[Bug Report] captured_graph is missing buffer address on both L1 and DRAM
#16499 commented on
Feb 10, 2025 • 0 new comments -
SFPU shift operator issue when using sfpi
#15514 commented on
Feb 10, 2025 • 0 new comments -
Multidevice tensors do not work in comparison mode
#15363 commented on
Feb 10, 2025 • 0 new comments -
[Bug Report] `dprint_tensix_dest_reg` Bug
#17481 commented on
Feb 10, 2025 • 0 new comments -
[Feature Request] Support large tensor sizes in ttnn.conv2d
#17489 commented on
Feb 10, 2025 • 0 new comments -
Remove "Reach-Arounds" in TT-NN interfacing with TT-Metal
#17199 commented on
Feb 10, 2025 • 0 new comments -
[Feature Request] Light Metal Feature parent/tracking ticket
#17037 commented on
Feb 10, 2025 • 0 new comments -
Fix shape in outer
#17492 commented on
Feb 7, 2025 • 0 new comments -
Fix stored size of sharded buffers to match what device buffer expects
#17450 commented on
Feb 5, 2025 • 0 new comments -
#17218: Add output_dtype support for binary_ng
#17417 commented on
Feb 6, 2025 • 0 new comments -
Printing packer's and unpacker's configuration registers
#17368 commented on
Feb 7, 2025 • 0 new comments -
#16147: Replace binary with binary_ng
#17160 commented on
Feb 7, 2025 • 0 new comments -
Support parallelization over width for tilize with val padding
#17100 commented on
Feb 5, 2025 • 0 new comments -
Enable ConvMnist and Mnist integration and performance tests.
#16965 commented on
Feb 6, 2025 • 0 new comments -
#16888: Fix Conv2D when output is in Row Major
#16937 commented on
Feb 6, 2025 • 0 new comments -
Refactor llama3 demo to the new generator API
#16753 commented on
Feb 7, 2025 • 0 new comments -
#14080: Preprocess weights for Conv2D on Device
#16750 commented on
Feb 6, 2025 • 0 new comments -
[WIP] [TT-Train] TTNN Training
#16617 commented on
Feb 6, 2025 • 0 new comments -
Use NOC stream registers for signaling
#16558 commented on
Feb 10, 2025 • 0 new comments -
TTNN generic OP
#16546 commented on
Feb 12, 2025 • 0 new comments -
Fix typos.
#15365 commented on
Feb 7, 2025 • 0 new comments -
#14732: add bert-tiny test_performance using trace and 2cq-WIP
#14799 commented on
Feb 7, 2025 • 0 new comments -
#0: async
#11158 commented on
Feb 12, 2025 • 0 new comments -
Run CI tests in 20.04 docker
#12498 commented on
Feb 12, 2025 • 0 new comments -
Yolov7 Trace+2cq fails with Out of Memory issue
#17583 commented on
Feb 12, 2025 • 0 new comments -
[Bug Report] tt_simulation_device.cpp is always compiled.
#15161 commented on
Feb 12, 2025 • 0 new comments -
Add profiler-enabled wheel to release assets, despite it not working yet
#14301 commented on
Feb 12, 2025 • 0 new comments -
Upgrade CI runners to run Ubuntu 22.04 natively
#12492 commented on
Feb 12, 2025 • 0 new comments -
ttnn.neg, ttnn.abs, ttnn.selu and ttnn.identity give low pcc with sharded input
#16181 commented on
Feb 7, 2025 • 0 new comments -
ttnn.fill_implicit_tile_padding hangs for bfloat8_b
#17077 commented on
Feb 7, 2025 • 0 new comments -
FD ring buffer is stalling too often
#15221 commented on
Feb 6, 2025 • 0 new comments -
as_tensor fails when saving/loading a tensor with transposed tiles
#15496 commented on
Feb 6, 2025 • 0 new comments -
Request for SFPU LLKs for more flexible broadcasting
#16103 commented on
Feb 6, 2025 • 0 new comments -
ttnn.to_dtype conversion issue from bfloat8_b to bfloat16
#17159 commented on
Feb 6, 2025 • 0 new comments -
Add explicit BroadcastOp for TTNN
#16015 commented on
Feb 6, 2025 • 0 new comments -
Add retry loop when calling gh api during _produce_data.yaml to recover from rate limiting
#17374 commented on
Feb 6, 2025 • 0 new comments -
Remove default value for output operand (16) across BH LLK API calls
#15450 commented on
Feb 6, 2025 • 0 new comments -
[Ops] Support for Conv3d op (ttnn.Conv3d)
#15103 commented on
Feb 6, 2025 • 0 new comments -
[Feature Request] Support large tensor sizes in ttnn.group_norm
#17490 commented on
Feb 6, 2025 • 0 new comments -
REVERSE: Returns a tensor with the data reversed along the given axis.
#17116 commented on
Feb 5, 2025 • 0 new comments -
Blackhole: conv2d tests PCC failure when input channels = 16 (<32)
#16992 commented on
Feb 5, 2025 • 0 new comments -
Resnet50 on Blackhole: Optimizations
#17393 commented on
Feb 5, 2025 • 0 new comments -
TM Failures on BH
#17230 commented on
Feb 5, 2025 • 0 new comments -
PCC failure from `ttnn.moreh_norm` for non-last dim
#16335 commented on
Feb 5, 2025 • 0 new comments -
CPP Unit test MultiCommandQueueSingleDeviceFixture.TestMultiAppThreadSync hangs
#17345 commented on
Feb 5, 2025 • 0 new comments -
Async FD out of Eth cores on BH hang
#16643 commented on
Feb 5, 2025 • 0 new comments -
CCL Ops Test hang to be disabled
#17344 commented on
Feb 5, 2025 • 0 new comments -
repeat pytest hitting op assert
#14518 commented on
Feb 5, 2025 • 0 new comments -
[Feature Request] Improvement Needed for Unit Tests
#6633 commented on
Feb 5, 2025 • 0 new comments -
Tracy profiler on BH not working
#17099 commented on
Feb 5, 2025 • 0 new comments -
VMs losing communication with the GH server
#17240 commented on
Feb 5, 2025 • 0 new comments -
Make comparison mode work with fast runtime mode
#16762 commented on
Feb 5, 2025 • 0 new comments -
[Feature Request] Implement scatter communication mechanism on-device
#17314 commented on
Feb 10, 2025 • 0 new comments -
Missing interface - scatter
#16942 commented on
Feb 10, 2025 • 0 new comments -
Missing interface - gather
#16941 commented on
Feb 10, 2025 • 0 new comments -
Eltwise Master Tracking
#13795 commented on
Feb 9, 2025 • 0 new comments -
ttnn.maximum unsupported broadcast
#14852 commented on
Feb 8, 2025 • 0 new comments -
Matmul hang on BH
#16439 commented on
Feb 8, 2025 • 0 new comments -
sometimes on GS max tests fail when all tests in file are run
#17084 commented on
Feb 8, 2025 • 0 new comments -
Resnet50 on Blackhole: Using pre-trained data gives bad PCC
#17558 commented on
Feb 8, 2025 • 0 new comments -
[Bug Report] Matmul gives nondeterministic result
#17143 commented on
Feb 7, 2025 • 0 new comments -
Add handling in logs + artifacts download script for data collection for logs that don't exist
#12966 commented on
Feb 7, 2025 • 0 new comments -
Low PCC in LeNet Data Parallel with ttnn.reshape in TILE_LAYOUT.
#15422 commented on
Feb 7, 2025 • 0 new comments -
Fix CCL PCC error with Sharded Addrgen on disjointed core ranges
#17391 commented on
Feb 7, 2025 • 0 new comments -
Split TTNN into C++ library and Python binding
#16418 commented on
Feb 7, 2025 • 0 new comments -
Incorrect ttnn.linear result for activations with shape Mx1xN
#16599 commented on
Feb 7, 2025 • 0 new comments -
Llama3 model family - list of required ops for blackhole
#16013 commented on
Feb 7, 2025 • 0 new comments -
Multichip ops
#17246 commented on
Feb 7, 2025 • 0 new comments -
Stable diffusion 3.5 medium - Bring up
#15969 commented on
Feb 7, 2025 • 0 new comments -
Fabric EDM Optimization (to 10+ GB/s per direction):
#17423 commented on
Feb 7, 2025 • 0 new comments -
[Bug Report] reshape/permute incorrect outputs on multidevice
#17535 commented on
Feb 7, 2025 • 0 new comments -
[Master issue] Data pipeline and benchmarking infrastructure
#10718 commented on
Feb 7, 2025 • 0 new comments -
[Bug Report] TTNN typecast operation fails when the tensor is on host
#16279 commented on
Feb 7, 2025 • 0 new comments -
[Feature Request] Conv2d dram slicing
#17493 commented on
Feb 7, 2025 • 0 new comments -
Int32 support for subtract op
#16174 commented on
Feb 7, 2025 • 0 new comments -
Incorrect data from ttnn.from_torch for sharding
#15565 commented on
Feb 7, 2025 • 0 new comments