Adding a sharding address generator #17375
jvegaTT added a commit that referenced this issue on Feb 6, 2025
### Ticket

#17375

### Sales Pitch

This is a big one to review, but hopefully very helpful to a lot of you. This PR introduces the sharding address generator, which makes navigating sharded tensors as easy as navigating interleaved ones! In fact, you can use the exact same API as the interleaved address generators, so once your address generator is initialized you no longer need `#ifdef`s to differentiate between interleaved and sharded tensors.

### Problem description

This PR introduces the sharded address generator accessor, which allows you to get NoC addresses for sharded tensors. It is meant to be as easy to use as possible; the instructions are below.

`ShardedAddrGen` requires the type definition of a `Sharded_Info` class object whose template parameters hold the compile-time (CT) information, e.g.

`typedef Sharded_Info<SHARD_TYPE, NUMBER_OF_CORES, PAGE_SIZE_JUMP, PAGES_PER_TENSOR_ROW, CONTIGUITY, PAGES_PER_SHARD_WIDTH, ROWS_PER_SHARD_HEIGHT> tensor_1_shard_info;`

The above parameters are usually obtained using `get_compile_time_arg_val`. In the program factory you can create a vector containing these parameters, in order, using the function `shard_pf_builder::sharding_ct_table_builder(const tt::tt_metal::IDevice* device, const tt::tt_metal::Tensor& t)` defined in `ttnn/cpp/ttnn/operations/ccl/sharding_addrgen_pf_helper.cpp`.

It also needs a shard array map, which can be extracted from the runtime (RT) args using the `shard_addr_gen_utils::parse_map` function. This function requires the `Sharded_Info` class object, e.g.

`auto mapping = parse_map<tensor_1_shard_info>(rt_index);`
`const uint32_t* const shard_array_map = mapping.first;  // contains the shard array map`
`rt_index = mapping.second;  // contains the new runtime index`

In the program factory you can create a vector containing the runtime arguments consumed by this function using `shard_pf_builder::get_linear_shard_list(const tt::tt_metal::IDevice* device, const tt::tt_metal::Tensor& t)`, also defined in `ttnn/cpp/ttnn/operations/ccl/sharding_addrgen_pf_helper.cpp`.

Lastly, you need the `bank_base_address` from the Tensor object, just like the interleaved address generator.

You can then create a sharded address generator as follows:

`s = ShardedAddrGen<tensor_1_shard_info>{.bank_base_address = bank_base_address, .shard_array = shard_array_map};`

This object can then be used with the `get_noc_addr` API:

`uint64_t noc_address = get_noc_addr(page_num, s);`

I also introduce `get_contiguous_noc_addr`, which returns both the NoC address and the number of contiguous pages after that point, to speed up operations.

### Checklist

- [x] Post commit CI passes - https://github.com/tenstorrent/tt-metal/actions/runs/13058740954 (just a clang-format failure on a file I didn't write; should be fixed after rebase)
- [ ] Blackhole Post commit (if applicable)
- [ ] Model regression CI testing passes (if applicable)
- [ ] Device performance regression CI testing passes (if applicable)
- [ ] **(For models and ops writers)** Full [new models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) tests passes
- [ ] New/Existing tests provide coverage for changes
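To make the idea behind `get_contiguous_noc_addr` more concrete, here is a minimal host-side sketch of how a page number might be mapped to an address plus a contiguous-page count for a width-sharded tensor. Everything in it is an assumption made for illustration: `ToyShardInfo`, `ToyShardedAddrGen`, `toy_noc_addr`, `toy_get_contiguous_noc_addr`, and `toy_get_noc_addr` are hypothetical names, the 64-bit address packing is invented, and the layout rules are a simplification. It is not the tt-metal implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

using std::uint32_t;
using std::uint64_t;

// Hypothetical compile-time description of a width-sharded tensor.
// The parameters loosely echo some Sharded_Info template arguments,
// but this is NOT the tt-metal type.
template <uint32_t NumCores, uint32_t PageSize, uint32_t PagesPerTensorRow, uint32_t PagesPerShardWidth>
struct ToyShardInfo {
    static constexpr uint32_t num_cores = NumCores;
    static constexpr uint32_t page_size = PageSize;
    static constexpr uint32_t pages_per_tensor_row = PagesPerTensorRow;
    static constexpr uint32_t pages_per_shard_width = PagesPerShardWidth;
};

// Hypothetical stand-in for the address generator object.
template <typename Info>
struct ToyShardedAddrGen {
    uint32_t bank_base_address;
    const uint32_t* shard_array;  // shard index -> core id (assumed meaning)
};

// Invented encoding for this sketch: core id in the upper 32 bits,
// local address in the lower 32 bits.
constexpr uint64_t toy_noc_addr(uint32_t core, uint32_t local_addr) {
    return (static_cast<uint64_t>(core) << 32) | local_addr;
}

// Returns {address, number of pages that are contiguous starting at `page`}.
template <typename Info>
std::pair<uint64_t, uint32_t> toy_get_contiguous_noc_addr(uint32_t page, const ToyShardedAddrGen<Info>& s) {
    const uint32_t row = page / Info::pages_per_tensor_row;
    const uint32_t col = page % Info::pages_per_tensor_row;
    const uint32_t shard = col / Info::pages_per_shard_width;
    const uint32_t col_in_shard = col % Info::pages_per_shard_width;
    assert(shard < Info::num_cores);

    // Each core stores its slice of every row back to back.
    const uint32_t local_page = row * Info::pages_per_shard_width + col_in_shard;
    const uint32_t local_addr = s.bank_base_address + local_page * Info::page_size;

    // Pages stay contiguous until the end of this shard's slice of the row;
    // the last shard may be narrower if the row does not divide evenly.
    const uint32_t nominal_end = (shard + 1) * Info::pages_per_shard_width;
    const uint32_t row_end = Info::pages_per_tensor_row;
    const uint32_t shard_end = nominal_end < row_end ? nominal_end : row_end;

    return {toy_noc_addr(s.shard_array[shard], local_addr), shard_end - col};
}

// Same lookup, address only.
template <typename Info>
uint64_t toy_get_noc_addr(uint32_t page, const ToyShardedAddrGen<Info>& s) {
    return toy_get_contiguous_noc_addr(page, s).first;
}
```

A caller could use the second element of the returned pair to issue one larger transfer covering several pages instead of one transfer per page, which is the speed-up the description alludes to.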
We require a sharding address generator that is fully compatible with its interleaved counterpart's API, is easy for op writers to set up, and enables native support for sharding on all cores.
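The API-compatibility requirement can be illustrated with a small host-side sketch: if both generator types are accepted by the same overloaded lookup function, kernel-style code can be written once as a template and needs no preprocessor branches. All names below (`ToyInterleavedAddrGen`, `ToyHeightShardedAddrGen`, `toy_pack_noc_addr`, `toy_get_noc_addr`, `toy_collect_page_addrs`) are hypothetical, and the round-robin and height-sharded layouts are simplified assumptions, not the tt-metal definitions.

```cpp
#include <cstdint>
#include <vector>

using std::uint32_t;
using std::uint64_t;

// Toy interleaved layout: pages are assumed to be dealt round-robin across banks.
struct ToyInterleavedAddrGen {
    uint32_t bank_base_address;
    uint32_t page_size;
    uint32_t num_banks;
};

// Toy height-sharded layout: each shard is assumed to hold a fixed run of pages.
struct ToyHeightShardedAddrGen {
    uint32_t bank_base_address;
    uint32_t page_size;
    uint32_t pages_per_shard;
    const uint32_t* shard_array;  // shard index -> core id (assumed meaning)
};

// Invented encoding for this sketch: location id in the upper 32 bits.
inline uint64_t toy_pack_noc_addr(uint32_t location, uint32_t local_addr) {
    return (static_cast<uint64_t>(location) << 32) | local_addr;
}

// Overload for the interleaved generator.
inline uint64_t toy_get_noc_addr(uint32_t page, const ToyInterleavedAddrGen& g) {
    const uint32_t bank = page % g.num_banks;
    const uint32_t page_in_bank = page / g.num_banks;
    return toy_pack_noc_addr(bank, g.bank_base_address + page_in_bank * g.page_size);
}

// Overload for the sharded generator: same call shape, different mapping.
inline uint64_t toy_get_noc_addr(uint32_t page, const ToyHeightShardedAddrGen& g) {
    const uint32_t shard = page / g.pages_per_shard;
    const uint32_t page_in_shard = page % g.pages_per_shard;
    return toy_pack_noc_addr(g.shard_array[shard], g.bank_base_address + page_in_shard * g.page_size);
}

// "Kernel" code written once; it works with either generator type.
template <typename AddrGen>
std::vector<uint64_t> toy_collect_page_addrs(uint32_t first_page, uint32_t count, const AddrGen& gen) {
    std::vector<uint64_t> addrs;
    addrs.reserve(count);
    for (uint32_t p = first_page; p < first_page + count; ++p) {
        addrs.push_back(toy_get_noc_addr(p, gen));
    }
    return addrs;
}
```

Only the construction of the generator differs between the two cases; `toy_collect_page_addrs` is unchanged, which is the property the requirement asks for.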