
[Proposal] Support Multiple Prefill + Decode in a loop #9466

Open
wants to merge 1 commit into base: main

Conversation


@kushrast commented Mar 20, 2025

We would like to support multi-turn conversations with the AR-N model by allowing prefill + decode to be called in a loop without resetting the internal KV cache state (a sketch of the intended loop follows the example below).

Example:
Initialize Runner and KV Cache
Prompt: "Call David"
Response: "Okay, Call David Lee?"
Prompt: "No, call David Smith"
Response: "Okay, Calling David Smith, not David Lee"
Clear KV Cache
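
Sketched below is the intended loop, using a hypothetical Runner wrapper; prefill, decode, and reset_kv_cache are illustrative names, not the actual runner API:

// Hypothetical sketch of the multi-turn loop this proposal enables.
// Runner and its methods are illustrative stand-ins, not the real API.
#include <cstdint>
#include <string>
#include <vector>

struct Runner {
  int64_t pos = 0;  // absolute position: tokens prefilled + decoded so far
  void prefill(const std::string& prompt) { /* fills KV cache starting at pos */ }
  std::string decode() { /* appends generated tokens at pos */ return ""; }
  void reset_kv_cache() { pos = 0; }
};

int main() {
  Runner runner;  // Initialize Runner and KV cache once
  const std::vector<std::string> prompts = {"Call David",
                                            "No, call David Smith"};
  for (const auto& prompt : prompts) {
    runner.prefill(prompt);  // must continue from pos, not restart at 0
    runner.decode();         // generated tokens stay in the same KV cache
  }
  runner.reset_kv_cache();  // clear only when the conversation ends
  return 0;
}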

Assumptions:

  • To support multiple prefill + decode calls in a loop, we need to update prefill_input_pos and prefill_attention_mask to reflect the previously decoded tokens plus the new prompt tokens.
  • We also assume we need to update the k_cache and v_cache pointers for prefill.

What this PR does:

  • Adds a function update_kv_to_prefill_io to advance the prefill pointers for v_cache. Also sets attention_mask up to pos (to cover tokens generated during decode).
  • Updates fill_prefill_toks to take in the number of tokens prefilled + generated so far and uses this to set input_pos and the attention mask (see the sketch after this list).
  • Updates the test runner to save the number of tokens generated and pass it to the IO Manager. Also comments out resetting the KV cache state so it gets re-used.
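
A hedged sketch of the fill_prefill_toks change described in the list above; the signature and names here are assumed for illustration, not copied from the IO Manager:

// Illustrative only: continue input_pos from the tokens already in the
// cache (prefilled + decoded) instead of always starting at position 0.
#include <cstdint>
#include <vector>

void fill_prefill_toks_sketch(
    int64_t start_pos,                 // tokens prefilled + decoded so far
    const std::vector<int32_t>& toks,  // new prompt tokens
    int32_t* input_toks,               // model token buffer
    int32_t* input_pos) {              // model position buffer
  for (size_t i = 0; i < toks.size(); ++i) {
    input_toks[i] = toks[i];
    // Positions continue from the conversation so far instead of 0.
    input_pos[i] = static_cast<int32_t>(start_pos + i);
  }
  // The attention mask would likewise be opened up through start_pos so the
  // new prompt can attend to the earlier turns.
}

int main() {
  int32_t toks_buf[8] = {};
  int32_t pos_buf[8] = {};
  fill_prefill_toks_sketch(/*start_pos=*/12, {101, 102, 103}, toks_buf, pos_buf);
  return 0;  // pos_buf now holds {12, 13, 14, 0, ...}
}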

Current State:
The code does not crash, but it also does not produce correct output. Tested on a Samsung S24 with QNN 2.28 binaries.

./qnn_llama3_2_runner --model_path hybrid_llama_qnn.pte --tokenizer_path tiktokenizer.bin --eval_mode 1 --prompt "Call David" --kv_updater "ShiftPointer" --logits_scale 0.1 --output_path output.txt --num_iters 2

Example output with a model trained for communication:
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Call David<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Okay, call David
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

Call David<|eot_id|><|start_header_id|>assistant<|end_header_id|>

renamerenamerenamehabihabihabihabihabihabihabihabi date date date date date date握 culturesogh MMI MMI MMIhabihabihabihabihabislidesванetus Tangourt Abrams datefeedingfeedinghabi Date dateolved MMIhabihabihabihabihabihabihabihabihabihabihabihabi族OGLEalse date date date dateucusucusucushabihabi

@kushrast requested a review from cccclai as a code owner March 20, 2025 19:23

pytorch-bot bot commented Mar 20, 2025

🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/9466

❌ 2 New Failures as of commit 785a121 with merge base 0342bab.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label Mar 20, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D71567692


kushrast pushed a commit to kushrast/executorch that referenced this pull request Mar 20, 2025

@kushrast changed the title from "Adding KV to Prefill IO" to "[Proposal] Support Multiple Prefill + Decode in a loop" Mar 20, 2025
@sxu
Contributor

sxu commented Mar 20, 2025

> but we are unsure if we need to update the pointers for the k_cache.

Yeah, I think each cache input pointer needs to be updated to base address + input_pos.

@sxu
Contributor

sxu commented Mar 20, 2025


Just to elaborate: for shift pointers, when switching from one method to another, the following steps are needed:

  1. Update the K cache (the update must be scattered because K is transposed) and the V cache contents (a simple memcpy).
  2. Prepare the new method's KV input cache pointers: each K cache points to base address + input_pos (again, because K is transposed); each V cache points to base address + input_pos * head_dim.
  3. Update the new method's mask.

The update from prefill to kv_io is already implemented by update_prefill_to_kv_io, which performs all of steps 1-3 outlined above. The new update_kv_to_prefill_io only does step 2, and only for the V caches. I think you at least need to set the pointers for the K caches as well. You could also add steps 1 and 3, but they can also be performed by calling other existing functions (update_kv_io for step 1, fill_prefill_toks for step 3); it's up to you.
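
A minimal sketch of the step 2 pointer arithmetic, assuming 8-bit quantized caches; the function and parameter names are illustrative, not the actual IoMgr code:

#include <cstdint>

void set_kv_input_pointers(
    uint8_t* k_base, uint8_t* v_base,
    int64_t input_pos, int64_t head_dim,
    uint8_t*& k_in, uint8_t*& v_in) {
  // K is stored transposed (one row per channel of head_dim), so one
  // position advances one byte within each row: base + input_pos.
  k_in = k_base + input_pos;
  // V is stored position-major (one head_dim-sized row per position), so
  // one position advances a whole row: base + input_pos * head_dim.
  v_in = v_base + input_pos * head_dim;
}

int main() {
  uint8_t k_cache[1024] = {}, v_cache[1024] = {};
  uint8_t *k_in = nullptr, *v_in = nullptr;
  set_kv_input_pointers(k_cache, v_cache, /*input_pos=*/16, /*head_dim=*/8,
                        k_in, v_in);
  // k_in == k_cache + 16, v_in == v_cache + 128
  return 0;
}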

kushrast pushed a commit to kushrast/executorch that referenced this pull request Mar 20, 2025

@kushrast
Author

kushrast commented Mar 20, 2025


Thanks for the feedback. I updated update_kv_to_prefill_io to do steps 2 and 3. I am assuming step 1 is done by the last call to update_kv_io in kv_execute. I also noticed I was resetting the KV cache state instead of re-using it between iterations; I have commented that out now, though I still see bad output from the second iteration of the model through the runner.

std::vector<std::unique_ptr<executorch::aten::TensorImpl>>& v_cache_in =
    v_cache_in_[prefill_forward_name_];
std::vector<std::unique_ptr<executorch::aten::TensorImpl>>& v_cache_out =
    v_cache_out_[prefill_forward_name_];
for (int i = 0, v_cache_stride = head_dim_ * pos; i < v_cache_in.size();
Collaborator

Since the pointers of v_cache_in/out might have been updated a few rounds (AR-N + KV + AR-N + KV ...) earlier in your scenario, the base pointers will not stay in the initial state.
So if pos is an absolute value, the pointers will be advanced beyond the expected positions (the same issue applies to k_cache).
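
A toy illustration of the drift, with made-up sizes (not the IoMgr code):

// If the pointer was already advanced in earlier rounds, adding an
// absolute pos advances it twice; only the delta since the last update
// (pos - last_pos_) should be applied.
#include <cassert>
#include <cstdint>

int main() {
  uint8_t cache[1024] = {};
  const int64_t head_dim = 4;
  uint8_t* v_in = cache;  // initial state: points at the base

  int64_t last_pos = 10;        // advanced in a previous AR-N/KV round
  v_in += head_dim * last_pos;  // pointer is no longer at the base

  int64_t pos = 16;             // new absolute position
  // Wrong: v_in += head_dim * pos;  // would land at base + 4 * 26
  int64_t pos_diff = pos - last_pos;
  v_in += head_dim * pos_diff;  // correct: base + 4 * 16
  assert(v_in == cache + head_dim * pos);
  return 0;
}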

kushrast pushed a commit to kushrast/executorch that referenced this pull request Mar 24, 2025

kushrast pushed a commit to kushrast/executorch that referenced this pull request Mar 24, 2025

@kushrast
Author

@haowhsu-quic I updated the PR with your comments. Still seeing bad output, but I think we are setting the last position correctly.

for (int i = 0, v_cache_stride = head_dim_ * pos_diff; i < v_cache_in.size();
     i++) {
  v_cache_in[i]->set_data(
      v_cache_in[i]->mutable_data<uint8_t>() + v_cache_stride);
Collaborator
v_cache_out needs to be updated as well; please refer to the resolved comment. Thank you.

for (int i = 0, k_cache_stride = pos_diff * sizeof(uint8_t); i < k_cache_in_.size();
     i++) {
  k_cache_in[i]->set_data(
      k_cache_in[i]->mutable_data<uint8_t>() + k_cache_stride);
  uint8_t* ptr_in = k_cache_in[i]->mutable_data<uint8_t>() - pos_diff;
Collaborator
Here we need to get the origin for the deep copy: uint8_t* ptr_in = k_cache_in[i]->mutable_data<uint8_t>() - pos;

@cccclai
Contributor

cccclai commented Mar 26, 2025

@haowhsu-quic do you mean changes like this?

+
+  // update v_cache
+  std::vector<std::unique_ptr<executorch::aten::TensorImpl>>& v_cache_out =
+      v_cache_out_[prefill_forward_name_];
+
+  for (int i = 0, v_cache_stride = head_dim_ * pos_diff; i < v_cache_in.size(); ++i) {
+    v_cache_in[i]->set_data(v_cache_in[i]->mutable_data<uint8_t>() + v_cache_stride);
+    v_cache_out[i]->set_data(v_cache_out[i]->mutable_data<uint8_t>() + v_cache_stride);
   }
 
   // update k_cache
@@ -521,7 +525,7 @@
        i++) {
     k_cache_in[i]->set_data(
         k_cache_in[i]->mutable_data<uint8_t>() + k_cache_stride);
-    uint8_t* ptr_in = k_cache_in[i]->mutable_data<uint8_t>() - pos_diff;
+    uint8_t* ptr_in = k_cache_in[i]->mutable_data<uint8_t>() - pos;
     for (int j = 0; j < head_dim_; ++j) {
       memcpy(
         ptr_in + j * prefill_cache_len_,

It seems still not quite right. I need to learn your logic a bit more.

@haowhsu-quic
Collaborator

> @haowhsu-quic do you mean changes like this?

Yes. Could you update the PR with the latest change? Thank you.

std::vector<std::unique_ptr<executorch::aten::TensorImpl>>& k_cache_in =
    k_cache_in_[prefill_forward_name_];

size_t copied_size = pos_diff * sizeof(uint8_t);
Collaborator
copied_size should be pos * sizeof(uint8_t);

  k_cache_in[i]->set_data(
      k_cache_in[i]->mutable_data<uint8_t>() + k_cache_stride);
  uint8_t* ptr_in = k_cache_in[i]->mutable_data<uint8_t>() - pos_diff;
  for (int j = 0; j < head_dim_; ++j) {
Collaborator
Sorry, this probably needs a small change here: for (int j = 0; j <= head_dim_; ++j) {
I forgot we reserve extra space to prevent shifting beyond the boundary.

@cccclai
Contributor

cccclai commented Mar 26, 2025

Updated with the suggestion; still incorrect. I'm trying to dump the KV cache values to confirm.

   int64_t pos_diff = pos - last_pos_;
   std::vector<std::unique_ptr<executorch::aten::TensorImpl>>& v_cache_in =
       v_cache_in_[prefill_forward_name_];
-  for (int i = 0, v_cache_stride = head_dim_ * pos_diff; i < v_cache_in.size();
-       i++) {
-    v_cache_in[i]->set_data(
-        v_cache_in[i]->mutable_data<uint8_t>() + v_cache_stride);
+
+  // update v_cache
+  std::vector<std::unique_ptr<executorch::aten::TensorImpl>>& v_cache_out =
+      v_cache_out_[prefill_forward_name_];
+
+  for (int i = 0, v_cache_stride = head_dim_ * pos_diff; i < v_cache_in.size(); ++i) {
+    v_cache_in[i]->set_data(v_cache_in[i]->mutable_data<uint8_t>() + v_cache_stride);
+    v_cache_out[i]->set_data(v_cache_out[i]->mutable_data<uint8_t>() + v_cache_stride);
   }
 
   // update k_cache
   std::vector<std::unique_ptr<executorch::aten::TensorImpl>>& k_cache_in =
       k_cache_in_[prefill_forward_name_];
 
-  size_t copied_size = pos_diff * sizeof(uint8_t);
+  size_t copied_size = pos * sizeof(uint8_t);
 
-  for (int i = 0, k_cache_stride = pos_diff * sizeof(uint8_t); i < k_cache_in_.size();
-       i++) {
-    k_cache_in[i]->set_data(
-        k_cache_in[i]->mutable_data<uint8_t>() + k_cache_stride);
-    uint8_t* ptr_in = k_cache_in[i]->mutable_data<uint8_t>() - pos_diff;
-    for (int j = 0; j < head_dim_; ++j) {
-      memcpy(
-        ptr_in + j * prefill_cache_len_,
-        ptr_in + j * kv_cache_len_,
-        copied_size);
+  for (int i = 0; i < k_cache_in.size(); i++) {
+    uint8_t* ptr_in = k_cache_in[i]->mutable_data<uint8_t>();
+    // Move pointer forward by pos_diff
+    k_cache_in[i]->set_data(ptr_in + pos_diff);
+    // Copy data from kv_cache region to prefill_cache region for each head
+    for (int j = 0; j <= head_dim_; ++j) {
+      uint8_t* dst = ptr_in - pos + j * prefill_cache_len_;
+      const uint8_t* src = ptr_in - pos + j * kv_cache_len_;
+      memcpy(dst, src, copied_size);
     }
   }
 

@cccclai
Contributor

cccclai commented Mar 26, 2025

I'll ask @kushrast to update the PR tomorrow; I probably don't have the permission to update it.

@cccclai
Contributor

cccclai commented Mar 26, 2025

To double-check: is the k_cache shape (head_dim + 1, seq_len - 1, num_layers)?

@haowhsu-quic
Collaborator

I think the last for loop should be:

for (int i = 0; i < k_cache_in.size(); i++) {
  // update the input pointer first, to the current absolute position
  k_cache_in[i]->set_data(k_cache_in[i]->mutable_data<uint8_t>() + pos_diff);
  uint8_t* ptr_in = k_cache_in[i]->mutable_data<uint8_t>();
  for (int j = 0; j <= head_dim_; ++j) {
    uint8_t* dst = ptr_in - pos + j * prefill_cache_len_;
    const uint8_t* src = ptr_in - pos + j * kv_cache_len_;
    memcpy(dst, src, copied_size);
  }
}

The shape of k_cache_in (single head) is (head_dim_ + 1, prefill_ar_len_) in prefill mode and (head_dim_ + 1, kv_cache_len_) in decode mode. But both are actually mapped to the same chunk of memory; that's why we need a deep copy here, or the data fetching would be incorrect.
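
A toy model of those overlapping views, with made-up sizes chosen purely for illustration:

// One buffer, two row strides: (head_dim_+1, kv_cache_len_) in decode mode
// and (head_dim_+1, prefill_cache_len_) in prefill mode. Switching modes
// means re-scattering each row to the other stride. Sizes are made up.
#include <cassert>
#include <cstdint>
#include <cstring>

int main() {
  const int head_dim = 2;
  const int kv_cache_len = 8, prefill_cache_len = 6;  // assumed sizes
  const int pos = 4;                                  // valid tokens so far
  uint8_t buf[(head_dim + 1) * kv_cache_len] = {};    // +1 padding row

  // Populate the decode-mode view: row j begins at j * kv_cache_len.
  for (int j = 0; j < head_dim; ++j)
    for (int p = 0; p < pos; ++p)
      buf[j * kv_cache_len + p] = static_cast<uint8_t>(10 * j + p);

  // Deep-copy into the prefill-mode view: row j begins at
  // j * prefill_cache_len. memmove because the regions can overlap.
  for (int j = 0; j <= head_dim; ++j)
    std::memmove(buf + j * prefill_cache_len, buf + j * kv_cache_len, pos);

  // The prefill view now reads the same per-row data at its own stride.
  assert(buf[1 * prefill_cache_len + 3] == 13);
  return 0;
}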

@haowhsu-quic
Collaborator

Is it possible for us to get the .pte file to help resolve the issue?

@cccclai
Contributor

cccclai commented Mar 26, 2025

> Is it possible for us to get the .pte file to help resolve the issue?

That might be tricky, because it's an internal model. The easiest way is an online debug session.

@cccclai
Contributor

cccclai commented Mar 26, 2025

Actually, let me try exporting the stories model and verifying accuracy with that; I think it will have a similar issue. In the meantime, this is the latest:

void ShiftPointerIoMgr::update_kv_to_prefill_io(
  int64_t pos,
  std::vector<std::vector<executorch::aten::Tensor>>& output_tensors) {
  // update v_cache
  assert(pos <= 512);
  int64_t pos_diff = pos - last_pos_;
  std::vector<std::unique_ptr<executorch::aten::TensorImpl>>& v_cache_in =
      v_cache_in_[prefill_forward_name_];

  // update v_cache
  std::vector<std::unique_ptr<executorch::aten::TensorImpl>>& v_cache_out =
      v_cache_out_[prefill_forward_name_];

  for (int i = 0, v_cache_stride = head_dim_ * pos_diff; i < v_cache_in.size(); ++i) {
    v_cache_in[i]->set_data(v_cache_in[i]->mutable_data<uint8_t>() + v_cache_stride);
    v_cache_out[i]->set_data(v_cache_out[i]->mutable_data<uint8_t>() + v_cache_stride);
  }

  // update k_cache
  std::vector<std::unique_ptr<executorch::aten::TensorImpl>>& k_cache_in =
      k_cache_in_[prefill_forward_name_];

  size_t copied_size = pos * sizeof(uint8_t);

  for (int i = 0; i < k_cache_in.size(); i++) {
    // Advance the input pointer to the current absolute position first.
    k_cache_in[i]->set_data(k_cache_in[i]->mutable_data<uint8_t>() + pos_diff);
    uint8_t* ptr_in = k_cache_in[i]->mutable_data<uint8_t>();
    // Copy data from kv_cache region to prefill_cache region for each head
    for (int j = 0; j <= head_dim_; ++j) {
      uint8_t* dst = ptr_in - pos + j * prefill_cache_len_;
      const uint8_t* src = ptr_in - pos + j * kv_cache_len_;
      memcpy(dst, src, copied_size);
    }
  }

  // Setting attention mask from context_len - prefill_ar_len - i to context_len
  IO* ptr = static_cast<IO*>(data_ptr_.get());
  for (int i = prefill_ar_len_; i < pos; i++) {
    for (int j = 0; j < prefill_ar_len_; j++) {
      ptr->prefill_attention_mask[j * context_len_ + context_len_ - prefill_ar_len_ - i] = 65535;
    }
  }
}
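
A toy check of the mask indexing in that last loop, with assumed small sizes (65535 being the value the function writes for previously generated positions):

// The prefill mask is a (prefill_ar_len_ x context_len_) row-major array;
// the loop opens the columns for tokens that precede the AR window.
// Sizes here are made up for illustration.
#include <cstdint>
#include <cstdio>

int main() {
  const int context_len = 12, prefill_ar_len = 4, pos = 7;
  uint16_t mask[prefill_ar_len * context_len] = {};

  // Same index arithmetic as update_kv_to_prefill_io above.
  for (int i = prefill_ar_len; i < pos; i++)
    for (int j = 0; j < prefill_ar_len; j++)
      mask[j * context_len + context_len - prefill_ar_len - i] = 65535;

  // Row 0: a '1' marks each column the loop opened.
  for (int c = 0; c < context_len; c++)
    printf("%d", mask[0 * context_len + c] ? 1 : 0);
  printf("\n");
  return 0;
}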

@cccclai
Contributor

cccclai commented Mar 26, 2025

Actually, you can repro the accuracy issue with this command line and the stories model:

./qnn_llama3_2_runner --model_path hybrid_stories_qnn.pte --tokenizer_path tokenizer.bin --eval_mode 1 --prompt "Once" --kv_updater "ShiftPointer" --logits_scale 0.1 --output_path output.txt --num_iters 2

In the second iteration, the context should include the previous prompt + the generated output + the second prompt.

@cccclai
Contributor

cccclai commented Mar 26, 2025

I'm getting the correct output from #9662; wondering if you can take a look and see if it's correct.
