SynOps Calculation #12
Comments
Hi, thank you for reaching out and for your interest in our work. I'd like to clarify that in the latest version of our paper, which you can find at this link, we no longer use the SynOps metric; we decided it wasn't the most appropriate measure for our purposes. Instead, we've switched to reporting theoretical power consumption and have provided detailed steps for its calculation in the paper. Please refer to the linked document for more in-depth information.
Hi @ridgerchu thank you for your previous answer. I have taken a look at the paper and I am able to replicate almost everything except the energy-consumption estimate, which is what interests me the most. I would appreciate it if you could explain how to obtain the spiking firing rate from the provided code or an already trained model. Furthermore, I do not manage to reproduce the attention values reported in Table 1. Could you explain why, for the second row, the MACs are "2T^2d vs 6Td", and why for the first row the MACs are 3d^2T, based on the general equations for the SRWKV and SRFNN blocks as well as the one for the self-attention mechanism? I would really appreciate your help.
Hi, to measure the spiking rate you can use the hook functionality in PyTorch. Forward hooks let you record the outputs of network layers, from which you can compute each layer's output firing rate. Regarding the MACs: the term 3Td^2 refers to the computational cost of producing the Q, K, and V matrices; each of them requires Td^2 operations. In the attention mechanism itself, the product of Q and K is a matrix multiplication whose cost is proportional to T^2, which is where the T^2 term comes from. I hope this clarifies your queries. Feel free to reach out if you have more questions!
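For reference, a minimal sketch of the hook-based measurement described above (the module and model names are placeholders, not the actual SpikeGPT classes):

```python
import torch
import torch.nn as nn

# Record the mean firing rate of each spiking layer via forward hooks.
firing_rates = {}

def make_hook(name):
    def hook(module, inputs, output):
        # For a spiking layer the output is a binary (0/1) tensor, so its mean
        # is the fraction of neurons that fired, i.e. the firing rate.
        firing_rates[name] = output.detach().float().mean().item()
    return hook

def register_spike_hooks(model, spiking_layer_cls):
    handles = []
    for name, module in model.named_modules():
        if isinstance(module, spiking_layer_cls):
            handles.append(module.register_forward_hook(make_hook(name)))
    return handles

# Usage (placeholder names):
# handles = register_spike_hooks(model, SpikingNeuron)
# with torch.no_grad():
#     model(sample_batch)
# print(sum(firing_rates.values()) / len(firing_rates))  # average firing rate
# for h in handles:
#     h.remove()
```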
Hi @ridgerchu thanks a lot for the help with the spiking rate calculation! But I am still struggling to define the computational complexity of the model so that I can derive the energy consumption from it. I will try to lay out my doubts properly:
Sorry for such a long question; I understand that answering it means explaining step by step all the calculations involved in that section of the paper, but it might also be helpful to include it in the supplementary materials as a clarification for reviewers.
Hi, thank you for reaching out with your questions!

Self-attention mechanism complexity: for the self-attention mechanism's computational complexity and its relation to energy consumption, we align our methodology with the approach used in Spike-Driven Transformer; specifically, we use Eac and Emac calculations similar to theirs. In their spiking neural network (SNN) model they use Eac, which we have also adopted. For the Td term, we followed the precedent set by models such as AFT, RWKV, and SpikeGPT, where the combination of the R/Q, K, and V variables involves element-wise products, leading to a complexity of Td.

Calculations of the Mp, Mg, and Ms matrices: the additional terms you are asking about originate from the Mp, Mg, and Ms matrices. For Mg, the cost is 4Td^2 (with d = 512, T = 3072), which gives 3221225472 operations. Multiplying this by R = 0.15 and Eac = 0.9 yields 434865438.72. For Mp and Ms, we considered their cost to be Td^2, since their matrix size is four times smaller. This was our calculation at the time, and while we strove for accuracy, we acknowledge there might be areas that lack rigor. If you find any discrepancies or have concerns, please feel free to point them out!
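As a quick sanity check, here is the Mg arithmetic spelled out (a minimal sketch based only on the numbers quoted above; the pJ unit for Eac and the restriction to this single term are assumptions, not the paper's full energy model):

```python
# Reproduce the Mg numbers quoted in the comment above.
T = 3072      # sequence length
d = 512       # embedding dimension
R = 0.15      # measured spiking firing rate
E_ac = 0.9    # energy per accumulate operation (assumed to be in pJ)

ops_mg = 4 * T * d ** 2        # synaptic operations attributed to Mg
energy_mg = ops_mg * R * E_ac  # spike-driven energy contribution

print(ops_mg)     # 3221225472
print(energy_mg)  # ~434865438.72
```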
@ridgerchu I really appreciate the effort you have made to answer my questions. I think everything is clear now. I will get back to you in case anything else arises. Thanks!
Hi @ridgerchu I was wondering if you knew where I could find the WKV implementation class in PyTorch, rather than the current CUDA one. Thanks!
Hi, you can find PyTorch-style RWKV code here: link
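For anyone landing here later, a rough idea of what that looks like. This is not the repository's code, just a minimal, unoptimized sketch of the standard RWKV WKV recurrence (the operation the CUDA kernel computes), without the numerical stabilization real implementations use:

```python
import torch

def wkv_pytorch(w, u, k, v):
    """Naive WKV recurrence. w, u: (C,) decay/bonus params; k, v: (T, C)."""
    T, C = k.shape
    out = torch.zeros_like(v)
    a = torch.zeros(C, dtype=k.dtype, device=k.device)  # running weighted sum of v
    b = torch.zeros(C, dtype=k.dtype, device=k.device)  # running sum of weights
    decay = torch.exp(-torch.exp(w))  # RWKV parameterizes the per-channel decay as -exp(w)
    for t in range(T):
        e_cur = torch.exp(u + k[t])
        out[t] = (a + e_cur * v[t]) / (b + e_cur)
        e_k = torch.exp(k[t])
        a = decay * a + e_k * v[t]
        b = decay * b + e_k
    return out

# Usage with random tensors:
# T, C = 8, 16
# out = wkv_pytorch(torch.zeros(C), torch.zeros(C), torch.randn(T, C), torch.randn(T, C))
```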
@ridgerchu really appreciate your support. Thanks!
Hi @ridgerchu, first of all, congratulations on your work, it is amazing. I would like to know how exactly you calculate the SynOps number reported in your paper, as I do not get the same results. I look forward to hearing from you.