pinsrw latency overestimated(?) because dep chain competes for the same port #23

Closed
pcordes opened this issue Apr 22, 2021 · 0 comments

pcordes commented Apr 22, 2021

The experiments at https://uops.info/html-lat/SKX/PINSRW_XMM_R32_I8-Measurements.html#lat1-%3E1 use only pinsrw xmm, r32, imm alone, or pinsrw with an XMM->XMM dep chain created by shufpd or pshufd.

But pinsrw itself is 2 uops for port 5 on Intel: presumably a movd-equivalent uop that feeds a 2-input shuffle. One would expect that the GP->XMM (movd) uop could run early if there were a free port, since it doesn't depend on the XMM input. That would leave a critical-path latency of only 1 cycle for operand 1->1.

But resource conflicts prevent this from being demonstrated, because the dep-chain instruction competes for the same port. Perhaps pand xmm0, xmm0 would be a better choice for at least one of the experiments, or orps xmm0, xmm0. (I guess shufpd and pshufd are there to look for bypass latency between integer and FP shuffles?)
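
To illustrate the argument, here is a toy scheduling model in Python. It is a sketch, not a description of real hardware or of how uops.info measures. Everything in it is an assumption: every uop has 1-cycle latency, each port starts one uop per cycle, the scheduler picks the oldest ready uop, and the port sets are (5,) for a pshufd-like chain instruction and (0, 1, 5) for a pand-like one. The function name cycles_per_iteration is hypothetical.

```python
def cycles_per_iteration(chain_ports, iterations=100):
    """Toy model of a loop body: pinsrw (2 uops, both port 5) + 1 chain uop.

    Assumptions (not measured facts): 1-cycle latency for every uop,
    oldest-ready-first scheduling, one uop per port per cycle.
    """
    uops = []  # (allowed_ports, dependency_indices), in program order
    for i in range(iterations):
        base = 3 * i
        prev_chain = [base - 1] if i > 0 else []
        uops.append(((5,), []))                     # movd-like uop: GP input only
        uops.append(((5,), [base] + prev_chain))    # shuffle uop: merges into XMM
        uops.append((tuple(chain_ports), [base + 1]))  # dep-chain instruction

    done_at = {}  # uop index -> cycle at which its result is available
    cycle = 0
    while len(done_at) < len(uops):
        busy = set()  # ports already used in this cycle
        for idx, (ports, deps) in enumerate(uops):
            if idx in done_at:
                continue
            if not all(d in done_at and done_at[d] <= cycle for d in deps):
                continue
            free = [p for p in ports if p not in busy]
            if free:
                busy.add(free[0])
                done_at[idx] = cycle + 1
        cycle += 1

    # Steady-state spacing between the first and last chain results.
    return (done_at[len(uops) - 1] - done_at[2]) / (iterations - 1)


same_port = cycles_per_iteration((5,))        # chain uop restricted to port 5
other_ports = cycles_per_iteration((0, 1, 5)) # chain uop may use ports 0/1
print(same_port - 1, other_ports - 1)         # implied pinsrw latency, minus 1c chain
```

In this model the port-5-only chain takes 3 cycles per iteration, because three uops per iteration share one port. The chain that can use other ports takes 2 cycles, so subtracting the chain instruction's 1 cycle gives 2 versus 1 for pinsrw.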

@andreas-abel andreas-abel added the bug Something isn't working label Apr 23, 2021
@andreas-abel andreas-abel self-assigned this Apr 23, 2021
andreas-abel added a commit that referenced this issue Apr 26, 2021