pinsrw latency overestimated(?) because dep chain competes for the same port #23

pcordes · 2021-04-22T04:43:19Z

The experiments at https://uops.info/html-lat/SKX/PINSRW_XMM_R32_I8-Measurements.html#lat1-%3E1 only use pinsrw xmm, r32, imm on its own, or pinsrw with an XMM->XMM dep chain created by shufpd or pshufd.

But pinsrw itself is 2 uops for port 5 on Intel, presumably a movd-equivalent uop feeding a 2-input shuffle. One would expect that the GP->XMM (movd) uop could run early if there were a free port, leaving a critical-path latency from operand 1->1 of only 1 cycle.

But resource conflicts with the dep chain prevent this from being demonstrated, because the shufpd / pshufd chain instructions compete for the same port 5. Perhaps pand xmm0, xmm0 or orps xmm0, xmm0 would be a better choice for at least one of the experiments. (I guess shufpd and pshufd are there to look for bypass latency between integer and FP shuffles?)
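
To make the argument concrete, here is a toy scheduling model in Python. It is only a sketch of the reasoning above, not a description of the real scheduler or of how uops.info measures: it assumes every uop has 1-cycle latency, each port accepts one uop per cycle, the oldest ready uop wins, and the scheduling window is unlimited. The function name and the port sets are hypothetical choices for illustration.

```python
def cycles_per_iteration(chain_ports, iterations=200):
    """Steady-state cycles per iteration of: pinsrw (2 uops, port 5) + 1 chain uop.

    Toy assumptions: all latencies are 1 cycle, one uop per port per cycle,
    oldest-ready-first scheduling, unlimited scheduling window.
    """
    uops = []          # (allowed ports, indices of uops this one depends on)
    chain_uops = []    # index of the chain uop of each iteration
    prev_chain = None
    for _ in range(iterations):
        a = len(uops)
        uops.append(({5}, []))                      # movd-like uop, off the XMM chain
        deps = [a] if prev_chain is None else [a, prev_chain]
        uops.append(({5}, deps))                    # 2-input shuffle uop of pinsrw
        uops.append((set(chain_ports), [a + 1]))    # chain instruction (1 uop)
        prev_chain = a + 2
        chain_uops.append(prev_chain)

    done = [None] * len(uops)   # cycle at which each result becomes available
    pending = list(range(len(uops)))
    cycle = 0
    while pending:
        busy = set()
        waiting = []
        for i in pending:                           # program order = oldest first
            ports, deps = uops[i]
            ready = all(done[d] is not None and done[d] <= cycle for d in deps)
            free = sorted(ports - busy)
            if ready and free:
                busy.add(free[0])
                done[i] = cycle + 1
            else:
                waiting.append(i)
        pending = waiting
        cycle += 1

    first, last = chain_uops[0], chain_uops[-1]
    return (done[last] - done[first]) / (iterations - 1)


print(cycles_per_iteration({5}))        # chain uop restricted to port 5 -> 3.0
print(cycles_per_iteration({0, 1, 5}))  # chain uop may use ports 0, 1, 5 -> 2.0
```

In this model, subtracting the chain instruction's assumed 1-cycle latency leaves an apparent pinsrw latency of 2 cycles when the chain uop is stuck on port 5, and 1 cycle when it can go elsewhere. That is the difference an experiment with pand or orps might expose, if the real hardware behaves like the model.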

andreas-abel added the bug Something isn't working label Apr 23, 2021

andreas-abel self-assigned this Apr 23, 2021

andreas-abel added a commit that referenced this issue Apr 26, 2021

Issue #23

77e4985

andreas-abel closed this as completed Jun 4, 2021