The experiments at https://uops.info/html-lat/SKX/PINSRW_XMM_R32_I8-Measurements.html#lat1-%3E1 only use `pinsrw xmm, r32, imm` alone, or `pinsrw` with an XMM->XMM dep chain created by `shufpd` or `pshufd`.

But `pinsrw` itself is 2 uops for port 5 on Intel. Presumably one of them is a `movd`-equivalent uop that feeds a 2-input shuffle. One would expect that the GP->XMM (`movd`) uop could run early if there was a free port, leaving the critical-path latency from 1->1 at only 1 cycle.

But resource conflicts with the dep chain prevent this from being demonstrated. Perhaps `pand xmm0,xmm0` would be a better choice for at least one of the experiments, or `orps xmm0, xmm0`. (I guess `shufpd` and `pshufd` are looking for bypass latency between integer and FP shuffles?)
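
To put rough numbers on the port-pressure argument above, here is a toy bound in Python. It is only a sketch under assumptions that are not measured here: an ideal scheduler, both `pinsrw` uops restricted to port 5, `shufpd` as a 1-cycle port-5-only uop, `pand` as a 1-cycle uop that can use another vector ALU port, and no bypass delay. The function name and the experiment table are hypothetical, made up for illustration.

```python
# Toy bound on cycles per iteration for a loop-carried dependency chain.
# Every uop count, port assignment and latency here is an assumption.

def cycles_per_iter(chain_latency, p5_only_uops):
    """Larger of the loop-carried latency and the port-5 throughput limit."""
    return max(chain_latency, p5_only_uops)

PINSRW_P5_UOPS = 2    # per the issue: movd-like uop + 2-input shuffle, both on p5
PINSRW_MERGE_LAT = 1  # hypothesised XMM->XMM latency of the shuffle uop

experiments = {
    # name: (latency of the chain instruction, port-5-only uops it adds)
    "pinsrw alone": (0, 0),
    "pinsrw + shufpd": (1, 1),  # assumed: shufpd competes for port 5
    "pinsrw + pand": (1, 0),    # assumed: pand can run on another port
}

results = {}
for name, (chain_lat, chain_p5) in experiments.items():
    total = cycles_per_iter(PINSRW_MERGE_LAT + chain_lat,
                            PINSRW_P5_UOPS + chain_p5)
    # Latency a measurement would attribute to pinsrw after subtracting
    # the known latency of the chain instruction.
    results[name] = total - chain_lat
    print(f"{name}: {total} cycles/iter, apparent pinsrw latency {results[name]}")
```

Under these assumptions, only the `pand` variant keeps port 5 from being the bottleneck, so it is the only one where a 1-cycle 1->1 latency could show up. A real scheduler assigns ports at issue time and may still send `pand` to port 5 some of the time, so an actual measurement could come out higher than this bound.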