Microsoft Specific
Emits the Streaming SIMD Extensions 4 (SSE4) instruction dpps. This instruction computes the dot product of single-precision floating-point values.
__m128 _mm_dp_ps(
__m128 a,
__m128 b,
const int mask
);
Parameters
[in] a
A 128-bit parameter that contains four 32-bit floating point values.
[in] b
A 128-bit parameter that contains four 32-bit floating point values.
[in] mask
A constant mask that determines which components will be multiplied and where to place the results.
Result value
A 128-bit value that contains four 32-bit floating-point results. Each element holds either the dot product or +0.0, as selected by mask.
The result can be expressed with the following equations:
tmp0 := (mask4 == 1) ? (a0 * b0) : +0.0
tmp1 := (mask5 == 1) ? (a1 * b1) : +0.0
tmp2 := (mask6 == 1) ? (a2 * b2) : +0.0
tmp3 := (mask7 == 1) ? (a3 * b3) : +0.0
tmp4 := tmp0 + tmp1 + tmp2 + tmp3
r0 := (mask0 == 1) ? tmp4 : +0.0
r1 := (mask1 == 1) ? tmp4 : +0.0
r2 := (mask2 == 1) ? tmp4 : +0.0
r3 := (mask3 == 1) ? tmp4 : +0.0
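The equations above can be modeled in portable scalar C, which may help when checking a mask before using the intrinsic. This is a minimal sketch, not the instruction's implementation: the name dp_ps_reference is hypothetical, and the hardware may add the products in a different order, so rounding can differ from this model in inexact cases.

```c
/* Scalar model of the result equations for _mm_dp_ps.
   dp_ps_reference is a hypothetical helper name used for illustration;
   it is not declared in <smmintrin.h> or any other header. */
static void dp_ps_reference(const float a[4], const float b[4],
                            int mask, float r[4])
{
    float tmp[4];
    float sum;
    int i;

    /* Bits 4-7 of mask: keep the product a[i] * b[i], or use +0.0. */
    for (i = 0; i < 4; ++i)
        tmp[i] = ((mask >> (4 + i)) & 1) ? (a[i] * b[i]) : 0.0f;

    /* tmp4 in the equations: the sum of the selected products. */
    sum = tmp[0] + tmp[1] + tmp[2] + tmp[3];

    /* Bits 0-3 of mask: write the sum to r[i], or write +0.0. */
    for (i = 0; i < 4; ++i)
        r[i] = ((mask >> i) & 1) ? sum : 0.0f;
}
```

With the inputs from the example on this page and a mask of 0x55, this model produces 556.40625 in r0 and r2 and +0.0 in r1 and r3.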
Requirements
Intrinsic | Architecture |
---|---|
_mm_dp_ps | x86, x64 |
Header file <smmintrin.h>
Remarks
Bits 4-7 of the immediate mask select which pairs of source elements are multiplied. Bits 0-3 select which elements of the result receive the dot product. If a mask bit is 0, the corresponding product (bits 4-7) or result element (bits 0-3) is +0.0.
r0-r3, a0-a3, and b0-b3 are the sequentially ordered 32-bit components of return value r and parameters a and b, respectively. r0, a0, and b0 are the least significant 32 bits.
maski is bit i of parameter mask, where bit 0 is the least significant bit.
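Because the mask packs two 4-bit selectors into one immediate, it can be convenient to build it from its two halves. The following is a sketch under that assumption: DP_MASK and the enumerator names are hypothetical and not part of any header. A macro is used so that the value remains a compile-time constant, as the mask parameter requires.

```c
/* Hypothetical helper: combine the two 4-bit selectors into one mask.
   mul_lanes -> bits 4-7: which element pairs are multiplied.
   dst_lanes -> bits 0-3: which result elements receive the sum. */
#define DP_MASK(mul_lanes, dst_lanes) \
    ((((mul_lanes) & 0xF) << 4) | ((dst_lanes) & 0xF))

enum {
    /* Multiply elements 0-2 and write the sum to r0 only. */
    DP_MASK_XYZ_TO_R0 = DP_MASK(0x7, 0x1),      /* 0x71 */
    /* Multiply all four elements and write the sum to every element. */
    DP_MASK_ALL_TO_ALL = DP_MASK(0xF, 0xF),     /* 0xFF */
    /* Multiply elements 0 and 2 and write the sum to r0 and r2. */
    DP_MASK_EVEN_TO_EVEN = DP_MASK(0x5, 0x5)    /* 0x55 */
};
```

For instance, a three-component dot product that leaves its result in r0 would pass DP_MASK_XYZ_TO_R0 (0x71) as the mask argument.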
Before you use this intrinsic, software must ensure that the underlying processor supports the instruction.
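One way to perform this check is to query CPUID leaf 1 and test ECX bit 19, which reports SSE4.1 support (the extension that contains dpps). The sketch below assumes the Microsoft __cpuid intrinsic from <intrin.h>, with a fallback to __get_cpuid from <cpuid.h> on GCC-compatible compilers. The function name cpu_supports_sse41 is hypothetical.

```c
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#elif defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Hypothetical helper: returns 1 if the processor reports SSE4.1
   (CPUID leaf 1, ECX bit 19), otherwise 0. */
static int cpu_supports_sse41(void)
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int info[4];                 /* EAX, EBX, ECX, EDX */
    __cpuid(info, 1);
    return (info[2] >> 19) & 1;
#elif defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                /* leaf 1 is not available */
    return (int)((ecx >> 19) & 1);
#else
    return 0;                    /* not an x86 or x64 target */
#endif
}
```

A caller would typically test this once at startup and fall back to a scalar code path when it returns 0.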
Example
#include <stdio.h>
#include <smmintrin.h>

int main ()
{
    __m128 a, b;
    // Bits 4-7 (0101): multiply element pairs 0 and 2.
    // Bits 0-3 (0101): write the sum to result elements 0 and 2.
    const int mask = 0x55;

    // m128_f32 is a Microsoft-specific member of __m128.
    a.m128_f32[0] = 1.5;
    a.m128_f32[1] = 10.25;
    a.m128_f32[2] = -11.0625;
    a.m128_f32[3] = 81.0;

    b.m128_f32[0] = -1.5;
    b.m128_f32[1] = 3.125;
    b.m128_f32[2] = -50.5;
    b.m128_f32[3] = 100.0;

    // (1.5 * -1.5) + (-11.0625 * -50.5) = 556.40625
    __m128 res = _mm_dp_ps(a, b, mask);

    printf_s("Original a: %f\t%f\t%f\t%f\nOriginal b: %f\t%f\t%f\t%f\n",
             a.m128_f32[0], a.m128_f32[1], a.m128_f32[2], a.m128_f32[3],
             b.m128_f32[0], b.m128_f32[1], b.m128_f32[2], b.m128_f32[3]);
    printf_s("Result res: %f\t%f\t%f\t%f\n",
             res.m128_f32[0], res.m128_f32[1], res.m128_f32[2], res.m128_f32[3]);

    return 0;
}
Original a: 1.500000 10.250000 -11.062500 81.000000
Original b: -1.500000 3.125000 -50.500000 100.000000
Result res: 556.406250 0.000000 556.406250 0.000000