Load Operations (Floating-Point SSE2 Intrinsics)

2012-11-16

Microsoft Specific

For an explanation of the syntax used in code samples in this topic, see Floating-Point Intrinsics Using Streaming SIMD Extensions.

SSE2 intrinsics use the __m128, __m128i, and __m128d data types, which are not supported on Itanium Processor Family (IPF) processors. Any SSE2 intrinsics that use the __m64 data type are not supported on x64 processors.

The emmintrin.h header file contains the declarations for the SSE2 intrinsics.

__m128d _mm_load_pd (double *p);

MOVAPD

Loads two double-precision floating-point values. The address p must be 16-byte aligned.

r0 := p[0]
r1 := p[1]
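The following is a minimal sketch of _mm_load_pd in a loop, assuming an x86 or x64 target with SSE2 and a C11 compiler. The helper name add_pd_aligned is hypothetical, and _mm_add_pd and _mm_store_pd are other SSE2 intrinsics from emmintrin.h that this page does not cover. Every address passed to _mm_load_pd must stay 16-byte aligned, so the loop advances two elements (16 bytes) at a time from aligned base pointers.

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_load_pd, _mm_add_pd, _mm_store_pd */

/* Hypothetical helper: out[i] = a[i] + b[i], two elements per iteration.
   Assumes n is even and that a, b, and out are all 16-byte aligned;
   a misaligned address passed to _mm_load_pd or _mm_store_pd can fault. */
void add_pd_aligned(double *a, double *b, double *out, int n)
{
    for (int i = 0; i < n; i += 2) {
        __m128d va = _mm_load_pd(&a[i]);  /* va = {a[i], a[i+1]} */
        __m128d vb = _mm_load_pd(&b[i]);  /* vb = {b[i], b[i+1]} */
        _mm_store_pd(&out[i], _mm_add_pd(va, vb));
    }
}
```

Callers can obtain suitably aligned arrays with the C11 `_Alignas(16)` specifier; MSVC also accepts `__declspec(align(16))`.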

__m128d _mm_load1_pd (double *p);

(MOVSD + shuffling)

Loads a single double-precision floating-point value and copies it to both elements. The address p does not need to be 16-byte aligned.

r0 := *p
r1 := *p
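A typical use of _mm_load1_pd is broadcasting a scalar so it can be applied to two elements at once. This sketch assumes SSE2 and an even n; scale_pd is a hypothetical helper. Because factor is read with _mm_load1_pd, it needs no particular alignment, and the array is accessed with _mm_loadu_pd and _mm_storeu_pd so it does not need to be aligned either.

```c
#include <emmintrin.h>  /* SSE2: _mm_load1_pd, _mm_loadu_pd, _mm_mul_pd, _mm_storeu_pd */

/* Hypothetical helper: multiplies each of the n elements of v by *factor.
   Assumes n is even; neither v nor factor needs to be 16-byte aligned. */
void scale_pd(double *v, int n, double *factor)
{
    __m128d f = _mm_load1_pd(factor);     /* f = {*factor, *factor} */
    for (int i = 0; i < n; i += 2) {
        __m128d x = _mm_loadu_pd(&v[i]);  /* x = {v[i], v[i+1]} */
        _mm_storeu_pd(&v[i], _mm_mul_pd(x, f));
    }
}
```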

__m128d _mm_loadr_pd (double *p);

(MOVAPD + shuffling)

Loads two double-precision floating-point values in reverse order. The address p must be 16-byte aligned.

r0 := p[1]
r1 := p[0]
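One way to use _mm_loadr_pd is reversing an array two elements at a time. This is a sketch under stated assumptions: reverse_pd is a hypothetical helper, n is assumed even, src is assumed 16-byte aligned (as _mm_loadr_pd requires), and dst is written with _mm_storeu_pd, so it needs no alignment.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadr_pd, _mm_storeu_pd */

/* Hypothetical helper: dst[k] = src[n - 1 - k] for k in [0, n).
   Assumes n is even and src is 16-byte aligned; src and dst must not overlap. */
void reverse_pd(double *src, double *dst, int n)
{
    for (int i = 0; i < n; i += 2) {
        __m128d r = _mm_loadr_pd(&src[i]);  /* r = {src[i+1], src[i]} */
        _mm_storeu_pd(&dst[n - 2 - i], r);  /* mirrored position in dst */
    }
}
```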

__m128d _mm_loadu_pd (double *p);

MOVUPD

Loads two double-precision floating-point values. The address p does not need to be 16-byte aligned.

r0 := p[0]
r1 := p[1]
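The following sketch sums a run of doubles that starts at an arbitrary address, which is the situation _mm_loadu_pd is meant for. sum_pd_unaligned is a hypothetical name; it assumes SSE2 and an even n, and uses _mm_setzero_pd, _mm_add_pd, and _mm_storeu_pd from emmintrin.h to accumulate and then combine the two lanes.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_pd, _mm_setzero_pd, _mm_add_pd, _mm_storeu_pd */

/* Hypothetical helper: returns p[0] + ... + p[n-1].
   p may have any alignment; assumes n is even (no scalar tail). */
double sum_pd_unaligned(double *p, int n)
{
    __m128d acc = _mm_setzero_pd();                   /* acc = {0.0, 0.0} */
    for (int i = 0; i < n; i += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(&p[i]));   /* add {p[i], p[i+1]} */
    double lanes[2];
    _mm_storeu_pd(lanes, acc);                        /* spill both lanes */
    return lanes[0] + lanes[1];
}
```

For example, with `double data[5] = {100.0, 1.0, 2.0, 3.0, 4.0}`, the call `sum_pd_unaligned(&data[1], 4)` returns 10.0, even though `&data[1]` may not be 16-byte aligned.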

__m128d _mm_load_sd (double *p);

MOVSD

Loads a double-precision floating-point value into the lower element. The upper double-precision floating-point value is set to zero. The address p does not need to be 16-byte aligned.

r0 := *p
r1 := 0.0
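Because _mm_load_sd zeroes the upper element, it can pick up the leftover element of an odd-length array without disturbing a two-lane accumulator. This sketch assumes SSE2; sum_any_length is a hypothetical helper, and the pairs are read with _mm_loadu_pd so p needs no alignment.

```c
#include <emmintrin.h>  /* SSE2: _mm_load_sd, _mm_loadu_pd, _mm_add_pd, _mm_storeu_pd */

/* Hypothetical helper: returns the sum of n doubles for any n >= 0. */
double sum_any_length(double *p, int n)
{
    __m128d acc = _mm_setzero_pd();
    int i = 0;
    for (; i + 1 < n; i += 2)
        acc = _mm_add_pd(acc, _mm_loadu_pd(&p[i]));  /* add {p[i], p[i+1]} */
    if (i < n)                                       /* odd n: one element left */
        acc = _mm_add_pd(acc, _mm_load_sd(&p[i]));   /* add {p[i], 0.0} */
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    return lanes[0] + lanes[1];
}
```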

__m128d _mm_loadh_pd (__m128d a, double *p);

MOVHPD

Loads a double-precision floating-point value as the upper double-precision floating-point value of the result. The lower double-precision floating-point value is passed through from a. The address p does not need to be 16-byte aligned.

r0 := a0
r1 := *p
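Pairing _mm_load_sd with _mm_loadh_pd is one way to assemble a vector from two unrelated addresses, such as two elements of the same matrix column. This sketch assumes SSE2; gather2_pd is a hypothetical helper, and neither pointer needs to be aligned.

```c
#include <emmintrin.h>  /* SSE2: _mm_load_sd, _mm_loadh_pd, _mm_storeu_pd */

/* Hypothetical helper: builds {*lo, *hi} in one register, then writes it
   to out[0..1]. */
void gather2_pd(double *lo, double *hi, double out[2])
{
    __m128d v = _mm_load_sd(lo);  /* v = {*lo, 0.0} */
    v = _mm_loadh_pd(v, hi);      /* v = {*lo, *hi}; lower lane passed through */
    _mm_storeu_pd(out, v);
}
```

With `double m[6] = {1.0, 2.0, 3.0, 4.0, 5.0, 6.0}` viewed as a 2-by-3 row-major matrix, `gather2_pd(&m[1], &m[4], out)` collects column 1 and leaves out holding {2.0, 5.0}.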

__m128d _mm_loadl_pd (__m128d a, double *p);

MOVLPD

Loads a double-precision floating-point value as the lower double-precision floating-point value of the result. The upper double-precision floating-point value is passed through from a. The address p does not need to be 16-byte aligned.

r0 := *p
r1 := a1
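The following sketch uses _mm_loadl_pd to replace only the lower element of an existing pair while the upper element is passed through. replace_low_pd is a hypothetical helper assuming SSE2; the pair is read and written with _mm_loadu_pd and _mm_storeu_pd, so no alignment is required.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_pd, _mm_loadl_pd, _mm_storeu_pd */

/* Hypothetical helper: pair[0] = *x, pair[1] unchanged, done in a register. */
void replace_low_pd(double pair[2], double *x)
{
    __m128d v = _mm_loadu_pd(pair);  /* v = {pair[0], pair[1]} */
    v = _mm_loadl_pd(v, x);          /* v = {*x, pair[1]}; upper lane passed through */
    _mm_storeu_pd(pair, v);
}
```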
