Kinect for Windows 1.5, 1.6, 1.7, 1.8
The Kinect sensor includes a four-element, linear microphone array, shown here in purple.
The microphone array captures audio data at a 24-bit resolution, which allows accuracy across a wide dynamic range of voice data, from normal speech at three or more meters to a person yelling.
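To put the 24-bit figure in perspective, the theoretical dynamic range of an ideal linear quantizer with N bits is 20·log10(2^N) decibels. The following is a minimal C++ sketch of that arithmetic only. The function name `dynamic_range_db` is illustrative and is not part of the Kinect SDK, and the range a real sensor achieves in practice is lower than this idealized bound.

```cpp
#include <cassert>
#include <cmath>

// Theoretical dynamic range, in decibels, of an ideal linear quantizer
// with the given bit depth: 20 * log10(2^bits) = 20 * bits * log10(2).
// Illustrative helper; not part of the Kinect SDK.
double dynamic_range_db(int bits) {
    assert(bits > 0);
    return 20.0 * static_cast<double>(bits) * std::log10(2.0);
}
```

Under these assumptions, `dynamic_range_db(16)` evaluates to roughly 96.3 dB and `dynamic_range_db(24)` to roughly 144.5 dB, which is why a 24-bit capture can hold both quiet, distant speech and loud, close speech without clipping.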
What Can You Do with Audio?
The sensor (microphone array) enables several user scenarios, such as:
- High-quality audio capture
- Focus on audio coming from a particular direction with beamforming
- Identification of the direction of audio sources
- Improved speech recognition as a result of audio capture and beamforming
- Raw voice data access
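The general idea behind beamforming with a linear array can be sketched with the classic delay-and-sum technique: delay each channel so that sound from the chosen direction lines up across microphones, then average the channels. The C++ below is only an illustration of that textbook technique, not the algorithm the Kinect audio pipeline uses. The names `steering_delays` and `delay_and_sum`, the microphone positions passed in, and the speed-of-sound constant are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed speed of sound in air near room temperature, in meters per second.
constexpr double kSpeedOfSound = 343.0;

// Delay (seconds) to apply to each microphone so that a far-field plane wave
// from angle_rad lines up across channels. The angle is measured from
// broadside (0 = straight ahead), positive toward +x. mic_x holds hypothetical
// microphone positions along the array axis, in meters. The result is shifted
// so that the smallest delay is zero.
std::vector<double> steering_delays(const std::vector<double>& mic_x,
                                    double angle_rad) {
    assert(!mic_x.empty());
    std::vector<double> delays;
    delays.reserve(mic_x.size());
    for (double x : mic_x) {
        delays.push_back(x * std::sin(angle_rad) / kSpeedOfSound);
    }
    const double min_delay = *std::min_element(delays.begin(), delays.end());
    for (double& d : delays) {
        d -= min_delay;
    }
    return delays;
}

// Delay-and-sum: shift each channel by its delay (rounded to whole samples)
// and average the shifted channels. Samples shifted in from before the start
// of a channel are treated as silence.
std::vector<double> delay_and_sum(
        const std::vector<std::vector<double>>& channels,
        const std::vector<double>& delays_s,
        double sample_rate_hz) {
    assert(!channels.empty() && channels.size() == delays_s.size());
    const std::size_t n = channels[0].size();
    std::vector<double> out(n, 0.0);
    for (std::size_t m = 0; m < channels.size(); ++m) {
        assert(channels[m].size() == n);
        assert(delays_s[m] >= 0.0);
        const std::size_t shift = static_cast<std::size_t>(
            std::lround(delays_s[m] * sample_rate_hz));
        for (std::size_t i = shift; i < n; ++i) {
            out[i] += channels[m][i - shift];
        }
    }
    for (double& v : out) {
        v /= static_cast<double>(channels.size());
    }
    return out;
}
```

Sound from the steered direction adds coherently after the shifts, while sound from other directions is partially averaged out. Rounding to whole samples keeps the sketch short; a production beamformer would typically use fractional delays or frequency-domain weights.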
Implementing Audio in a Native (Unmanaged) Application
A native application can implement these audio scenarios in one of two ways:
- Use the KinectAudio DirectX Media Object (DMO), as shown in the AudioBasics-D2D C++ sample
- Use the Windows Audio Session API (WASAPI), as shown in the AudioCaptureRaw-Console C++ sample
Using the KinectAudio DirectX Media Object (DMO)
Windows Vista, Windows 7, and Windows 8 include a voice-capture digital signal processor (DSP) that supports microphone arrays. Developers typically access that DSP through a DMO, which is a standard COM object that can be incorporated into a DirectShow graph or a Microsoft Media Foundation topology. The SDK includes an extended version of the Windows microphone array DMO, referred to here as the KinectAudio DMO, to support the Kinect microphone array.
Access a DMO in C++ by calling NuiGetAudioSource or INuiSensor::NuiGetAudioSource.
Using the Windows Audio Session API (WASAPI)
Use the Windows Audio Session API (WASAPI) to capture the raw audio stream as shown in the AudioCaptureRaw-Console C++ sample in the Developer Toolkit.
For more information about WASAPI, see About WASAPI (Windows).
Implementing Audio in a Managed Application
Managed applications use a KinectAudioSource object to implement all of the scenarios listed above.
KinectAudioSource Wraps a DirectX Media Object
A Windows DirectX Media Object (DMO) is a common Windows component for a single-channel microphone. Using this as a building block, the KinectAudioSource class extends this component with the following additional capabilities:
- An additional microphone mode, which is customized to support the Kinect microphone array
- Beamforming and source localization
- Noise suppression and acoustic echo cancellation (AEC) using the 24-bit ADC built into the DMO
Access the audio stream in managed code using the KinectSensor.AudioSource property.