Kinect for Windows 1.5, 1.6, 1.7, 1.8
Prepare for Audio Capture
Once the Kinect has been identified as a Windows audio device, capture can begin. The Audio Capture Raw-Console sample does this as follows:
int wmain()
{
    ...
    CWASAPICapture *capturer = new (std::nothrow) CWASAPICapture(device);
    if ((NULL != capturer) && capturer->Initialize(TargetLatency))
    {
        ...
    }
    ...
}
To create the capture object, wmain passes the device's IMMDevice interface to the constructor. The contents of the if block implement the capture process and are discussed in the next topic.
wmain also passes a target latency value to CWASAPICapture::Initialize to initialize the capture object. The capture object polls for data; target latency defines the wait time and also influences the size of the buffer that is shared between the application and the audio client.
Initialize the Audio Engine for Capture
The Initialize method performs the following tasks to set up the audio engine before capture starts:
Create shutdown event.
Activate an IAudioClient interface for the Kinect.
Load the audio format.
Call InitializeAudioEngine.
Create resampler.
Create a Shutdown Event for Termination of Audio Capture
Because audio capture runs on a dedicated thread while the capture process is controlled by the user from the console, the capture thread needs a signal that tells it when the user wants to end the capture.
_ShutdownEvent = CreateEventEx(NULL, NULL, 0, EVENT_MODIFY_STATE | SYNCHRONIZE);
Activate an IAudioClient Interface for the Kinect
The device's IMMDevice::Activate method is then called to create an IAudioClient interface for the device.
HRESULT hr = _Endpoint->Activate(__uuidof(IAudioClient),
                                 CLSCTX_INPROC_SERVER,
                                 NULL,
                                 reinterpret_cast<void **>(&_AudioClient));
Load the Format
The LoadFormat method is then called to retrieve the format information about the Kinect's audio stream.
if (!LoadFormat()) { ... }
The LoadFormat method also sets up the output format (PCM) for the audio capture file:
bool CWASAPICapture::LoadFormat()
{
    HRESULT hr = _AudioClient->GetMixFormat(&_MixFormat);
    if (FAILED(hr))
    {
        return false;
    }

    // Use PCM output format, regardless of mix format coming from Kinect audio device
    _OutFormat.cbSize = 0;
    _OutFormat.wFormatTag = WAVE_FORMAT_PCM;
    _OutFormat.nChannels = _MixFormat->nChannels;
    _OutFormat.nSamplesPerSec = _MixFormat->nSamplesPerSec;
    _OutFormat.wBitsPerSample = _MixFormat->wBitsPerSample;
    _OutFormat.nBlockAlign = _OutFormat.nChannels * _OutFormat.wBitsPerSample / 8;
    _OutFormat.nAvgBytesPerSec = _OutFormat.nSamplesPerSec * _OutFormat.nBlockAlign;

    _MixFrameSize = (_MixFormat->wBitsPerSample / 8) * _MixFormat->nChannels;

    return true;
}
Initialize the Audio Engine
The Initialize method calls InitializeAudioEngine to initialize the audio engine in timer-driven mode. The InitializeAudioEngine method also creates a new capture client:
bool CWASAPICapture::InitializeAudioEngine()
{
    // Latency is passed in 100-ns REFERENCE_TIME units, so the value
    // in milliseconds is multiplied by 10,000.
    HRESULT hr = _AudioClient->Initialize(AUDCLNT_SHAREMODE_SHARED,
                                          AUDCLNT_STREAMFLAGS_NOPERSIST,
                                          _EngineLatencyInMS * 10000,
                                          0,
                                          _MixFormat,
                                          NULL);
    if (FAILED(hr))
    {
        return false;
    }

    hr = _AudioClient->GetService(IID_PPV_ARGS(&_CaptureClient));
    return SUCCEEDED(hr);
}
The capture client enables a client to read the input data from a capture device.
Create the Resampler
The final task before capturing audio from the Kinect is to create the resampler and its required buffers. The resampler converts the Kinect's audio output from the mix format reported by the device into the PCM format that is written to the .wav file.
The Initialize method creates an input and an output buffer for the resampler, and then creates the resampler object:
_InputBufferSize = _EngineLatencyInMS * _MixFormat->nAvgBytesPerSec / 1000;
_OutputBufferSize = _EngineLatencyInMS * _OutFormat.nAvgBytesPerSec / 1000;

hr = CreateResamplerBuffer(_InputBufferSize, &_InputSample, &_InputBuffer);
hr = CreateResamplerBuffer(_OutputBufferSize, &_OutputSample, &_OutputBuffer);

// Create resampler object
hr = CreateResampler(_MixFormat, &_OutFormat, &_Resampler);
The CreateResampler method creates an IMFTransform object, sets its input and output media types, and returns the object.
HRESULT CreateResampler(const WAVEFORMATEX* pwfxIn, const WAVEFORMATEX* pwfxOut, IMFTransform **ppResampler)
{
    HRESULT hr = S_OK;
    IMFMediaType* pInputType = NULL;
    IMFMediaType* pOutputType = NULL;
    IMFTransform* pResampler = NULL;

    hr = CoCreateInstance(CLSID_CResamplerMediaObject, NULL, CLSCTX_INPROC_SERVER,
                          IID_PPV_ARGS(&pResampler));

    hr = CreateMediaType(pwfxIn, &pInputType);
    hr = pResampler->SetInputType(0, pInputType, 0);

    hr = CreateMediaType(pwfxOut, &pOutputType);
    hr = pResampler->SetOutputType(0, pOutputType, 0);

    *ppResampler = pResampler;
    pResampler = NULL;

    return hr;
}
At this point, we have everything initialized for the audio capture, resampling, and storage to the .wav file.