Kinect for Windows 1.5, 1.6, 1.7, 1.8
Overview
This page describes basic speech user tasks, such as identifying a speech recognition engine, creating a grammar, and using a confidence level to recognize user commands.
Important
By default, the AdaptationOn flag in the speech engine is set to ON, which means that the speech engine actively adapts its speech models to the current speaker(s). Over time, this can cause problems in noisy environments or where there are many different speakers (in a kiosk environment, for example). We therefore recommend setting the AdaptationOn flag to OFF in such applications.
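As a sketch of how this recommendation can be applied, acoustic model adaptation can be turned off through SpeechRecognitionEngine.UpdateRecognizerSetting. This assumes the engine has already been created as shown in the Code It section below:

```csharp
// Sketch: disable acoustic model adaptation for noisy or
// multi-speaker (kiosk-style) scenarios. Assumes speechEngine has
// already been created from the Kinect recognizer, as shown in
// "Create a speech recognition engine" below.
speechEngine.UpdateRecognizerSetting("AdaptationOn", 0);
```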
In addition, the speech engine's RAM usage grows the longer it runs continuously. For long recognition sessions, we also recommend recycling (destroying and recreating) the SpeechRecognitionEngine periodically (say, every 2 minutes), depending on your resource constraints.
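One way to implement this recycling is with a timer. The sketch below assumes a WPF application (DispatcherTimer) and uses the setup steps shown in the Code It section; the 2-minute interval and the re-initialization details are illustrative only:

```csharp
// Sketch: periodically recycle the SpeechRecognitionEngine during
// long recognition sessions. Interval and helpers are illustrative;
// tune to your own resource constraints.
private DispatcherTimer recycleTimer;

private void StartEngineRecycling()
{
    this.recycleTimer = new DispatcherTimer { Interval = TimeSpan.FromMinutes(2) };
    this.recycleTimer.Tick += (s, e) =>
    {
        // Tear down the old engine...
        this.speechEngine.RecognizeAsyncStop();
        this.speechEngine.Dispose();

        // ...and build a fresh one using the same setup steps shown below:
        // create the engine, load the grammar, hook the event handlers,
        // set the audio input, and restart recognition.
        this.speechEngine = new SpeechRecognitionEngine(GetKinectRecognizer().Id);
    };
    this.recycleTimer.Start();
}
```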
Code It
Get the most suitable speech recognizer (acoustic model)
```csharp
private static RecognizerInfo GetKinectRecognizer()
{
    foreach (RecognizerInfo recognizer in SpeechRecognitionEngine.InstalledRecognizers())
    {
        string value;
        recognizer.AdditionalInfo.TryGetValue("Kinect", out value);
        if ("True".Equals(value, StringComparison.OrdinalIgnoreCase)
            && "en-US".Equals(recognizer.Culture.Name, StringComparison.OrdinalIgnoreCase))
        {
            return recognizer;
        }
    }

    return null;
}
```
Create a speech recognition engine
```csharp
private SpeechRecognitionEngine speechEngine;

RecognizerInfo ri = GetKinectRecognizer();
if (null != ri)
{
    this.speechEngine = new SpeechRecognitionEngine(ri.Id);
}
```
The speech recognition engine uses audio data from the Kinect sensor.
Create and load a grammar
```csharp
/****************************************************************
 * Use this code to create grammar programmatically rather than from
 * a grammar file.
 *
 * var directions = new Choices();
 * directions.Add(new SemanticResultValue("forward", "FORWARD"));
 * directions.Add(new SemanticResultValue("forwards", "FORWARD"));
 * directions.Add(new SemanticResultValue("straight", "FORWARD"));
 * directions.Add(new SemanticResultValue("backward", "BACKWARD"));
 * directions.Add(new SemanticResultValue("backwards", "BACKWARD"));
 * directions.Add(new SemanticResultValue("back", "BACKWARD"));
 * directions.Add(new SemanticResultValue("turn left", "LEFT"));
 * directions.Add(new SemanticResultValue("turn right", "RIGHT"));
 *
 * var gb = new GrammarBuilder { Culture = ri.Culture };
 * gb.Append(directions);
 *
 * var g = new Grammar(gb);
 ****************************************************************/

// Create a grammar from grammar definition XML file.
using (var memoryStream = new MemoryStream(Encoding.ASCII.GetBytes(Properties.Resources.SpeechGrammar)))
{
    var g = new Grammar(memoryStream);
    speechEngine.LoadGrammar(g);
}
```
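The SpeechGrammar resource loaded above is an SRGS (Speech Recognition Grammar Specification) XML file embedded in the project. As an illustration only (the actual file ships with the SDK samples), a minimal grammar equivalent to the programmatic Choices example might look like this:

```xml
<!-- Illustrative sketch of an SRGS grammar file; the real
     SpeechGrammar.xml is part of the SDK sample project. -->
<grammar version="1.0" xml:lang="en-US" root="rootRule"
         tag-format="semantics/1.0-literals"
         xmlns="http://www.w3.org/2001/06/grammar">
  <rule id="rootRule">
    <one-of>
      <item>
        <tag>FORWARD</tag>
        <one-of>
          <item>forward</item>
          <item>forwards</item>
          <item>straight</item>
        </one-of>
      </item>
      <item>
        <tag>BACKWARD</tag>
        <one-of>
          <item>backward</item>
          <item>backwards</item>
          <item>back</item>
        </one-of>
      </item>
      <item>
        <tag>LEFT</tag>
        <item>turn left</item>
      </item>
      <item>
        <tag>RIGHT</tag>
        <item>turn right</item>
      </item>
    </one-of>
  </rule>
</grammar>
```

The tag elements carry the semantic values (FORWARD, BACKWARD, LEFT, RIGHT) that the recognition handler can inspect.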
There is a known issue regarding support for standard numbers and dates, which may require changes to a grammar built in a Beta version.
Initialize the speech recognition engine
```csharp
speechEngine.SpeechRecognized += SpeechRecognized;
speechEngine.SpeechRecognitionRejected += SpeechRejected;

speechEngine.SetInputToAudioStream(
    sensor.AudioSource.Start(),
    new SpeechAudioFormatInfo(EncodingFormat.Pcm, 16000, 16, 1, 32000, 2, null));
speechEngine.RecognizeAsync(RecognizeMode.Multiple);
```
Add a SpeechRecognized event handler
```csharp
private void SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    // Speech utterance confidence below which we treat speech as if it hadn't been heard.
    const double ConfidenceThreshold = 0.3;

    if (e.Result.Confidence >= ConfidenceThreshold)
    {
        // Act on the recognized command here.
    }
}
```
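Inside the confidence check, the handler typically switches on the semantic value attached by the grammar. In this hypothetical sketch, the semantic values match the programmatic grammar example shown earlier, and the Move/Turn methods stand in for application code:

```csharp
// Sketch: acting on the recognized command. The semantic values
// (FORWARD, BACKWARD, LEFT, RIGHT) come from the grammar; MoveForward,
// MoveBackward, TurnLeft, and TurnRight are hypothetical app methods.
if (e.Result.Confidence >= ConfidenceThreshold)
{
    switch (e.Result.Semantics.Value.ToString())
    {
        case "FORWARD":
            this.MoveForward();
            break;
        case "BACKWARD":
            this.MoveBackward();
            break;
        case "LEFT":
            this.TurnLeft();
            break;
        case "RIGHT":
            this.TurnRight();
            break;
    }
}
```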
Shut down the speech recognition engine
```csharp
if (null != this.sensor)
{
    this.sensor.AudioSource.Stop();
    this.sensor.Stop();
    this.sensor = null;
}

if (null != this.speechEngine)
{
    this.speechEngine.SpeechRecognized -= SpeechRecognized;
    this.speechEngine.SpeechRecognitionRejected -= SpeechRejected;
    this.speechEngine.RecognizeAsyncStop();
}
```
For additional examples, see the SpeechBasics samples (Speech Basics-WPF C# Sample, Speech Basics-D2D C++ Sample, and Speech Basics-WPF-VB Sample), which show how to use the Kinect sensor's microphone array with the Microsoft.Speech API to recognize voice commands. Also see the AudioCaptureRaw-Console C++ Sample, which demonstrates how to capture an audio stream from the Kinect sensor's microphone array.