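To evaluate pronunciation of long audio files (~10 minutes) in a Django application without saving the audio to local or server storage, you can feed the Azure Speech SDK from a PushAudioInputStream. Instead of writing the uploaded file to disk yourself, read the uploaded file (from request.FILES) in chunks and write each chunk into the PushAudioInputStream, which the Speech SDK treats as a live audio source. Close the stream once the last chunk is written so the SDK knows the audio has ended. In Python you then create the AudioConfig with speechsdk.audio.AudioConfig(stream=push_stream) and pass it to the SpeechRecognizer, applying a PronunciationAssessmentConfig to that recognizer. Note that the Python SDK has no from_stream_input method; FromStreamInput is the naming used by the C#, Java, and JavaScript SDKs. For audio of this length, use continuous recognition (start_continuous_recognition) rather than recognize_once, which returns after a single utterance. Two caveats apply. First, by default a PushAudioInputStream expects raw 16 kHz, 16-bit, mono PCM, so other formats need an explicit stream format. Second, Django's default upload handlers spool uploads larger than FILE_UPLOAD_MAX_MEMORY_SIZE (2.5 MB by default) to a temporary file, so a 10-minute recording stays purely in memory only if you raise that setting or change the upload handlers. Because the audio is pushed chunk by chunk, your own code never needs to hold more than one chunk at a time, which makes this approach suitable for long-form audio and near-real-time processing.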
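As a rough illustration, the flow described above could be sketched as follows. This is a minimal sketch, not a definitive implementation: the function names pump_upload and assess_upload, the chunk size, and the shape of the returned score dictionaries are all hypothetical choices made for this example. It assumes the upload is already 16 kHz, 16-bit, mono PCM and that the azure-cognitiveservices-speech package is installed; the SDK is imported inside the function so the chunking helper can be exercised on its own.

```python
import threading

CHUNK_SIZE = 32 * 1024  # bytes per write; an arbitrary illustrative value


def pump_upload(uploaded_file, push_stream, chunk_size=CHUNK_SIZE):
    """Copy a file-like upload into a push stream chunk by chunk.

    Works with anything exposing read(n) (e.g. a Django UploadedFile) and a
    target exposing write(bytes) and close(). Returns the bytes written.
    """
    total = 0
    try:
        while True:
            chunk = uploaded_file.read(chunk_size)
            if not chunk:
                break
            push_stream.write(chunk)
            total += len(chunk)
    finally:
        # Closing signals end of audio so recognition can finish.
        push_stream.close()
    return total


def assess_upload(uploaded_file, speech_key, region, reference_text=""):
    """Hypothetical helper: run pronunciation assessment on an upload."""
    # Imported lazily so pump_upload can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=region)
    # Default stream format is assumed here: 16 kHz, 16-bit, mono PCM.
    push_stream = speechsdk.audio.PushAudioInputStream()
    audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    pa_config = speechsdk.PronunciationAssessmentConfig(
        reference_text=reference_text,
        grading_system=speechsdk.PronunciationAssessmentGradingSystem.HundredMark,
        granularity=speechsdk.PronunciationAssessmentGranularity.Phoneme,
    )
    pa_config.apply_to(recognizer)

    scores = []
    done = threading.Event()

    def on_recognized(evt):
        # Collect one score entry per recognized utterance.
        if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
            pa = speechsdk.PronunciationAssessmentResult(evt.result)
            scores.append({
                "text": evt.result.text,
                "accuracy": pa.accuracy_score,
                "pronunciation": pa.pronunciation_score,
            })

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(lambda evt: done.set())
    recognizer.canceled.connect(lambda evt: done.set())

    recognizer.start_continuous_recognition()
    pump_upload(uploaded_file, push_stream)
    done.wait()  # released on session stop or cancellation (incl. end of stream)
    recognizer.stop_continuous_recognition()
    return scores
```

In a Django view this might be called as assess_upload(request.FILES["audio"], key, region), where the "audio" field name is an assumption. Since the call blocks until recognition finishes, a 10-minute file may be better handled in a background task than inside the request cycle.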
I hope this information helps.