Request for Technical Support & Reference Architecture for Azure-Based Voice Call Automation Pipeline

Question

Mian Omair 0

We are developing an end-to-end voice AI automation pipeline for a healthcare provider using Azure-native components. The solution will be agentic, with AI agents orchestrating healthcare careflows in the future—leveraging Azure Communication Services (ACS), Azure Cognitive Services (TTS/STT), and Azure AI Foundry (LLMs or orchestration agents).

The current architecture (attached below) includes:

Outbound call orchestration via CallAutomationClient
TTS via Azure Cognitive Services
STT for speech analytics (coming soon)
Event-driven logic over HTTP using Azure Functions + FastAPI
Call lifecycle handling (connect, media play, disconnect) via custom webhook callbacks

User's image

Our current architecture (attached below) supports outbound PSTN voice calls using CallAutomationClient, but each layer—from audio synthesis to call event handling—has been built manually due to a lack of an end-to-end reference architecture or orchestration template.

Challenges & Request:

We're building every component manually because no cohesive reference implementation exists. Azure Copilot and documentation provide only fragmented or outdated code samples, particularly around CallConnectionClient, webhook retries and media playback timing.

What We Are Looking For:

A reference architecture or sample repo for a similar solution that demonstrates:

Outbound voice calls with real-time event handling

SMS or chat messaging flows managed by agents or functions

TTS and STT pipelines within active call sessions

A modular event system powered by Azure agents or durable workflows

Note: We are not committed to our current build—if Microsoft has a more scalable or modern reference architecture, we are open to redesigning the system around that.

Laxman Reddy Revuri 4,000 Reputation points Microsoft External Staff

2025-04-25T21:31:03.2266667+00:00

Hi @Mian Omair
You're on the right track by using Azure Communication Services (ACS) for voice call automation and Azure Cognitive Services for text-to-speech (TTS) and speech-to-text (STT). Azure does not currently provide a complete end-to-end reference architecture for your specific use case. However, you can build a scalable solution by combining existing Azure components:

Use CallAutomationClient from ACS to manage outbound PSTN calls and control call flows.

Implement event-driven logic using Azure Functions to handle call events (connect, media playback, disconnect) through custom webhooks.

Integrate Azure Cognitive Services to generate speech during calls (TTS) and later capture caller responses using speech-to-text (STT) services.

Use Azure Durable Functions or Logic Apps to manage modular, scalable workflows, including call handling, messaging flows, and future AI integrations.

While there is no full sample repository, you can refer to Azure Communication Services and Azure OpenAI GitHub samples for call automation, media playback, and AI integration examples.
References:
https://learn.microsoft.com/en-us/azure/communication-services/samples/call-automation-azure-openai-sample?pivots=programming-language-javascript
https://learn.microsoft.com/en-us/azure/communication-services/concepts/call-automation/call-automation

Answer 1

Hi Mian Omair,

Right now, Azure doesn’t have one complete example that shows everything you’re trying to do all in one place. But it’s still possible to build what you need by connecting the right services together.

For making outbound phone calls, the CallAutomationClient lets you control the call and listen for events, such as when someone answers or a message finishes playing. You can handle these events using webhooks with something like Azure Functions or FastAPI, just as you're doing now.
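As a rough illustration of that webhook handling, here is a minimal sketch of a dispatcher that routes callback events by type and ignores redelivered ones. It assumes the callback body is a JSON array of CloudEvents carrying `id`, `type`, and `data` fields; the `CallbackDispatcher` class and the handler return strings are hypothetical, and the in-memory set of seen IDs would need to be a shared store in a real Azure Functions or FastAPI deployment.

```python
import json
from typing import Callable, Dict, List, Set

# Event type strings used by ACS Call Automation callbacks.
CALL_CONNECTED = "Microsoft.Communication.CallConnected"
PLAY_COMPLETED = "Microsoft.Communication.PlayCompleted"
CALL_DISCONNECTED = "Microsoft.Communication.CallDisconnected"

Handler = Callable[[dict], str]


class CallbackDispatcher:
    """Hypothetical helper: routes events to handlers and skips redeliveries."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self._seen_ids: Set[str] = set()  # assumption: in-memory only

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type] = handler

    def dispatch(self, body: str) -> List[str]:
        """Process a raw callback body and return the handler results."""
        results: List[str] = []
        for event in json.loads(body):
            event_id = event.get("id")
            if event_id in self._seen_ids:
                continue  # a retry of an event that was already handled
            handler = self._handlers.get(event.get("type"))
            if handler is None:
                continue  # no handler registered for this event type
            results.append(handler(event.get("data", {})))
            # Mark as seen only after the handler succeeds, so a failed
            # attempt can still be retried by the sender.
            if event_id is not None:
                self._seen_ids.add(event_id)
        return results


dispatcher = CallbackDispatcher()
dispatcher.on(CALL_CONNECTED, lambda data: f"play greeting on {data['callConnectionId']}")
dispatcher.on(CALL_DISCONNECTED, lambda data: f"clean up {data['callConnectionId']}")

body = json.dumps([
    {"id": "evt-1", "type": CALL_CONNECTED, "data": {"callConnectionId": "conn-1"}},
])
first = dispatcher.dispatch(body)   # the CallConnected handler runs
second = dispatcher.dispatch(body)  # same id redelivered, so nothing runs
```

Keeping the dispatcher separate from the HTTP layer means the same logic can sit behind either an Azure Functions HTTP trigger or a FastAPI route.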

To speak to the caller, you can use Text-to-Speech (TTS) to turn messages into audio. If you want to understand what the caller says, you’ll need to use ACS Media Streaming. This lets you stream the audio and send it to Speech-to-Text (STT) for live transcription.
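For the TTS side, one small piece that is easy to get wrong is escaping dynamic text before it is spoken. The sketch below builds an SSML document that could be passed to a play action as an SSML source. The `build_ssml` helper is hypothetical, and the default voice name `en-US-JennyNeural` is only an assumption; substitute whichever neural voice your Speech resource offers.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-JennyNeural", lang: str = "en-US") -> str:
    """Wrap plain text in a minimal SSML document, escaping XML special characters."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


# Patient-facing text often contains characters such as & or <,
# which would otherwise produce invalid SSML.
ssml = build_ssml("Your appointment with Dr. Lee & team is confirmed.")
```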

To manage the full flow (playing a message, waiting for input, then responding), you can use Azure Durable Functions or Logic Apps. These help you organize and control what happens step by step.
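To make that step-by-step flow concrete, here is a minimal sketch of the call flow as a plain state machine. It is not Durable Functions code; it only shows the transitions an orchestrator would need to persist between callbacks. The `Step` and `CallFlow` names, the action strings, and the `Transcript` event (standing in for a result from your STT pipeline) are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Step(Enum):
    DIALING = "dialing"
    PLAYING_PROMPT = "playing_prompt"
    AWAITING_INPUT = "awaiting_input"
    RESPONDING = "responding"
    DONE = "done"


@dataclass
class CallFlow:
    """Tracks one call: play a prompt, wait for input, respond, hang up."""

    step: Step = Step.DIALING
    actions: List[str] = field(default_factory=list)  # actions the caller should execute

    def handle(self, event: str, transcript: Optional[str] = None) -> Step:
        if event == "CallDisconnected":
            self.step = Step.DONE  # a hang-up ends the flow from any step
        elif self.step is Step.DIALING and event == "CallConnected":
            self.actions.append("play:prompt")
            self.step = Step.PLAYING_PROMPT
        elif self.step is Step.PLAYING_PROMPT and event == "PlayCompleted":
            self.actions.append("listen")
            self.step = Step.AWAITING_INPUT
        elif self.step is Step.AWAITING_INPUT and event == "Transcript":
            self.actions.append(f"play:reply to '{transcript}'")
            self.step = Step.RESPONDING
        elif self.step is Step.RESPONDING and event == "PlayCompleted":
            self.actions.append("hang_up")
            self.step = Step.DONE
        # Any other event/step combination is ignored (late or out-of-order callback).
        return self.step


flow = CallFlow()
flow.handle("CallConnected")
flow.handle("PlayCompleted")
flow.handle("Transcript", transcript="yes, confirm")
flow.handle("PlayCompleted")
```

Because out-of-order events are ignored rather than raising, the same object tolerates duplicated or late webhook deliveries.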

Even though there isn’t a single GitHub repo with all of this combined, Microsoft does have a helpful sample that shows how to connect ACS with Azure OpenAI for dynamic voice calls: Call Automation + Azure OpenAI Sample (JavaScript).

And here’s a good reference on how call automation works in general: ACS Call Automation Concepts.

Hope it helps!

Please do not forget to click "Accept the answer" and "Yes" wherever the information provided helps you; this can be beneficial to other community members.

If you have any other questions or are still running into issues, let me know in the comments and I would be happy to help you.

Suresh Chikkam 1,330 Reputation points Microsoft External Staff

2025-05-02T05:22:34.2666667+00:00

Hi Mian Omair,

Following up to see whether the above answer was helpful. If it answers your query, please click "Accept Answer" and "Yes". If you have any further questions, do let us know.
