Hi Harinath J,
I understand that you want to enable real-time streaming responses (token-by-token or chunk-by-chunk) from a backend LLM API to the user through Bot Framework + Direct Line (e.g., WebChat), built with NestJS.
This is a common challenge when bridging LLM streaming (such as OpenAI's stream: true option or SSE-based APIs) with Bot Framework + Direct Line, especially when using NestJS with the nestjs-botframework package.
Here are direct answers to your three queries:
1. How can we support streaming responses in a NestJS + Bot Framework bot using Direct Line?
To support streaming:
· Consume the LLM stream (e.g., OpenAI with stream: true, or a custom SSE endpoint) inside your NestJS bot handler.
· Forward that stream token-by-token (or in small batches) to the client as multiple Activity messages via context.sendActivity().
NestJS Integration Strategy:
// Inside your NestJS bot service (extending ActivityHandler)
this.onMessage(async (context, next) => {
  const userMessage = context.activity.text;
  await this.streamFromLLMAndSendToClient(context, userMessage);
  await next(); // required so the next handler in the pipeline runs
});
And in your streamFromLLMAndSendToClient():
· Call your LLM backend with streaming enabled.
· Parse tokens out of the stream as they arrive.
· For each token or group of tokens, call context.sendActivity({ type: 'message', text: partialText }) (see the SSE parsing sketch right after this list).
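If your upstream is an OpenAI-style SSE endpoint, the parsing step looks roughly like the sketch below. This is a minimal illustration, not your actual code: it assumes the stream emits data: {...} lines carrying a choices[0].delta.content field (the OpenAI chat-completions wire format) and terminates with data: [DONE]; the helper name relaySseToBot is hypothetical.

import { TurnContext } from 'botbuilder';

// Minimal sketch: parse OpenAI-style SSE lines and relay each token as an activity
async function relaySseToBot(context: TurnContext, response: Response): Promise<void> {
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffered = ''; // holds a partial line across chunk boundaries

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });

    // SSE events are newline-delimited; keep the trailing partial line buffered
    const lines = buffered.split('\n');
    buffered = lines.pop() ?? '';

    for (const line of lines) {
      if (!line.startsWith('data:')) continue;
      const payload = line.slice('data:'.length).trim();
      if (payload === '[DONE]') return;
      const token = JSON.parse(payload)?.choices?.[0]?.delta?.content;
      if (token) {
        await context.sendActivity({ type: 'message', text: token });
      }
    }
  }
}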
2. Do we need to send multiple Activity messages with partial content from the bot?
YES. That's the core method of achieving real-time updates in Bot Framework.
Why? Direct Line/WebChat doesn't support cleanly updating an in-flight message bubble out of the box. The only reliable way to simulate typing/streaming is:
· Sending multiple Activity messages (e.g., one per token or chunk).
· These are rendered in WebChat as new lines or bubbles (rendering can be customized; see the sketch after this list).
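For example, if you want the stream of partial messages to read as one block rather than many time-stamped bubbles, WebChat's styleOptions can help. A minimal sketch, assuming the CDN bundle (window.WebChat) and its groupTimestamp style option; the 3000 ms window and the token placeholder are illustrative:

declare const WebChat: any; // provided by the Web Chat CDN bundle

WebChat.renderWebChat(
  {
    directLine: WebChat.createDirectLine({ token: '<DIRECT_LINE_TOKEN>' }),
    styleOptions: {
      // Group timestamps for activities arriving within 3 seconds of each
      // other, so rapid partial messages don't each get their own timestamp
      groupTimestamp: 3000,
    },
  },
  document.getElementById('webchat'),
);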
Optional UX enhancement: use the typing indicator:
await context.sendActivities([
  { type: 'typing' },
  { type: 'delay', value: 500 }
]);
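In practice, you can send the typing activity once before kicking off the LLM call, so the user sees feedback during the initial model latency (WebChat clears the indicator once the first message activity arrives). A minimal sketch:

await context.sendActivity({ type: 'typing' }); // show the indicator while waiting for the first token
await this.streamFromLLMAndSendToClient(context, userMessage);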
3. Is there a known pattern to integrate streaming from an upstream LLM API into downstream Bot Framework responses?
Yes, here is the most common and effective pattern:
[Pattern]: Streaming from LLM -> Bot -> Direct Line
Code for this pattern:
// At the top of your bot service file
import { TurnContext } from 'botbuilder';

// Assumes Node 18+, where the global fetch exposes a web ReadableStream body
async streamFromLLMAndSendToClient(context: TurnContext, prompt: string) {
  const response = await fetch('https://llm-api/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });
  if (!response.ok || !response.body) {
    throw new Error(`LLM API request failed: ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let batch = '';
  let chunkCount = 0;

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    batch += decoder.decode(value, { stream: true });
    chunkCount++;

    // Send a partial update every 5 chunks (note: these are network chunks,
    // not model tokens) to avoid flooding the channel
    if (chunkCount >= 5) {
      await context.sendActivity({ type: 'message', text: batch.trim() });
      batch = '';
      chunkCount = 0;
    }
  }

  // Flush whatever is left after the stream ends
  if (batch.trim()) {
    await context.sendActivity({ type: 'message', text: batch.trim() });
  }
}
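A note on the batch size: Direct Line rate-limits clients that post activities too quickly, so sending one activity per raw chunk can get your bot throttled. Batching every few chunks (or flushing on sentence boundaries instead) keeps the UX responsive without flooding the channel; the threshold of 5 above is just a starting point to tune.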
Hope this helps. Do let me know if you have further queries.
Thank you!