How to implement streaming response from LLM in Azure Bot using NestJS-BotFramework?

Harinath J 205 Reputation points
2025-04-04T04:13:40.06+00:00

Hi everyone,

We’ve deployed a bot using Azure Bot Service and are using the nestjs-botframework package to build it. Our backend LLM service supports streaming responses (similar to OpenAI’s stream=True, sending token-by-token over a single request).

Our current setup:

  • Bot runtime: NestJS + nestjs-botframework

  • Hosting: Azure Bot Service

  • Client integration: Direct Line (via WebChat or a custom frontend)

  • LLM backend: Custom LLM API that supports chunked streaming or Server-Sent Events (SSE)

Problem: While the LLM backend streams properly, the bot built using NestJS and BotFramework does not forward these token streams to the client in real time. Instead, the response gets sent only once the full message is received from the LLM API.

We want to enable real-time token streaming to the user via Direct Line (e.g., see the message build up word-by-word in the chat UI).

Question:

How can we support streaming responses in a NestJS + BotFramework bot using Direct Line?

Do we need to send multiple Activity messages with partial content from the bot?

Is there a known pattern to integrate streaming from an upstream LLM API to downstream BotFramework responses?

Any suggestions or code examples would be very helpful. If someone has implemented LLM streaming with Direct Line and Bot Framework SDK in Node/NestJS, I’d love to hear how you structured it!

Thanks!

Azure AI Bot Service
An Azure service that provides an integrated environment for bot development.

Accepted answer
  1. Prashanth Veeragoni 4,030 Reputation points Microsoft External Staff
    2025-04-10T18:57:37.0433333+00:00

    Hi Harinath J,

    I understand that you want to enable real-time streaming responses (token-by-token or chunk-by-chunk) from a backend LLM API to a user through Bot Framework + Direct Line (e.g., WebChat), built with NestJS.

    Actually, this is a common challenge when trying to bridge LLM streaming (like OpenAI’s stream=True or SSE-based APIs) with Bot Framework + Direct Line, especially when using NestJS with the nestjs-botframework package.

    Taking your three questions in order:

    1. How can we support streaming responses in a NestJS + BotFramework bot using Direct Line?

    To support streaming:

    • Consume the LLM stream (e.g., OpenAI with stream=True or a custom SSE endpoint) inside your NestJS bot handler.

    • Forward that stream token-by-token (or in small batches) to the client as multiple Activity messages via context.sendActivity().

    NestJS Integration Strategy:

    // Inside your NestJS bot service (extending ActivityHandler)
    this.onMessage(async (context, next) => {
      const userMessage = context.activity.text;
      await this.streamFromLLMAndSendToClient(context, userMessage);
      await next(); // let any downstream handlers run
    });
    

    And in your streamFromLLMAndSendToClient():

    • Call your LLM backend with streaming enabled.

    • Parse tokens out of the stream as they arrive (see the SSE-parsing sketch below).

    • For each token or group of tokens, call context.sendActivity({ type: 'message', text: partialText }).
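
    For an SSE-style backend, the tokens typically arrive as data: lines, so the raw bytes need to be split into events before they can be batched. Here is a minimal parsing sketch; the endpoint contract, including the token field in the JSON payload and the [DONE] marker, is an OpenAI-style assumption, so adapt it to your API's actual event format:

    // SSE-parsing sketch. The `token` field and the [DONE] marker are
    // assumptions; adjust both to match your LLM API's real event format.
    async function* readSseTokens(url: string, prompt: string): AsyncGenerator<string> {
      const response = await fetch(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json', Accept: 'text/event-stream' },
        body: JSON.stringify({ prompt })
      });
      const reader = response.body!.getReader();
      const decoder = new TextDecoder();
      let buffer = '';
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');
        buffer = lines.pop() ?? ''; // keep the last, possibly incomplete, line
        for (const line of lines) {
          if (!line.startsWith('data:')) continue; // SSE data lines start with "data:"
          const payload = line.slice(5).trim();
          if (payload === '[DONE]') return;        // OpenAI-style end-of-stream marker
          yield JSON.parse(payload).token;
        }
      }
    }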

    2. Do we need to send multiple Activity messages with partial content from the bot?

    Yes, that is the core method of achieving real-time updates in Bot Framework.

    Why? Direct Line/WebChat doesn't support updating an in-progress message bubble in a clean way out of the box. The only reliable way to simulate typing/streaming is:

    • Sending multiple Activity messages (e.g., one per token or chunk).

    • These are rendered in WebChat as new lines or bubbles (the rendering can be customized, as sketched below).
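
    If you embed the stock Web Chat component, its styleOptions can make that rapid succession of bubbles read more like one continuous response. A minimal sketch, assuming botframework-webchat v4; the exact options available vary by version:

    // Sketch: soften the visual effect of many small bot messages in Web Chat.
    // `token` is your Direct Line token; the options shown are illustrative.
    window.WebChat.renderWebChat(
      {
        directLine: window.WebChat.createDirectLine({ token }),
        styleOptions: {
          showAvatarInGroup: 'sender', // one avatar per run of messages from the same sender
          groupTimestamp: 60000        // show a timestamp at most once per minute
        }
      },
      document.getElementById('webchat')
    );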

    Optional UX enhancement: use a typing indicator:

    await context.sendActivities([
      { type: 'typing' },           // renders the "..." typing animation in WebChat
      { type: 'delay', value: 500 } // holds it for 500 ms before the next activity
    ]);
    

    3. Is there a known pattern to integrate streaming from an upstream LLM API to downstream BotFramework responses?

    Yes, here's the most common and effective pattern:

    Pattern: Streaming from LLM -> Bot -> Direct Line

    [Diagram: LLM streaming API -> NestJS bot handler -> batched Activity messages -> Direct Line -> WebChat]

    Code for this pattern (TurnContext comes from 'botbuilder'):

    async streamFromLLMAndSendToClient(context: TurnContext, prompt: string) {
      const response = await fetch('https://llm-api/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt })
      });
      if (!response.ok || !response.body) {
        await context.sendActivity('Sorry, the LLM backend is unavailable right now.');
        return;
      }
      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let batch = '';
      let chunkCount = 0;
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        // Each read() yields a raw chunk, which may contain several tokens.
        batch += decoder.decode(value, { stream: true });
        chunkCount++;
        // Send a partial update every 5 chunks to avoid flooding Direct Line.
        if (chunkCount >= 5) {
          await context.sendActivity({ type: 'message', text: batch.trim() });
          batch = '';
          chunkCount = 0;
        }
      }
      // Flush any remainder once the stream ends.
      if (batch.trim()) {
        await context.sendActivity({ type: 'message', text: batch.trim() });
      }
    }
    
    
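    Tying it together, here is a minimal sketch of how the handler and the streaming method could live in one class. The @Injectable() wiring is an assumption about how nestjs-botframework registers bots in the NestJS container, so adapt it to the package's actual registration mechanism:

    import { Injectable } from '@nestjs/common';
    import { ActivityHandler, TurnContext } from 'botbuilder';

    // Sketch only: the NestJS registration details depend on nestjs-botframework.
    @Injectable()
    export class StreamingBot extends ActivityHandler {
      constructor() {
        super();
        this.onMessage(async (context, next) => {
          // Show the typing indicator while the first tokens are being generated.
          await context.sendActivities([{ type: 'typing' }, { type: 'delay', value: 500 }]);
          await this.streamFromLLMAndSendToClient(context, context.activity.text);
          await next();
        });
      }

      private async streamFromLLMAndSendToClient(context: TurnContext, prompt: string) {
        // ...the batching loop from the snippet above...
      }
    }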

    Hope this helps, do let me know if you have further queries.

    Thank you!

