Hi Harinath J,
I understand that you want to enable real-time streaming responses (token-by-token or chunk-by-chunk) from a backend LLM API to the user through Bot Framework + Direct Line (e.g., WebChat), built with NestJS.
This is a common challenge when bridging LLM streaming (such as OpenAI's stream: true option or SSE-based APIs) with Bot Framework + Direct Line, especially when using NestJS with the nestjs-botframework package.
Here are direct answers to your three queries:
1. How can we support streaming responses in a NestJS + Bot Framework bot using Direct Line?
To support streaming:
· Consume the LLM stream (e.g., OpenAI with stream: true, or a custom SSE endpoint) inside your NestJS bot handler.
· Forward that stream token-by-token (or in small batches) to the client as multiple Activity messages via context.sendActivity().
NestJS Integration Strategy:
// Inside your NestJS bot service (extending ActivityHandler)
this.onMessage(async (context, next) => {
  const userMessage = context.activity.text;
  await this.streamFromLLMAndSendToClient(context, userMessage);
  await next(); // required so the next handler in the pipeline runs
});
And in your streamFromLLMAndSendToClient():
· Call your LLM backend with streaming enabled.
· Parse tokens out of the stream as they arrive.
· For each token or group of tokens, call context.sendActivity({ type: 'message', text: partialText }) (see the SSE parsing sketch right after this list).
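If your upstream is an OpenAI-style SSE endpoint, the parsing step looks roughly like the sketch below. This is a minimal illustration, not your actual code: it assumes the stream emits data: {...} lines carrying a choices[0].delta.content field (the OpenAI chat-completions wire format) and terminates with data: [DONE]; the helper name relaySseToBot is hypothetical.

import { TurnContext } from 'botbuilder';

// Minimal sketch: parse OpenAI-style SSE lines and relay each token as an activity
async function relaySseToBot(context: TurnContext, response: Response): Promise<void> {
  const reader = response.body!.getReader();
  const decoder = new TextDecoder();
  let buffered = ''; // holds a partial line across chunk boundaries

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });

    // SSE events are newline-delimited; keep the trailing partial line buffered
    const lines = buffered.split('\n');
    buffered = lines.pop() ?? '';

    for (const line of lines) {
      if (!line.startsWith('data:')) continue;
      const payload = line.slice('data:'.length).trim();
      if (payload === '[DONE]') return;
      const token = JSON.parse(payload)?.choices?.[0]?.delta?.content;
      if (token) {
        await context.sendActivity({ type: 'message', text: token });
      }
    }
  }
}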
2. Do we need to send multiple Activity messages with partial content from the bot?
YES. That's the core method of achieving real-time updates in Bot Framework.
Why? Direct Line/WebChat doesn't support cleanly updating an in-flight message bubble out of the box. The only reliable way to simulate typing/streaming is:
· Sending multiple Activity messages (e.g., one per token or chunk).
· These are rendered in WebChat as new lines or bubbles (rendering can be customized; see the sketch after this list).
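For example, if you want the stream of partial messages to read as one block rather than many time-stamped bubbles, WebChat's styleOptions can help. A minimal sketch, assuming the CDN bundle (window.WebChat) and its groupTimestamp style option; the 3000 ms window and the token placeholder are illustrative:

declare const WebChat: any; // provided by the Web Chat CDN bundle

WebChat.renderWebChat(
  {
    directLine: WebChat.createDirectLine({ token: '<DIRECT_LINE_TOKEN>' }),
    styleOptions: {
      // Group timestamps for activities arriving within 3 seconds of each
      // other, so rapid partial messages don't each get their own timestamp
      groupTimestamp: 3000,
    },
  },
  document.getElementById('webchat'),
);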
Optional UX enhancement: use the typing indicator:
await context.sendActivities([
  { type: 'typing' },
  { type: 'delay', value: 500 }
]);
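In practice, you can send the typing activity once before kicking off the LLM call, so the user sees feedback during the initial model latency (WebChat clears the indicator once the first message activity arrives). A minimal sketch:

await context.sendActivity({ type: 'typing' }); // show the indicator while waiting for the first token
await this.streamFromLLMAndSendToClient(context, userMessage);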
3. Is there a known pattern to integrate streaming from an upstream LLM API into downstream Bot Framework responses?
Yes, here is the most common and effective pattern:
[Pattern]: Streaming from LLM -> Bot -> Direct Line
Code for this pattern:
// At the top of your bot service file
import { TurnContext } from 'botbuilder';

// Assumes Node 18+, where the global fetch exposes a web ReadableStream body
async streamFromLLMAndSendToClient(context: TurnContext, prompt: string) {
  const response = await fetch('https://llm-api/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });
  if (!response.ok || !response.body) {
    throw new Error(`LLM API request failed: ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let batch = '';
  let chunkCount = 0;

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    batch += decoder.decode(value, { stream: true });
    chunkCount++;

    // Send a partial update every 5 chunks (note: these are network chunks,
    // not model tokens) to avoid flooding the channel
    if (chunkCount >= 5) {
      await context.sendActivity({ type: 'message', text: batch.trim() });
      batch = '';
      chunkCount = 0;
    }
  }

  // Flush whatever is left after the stream ends
  if (batch.trim()) {
    await context.sendActivity({ type: 'message', text: batch.trim() });
  }
}
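A note on the batch size: Direct Line rate-limits clients that post activities too quickly, so sending one activity per raw chunk can get your bot throttled. Batching every few chunks (or flushing on sentence boundaries instead) keeps the UX responsive without flooding the channel; the threshold of 5 above is just a starting point to tune.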
Hope this helps. Do let me know if you have further queries.
Thank you!