Making the token limit configurable for the AI Foundry Assistant

Tejwant Kaur 0 Reputation points
2025-04-24T13:38:12.4766667+00:00

Hello Azure Support Team,

I’m currently developing a HIPAA-aligned healthcare platform in the Australia East region that processes transcripts of up to an hour of conversation, using the Azure OpenAI Assistant (via the Threads API) for clinical report generation. Our application architecture depends heavily on:

  • Persistent thread history
  • Role-based system messages
  • Multi-turn assistant interactions

We’ve deployed our models with the maximum token limit (e.g., GPT-4o with a 20K context), but when we create Assistants through the Azure AI Foundry UI, we’re unable to configure max_tokens, and the resulting prompt is sometimes truncated. This is critically limiting our ability to generate long-form structured outputs such as geriatric assessments or dementia care plans, which are central to products like ClinicalInsightsAI and ALMA (our personalized dementia assistant).


🙏 Request:

  • Expose full assistant configuration options in Azure AI Studio, or allow Assistant creation via the SDK (as in OpenAI's platform): max_tokens, temperature, tool_choice, metadata, instructions, etc. (see the sketch below).
  • Ensure the Assistant inherits the deployment model's token ceiling and allows it to be overridden when used via the Threads API.
  • Provide guidance or a roadmap if token configuration and memory features are planned in upcoming updates.
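
For illustration, here is a minimal sketch of what SDK-based Assistant creation against an Azure OpenAI endpoint could look like with the OpenAI Python SDK. The deployment name, assistant name, metadata, and credential environment variables are placeholders, and the available parameters may vary by API version:

```python
import os
from openai import AzureOpenAI

# Sketch only: assumes the OpenAI Python SDK's AzureOpenAI client and an
# API version that exposes the Assistants (beta) surface. The endpoint/key
# environment variables and the deployment name are placeholders.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-05-01-preview",
)

assistant = client.beta.assistants.create(
    model="gpt-4o-clinical",  # Azure OpenAI deployment name (placeholder)
    name="clinical-report-assistant",
    instructions="Generate structured clinical reports from consultation transcripts.",
    temperature=0.2,
    metadata={"product": "ClinicalInsightsAI"},
)
print(assistant.id)
```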


Thank you for your support — this will enable us to build secure, compliant, and more powerful AI healthcare solutions within the Azure ecosystem.

Best regards,

Tejwant Kaur



1 answer

  1. Manas Mohanty 3,210 Reputation points Microsoft External Staff
    2025-04-28T08:27:22.3633333+00:00

    Hi Tejwant Kaur,

    Thank you for replying.

    Keeping the output under 300 words was just an example of constraining it through prompts.

    You can set 1,000 words or more, but keep it below the tokens-per-minute quota configured on the deployment side to avoid truncated outputs.

    As mentioned, you can also adjust the output through max_prompt_tokens and max_completion_tokens in the thread run.
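
    As a hedged sketch (not official guidance), this is roughly how those run-level caps can be set with the OpenAI Python SDK; the endpoint, key, thread ID, assistant ID, and token budgets below are placeholders:

    ```python
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder
        api_key="<your-api-key>",                                    # placeholder
        api_version="2024-05-01-preview",
    )

    # Cap the tokens the run may consume; if a budget is exceeded the run
    # ends early (e.g. with an "incomplete" status) instead of growing.
    run = client.beta.threads.runs.create_and_poll(
        thread_id="thread_abc123",    # placeholder thread ID
        assistant_id="asst_abc123",   # placeholder assistant ID
        max_prompt_tokens=20000,
        max_completion_tokens=4000,
    )
    print(run.status, run.usage)
    ```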

    You can submit the details of your requirements through the Azure OpenAI quota increase form.

    Thank you.

