[Azure OpenAI - Structured Output] "Bad request: Invalid response_format provided." after a few minutes

Bogdan Pechounov 80 Reputation points
2025-04-30T15:24:15.8666667+00:00

I get a "Bad request: Invalid response_format provided." error after 567 seconds. What is strange is that the same request worked before, so the JSON schema should be valid.


import time

from openai import AzureOpenAI

# SomePydanticClass is the (large) Pydantic model used as the structured-output schema.
async def test_openai_2():
    config = await load_config()

    client = AzureOpenAI(
        azure_endpoint=config.azure_openai_endpoint_us,
        api_key=config.azure_openai_api_key_us.get_secret_value(),
        api_version="2025-03-01-preview",
    )

    start_time = time.time()
    try:
        response = client.beta.chat.completions.parse(
            model="gpt-4o",
            messages=[
                {
                    "role": "system",
                    "content": "",
                },
            ],
            response_format=SomePydanticClass,
        )
        print(response)
    finally:
        print("Process finished --- %s seconds ---" % (time.time() - start_time))
/app/tests/integration/usage/test_custom_callback.py::test_openai_2 failed: async def test_openai_2():
        config = await load_config()
    
        client = AzureOpenAI(
            azure_endpoint=config.azure_openai_endpoint_us,
            api_key=config.azure_openai_api_key_us.get_secret_value(),
            api_version="2025-03-01-preview",
        )
    
        start_time = time.time()
        try:
>           response = client.beta.chat.completions.parse(
                model="gpt-4o",
                messages=[
                    {
                        "role": "system",
                        "content": "",
                    },
                ],
                response_format=CollectiveInsurance,
            )

tests/integration/usage/test_custom_callback.py:292: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/lib/python3.10/site-packages/openai/resources/beta/chat/completions.py:156: in parse
    return self._post(
/usr/local/lib/python3.10/site-packages/openai/_base_client.py:1280: in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
/usr/local/lib/python3.10/site-packages/openai/_base_client.py:957: in request
    return self._request(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <openai.lib.azure.AzureOpenAI object at 0x7f62df48f7f0>

    def _request(
        self,
        *,
        cast_to: Type[ResponseT],
        options: FinalRequestOptions,
        retries_taken: int,
        stream: bool,
        stream_cls: type[_StreamT] | None,
    ) -> ResponseT | _StreamT:
        # create a copy of the options we were given so that if the
        # options are mutated later & we then retry, the retries are
        # given the original options
        input_options = model_copy(options)
    
        cast_to = self._maybe_override_cast_to(cast_to, options)
        options = self._prepare_options(options)
    
        remaining_retries = options.get_max_retries(self.max_retries) - retries_taken
        request = self._build_request(options, retries_taken=retries_taken)
        self._prepare_request(request)
    
        kwargs: HttpxSendArgs = {}
        if self.custom_auth is not None:
            kwargs["auth"] = self.custom_auth
    
        log.debug("Sending HTTP Request: %s %s", request.method, request.url)
    
        try:
            response = self._client.send(
                request,
                stream=stream or self._should_stream_response_body(request=request),
                **kwargs,
            )
        except httpx.TimeoutException as err:
            log.debug("Encountered httpx.TimeoutException", exc_info=True)
    
            if remaining_retries > 0:
                return self._retry_request(
                    input_options,
                    cast_to,
                    retries_taken=retries_taken,
                    stream=stream,
                    stream_cls=stream_cls,
                    response_headers=None,
                )
    
            log.debug("Raising timeout error")
            raise APITimeoutError(request=request) from err
        except Exception as err:
            log.debug("Encountered Exception", exc_info=True)
    
            if remaining_retries > 0:
                return self._retry_request(
                    input_options,
                    cast_to,
                    retries_taken=retries_taken,
                    stream=stream,
                    stream_cls=stream_cls,
                    response_headers=None,
                )
    
            log.debug("Raising connection error")
            raise APIConnectionError(request=request) from err
    
        log.debug(
            'HTTP Response: %s %s "%i %s" %s',
            request.method,
            request.url,
            response.status_code,
            response.reason_phrase,
            response.headers,
        )
        log.debug("request_id: %s", response.headers.get("x-request-id"))
    
        try:
            response.raise_for_status()
        except httpx.HTTPStatusError as err:  # thrown on 4xx and 5xx status code
            log.debug("Encountered httpx.HTTPStatusError", exc_info=True)
    
            if remaining_retries > 0 and self._should_retry(err.response):
                err.response.close()
                return self._retry_request(
                    input_options,
                    cast_to,
                    retries_taken=retries_taken,
                    response_headers=err.response.headers,
                    stream=stream,
                    stream_cls=stream_cls,
                )
    
            # If the response is streamed then we need to explicitly read the response
            # to completion before attempting to access the response text.
            if not err.response.is_closed:
                err.response.read()
    
            log.debug("Re-raising status error")
>           raise self._make_status_error_from_response(err.response) from None
E           openai.BadRequestError: Error code: 400 - {'error': {'message': 'Bad request: Invalid response_format provided.', 'type': 'invalid_request_error', 'param': None, 'code': None}}

/usr/local/lib/python3.10/site-packages/openai/_base_client.py:1061: BadRequestError

In streaming mode, there is a timeout error instead, after about 241 seconds.

import time

from openai import AzureOpenAI

# SomeLargePydanticClass is the (large) Pydantic model used as the structured-output schema.
async def test_openai():
    config = await load_config()

    client = AzureOpenAI(
        azure_endpoint=config.azure_openai_endpoint_us,
        api_key=config.azure_openai_api_key_us.get_secret_value(),
        api_version="2025-03-01-preview",
    )


    start_time = time.time()
    try:
        with client.beta.chat.completions.stream(
            model="gpt-4o",
            messages=[
                {
                    "role": "system",
                    "content": "",
                },
            ],
            response_format=SomeLargePydanticClass,
        ) as stream:
            for event in stream:
                print("\n\n[EVENT RECEIVED]--- %s seconds ---" % (time.time() - start_time))

                print(event)

    finally:
        print("Process finished --- %s seconds ---" % (time.time() - start_time))
Running pytest with args: ['-p', 'vscode_pytest', '-s', '--rootdir=/app', '/app/tests/integration/usage/test_custom_callback.py::test_openai']
============================= test session starts ==============================
platform linux -- Python 3.10.17, pytest-8.3.4, pluggy-1.5.0
rootdir: /app
configfile: pyproject.toml
plugins: anyio-4.9.0, asyncio-0.25.1
asyncio: mode=auto, asyncio_default_fixture_loop_scope=function
collected 1 item

tests/integration/usage/test_custom_callback.py F

=================================== FAILURES ===================================
_________________________________ test_openai __________________________________

    async def test_openai():
        config = await load_config()
    
        client = AzureOpenAI(
            azure_endpoint=config.azure_openai_endpoint_us,
            api_key=config.azure_openai_api_key_us.get_secret_value(),
            api_version="2025-03-01-preview",
        )
    
        start_time = time.time()
        try:
>           with client.beta.chat.completions.stream(
                model="gpt-4o",
                messages=[
                    {
                        "role": "system",
                        "content": "",
                    },
                ],
                response_format=CollectiveInsurance,
            ) as stream:

tests/integration/usage/test_custom_callback.py:318: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/local/lib/python3.10/site-packages/openai/lib/streaming/chat/_completions.py:148: in __enter__
    raw_stream = self.__api_request()
/usr/local/lib/python3.10/site-packages/openai/_utils/_utils.py:275: in wrapper
    return func(*args, **kwargs)
/usr/local/lib/python3.10/site-packages/openai/resources/chat/completions.py:829: in create
    return self._post(
/usr/local/lib/python3.10/site-packages/openai/_base_client.py:1280: in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
/usr/local/lib/python3.10/site-packages/openai/_base_client.py:957: in request
    return self._request(
/usr/local/lib/python3.10/site-packages/openai/_base_client.py:1046: in _request
    return self._retry_request(
/usr/local/lib/python3.10/site-packages/openai/_base_client.py:1095: in _retry_request
    return self._request(
/usr/local/lib/python3.10/site-packages/openai/_base_client.py:1046: in _request
    return self._retry_request(
/usr/local/lib/python3.10/site-packages/openai/_base_client.py:1095: in _retry_request
    return self._request(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <openai.lib.azure.AzureOpenAI object at 0x7f3f322c7760>

    def _request(
        self,
        *,
        cast_to: Type[ResponseT],
        options: FinalRequestOptions,
        retries_taken: int,
        stream: bool,
        stream_cls: type[_StreamT] | None,
    ) -> ResponseT | _StreamT:
        # create a copy of the options we were given so that if the
        # options are mutated later & we then retry, the retries are
        # given the original options
        input_options = model_copy(options)
    
        cast_to = self._maybe_override_cast_to(cast_to, options)
        options = self._prepare_options(options)
    
        remaining_retries = options.get_max_retries(self.max_retries) - retries_taken
        request = self._build_request(options, retries_taken=retries_taken)
        self._prepare_request(request)
    
        kwargs: HttpxSendArgs = {}
        if self.custom_auth is not None:
            kwargs["auth"] = self.custom_auth
    
        log.debug("Sending HTTP Request: %s %s", request.method, request.url)
    
        try:
            response = self._client.send(
                request,
                stream=stream or self._should_stream_response_body(request=request),
                **kwargs,
            )
        except httpx.TimeoutException as err:
            log.debug("Encountered httpx.TimeoutException", exc_info=True)
    
            if remaining_retries > 0:
                return self._retry_request(
                    input_options,
                    cast_to,
                    retries_taken=retries_taken,
                    stream=stream,
                    stream_cls=stream_cls,
                    response_headers=None,
                )
    
            log.debug("Raising timeout error")
            raise APITimeoutError(request=request) from err
        except Exception as err:
            log.debug("Encountered Exception", exc_info=True)
    
            if remaining_retries > 0:
                return self._retry_request(
                    input_options,
                    cast_to,
                    retries_taken=retries_taken,
                    stream=stream,
                    stream_cls=stream_cls,
                    response_headers=None,
                )
    
            log.debug("Raising connection error")
            raise APIConnectionError(request=request) from err
    
        log.debug(
            'HTTP Response: %s %s "%i %s" %s',
            request.method,
            request.url,
            response.status_code,
            response.reason_phrase,
            response.headers,
        )
        log.debug("request_id: %s", response.headers.get("x-request-id"))
    
        try:
            response.raise_for_status()
        except httpx.HTTPStatusError as err:  # thrown on 4xx and 5xx status code
            log.debug("Encountered httpx.HTTPStatusError", exc_info=True)
    
            if remaining_retries > 0 and self._should_retry(err.response):
                err.response.close()
                return self._retry_request(
                    input_options,
                    cast_to,
                    retries_taken=retries_taken,
                    response_headers=err.response.headers,
                    stream=stream,
                    stream_cls=stream_cls,
                )
    
            # If the response is streamed then we need to explicitly read the response
            # to completion before attempting to access the response text.
            if not err.response.is_closed:
                err.response.read()
    
            log.debug("Re-raising status error")
>           raise self._make_status_error_from_response(err.response) from None
E           openai.APIStatusError: Error code: 408 - {'error': {'code': 'Timeout', 'message': 'The operation was timeout.'}}

/usr/local/lib/python3.10/site-packages/openai/_base_client.py:1061: APIStatusError

(Why is the error different in streaming mode than in non-streaming mode?)

When I make the request after commenting out one of the fields, it takes 145 seconds to receive the first event. When I make a request using only the class for that field, it takes 116 seconds to receive the message. Could there be a timeout on first-token generation?

Does generating the CFG for the JSON schema/structured output take a long time? Shouldn't it already be cached, since we made successful requests with this schema before?
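One way to gauge how much work grammar compilation might involve is to look at the size of the JSON schema the SDK derives from the Pydantic model. A minimal sketch (the `Coverage`/`CollectiveInsurance` models here are hypothetical stand-ins; the real schema in the question is much larger):

```python
import json

from pydantic import BaseModel


class Coverage(BaseModel):
    """Hypothetical nested model; the real schema is much larger."""
    name: str
    limit: float


class CollectiveInsurance(BaseModel):
    """Hypothetical stand-in for the model used in the failing requests."""
    policy_id: str
    coverages: list[Coverage]


# A rough proxy for schema/grammar complexity: the size of the serialized schema.
schema = CollectiveInsurance.model_json_schema()
print(f"schema size: {len(json.dumps(schema))} bytes")
```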

Azure OpenAI Service

Accepted answer
Saideep Anchuri 6,690 Reputation points · Microsoft External Staff
2025-04-30T15:40:37.72+00:00

Hi Bogdan Pechounov,

The error message "Bad request: Invalid response_format provided." indicates that the response_format in your request is not recognized or is invalid. This can happen if the SomePydanticClass or CollectiveInsurance model you are passing does not match the format or type the Azure OpenAI API expects.
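For reference, passing a Pydantic model as `response_format` makes the SDK build a `json_schema` response format from the model's JSON schema. A minimal sketch of that payload shape (the `Quote` model is hypothetical; note that strict mode requires `additionalProperties: false` on object schemas):

```python
from pydantic import BaseModel


class Quote(BaseModel):
    """Hypothetical small model standing in for the real schema."""
    premium: float
    currency: str


schema = Quote.model_json_schema()
schema["additionalProperties"] = False  # required by strict structured outputs

# The wire shape sent to the API as `response_format`.
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "Quote", "strict": True, "schema": schema},
}
```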

Regarding the timeout issues, first-token generation can take longer due to various factors such as model load, the complexity of the request, or network latency. The Azure OpenAI service may also have internal timeouts for streaming responses, which can lead to a "408 - Timeout" error if the operation takes too long.

As for caching, while some aspects of the model's state may be cached, generating structured output or the CFG for a JSON schema can still require processing time, especially if the request is complex or the model needs to validate the schema against the input.

Please refer to the following link: client-side-timeouts-third-party-client-apis

Thank you.


0 additional answers
