Bug description
The AzureOpenAiChatModel does not support the retry mechanism. According to the documentation, this feature should already be supported. However, after configuring the following parameters, retries still do not occur:
```properties
spring.ai.retry.on-client-errors=true
spring.ai.retry.on-http-codes=400,408,429,500,502,503,504
```
Environment
- Spring AI version: 1.0.0-M5
- Java version: 21
Steps to reproduce
1. Configure the retry settings in your application properties file as follows:

```properties
spring.ai.retry.on-client-errors=true
spring.ai.retry.on-http-codes=400,408,429,500,502,503,504
```

2. Invoke the AzureOpenAiChatModel under conditions that would trigger a retry (e.g., by forcing an HTTP 400 or another of the configured codes).
3. Observe that no retry occurs.
Expected behavior
The AzureOpenAiChatModel should attempt to retry the request according to the specified retry configuration parameters.
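For context, other Spring AI chat models wrap the underlying client call in a Spring Retry RetryTemplate, which is what these properties are meant to configure. A minimal sketch of that pattern (the class name, backoff values, and simulated failure are illustrative, not the Azure model's actual code):

```java
import org.springframework.retry.backoff.ExponentialBackOffPolicy;
import org.springframework.retry.policy.SimpleRetryPolicy;
import org.springframework.retry.support.RetryTemplate;

public class RetryWrapperSketch {

    public static void main(String[] args) {
        RetryTemplate retryTemplate = new RetryTemplate();
        // Allow up to 3 attempts in total, with exponential backoff between them.
        retryTemplate.setRetryPolicy(new SimpleRetryPolicy(3));
        ExponentialBackOffPolicy backOff = new ExponentialBackOffPolicy();
        backOff.setInitialInterval(1000); // 1s, then 2s (default multiplier is 2.0)
        retryTemplate.setBackOffPolicy(backOff);

        String result = retryTemplate.execute(ctx -> {
            // In a chat model implementation, the SDK call would go here.
            // Here we simulate two transient failures followed by a success.
            if (ctx.getRetryCount() < 2) {
                throw new IllegalStateException("simulated transient error");
            }
            return "success on attempt " + (ctx.getRetryCount() + 1);
        });
        System.out.println(result);
    }
}
```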
Error Logs
com.azure.core.exception.HttpResponseException: Status code 400, "{"error":{"inner_error":{"code":"ResponsibleAIPolicyViolation","content_filter_results":{"sexual":{"filtered":true,"severity":"high"},"violence":{"filtered":false,"severity":"safe"},"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"}}},"code":"content_filter","message":"The response was filtered due to the prompt triggering Azure OpenAI's content management policy. Please modify your prompt and retry. To learn more about our content filtering policies please read our documentation: \r\nhttps://go.microsoft.com/fwlink/?linkid=2198766.","param":"prompt","type":null}}"
at com.azure.core.implementation.http.rest.RestProxyBase.instantiateUnexpectedException(RestProxyBase.java:388)
at com.azure.core.implementation.http.rest.SyncRestProxy.ensureExpectedStatus(SyncRestProxy.java:133)
at com.azure.core.implementation.http.rest.SyncRestProxy.handleRestReturnType(SyncRestProxy.java:211)
at com.azure.core.implementation.http.rest.SyncRestProxy.invoke(SyncRestProxy.java:86)
at com.azure.core.implementation.http.rest.RestProxyBase.invoke(RestProxyBase.java:124)
at com.azure.core.http.rest.RestProxy.invoke(RestProxy.java:95)
at jdk.proxy2/jdk.proxy2.$Proxy196.getChatCompletionsSync(Unknown Source)
at com.azure.ai.openai.implementation.OpenAIClientImpl.getChatCompletionsWithResponse(OpenAIClientImpl.java:1900)
at com.azure.ai.openai.OpenAIClient.getChatCompletionsWithResponse(OpenAIClient.java:350)
at com.azure.ai.openai.OpenAIClient.getChatCompletions(OpenAIClient.java:760)
at org.springframework.ai.azure.openai.AzureOpenAiChatModel.lambda$internalCall$1(AzureOpenAiChatModel.java:244)
at io.micrometer.observation.Observation.observe(Observation.java:565)
at org.springframework.ai.azure.openai.AzureOpenAiChatModel.internalCall(AzureOpenAiChatModel.java:240)
at org.springframework.ai.azure.openai.AzureOpenAiChatModel.call(AzureOpenAiChatModel.java:226)
at org.springframework.ai.chat.client.DefaultChatClient$DefaultChatClientRequestSpec$1.aroundCall(DefaultChatClient.java:675)
at org.springframework.ai.chat.client.advisor.DefaultAroundAdvisorChain.lambda$nextAroundCall$1(DefaultAroundAdvisorChain.java:98)
at io.micrometer.observation.Observation.observe(Observation.java:565)
at org.springframework.ai.chat.client.advisor.DefaultAroundAdvisorChain.nextAroundCall(DefaultAroundAdvisorChain.java:98)
at org.springframework.ai.chat.client.DefaultChatClient$DefaultCallResponseSpec.doGetChatResponse(DefaultChatClient.java:488)
at org.springframework.ai.chat.client.DefaultChatClient$DefaultCallResponseSpec.lambda$doGetObservableChatResponse$1(DefaultChatClient.java:477)
at io.micrometer.observation.Observation.observe(Observation.java:565)
at org.springframework.ai.chat.client.DefaultChatClient$DefaultCallResponseSpec.doGetObservableChatResponse(DefaultChatClient.java:477)
at org.springframework.ai.chat.client.DefaultChatClient$DefaultCallResponseSpec.doSingleWithBeanOutputConverter(DefaultChatClient.java:451)
at org.springframework.ai.chat.client.DefaultChatClient$DefaultCallResponseSpec.entity(DefaultChatClient.java:446)
Comment From: markpollack
I believe we didn't add Spring Retry around the Azure OpenAI SDK because that SDK already has its own retry feature. I can't seem to find the docs on that. Could someone confirm?
Comment From: markpollack
@mkheck any insight?
Comment From: mkheck
@markpollack I'll dig into it and let you know. If there is no Azure-specific mechanism, I should be able to wrap it with Spring Retry. More news shortly.
Comment From: mkheck
@markpollack Please go ahead and assign it to me. I'll look at the ones you've tagged me with for review and work through them.
Comment From: iAMSagar44
Hi @markpollack / @mkheck,
There is already a feature in the azure-sdk-for-java that retries on transient errors. I think the default retry count is 3.
Please check RetryPolicy.class and ExponentialBackoff.class in the com.azure.core.http.policy package. I believe these are added to the HttpPipelinePolicy array in the createHttpPipeline() method in OpenAIClientBuilder.class.
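If finer control over the SDK's built-in retries is needed, the retry strategy can be customized when constructing the client. A sketch under that assumption (endpoint, key, and retry values are placeholders):

```java
import java.time.Duration;

import com.azure.ai.openai.OpenAIClient;
import com.azure.ai.openai.OpenAIClientBuilder;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.http.policy.ExponentialBackoff;
import com.azure.core.http.policy.RetryPolicy;

public class AzureRetryConfigSketch {

    public static void main(String[] args) {
        // ExponentialBackoff(maxRetries, baseDelay, maxDelay) is the RetryStrategy
        // used by RetryPolicy; if no policy is supplied, the pipeline installs a
        // default RetryPolicy (3 retries).
        OpenAIClient client = new OpenAIClientBuilder()
                .endpoint("https://{Azure_OpenAI_Endpoint}") // placeholder
                .credential(new AzureKeyCredential("{api-key}")) // placeholder
                .retryPolicy(new RetryPolicy(
                        new ExponentialBackoff(5, Duration.ofSeconds(1), Duration.ofSeconds(30))))
                .buildClient();
        // The client's HTTP pipeline now retries transient failures up to 5 times.
    }
}
```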
Here is an example of the retry occurring for a 429 error.
1st Request -
2025-05-13T17:35:59.381+10:00 INFO 10148 --- [docs-ai-assistant] [oundedElastic-8] c.a.a.o.i.O.getChatCompletions : {"az.sdk.message":"HTTP request","method":"POST","url":"https://{Azure_OpenAI_Endpoint}//openai/deployments/gpt-4o/chat/completions?api-version=2025-01-01-preview","tryCount":1,"content-length":26643}
1st Response - Failed -
2025-05-13T17:35:59.428+10:00 INFO 10148 --- [docs-ai-assistant] [-http-kqueue-13] c.a.a.o.i.O.getChatCompletions : {"az.sdk.message":"HTTP response","statusCode":429,"url":"https://{Azure_OpenAI_Endpoint}//openai/deployments/gpt-4o/chat/completions?api-version=2025-01-01-preview","durationMs":46,"content-length":440,"content-length":440,"body":"{\"error\":{\"code\":\"429\",\"message\": \"Requests to the ChatCompletions_Create Operation under Azure OpenAI API version 2025-01-01-preview have exceeded token rate limit of your current OpenAI S0 pricing tier. Please retry after 51 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit. For Free Account customers, upgrade to Pay as you Go here: https://aka.ms/429TrialUpgrade.\"}}"}
Retry attempt 1 - 2nd Request -
2025-05-13T17:36:50.434+10:00 INFO 10148 --- [docs-ai-assistant] [ parallel-5] c.a.a.o.i.O.getChatCompletions : {"az.sdk.message":"HTTP request","method":"POST","url":"https://{Azure_OpenAI_Endpoint}//openai/deployments/gpt-4o/chat/completions?api-version=2025-01-01-preview","tryCount":2,"content-length":26643}
Success Response -
2025-05-13T17:36:51.354+10:00 INFO 10148 --- [docs-ai-assistant] [-http-kqueue-15] c.a.a.o.i.O.getChatCompletions : {"az.sdk.message":"HTTP response","statusCode":200,"url":"https://{Azure_OpenAI_Endpoint}//openai/deployments/gpt-4o/chat/completions?api-version=2025-01-01-preview","durationMs":921}
I did not have any custom retry mechanism in my code.
You can see these logs in your application by setting the environment variable export AZURE_HTTP_LOG_DETAIL_LEVEL=BODY when using Azure OpenAI chat models, embedding models, or AI Search in your Spring AI application.
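If you build the client yourself, the same logging can also be enabled programmatically; a sketch, with endpoint and key as placeholders:

```java
import com.azure.ai.openai.OpenAIClient;
import com.azure.ai.openai.OpenAIClientBuilder;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.http.policy.HttpLogDetailLevel;
import com.azure.core.http.policy.HttpLogOptions;

public class AzureHttpLoggingSketch {

    public static void main(String[] args) {
        OpenAIClient client = new OpenAIClientBuilder()
                .endpoint("https://{Azure_OpenAI_Endpoint}") // placeholder
                .credential(new AzureKeyCredential("{api-key}")) // placeholder
                // Equivalent of AZURE_HTTP_LOG_DETAIL_LEVEL=BODY for this client.
                .httpLogOptions(new HttpLogOptions().setLogLevel(HttpLogDetailLevel.BODY))
                .buildClient();
        // The SDK's logging policy will now log requests and responses, including bodies.
    }
}
```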
Hope this analysis helps.