Bug description
No exception is thrown when a 429 (Too Many Requests) error occurs. The application hangs indefinitely at the chatResponse method invocation, waiting for it to return. This behavior likely applies to other errors as well.

Environment
- Spring AI: v1.0.0-M5
- LLM Provider: Azure OpenAI

Steps to reproduce
1. Call the chatResponse method of the ChatClient when the rate limit is reached.
2. Observe that no exception is thrown, and the operation hangs indefinitely at the method call.
3. Control does not proceed to the next line, potentially causing a memory leak or unresponsiveness in the application.

Note: No retry mechanism has been used.

Observed behavior
The method does not throw any exception. Control does not return from the chatResponse method, leading to an indefinite block.

Expected behavior
- An appropriate exception should be thrown when a 429 error occurs, allowing it to be caught in a try-catch block.
- Alternatively, control should proceed to the next line, enabling developers to handle the error gracefully.

Additional context
This issue is critical as it blocks operations and can lead to resource leaks. Is there any configuration or workaround available to handle this scenario until a fix is provided?
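One possible stopgap, offered only as a sketch and not as a Spring AI or Azure SDK feature, is to run the blocking call on another thread and stop waiting after a fixed time, so the caller gets an exception it can catch. The class name `BoundedCall` and the method `callWithTimeout` are hypothetical; only standard JDK classes are used.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper: bounds how long the caller waits for a blocking call.
final class BoundedCall {

    static <T> T callWithTimeout(Supplier<T> blockingCall, Duration timeout)
            throws TimeoutException {
        // Runs the call on the common ForkJoinPool (daemon threads).
        CompletableFuture<T> future = CompletableFuture.supplyAsync(blockingCall);
        try {
            return future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Marks the future as cancelled. This does NOT interrupt the worker
            // thread, which stays blocked until the underlying call returns.
            future.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            // The call itself failed; surface the original cause.
            throw new IllegalStateException("Call failed", e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while waiting", e);
        }
    }
}
```

The supplier passed in would wrap the `chatResponse` call. This only unblocks the caller: the worker thread remains parked for as long as the underlying call blocks, so repeated timeouts can still exhaust the pool.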

Comment From: codertushar

@markpollack @ilayaperumalg @tzolov @ThomasVitale any idea?

Comment From: johannesrave

Is this related to https://github.com/Azure/azure-sdk-for-java/issues/43583? Could you post logs?

Comment From: joanna-kjm

@codertushar Have you looked at the HTTP response from Azure? I’ve experienced the same issue, and after some investigation, I realized that:

  • Since there is a "tokens per minute" limit in the Azure API, HTTP 429 responses from the Azure API contain a "Retry-After" header indicating how long you need to wait before sending the next request.
  • My Azure client checks this value, waits for the specified amount of time, and then retries the request.
  • Sometimes, the Azure API (e.g., when I send a single request that exceeds the entire limit at once) responds with a Retry-After of 24 hours, causing my client to block the thread for 24 hours.
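The behavior described above suggests one way to think about a fix: honor Retry-After only up to a ceiling, and fail fast when the server asks for more. The following is a minimal sketch of that decision, not code from the Azure SDK or Spring AI. `RetryAfterPolicy`, `parseSeconds`, and `shouldWait` are hypothetical names, and the sketch assumes the header carries a number of seconds (the HTTP-date form is treated as absent).

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical policy: decide whether a server-requested delay is worth waiting for.
final class RetryAfterPolicy {

    // Parses a Retry-After value expressed in seconds.
    // Returns empty if the header is missing, negative, or not a plain number.
    static Optional<Duration> parseSeconds(String headerValue) {
        if (headerValue == null) {
            return Optional.empty();
        }
        try {
            long seconds = Long.parseLong(headerValue.trim());
            return seconds < 0 ? Optional.empty() : Optional.of(Duration.ofSeconds(seconds));
        } catch (NumberFormatException e) {
            return Optional.empty(); // HTTP-date form is not handled in this sketch
        }
    }

    // True only when a delay was given and it does not exceed maxWait.
    // When false, the caller would throw instead of sleeping.
    static boolean shouldWait(String headerValue, Duration maxWait) {
        return parseSeconds(headerValue)
                .map(delay -> delay.compareTo(maxWait) <= 0)
                .orElse(false);
    }
}
```

With a ceiling of one minute, a Retry-After of 30 seconds would be waited out, while the 24-hour value (86400 seconds) would be rejected so the caller can raise an exception instead of blocking.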

Comment From: markpollack

I believe this is happening inside the Azure OpenAI SDK? @joanna-kjm I am not sure how to approach a solution. Any advice?