It would be great if we could use the batch modes offered by LLM providers such as Vertex AI, OpenAI, and Anthropic (Claude).
Working with LLMs in production can mean processing large volumes of data continuously, which makes cost a major factor.
Both Vertex AI and OpenAI advertise a 50% cost reduction for requests submitted in batch mode. Right now, we would have to bypass our existing services and call the provider's native SDK directly to use it (see the sketch after the links below).
https://ai.google.dev/gemini-api/docs/batch-mode
https://platform.openai.com/docs/guides/batch
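For illustration, here is a minimal sketch of what the workaround looks like with OpenAI's native Python SDK, per the batch guide linked above: requests are written to a JSONL file, uploaded, and submitted as a batch. The file name, model, and prompts are placeholders, not anything from Spring AI.

```python
import json
from openai import OpenAI

client = OpenAI()

# Each line in the JSONL file is one independent chat-completion request.
requests = [
    {
        "custom_id": f"request-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",  # placeholder model
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    for i, prompt in enumerate(["Summarize document A", "Summarize document B"])
]
with open("requests.jsonl", "w") as f:
    f.write("\n".join(json.dumps(r) for r in requests))

# Upload the file and submit the batch; results arrive asynchronously
# within the completion window, at roughly half the normal token price.
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
print(batch.id, batch.status)
```

Since there is currently no portable abstraction for this asynchronous, file-based workflow, each provider would have to be integrated by hand like this.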
Comment From: sunyuhan1998
Related issues: #3905
Comment From: sunyuhan1998
Related PR: #3913