Expected Behavior

Spring AI should provide success/failure metrics for individual tool calls to enable production monitoring and reliability tracking.

Proposed Metrics

  1. Tool Result Counter: spring.ai.tool.result
     - Tags: tool.name, success (true/false)
     - Tracks individual tool execution outcomes
  2. Tool Error Counter: spring.ai.tool.error
     - Tags: tool.name, error.type
     - Tracks tool failures by error type

Code Example

# Configuration (optional, enabled by default)
spring:
  ai:
    tool:
      metrics:
        enabled: true
        include-error-details: false
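
A minimal sketch of how these keys could be bound on the Java side. The ToolMetricsProperties class name and fields are hypothetical and simply mirror the YAML above; nothing like this exists in Spring AI today:

import org.springframework.boot.context.properties.ConfigurationProperties;

// Hypothetical binding for the proposed spring.ai.tool.metrics.* keys.
@ConfigurationProperties(prefix = "spring.ai.tool.metrics")
public class ToolMetricsProperties {

    /** Whether the tool result/error counters are registered. */
    private boolean enabled = true;

    /** Whether the error.type tag carries exception class names. */
    private boolean includeErrorDetails = false;

    public boolean isEnabled() {
        return this.enabled;
    }

    public void setEnabled(boolean enabled) {
        this.enabled = enabled;
    }

    public boolean isIncludeErrorDetails() {
        return this.includeErrorDetails;
    }

    public void setIncludeErrorDetails(boolean includeErrorDetails) {
        this.includeErrorDetails = includeErrorDetails;
    }
}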


# View tool success metrics
curl http://localhost:8080/actuator/metrics/spring.ai.tool.result

# Expected response:
{
  "name": "spring.ai.tool.result",
  "measurements": [
    { "statistic": "COUNT", "value": 45 }
  ],
  "availableTags": [
    { "tag": "tool.name", "values": ["getCurrentWeather", "getRecommendation"] },
    { "tag": "success", "values": ["true", "false"] }
  ]
}

# Filter by specific tool and outcome
curl "http://localhost:8080/actuator/metrics/spring.ai.tool.result?tag=tool.name:getCurrentWeather&tag=success:false" 

Implementation Approach

Add a new ToolMetricsObservationHandler that plugs into the existing observation pipeline:

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationHandler;
import io.micrometer.observation.ObservationRegistry;
import org.springframework.ai.tool.observation.ToolCallingObservationContext;
import org.springframework.boot.autoconfigure.condition.ConditionalOnClass;
import org.springframework.stereotype.Component;

@Component
@ConditionalOnClass({ObservationRegistry.class, MeterRegistry.class})
public class ToolMetricsObservationHandler implements ObservationHandler<Observation.Context> {

    private final MeterRegistry meterRegistry;

    public ToolMetricsObservationHandler(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
    }

    @Override
    public boolean supportsContext(Observation.Context context) {
        return context instanceof ToolCallingObservationContext;
    }

    @Override
    public void onStop(Observation.Context context) {
        ToolCallingObservationContext toolContext = (ToolCallingObservationContext) context;
        String toolName = toolContext.getToolDefinition().name();
        boolean isSuccess = context.getError() == null; // a failed call attaches its error to the context
        Counter.builder("spring.ai.tool.result")
            .tag("tool.name", toolName)
            .tag("success", String.valueOf(isSuccess))
            .register(this.meterRegistry)
            .increment();
    }
}
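
The proposed spring.ai.tool.error counter could be recorded from the same handler. A minimal sketch of a helper that onStop(...) would call when context.getError() is non-null; the recordError method is hypothetical, and mapping error.type to the exception's simple class name is an assumption rather than existing Spring AI behavior:

    // Hypothetical companion to onStop(...) above: records the proposed
    // spring.ai.tool.error counter for a failed tool call.
    private void recordError(String toolName, Throwable error) {
        Counter.builder("spring.ai.tool.error")
            .tag("tool.name", toolName)
            .tag("error.type", error.getClass().getSimpleName())
            .register(this.meterRegistry)
            .increment();
    }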

Current Behavior

Spring AI currently provides basic spring.ai.tool timer metrics through the existing observation system:

✅ Tool execution time (COUNT, TOTAL_TIME, MAX)
✅ Basic tool call tracking
❌ Success/failure breakdown
❌ Error rate monitoring
❌ Tool reliability insights

The existing metrics are generated automatically by Micrometer's DefaultMeterObservationHandler, but lack domain-specific insights needed for production monitoring.
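
For comparison, the existing timer can already be inspected through the same actuator endpoint; the response below is illustrative and only shows the statistics named above:

# View the existing tool timer
curl http://localhost:8080/actuator/metrics/spring.ai.tool

# Illustrative response:
{
  "name": "spring.ai.tool",
  "measurements": [
    { "statistic": "COUNT", "value": 45 },
    { "statistic": "TOTAL_TIME", "value": 12.7 },
    { "statistic": "MAX", "value": 1.9 }
  ]
}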

Context

How has this issue affected you? When running AI applications in production, it's critical to monitor tool reliability. Currently, there's no way to:

- Identify which tools are failing frequently
- Set up alerts for tool failure rates (see the sketch below)
- Track tool performance degradation over time
- Monitor SLA compliance for tool-dependent services
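
As an example of what the proposed counter would enable, here is a sketch that derives a per-tool failure rate using Micrometer's Search API. The metric and tag names match the proposal above; the ToolFailureRate helper itself is hypothetical:

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;

// Illustrative only: computes failures / (failures + successes) for one tool
// from the proposed spring.ai.tool.result counter.
public final class ToolFailureRate {

    public static double failureRate(MeterRegistry registry, String toolName) {
        double failures = count(registry, toolName, "false");
        double successes = count(registry, toolName, "true");
        double total = failures + successes;
        return total == 0 ? 0.0 : failures / total;
    }

    private static double count(MeterRegistry registry, String toolName, String success) {
        Counter counter = registry.find("spring.ai.tool.result")
            .tag("tool.name", toolName)
            .tag("success", success)
            .counter();
        return counter == null ? 0.0 : counter.count();
    }
}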