Problem Statement

Config Servers backed by Git environment repositories make multiple connections to Git servers and may serve filtered properties to many clients. When Git fetches time out or otherwise fail, and applications cannot get updated properties on restart, it is hard to find the root cause, especially because the symptom appears on the client that failed to get its properties.

Emit more granular metrics

There are two environment repositories that we initially need more detailed metrics for:

Git

  • Git server fetch response time, errors and rate of requests
  • Processing time of fetched repo into properties

Vault

  • Vault secrets fetch time
  • Processing time of secrets into properties
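
To make the request concrete, here is a rough sketch of the kind of per-operation data points meant above: call count, error count and cumulative duration for each named operation. It uses only the JDK so it runs standalone. `RepositoryMetrics`, `Stats` and metric names such as `git.fetch` or `vault.process` are hypothetical and are not existing Config Server classes. A real implementation would more likely record into a Micrometer `Timer` than keep hand-rolled counters.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

/** Hypothetical stand-in for a metrics registry; not part of Spring Cloud Config. */
final class RepositoryMetrics {

    /** Call count, error count and cumulative duration for one named operation. */
    static final class Stats {
        final LongAdder calls = new LongAdder();
        final LongAdder errors = new LongAdder();
        final LongAdder totalNanos = new LongAdder();
    }

    private final Map<String, Stats> statsByName = new ConcurrentHashMap<>();

    /** Returns the stats for a name, creating them on first use. */
    Stats stats(String name) {
        return statsByName.computeIfAbsent(name, n -> new Stats());
    }

    /** Runs the operation, recording its duration and whether it threw. */
    <T> T record(String name, Supplier<T> operation) {
        Stats s = stats(name);
        long start = System.nanoTime();
        try {
            return operation.get();
        } catch (RuntimeException e) {
            s.errors.increment();
            throw e; // the caller still sees the original failure
        } finally {
            s.calls.increment();
            s.totalNanos.add(System.nanoTime() - start);
        }
    }
}
```

Wrapping the fetch and the processing step separately (for example `metrics.record("git.fetch", ...)` and `metrics.record("git.process", ...)`) would show whether time is lost talking to the Git or Vault server or while turning the result into properties. The request rate can be derived from the call count by whatever monitoring system scrapes it.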

Enhance health actuator

When a Config Server is unhealthy, its health actuator should give more detail on what affected fetching properties from backing services (Git, Vault and other environment repositories) and serving them to clients.
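
As an illustration of the level of detail being asked for, the sketch below tracks the last fetch outcome per backend and produces a small report. It is JDK-only so it runs standalone. `RepositoryHealthTracker`, the backend names and the report keys are hypothetical and do not reflect how Config Server reports health today.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical tracker of the last fetch outcome for each backing repository. */
final class RepositoryHealthTracker {

    private final Map<String, String> lastOutcome = new LinkedHashMap<>();

    /** Marks the most recent fetch from this backend as successful. */
    synchronized void success(String backend) {
        lastOutcome.put(backend, "OK");
    }

    /** Records why the most recent fetch from this backend failed. */
    synchronized void failure(String backend, Exception cause) {
        lastOutcome.put(backend, "FAILED: " + cause.getMessage());
    }

    /** Overall status plus per-backend detail; DOWN if any backend last failed. */
    synchronized Map<String, Object> report() {
        boolean allOk = lastOutcome.values().stream().allMatch("OK"::equals);
        Map<String, Object> report = new LinkedHashMap<>();
        report.put("status", allOk ? "UP" : "DOWN");
        report.put("repositories", new LinkedHashMap<>(lastOutcome));
        return report;
    }
}
```

In a Spring Boot application the same information could be exposed through a `HealthIndicator` that attaches each entry with `Health.Builder.withDetail(...)`, so an operator looking at the health endpoint sees which repository failed and why.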

Additional context

Many organizations I talk with have Config Server dashboards showing health. They also have observability tools but are not able to get sufficient data points to help in troubleshooting Config Server instances.