coredns/plugin/kubernetes/object/metrics.go

package object

import (
	"time"

	"github.com/coredns/coredns/plugin"
	"github.com/coredns/coredns/plugin/pkg/log"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	api "k8s.io/api/core/v1"
	meta "k8s.io/apimachinery/pkg/apis/meta/v1"
)

var (
	// DNSProgrammingLatency is defined as the time it took to program a DNS instance - from the time
	// a service or pod has changed to the time the change was propagated and was available to be
	// served by a DNS server.
	// The definition of this SLI can be found at https://github.com/kubernetes/community/blob/master/sig-scalability/slos/dns_programming_latency.md
	// Note that the metric is partially based on the time exported by the endpoints controller on
	// the master machine. The measurement may be inaccurate if there is clock drift between the
	// node and the master machine.
	// The service_kind label can be one of:
	//   * cluster_ip
	//   * headless_with_selector
	//   * headless_without_selector
	DNSProgrammingLatency = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Namespace: plugin.Namespace,
		Subsystem: "kubernetes",
		Name:      "dns_programming_duration_seconds",
		// From 1 millisecond to ~8.7 minutes (largest bound is 0.001 * 2^19 = 524.288 seconds).
		Buckets: prometheus.ExponentialBuckets(0.001, 2, 20),
		Help:    "Histogram of the time (in seconds) it took to program a dns instance.",
	}, []string{"service_kind"})

	// DurationSinceFunc returns the duration elapsed since the given time.
	// Added as a global variable to allow injection for testing.
	DurationSinceFunc = time.Since
)

// EndpointLatencyRecorder records the DNS programming latency metric for endpoint objects.
type EndpointLatencyRecorder struct {
	TT          time.Time
	ServiceFunc func(meta.Object) []*Service
	Services    []*Service
}

func (l *EndpointLatencyRecorder) init(o meta.Object) {
	l.Services = l.ServiceFunc(o)
	l.TT = time.Time{}
	stringVal, ok := o.GetAnnotations()[api.EndpointsLastChangeTriggerTime]
	if ok {
		tt, err := time.Parse(time.RFC3339Nano, stringVal)
		if err != nil {
			log.Warningf("DnsProgrammingLatency cannot be calculated for Endpoints '%s/%s'; invalid %q annotation RFC3339 value of %q",
				o.GetNamespace(), o.GetName(), api.EndpointsLastChangeTriggerTime, stringVal)
			// On a parse error tt is the zero time.Time, which record() ignores.
		}
		l.TT = tt
	}
}

func (l *EndpointLatencyRecorder) record() {
	// isHeadless indicates whether the endpoints object belongs to a headless
	// service (i.e. clusterIP = None). Note that this can be a false negative if the service
	// informer is lagging, i.e. we may not see a recently created service. Given that services
	// don't change very often (compared to the much more frequent endpoints changes), cases where this
	// check gives the wrong answer should be relatively rare. Because of that we intentionally accept
	// this flaw to keep the solution simple.
	isHeadless := len(l.Services) == 1 && l.Services[0].Headless()

	if !isHeadless || l.TT.IsZero() {
		return
	}

	// If we're here it means that the Endpoints object is for a headless service and that
	// the Endpoints object was created by the endpoints-controller (because the
	// LastChangeTriggerTime annotation is set). It means that the corresponding service is a
	// "headless service with selector".
	DNSProgrammingLatency.WithLabelValues("headless_with_selector").
		Observe(DurationSinceFunc(l.TT).Seconds())
}