// plugin/health/overloaded.go

package health

import (
	"context"
	"net"
	"net/http"
	"time"

	"github.com/coredns/coredns/plugin"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// overloaded periodically queries the local health endpoint and records how long each request took
// (HealthDuration) and how many requests failed (HealthFailures).
func (h *health) overloaded(ctx context.Context) {
	bypassProxy := &http.Transport{
		Proxy: nil,
		DialContext: (&net.Dialer{
			Timeout:   30 * time.Second,
			KeepAlive: 30 * time.Second,
		}).DialContext,
		ForceAttemptHTTP2:     true,
		MaxIdleConns:          100,
		IdleConnTimeout:       90 * time.Second,
		TLSHandshakeTimeout:   10 * time.Second,
		ExpectContinueTimeout: 1 * time.Second,
	}
	timeout := 3 * time.Second
	client := http.Client{
		Timeout:   timeout,
		Transport: bypassProxy,
	}

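	req, err := http.NewRequestWithContext(ctx, http.MethodGet, h.healthURI.String(), nil)
	if err != nil {
		// Without a valid request there is nothing to poll; client.Do(nil) would panic.
		log.Errorf("Failed to create local health request to %q: %s", h.healthURI.String(), err)
		return
	}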
	tick := time.NewTicker(1 * time.Second)
	defer tick.Stop()

	for {
		select {
		case <-tick.C:
			start := time.Now()
			resp, err := client.Do(req)
			if err != nil && ctx.Err() == context.Canceled {
				// request was cancelled by parent goroutine
				return
			}
			if err != nil {
				HealthDuration.Observe(time.Since(start).Seconds())
				HealthFailures.Inc()
				log.Warningf("Local health request to %q failed: %s", req.URL.String(), err)
				continue
			}
			resp.Body.Close()
			elapsed := time.Since(start)
			HealthDuration.Observe(elapsed.Seconds())
			if elapsed > time.Second { // 1s is an arbitrary threshold, but a *local* request taking that long is a bad sign
				log.Warningf("Local health request to %q took more than 1s: %s", req.URL.String(), elapsed)
			}

		case <-ctx.Done():
			return
		}
	}
}

var (
	// HealthDuration is the metric used for exporting how fast we can retrieve the /health endpoint.
	HealthDuration = promauto.NewHistogram(prometheus.HistogramOpts{
		Namespace:                   plugin.Namespace,
		Subsystem:                   "health",
		Name:                        "request_duration_seconds",
		Buckets:                     plugin.SlimTimeBuckets,
		NativeHistogramBucketFactor: plugin.NativeHistogramBucketFactor,
		Help:                        "Histogram of the time (in seconds) each request took.",
	})
	// HealthFailures is the metric used to count how many times the health request failed.
	HealthFailures = promauto.NewCounter(prometheus.CounterOpts{
		Namespace: plugin.Namespace,
		Subsystem: "health",
		Name:      "request_failures_total",
		Help:      "The number of times the health check failed.",
	})
)