plugin/health/overloaded.go

package health

import (
	"context"
	"net/http"
	"time"

	"github.com/coredns/coredns/plugin"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// overloaded queries the local health endpoint once per second and records how long each request took in a metric.
func (h *health) overloaded(ctx context.Context) {
	timeout := 3 * time.Second
	client := http.Client{
		Timeout: timeout,
	}

	url := "http://" + h.Addr + "/health"
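	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		// Without a valid request there is nothing to poll, so give up early.
		log.Warningf("Failed to create local health request to %q: %s", url, err)
		return
	}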
	tick := time.NewTicker(1 * time.Second)
	defer tick.Stop()

	for {
		select {
		case <-tick.C:
			start := time.Now()
			resp, err := client.Do(req)
			if err != nil && ctx.Err() == context.Canceled {
				// request was cancelled by parent goroutine
				return
			}
			if err != nil {
				HealthDuration.Observe(time.Since(start).Seconds())
				HealthFailures.Inc()
				log.Warningf("Local health request to %q failed: %s", url, err)
				continue
			}
			resp.Body.Close()
			elapsed := time.Since(start)
			HealthDuration.Observe(elapsed.Seconds())
			if elapsed > time.Second { // 1s is an arbitrary threshold, but a *local* request taking that long isn't good
				log.Warningf("Local health request to %q took more than 1s: %s", url, elapsed)
			}

		case <-ctx.Done():
			return
		}
	}
}

var (
	// HealthDuration is the metric used for exporting how fast we can retrieve the /health endpoint.
	HealthDuration = promauto.NewHistogram(prometheus.HistogramOpts{
		Namespace: plugin.Namespace,
		Subsystem: "health",
		Name:      "request_duration_seconds",
		Buckets:   plugin.SlimTimeBuckets,
		Help:      "Histogram of the time (in seconds) each request took.",
	})
	// HealthFailures is the metric used to count how many times the health request failed.
	HealthFailures = promauto.NewCounter(prometheus.CounterOpts{
		Namespace: plugin.Namespace,
		Subsystem: "health",
		Name:      "request_failures_total",
		Help:      "The number of times the health check failed.",
	})
)