mirror of
https://github.com/coredns/coredns.git
synced 2025-11-11 22:42:21 -05:00
Add OnReStartFailed which makes the health plugin stay up if the Corefile is corrupt and we revert to the previous version. Also needs a fix for the channel handling

See #2659

Testing it will log the following when restarting with a corrupted Corefile

~~~
2019-05-04T18:01:59.431Z [INFO] linux/amd64, go1.12.4,
CoreDNS-1.5.0
linux/amd64, go1.12.4,
[INFO] SIGUSR1: Reloading
[INFO] Reloading
[ERROR] Restart failed: Corefile:5 - Error during parsing: Unknown directive 'bdhfhdhj'
[ERROR] SIGUSR1: starting with listener file descriptors: Corefile:5 - Error during parsing: Unknown directive 'bdhfhdhj'
~~~

After which the curl still works.

This also needed a change to reset the channel used for the metrics go-routine which gets closed on shutdown, otherwise you'll see:

~~~
^C[INFO] SIGINT: Shutting down
panic: close of closed channel
goroutine 90 [running]:
github.com/coredns/coredns/plugin/health.(*health).OnFinalShutdown(0xc000089bc0, 0xc000063d88, 0x4afe6d)
~~~

Signed-off-by: Miek Gieben <miek@miek.nl>
79 lines
1.8 KiB
Markdown
# health

## Name

*health* - enables a health check endpoint.

## Description

Enables a process-wide health endpoint. When CoreDNS is up and running, this returns a 200 OK HTTP
status code. By default, the health endpoint is exported on port 8080 at `/health`.

## Syntax

~~~
health [ADDRESS]
~~~

Optionally takes an address; the default is `:8080`. The health path is fixed to `/health`. The
health endpoint returns a 200 response code and the word "OK" when this server is healthy.

An extra option can be set with this extended syntax:

~~~
health [ADDRESS] {
    lameduck DURATION
}
~~~

* `lameduck` makes the process report as unhealthy, then *wait* for **DURATION** before the process
  shuts down.

If you have multiple Server Blocks, *health* can only be enabled in one of them (as it is
process-wide). If you really need multiple endpoints, you must run the health endpoints on
different ports:

~~~ corefile
com {
    whoami
    health :8080
}

net {
    erratic
    health :8081
}
~~~

Doing this is supported, but both endpoints, `:8080` and `:8081`, export the exact same health status.

## Metrics

If monitoring is enabled (via the *prometheus* directive) then the following metric is exported:

* `coredns_health_request_duration_seconds{}` - duration to process an HTTP query to the local
  `/health` endpoint. As this is a local operation, it should be fast. A (large) increase in this
  duration indicates the CoreDNS process is having trouble keeping up with its query load.

Note that this metric *does not* have a `server` label, because being overloaded is a symptom of
the running process, *not* a specific server.

## Examples

Run another health endpoint on http://localhost:8091.

~~~ corefile
. {
    health localhost:8091
}
~~~

Set a lameduck duration of 1 second:

~~~ corefile
. {
    health localhost:8092 {
        lameduck 1s
    }
}
~~~