Files
coredns/plugin/health/README.md
Miek Gieben 076b8d4fba plugin/health: add OnRestartFailed (#2812)
Add OnReStartFailed which makes the health plugin stay up if the
Corefile is corrupt and we revert to the previous version.

Also needs a fix for the channel handling

See #2659

Testing it will log the following when restarting with a corrupted
Corefile

~~~
2019-05-04T18:01:59.431Z [INFO] linux/amd64, go1.12.4,
CoreDNS-1.5.0
linux/amd64, go1.12.4,
[INFO] SIGUSR1: Reloading
[INFO] Reloading
[ERROR] Restart failed: Corefile:5 - Error during parsing: Unknown directive 'bdhfhdhj'
[ERROR] SIGUSR1: starting with listener file descriptors: Corefile:5 - Error during parsing: Unknown directive 'bdhfhdhj'
~~~

After which the curl still works.

This also needed a change to reset the channel used for the metrics
go-routine which gets closed on shutdown, otherwise you'll see:

~~~
^C[INFO] SIGINT: Shutting down
panic: close of closed channel

goroutine 90 [running]:
github.com/coredns/coredns/plugin/health.(*health).OnFinalShutdown(0xc000089bc0, 0xc000063d88, 0x4afe6d)
~~~

Signed-off-by: Miek Gieben <miek@miek.nl>
2019-05-04 16:06:25 -04:00

79 lines
1.8 KiB
Markdown

# health
## Name
*health* - enables a health check endpoint.
## Description
Enabled process wide health endpoint. When CoreDNS is up and running this returns a 200 OK HTTP
status code. The health is exported, by default, on port 8080/health .
## Syntax
~~~
health [ADDRESS]
~~~
Optionally takes an address; the default is `:8080`. The health path is fixed to `/health`. The
health endpoint returns a 200 response code and the word "OK" when this server is healthy.
An extra option can be set with this extended syntax:
~~~
health [ADDRESS] {
lameduck DURATION
}
~~~
* Where `lameduck` will make the process unhealthy then *wait* for **DURATION** before the process
shuts down.
If you have multiple Server Blocks, *health* can only be enabled in one of them (as it is process
wide). If you really need multiple endpoints, you must run health endpoints on different ports:
~~~ corefile
com {
whoami
health :8080
}
net {
erratic
health :8081
}
~~~
Doing this is supported but both endponts ":8080" and ":8081" will export the exact same health.
## Metrics
If monitoring is enabled (via the *prometheus* directive) then the following metric is exported:
* `coredns_health_request_duration_seconds{}` - duration to process a HTTP query to the local
`/health` endpoint. As this a local operation it should be fast. A (large) increase in this
duration indicates the CoreDNS process is having trouble keeping up with its query load.
Note that this metric *does not* have a `server` label, because being overloaded is a symptom of
the running process, *not* a specific server.
## Examples
Run another health endpoint on http://localhost:8091.
~~~ corefile
. {
health localhost:8091
}
~~~
Set a lameduck duration of 1 second:
~~~ corefile
. {
health localhost:8092 {
lameduck 1s
}
}
~~~