# health

## Name

*health* - enables a health check endpoint.

## Description

Enables a process-wide health endpoint. When CoreDNS is up and running, this endpoint returns an
HTTP 200 OK status code. By default, health is exported on port 8080 at the path `/health`.

## Syntax

~~~
health [ADDRESS]
~~~

Optionally takes an address; the default is `:8080`. The health path is fixed to `/health`. The
health endpoint returns a 200 response code and the word "OK" when this server is healthy.

An extra option can be set with this extended syntax:

~~~
health [ADDRESS] {
    lameduck DURATION
}
~~~

* `lameduck` makes the process unhealthy, then *waits* for **DURATION** before the process
  shuts down.

If you have multiple Server Blocks, *health* should only be enabled in one of them (as it is process
wide). If you really need multiple endpoints, you must run health endpoints on different ports:

~~~ corefile
com {
    whoami
    health :8080
}

net {
    erratic
    health :8081
}
~~~

## Metrics

If monitoring is enabled (via the *prometheus* directive) then the following metric is exported:

* `coredns_health_request_duration_seconds{}` - duration to process a `/health` request. As this
  should be a local operation, it should be fast. A large increase in this duration indicates the
  CoreDNS process is having trouble keeping up with its query load.

Note that this metric *does not* have a `server` label, because being overloaded is a symptom of
the running process, *not* a specific server.

## Examples

Run the health endpoint on http://localhost:8091 instead of the default address:

~~~ corefile
. {
    health localhost:8091
}
~~~

Set a lameduck duration of 1 second:

~~~ corefile
. {
    health localhost:8092 {
        lameduck 1s
    }
}
~~~

## Bugs

When reloading, the health handler is stopped before the new server instance is started. If that
new server fails to start, the initial server instance is still available and DNS queries are still
served, but the health handler stays down. The health endpoint will not answer HTTP requests until
a successful reload or a complete restart of CoreDNS.