mirror of https://github.com/coredns/coredns.git synced 2026-06-26 10:50:14 -04:00

Files

Miek Gieben c778b3a67c plugin/health: remove ability to poll other plugins (#2547)

* plugin/health: remove ability to poll other plugins

This mechanism defeats the purpose: any plugin (mostly caching) can still
be alive, and we can probably still forward queries. Don't poll plugins,
just tell the world we're up and running.

It was only actually used in kubernetes; and there specifically would
mean any network hiccup would NACK the entire server health.

Fixes: #2534

Signed-off-by: Miek Gieben <miek@miek.nl>

* update docs based on feedback

Signed-off-by: Miek Gieben <miek@miek.nl>

2019-03-07 22:13:47 +00:00

File            Last commit                                                  Date
health_test.go  plugin/health: remove ability to poll other plugins (#2547)  2019-03-07 22:13:47 +00:00
health.go       plugin/health: remove ability to poll other plugins (#2547)  2019-03-07 22:13:47 +00:00
log_test.go     Clean up tests logging (#1979)                               2018-07-19 16:23:06 +01:00
overloaded.go   plugin/metrics: add MustRegister function (#1648)            2018-04-01 13:58:13 +01:00
OWNERS          Add OWNERS file (#1486)                                      2018-02-08 10:55:51 +00:00
README.md       plugin/health: remove ability to poll other plugins (#2547)  2019-03-07 22:13:47 +00:00
setup_test.go   plugin/health: add lameduck mode (#1379)                     2018-01-18 10:40:09 +00:00
setup.go        plugin/health: remove ability to poll other plugins (#2547)  2019-03-07 22:13:47 +00:00

README.md

health

Name

health - enables a health check endpoint.

Description

Enables a process-wide health endpoint. When CoreDNS is up and running, it returns a 200 OK HTTP status code. By default, the health endpoint is exported on port 8080 at the path /health.

Syntax

health [ADDRESS]

Optionally takes an address; the default is :8080. The health path is fixed to /health. The health endpoint returns a 200 response code and the word "OK" when this server is healthy.
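As an illustration of the contract described above (a 200 response code and the body "OK"), here is a minimal Go sketch of a client-side check. The names checkHealth and newStubHealthServer are hypothetical and not part of CoreDNS. The stub server only stands in for a running CoreDNS instance so that the example runs anywhere; against a real instance with the default address you would pass http://localhost:8080/health instead.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// checkHealth reports whether the endpoint at url answers with
// status 200 and a body equal to "OK" (surrounding whitespace ignored).
func checkHealth(url string) (bool, error) {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	ok := resp.StatusCode == http.StatusOK && strings.TrimSpace(string(body)) == "OK"
	return ok, nil
}

// newStubHealthServer is a hypothetical stand-in for CoreDNS: it serves
// "OK" with status 200 on /health and 404 on every other path.
func newStubHealthServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
		io.WriteString(w, "OK")
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newStubHealthServer()
	defer srv.Close()

	healthy, err := checkHealth(srv.URL + "/health")
	if err != nil {
		fmt.Println("health check failed:", err)
		return
	}
	fmt.Println("healthy:", healthy)
}
```

The client uses a short timeout so that a hung process is reported as an error rather than blocking the caller.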

An extra option can be set with this extended syntax:

health [ADDRESS] {
    lameduck DURATION
}

Where lameduck makes the process report itself as unhealthy and then wait for DURATION before the process shuts down.

If you have multiple Server Blocks, health should only be enabled in one of them (as it is process-wide). If you really need multiple endpoints, you must run the health endpoints on different ports:

com {
    whoami
    health :8080
}

net {
    erratic
    health :8081
}

Metrics

If monitoring is enabled (via the prometheus directive) then the following metric is exported:

coredns_health_request_duration_seconds{} - duration to process a /health query. As this should be a local operation, it should be fast. A (large) increase in this duration indicates the CoreDNS process is having trouble keeping up with its query load.

Note that this metric does not have a server label, because being overloaded is a symptom of the running process, not a specific server.
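To look at this metric by hand, one option is to filter it out of the Prometheus text exposition. The Go sketch below assumes the metric is exposed in histogram form (with _bucket, _sum and _count series). The helper metricLines is hypothetical, and the sample text in main is made up for illustration; it is not real CoreDNS output.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// healthMetric is the metric name given in the text above.
const healthMetric = "coredns_health_request_duration_seconds"

// metricLines returns the sample lines of a Prometheus text exposition whose
// metric name starts with name, skipping blank lines and "#" comment lines.
func metricLines(exposition, name string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if strings.HasPrefix(line, name) {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	// Made-up sample input; the values and other_metric_total are illustrative.
	sample := `# TYPE coredns_health_request_duration_seconds histogram
coredns_health_request_duration_seconds_bucket{le="0.001"} 4
coredns_health_request_duration_seconds_bucket{le="+Inf"} 5
coredns_health_request_duration_seconds_sum 0.0042
coredns_health_request_duration_seconds_count 5
other_metric_total 12
`
	for _, line := range metricLines(sample, healthMetric) {
		fmt.Println(line)
	}
}
```

Note that none of the sample health lines carries a server label, in line with the remark above.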

Examples

Run another health endpoint on http://localhost:8091.

. {
    health localhost:8091
}

Set a lameduck duration of 1 second:

. {
    health localhost:8092 {
        lameduck 1s
    }
}

Bugs

When reloading, the health handler is stopped before the new server instance is started. If that new server fails to start, the initial server instance is still available and DNS queries are still served, but the health handler stays down. Health will not reply to HTTP requests until a successful reload or a complete restart of CoreDNS.