mirror of https://github.com/coredns/coredns.git
synced 2025-11-01 18:53:43 -04:00
plugin/forward using pkg/up (#1493)
* plugin/forward: on demand health checking. Only start doing health checks when we encounter an error (any error). This uses the new plugin/pkg/up package to abstract away the actual checking. This reduces the LOC quite a bit; does need more testing, unit testing and tcpdumping a bit.
* fix tests
* Fix readme
* Use pkg/up for healthchecks
* remove unused channel
* more cleanups
* update readme
* Again do go generate and go build; still referencing the wrong forward repo? Anyway fixed.
* Use pkg/up for doing the health checks to cut back on unwanted queries
* Change up.Func to return an error instead of a boolean.
* Drop the string target argument as it doesn't make sense.
* Add healthcheck test on failing to get an upstream answer. TODO(miek): double check Forward and Lookup and how they interact with HC, and if we correctly call close() on those
* actual test
* Tests here
* more tests
* try getting rid of host
* Get rid of the host indirection
* Finish removing hosts
* moar testing
* import fmt
* field is not used
* docs
* move some stuff
* bring back health_check
* maxfails=0 test
* git and merging, bah
* review
```diff
@@ -6,10 +6,17 @@
 
 ## Description
 
-The *forward* plugin is generally faster (~30+%) than *proxy* as it re-uses already opened sockets
-to the upstreams. It supports UDP, TCP and DNS-over-TLS and uses inband health checking that is
-enabled by default.
-When *all* upstreams are down it assumes healtchecking as a mechanism has failed and will try to
+The *forward* plugin re-uses already opened sockets to the upstreams. It supports UDP, TCP and
+DNS-over-TLS and uses in band health checking.
+
+When it detects an error a health check is performed. This check runs in a loop, every *0.5s*, for
+as long as the upstream reports unhealthy. Once healthy we stop health checking (until the next
+error). The health checks use a recursive DNS query (`. IN NS`) to get upstream health. Any response
+that is not a network error (REFUSED, NOTIMPL, SERVFAIL, etc) is taken as a healthy upstream. The
+health check uses the same protocol as specified in **TO**. If `max_fails` is set to 0, no checking
+is performed and upstreams will always be considered healthy.
+
+When *all* upstreams are down it assumes health checking as a mechanism has failed and will try to
 connect to a random upstream (which may or may not work).
 
 ## Syntax
```
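The on-demand health checking described above can be sketched in Go. It probes only after an error and repeats every *0.5s* while the upstream stays unhealthy. It stops once the upstream answers, and it skips probing entirely when `max_fails` is 0.

This is a minimal illustration under stated assumptions. `prober`, `upstream`, `onError` and `errNetwork` are hypothetical names, loosely modelled on the idea behind `plugin/pkg/up` but not its real API. The `check` callback stands in for the `. IN NS` query.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// errNetwork stands in for a network-level failure (timeout, refused
// connection); DNS answers such as SERVFAIL would count as healthy.
var errNetwork = errors.New("network error")

// prober runs a check in a loop, every interval, until it succeeds.
// Hypothetical sketch in the spirit of plugin/pkg/up, not its real API.
type prober struct {
	mu       sync.Mutex
	running  bool
	interval time.Duration
	wg       sync.WaitGroup
}

// start launches the check loop unless one is already running.
func (p *prober) start(check func() error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.running {
		return // already probing this upstream; don't stack loops
	}
	p.running = true
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		for check() != nil {
			time.Sleep(p.interval) // still unhealthy: retry after the interval
		}
		p.mu.Lock()
		p.running = false // healthy again: stop until the next error
		p.mu.Unlock()
	}()
}

// wait blocks until any running check loop has finished.
func (p *prober) wait() { p.wg.Wait() }

// upstream counts consecutive failed health checks (hypothetical type).
type upstream struct {
	fails    uint32
	maxFails uint32
	probe    prober
}

// down reports whether maxFails consecutive checks have failed;
// with maxFails 0 the upstream is never considered down.
func (u *upstream) down() bool {
	return u.maxFails > 0 && atomic.LoadUint32(&u.fails) >= u.maxFails
}

// onError is called when an exchange fails and starts health checking.
func (u *upstream) onError(check func() error) {
	if u.maxFails == 0 {
		return // max_fails 0: no health checking at all
	}
	u.probe.start(func() error {
		if err := check(); err != nil {
			atomic.AddUint32(&u.fails, 1)
			return err
		}
		atomic.StoreUint32(&u.fails, 0) // a healthy answer resets the count
		return nil
	})
}

func main() {
	u := &upstream{maxFails: 2, probe: prober{interval: 10 * time.Millisecond}}
	calls := 0
	// Simulated probe: fails twice, then the upstream answers.
	u.onError(func() error {
		calls++
		if calls <= 2 {
			return errNetwork
		}
		return nil
	})
	u.probe.wait()
	fmt.Println(calls, u.down()) // 3 false
}
```

The `running` flag keeps a single loop per upstream. A burst of errors therefore does not start several concurrent probes, which fits the goal stated in the commit message of cutting back on unwanted queries.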
```diff
@@ -22,16 +29,11 @@ forward FROM TO...
 
 * **FROM** is the base domain to match for the request to be forwarded.
 * **TO...** are the destination endpoints to forward to. The **TO** syntax allows you to specify
-a protocol, `tls://9.9.9.9` or `dns://` for plain DNS. The number of upstreams is limited to 15.
+a protocol, `tls://9.9.9.9` or `dns://` (or no protocol) for plain DNS. The number of upstreams is
+limited to 15.
 
-The health checks are done every *0.5s*. After *two* failed checks the upstream is considered
-unhealthy. The health checks use a recursive DNS query (`. IN NS`) to get upstream health. Any
-response that is not an error (REFUSED, NOTIMPL, SERVFAIL, etc) is taken as a healthy upstream. The
-health check uses the same protocol as specific in the **TO**. On startup each upstream is marked
-unhealthy until it passes a health check. A 0 duration will disable any health checks.
-
-Multiple upstreams are randomized (default policy) on first use. When a healthy proxy returns an
-error during the exchange the next upstream in the list is tried.
+Multiple upstreams are randomized (see `policy`) on first use. When a healthy proxy returns an error
+during the exchange the next upstream in the list is tried.
 
 Extra knobs are available with an expanded syntax:
 
```
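The following Go sketch makes the **TO** syntax concrete. It splits an entry into protocol and address and enforces the 15-upstream limit.

`parseTo`, `parseAll` and `target` are hypothetical. The default ports are assumptions of the sketch rather than something this README states: 53 for plain DNS, and 853 for DNS-over-TLS (the port assigned in RFC 7858).

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// maxUpstreams is the limit stated in the README.
const maxUpstreams = 15

// target is one parsed TO entry (hypothetical type).
type target struct {
	proto string // "dns" or "tls"
	addr  string // host:port
}

// parseTo splits "tls://9.9.9.9", "dns://8.8.8.8" or a bare address into
// protocol and host:port. The default ports are assumptions of this sketch.
func parseTo(s string) (target, error) {
	proto, rest := "dns", s // no protocol means plain DNS
	if i := strings.Index(s, "://"); i >= 0 {
		proto, rest = s[:i], s[i+len("://"):]
	}
	port := "53"
	switch proto {
	case "dns":
	case "tls":
		port = "853"
	default:
		return target{}, fmt.Errorf("unsupported protocol %q", proto)
	}
	if _, _, err := net.SplitHostPort(rest); err != nil {
		// No port given: drop any IPv6 brackets and add the default port.
		rest = net.JoinHostPort(strings.Trim(rest, "[]"), port)
	}
	return target{proto: proto, addr: rest}, nil
}

// parseAll parses every TO entry and rejects more than maxUpstreams.
func parseAll(tos []string) ([]target, error) {
	if len(tos) > maxUpstreams {
		return nil, fmt.Errorf("%d upstreams given, at most %d allowed", len(tos), maxUpstreams)
	}
	out := make([]target, 0, len(tos))
	for _, s := range tos {
		t, err := parseTo(s)
		if err != nil {
			return nil, err
		}
		out = append(out, t)
	}
	return out, nil
}

func main() {
	ts, err := parseAll([]string{"tls://9.9.9.9", "dns://8.8.8.8", "[::1]:1053"})
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, t := range ts {
		fmt.Println(t.proto, t.addr)
	}
}
```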
```diff
@@ -39,12 +41,12 @@ Extra knobs are available with an expanded syntax:
 forward FROM TO... {
 except IGNORED_NAMES...
 force_tcp
-health_check DURATION
 expire DURATION
 max_fails INTEGER
 tls CERT KEY CA
 tls_servername NAME
 policy random|round_robin
+health_checks DURATION
 }
 ~~~
 
```
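The knobs above could be parsed along the following lines, using the defaults the README documents: `health_checks` 0.5s, `expire` 10s, `max_fails` 2 and `policy` random.

The `options` struct and `parseLine` are hypothetical and handle only a subset of the directives. `time.ParseDuration` is the Go standard-library parser for values such as `5s`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// options holds a subset of the expanded-syntax knobs. Hypothetical
// struct; the defaults are the ones documented in the README.
type options struct {
	forceTCP     bool
	healthChecks time.Duration // `health_checks`, default 0.5s
	expire       time.Duration // `expire`, default 10s
	maxFails     uint32        // `max_fails`, default 2
	policy       string        // `policy`, default random
}

func defaults() options {
	return options{
		healthChecks: 500 * time.Millisecond,
		expire:       10 * time.Second,
		maxFails:     2,
		policy:       "random",
	}
}

// parseLine applies one "NAME ARGS..." line from inside the forward block.
// except, tls and tls_servername are left out of this sketch.
func (o *options) parseLine(line string) error {
	f := strings.Fields(line)
	if len(f) == 0 {
		return nil
	}
	switch f[0] {
	case "force_tcp":
		o.forceTCP = true
	case "health_checks", "expire":
		if len(f) != 2 {
			return fmt.Errorf("%s takes one DURATION", f[0])
		}
		d, err := time.ParseDuration(f[1])
		if err != nil {
			return err
		}
		if f[0] == "expire" {
			o.expire = d
		} else {
			o.healthChecks = d
		}
	case "max_fails":
		if len(f) != 2 {
			return fmt.Errorf("max_fails takes one INTEGER")
		}
		n, err := strconv.ParseUint(f[1], 10, 32)
		if err != nil {
			return err
		}
		o.maxFails = uint32(n)
	case "policy":
		if len(f) != 2 || (f[1] != "random" && f[1] != "round_robin") {
			return fmt.Errorf("policy must be random or round_robin")
		}
		o.policy = f[1]
	default:
		return fmt.Errorf("%q is not handled in this sketch", f[0])
	}
	return nil
}

func main() {
	o := defaults()
	for _, l := range []string{"max_fails 3", "health_checks 5s", "policy round_robin"} {
		if err := o.parseLine(l); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println(o.maxFails, o.healthChecks, o.expire, o.policy) // 3 5s 10s round_robin
}
```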
```diff
@@ -52,21 +54,16 @@ forward FROM TO... {
 * **IGNORED_NAMES** in `except` is a space-separated list of domains to exclude from forwarding.
 Requests that match none of these names will be passed through.
 * `force_tcp`, use TCP even when the request comes in over UDP.
-* `health_checks`, use a different **DURATION** for health checking, the default duration is 0.5s.
-A value of 0 disables the health checks completely.
 * `max_fails` is the number of subsequent failed health checks that are needed before considering
-a backend to be down. If 0, the backend will never be marked as down. Default is 2.
+an upstream to be down. If 0, the upstream will never be marked as down (nor health checked).
+Default is 2.
 * `expire` **DURATION**, expire (cached) connections after this time, the default is 10s.
 * `tls` **CERT** **KEY** **CA** define the TLS properties for TLS; if you leave this out the
 system's configuration will be used.
 * `tls_servername` **NAME** allows you to set a server name in the TLS configuration; for instance 9.9.9.9
 needs this to be set to `dns.quad9.net`.
 * `policy` specifies the policy to use for selecting upstream servers. The default is `random`.
-
-The upstream selection is done via random (default policy) selection. If the socket for this client
-isn't known *forward* will randomly choose one. If this turns out to be unhealthy, the next one is
-tried. If *all* hosts are down, we assume health checking is broken and select a *random* upstream to
-try.
+* `health_checks`, use a different **DURATION** for health checking, the default duration is 0.5s.
 
 Also note the TLS config is "global" for the whole forwarding proxy if you need a different
 `tls-name` for different upstreams you're out of luck.
```
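Upstream selection as described in this README can be sketched in Go:

* Try the healthy upstreams in `policy` order (`random` or `round_robin`).
* Move on to the next upstream when an exchange fails.
* When *all* upstreams are down, fall back to one randomly chosen upstream and bump a counter.

All type and function names are hypothetical. The `exchange` callback stands in for the real DNS exchange, and `broken` merely plays the role of `coredns_forward_healthcheck_broken_count_total`.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"sync/atomic"
)

// proxy is a hypothetical stand-in for one upstream.
type proxy struct {
	addr string
	down bool // in a real setup this would come from health checking
}

// policy decides the order in which upstreams are tried.
type policy interface {
	list(ps []*proxy) []*proxy
}

// random shuffles a copy of the upstream list for every query.
type random struct{}

func (random) list(ps []*proxy) []*proxy {
	out := append([]*proxy(nil), ps...)
	rand.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	return out
}

// roundRobin rotates the first upstream on every query.
type roundRobin struct{ n uint32 }

func (r *roundRobin) list(ps []*proxy) []*proxy {
	start := int((atomic.AddUint32(&r.n, 1) - 1) % uint32(len(ps)))
	return append(append([]*proxy(nil), ps[start:]...), ps[:start]...)
}

var errAllFailed = errors.New("no upstream answered")

// forward walks the healthy upstreams in policy order and moves on to the
// next one when an exchange fails. If no upstream is healthy it assumes
// health checking is broken and tries a single random upstream.
func forward(pol policy, ps []*proxy, exchange func(*proxy) error, broken *int) (*proxy, error) {
	healthy := 0
	for _, p := range pol.list(ps) {
		if p.down {
			continue
		}
		healthy++
		if exchange(p) == nil {
			return p, nil
		}
	}
	if healthy == 0 {
		*broken++                   // plays the role of the "broken" counter metric
		p := ps[rand.Intn(len(ps))] // always random, whatever the policy
		if exchange(p) == nil {
			return p, nil
		}
	}
	return nil, errAllFailed
}

func main() {
	ps := []*proxy{{addr: "9.9.9.9:853"}, {addr: "8.8.8.8:53"}}
	broken := 0
	ok := func(*proxy) error { return nil }
	p, err := forward(&roundRobin{}, ps, ok, &broken)
	fmt.Println(p.addr, err, broken) // 9.9.9.9:853 <nil> 0
}
```

Copying the slice in `list` leaves the configured order untouched, so concurrent queries each work on their own ordering.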
```diff
@@ -80,7 +77,7 @@ If monitoring is enabled (via the *prometheus* directive) then the following met
 * `coredns_forward_response_rcode_total{to, rcode}` - count of RCODEs per upstream.
 * `coredns_forward_healthcheck_failure_count_total{to}` - number of failed health checks per upstream.
 * `coredns_forward_healthcheck_broken_count_total{}` - counter of when all upstreams are unhealthy,
-and we are randomly spraying to a target.
+and we are randomly (this always uses the `random` policy) spraying to an upstream.
 * `coredns_forward_socket_count_total{to}` - number of cached sockets per upstream.
 
 Where `to` is one of the upstream servers (**TO** from the config), `proto` is the protocol used by
```
```diff
@@ -125,16 +122,10 @@ Proxy everything except `example.org` using the host's `resolv.conf`'s nameserve
 }
 ~~~
 
-Forward to a IPv6 host:
-
-~~~ corefile
-. {
-forward . [::1]:1053
-}
-~~~
-
 Proxy all requests to 9.9.9.9 using the DNS-over-TLS protocol, and cache every answer for up to 30
-seconds.
+seconds. Note the `tls_servername` is mandatory if you want a working setup, as 9.9.9.9 can't be
+used in the TLS negotiation. Also set the health check duration to 5s to not completely swamp the
+service with health checks.
 
 ~~~ corefile
 . {
```
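Go's `crypto/tls` shows why `tls_servername` matters. `Config.ServerName` is the name the server certificate is verified against, and it is also sent as SNI. When it is left empty, `tls.Dial` takes it from the dialed address, which here is the bare IP.

The `clientTLS` helper below is a hypothetical illustration, not CoreDNS code. The TLS 1.2 minimum is an added assumption.

```go
package main

import (
	"crypto/tls"
	"fmt"
)

// clientTLS returns a DNS-over-TLS client config that verifies the upstream
// as serverName instead of the IP address that is dialed. Hypothetical
// helper illustrating `tls_servername`; not CoreDNS code.
func clientTLS(serverName string) *tls.Config {
	return &tls.Config{
		ServerName: serverName,       // checked against the certificate, sent as SNI
		MinVersion: tls.VersionTLS12, // assumption: refuse older TLS versions
	}
}

func main() {
	cfg := clientTLS("dns.quad9.net")
	// A client would then dial the IP with this config, for example:
	//   conn, err := tls.Dial("tcp", "9.9.9.9:853", cfg)
	// With an empty ServerName, tls.Dial would verify against "9.9.9.9".
	fmt.Println(cfg.ServerName) // dns.quad9.net
}
```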
```diff
@@ -148,7 +139,7 @@ seconds.
 
 ## Bugs
 
-The TLS config is global for the whole forwarding proxy if you need a different `tls-name` for
+The TLS config is global for the whole forwarding proxy if you need a different `tls_servername` for
 different upstreams you're out of luck.
 
 ## Also See
```