plugin/proxy: decrease health timeouts (#1107)

Turn down the timeouts and numbers a bit:
FailTimeout 10s -> 5s
Future 60s -> 12s
TryDuration 60s -> 16s
The timeout for decrementing the fails in a host: 10s -> 2s

And the biggest change: don't set fails when the error is Timeout(),
meaning we loop for a bit and may try the same server again, but we
don't mark our upstream as bad, see comments in proxy.go. Testing this
with "ANY isc.org" and "MX miek.nl" we see:

~~~
::1 - [24/Sep/2017:08:06:17 +0100] "ANY IN isc.org. udp 37 false 4096" SERVFAIL qr,rd 37 10.001621221s
24/Sep/2017:08:06:17 +0100 [ERROR 0 isc.org. ANY] unreachable backend: read udp 192.168.1.148:37420->8.8.8.8:53: i/o timeout

::1 - [24/Sep/2017:08:06:17 +0100] "MX IN miek.nl. udp 37 false 4096" NOERROR qr,rd,ra,ad 170 35.957284ms

127.0.0.1 - [24/Sep/2017:08:06:18 +0100] "ANY IN isc.org. udp 37 false 4096" SERVFAIL qr,rd 37 10.002051726s
24/Sep/2017:08:06:18 +0100 [ERROR 0 isc.org. ANY] unreachable backend: read udp 192.168.1.148:54901->8.8.8.8:53: i/o timeout

::1 - [24/Sep/2017:08:06:19 +0100] "MX IN miek.nl. udp 37 false 4096" NOERROR qr,rd,ra,ad 170 56.848416ms
127.0.0.1 - [24/Sep/2017:08:06:21 +0100] "MX IN miek.nl. udp 37 false 4096" NOERROR qr,rd,ra,ad 170 48.118349ms
::1 - [24/Sep/2017:08:06:21 +0100] "MX IN miek.nl. udp 37 false 4096" NOERROR qr,rd,ra,ad 170 1.055172915s
~~~

So the ANY isc.org queries show up twice, because we retry internally -
this is I think WAI.

The `miek.nl MX` queries are just processed normally as no backend is
marked as unreachable.

May fix #1035 #486
This commit is contained in:
Miek Gieben
2017-09-24 20:05:36 +01:00
committed by GitHub
parent 148a99442d
commit 2a32cd4159
5 changed files with 41 additions and 15 deletions

View File

@@ -41,10 +41,10 @@ proxy FROM TO... {
communicate with it. The default value is 10 seconds ("10s").
* `max_fails` is the number of failures within fail_timeout that are needed before considering
a backend to be down. If 0, the backend will never be marked as down. Default is 1.
* `health_check` will check path (on port) on each backend. If a backend returns a status code of
* `health_check` will check **PATH** (on **PORT**) on each backend. If a backend returns a status code of
200-399, then that backend is marked healthy for double the healthcheck duration. If it doesn't,
it is marked as unhealthy and no requests are routed to it. If this option is not provided then
health checks are disabled. The default duration is 30 seconds ("30s").
health checks are disabled. The default duration is 4 seconds ("4s").
* **IGNORED_NAMES** in `except` is a space-separated list of domains to exclude from proxying.
Requests that match none of these names will be passed through.
* `spray` when all backends are unhealthy, randomly pick one to send the traffic to. (This is