plugin/proxy: decrease health timeouts (#1107)

Turn down the timeouts and numbers a bit:

FailTimeout 10s -> 5s
Future 60s -> 12s
TryDuration 60s -> 16s
The timeout for decrementing the fails in a host: 10s -> 2s

And the biggest change: don't increment the fails when the error is a Timeout(). This means we loop for a bit and may try the same server again, but we don't mark our upstream as bad; see the comments in proxy.go.

Testing this with "ANY isc.org" and "MX miek.nl", we see:

~~~
::1 - [24/Sep/2017:08:06:17 +0100] "ANY IN isc.org. udp 37 false 4096" SERVFAIL qr,rd 37 10.001621221s
24/Sep/2017:08:06:17 +0100 [ERROR 0 isc.org. ANY] unreachable backend: read udp 192.168.1.148:37420->8.8.8.8:53: i/o timeout
::1 - [24/Sep/2017:08:06:17 +0100] "MX IN miek.nl. udp 37 false 4096" NOERROR qr,rd,ra,ad 170 35.957284ms
127.0.0.1 - [24/Sep/2017:08:06:18 +0100] "ANY IN isc.org. udp 37 false 4096" SERVFAIL qr,rd 37 10.002051726s
24/Sep/2017:08:06:18 +0100 [ERROR 0 isc.org. ANY] unreachable backend: read udp 192.168.1.148:54901->8.8.8.8:53: i/o timeout
::1 - [24/Sep/2017:08:06:19 +0100] "MX IN miek.nl. udp 37 false 4096" NOERROR qr,rd,ra,ad 170 56.848416ms
127.0.0.1 - [24/Sep/2017:08:06:21 +0100] "MX IN miek.nl. udp 37 false 4096" NOERROR qr,rd,ra,ad 170 48.118349ms
::1 - [24/Sep/2017:08:06:21 +0100] "MX IN miek.nl. udp 37 false 4096" NOERROR qr,rd,ra,ad 170 1.055172915s
~~~

So the ANY isc.org queries show up twice because we retry internally. I think this is WAI (working as intended). The `miek.nl MX` queries are processed normally because no backend is marked as unreachable.

May fix #1035 #486
2026-02-03 22:13:09 -05:00 · 2017-09-24 20:05:36 +01:00
parent 148a99442d
commit 2a32cd4159
5 changed files with 41 additions and 15 deletions
--- a/plugin/proxy/lookup.go
+++ b/plugin/proxy/lookup.go
@@ -5,6 +5,7 @@ package proxy
 import (
 	"context"
 	"fmt"
+	"net"
 	"sync/atomic"
 	"time"

@@ -26,9 +27,9 @@ func NewLookupWithOption(hosts []string, opts Options) Proxy {
 	upstream := &staticUpstream{
 		from: ".",
 		HealthCheck: healthcheck.HealthCheck{
-			FailTimeout: 10 * time.Second,
-			MaxFails:    3, // TODO(miek): disable error checking for simple lookups?
-			Future:      60 * time.Second,
+			FailTimeout: 5 * time.Second,
+			MaxFails:    3,
+			Future:      12 * time.Second,
 		},
 		ex: newDNSExWithOption(opts),
 	}
@@ -85,7 +86,7 @@ func (p Proxy) lookup(state request.Request) (*dns.Msg, error) {
 			}

 			// duplicated from proxy.go, but with a twist, we don't write the
-			// reply back to the client, we return it and there is no monitoring.
+			// reply back to the client, we return it and there is no monitoring to update here.

 			atomic.AddInt64(&host.Conns, 1)

@@ -96,11 +97,20 @@ func (p Proxy) lookup(state request.Request) (*dns.Msg, error) {
 			if backendErr == nil {
 				return reply, nil
 			}
+
+			if oe, ok := backendErr.(*net.OpError); ok {
+				if oe.Timeout() { // see proxy.go for docs.
+					continue
+				}
+			}
+
 			timeout := host.FailTimeout
 			if timeout == 0 {
-				timeout = 10 * time.Second
+				timeout = 2 * time.Second
 			}
+
 			atomic.AddInt32(&host.Fails, 1)
+
 			go func(host *healthcheck.UpstreamHost, timeout time.Duration) {
 				time.Sleep(timeout)
 				atomic.AddInt32(&host.Fails, -1)