Files
coredns/plugin/forward/forward.go

310 lines
8.5 KiB
Go
Raw Normal View History

// Package forward implements a forwarding proxy. It caches an upstream net.Conn for some time, so if the same
// client returns the upstream's Conn will be precached. Depending on how you benchmark this looks to be
2018-08-14 17:55:55 +02:00
// 50% faster than just opening a new connection for every client. It works with UDP and TCP and uses
// inband healthchecking.
package forward
import (
"context"
"crypto/tls"
"errors"
"sync/atomic"
"time"
"github.com/coredns/coredns/plugin"
"github.com/coredns/coredns/plugin/debug"
plugin/dnstap: various cleanups (#4179) * plugin/dnstap: various cleanups A recent issue made me look into this plugin, I suspect various other cleanups (hopefully deletion of code) can be made as well Remove identical functions ToClientQuery etc, and just use tap.Message as the base type in plugin. Keep msg/ for a few helper functions that may proof useful. This remove the whole test directory as we will just check the things we are interested in which gives much better feedback and keeps that code closer together. tapwr dir is also not needed, writer_test.go was just duplicating the tests already done. This moves writer.go to the top directory. Make the only user of dnstap, the forward plugin, use the newer code also remove the test, a better test there would be a full e2e test to see the correct thing happens. Cleanup the Tapper interface and move it to dnstapio where it belongs, remove higher level interfaces that are not used. This remove dnstap.Tapper and dnstap.IORoutines. Use the standard mechanism for getting access to a plugin and remove shuffling the plugin into the context. Signed-off-by: Miek Gieben <miek@miek.nl> * use opts to get the correct proto Signed-off-by: Miek Gieben <miek@miek.nl> * Various fixes Signed-off-by: Miek Gieben <miek@miek.nl> * Remove bad addr test, as dnstap is only called from within coredns where these fields have been preparsed Signed-off-by: Miek Gieben <miek@miek.nl> * dnstap: remove saving the error all these fields have been preparsed, no need for dnstap to be pedantic and check (and save!) this error again. Simplifies it a bit more. Signed-off-by: Miek Gieben <miek@miek.nl> * Update plugin/forward/dnstap.go Co-authored-by: Ruslan Drozhdzh <30860269+rdrozhdzh@users.noreply.github.com> * Code review Signed-off-by: Miek Gieben <miek@miek.nl> * add back in preferUDP Signed-off-by: Miek Gieben <miek@miek.nl> * nit Signed-off-by: Miek Gieben <miek@miek.nl> Co-authored-by: Ruslan Drozhdzh <30860269+rdrozhdzh@users.noreply.github.com>
2020-10-12 19:10:35 +02:00
"github.com/coredns/coredns/plugin/dnstap"
"github.com/coredns/coredns/plugin/metadata"
clog "github.com/coredns/coredns/plugin/pkg/log"
proxyPkg "github.com/coredns/coredns/plugin/pkg/proxy"
"github.com/coredns/coredns/request"
"github.com/miekg/dns"
ot "github.com/opentracing/opentracing-go"
otext "github.com/opentracing/opentracing-go/ext"
)
var log = clog.NewWithPlugin("forward")
const (
defaultExpire = 10 * time.Second
hcInterval = 500 * time.Millisecond
)
// Forward represents a plugin instance that can proxy requests to another (DNS) server. It has a list
// of proxies each representing one upstream proxy.
type Forward struct {
concurrent int64 // atomic counters need to be first in struct for proper alignment
proxies []*proxyPkg.Proxy
p Policy
hcInterval time.Duration
from string
ignored []string
nextAlternateRcodes []int
tlsConfig *tls.Config
tlsServerName string
maxfails uint32
expire time.Duration
perf(proxy): use mutex-based connection pool (#7790) * perf(proxy): use mutex-based connection pool The proxy package (used for example by the forward plugin) utilized an actor model where a single connManager goroutine managed connection pooling via unbuffered channels (dial, yield, ret). This design serialized all connection acquisition and release operations through a single goroutine, creating a bottleneck under high concurrency. This was observable as a performance degradation when using a single upstream backend compared to multiple backends (which sharded the bottleneck). Changes: - Removed dial, yield, and ret channels from the Transport struct. - Removed the connManager goroutine's request processing loop. - Implemented Dial() and Yield() using a sync.Mutex to protect the connection slice, allowing for fast concurrent access without context switching. - Downgraded connManager to a simple background cleanup loop that only handles connection expiration on a ticker. - Updated plugin/pkg/proxy/connect.go to use direct method calls instead of channel sends. - Updated tests to reflect the removal of internal channels. Benchmarks show that this change eliminates the single-backend bottleneck. Now a single upstream backend performs on par with multiple backends, and overall throughput is improved. The implementation aligns with standard Go patterns for connection pooling (e.g., net/http.Transport). Signed-off-by: Ville Vesilehto <ville@vesilehto.fi> * fix: address PR review for persistent.go - Named mutex field instead of embedding, to not expose Lock() and Unlock() - Move stop check outside of lock in Yield() - Close() without a separate goroutine - Change stop channel to struct Signed-off-by: Ville Vesilehto <ville@vesilehto.fi> * fix: address code review feedback for conn pool - Switch from LIFO to FIFO connection selection for source port diversity, reducing DNS cache poisoning risk (RFC 5452). - Remove "clear entire cache" optimization as it was LIFO-specific. FIFO naturally iterates and skips expired connections. - Remove all goroutines for closing connections; collect connections while holding lock, close synchronously after releasing lock. Signed-off-by: Ville Vesilehto <ville@vesilehto.fi> * fix: remove unused error consts No longer utilised after refactoring the channel based approach. Signed-off-by: Ville Vesilehto <ville@vesilehto.fi> * feat(forward): add max_idle_conns option Add configurable connection pool limit for the forward plugin via the max_idle_conns Corefile option. Changes: - Add SetMaxIdleConns to proxy - Add maxIdleConns field to Forward struct - Add max_idle_conns parsing in forward plugin setup - Apply setting to each proxy during configuration - Update forward plugin README with new option By default the value is 0 (unbounded). When set, excess connections returned to the pool are closed immediately rather than cached. Also add a yield related test. Signed-off-by: Ville Vesilehto <ville@vesilehto.fi> * chore(proxy): simple Dial by closing conns inline Remove toClose slice collection to reduce complexity. Instead close expired connections directly while iterating. Reduces complexity with negligible lock-time impact. Signed-off-by: Ville Vesilehto <ville@vesilehto.fi> * chore: fewer explicit Unlock calls Cleaner and less chance of forgetting to unlock on new possible code paths. Signed-off-by: Ville Vesilehto <ville@vesilehto.fi> --------- Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
2026-01-14 03:49:46 +02:00
maxIdleConns int
maxConcurrent int64
failfastUnhealthyUpstreams bool
failoverRcodes []int
maxConnectAttempts uint32
opts proxyPkg.Options // also here for testing
// ErrLimitExceeded indicates that a query was rejected because the number of concurrent queries has exceeded
// the maximum allowed (maxConcurrent)
ErrLimitExceeded error
tapPlugins []*dnstap.Dnstap // when dnstap plugins are loaded, we use to this to send messages out.
plugin/dnstap: various cleanups (#4179) * plugin/dnstap: various cleanups A recent issue made me look into this plugin, I suspect various other cleanups (hopefully deletion of code) can be made as well Remove identical functions ToClientQuery etc, and just use tap.Message as the base type in plugin. Keep msg/ for a few helper functions that may proof useful. This remove the whole test directory as we will just check the things we are interested in which gives much better feedback and keeps that code closer together. tapwr dir is also not needed, writer_test.go was just duplicating the tests already done. This moves writer.go to the top directory. Make the only user of dnstap, the forward plugin, use the newer code also remove the test, a better test there would be a full e2e test to see the correct thing happens. Cleanup the Tapper interface and move it to dnstapio where it belongs, remove higher level interfaces that are not used. This remove dnstap.Tapper and dnstap.IORoutines. Use the standard mechanism for getting access to a plugin and remove shuffling the plugin into the context. Signed-off-by: Miek Gieben <miek@miek.nl> * use opts to get the correct proto Signed-off-by: Miek Gieben <miek@miek.nl> * Various fixes Signed-off-by: Miek Gieben <miek@miek.nl> * Remove bad addr test, as dnstap is only called from within coredns where these fields have been preparsed Signed-off-by: Miek Gieben <miek@miek.nl> * dnstap: remove saving the error all these fields have been preparsed, no need for dnstap to be pedantic and check (and save!) this error again. Simplifies it a bit more. Signed-off-by: Miek Gieben <miek@miek.nl> * Update plugin/forward/dnstap.go Co-authored-by: Ruslan Drozhdzh <30860269+rdrozhdzh@users.noreply.github.com> * Code review Signed-off-by: Miek Gieben <miek@miek.nl> * add back in preferUDP Signed-off-by: Miek Gieben <miek@miek.nl> * nit Signed-off-by: Miek Gieben <miek@miek.nl> Co-authored-by: Ruslan Drozhdzh <30860269+rdrozhdzh@users.noreply.github.com>
2020-10-12 19:10:35 +02:00
Next plugin.Handler
}
// New returns a new Forward.
func New() *Forward {
f := &Forward{maxfails: 2, tlsConfig: new(tls.Config), expire: defaultExpire, p: new(random), from: ".", hcInterval: hcInterval, opts: proxyPkg.Options{ForceTCP: false, PreferUDP: false, HCRecursionDesired: true, HCDomain: "."}}
return f
}
// SetProxy appends p to the proxy list and starts healthchecking.
func (f *Forward) SetProxy(p *proxyPkg.Proxy) {
f.proxies = append(f.proxies, p)
p.Start(f.hcInterval)
}
// SetProxyOptions setup proxy options
func (f *Forward) SetProxyOptions(opts proxyPkg.Options) {
f.opts = opts
}
// SetTapPlugin appends one or more dnstap plugins to the tap plugin list.
func (f *Forward) SetTapPlugin(tapPlugin *dnstap.Dnstap) {
f.tapPlugins = append(f.tapPlugins, tapPlugin)
if nextPlugin, ok := tapPlugin.Next.(*dnstap.Dnstap); ok {
f.SetTapPlugin(nextPlugin)
}
}
// Len returns the number of configured proxies.
func (f *Forward) Len() int { return len(f.proxies) }
// Name implements plugin.Handler.
func (f *Forward) Name() string { return "forward" }
// ServeDNS implements plugin.Handler.
func (f *Forward) ServeDNS(ctx context.Context, w dns.ResponseWriter, r *dns.Msg) (int, error) {
state := request.Request{W: w, Req: r}
if !f.match(state) {
return plugin.NextOrFailure(f.Name(), f.Next, ctx, w, r)
}
if f.maxConcurrent > 0 {
count := atomic.AddInt64(&(f.concurrent), 1)
defer atomic.AddInt64(&(f.concurrent), -1)
if count > f.maxConcurrent {
maxConcurrentRejectCount.Add(1)
return dns.RcodeRefused, f.ErrLimitExceeded
}
}
fails := 0
var span, child ot.Span
var upstreamErr error
span = ot.SpanFromContext(ctx)
i := 0
list := f.List()
deadline := time.Now().Add(defaultTimeout)
start := time.Now()
connectAttempts := uint32(0)
for time.Now().Before(deadline) && ctx.Err() == nil && (f.maxConnectAttempts == 0 || connectAttempts < f.maxConnectAttempts) {
if i >= len(list) {
// reached the end of list, reset to begin
i = 0
fails = 0
}
proxy := list[i]
i++
if proxy.Down(f.maxfails) {
fails++
if fails < len(f.proxies) {
continue
}
healthcheckBrokenCount.Add(1)
// All upstreams are dead, return servfail if all upstreams are down
if f.failfastUnhealthyUpstreams {
break
}
// assume healthcheck is completely broken and randomly
// select an upstream to connect to.
r := new(random)
proxy = r.List(f.proxies)[0]
}
if span != nil {
child = span.Tracer().StartSpan("connect", ot.ChildOf(span.Context()))
otext.PeerAddress.Set(child, proxy.Addr())
ctx = ot.ContextWithSpan(ctx, child)
}
metadata.SetValueFunc(ctx, "forward/upstream", func() string {
return proxy.Addr()
})
var (
ret *dns.Msg
err error
)
opts := f.opts
for {
ret, err = proxy.Connect(ctx, state, opts)
if err == proxyPkg.ErrCachedClosed { // Remote side closed conn, can only happen with TCP.
continue
}
// Retry with TCP if truncated and prefer_udp configured.
if ret != nil && ret.Truncated && !opts.ForceTCP && opts.PreferUDP {
opts.ForceTCP = true
continue
}
break
}
if child != nil {
child.Finish()
}
plugin/dnstap: various cleanups (#4179) * plugin/dnstap: various cleanups A recent issue made me look into this plugin, I suspect various other cleanups (hopefully deletion of code) can be made as well Remove identical functions ToClientQuery etc, and just use tap.Message as the base type in plugin. Keep msg/ for a few helper functions that may proof useful. This remove the whole test directory as we will just check the things we are interested in which gives much better feedback and keeps that code closer together. tapwr dir is also not needed, writer_test.go was just duplicating the tests already done. This moves writer.go to the top directory. Make the only user of dnstap, the forward plugin, use the newer code also remove the test, a better test there would be a full e2e test to see the correct thing happens. Cleanup the Tapper interface and move it to dnstapio where it belongs, remove higher level interfaces that are not used. This remove dnstap.Tapper and dnstap.IORoutines. Use the standard mechanism for getting access to a plugin and remove shuffling the plugin into the context. Signed-off-by: Miek Gieben <miek@miek.nl> * use opts to get the correct proto Signed-off-by: Miek Gieben <miek@miek.nl> * Various fixes Signed-off-by: Miek Gieben <miek@miek.nl> * Remove bad addr test, as dnstap is only called from within coredns where these fields have been preparsed Signed-off-by: Miek Gieben <miek@miek.nl> * dnstap: remove saving the error all these fields have been preparsed, no need for dnstap to be pedantic and check (and save!) this error again. Simplifies it a bit more. Signed-off-by: Miek Gieben <miek@miek.nl> * Update plugin/forward/dnstap.go Co-authored-by: Ruslan Drozhdzh <30860269+rdrozhdzh@users.noreply.github.com> * Code review Signed-off-by: Miek Gieben <miek@miek.nl> * add back in preferUDP Signed-off-by: Miek Gieben <miek@miek.nl> * nit Signed-off-by: Miek Gieben <miek@miek.nl> Co-authored-by: Ruslan Drozhdzh <30860269+rdrozhdzh@users.noreply.github.com>
2020-10-12 19:10:35 +02:00
if len(f.tapPlugins) != 0 {
toDnstap(ctx, f, proxy.Addr(), state, opts, ret, start)
plugin/dnstap: various cleanups (#4179) * plugin/dnstap: various cleanups A recent issue made me look into this plugin, I suspect various other cleanups (hopefully deletion of code) can be made as well Remove identical functions ToClientQuery etc, and just use tap.Message as the base type in plugin. Keep msg/ for a few helper functions that may proof useful. This remove the whole test directory as we will just check the things we are interested in which gives much better feedback and keeps that code closer together. tapwr dir is also not needed, writer_test.go was just duplicating the tests already done. This moves writer.go to the top directory. Make the only user of dnstap, the forward plugin, use the newer code also remove the test, a better test there would be a full e2e test to see the correct thing happens. Cleanup the Tapper interface and move it to dnstapio where it belongs, remove higher level interfaces that are not used. This remove dnstap.Tapper and dnstap.IORoutines. Use the standard mechanism for getting access to a plugin and remove shuffling the plugin into the context. Signed-off-by: Miek Gieben <miek@miek.nl> * use opts to get the correct proto Signed-off-by: Miek Gieben <miek@miek.nl> * Various fixes Signed-off-by: Miek Gieben <miek@miek.nl> * Remove bad addr test, as dnstap is only called from within coredns where these fields have been preparsed Signed-off-by: Miek Gieben <miek@miek.nl> * dnstap: remove saving the error all these fields have been preparsed, no need for dnstap to be pedantic and check (and save!) this error again. Simplifies it a bit more. Signed-off-by: Miek Gieben <miek@miek.nl> * Update plugin/forward/dnstap.go Co-authored-by: Ruslan Drozhdzh <30860269+rdrozhdzh@users.noreply.github.com> * Code review Signed-off-by: Miek Gieben <miek@miek.nl> * add back in preferUDP Signed-off-by: Miek Gieben <miek@miek.nl> * nit Signed-off-by: Miek Gieben <miek@miek.nl> Co-authored-by: Ruslan Drozhdzh <30860269+rdrozhdzh@users.noreply.github.com>
2020-10-12 19:10:35 +02:00
}
upstreamErr = err
if err != nil {
// Kick off health check to see if *our* upstream is broken.
if f.maxfails != 0 {
proxy.Healthcheck()
}
// If a per-request connect-attempt cap is configured, count this
// failed connect attempt and stop retrying when the cap is hit.
if f.maxConnectAttempts > 0 {
connectAttempts++
if connectAttempts >= f.maxConnectAttempts {
break
}
}
if fails < len(f.proxies) {
continue
}
break
}
// Check if the reply is correct; if not return FormErr.
if !state.Match(ret) {
debug.Hexdumpf(ret, "Wrong reply for id: %d, %s %d", ret.Id, state.QName(), state.QType())
formerr := new(dns.Msg)
formerr.SetRcode(state.Req, dns.RcodeFormatError)
w.WriteMsg(formerr)
plugin/dnstap: various cleanups (#4179) * plugin/dnstap: various cleanups A recent issue made me look into this plugin, I suspect various other cleanups (hopefully deletion of code) can be made as well Remove identical functions ToClientQuery etc, and just use tap.Message as the base type in plugin. Keep msg/ for a few helper functions that may proof useful. This remove the whole test directory as we will just check the things we are interested in which gives much better feedback and keeps that code closer together. tapwr dir is also not needed, writer_test.go was just duplicating the tests already done. This moves writer.go to the top directory. Make the only user of dnstap, the forward plugin, use the newer code also remove the test, a better test there would be a full e2e test to see the correct thing happens. Cleanup the Tapper interface and move it to dnstapio where it belongs, remove higher level interfaces that are not used. This remove dnstap.Tapper and dnstap.IORoutines. Use the standard mechanism for getting access to a plugin and remove shuffling the plugin into the context. Signed-off-by: Miek Gieben <miek@miek.nl> * use opts to get the correct proto Signed-off-by: Miek Gieben <miek@miek.nl> * Various fixes Signed-off-by: Miek Gieben <miek@miek.nl> * Remove bad addr test, as dnstap is only called from within coredns where these fields have been preparsed Signed-off-by: Miek Gieben <miek@miek.nl> * dnstap: remove saving the error all these fields have been preparsed, no need for dnstap to be pedantic and check (and save!) this error again. Simplifies it a bit more. Signed-off-by: Miek Gieben <miek@miek.nl> * Update plugin/forward/dnstap.go Co-authored-by: Ruslan Drozhdzh <30860269+rdrozhdzh@users.noreply.github.com> * Code review Signed-off-by: Miek Gieben <miek@miek.nl> * add back in preferUDP Signed-off-by: Miek Gieben <miek@miek.nl> * nit Signed-off-by: Miek Gieben <miek@miek.nl> Co-authored-by: Ruslan Drozhdzh <30860269+rdrozhdzh@users.noreply.github.com>
2020-10-12 19:10:35 +02:00
return 0, nil
}
// Check if we have a failover Rcode defined, check if we match on the code
tryNext := false
for _, failoverRcode := range f.failoverRcodes {
// if we match, we continue to the next upstream in the list
if failoverRcode == ret.Rcode {
if fails < len(f.proxies) {
tryNext = true
}
}
}
if tryNext {
fails++
continue
}
// Check if we have an alternate Rcode defined, check if we match on the code
for _, alternateRcode := range f.nextAlternateRcodes {
if alternateRcode == ret.Rcode && f.Next != nil { // In case we do not have a Next handler, just continue normally
if _, ok := f.Next.(*Forward); ok { // Only continue if the next forwarder is also a Forworder
return plugin.NextOrFailure(f.Name(), f.Next, ctx, w, r)
}
}
}
w.WriteMsg(ret)
plugin/dnstap: various cleanups (#4179) * plugin/dnstap: various cleanups A recent issue made me look into this plugin, I suspect various other cleanups (hopefully deletion of code) can be made as well Remove identical functions ToClientQuery etc, and just use tap.Message as the base type in plugin. Keep msg/ for a few helper functions that may proof useful. This remove the whole test directory as we will just check the things we are interested in which gives much better feedback and keeps that code closer together. tapwr dir is also not needed, writer_test.go was just duplicating the tests already done. This moves writer.go to the top directory. Make the only user of dnstap, the forward plugin, use the newer code also remove the test, a better test there would be a full e2e test to see the correct thing happens. Cleanup the Tapper interface and move it to dnstapio where it belongs, remove higher level interfaces that are not used. This remove dnstap.Tapper and dnstap.IORoutines. Use the standard mechanism for getting access to a plugin and remove shuffling the plugin into the context. Signed-off-by: Miek Gieben <miek@miek.nl> * use opts to get the correct proto Signed-off-by: Miek Gieben <miek@miek.nl> * Various fixes Signed-off-by: Miek Gieben <miek@miek.nl> * Remove bad addr test, as dnstap is only called from within coredns where these fields have been preparsed Signed-off-by: Miek Gieben <miek@miek.nl> * dnstap: remove saving the error all these fields have been preparsed, no need for dnstap to be pedantic and check (and save!) this error again. Simplifies it a bit more. Signed-off-by: Miek Gieben <miek@miek.nl> * Update plugin/forward/dnstap.go Co-authored-by: Ruslan Drozhdzh <30860269+rdrozhdzh@users.noreply.github.com> * Code review Signed-off-by: Miek Gieben <miek@miek.nl> * add back in preferUDP Signed-off-by: Miek Gieben <miek@miek.nl> * nit Signed-off-by: Miek Gieben <miek@miek.nl> Co-authored-by: Ruslan Drozhdzh <30860269+rdrozhdzh@users.noreply.github.com>
2020-10-12 19:10:35 +02:00
return 0, nil
}
if upstreamErr != nil {
return dns.RcodeServerFailure, upstreamErr
}
return dns.RcodeServerFailure, ErrNoHealthy
}
func (f *Forward) match(state request.Request) bool {
if !plugin.Name(f.from).Matches(state.Name()) || !f.isAllowedDomain(state.Name()) {
return false
}
return true
}
func (f *Forward) isAllowedDomain(name string) bool {
if dns.Name(name) == dns.Name(f.from) {
return true
}
for _, ignore := range f.ignored {
if plugin.Name(ignore).Matches(name) {
return false
}
}
return true
}
// ForceTCP returns if TCP is forced to be used even when the request comes in over UDP.
func (f *Forward) ForceTCP() bool { return f.opts.ForceTCP }
// PreferUDP returns if UDP is preferred to be used even when the request comes in over TCP.
func (f *Forward) PreferUDP() bool { return f.opts.PreferUDP }
// List returns a set of proxies to be used for this client depending on the policy in f.
func (f *Forward) List() []*proxyPkg.Proxy { return f.p.List(f.proxies) }
var (
// ErrNoHealthy means no healthy proxies left.
ErrNoHealthy = errors.New("no healthy proxies")
// ErrNoForward means no forwarder defined.
ErrNoForward = errors.New("no forwarder defined")
// ErrCachedClosed means cached connection was closed by peer.
ErrCachedClosed = errors.New("cached connection was closed by peer")
)
// Options holds various Options that can be set.
type Options struct {
// ForceTCP use TCP protocol for upstream DNS request. Has precedence over PreferUDP flag
ForceTCP bool
// PreferUDP use UDP protocol for upstream DNS request.
PreferUDP bool
// HCRecursionDesired sets recursion desired flag for Proxy healthcheck requests
HCRecursionDesired bool
// HCDomain sets domain for Proxy healthcheck requests
HCDomain string
}
var defaultTimeout = 5 * time.Second