Commit Graph

11 Commits

Author SHA1 Message Date
cangming
500707c43a plugin/forward: add max_age option to enforce an absolute connection lifetime (#7903)
* plugin/pkg/proxy: add max_age for per-connection lifetime cap

Introduce a max_age setting on Transport that closes connections based
on creation time, independent of idle-timeout (expire).

Background: PR #7790 changed the connection pool from LIFO to FIFO for
source-port diversity. Under FIFO, every connection is cycled through
the pool and its used timestamp is refreshed continuously. When request
rate is high enough that pool_size / request_rate < expire, no
connection ever becomes idle and expire never fires. This prevents
CoreDNS from opening new connections to upstreams that scale out (e.g.
new Kubernetes pods behind a ClusterIP service with conntrack pinning).

max_age addresses this by enforcing an absolute upper bound on
connection lifetime regardless of activity:
- persistConn gains a created field set at dial time.
- Transport gains maxAge (default 0 = unlimited, preserving existing
  behaviour).
- Dial(): rejects cached connections whose creation age exceeds max_age.
- cleanup(): when maxAge > 0, uses a linear scan over both idle-timeout
  and max-age predicates; when maxAge == 0, preserves the original
  binary-search path on used time (sorted by FIFO insertion order).
- Both hot paths pre-compute the deadline outside any inner loop to
  avoid repeated time.Now() calls.

Tests added:
- TestMaxAgeExpireByCreation: connection with old created but fresh used
  must be rejected even though idle-timeout would pass.
- TestMaxAgeFIFORotation: three FIFO-rotated connections (old created,
  fresh used) must all be rejected, confirming that continuous rotation
  cannot prevent max-age expiry.

Signed-off-by: cangming <cangming@cangming.app>

* plugin/forward: add max_age option

Expose Transport.SetMaxAge through the forward plugin so operators can
set an absolute upper bound on connection lifetime via the Corefile.

Usage:
  forward . 1.2.3.4 {
      max_age 30s
  }

Default is 0 (unlimited), which preserves existing behaviour.
A positive value causes connections older than max_age to be closed and
re-dialled on the next request, ensuring CoreDNS reconnects to newly
scaled-out upstream pods even under sustained load where the idle
timeout (expire) would never fire.

If max_age is set, it must not be less than expire; the parser rejects
this combination at startup with a clear error message.

Signed-off-by: cangming <cangming@cangming.app>

---------

Signed-off-by: cangming <cangming@cangming.app>
2026-03-09 11:50:03 -07:00
Ville Vesilehto
f3983c1111 perf(proxy): use mutex-based connection pool (#7790)
* perf(proxy): use mutex-based connection pool

The proxy package (used for example by the forward plugin) utilized
an actor model where a single connManager goroutine managed
connection pooling via unbuffered channels (dial, yield, ret). This
design serialized all connection acquisition and release operations
through a single goroutine, creating a bottleneck under high
concurrency. This was observable as a performance degradation when
using a single upstream backend compared to multiple backends
(which sharded the bottleneck).

Changes:
- Removed dial, yield, and ret channels from the Transport struct.
- Removed the connManager goroutine's request processing loop.
- Implemented Dial() and Yield() using a sync.Mutex to protect the
  connection slice, allowing for fast concurrent access without
  context switching.
- Downgraded connManager to a simple background cleanup loop that
  only handles connection expiration on a ticker.
- Updated plugin/pkg/proxy/connect.go to use direct method calls
  instead of channel sends.
- Updated tests to reflect the removal of internal channels.

Benchmarks show that this change eliminates the single-backend
bottleneck. Now a single upstream backend performs on par with
multiple backends, and overall throughput is improved.

The implementation aligns with standard Go patterns for connection
pooling (e.g., net/http.Transport).

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

* fix: address PR review for persistent.go

- Named mutex field instead of embedding, to not expose
  Lock() and Unlock()
- Move stop check outside of lock in Yield()
- Close() without a separate goroutine
- Change stop channel to struct

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

* fix: address code review feedback for conn pool

- Switch from LIFO to FIFO connection selection for source port
  diversity, reducing DNS cache poisoning risk (RFC 5452).
- Remove "clear entire cache" optimization as it was LIFO-specific.
  FIFO naturally iterates and skips expired connections.
- Remove all goroutines for closing connections; collect connections
  while holding lock, close synchronously after releasing lock.

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

* fix: remove unused error consts

No longer utilised after refactoring the channel based approach.

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

* feat(forward): add max_idle_conns option

Add configurable connection pool limit for the forward plugin via
the max_idle_conns Corefile option.

Changes:
- Add SetMaxIdleConns to proxy
- Add maxIdleConns field to Forward struct
- Add max_idle_conns parsing in forward plugin setup
- Apply setting to each proxy during configuration
- Update forward plugin README with new option

By default the value is 0 (unbounded). When set, excess
connections returned to the pool are closed immediately
rather than cached.

Also add a yield related test.

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

* chore(proxy): simple Dial by closing conns inline

Remove toClose slice collection to reduce complexity. Instead close
expired connections directly while iterating. Reduces complexity with
negligible lock-time impact.

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

* chore: fewer explicit Unlock calls

Cleaner and less chance of forgetting to unlock on new possible
code paths.

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>

---------

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
2026-01-13 17:49:46 -08:00
Syed Azeez
7b38eb8625 plugin: fix gosec G115 integer overflow warnings (#7799)
Fix integer overflow conversion warnings (G115) by adding appropriate
suppressions where values are provably bounded.

Fixes: https://github.com/coredns/coredns/issues/7793

Changes:
- Updated 56 G115 annotations to use consistent // #nosec G115 format
- Added 2 //nolint:gosec suppressions for conditional expressions
- Removed G115 exclusion from golangci.yml (now explicitly handled per-line)

Suppressions justify why each conversion is safe (e.g., port numbers
are bounded 1-65535, DNS TTL limits, pool lengths, etc.)

Signed-off-by: Azeez Syed <syedazeez337@gmail.com>
2026-01-01 10:20:29 +02:00
Ville Vesilehto
39abf5aeba chore(lint): modernize Go (#7536)
Use modern Go constructs through the modernize analyzer from the
golang.org/x/tools package.

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
2025-09-10 13:08:27 -07:00
Ville Vesilehto
8cac83dfb5 lint: enable wastedassign linter (#7340) 2025-06-01 16:30:41 -07:00
Ville Vesilehto
0a48523083 fix(proxy): avoid Dial hang after Transport stopped (#7321)
Ensure Dial exits early or returns error when Transport has been
stopped, instead of blocking on the dial or ret channels. This removes
a potential goroutine leak where callers could pile up waiting
forever under heavy load.

Add select guards before send and receive, and propagate clear error
values so callers can handle shutdown gracefully.

Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
2025-05-28 06:58:48 -07:00
Sri Harsha
4c69549832 Handle UDP responses that overflow with TC bit with test case (#6277)
Signed-off-by: SriHarshaBS001 <SriHarshaBS009@gmail.com>
2023-09-07 15:01:45 -04:00
Chris O'Haver
678d0333af Revert "plugin/forward: Continue waiting after receiving malformed responses (#6014)" (#6270)
This reverts commit 604a902e2c.
2023-08-14 20:33:37 -04:00
Pat Downey
ea293da1d6 Fix forward metrics for backwards compatibility (#6178) 2023-07-04 16:35:55 +02:00
Chris O'Haver
604a902e2c plugin/forward: Continue waiting after receiving malformed responses (#6014)
* forward: continue waiting after malformed responses

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

* add test

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

* fix test

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

* clean up

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

* clean up

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

* move test to /test/. Add build tag.

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

* install libpcap-dev for e2e tests

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

* sudo the test

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

* remove stray err check

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

* disable the test

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

* use -exec flag to run test binary as root

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

* run new test by itself in a new workflow

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

* fix test name

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

* only for udp

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

* remove libpcap test workflow action

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

* remove test, since it cant run in ci

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

* and remove gopacket package

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>

---------

Signed-off-by: Chris O'Haver <cohaver@infoblox.com>
2023-04-29 11:52:00 +02:00
Pat Downey
f823825f8a plugin/forward: Allow Proxy to be used outside of forward plugin. (#5951)
* plugin/forward: Move Proxy into pkg/plugin/proxy, to allow forward.Proxy to be used outside of forward plugin.

Signed-off-by: Patrick Downey <patrick.downey@dioadconsulting.com>
2023-03-24 08:55:51 -04:00