* plugin/pkg/proxy: add max_age for per-connection lifetime cap
Introduce a max_age setting on Transport that closes connections based
on creation time, independent of idle-timeout (expire).
Background: PR #7790 changed the connection pool from LIFO to FIFO for
source-port diversity. Under FIFO, every connection is cycled through
the pool and its used timestamp is refreshed continuously. When request
rate is high enough that pool_size / request_rate < expire, no
connection ever becomes idle and expire never fires. This prevents
CoreDNS from opening new connections to upstreams that scale out (e.g.
new Kubernetes pods behind a ClusterIP service with conntrack pinning).
max_age addresses this by enforcing an absolute upper bound on
connection lifetime regardless of activity:
- persistConn gains a created field set at dial time.
- Transport gains maxAge (default 0 = unlimited, preserving existing
behaviour).
- Dial(): rejects cached connections whose creation age exceeds max_age.
- cleanup(): when maxAge > 0, uses a linear scan over both idle-timeout
and max-age predicates; when maxAge == 0, preserves the original
binary-search path on used time (sorted by FIFO insertion order).
- Both hot paths pre-compute the deadline outside any inner loop to
avoid repeated time.Now() calls.
Tests added:
- TestMaxAgeExpireByCreation: connection with old created but fresh used
must be rejected even though idle-timeout would pass.
- TestMaxAgeFIFORotation: three FIFO-rotated connections (old created,
fresh used) must all be rejected, confirming that continuous rotation
cannot prevent max-age expiry.
Signed-off-by: cangming <cangming@cangming.app>
* plugin/forward: add max_age option
Expose Transport.SetMaxAge through the forward plugin so operators can
set an absolute upper bound on connection lifetime via the Corefile.
Usage:
forward . 1.2.3.4 {
max_age 30s
}
Default is 0 (unlimited), which preserves existing behaviour.
A positive value causes connections older than max_age to be closed and
re-dialled on the next request, ensuring CoreDNS reconnects to newly
scaled-out upstream pods even under sustained load where the idle
timeout (expire) would never fire.
If max_age is set, it must not be less than expire; the parser rejects
this combination at startup with a clear error message.
Signed-off-by: cangming <cangming@cangming.app>
---------
Signed-off-by: cangming <cangming@cangming.app>
* perf(proxy): use mutex-based connection pool
The proxy package (used for example by the forward plugin) utilized
an actor model where a single connManager goroutine managed
connection pooling via unbuffered channels (dial, yield, ret). This
design serialized all connection acquisition and release operations
through a single goroutine, creating a bottleneck under high
concurrency. This was observable as a performance degradation when
using a single upstream backend compared to multiple backends
(which sharded the bottleneck).
Changes:
- Removed dial, yield, and ret channels from the Transport struct.
- Removed the connManager goroutine's request processing loop.
- Implemented Dial() and Yield() using a sync.Mutex to protect the
connection slice, allowing for fast concurrent access without
context switching.
- Downgraded connManager to a simple background cleanup loop that
only handles connection expiration on a ticker.
- Updated plugin/pkg/proxy/connect.go to use direct method calls
instead of channel sends.
- Updated tests to reflect the removal of internal channels.
Benchmarks show that this change eliminates the single-backend
bottleneck. Now a single upstream backend performs on par with
multiple backends, and overall throughput is improved.
The implementation aligns with standard Go patterns for connection
pooling (e.g., net/http.Transport).
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* fix: address PR review for persistent.go
- Named mutex field instead of embedding, to not expose
Lock() and Unlock()
- Move stop check outside of lock in Yield()
- Close() without a separate goroutine
- Change stop channel to struct
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* fix: address code review feedback for conn pool
- Switch from LIFO to FIFO connection selection for source port
diversity, reducing DNS cache poisoning risk (RFC 5452).
- Remove "clear entire cache" optimization as it was LIFO-specific.
FIFO naturally iterates and skips expired connections.
- Remove all goroutines for closing connections; collect connections
while holding lock, close synchronously after releasing lock.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* fix: remove unused error consts
No longer utilised after refactoring the channel based approach.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* feat(forward): add max_idle_conns option
Add configurable connection pool limit for the forward plugin via
the max_idle_conns Corefile option.
Changes:
- Add SetMaxIdleConns to proxy
- Add maxIdleConns field to Forward struct
- Add max_idle_conns parsing in forward plugin setup
- Apply setting to each proxy during configuration
- Update forward plugin README with new option
By default the value is 0 (unbounded). When set, excess
connections returned to the pool are closed immediately
rather than cached.
Also add a yield related test.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* chore(proxy): simple Dial by closing conns inline
Remove toClose slice collection to reduce complexity. Instead close
expired connections directly while iterating. Reduces complexity with
negligible lock-time impact.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* chore: fewer explicit Unlock calls
Cleaner and less chance of forgetting to unlock on new possible
code paths.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
---------
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Ensure Dial exits early or returns error when Transport has been
stopped, instead of blocking on the dial or ret channels. This removes
a potential goroutine leak where callers could pile up waiting
forever under heavy load.
Add select guards before send and receive, and propagate clear error
values so callers can handle shutdown gracefully.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* plugin/forward: Move Proxy into pkg/plugin/proxy, to allow forward.Proxy to be used outside of forward plugin.
Signed-off-by: Patrick Downey <patrick.downey@dioadconsulting.com>