* plugin/pkg/proxy: add max_age for per-connection lifetime cap
Introduce a max_age setting on Transport that closes connections based
on creation time, independent of idle-timeout (expire).
Background: PR #7790 changed the connection pool from LIFO to FIFO for
source-port diversity. Under FIFO, every connection is cycled through
the pool and its used timestamp is refreshed continuously. When request
rate is high enough that pool_size / request_rate < expire, no
connection ever becomes idle and expire never fires. This prevents
CoreDNS from opening new connections to upstreams that scale out (e.g.
new Kubernetes pods behind a ClusterIP service with conntrack pinning).
max_age addresses this by enforcing an absolute upper bound on
connection lifetime regardless of activity:
- persistConn gains a created field set at dial time.
- Transport gains maxAge (default 0 = unlimited, preserving existing
behaviour).
- Dial(): rejects cached connections whose creation age exceeds max_age.
- cleanup(): when maxAge > 0, uses a linear scan over both idle-timeout
and max-age predicates; when maxAge == 0, preserves the original
binary-search path on used time (sorted by FIFO insertion order).
- Both hot paths pre-compute the deadline outside any inner loop to
avoid repeated time.Now() calls.
Tests added:
- TestMaxAgeExpireByCreation: connection with old created but fresh used
must be rejected even though idle-timeout would pass.
- TestMaxAgeFIFORotation: three FIFO-rotated connections (old created,
fresh used) must all be rejected, confirming that continuous rotation
cannot prevent max-age expiry.
Signed-off-by: cangming <cangming@cangming.app>
* plugin/forward: add max_age option
Expose Transport.SetMaxAge through the forward plugin so operators can
set an absolute upper bound on connection lifetime via the Corefile.
Usage:
forward . 1.2.3.4 {
max_age 30s
}
Default is 0 (unlimited), which preserves existing behaviour.
A positive value causes connections older than max_age to be closed and
re-dialled on the next request, ensuring CoreDNS reconnects to newly
scaled-out upstream pods even under sustained load where the idle
timeout (expire) would never fire.
If max_age is set, it must not be less than expire; the parser rejects
this combination at startup with a clear error message.
Signed-off-by: cangming <cangming@cangming.app>
---------
Signed-off-by: cangming <cangming@cangming.app>
* feat(secondary): Send NOTIFY messages after zone transfer
- Modified TransferIn() method to accept a transfer.Transfer parameter
- Added NOTIFY message sending after successful zone transfer in secondary plugin
- Updated Update() method to pass the transfer handler through the zone update cycle
- Added comprehensive tests for the secondary notify functionality
Closes #5669
Signed-off-by: liucongran <liucongran327@gmail.com>
* fix(secondary): Fix TransferIn method call in test
Update test to pass nil parameter to TransferIn method after signature change
Signed-off-by: liucongran <liucongran327@gmail.com>
* refactor(secondary): Clean up imports and add helper methods
- Reorder imports for consistency
- Add hasSOA() and getSOA() helper methods to Zone
- Remove unnecessary blank lines in tests
Signed-off-by: liucongran <liucongran327@gmail.com>
* fix(test): Fix variable declaration in secondary test
Change corefile variable assignment to use short declaration syntax (:=)
to fix compilation error.
Signed-off-by: liucongran <liucongran327@gmail.com>
* refactor(secondary): Use getSOA helper method in shouldTransfer
Replace direct SOA access with getSOA() helper method for consistency.
Signed-off-by: liucongran <liucongran327@gmail.com>
---------
Signed-off-by: liucongran <liucongran327@gmail.com>
Co-authored-by: liucongran <liucongran@cestc.cn>
- Added nolint to plugin/auto/walk.go to suppress a symlink/TOCTOU
warning, as the code needs to follow symlinks.
- Replaced a few flagged integer conversions with safe equivalents in
cache hashing, reuseport socket setup, and TLS arg handling
- Preallocated response rule slices in plugin/rewrite/name.go
- Replaced WriteString(fmt.Sprintf/Sprintln(...)) with direct
fmt.Fprint* calls
- Removed stale nolint directives from code and tests that are no
longer needed
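As an illustration of the WriteString change, the two forms below produce the same output; the second avoids building an intermediate string. The function names are made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// before formats into a temporary string and then copies it into the builder.
func before(name string, n int) string {
	var b strings.Builder
	b.WriteString(fmt.Sprintf("%s: %d\n", name, n))
	return b.String()
}

// after writes the formatted output directly into the builder.
func after(name string, n int) string {
	var b strings.Builder
	fmt.Fprintf(&b, "%s: %d\n", name, n)
	return b.String()
}

func main() {
	fmt.Print(before("hits", 3))
	fmt.Print(after("hits", 3))
}
```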
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* fix(rewrite): fix cname target rewrite for CNAME chains
This fix corrects the cname target rewrite to handle CNAME chains:
- Preserves only the CNAME records before matching the rule
- Rewrites only the CNAME target that matches the rule
- Includes all records from the re-resolved upstream response
Signed-off-by: hide <hide@hide.net.eu.org>
* docs(rewrite): document how answer records are handled in CNAME target rewrite
Signed-off-by: hide <hide@hide.net.eu.org>
* fix(rewrite): simplify slice append per staticcheck S1011
Signed-off-by: hide <hide@hide.net.eu.org>
* docs(rewrite): add extra line between code and paragraph
Signed-off-by: hide <hide@hide.net.eu.org>
---------
Signed-off-by: hide <hide@hide.net.eu.org>
Co-authored-by: hide <hide@hide.net.eu.org>
* fix: return SOA and NS records when queried for a record CNAMEd to origin
Signed-off-by: Shiv Tyagi <shivtyagi3015@gmail.com>
* chore(test): add test for covering cname to origin scenario in file plugin
Signed-off-by: Shiv Tyagi <shivtyagi3015@gmail.com>
---------
Signed-off-by: Shiv Tyagi <shivtyagi3015@gmail.com>
* perf(proxy): use mutex-based connection pool
The proxy package (used for example by the forward plugin) utilized
an actor model where a single connManager goroutine managed
connection pooling via unbuffered channels (dial, yield, ret). This
design serialized all connection acquisition and release operations
through a single goroutine, creating a bottleneck under high
concurrency. This was observable as a performance degradation when
using a single upstream backend compared to multiple backends
(which sharded the bottleneck).
Changes:
- Removed dial, yield, and ret channels from the Transport struct.
- Removed the connManager goroutine's request processing loop.
- Implemented Dial() and Yield() using a sync.Mutex to protect the
connection slice, allowing for fast concurrent access without
context switching.
- Downgraded connManager to a simple background cleanup loop that
only handles connection expiration on a ticker.
- Updated plugin/pkg/proxy/connect.go to use direct method calls
instead of channel sends.
- Updated tests to reflect the removal of internal channels.
Benchmarks show that this change eliminates the single-backend
bottleneck. Now a single upstream backend performs on par with
multiple backends, and overall throughput is improved.
The implementation aligns with standard Go patterns for connection
pooling (e.g., net/http.Transport).
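A minimal sketch of the mutex-based approach is shown below. The `pool` type and its string "connections" are hypothetical stand-ins, not the Transport implementation; the point is only that Dial and Yield take a short lock on a slice instead of sending on channels to a manager goroutine.

```go
package main

import (
	"fmt"
	"sync"
)

// pool is a hypothetical, simplified connection cache guarded by a mutex.
type pool struct {
	mu    sync.Mutex
	conns []string // stand-in for real connections
}

// Dial returns a cached connection if one is available.
func (p *pool) Dial() (string, bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.conns) == 0 {
		return "", false
	}
	c := p.conns[0]
	p.conns = p.conns[1:]
	return c, true
}

// Yield returns a connection to the cache.
func (p *pool) Yield(c string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.conns = append(p.conns, c)
}

func main() {
	p := &pool{}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			c, ok := p.Dial()
			if !ok {
				c = fmt.Sprintf("conn-%d", i) // pretend to dial a new connection
			}
			p.Yield(c)
		}(i)
	}
	wg.Wait()
	fmt.Println("cached connections:", len(p.conns))
}
```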
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* fix: address PR review for persistent.go
- Named mutex field instead of embedding, to not expose
Lock() and Unlock()
- Move stop check outside of lock in Yield()
- Close() without a separate goroutine
- Change stop channel to struct
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* fix: address code review feedback for conn pool
- Switch from LIFO to FIFO connection selection for source port
diversity, reducing DNS cache poisoning risk (RFC 5452).
- Remove "clear entire cache" optimization as it was LIFO-specific.
FIFO naturally iterates and skips expired connections.
- Remove all goroutines for closing connections; collect connections
while holding lock, close synchronously after releasing lock.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* fix: remove unused error consts
No longer utilised after refactoring the channel based approach.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* feat(forward): add max_idle_conns option
Add configurable connection pool limit for the forward plugin via
the max_idle_conns Corefile option.
Changes:
- Add SetMaxIdleConns to proxy
- Add maxIdleConns field to Forward struct
- Add max_idle_conns parsing in forward plugin setup
- Apply setting to each proxy during configuration
- Update forward plugin README with new option
By default the value is 0 (unbounded). When set, excess
connections returned to the pool are closed immediately
rather than cached.
Also add a yield related test.
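The max_idle_conns behaviour described above might look roughly like this. The `fakeConn` and `pool` types are assumptions for the sketch and do not mirror the proxy package's real types.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeConn is a hypothetical connection that records whether it was closed.
type fakeConn struct{ closed bool }

func (c *fakeConn) Close() { c.closed = true }

// pool caches idle connections up to maxIdleConns (0 means unbounded).
type pool struct {
	mu           sync.Mutex
	conns        []*fakeConn
	maxIdleConns int
}

// Yield caches c, or closes it immediately if the pool is already full.
func (p *pool) Yield(c *fakeConn) {
	p.mu.Lock()
	if p.maxIdleConns > 0 && len(p.conns) >= p.maxIdleConns {
		p.mu.Unlock()
		c.Close() // close outside the lock
		return
	}
	p.conns = append(p.conns, c)
	p.mu.Unlock()
}

func main() {
	p := &pool{maxIdleConns: 2}
	a, b, c := &fakeConn{}, &fakeConn{}, &fakeConn{}
	p.Yield(a)
	p.Yield(b)
	p.Yield(c) // exceeds the limit and is closed instead of cached
	fmt.Println(len(p.conns), a.closed, b.closed, c.closed)
}
```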
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* chore(proxy): simplify Dial by closing conns inline
Remove the toClose slice collection and close expired connections
directly while iterating. This reduces complexity with negligible
lock-time impact.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* chore: fewer explicit Unlock calls
Cleaner and less chance of forgetting to unlock on new possible
code paths.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
---------
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Replace http.Serve() with http.Server{} configured with timeouts to
address G114 gosec findings (HTTP server without timeouts). This
prevents potential slowloris attacks and resource exhaustion.
Changes:
- Add ReadTimeout, WriteTimeout, IdleTimeout (5s each) to HTTP servers
- Use srv.Shutdown(ctx) for graceful shutdown instead of ln.Close()
- Follow existing pattern from plugin/metrics
Fixes part of #7793
Signed-off-by: Azeez Syed <syedazeez337@gmail.com>
Remove expensive runtime.Caller calls from metrics Recorder.WriteMsg
by tracking the responding plugin through the plugin chain instead.
- Add PluginTracker interface and pluginWriter wrapper in plugin.go
- Modify NextOrFailure to wrap ResponseWriter with plugin name
- Update metrics Recorder to implement PluginTracker
- Remove authoritativePlugin method using filepath inspection
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Enable "gosec" linter.
Exclude:
- All G115 (integer overflow) findings, to be fixed separately.
Add targeted gosec annotations for:
- non-crypto math/rand usage
- md5 used only for file change detection
- G114 ("net/http serve with no timeout settings"), to be fixed
separately.
Other findings fixed.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
* Improve SOA error handling/reporting.
Signed-off-by: Ross Golder <ross@golder.org>
* Add tests for malformed SOA records
Signed-off-by: Ross Golder <ross@golder.org>
* Address review comments: assert exact parse errors in SOA tests and fix gofmt
Signed-off-by: Ross Golder <ross@golder.org>
---------
Signed-off-by: Ross Golder <ross@golder.org>
Add configurable resource limits to prevent potential DoS vectors
via connection/stream exhaustion on gRPC, HTTPS, and HTTPS/3 servers.
New configuration plugins:
- grpc_server: configure max_streams, max_connections
- https: configure max_connections
- https3: configure max_streams
Changes:
- Use netutil.LimitListener for connection limiting
- Use gRPC MaxConcurrentStreams and message size limits
- Add QUIC MaxIncomingStreams for HTTPS/3 stream limiting
- Set secure defaults: 256 max streams, 200 max connections
- Setting any limit to 0 means unbounded/fallback to previous impl
Defaults are applied automatically when plugins are omitted from
config.
Includes tests and integration tests.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
The test "TestDial_TransportStoppedDuringRetWait" replaced
tr.dial and tr.ret with test-controlled channels, then called
tr.Start(). Since connManager reads from t.dial, both the test
and connManager were racing to read from the same channel.
Remove tr.Start() since the test manually simulates connManager
behavior.
Also changed some test log formatting to align with other tests.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
Add an optional show_first flag to the consolidate directive that logs
the first error immediately and then consolidates subsequent errors.
When show_first is enabled:
- The first matching error is logged immediately with full details
(rcode, domain, type, error message) using the configured log level
- Subsequent matching errors are consolidated during the period
- At period end:
- If only one error occurred, no summary is printed (already logged)
- If multiple errors occurred, summary shows the total count
Syntax:
consolidate DURATION REGEXP [LEVEL] [show_first]
Example with 3 errors:
[WARNING] 2 example.org. A: read udp 10.0.0.1:53->8.8.8.8:53: i/o timeout
[WARNING] 3 errors like '^read udp .* i/o timeout$' occurred in last 30s
Example with 1 error:
[WARNING] 2 example.org. A: read udp 10.0.0.1:53->8.8.8.8:53: i/o timeout
Implementation details:
- Add showFirst bool to pattern struct
- Rename inc() to consolidateError(), return false for showFirst case
- Use function pointer in ServeDNS to unify log calls with proper level
- Simplify logPattern() with single condition (cnt > 1 || !showFirst)
- Refactor parseLogLevel() to parseOptionalParams() with map-based dispatch
- Validate parameter order: log level must come before show_first
- Update README.md with show_first documentation and examples
- Add comprehensive test cases for show_first functionality
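The counting logic can be sketched as follows. This is a single-threaded simplification with assumed field names and an illustrative summary string; the plugin's actual implementation, locking, and log format are not reproduced here.

```go
package main

import "fmt"

// pattern is a hypothetical, simplified consolidation rule.
type pattern struct {
	count     uint32
	showFirst bool
}

// consolidateError records one matching error. It returns false when the
// caller should log the error immediately (the first error with showFirst
// enabled) and true when the error has been consolidated.
func (p *pattern) consolidateError() bool {
	p.count++
	return !(p.showFirst && p.count == 1)
}

// summary resets the counter and returns the line to print at period end,
// or "" when nothing should be printed.
func (p *pattern) summary() string {
	cnt := p.count
	p.count = 0
	if cnt == 0 {
		return ""
	}
	if cnt > 1 || !p.showFirst {
		return fmt.Sprintf("%d errors occurred in the last period", cnt)
	}
	return "" // single error was already logged in full
}

func main() {
	p := &pattern{showFirst: true}
	for i := 0; i < 3; i++ {
		if !p.consolidateError() {
			fmt.Println("logged immediately")
		}
	}
	fmt.Println(p.summary())
}
```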
Signed-off-by: cangming <cangming@cangming.app>
A very large regex for the auto plugin in the Corefile could cause
CoreDNS to OOM. This change adds an artificial limit of 10k characters
for the regex pattern. Fixes OSS-Fuzz finding #466745384.
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
The plugin dropped the actual error message from the log, which made the
log entry useless for diagnosis.
Before:
```
[ERROR] plugin/kubernetes: error Failed to watch
```
After:
```
[ERROR] plugin/kubernetes: Failed to watch: failed to list *v1.Namespace: Get "https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0": tls: failed to parse certificate from server: x509: SAN dNSName is malformed
```
Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
Add RWMutex to protect concurrent map access in Set, Unset, and ForEach methods.
Change New() to return *U pointer type for proper synchronization.
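A simplified sketch of the locking scheme follows. The type is reduced to a map of functions and is not the package's real definition; in this sketch ForEach snapshots the entries under the read lock and calls them after releasing it, which is a design choice of the example. Returning a pointer from New matters because a struct holding a mutex must not be copied.

```go
package main

import (
	"fmt"
	"sync"
)

// U is a hypothetical, simplified set of named functions.
type U struct {
	mu sync.RWMutex
	m  map[string]func() error
}

// New returns a pointer so the embedded lock is never copied.
func New() *U { return &U{m: make(map[string]func() error)} }

// Set stores f under key unless the key is already present.
func (u *U) Set(key string, f func() error) {
	u.mu.Lock()
	defer u.mu.Unlock()
	if _, ok := u.m[key]; !ok {
		u.m[key] = f
	}
}

// Unset removes key.
func (u *U) Unset(key string) {
	u.mu.Lock()
	defer u.mu.Unlock()
	delete(u.m, key)
}

// ForEach calls every stored function and returns the first error.
func (u *U) ForEach() error {
	u.mu.RLock()
	fs := make([]func() error, 0, len(u.m))
	for _, f := range u.m {
		fs = append(fs, f)
	}
	u.mu.RUnlock()
	for _, f := range fs {
		if err := f(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	u := New()
	var wg sync.WaitGroup
	for i := 0; i < 20; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			u.Set(fmt.Sprintf("key-%d", i%4), func() error { return nil })
		}(i)
	}
	wg.Wait()
	fmt.Println(u.ForEach())
}
```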
Signed-off-by: Cangming H <cangmingh@gmail.com>
Fixes a bug in the forward plugin where an immediate connection
failure (e.g., TCP RST) could trigger an infinite busy loop. The
retry logic failed to increment the "fails" counter when a
connection error occurred, causing the loop condition to
remain permanently true. This patch fixes it and adds a
regression test.
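The failure mode can be reduced to a small sketch. The `tryAll` function is a hypothetical model of a retry loop bounded by a fails counter, not the plugin's ServeDNS code; without the increment, an upstream that fails instantly would keep the loop spinning.

```go
package main

import (
	"errors"
	"fmt"
)

var errRefused = errors.New("connection refused")

// tryAll calls dial for each of n upstreams in turn until one succeeds or
// every attempt has failed n times in total.
func tryAll(n int, dial func(i int) error) (attempts int, err error) {
	fails := 0
	for i := 0; fails < n; i = (i + 1) % n {
		attempts++
		if err = dial(i); err != nil {
			fails++ // omitting this increment keeps the loop condition true forever
			continue
		}
		return attempts, nil
	}
	return attempts, err
}

func main() {
	attempts, err := tryAll(3, func(int) error { return errRefused })
	fmt.Println(attempts, err)
}
```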
Signed-off-by: Ville Vesilehto <ville@vesilehto.fi>
This commit removes superfluous allocations of the Answer, Ns, and Extra
slices when copying a cached dns.Msg. The allocations are superfluous
because the newly copied slices are immediately overwritten by
filterRRSlice. It also updates filterRRSlice to pre-calculate the size
of the slice being copied into.
Benchmark results:
goos: darwin
goarch: arm64
pkg: github.com/coredns/coredns/plugin/cache
cpu: Apple M4 Pro
                 │ base.10.txt │          new.10.txt          │
                 │   sec/op    │   sec/op     vs base         │
CacheResponse-14   471.1n ± 0%   462.9n ± 2%  -1.74% (p=0.009 n=10)
                 │ base.10.txt │          new.10.txt          │
                 │    B/op     │    B/op      vs base         │
CacheResponse-14    672.0 ± 0%    656.0 ± 0%  -2.38% (p=0.000 n=10)
                 │ base.10.txt │          new.10.txt          │
                 │  allocs/op  │  allocs/op   vs base         │
CacheResponse-14    13.00 ± 0%    12.00 ± 0%  -7.69% (p=0.000 n=10)
Signed-off-by: Charlie Vieth <charlie.vieth@gmail.com>
This commit changes the CNAME rewrite rule to use a pre-compiled regexp
when the match type is RegexMatch instead of compiling it on-the-fly for
each request. This will also allow for invalid regexp patterns to be
identified during setup instead of causing a panic when the rule is
first invoked.
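The idea can be sketched as follows. The `rule` type and `newRule` constructor are hypothetical names for this example, not the rewrite plugin's actual types; the point is that compilation happens once at setup and reports an error instead of panicking later.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a hypothetical rewrite rule holding a pre-compiled pattern.
type rule struct {
	pattern *regexp.Regexp
}

// newRule compiles expr once at setup time and reports invalid patterns
// as an error.
func newRule(expr string) (*rule, error) {
	re, err := regexp.Compile(expr)
	if err != nil {
		return nil, fmt.Errorf("invalid regexp %q: %w", expr, err)
	}
	return &rule{pattern: re}, nil
}

// matches reuses the compiled pattern for every request.
func (r *rule) matches(target string) bool {
	return r.pattern.MatchString(target)
}

func main() {
	if _, err := newRule("("); err != nil {
		fmt.Println("setup error:", err)
	}
	r, err := newRule(`^.*\.example\.org\.$`)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.matches("www.example.org."))
}
```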
Signed-off-by: Charlie Vieth <charlie.vieth@gmail.com>
* plugin/file: improve performance of function tree.less(..)
PrevLabel always begins its iteration from the tail of the domain name.
The loop in less(..) can be made faster by calling PrevLabel starting
from the last processed label instead.
The benchmark results show a performance improvement of about 15%.
$ go test -bench=Less -run=^$
goos: linux
goarch: amd64
pkg: github.com/coredns/coredns/plugin/file/tree
cpu: Intel(R) Xeon(R) Platinum 8336C CPU @ 2.30GHz
BenchmarkLess/base-16 99003 12105 ns/op
BenchmarkLess/optimized-16 114522 10590 ns/op
PASS
ok github.com/coredns/coredns/plugin/file/tree 2.416s
Signed-off-by: yuwenchao <ywc689@163.com>
* plugin/file: performance enhancement for nameFromRight(..)
Similar to tree.less(..), the performance of nameFromRight can be
improved by using dns.PrevLabel more efficiently.
The benchmarks show that the optimized function is two to three times
faster in most cases.
* Benchmark test result for the original implementation:
BenchmarkNameFromRight/i0_origin-16 430719652 2.794 ns/op
BenchmarkNameFromRight/eq_origin_i1_shot-16 30933135 37.52 ns/op
BenchmarkNameFromRight/two_labels_i1-16 29375857 40.71 ns/op
BenchmarkNameFromRight/two_labels_i2-16 18556830 63.97 ns/op
BenchmarkNameFromRight/two_labels_i3_shot-16 14678812 84.73 ns/op
BenchmarkNameFromRight/ten_labels_i5-16 8522132 133.0 ns/op
BenchmarkNameFromRight/ten_labels_i11_shot-16 3154410 378.2 ns/op
BenchmarkNameFromRight/not_subdomain_shot-16 35297224 33.59 ns/op
BenchmarkNameFromRightRandomized-16 10638702 113.4 ns/op 0 B/op 0 allocs/op
* Benchmark test result with this optimization:
BenchmarkNameFromRight/i0_origin-16 425864671 2.808 ns/op
BenchmarkNameFromRight/eq_origin_i1_shot-16 60903428 19.53 ns/op
BenchmarkNameFromRight/two_labels_i1-16 50209297 24.21 ns/op
BenchmarkNameFromRight/two_labels_i2-16 42483711 27.88 ns/op
BenchmarkNameFromRight/two_labels_i3_shot-16 40898925 29.24 ns/op
BenchmarkNameFromRight/ten_labels_i5-16 27916532 44.54 ns/op
BenchmarkNameFromRight/ten_labels_i11_shot-16 17540040 67.59 ns/op
BenchmarkNameFromRight/not_subdomain_shot-16 67180514 17.46 ns/op
BenchmarkNameFromRightRandomized-16 32692081 38.21 ns/op 0 B/op 0 allocs/op
Signed-off-by: yuwenchao <yuwenchao@bytedance.com>
---------
Signed-off-by: yuwenchao <ywc689@163.com>
Signed-off-by: yuwenchao <yuwenchao@bytedance.com>
Co-authored-by: yuwenchao <yuwenchao@bytedance.com>