plugin/traffic/README.md

# traffic

## Name

*traffic* - hand out addresses according to assignments from Envoy's xDS.

## Description

The *traffic* plugin is a balancer that allows traffic steering, weighted responses and draining of
clusters.

The cluster information is retrieved from a service discovery manager that implements the service
discovery [protocols from Envoy](https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol).
It connects to the manager using the Aggregated Discovery Service (ADS) protocol. Endpoints and
clusters are discovered every 10 seconds. The plugin hands out responses that adhere to these
assignments. Only endpoints that are *healthy* are handed out.

If *traffic*'s `locality` has been set, the answers can be localized.

A cluster in Envoy is defined as: "A group of logically similar endpoints that Envoy connects to."
Each cluster has a name, which *traffic* extends to be a domain name. See "Naming Clusters" below.

The use case for this plugin is a cluster whose endpoints run in multiple (e.g. Kubernetes)
clusters, where you need to steer traffic to (or away from) these endpoints. For example, endpoint A
needs to be upgraded, so all traffic to it is drained. Or an entire Kubernetes cluster needs to be
upgraded, and *all* endpoints need to be drained from it.

For A and AAAA queries each DNS response contains a single IP address that's considered the best
one. The TTL on these answers is set to 5s. For an existing cluster the response is always
successful: it either carries an answer or is a NODATA response. Queries for non-existent clusters
get an NXDOMAIN response, where the minimum TTL is also set to 5s.

For SRV queries all healthy backends are returned, on the assumption that the client doing the query
is smart enough to select the best one. When SRV records are returned, the endpoint DNS names are
synthesized as `endpoint-<N>.<cluster>.<zone>`, each carrying the endpoint's IP address. Querying
for these synthesized names works as well.
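The SRV naming scheme can be illustrated with a small Go sketch. It is not the plugin's code: `endpointName` and `parseEndpointName` are hypothetical helpers, and treating `<N>` as a non-negative decimal index is an assumption, since this README does not say how `<N>` is assigned.

~~~ go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// endpointName builds the synthesized owner name for endpoint n of a
// cluster, following the endpoint-<N>.<cluster>.<zone> pattern. zone must be
// fully qualified (end in a dot). Hypothetical helper, not the plugin's API.
func endpointName(n int, cluster, zone string) string {
	return fmt.Sprintf("endpoint-%d.%s.%s", n, cluster, zone)
}

// parseEndpointName is the reverse of endpointName: it extracts the index
// and cluster from a query name below zone, matching case-insensitively.
// ok is false when qname does not follow the pattern.
func parseEndpointName(qname, zone string) (n int, cluster string, ok bool) {
	qname, zone = strings.ToLower(qname), strings.ToLower(zone)
	rest := strings.TrimSuffix(qname, "."+zone)
	if rest == qname {
		return 0, "", false // qname is not below zone
	}
	labels := strings.Split(rest, ".")
	if len(labels) != 2 || labels[1] == "" {
		return 0, "", false // expect exactly endpoint-<N>.<cluster>
	}
	digits := strings.TrimPrefix(labels[0], "endpoint-")
	if digits == labels[0] {
		return 0, "", false // missing "endpoint-" prefix
	}
	idx, err := strconv.Atoi(digits)
	if err != nil || idx < 0 || strconv.Itoa(idx) != digits {
		return 0, "", false // reject non-canonical forms such as "+1" or "01"
	}
	return idx, labels[1], true
}

func main() {
	name := endpointName(2, "web", "lb.example.org.")
	fmt.Println(name)
	fmt.Println(parseEndpointName(name, "lb.example.org."))
}
~~~

Plain cluster names such as `web.lb.example.org.` are rejected by `parseEndpointName`, so the two kinds of names can be told apart.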

Load reporting is not supported for the following reason: a DNS query is done by a resolver.
Behind this resolver (which can also cache) there may be many clients that will use this reply. The
responding server (CoreDNS) has no idea how many clients use this resolver. So reporting a load of
+1 on the CoreDNS side can result in anything from 1 to 1000+ queries on the endpoint, making
the load reporting from *traffic* highly inaccurate.

*Traffic* implements version 3 of the xDS API.

## Syntax

~~~
traffic TO...
~~~

This enables the *traffic* plugin, with a default node ID of `coredns` and no TLS.

 *  **TO...** are the control plane endpoints to connect to. These must start with `grpc://`. The
    port number defaults to 443, if not specified.
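A minimal Go sketch of the two rules for **TO** (the `grpc://` prefix and the default port 443). The function `parseTo` is hypothetical and only mirrors the documented behavior; it is not how *traffic* parses its arguments.

~~~ go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseTo checks a single TO argument and returns a host:port dial target.
// It enforces the grpc:// prefix and appends the default port 443 when no
// port is given. A sketch of the documented rules, not the plugin's parser.
func parseTo(to string) (string, error) {
	const scheme = "grpc://"
	if !strings.HasPrefix(to, scheme) {
		return "", fmt.Errorf("%q: must start with %s", to, scheme)
	}
	hostport := strings.TrimPrefix(to, scheme)
	if host, port, err := net.SplitHostPort(hostport); err == nil {
		if host == "" || port == "" {
			return "", fmt.Errorf("%q: empty host or port", to)
		}
		return hostport, nil // explicit port, use as-is
	}
	host := strings.Trim(hostport, "[]") // accept a bracketed IPv6 literal
	if host == "" {
		return "", fmt.Errorf("%q: empty host", to)
	}
	// JoinHostPort re-adds brackets around IPv6 literals.
	return net.JoinHostPort(host, "443"), nil
}

func main() {
	for _, to := range []string{"grpc://127.0.0.1:18000", "grpc://127.0.0.1", "grpc://[::1]", "127.0.0.1"} {
		addr, err := parseTo(to)
		fmt.Println(to, "=>", addr, err)
	}
}
~~~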

The extended syntax is available if you want more control.

~~~
traffic TO... {
    node ID
    locality REGION[,ZONE[,SUBZONE]] [REGION[,ZONE[,SUBZONE]]]...
    tls CERT KEY CA
    tls_servername NAME
    ignore_health
}
~~~

 *  `node` **ID** is how *traffic* identifies itself to the control plane. This defaults to
    `coredns`.

 *  `locality` has a list of **REGION,ZONE,SUBZONE** sets. These tell *traffic* where it's running
    and what should be considered local traffic. Each **REGION,ZONE,SUBZONE** set will be used
    to match clusters against while generating responses. The list should be ordered from nearest
    to farthest. **SUBZONE**, or **ZONE** *and* **SUBZONE**, may be omitted; this signifies a
    wildcard match. E.g. when there are 3 regions, US, EU, ASIA, and this CoreDNS is running in EU,
    you can use: `locality EU US ASIA`. Sets are separated by spaces; the elements within a set
    are separated by a comma only, without spaces.

 *  `tls` **CERT** **KEY** **CA** define the TLS properties for the gRPC connection. If this is
    omitted, an insecure connection is attempted. From 0 to 3 arguments can be provided, with the
    meanings described below:

     -  `tls` - no client authentication is used, and the system CAs are used to verify the server
        certificate

     -  `tls` **CA** - no client authentication is used, and the CA file is used to verify the
        server certificate

     -  `tls` **CERT** **KEY** - client authentication is used with the specified cert/key pair. The
        server certificate is verified with the system CAs.

     -  `tls` **CERT** **KEY** **CA** - client authentication is used with the specified cert/key
        pair. The server certificate is verified using the specified CA file.

 *  `tls_servername` **NAME** allows you to set a server name in the TLS configuration. This is
    needed because *traffic* connects to an IP address, so it can't infer the server name from it.

 *  `ignore_health` can be enabled to ignore endpoint health status; this can aid in debugging.
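The four `tls` cases, plus `tls_servername`, map naturally onto Go's standard `crypto/tls` and `crypto/x509` packages. The sketch below is an assumption about how such a mapping could look, not the plugin's implementation; `newTLSConfig` is a hypothetical name. Leaving `RootCAs` nil makes Go verify the server against the system CAs, and setting `ServerName` is what allows the certificate to be checked against a host name even though the connection goes to an IP address.

~~~ go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"os"
)

// newTLSConfig maps the 0 to 3 arguments of the tls directive onto a client
// tls.Config. serverName plays the role of tls_servername. Hypothetical sketch.
func newTLSConfig(serverName string, args ...string) (*tls.Config, error) {
	cfg := &tls.Config{ServerName: serverName}
	var certFile, keyFile, caFile string
	switch len(args) {
	case 0:
		// No client cert; nil RootCAs means the system CAs verify the server.
	case 1:
		caFile = args[0]
	case 2:
		certFile, keyFile = args[0], args[1]
	case 3:
		certFile, keyFile, caFile = args[0], args[1], args[2]
	default:
		return nil, fmt.Errorf("tls: expected 0 to 3 arguments, got %d", len(args))
	}
	if certFile != "" {
		// Client authentication with the given cert/key pair.
		cert, err := tls.LoadX509KeyPair(certFile, keyFile)
		if err != nil {
			return nil, err
		}
		cfg.Certificates = []tls.Certificate{cert}
	}
	if caFile != "" {
		// Verify the server with the given CA file instead of the system CAs.
		data, err := os.ReadFile(caFile)
		if err != nil {
			return nil, err
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(data) {
			return nil, fmt.Errorf("tls: no certificates found in %s", caFile)
		}
		cfg.RootCAs = pool
	}
	return cfg, nil
}

func main() {
	cfg, err := newTLSConfig("xds.example.org")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("system CAs:", cfg.RootCAs == nil, "client certs:", len(cfg.Certificates))
}
~~~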

## Naming Clusters

A cluster name usually consists of a single word, e.g. "cluster-v0" or "web".
The *traffic* plugin uses the name(s) specified in the Server Block to create fully qualified
domain names. For example, if the Server Block specifies `lb.example.org` as one of the names,
and "cluster-v0" is one of the load-balanced clusters, *traffic* will respond to queries asking for
`cluster-v0.lb.example.org.`; the same goes for `web` and `web.lb.example.org.`.
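A sketch of how a query name could be mapped back to a cluster name, given one of the Server Block zones. `clusterFromQName` is hypothetical; it assumes cluster names are a single DNS label, as in the examples above, so it deliberately rejects deeper names such as the synthesized `endpoint-<N>` names.

~~~ go
package main

import (
	"fmt"
	"strings"
)

// clusterFromQName returns the cluster a query refers to, given a zone from
// the Server Block. Both names must be fully qualified and are compared
// case-insensitively. ok is false unless qname is exactly one label below zone.
func clusterFromQName(qname, zone string) (cluster string, ok bool) {
	qname, zone = strings.ToLower(qname), strings.ToLower(zone)
	suffix := "." + zone
	if !strings.HasSuffix(qname, suffix) {
		return "", false // not below this zone (or the zone apex itself)
	}
	cluster = strings.TrimSuffix(qname, suffix)
	if cluster == "" || strings.Contains(cluster, ".") {
		return "", false // assumption: cluster names are a single label
	}
	return cluster, true
}

func main() {
	fmt.Println(clusterFromQName("cluster-v0.lb.example.org.", "lb.example.org."))
	fmt.Println(clusterFromQName("web.lb.example.org.", "lb.example.org."))
}
~~~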

## Localized Endpoints

Endpoints can be grouped by location; this location information is used if the `locality` property
is set in the configuration.

## Matching Algorithm

How are clients matched against the data received from the xDS endpoint? Ignoring `locality` for
now, *traffic* goes through the following steps:

1.  Does the cluster exist? If not, return NXDOMAIN; otherwise continue.

2.  Run through the endpoints and discard any that are not HEALTHY. If no endpoints are left,
    return a NODATA response; otherwise continue.

3.  If weights are assigned, use them to pick an endpoint; otherwise pick one at random. Return
    the picked endpoint in the response to the client.

If `locality` *has* been specified there is an extra step between 2 and 3.

2a. Match the endpoints against the `locality` list, from left to right, taking the most specific
match: if no endpoint matches **REGION,ZONE,SUBZONE**, try **REGION,ZONE** and then **REGION**. If
there is still no match, move on to the next set in the list. If no set matches, continue with
step 3 above, ignoring any locality.
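The locality step, together with parsing the `locality` arguments, could look like the Go sketch below. All names are hypothetical, and the input endpoints are assumed to be the healthy ones left after step 2. Note that a set such as `EU,west` falls back to any endpoint in `EU` before the next set in the list is tried, which is one reading of the step described above.

~~~ go
package main

import (
	"fmt"
	"strings"
)

// Locality is a REGION,ZONE,SUBZONE triple; trailing empty fields act as
// wildcards in a configured preference.
type Locality struct{ Region, Zone, Subzone string }

// Endpoint is a hypothetical, simplified endpoint with its locality.
type Endpoint struct {
	Addr string
	Loc  Locality
}

// parseLocalities parses the locality arguments, e.g. []string{"EU,west", "US"}.
func parseLocalities(args []string) ([]Locality, error) {
	var out []Locality
	for _, a := range args {
		parts := strings.Split(a, ",")
		if len(parts) > 3 {
			return nil, fmt.Errorf("locality %q: too many elements", a)
		}
		for _, p := range parts {
			if p == "" {
				return nil, fmt.Errorf("locality %q: empty element", a)
			}
		}
		l := Locality{Region: parts[0]}
		if len(parts) > 1 {
			l.Zone = parts[1]
		}
		if len(parts) > 2 {
			l.Subzone = parts[2]
		}
		out = append(out, l)
	}
	return out, nil
}

// depth is the number of fields set in a configured locality (1 to 3).
func (l Locality) depth() int {
	switch {
	case l.Subzone != "":
		return 3
	case l.Zone != "":
		return 2
	}
	return 1
}

// matches reports whether e agrees with l on the first n fields.
func (l Locality) matches(e Locality, n int) bool {
	return l.Region == e.Region &&
		(n < 2 || l.Zone == e.Zone) &&
		(n < 3 || l.Subzone == e.Subzone)
}

// selectByLocality walks prefs left to right. For each preference it tries
// the most specific match first, then drops SUBZONE, then ZONE. The first
// non-empty group wins; if nothing matches, all endpoints are returned.
func selectByLocality(prefs []Locality, eps []Endpoint) []Endpoint {
	for _, p := range prefs {
		for n := p.depth(); n >= 1; n-- {
			var group []Endpoint
			for _, e := range eps {
				if p.matches(e.Loc, n) {
					group = append(group, e)
				}
			}
			if len(group) > 0 {
				return group
			}
		}
	}
	return eps // no locality matched: ignore locality
}

func main() {
	prefs, err := parseLocalities(strings.Fields("EU US ASIA"))
	if err != nil {
		fmt.Println(err)
		return
	}
	eps := []Endpoint{
		{Addr: "10.0.0.1", Loc: Locality{Region: "US", Zone: "east"}},
		{Addr: "10.0.0.2", Loc: Locality{Region: "EU", Zone: "west"}},
	}
	fmt.Println(selectByLocality(prefs, eps))
}
~~~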

## Metrics

If monitoring is enabled (via the *prometheus* plugin) then the following metric is exported:

 *  `coredns_traffic_clusters_tracked{}` the number of tracked clusters.

## Ready

This plugin reports readiness to the *ready* plugin. This will happen after a gRPC stream has been
established to the control plane.

## Examples

~~~
lb.example.org {
    traffic grpc://127.0.0.1:18000 {
        node test-id
    }
    debug
    log
}
~~~

This will load balance any names under `lb.example.org` using the data from the manager running on
localhost on port 18000. The node ID will be `test-id` and no TLS will be used.

## Bugs

Priority and locality information from ClusterLoadAssignments is not used. Multiple **TO** addresses
are not implemented. Credentials are not implemented.