| 
									
										
										
										
											2019-10-05 11:45:45 +01:00
										 |  |  | # traffic
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | ## Name
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | *traffic* - handout addresses according to assignments from Envoy's xDS. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | ## Description
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-03-04 15:38:29 +01:00
										 |  |  | The *traffic* plugin is a balancer that allows traffic steering, weighted responses and draining | 
					
						
							| 
									
										
										
										
											2020-03-05 17:33:35 +01:00
										 |  |  | of clusters. A cluster is defined as: "A group of logically similar endpoints that Envoy | 
					
						
							| 
									
										
										
										
											2020-03-04 15:38:29 +01:00
										 |  |  | connects to." Each cluster has a name, which *traffic* extends to be a domain name. See "Naming | 
					
						
							|  |  |  | Clusters" below. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The use case for this plugin is when a cluster has endpoints running in multiple (Kubernetes?) | 
					
						
							|  |  |  | clusters and you need to steer traffic to (or away) from these endpoints, i.e. endpoint A needs to | 
					
						
							|  |  |  | be upgraded, so all traffic to it is drained. Or the entire Kubernetes needs to upgraded, and *all* | 
					
						
							|  |  |  | endpoints need to be drained from it. | 
					
						
							| 
									
										
										
										
											2019-10-05 11:45:45 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-01-25 08:52:10 +01:00
										 |  |  | The cluster information is retrieved from a service discovery manager that implements the service | 
					
						
							| 
									
										
										
										
											2020-03-05 17:33:35 +01:00
										 |  |  | discovery [protocols from Envoy](https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol). | 
					
						
							| 
									
										
										
										
											2020-01-25 08:52:10 +01:00
										 |  |  | It connects to the manager using the Aggregated Discovery Service (ADS) protocol. Endpoints and | 
					
						
							|  |  |  | clusters are discovered every 10 seconds. The plugin hands out responses that adhere to these | 
					
						
							|  |  |  | assignments. Only endpoints that are *healthy* are handed out. | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-03-04 15:38:29 +01:00
										 |  |  | Note that the manager *itself* is also a cluster that is managed *by the management server*. This is | 
					
						
							|  |  |  | the *management cluster* (see `cluster` below in "Syntax"). By default the name for cluster is `xds`. | 
					
						
							| 
									
										
										
										
											2020-03-05 17:33:35 +01:00
										 |  |  | When bootstrapping *traffic* tries to retrieve the cluster endpoints for the management cluster, | 
					
						
							|  |  |  | when the cluster is not found *traffic* will return a fatal error. | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-03-06 09:13:27 +01:00
										 |  |  | The *traffic* plugin handles A, AAAA, SRV and TXT queries. TXT queries are purely used for debugging | 
					
						
							|  |  |  | as health status of the endpoints is ignored in that case. | 
					
						
							|  |  |  | Queries for non-existent clusters get a NXDOMAIN, where the minimal TTL is also set to 5s. | 
					
						
							| 
									
										
										
										
											2019-10-05 11:45:45 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-01-19 08:30:13 +01:00
										 |  |  | For A and AAAA queries each DNS response contains a single IP address that's considered the best | 
					
						
							|  |  |  | one. The TTL on these answer is set to 5s. It will only return successful responses either with an | 
					
						
							| 
									
										
										
										
											2020-03-05 17:33:35 +01:00
										 |  |  | answer or, otherwise, a NODATA response. | 
					
						
							| 
									
										
										
										
											2020-01-19 08:30:13 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-03-06 09:13:27 +01:00
										 |  |  | TXT replies will use the SRV record format augmented with the health status of each backend, as this | 
					
						
							|  |  |  | is useful for debugging. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | ~~~ | 
					
						
							|  |  |  | web.lb.example.org.	5	IN	TXT	"100" "100" "18008" "endpoint-0.web.lb.example.org." "HEALTHY" | 
					
						
							|  |  |  | ~~~ | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-03-05 17:33:35 +01:00
										 |  |  | For SRV queries *all* healthy backends will be returned - assuming the client doing the query | 
					
						
							|  |  |  | is smart enough to select the best one. When SRV records are returned, the endpoint DNS names | 
					
						
							|  |  |  | are synthesized `endpoint-<N>.<cluster>.<zone>` that carries the IP address. Querying for these | 
					
						
							|  |  |  | synthesized names works as well. | 
					
						
							| 
									
										
										
										
											2019-10-05 11:45:45 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-06-04 13:36:27 +02:00
										 |  |  | *Traffic* implements version 2 of the xDS API. It works with the management server as written in | 
					
						
							| 
									
										
										
										
											2020-02-07 13:28:10 +01:00
										 |  |  | <https://github.com/miekg/xds>. | 
					
						
							| 
									
										
										
										
											2020-01-30 22:58:52 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2019-10-05 11:45:45 +01:00
										 |  |  | ## Syntax
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | ~~~ | 
					
						
							|  |  |  | traffic TO... | 
					
						
							|  |  |  | ~~~ | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-01-18 08:04:01 +01:00
										 |  |  | This enabled the *traffic* plugin, with a default node ID of `coredns` and no TLS. | 
					
						
							| 
									
										
										
										
											2019-10-05 11:45:45 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-03-04 15:38:29 +01:00
										 |  |  |  *  **TO...** are the control plane endpoints to bootstrap from. These must start with `grpc://`. The | 
					
						
							| 
									
										
										
										
											2020-03-06 09:13:27 +01:00
										 |  |  |     port number defaults to 443, if not specified. These endpoints will be tried in the order given. | 
					
						
							| 
									
										
										
										
											2019-10-05 11:45:45 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-01-18 08:04:01 +01:00
										 |  |  | The extended syntax is available if you want more control. | 
					
						
							| 
									
										
										
										
											2019-10-05 11:45:45 +01:00
										 |  |  | 
 | 
					
						
							|  |  |  | ~~~ | 
					
						
							|  |  |  | traffic TO... { | 
					
						
							| 
									
										
										
										
											2020-03-04 15:38:29 +01:00
										 |  |  |     cluster CLUSTER | 
					
						
							| 
									
										
										
										
											2020-03-04 11:55:42 +01:00
										 |  |  |     id ID | 
					
						
							| 
									
										
										
										
											2019-10-05 11:45:45 +01:00
										 |  |  |     tls CERT KEY CA | 
					
						
							|  |  |  |     tls_servername NAME | 
					
						
							|  |  |  | } | 
					
						
							|  |  |  | ~~~ | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-03-04 15:38:29 +01:00
										 |  |  |  *  `cluster` **CLUSTER** define the name of the management cluster. By default this is `xds`. | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-03-05 17:33:35 +01:00
										 |  |  |  *  `id` **ID** is how *traffic* identifies itself to the control plane. This defaults to `coredns`. | 
					
						
							| 
									
										
										
										
											2020-01-24 12:00:07 +01:00
										 |  |  | 
 | 
					
						
							|  |  |  |  *  `tls` **CERT** **KEY** **CA** define the TLS properties for gRPC connection. If this is omitted | 
					
						
							|  |  |  |     an insecure connection is attempted. From 0 to 3 arguments can be provided with the meaning as | 
					
						
							|  |  |  |     described below | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |      -  `tls` - no client authentication is used, and the system CAs are used to verify the server | 
					
						
							|  |  |  |         certificate | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |      -  `tls` **CA** - no client authentication is used, and the file CA is used to verify the | 
					
						
							|  |  |  |         server certificate | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |      -  `tls` **CERT** **KEY** - client authentication is used with the specified cert/key pair. The | 
					
						
							|  |  |  |         server certificate is verified with the system CAs. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |      -  `tls` **CERT** **KEY** **CA** - client authentication is used with the specified cert/key | 
					
						
							|  |  |  |         pair. The server certificate is verified using the specified CA file. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |  *  `tls_servername` **NAME** allows you to set a server name in the TLS configuration. This is | 
					
						
							|  |  |  |     needed because *traffic* connects to an IP address, so it can't infer the server name from it. | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2019-10-05 11:45:45 +01:00
										 |  |  | ## Naming Clusters
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | When a cluster is named this usually consists out of a single word, i.e. "cluster-v0", or "web". | 
					
						
							|  |  |  | The *traffic* plugins uses the name(s) specified in the Server Block to create fully qualified | 
					
						
							|  |  |  | domain names. For example if the Server Block specifies `lb.example.org` as one of the names, | 
					
						
							| 
									
										
										
										
											2020-03-04 15:38:29 +01:00
										 |  |  | and "cluster-v0" is one of the load balanced cluster, *traffic* will respond to queries asking for | 
					
						
							| 
									
										
										
										
											2020-07-09 22:02:07 +02:00
										 |  |  | `cluster-v0.lb.example.org.` and the same goes for "web"; `web.lb.example.org`. | 
					
						
							| 
									
										
										
										
											2019-10-05 11:45:45 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-03-04 11:55:42 +01:00
										 |  |  | For SRV queries all endpoints are returned, the SRV target names are synthesized: | 
					
						
							|  |  |  | `endpoint-<N>.web.lb.example.org` to take the example from above. *N* is an integer starting with 0. | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-01-23 15:36:09 +01:00
										 |  |  | ## Matching Algorithm
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-03-05 17:33:35 +01:00
										 |  |  | How are clients match against the data we receive from xDS endpoint? | 
					
						
							| 
									
										
										
										
											2020-01-24 12:00:07 +01:00
										 |  |  | 
 | 
					
						
							|  |  |  | 1.  Does the cluster exist? If not return NXDOMAIN, otherwise continue. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 2.  Run through the endpoints, discard any endpoints that are not HEALTHY. If we are left with no | 
					
						
							|  |  |  |     endpoint return a NODATA response, otherwise continue. | 
					
						
							| 
									
										
										
										
											2020-01-23 15:36:09 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-03-05 17:33:35 +01:00
										 |  |  | 3.  If weights are assigned, use those to pick an endpoint, otherwise randomly pick one and return a | 
					
						
							| 
									
										
										
										
											2020-01-24 12:00:07 +01:00
										 |  |  |     response to the client. | 
					
						
							| 
									
										
										
										
											2020-01-23 15:36:09 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2019-10-05 11:45:45 +01:00
										 |  |  | ## Metrics
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-01-19 09:43:20 +01:00
										 |  |  | If monitoring is enabled (via the *prometheus* plugin) then the following metric are exported: | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-02-05 14:58:14 +01:00
										 |  |  |  *  `coredns_traffic_clusters_tracked{}` the number of tracked clusters. | 
					
						
							| 
									
										
										
										
											2020-03-05 11:07:33 +01:00
										 |  |  |  *  `coredns_traffic_endpoints_tracked{}` the number of tracked endpoints. | 
					
						
							| 
									
										
										
										
											2019-10-05 11:45:45 +01:00
										 |  |  | 
 | 
					
						
							|  |  |  | ## Ready
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-01-18 08:41:21 +01:00
										 |  |  | This plugin report readiness to the *ready* plugin. This will happen after a gRPC stream has been | 
					
						
							| 
									
										
										
										
											2020-01-18 08:04:01 +01:00
										 |  |  | established to the control plane. | 
					
						
							| 
									
										
										
										
											2019-10-05 11:45:45 +01:00
										 |  |  | 
 | 
					
						
							|  |  |  | ## Examples
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | ~~~ | 
					
						
							|  |  |  | lb.example.org { | 
					
						
							|  |  |  |     traffic grpc://127.0.0.1:18000 { | 
					
						
							| 
									
										
										
										
											2020-03-04 11:55:42 +01:00
										 |  |  |         id test-id | 
					
						
							| 
									
										
										
										
											2019-10-05 11:45:45 +01:00
										 |  |  |     } | 
					
						
							|  |  |  |     debug | 
					
						
							|  |  |  |     log | 
					
						
							|  |  |  | } | 
					
						
							|  |  |  | ~~~ | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | This will load balance any names under `lb.example.org` using the data from the manager running on | 
					
						
							| 
									
										
										
										
											2020-03-04 11:55:42 +01:00
										 |  |  | localhost on port 18000. The node ID will be `test-id` and no TLS will be used. Assuming a | 
					
						
							|  |  |  | management server returns config for `web` cluster, you can query CoreDNS for it, below we do an | 
					
						
							|  |  |  | address lookup, which returns an address for the endpoint. The second example shows a SRV lookup | 
					
						
							| 
									
										
										
										
											2020-07-09 22:02:07 +02:00
										 |  |  | which returns all endpoints. | 
					
						
							| 
									
										
										
										
											2020-03-04 11:55:42 +01:00
										 |  |  | 
 | 
					
						
							|  |  |  | ~~~ sh | 
					
						
							| 
									
										
										
										
											2020-03-05 11:02:54 +01:00
										 |  |  | $ dig web.lb.example.org +noall +answer | 
					
						
							| 
									
										
										
										
											2020-03-04 11:55:42 +01:00
										 |  |  | web.lb.example.org.	5	IN	A	127.0.1.1 | 
					
						
							| 
									
										
										
										
											2020-03-05 11:02:54 +01:00
										 |  |  | 
 | 
					
						
							|  |  |  | $ dig web.lb.example.org SRV +noall +answer +additional | 
					
						
							| 
									
										
										
										
											2020-03-04 11:55:42 +01:00
										 |  |  | web.lb.example.org.	5	IN	SRV	100 100 18008 endpoint-0.web.lb.example.org. | 
					
						
							|  |  |  | web.lb.example.org.	5	IN	SRV	100 100 18008 endpoint-1.web.lb.example.org. | 
					
						
							|  |  |  | web.lb.example.org.	5	IN	SRV	100 100 18008 endpoint-2.web.lb.example.org. | 
					
						
							| 
									
										
										
										
											2020-03-05 11:02:54 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-03-04 11:55:42 +01:00
										 |  |  | endpoint-0.web.lb.example.org. 5 IN	A	127.0.1.1 | 
					
						
							|  |  |  | endpoint-1.web.lb.example.org. 5 IN	A	127.0.1.2 | 
					
						
							|  |  |  | endpoint-2.web.lb.example.org. 5 IN	A	127.0.2.1 | 
					
						
							|  |  |  | ~~~ | 
					
						
							| 
									
										
										
										
											2019-10-05 11:45:45 +01:00
										 |  |  | 
 | 
					
						
							|  |  |  | ## Bugs
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-03-05 17:33:35 +01:00
										 |  |  | Credentials are not implemented. Bootstrapping is not fully implemented, *traffic* will connect to | 
					
						
							| 
									
										
										
										
											2020-07-14 21:21:06 +02:00
										 |  |  | the first working **TO** address, but then stops short of re-connecting to the endpoints of the | 
					
						
							|  |  |  | management **CLUSTER**. | 
					
						
							| 
									
										
										
										
											2020-02-07 13:28:10 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-03-04 11:55:42 +01:00
										 |  |  | Load reporting is not supported for the following reason: A DNS query is done by a resolver. | 
					
						
							|  |  |  | Behind this resolver (which can also cache) there may be many clients that will use this reply. The | 
					
						
							|  |  |  | responding server (CoreDNS) has no idea how many clients use this resolver. So reporting a load of | 
					
						
							|  |  |  | +1 on the CoreDNS side can results in anything from 1 to 1000+ of queries on the endpoint, making | 
					
						
							| 
									
										
										
										
											2020-07-14 21:21:06 +02:00
										 |  |  | the load reporting from the *traffic* plugin highly inaccurate. Hence it is not done. | 
					
						
							| 
									
										
										
										
											2020-03-04 11:55:42 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2020-02-07 13:28:10 +01:00
										 |  |  | ## Also See
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | A Envoy management server and command line interface can be found on | 
					
						
							|  |  |  | [GitHub](https://github.com/miekg/xds). |