How to set up Nodelocal DNS cache with Rancher, RKE1 and RKE2
This document (000020174) is provided subject to the disclaimer at the end of this document.
Situation
Why use Nodelocal DNS cache?
Like many applications in a containerised architecture, CoreDNS or kube-dns runs in a distributed fashion. In certain circumstances, DNS reliability and latency can be impacted with this approach. The causes of this relate notably to conntrack race conditions or exhaustion, cloud provider limits, and the unreliable nature of the UDP protocol.
A number of workarounds exist; however, long-term mitigation of these and other issues led to a redesign of the Kubernetes DNS architecture, resulting in the Nodelocal DNS cache project.
Requirements
- A Kubernetes cluster provisioned by Rancher v2.x, or directly with RKE1 or RKE2
- A Linux cluster; Windows is currently not supported
- Access to the cluster
Resolution
Installing
Once installed, pods will begin to resolve using the node-local-dns pod on the same node. Below are details for RKE1 and RKE2 when provisioning using Rancher. The same steps can be applied in a similar way when provisioning a cluster directly.
RKE1
When provisioning or configuring an existing cluster, edit the cluster configuration in the Rancher dashboard, and click the 'Edit as YAML' button. When provisioning an RKE cluster directly, edit the cluster.yaml file instead.
Note: Updating the cluster using the below will create the node-local-dns DaemonSet, and restart the kubelet container on each node.
As in the documentation, update or add the dns.nodelocal.ip_address field using the following as an example:
dns:
  [..]
  nodelocal:
    ip_address: "169.254.20.10"
The kubelet will be updated to use the new IP address when configuring pod DNS resolution. Pods using the CoreDNS service address (default: 10.43.0.10) as the nameserver in /etc/resolv.conf will still resolve using the node-local-dns pod on the node. This is due to the way node-local-dns manages its own interface and iptables rules.
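To see which nameserver a newly started pod receives after the change, a small helper along these lines may be useful. It is only a sketch: the function name first_nameserver and the pod name resolv-test are hypothetical, and it assumes POSIX sh with awk available on the workstation.

```shell
# Print the first nameserver address from resolv.conf-style text on stdin.
# Lines that do not start with "nameserver" (search, options, kubectl
# status messages) are ignored.
first_nameserver() {
    awk '$1 == "nameserver" { print $2; exit }'
}

# Usage (assumes kubectl access; the pod name resolv-test is hypothetical):
#   kubectl run resolv-test --restart=Never --rm -i --image=tutum/dnsutils \
#     -- cat /etc/resolv.conf | first_nameserver
```

With the example configuration above, a new pod would be expected to report 169.254.20.10, while pods started before the change may still report the CoreDNS service address.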
RKE2
Update the default HelmChart for CoreDNS; setting the nodelocal.enabled: true value will install node-local-dns in the cluster.
When provisioning or configuring an existing cluster, edit the cluster configuration in the Rancher dashboard, and select Add-On Config. At the bottom of the page, paste the following into the Additional Manifest text area:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-coredns
  namespace: kube-system
spec:
  valuesContent: |-
    nodelocal:
      enabled: true
Save the changes. Please see the documentation for more details.
When provisioning an RKE2 cluster directly, this file can be copied into the /var/lib/rancher/rke2/server/manifests directory on each rke2-server node, manually or with user-data/configuration management.
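For directly provisioned clusters, the copy step could be scripted roughly as follows. This is a sketch rather than a supported procedure: the function name write_nodelocal_manifest and the file name rke2-coredns-nodelocal.yaml are hypothetical choices, and the script would need to run as a user able to write to the manifests directory on each rke2-server node.

```shell
# Write the HelmChartConfig from this article into a manifests directory.
# $1 = target directory (defaults to the RKE2 server manifests path).
# The file name rke2-coredns-nodelocal.yaml is a hypothetical choice.
write_nodelocal_manifest() {
    manifests_dir="${1:-/var/lib/rancher/rke2/server/manifests}"
    cat > "${manifests_dir}/rke2-coredns-nodelocal.yaml" <<'EOF'
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-coredns
  namespace: kube-system
spec:
  valuesContent: |-
    nodelocal:
      enabled: true
EOF
}

# Usage on an rke2-server node:
#   write_nodelocal_manifest
```

Passing a different directory as the first argument makes it possible to generate the file elsewhere first, for example to review it before distributing it with configuration management.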
Testing
Once installed, start a new pod to test DNS queries, for example:
kubectl run --restart=Never --rm -it --image=tutum/dnsutils dns-test -- dig google.com
The following checks can confirm that node-local-dns is available and handling DNS queries:
- Check for a nodelocaldns interface on a node, for example:
  # ip addr show nodelocaldns
  21: nodelocaldns: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
      link/ether e2:a9:45:f9:29:94 brd ff:ff:ff:ff:ff:ff
      inet 169.254.20.10/32 scope global nodelocaldns
         valid_lft forever preferred_lft forever
      inet 10.43.0.10/32 scope global nodelocaldns
         valid_lft forever preferred_lft forever
- Temporarily enable query logging for node-local-dns:
  1. Edit the node-local-dns ConfigMap to add the log plugin. Locate and edit the ConfigMap in the kube-system namespace in the Rancher dashboard, or use kubectl edit configmap -n kube-system node-local-dns
  2. Add log to the cluster.local and :53 blocks in the Corefile, for example for :53 (external queries):
     [...]
     .:53 {
         log
         errors
         cache 30
  3. Check the node-local-dns pod logs once some DNS queries have been performed. The logs should indicate queries are being answered
  4. Perform the reverse of steps 1-2 to disable query logging
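The interface check above can also be scripted, for instance when checking many nodes. The following is a minimal sketch assuming POSIX sh, awk and grep on the node; the function names inet_addresses and has_addresses are hypothetical, and the addresses in the usage line are the example values from this article.

```shell
# Print the IPv4 addresses found in `ip addr show` output read from stdin,
# one per line and without the prefix length.
inet_addresses() {
    awk '$1 == "inet" { sub(/\/.*/, "", $2); print $2 }'
}

# Read `ip addr show` output from stdin and succeed only if every address
# given as an argument is bound to the interface.
has_addresses() {
    bound="$(inet_addresses)"
    for addr in "$@"; do
        printf '%s\n' "$bound" | grep -qxF "$addr" || return 1
    done
}

# Usage on a node (addresses are the example values from this article):
#   ip addr show nodelocaldns | has_addresses 169.254.20.10 10.43.0.10 \
#     && echo "nodelocaldns addresses present"
```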
Removing Nodelocal DNS cache
To remove Nodelocal DNS cache from a cluster, reverse the installation steps:
RKE1
Remove the dns.nodelocal field from the cluster configuration in the Rancher dashboard and save the change. When provisioning a cluster directly, remove the field from the cluster.yaml file and run rke up to reconcile the change.
RKE2
Remove the additional manifest in the Rancher dashboard, or delete the manifest file from all of the rke2-server nodes when provisioning the cluster directly.
Additional Information
Troubleshooting
Nodelocal DNS performs external lookups on behalf of pods. The lookup is made from the node-local-dns DaemonSet pod running on the same node as the requesting pod.
For internal lookups, CoreDNS is used. By default, node-local-dns pods cache successful queries for 30s and negative queries for 5s. For an architecture overview please see the diagram here.
In no specific order, the following can help understand a DNS issue further.
Check all kube-dns and node-local-dns objects
Ensure there are no obvious issues with scheduling CoreDNS and node-local-dns pods in the cluster.
kubectl get all -n kube-system -l k8s-app=node-local-dns
kubectl get all -n kube-system -l k8s-app=kube-dns
All node-local-dns and kube-dns pods should be ready and running, and the kube-dns Service should exist. If needed, check the events to locate any warning or failed event messages.
kubectl describe ds -n kube-system -l k8s-app=node-local-dns
kubectl describe rs -n kube-system -l k8s-app=kube-dns
Check the logs and ConfigMap of kube-dns and node-local-dns pods
kubectl logs -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=node-local-dns
kubectl get configmap -n kube-system coredns -o yaml
kubectl get configmap -n kube-system node-local-dns -o yaml
Enable logging and perform a DNS test
Note: query logging can increase the log output from CoreDNS, so enabling it only temporarily while investigating is suggested.
- Enable query logging for node-local-dns with the following steps:
  1. Edit the node-local-dns ConfigMap to add the log plugin. Locate and edit the ConfigMap in the kube-system namespace in the Rancher dashboard, or use kubectl edit configmap -n kube-system node-local-dns
  2. Add log to the cluster.local and :53 blocks in the Corefile, for example for :53 (external queries):
     [...]
     .:53 {
         log
         errors
         cache 30
  3. Check the node-local-dns pod logs once some DNS queries have been performed. The logs should indicate queries are being answered
  4. Perform the reverse of steps 1-2 to disable query logging
- Query logging for CoreDNS can be enabled in a similar way. When Nodelocal DNS is enabled, this will only log internal (cluster.local) queries that were not already cached
- Run a DaemonSet to perform queries from a pod running on each node in the cluster
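One possible shape for such a DaemonSet is sketched below. Everything specific in it is an assumption made for illustration: the name and label dns-test, the default namespace, the blanket toleration, the 10 second interval, and the two test names. It reuses the tutum/dnsutils image from the Testing section and assumes the default cluster.local cluster domain.

```shell
# Print a DaemonSet manifest that runs a DNS query loop on every node.
# Names, namespace, interval and queried hosts are hypothetical examples.
dns_test_daemonset_manifest() {
    cat <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: dns-test
  namespace: default
spec:
  selector:
    matchLabels:
      app: dns-test
  template:
    metadata:
      labels:
        app: dns-test
    spec:
      tolerations:
      - operator: Exists
      containers:
      - name: dns-test
        image: tutum/dnsutils
        command:
        - /bin/sh
        - -c
        - |
          # Query one internal and one external name, log only failures.
          while true; do
            for name in kubernetes.default.svc.cluster.local google.com; do
              if ! out=$(dig +time=2 +tries=1 +short "$name") || [ -z "$out" ]; then
                echo "$(date) lookup failed: $name"
              fi
            done
            sleep 10
          done
EOF
}

# Usage (assumes kubectl access to the cluster):
#   dns_test_daemonset_manifest | kubectl apply -f -
#   kubectl logs -n default -l app=dns-test --prefix
#   kubectl delete daemonset -n default dns-test
```

Because each pod only logs failed lookups, comparing the logs per pod should show whether failures are limited to particular nodes, and whether internal, external, or both kinds of query are affected.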
Ask questions to further narrow down the issue
- Is it only DNS that is affected, or is all connectivity affected?
- Are internal, external, or all DNS queries failing?
- Are all nodes and workloads experiencing the issue, or a specific node or workload? Nodes use the upstream DNS configured in /etc/resolv.conf, so queries failing from a node could indicate the issue is with upstream DNS
- What is the error reported by applications? If logs are aggregated, investigate the rate of the error in logs to identify timelines and impact
- Is the issue intermittent or constantly occurring? If the issue is intermittent, configure monitoring or a loop to identify when the issue occurs. When it does, what is the error? Are internal, external or all queries affected?
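For intermittent issues, the loop mentioned in the last question could look something like the sketch below. The function name dns_probe and its arguments are hypothetical; it assumes POSIX sh and that a lookup tool such as nslookup or dig is available wherever the loop runs.

```shell
# Run a lookup command repeatedly and print a timestamped line per failure.
# $1 = number of attempts, $2 = seconds to wait between attempts,
# remaining arguments = the lookup command to run.
dns_probe() {
    attempts="$1"
    interval="$2"
    shift 2
    failures=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
            echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) lookup failed: $*"
        fi
        i=$((i + 1))
        if [ "$i" -lt "$attempts" ]; then
            sleep "$interval"
        fi
    done
    # Final summary line: failed attempts out of total attempts.
    echo "failures=${failures}/${attempts}"
}

# Usage (one internal and one external name, once per second for a minute):
#   dns_probe 60 1 nslookup kubernetes.default.svc.cluster.local
#   dns_probe 60 1 nslookup google.com
```

Running the internal and external probes side by side gives timestamps that can be matched against application errors, and shows whether internal, external, or all queries are affected.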
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.