How to troubleshoot overlay network connectivity issues
This document (000020831) is provided subject to the disclaimer at the end of this document.
Environment
Rancher, Kubernetes
Situation
Pod-to-pod communication is not working.
Resolution
Pod-to-pod communication depends on multiple factors, chief among them that network traffic is allowed between the nodes. The following checks help trace the root cause of the problem.
- Check that the overlay network ports are open between nodes, especially if the nodes span multiple subnets, VLANs, or data centers. Testing from one node to the nodes in each other network is usually enough, e.g.,
`nc -uvz <node IP> 8472`
(8472/UDP is the VXLAN port used by Canal and Flannel; change the port as needed for other network providers.) Refer to the port requirements in the Rancher documentation: [https://rancher.com/docs/rancher/v2.6/en/installation/requirements/ports/#commonly-used-ports]
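The check above can be repeated for every node in the other network with a small loop. This is a minimal sketch, not an official tool: the function name `check_overlay_port` and the `OVERLAY_PORT` variable are hypothetical, and netcat variants (OpenBSD netcat, ncat) differ slightly in flags and UDP behavior.

```shell
#!/usr/bin/env bash
# Sketch: probe the overlay port on each node in the other network segment(s).
# Assumptions: node IPs are passed as arguments; OVERLAY_PORT (hypothetical)
# defaults to 8472 (Canal/Flannel VXLAN over UDP) and can be overridden.
# Caveat: UDP is connectionless, so "nc -u" may report success even when a
# firewall silently drops the packets. A FAIL is meaningful; an OK is not proof.

# check_overlay_port NODE_IP...
check_overlay_port() {
  local port="${OVERLAY_PORT:-8472}" failed=0 ip
  for ip in "$@"; do
    # -u: UDP, -v: verbose, -z: scan mode (report reachability only), -w 3: 3 s timeout
    if nc -uvz -w 3 "$ip" "$port" >/dev/null 2>&1; then
      echo "OK   ${ip}:${port}/udp"
    else
      echo "FAIL ${ip}:${port}/udp"
      failed=1
    fi
  done
  # Non-zero if any probe failed, so the function can be used in scripts.
  return "$failed"
}
```

For example, `check_overlay_port 10.0.2.11 10.0.2.12` (placeholder IPs), or `OVERLAY_PORT=4789 check_overlay_port 10.0.2.11` for a network provider that uses a different VXLAN port.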
- Check DNS resolution from a test pod that has suitable tools. Avoid recent busybox images, whose `nslookup` has known issues; the `rancherlabs/swiss-army-knife` image is well suited for this.
# Run this against every CoreDNS pod IP:
`dig <hostname> @<coredns pod IP>`
# From the same test pod, also test the upstream nameservers (all of them, up to three, over a few retries):
`dig <hostname> @<upstream ns IP>`
Refer to the DNS troubleshooting guide: [https://docs.ranchermanager.rancher.io/v2.7/troubleshooting/other-troubleshooting-tips/dns]
[Note: In an air-gapped environment, the swiss-army-knife image may not be available. In that case, try a busybox image whose network tools work correctly, such as busybox v1.28.]
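The per-server `dig` checks above can be scripted as follows. This is a sketch under stated assumptions: the pod name `dnstest`, the variables `TEST_POD` and `DIG_TRIES`, and the helper functions are hypothetical, and the `k8s-app=kube-dns` label is the usual CoreDNS label but should be verified in your cluster.

```shell
#!/usr/bin/env bash
# Sketch: run the dig checks from a test pod against CoreDNS and upstream servers.
# Assumptions (hypothetical, adjust to your cluster):
#   - TEST_POD is a running pod (default "dnstest") in the current kubectl
#     namespace, using an image with dig such as rancherlabs/swiss-army-knife.
#   - CoreDNS pods carry the label k8s-app=kube-dns in kube-system.

TEST_POD="${TEST_POD:-dnstest}"
DIG_TRIES="${DIG_TRIES:-3}"

# Print one CoreDNS pod IP per line.
coredns_pod_ips() {
  kubectl -n kube-system get pods -l k8s-app=kube-dns \
    -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}'
}

# Print the nameserver IPs from a resolv.conf file (default: /etc/resolv.conf).
upstream_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# dig_each NAME SERVER...: query NAME against each SERVER, DIG_TRIES times each.
dig_each() {
  local name="$1" server i
  shift
  for server in "$@"; do
    i=1
    while [ "$i" -le "$DIG_TRIES" ]; do
      # +short prints only the answers; dig's error lines start with ";".
      if kubectl exec "$TEST_POD" -- dig +short +time=2 +tries=1 "$name" "@$server" \
          | grep -q '^[^;]'; then
        echo "OK   $name @$server (try $i)"
      else
        echo "FAIL $name @$server (try $i)"
      fi
      i=$((i + 1))
    done
  done
}
```

For example, `dig_each kubernetes.default.svc.cluster.local $(coredns_pod_ips)` tests every CoreDNS pod, and `dig_each <hostname> $(upstream_nameservers)` tests the upstream nameservers. `upstream_nameservers` reads a local file, so point it at a copy of the node's resolv.conf; on hosts that use systemd-resolved, `/etc/resolv.conf` usually lists 127.0.0.53, and the real servers are in `/run/systemd/resolve/resolv.conf`.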
- Check whether the overlay network test succeeds.
This procedure tests pod-to-pod connectivity between the nodes. Refer to the networking troubleshooting guide: [https://docs.ranchermanager.rancher.io/v2.5/troubleshooting/other-troubleshooting-tips/networking#check-if-overlay-network-is-functioning-correctly]
[Note: The overlay test checks pod-to-pod communication using ICMP only. Because TCP traffic can still be blocked while ICMP is allowed, networking issues may persist even when the test passes. Also test TCP connectivity with tools such as nc and iperf.]
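Because the overlay test only uses ICMP, a TCP check between pods on different nodes is a useful complement. The sketch below uses iperf3; the pod names, the `tcp_check` function, and the `SERVER_WAIT` variable are hypothetical, and it assumes both pods run an image that ships iperf3.

```shell
#!/usr/bin/env bash
# Sketch: TCP test between two pods with iperf3, complementing the ICMP-only test.
# Assumptions (hypothetical, adjust to your cluster):
#   - SERVER_POD and CLIENT_POD run on different nodes in the current namespace
#     and use an image that ships iperf3.
#   - Port 5201 (the iperf3 default) is free in the server pod.

# tcp_check SERVER_POD CLIENT_POD [PORT]
tcp_check() {
  local server="$1" client="$2" port="${3:-5201}"
  local ip srv_pid rc=0
  ip="$(kubectl get pod "$server" -o jsonpath='{.status.podIP}')"

  # One-off iperf3 server (-1) in the server pod; it exits after one client.
  kubectl exec "$server" -- iperf3 -s -1 -p "$port" >/dev/null 2>&1 &
  srv_pid=$!
  sleep "${SERVER_WAIT:-2}"   # give the server time to start listening

  # 3-second TCP test from the client pod to the server pod IP (crosses the overlay).
  kubectl exec "$client" -- iperf3 -c "$ip" -p "$port" -t 3 >/dev/null 2>&1 || rc=1

  # If the client never connected, the server is still waiting: stop the local
  # kubectl session. The remote iperf3 process may keep running in the pod.
  kill "$srv_pid" 2>/dev/null || true
  wait "$srv_pid" 2>/dev/null || true

  if [ "$rc" -eq 0 ]; then
    echo "TCP OK   $client -> $server ($ip:$port)"
  else
    echo "TCP FAIL $client -> $server ($ip:$port)"
  fi
  return "$rc"
}
```

For example, `tcp_check net-a net-b` (hypothetical pods on two different nodes). Repeating it for each pair of nodes shows whether TCP traffic crosses the overlay, independently of the ICMP result.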
- Check for known issues affecting the infrastructure VMs, and ensure that the overlay network ports are allowed at the switch level.
For example, with VMware vSphere 6.7u2:
1. Change the VXLAN port to 8472 (when NSX is not used) or 4789 (when NSX is used).
2. Disable the VXLAN hardware offload feature on the VMXNET3 NIC, which recent Linux driver versions enable by default. [https://docs.vmware.com/en/VMware-vSphere/6.7/rn/esxi670-202111001.html (refer to PR 2766401), https://github.com/projectcalico/calico/issues/4727]
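On a Linux guest, the VXLAN offload state can be inspected and changed with `ethtool`. This is a sketch only: the interface name `ens192` and the helper functions are hypothetical, and the exact offload features to turn off depend on the guest kernel and VMXNET3 driver version, so confirm them against the VMware release notes and the Calico issue referenced in step 2 before applying.

```shell
#!/usr/bin/env bash
# Sketch: inspect and disable UDP tunnel (VXLAN) segmentation offload on a node NIC.
# Assumptions (hypothetical, adjust per node):
#   - IFACE is the VMXNET3 interface carrying overlay traffic (default "ens192").
#   - ethtool is installed and this runs as root on the node.
#   - Changes made with "ethtool -K" do not persist across reboots.

IFACE="${IFACE:-ens192}"

# Show the current state of the UDP tunnel segmentation offload features.
show_tunnel_offload() {
  ethtool -k "$IFACE" | grep -E '^[[:space:]]*tx-udp_tnl-(csum-)?segmentation:'
}

# Turn both UDP tunnel segmentation offloads off.
disable_tunnel_offload() {
  ethtool -K "$IFACE" tx-udp_tnl-segmentation off tx-udp_tnl-csum-segmentation off
}
```

Run `show_tunnel_offload` before and after `disable_tunnel_offload` to confirm that both features report `off`. Because the setting is lost on reboot, it also has to be applied at boot, for example through the distribution's network configuration.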
Additional Information
Reference Articles & Links:
https://docs.vmware.com/en/VMware-vSphere/6.7/rn/esxi670-202111001.html (refer to PR 2766401)
https://github.com/projectcalico/calico/issues/4727
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.