Best Practices Rancher
This document (000020105) is provided subject to the disclaimer at the end of this document.
Environment
Rancher 2.x
Situation
This article provides a set of checks that can be evaluated to ensure best practices are in place when planning, building, or preparing a Rancher 2.x and Kubernetes environment.
Resolution
1. Architecture
1.1 Nodes
Understanding the workload resource needs of downstream clusters upfront helps in choosing an appropriate node configuration. Some nodes may need different configurations; however, all nodes of the same role are generally configured identically.
Checks
Standardize on supported versions and ensure minimum requirements are met:
- Confirm the OS is covered in the supported versions
- Resource needs can vary based on cluster size and workload; however, in general, no less than 8GB of memory and 2 vCPUs is recommended
- SSD storage is recommended, and should be considered a minimum requirement for server nodes or nodes with the etcd role
- Firewall rules allow connectivity for nodes (RKE, RKE2, k3s)
- A static IP is required for all nodes; if using DHCP, each node should have a reserved address
- Swap is disabled on the nodes
- Unique hostnames are used for every node within a cluster
- NTP is enabled on the nodes
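A quick spot-check of several of the items above can be performed on each node with standard OS tooling; this is a minimal sketch, and command availability can vary by Linux distribution:
# Swap should report no active devices
swapon --show
# Time synchronisation should be active
timedatectl status
# The hostname should be unique within the cluster
hostnamectl --static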
1.2 Separation of concerns
The Rancher management cluster should be dedicated to running the Rancher deployment; additional workloads added to the cluster can contend for resources and impact the performance and predictability of Rancher.
Applications that are part of the Rancher catalog (e.g. rancher-monitoring, rancher-logging, neuvector, rancher-cis-benchmark) may be deployed into the Rancher management cluster as well, but it is important to ensure there are sufficient resources (CPU, memory, disk, network).
This is also important to consider in downstream clusters: the etcd and control plane nodes (RKE), and server nodes (RKE2/k3s), should be dedicated to that purpose. For large clusters it may also be appropriate to give each node a single role, for example, separate nodes for the etcd and control plane roles.
Checks
Using the following commands on each cluster, check for any unexpected workloads running on the Rancher management cluster, or on the server or etcd/control plane nodes of a downstream cluster.
Rancher management (local) cluster
- Check for any unexpected pods running in the cluster:
kubectl get pods --all-namespaces
- Check for any single points of failure or discrepancies in OS, kernel and CRI version:
kubectl get nodes -o wide
Downstream cluster
- Check for any unexpected pods running on server nodes:
for n in $(kubectl get nodes -l node-role.kubernetes.io/master=true --no-headers | cut -d " " -f1)
do
kubectl get nodes --field-selector metadata.name=${n} --no-headers
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=${n}; echo
done
Note: RKE does not use the node-role.kubernetes.io/master=true label used in the above command; the commands below select using labels that are in use by all distributions.
- Check for any unexpected pods running on etcd nodes:
for n in $(kubectl get nodes -l node-role.kubernetes.io/etcd=true --no-headers | cut -d " " -f1)
do
kubectl get nodes --field-selector metadata.name=${n} --no-headers
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=${n}; echo
done
- Check for any unexpected pods running on control plane nodes:
for n in $(kubectl get nodes -l node-role.kubernetes.io/controlplane=true --no-headers | cut -d " " -f1)
do
kubectl get nodes --field-selector metadata.name=${n} --no-headers
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=${n}; echo
done
1.3 High Availability
Ensure nodes within a cluster are spread across separate failure boundaries where possible. This could mean VMs running on separate physical hosts, data centres, switches, or storage pools; if running in a cloud environment, instances in separate availability zones.
For High Availability in Rancher, a Kubernetes install is required.
Checks
- When deploying the Rancher management (local) cluster it is recommended to use the following configuration:
Distribution | Recommendation |
---|---|
RKE | 3 nodes with all roles |
RKE2 | 3 server nodes (all roles) |
k3s (external datastore) | 2 server nodes |
k3s (embedded etcd) | 3 server nodes (all roles) |
- Confirm the components of all clusters and external datastores (k3s) are satisfying minimum HA requirements:
RKE / RKE2
Component | Minimum | Recommended | Notes |
---|---|---|---|
etcd nodes | 3 | 3 | To maintain quorum it is important to have an uneven # of nodes, and to provide tolerance for at least 1 node failure |
control plane nodes | 2 | 2 | Allow tolerance for at least 1 node failure |
worker nodes | 2 | N/A | Allow tolerance for at least 1 worker node failure, scale up to meet the workload needs |
k3s
Component | Minimum | Recommended | Notes |
---|---|---|---|
external datastore | 2 | 2 or greater | (optional) The external datastore should provide failover to a standby using the datastore-endpoint |
server nodes | 2 | 2 or greater | (external datastore) Allow tolerance for at least 1 server node failure |
server nodes | 3 | 3 | (embedded etcd) To maintain quorum it is important to have an uneven # of nodes, and to provide tolerance for at least 1 node failure |
agent nodes | 2 | N/A | Allow tolerance for at least 1 agent node failure, scale up to meet the workload needs |
K3s allows for external (SQL) and embedded (etcd) datastore options, please refer to the appropriate notes in the table.
Cloud provider
The following command can also be used with clusters configured with a cloud provider to review the instance type and availability zone of each node and identify any high availability concerns.
kubectl get nodes --show-labels
Labels may not be available on all cloud providers.
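For a more focused view, the well-known topology and instance-type labels can be printed as columns; the label names below assume a recent Kubernetes version (older clusters may still use the deprecated failure-domain.beta.kubernetes.io labels):
kubectl get nodes -L topology.kubernetes.io/zone -L topology.kubernetes.io/region -L node.kubernetes.io/instance-type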
1.4 Load balancer
To provide a consistent endpoint for the Rancher management cluster, a load balancer is highly recommended to ensure the Rancher agents, UI, and API connectivity can effectively reach the Rancher deployment.
Checks
The load balancer is configured:
- Within close proximity of the Rancher management cluster to reduce latency
- For high availability, with all Rancher management nodes configured as upstream targets
- With a health check to one of the following paths:
Distribution | Health check path |
---|---|
RKE | /healthz |
RKE2 | /healthz |
k3s (traefik) | /ping |
A health check interval of 30 seconds or less is generally recommended
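As a sketch, the health check path can also be verified manually from the load balancer or a nearby host against each upstream node; the node address and path below are placeholders taken from the table above:
# Expect an HTTP 200 response from each upstream node
curl -sk -o /dev/null -w "%{http_code}\n" https://<node-ip><health-check-path>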
1.5 Proximity and latency
For performance reasons, it is recommended to avoid spreading cluster nodes over long distances and unreliable networks. For example, nodes could be in separate AZs in the same region, the same datacenter, or separate nearby data centres.
This is particularly important for etcd nodes, which are sensitive to network latency; the RTT between etcd nodes in the cluster determines the minimum time to complete a commit.
Checks
- Network latency and bandwidth are adequate between the locations where the cluster nodes will be provisioned
A tool like mtr can be used to gather connectivity statistics between locations over a long sample period, and is useful to report on packet loss and latency. Generally, latency between etcd nodes is recommended to be 5ms or less.
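For example, a long report-mode sample can be captured between etcd node locations; the target address below is a placeholder:
# 300 probes in report mode, summarising packet loss and latency per hop
mtr --report --report-wide --report-cycles 300 <remote-etcd-node-ip>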
1.6 Datastore
It is important to ensure that the chosen datastore is capable of handling requests inline with the workload of the cluster.
Allocation of resources, storage performance, and tuning of the datastore may be needed over time; this could be due to increased churn in a cluster, downstream clusters growing in size, or an increase in the number of downstream clusters Rancher is managing.
Checks
Confirm the recommended options are met for the distribution in use:
k3s (external datastore)
With an external datastore the general performance requirements include:
- SSD or similar storage providing 1,000 IOPS or greater performance
- Datastore servers are assigned 2 vCPUs and 4GB memory or greater
- A low latency connection to the datastore endpoint from all k3s server nodes
MySQL 5.7 is recommended. If running in a cloud provider, you may wish to utilise a managed database service.
RKE, RKE2 and k3s (embedded etcd)
To confirm the storage performance of etcd nodes is capable of handling the workload:
- A benchmark tool like fio can be used to accurately test the underlying disk for fsync latency (see the sketch after the RKE2 command below). Alternatively, a basic self-test can be run on RKE and RKE2 with the respective commands below:
RKE
docker exec -e ETCDCTL_ENDPOINTS=$(docker exec etcd /bin/sh -c "etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ','") etcd etcdctl check perf
RKE2
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
etcdcontainer=$(/var/lib/rancher/rke2/bin/crictl ps --label io.kubernetes.container.name=etcd --quiet)
crictl exec $etcdcontainer sh -c "ETCDCTL_CACERT='/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/rke2/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/rke2/server/tls/etcd/server-client.key' ETCDCTL_API=3 etcdctl check perf"
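The following is a sketch of a commonly used fio test for fsync latency, run against the disk backing the etcd data directory; the test directory below is a placeholder and the test writes roughly 22MB. The 99th percentile fdatasync latency should generally be below 10ms:
# Run the test on the disk that holds (or will hold) the etcd data directory
mkdir -p <etcd-disk-mount>/fio-test
fio --rw=write --ioengine=sync --fdatasync=1 --directory=<etcd-disk-mount>/fio-test --size=22m --bs=2300 --name=etcd-fsync-test
rm -rf <etcd-disk-mount>/fio-test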
- Nodes with the etcd role have SSD or similar storage providing high IOPS and low latency
On large downstream or Rancher environments, tuning etcd may be needed, including adding a dedicated disk for etcd.
1.7 CIDR selection
The cluster, node, and service CIDRs cannot be changed once a cluster is provisioned.
For this reason, it is important to future proof by reviewing the ranges to avoid routing overlaps with other areas of the network, and potential cluster IP exhaustion if the defaults are not suitable.
Checks
- The default CIDR ranges do not overlap with any area of the network that needs to be routable from clusters
The default CIDRs are below and often don't need to be changed; to ensure there are no issues with routing from pods, you may wish to adjust the range and/or mask when creating clusters (RKE, RKE2, k3s).
Network | Default CIDR |
---|---|
Cluster | 10.42.0.0/16 |
Service | 10.43.0.0/16 |
Node Mask | /24 |
Reducing the size of a CIDR range (using a larger mask) lowers the number of IPs available, and therefore the total number of pods and services in the cluster, or pods on each node. In a large cluster, the CIDR ranges may need to be increased.
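As a sketch for RKE2 (or k3s with its equivalent /etc/rancher/k3s/config.yaml), alternative ranges can be set in the config.yaml before the cluster is created; the ranges below are examples only and must be chosen to avoid overlapping routable networks:
cat <<'EOF' >> /etc/rancher/rke2/config.yaml
# example ranges only - pick ranges that do not overlap your routable networks
cluster-cidr: "10.44.0.0/16"
service-cidr: "10.45.0.0/16"
EOF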
1.8 Authorized cluster endpoint
At times, connecting directly to a downstream cluster may be desired; this could be to reduce latency, avoid interruption if Rancher is unavailable, or avoid proxying requests when a high frequency of API calls is expected - for example, external monitoring, automation, or a CI/CD pipeline.
Checks
- Check for any use cases where an authorized cluster endpoint is needed
Direct access to the downstream cluster kube-apiserver can be configured using the secondary context in the kubeconfig file.
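As a sketch, when the authorized cluster endpoint is enabled, the kubeconfig downloaded from Rancher contains an additional context pointing directly at the downstream kube-apiserver; the file and context names below are placeholders:
# List the contexts available in the downloaded kubeconfig
kubectl config get-contexts --kubeconfig <cluster-name>.yaml
# Use the direct (non-proxied) context for subsequent commands
kubectl --kubeconfig <cluster-name>.yaml --context <direct-context-name> get nodes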
2. Best Practices
2.1 Installing Rancher
It is highly encouraged to install Rancher on a Kubernetes cluster in an HA configuration.
When starting with a single node for the Rancher management (local) cluster, it is highly recommended to install Rancher on a single-node Kubernetes cluster at a minimum, to improve configuration and management of the Rancher environment.
The intended use case of the single-node Docker install is short-lived testing environments; an in-place migration from a Docker install to a Kubernetes install is not possible and requires migrating Rancher to the new cluster using the Backup Operator.
Checks
- Rancher is installed on a Kubernetes cluster, even if that is a single node cluster
2.2 Rancher Resources
The minimum resource requirements for nodes in the Rancher management (local) cluster scale with the number of downstream clusters and nodes; this may change over time and should be reviewed as changes occur in the environment.
Checks
- Verify that nodes in the Rancher management cluster meet at least the minimum requirements:
- CPU and memory requirements
- Port requirements
2.3 Chart options
When installing the Rancher helm chart, the default options may not always be the best fit for specific environments.
Checks
- The Rancher helm chart is installed with the desired options
- replicas - the default number of Rancher replicas (3) may not suit your cluster; for example, a k3s cluster with an external datastore may only need a replicas value of 2 to ensure only one Rancher pod is running per k3s server node. Note: it is generally recommended not to configure more than 3 Rancher replicas
- antiAffinity - the default preferred scheduling can mean Rancher pods become imbalanced during the lifetime of a cluster; using required can ensure Rancher is always scheduled on unique nodes
To confirm the options provided on an existing Rancher install, the following command can be used:
helm get values rancher -n cattle-system
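As a sketch, these values can be adjusted on an existing install with helm upgrade while reusing the current values; the chart repository name below (rancher-stable) is an assumption, substitute the repository and chart version in use:
helm upgrade rancher rancher-stable/rancher --namespace cattle-system --reuse-values --set replicas=3 --set antiAffinity=required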
2.4 Supported versions
When choosing or maintaining the components for Rancher and Kubernetes clusters, the product lifecycle and support matrix can be used to ensure the versions and OS configurations are certified and maintained.
Checks
- Current Rancher, OS and Kubernetes versions are under maintenance and certified
As versions are a moving target, checking the current stable releases and planning for future upgrades on a schedule is recommended.
The Rancher Upgrade Checklist can be a useful refresher when planning an upgrade.
2.5 Recurring snapshots and backups
It is important to configure snapshots on a recurring schedule and store these externally to the cluster for disaster recovery.
Checks
- Recurring snapshots are configured for the distribution in use
Distribution | Configuration |
---|---|
RKE | Confirm recurring snapshots are enabled with an S3 compatible endpoint for off-node copies |
RKE2 | Confirm recurring snapshots are enabled with an S3 compatible endpoint for off-node copies |
k3s (embedded etcd) | Confirm recurring snapshots are enabled with an S3 compatible endpoint for off-node copies |
k3s (external datastore) | Confirm backups on the external datastore are configured, this can differ depending on the chosen database |
In addition to a recurring schedule, it is important to take one-time snapshots of etcd (RKE, RKE2, and k3s embedded), or of the external datastore (k3s), before and after significant changes.
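As a sketch, a one-time snapshot can be taken with the distribution tooling before a change; the snapshot name below is an example and exact subcommands can vary slightly by version:
# RKE - run from the host containing cluster.yml
rke etcd snapshot-save --config cluster.yml --name pre-change-snapshot
# RKE2 - run on a server node
rke2 etcd-snapshot save --name pre-change-snapshot
# k3s (embedded etcd) - run on a server node
k3s etcd-snapshot save --name pre-change-snapshot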
The Rancher backup operator can also be used on any distribution or hosted provider to back up the objects that Rancher needs to function; this can be used to restore to a previous backup point or to migrate Rancher between clusters.
2.6 Provisioning
Provisioning clusters, nodes and workloads for Rancher and downstream clusters in a repeatable and automated way can improve the supportability of Rancher and Kubernetes. When configuration is stored in source control it can also assist with auditing changes.
Checks
Consider the below points for the Application, Rancher and Kubernetes environments:
- Manifests and configuration data for application workloads are stored in source control, treated as the source of truth for all containerized applications and deployed to clusters
- Infrastructure as Code for provisioning and configuration of Kubernetes clusters and workloads
- CI/CD pipeline to automate deployments and configuration changes
The rancher2 terraform provider and pulumi package can be used to manage cluster provisioning and configuration as code with Rancher.
The helm and kubernetes providers for terraform can be useful to deploy and manage application workloads; similar packages are available for pulumi.
2.7 Managing node lifecycle
When making significant planned changes to a node, such as restarting Docker, patching, shutting it down, or removing it, it is important to drain the node first to avoid disrupting in-flight connections.
For example, the kube-proxy component manages iptables rules on nodes to maintain service endpoints; if a node is suddenly shut down, stale endpoints and orphaned pods can be left in place for a period of time, causing connectivity issues.
In some cases, draining can be automated for unplanned events, such as when a node is about to be terminated, restarted, or shut down.
Checks
- A process is in place to drain before planned disruptive changes are performed on a node
- Where possible, node draining during the shutdown sequence is automated, for example, with a systemd or similar service
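A minimal sketch of a manual drain around maintenance follows; the node name is a placeholder, and on older kubectl versions the --delete-emptydir-data flag is named --delete-local-data:
# Safely evict pods before the disruptive change
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# Perform the maintenance, then return the node to service
kubectl uncordon <node-name>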
3. Operating Kubernetes
3.1 Capacity planning and monitoring
It is recommended to measure resource usage of all clusters by enabling monitoring in Rancher, or with your chosen solution, and to alert on critical resource thresholds and events in the cluster.
On supported platforms, Cluster Autoscaler can be used to ensure the number of nodes is right-sized for scheduled workloads. Combining this with the Horizontal Pod Autoscaler provides both application and infrastructure scaling capabilities.
Checks
- Monitoring is enabled for the Rancher and downstream clusters
- A receiver is configured to stay informed if an alarm or event occurs
- A process for adding/removing nodes is established, automated if possible
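As a quick sketch, current usage and reserved capacity can be reviewed per node; kubectl top requires a metrics source such as metrics-server or Rancher Monitoring:
# Current CPU and memory usage per node
kubectl top nodes
# Requests and limits currently allocated on each node
kubectl describe nodes | grep -A 8 "Allocated resources"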
3.2 Probes
In the defence against service and pod related failures, liveness and readiness probes are very useful; these can be in the form of HTTP requests, commands, or TCP connections.
Checks
- Liveness and Readiness probes are configured where necessary
- Probes do not rely on the success of upstream dependencies, only the running application in the pod
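A minimal sketch of an HTTP readiness probe and a TCP liveness probe on a hypothetical Deployment follows; the name, image, and ports are examples only:
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: probe-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: probe-example
  template:
    metadata:
      labels:
        app: probe-example
    spec:
      containers:
      - name: web
        image: nginx:1.25
        ports:
        - containerPort: 80
        readinessProbe:        # receive traffic only once the application responds
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:         # restart the container if the port stops accepting connections
          tcpSocket:
            port: 80
          initialDelaySeconds: 15
          periodSeconds: 20
EOF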
3.3 Resources
Assigning resource requests to pods allows the kube-scheduler to make more informed placement decisions, avoiding the "bin packing" of pods onto nodes and resource contention. For CPU requests, this also allocates a share of CPU time based on the request when the node is under contention.
Limits also offer value in the form of a safety net against pods consuming an undesired amount of resources. One caveat specific to CPU limits: these can introduce CPU throttling when overused.
Additionally, it can be useful to reserve capacity on nodes to prevent allocating resources that may be consumed by the kubelet and other system daemons, like Docker.
Checks
- Workloads define resource CPU and Memory requests where applicable, use CPU limits sparingly
- Nodes have system and daemon reservations where necessary
When Rancher Monitoring is enabled, the graphs in Grafana can be used to find a baseline of CPU and Memory for resource requests.
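As a sketch, requests (and a conservative memory limit) can then be applied to an existing workload; the namespace, deployment name, and values below are examples only:
kubectl -n <namespace> set resources deployment/<deployment-name> --requests=cpu=100m,memory=128Mi --limits=memory=256Mi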
3.4 OS Limits
Containerized applications can consume high amounts of OS resources, such as open files, connections, processes, filesystem space and inodes.
Often the defaults are adequate; however, establishing a standardized image for all nodes can help establish a baseline for all configuration and tuning.
Checks
In general, the below can be used to confirm the OS limits allow adequate headroom for the workloads:
- File descriptor usage:
cat /proc/sys/fs/file-nr
- User ulimits:
ulimit -a
Or, a particular process can be checked: cat /proc/PID/limits
- Conntrack limits:
cat /proc/sys/net/netfilter/nf_conntrack_max
cat /proc/sys/net/netfilter/nf_conntrack_count
- Filesystem space and inode usage:
df -h and df -ih
Requirements for Linux can differ slightly depending on the distribution; refer to the requirements (RKE, RKE2, k3s) for more information.
3.5 Log rotation
To prevent large log files from accumulating, it is recommended to rotate OS and container log files. Optionally an external log service can be used to stream logs off the nodes for a longer-term lifecycle and easier searching.
Checks
Containers
- Log rotation is configured for the container logs
- An external logging service is configured as needed
To rotate container logs with RKE, configure log rotation in the Docker /etc/docker/daemon.json file with a size and retention configuration.
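A minimal sketch of the Docker daemon.json log options follows; merge these with any existing settings rather than overwriting the file, and note that a Docker restart is required and the options apply only to containers created afterwards:
cat <<'EOF' > /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "4"
  }
}
EOF
systemctl restart docker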
The below can be used as an example for RKE2 and k3s in the config.yaml file; alternatively, these can also be set directly when installing a standalone cluster using the install script:
kubelet-arg:
- container-log-max-files=4
- container-log-max-size=50Mi
OS
Rotation of log files on nodes is also important, especially if a long node lifecycle is expected.
3.6 DNS scalability
DNS is a critical service running within the cluster. As CoreDNS pods are scheduled throughout the cluster, the availability of the service depends on the accessibility of all CoreDNS pods backing it.
For this reason, Nodelocal DNS cache is recommended for clusters that may service a high amount of DNS requests, or clusters very sensitive to DNS issues.
Checks
If a cluster has experienced a DNS issue, or is known to handle a high amount of DNS queries:
- Check the output of conntrack -S on related nodes.
A high amount of the insert_failed counter can be indicative of a conntrack race condition; deploying Nodelocal DNS cache is recommended to mitigate this.
Status
Top Issue
Additional Information
For further information, see the documentation for Best Practices.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.