Best Practices Rancher
This document (000020105) is provided subject to the disclaimer at the end of this document.
Environment
Rancher 2.x
Situation
This article provides a set of checks that can be evaluated to ensure best practices are in place when planning, building, or preparing a Rancher 2.x and Kubernetes environment.
Resolution
1. Architecture
1.1 Nodes
Understanding the workload resource needs of downstream clusters upfront helps in choosing an appropriate node configuration. Some nodes may need different configurations; however, all nodes of the same role are generally configured identically.
Checks
Standardize on supported versions and ensure minimum requirements are met:
- Confirm the OS is covered in the supported versions
- Resource needs can vary based on cluster size and workload; however, in general, no less than 8GB of memory and 2 vCPUs is recommended
- SSD storage is recommended, and should be considered a minimum requirement for server nodes or nodes with the etcd role
- Firewall rules allow connectivity for nodes (RKE, RKE2, k3s)
- A static IP is required for all nodes; if using DHCP, each node should have a reserved address
- Swap is disabled on the nodes
- Unique hostnames are used for every node within a cluster
- NTP is enabled on the nodes
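A quick spot-check of several of the items above can be performed on each node with standard OS tooling; this is a minimal sketch, and command availability can vary by Linux distribution:
# Swap should report no active devices
swapon --show
# Time synchronisation should be active
timedatectl status
# The hostname should be unique within the cluster
hostnamectl --static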
1.2 Separation of concerns
The Rancher management cluster should be dedicated to running the Rancher deployment; additional workloads added to the cluster can contend for resources and impact the performance and predictability of Rancher.
Applications that are part of the Rancher catalog (e.g. rancher-monitoring, rancher-logging, neuvector, rancher-cis-benchmark) may be deployed into the Rancher management cluster as well, but it is important to ensure there are sufficient resources (CPU, memory, disk, network).
This is also important to consider in downstream clusters: the etcd and control plane nodes (RKE), and server nodes (RKE2/k3s), should be dedicated to that purpose. For large clusters it may also be appropriate to give each node a single role, for example, separate nodes for the etcd and control plane roles.
Checks
Using the following commands on each cluster, check for any unexpected workloads running on the Rancher management cluster, or on the server or etcd/control plane nodes of a downstream cluster.
Rancher management (local) cluster
- Check for any unexpected pods running in the cluster:
kubectl get pods --all-namespaces
- Check for any single points of failure or discrepancies in OS, kernel and CRI version:
kubectl get nodes -o wide
Downstream cluster
- Check for any unexpected pods running on server nodes:
for n in $(kubectl get nodes -l node-role.kubernetes.io/master=true --no-headers | cut -d " " -f1)
do
kubectl get nodes --field-selector metadata.name=${n} --no-headers
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=${n}; echo
done
Note: RKE does not use the node-role.kubernetes.io/master=true label used in the above command; the commands below select using labels that are in use by all distributions.
- Check for any unexpected pods running on etcd nodes:
for n in $(kubectl get nodes -l node-role.kubernetes.io/etcd=true --no-headers | cut -d " " -f1)
do
kubectl get nodes --field-selector metadata.name=${n} --no-headers
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=${n}; echo
done
- Check for any unexpected pods running on control plane nodes:
for n in $(kubectl get nodes -l node-role.kubernetes.io/controlplane=true --no-headers | cut -d " " -f1)
do
kubectl get nodes --field-selector metadata.name=${n} --no-headers
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=${n}; echo
done
1.3 High Availability
Ensure nodes within a cluster are spread across separate failure boundaries where possible. This could mean VMs running on separate physical hosts, data centres, switches, or storage pools; if running in a cloud environment, instances in separate availability zones.
For High Availability in Rancher, a Kubernetes install is required.
Checks
- When deploying the Rancher management (local) cluster it is recommended to use the following configuration:
Distribution | Recommendation |
---|---|
RKE | 3 nodes with all roles |
RKE2 | 3 server nodes (all roles) |
k3s (external datastore) | 2 server nodes |
k3s (embedded etcd) | 3 server nodes (all roles) |
- Confirm the components of all clusters and external datastores (k3s) are satisfying minimum HA requirements:
RKE / RKE2
Component | Minimum | Recommended | Notes |
---|---|---|---|
etcd nodes | 3 | 3 | To maintain quorum it is important to have an uneven # of nodes, and to provide tolerance for at least 1 node failure |
control plane nodes | 2 | 2 | Allow tolerance for at least 1 node failure |
worker nodes | 2 | N/A | Allow tolerance for at least 1 worker node failure, scale up to meet the workload needs |
k3s
Component | Minimum | Recommended | Notes |
---|---|---|---|
external datastore | 2 | 2 or greater | (optional) The external datastore should provide failover to a standby using the datastore-endpoint |
server nodes | 2 | 2 or greater | (external datastore) Allow tolerance for at least 1 server node failure |
server nodes | 3 | 3 | (embedded etcd) To maintain quorum it is important to have an uneven # of nodes, and to provide tolerance for at least 1 node failure |
agent nodes | 2 | N/A | Allow tolerance for at least 1 agent node failure, scale up to meet the workload needs |
K3s allows for external (SQL) and embedded (etcd) datastore options, please refer to the appropriate notes in the table.
Cloud provider
The following command can also be used with clusters configured with a cloud provider to review the instance type and availability zone of each node and identify any high availability concerns.
kubectl get nodes --show-labels
Labels may not be available on all cloud providers.
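For a more focused view, the well-known topology and instance-type labels can be printed as columns; the label names below assume a recent Kubernetes version (older clusters may still use the deprecated failure-domain.beta.kubernetes.io labels):
kubectl get nodes -L topology.kubernetes.io/zone -L topology.kubernetes.io/region -L node.kubernetes.io/instance-type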
1.4 Load balancer
To provide a consistent endpoint for the Rancher management cluster, a load balancer is highly recommended to ensure the Rancher agents, UI, and API connectivity can effectively reach the Rancher deployment.
Checks
The load balancer is configured:
- Within close proximity of the Rancher management cluster to reduce latency
- For high availability, with all Rancher management nodes configured as upstream targets
- With a health check to one of the following paths:
Distribution | Health check path |
---|---|
RKE | /healthz |
RKE2 | /healthz |
k3s (traefik) | /ping |
A health check interval of 30 seconds or less is generally recommended
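As a sketch, the health check path can also be verified manually from the load balancer or a nearby host against each upstream node; the node address and path below are placeholders taken from the table above:
# Expect an HTTP 200 response from each upstream node
curl -sk -o /dev/null -w "%{http_code}\n" https://<node-ip><health-check-path>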
1.5 Proximity and latency
For performance reasons, it is recommended to avoid spreading cluster nodes over long distances and unreliable networks. For example, nodes could be in separate AZs in the same region, the same datacenter, or separate nearby data centres.
This is particularly important for etcd nodes, which are sensitive to network latency; the RTT between etcd nodes in the cluster determines the minimum time to complete a commit.
Checks
- Network latency and bandwidth are adequate between the locations where the cluster nodes will be provisioned
A tool like mtr can be used to gather connectivity statistics between locations over a long sample period, and is useful to report on packet loss and latency. Generally, latency between etcd nodes is recommended to be 5ms or less.
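For example, a long report-mode sample can be captured between etcd node locations; the target address below is a placeholder:
# 300 probes in report mode, summarising packet loss and latency per hop
mtr --report --report-wide --report-cycles 300 <remote-etcd-node-ip>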
1.6 Datastore
It is important to ensure that the chosen datastore is capable of handling requests inline with the workload of the cluster.
Allocation of resources, storage performance, and tuning of the datastore may be needed over time; this could be due to increased churn in a cluster, downstream clusters growing in size, or an increase in the number of downstream clusters Rancher is managing.
Checks
Confirm the recommended options are met for the distribution in use:
k3s (external datastore)
With an external datastore the general performance requirements include:
- SSD or similar storage providing 1,000 IOPS or greater performance
- Datastore servers are assigned 2 vCPUs and 4GB memory or greater
- A low latency connection to the datastore endpoint from all k3s server nodes
MySQL 5.7 is recommended. If running in a cloud provider, you may wish to utilise a managed database service.
RKE, RKE2 and k3s (embedded etcd)
To confirm the storage performance of etcd nodes is capable of handling the workload:
- A benchmark tool like fio can be used to accurately test the underlying disk for fsync latency (see the sketch after the RKE2 command below). Alternatively, a basic self-test can be run on RKE and RKE2 with the respective commands below:
RKE
docker exec -e ETCDCTL_ENDPOINTS=$(docker exec etcd /bin/sh -c "etcdctl member list | cut -d, -f5 | sed -e 's/ //g' | paste -sd ','") etcd etcdctl check perf
RKE2
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
etcdcontainer=$(/var/lib/rancher/rke2/bin/crictl ps --label io.kubernetes.container.name=etcd --quiet)
crictl exec $etcdcontainer sh -c "ETCDCTL_CACERT='/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/rke2/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/rke2/server/tls/etcd/server-client.key' ETCDCTL_API=3 etcdctl check perf"
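The following is a sketch of a commonly used fio test for fsync latency, run against the disk backing the etcd data directory; the test directory below is a placeholder and the test writes roughly 22MB. The 99th percentile fdatasync latency should generally be below 10ms:
# Run the test on the disk that holds (or will hold) the etcd data directory
mkdir -p <etcd-disk-mount>/fio-test
fio --rw=write --ioengine=sync --fdatasync=1 --directory=<etcd-disk-mount>/fio-test --size=22m --bs=2300 --name=etcd-fsync-test
rm -rf <etcd-disk-mount>/fio-test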
- Nodes with the etcd role have SSD or similar storage providing high IOPS and low latency
On large downstream or Rancher environments, tuning etcd may be needed, including adding a dedicated disk for etcd.
1.7 CIDR selection
The cluster, node, and service CIDRs cannot be changed once a cluster is provisioned.
For this reason, it is important to future proof by reviewing the ranges to avoid routing overlaps with other areas of the network, and potential cluster IP exhaustion if the defaults are not suitable.
Checks
- The default CIDR ranges do not overlap with any area of the network that needs to be routable from clusters
The default CIDRs are below and often don't need to be changed; to ensure there are no issues with routing from pods, you may wish to adjust the range and/or mask when creating clusters (RKE, RKE2, k3s).
Network | Default CIDR |
---|---|
Cluster | 10.42.0.0/16 |
Service | 10.43.0.0/16 |
Node Mask | /24 |
Reducing the size of a CIDR range (using a larger mask) lowers the number of IPs available, and therefore the total number of pods and services in the cluster, or pods on each node. In a large cluster, the CIDR ranges may need to be increased.
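As a sketch for RKE2 (or k3s with its equivalent /etc/rancher/k3s/config.yaml), alternative ranges can be set in the config.yaml before the cluster is created; the ranges below are examples only and must be chosen to avoid overlapping routable networks:
cat <<'EOF' >> /etc/rancher/rke2/config.yaml
# example ranges only - pick ranges that do not overlap your routable networks
cluster-cidr: "10.44.0.0/16"
service-cidr: "10.45.0.0/16"
EOF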
1.8 Authorized cluster endpoint
At times, connecting directly to a downstream cluster may be desired; this could be to reduce latency, avoid interruption if Rancher is unavailable, or avoid proxying requests when a high frequency of API calls is expected - for example, external monitoring, automation, or a CI/CD pipeline.
Checks
- Check for any use cases where an authorized cluster endpoint is needed
Direct access to the downstream cluster kube-apiserver can be configured using the secondary context in the kubeconfig file.
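As a sketch, when the authorized cluster endpoint is enabled, the kubeconfig downloaded from Rancher contains an additional context pointing directly at the downstream kube-apiserver; the file and context names below are placeholders:
# List the contexts available in the downloaded kubeconfig
kubectl config get-contexts --kubeconfig <cluster-name>.yaml
# Use the direct (non-proxied) context for subsequent commands
kubectl --kubeconfig <cluster-name>.yaml --context <direct-context-name> get nodes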
2. Best Practices
2.1 Installing Rancher
It is highly encouraged to install Rancher on a Kubernetes cluster in an HA configuration.
When starting with a single node for the Rancher management (local) cluster, it is highly recommended to install Rancher on a single-node Kubernetes cluster at a minimum, to improve configuration and management of the Rancher environment.
The intended use case of the single-node Docker install is short-lived testing environments; an in-place migration from a Docker install to a Kubernetes install is not possible and requires migrating Rancher to the new cluster using the Backup Operator.
Checks
- Rancher is installed on a Kubernetes cluster, even if that is a single node cluster
2.2 Rancher Resources
The minimum resource requirements for nodes in the Rancher management (local) cluster scale with the number of downstream clusters and nodes; this may change over time and should be reviewed as changes occur in the environment.
Checks
- Verify that nodes in the Rancher management cluster meet at least the minimum requirements:
- CPU and memory requirements
- Port requirements
2.3 Chart options
When installing the Rancher helm chart, the default options may not always be the best fit for specific environments.
Checks
- The Rancher helm chart is installed with the desired options
- replicas - the default number of Rancher replicas (3) may not suit your cluster; for example, a k3s cluster with an external datastore may only need a replicas value of 2 to ensure only one Rancher pod is running per k3s server node. Note: it is generally recommended not to configure more than 3 Rancher replicas
- antiAffinity - the default preferred scheduling can mean Rancher pods become imbalanced during the lifetime of a cluster; using required can ensure Rancher is always scheduled on unique nodes
To confirm the options provided on an existing Rancher install, the following command can be used:
helm get values rancher -n cattle-system
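As a sketch, these values can be adjusted on an existing install with helm upgrade while reusing the current values; the chart repository name below (rancher-stable) is an assumption, substitute the repository and chart version in use:
helm upgrade rancher rancher-stable/rancher --namespace cattle-system --reuse-values --set replicas=3 --set antiAffinity=required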
2.4 Supported versions
When choosing or maintaining the components for Rancher and Kubernetes clusters, the product lifecycle and support matrix can be used to ensure the versions and OS configurations are certified and maintained.
Checks
- Current Rancher, OS and Kubernetes versions are under maintenance and certified
As versions are a moving target, checking the current stable releases and planning for future upgrades on a schedule is recommended.
The Rancher Upgrade Checklist can be a useful refresher when planning an upgrade.
2.5 Recurring snapshots and backups
It is important to configure snapshots on a recurring schedule and store these externally to the cluster for disaster recovery.
Checks
- Recurring snapshots are configured for the distribution in use
Distribution | Configuration |
---|---|
RKE | Confirm recurring snapshots are enabled with an S3 compatible endpoint for off-node copies |
RKE2 | Confirm recurring snapshots are enabled with an S3 compatible endpoint for off-node copies |
k3s (embedded etcd) | Confirm recurring snapshots are enabled with an S3 compatible endpoint for off-node copies |
k3s (external datastore) | Confirm backups on the external datastore are configured, this can differ depending on the chosen database |
In addition to a recurring schedule, it is important to take one-time snapshots of etcd (RKE, RKE2, and k3s embedded), or of the external datastore (k3s), before and after significant changes.
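As a sketch, a one-time snapshot can be taken with the distribution tooling before a change; the snapshot name below is an example and exact subcommands can vary slightly by version:
# RKE - run from the host containing cluster.yml
rke etcd snapshot-save --config cluster.yml --name pre-change-snapshot
# RKE2 - run on a server node
rke2 etcd-snapshot save --name pre-change-snapshot
# k3s (embedded etcd) - run on a server node
k3s etcd-snapshot save --name pre-change-snapshot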
The Rancher backup operator can also be used on any distribution or hosted provider to back up the objects that Rancher needs to function; this can be used to restore to a previous backup point or to migrate Rancher between clusters.
2.6 Provisioning
Provisioning clusters, nodes and workloads for Rancher and downstream clusters in a repeatable and automated way can improve the supportability of Rancher and Kubernetes. When configuration is stored in source control it can also assist with auditing changes.
Checks
Consider the below points for the Application, Rancher and Kubernetes environments:
- Manifests and configuration data for application workloads are stored in source control, treated as the source of truth for all containerized applications and deployed to clusters
- Infrastructure as Code for provisioning and configuration of Kubernetes clusters and workloads
- CI/CD pipeline to automate deployments and configuration changes
The rancher2 terraform provider and pulumi package can be used to manage cluster provisioning and configuration as code with Rancher.
The helm and kubernetes providers for terraform can be useful to deploy and manage application workloads; similar packages are available for pulumi.
2.7 Managing node lifecycle
When making significant planned changes to a node, such as restarting Docker, patching, shutting it down, or removing it, it is important to drain the node first to avoid disrupting in-flight connections.
For example, the kube-proxy component manages iptables rules on nodes to maintain service endpoints; if a node is suddenly shut down, stale endpoints and orphaned pods can be left in place for a period of time, causing connectivity issues.
In some cases, draining can be automated for unplanned events, such as when a node is about to be terminated, restarted, or shut down.
Checks
- A process is in place to drain before planned disruptive changes are performed on a node
- Where possible, node draining during the shutdown sequence is automated, for example, with a systemd or similar service
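A minimal sketch of a manual drain around maintenance follows; the node name is a placeholder, and on older kubectl versions the --delete-emptydir-data flag is named --delete-local-data:
# Safely evict pods before the disruptive change
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# Perform the maintenance, then return the node to service
kubectl uncordon <node-name>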
3. Operating Kubernetes
3.1 Capacity planning and monitoring
It is recommended to measure resource usage of all clusters by enabling monitoring in Rancher, or with your chosen solution, and to alert on critical resource thresholds and events in the cluster.
On supported platforms, Cluster Autoscaler can be used to ensure the number of nodes is right-sized for scheduled workloads. Combining this with the Horizontal Pod Autoscaler provides both application and infrastructure scaling capabilities.
Checks
- Monitoring is enabled for the Rancher and downstream clusters
- A receiver is configured to stay informed if an alarm or event occurs
- A process for adding/removing nodes is established, automated if possible
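As a quick sketch, current usage and reserved capacity can be reviewed per node; kubectl top requires a metrics source such as metrics-server or Rancher Monitoring:
# Current CPU and memory usage per node
kubectl top nodes
# Requests and limits currently allocated on each node
kubectl describe nodes | grep -A 8 "Allocated resources"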
3.2 Probes
In the defence against service and pod related failures, liveness and readiness probes are very useful; these can be in the form of HTTP requests, commands, or TCP connections.
Checks
- Liveness and Readiness probes are configured where necessary
- Probes do not rely on the success of upstream dependencies, only the running application in the pod
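A minimal sketch of an HTTP readiness probe and a TCP liveness probe on a hypothetical Deployment follows; the name, image, and ports are examples only:
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: probe-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: probe-example
  template:
    metadata:
      labels:
        app: probe-example
    spec:
      containers:
      - name: web
        image: nginx:1.25
        ports:
        - containerPort: 80
        readinessProbe:        # receive traffic only once the application responds
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:         # restart the container if the port stops accepting connections
          tcpSocket:
            port: 80
          initialDelaySeconds: 15
          periodSeconds: 20
EOF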
3.3 Resources
Assigning resource requests to pods allows the kube-scheduler to make more informed placement decisions, avoiding the "bin packing" of pods onto nodes and resource contention. For CPU requests, this also allocates a share of CPU time based on the request when the node is under contention.
Limits also offer value in the form of a safety net against pods consuming an undesired amount of resources. One caveat specific to CPU limits: these can introduce CPU throttling when overused.
Additionally, it can be useful to reserve capacity on nodes to prevent allocating resources that may be consumed by the kubelet and other system daemons, like Docker.
Checks
- Workloads define resource CPU and Memory requests where applicable, use CPU limits sparingly
- Nodes have system and daemon reservations where necessary
When Rancher Monitoring is enabled, the graphs in Grafana can be used to find a baseline of CPU and Memory for resource requests.
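As a sketch, requests (and a conservative memory limit) can then be applied to an existing workload; the namespace, deployment name, and values below are examples only:
kubectl -n <namespace> set resources deployment/<deployment-name> --requests=cpu=100m,memory=128Mi --limits=memory=256Mi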
3.4 OS Limits
Containerized applications can consume high amounts of OS resources, such as open files, connections, processes, filesystem space and inodes.
Often the defaults are adequate; however, establishing a standardized image for all nodes can help establish a baseline for all configuration and tuning.
Checks
In general, the below can be used to confirm the OS limits allow adequate headroom for the workloads:
- File descriptor usage:
cat /proc/sys/fs/file-nr
- User ulimits:
ulimit -a
Or, a particular process can be checked: cat /proc/PID/limits
- Conntrack limits:
cat /proc/sys/net/netfilter/nf_conntrack_max
cat /proc/sys/net/netfilter/nf_conntrack_count
- Filesystem space and inode usage:
df -h and df -ih
Requirements for Linux can differ slightly depending on the distribution; refer to the requirements (RKE, RKE2, k3s) for more information.
3.5 Log rotation
To prevent large log files from accumulating, it is recommended to rotate OS and container log files. Optionally an external log service can be used to stream logs off the nodes for a longer-term lifecycle and easier searching.
Checks
Containers
- Log rotation is configured for the container logs
- An external logging service is configured as needed
To rotate container logs with RKE, configure log rotation in the Docker /etc/docker/daemon.json file with a size and retention configuration.
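A minimal sketch of the Docker daemon.json log options follows; merge these with any existing settings rather than overwriting the file, and note that a Docker restart is required and the options apply only to containers created afterwards:
cat <<'EOF' > /etc/docker/daemon.json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "50m",
    "max-file": "4"
  }
}
EOF
systemctl restart docker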
The below can be used as an example for RKE2 and k3s in the config.yaml file; alternatively, these can also be set directly when installing a standalone cluster using the install script:
kubelet-arg:
- container-log-max-files=4
- container-log-max-size=50Mi
OS
Rotation of log files on nodes is also important, especially if a long node lifecycle is expected.
3.6 DNS scalability
DNS is a critical service running within the cluster. As CoreDNS pods are scheduled throughout the cluster, the availability of the service depends on the accessibility of all CoreDNS pods backing it.
For this reason, Nodelocal DNS cache is recommended for clusters that may service a high amount of DNS requests, or clusters very sensitive to DNS issues.
Checks
If a cluster has experienced a DNS issue, or is known to handle a high amount of DNS queries:
- Check the output of conntrack -S on related nodes.
A high amount of the insert_failed counter can be indicative of a conntrack race condition; deploying Nodelocal DNS cache is recommended to mitigate this.
Status
Top Issue
Additional Information
For further information, see the documentation for Best Practices.
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.