Unintended terraform configuration drift caused by machine pool ordering

Article Number: 000022130

Environment

Terraform-provisioned RKE2 downstream cluster.

Situation

When adding a new machine pool to an existing Rancher2 RKE2 cluster, Terraform may plan to modify existing machine pools that were provisioned in previous runs, causing unintended updates to control plane and worker nodes.

Cause

Terraform maps are unordered collections. When the map of machine pools is converted into the provider's machine_pools list (a TypeList), the entries are ordered by key in lexicographical (byte-wise) order, so uppercase letters sort before lowercase letters. Machine pools are then matched by their position (index) in that list, not by name. Adding a new pool can change the sort order and shift the indexes of existing pools. Terraform then associates existing list positions with different pool configurations and plans in-place updates to the wrong pools.
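The mechanism can be modeled in a few lines of Python. This is a simplified sketch, not the provider's code: index_based_plan is a hypothetical helper, pools are plain dicts, and the pool names alpha, bravo and charlie are made up. Each map is sorted by key and the two lists are compared position by position, as described above.

```python
def index_based_plan(old_pools: dict, new_pools: dict) -> list:
    """Model a list diff: sort each map by key, then compare entries position by position."""
    old_list = [(key, old_pools[key]) for key in sorted(old_pools)]
    new_list = [(key, new_pools[key]) for key in sorted(new_pools)]
    actions = []
    for i in range(max(len(old_list), len(new_list))):
        if i >= len(old_list):
            # Position exists only in the new list.
            actions.append(f"create [{i}] {new_list[i][0]}")
        elif i >= len(new_list):
            # Position exists only in the old list.
            actions.append(f"destroy [{i}] {old_list[i][0]}")
        elif old_list[i] != new_list[i]:
            # Same position, different content: treated as an in-place update.
            actions.append(f"modify [{i}] {old_list[i][0]} -> {new_list[i][0]}")
    return actions


# Hypothetical pools: adding "bravo" sorts between "alpha" and "charlie".
old = {"alpha": {"quantity": 1}, "charlie": {"quantity": 2}}
new = {**old, "bravo": {"quantity": 3}}
print(index_based_plan(old, new))
# ['modify [1] charlie -> bravo', 'create [2] charlie']
```

By contrast, adding a key that sorts after every existing key (for example "delta") yields only a single create at index 2, because no existing position changes.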

Impact

Unexpected in-place modifications of existing pools.
Potential disruption of node roles (control/etcd/worker) and pod scheduling.

Illustrative Example

Assume an existing cluster has two machine pools: control and worker.

Initial Terraform run

Map input (unordered):

control -> control_plane_role: true, quantity: 3
worker -> worker_role: true, quantity: 2

Lexicographical sorting produces the ordered list:

[0] control -> creates control pool
[1] worker -> creates worker pool

Adding a new pool, NEW-POOL

Map input (unordered):

control -> control_plane_role: true, quantity: 3
worker -> worker_role: true, quantity: 2
NEW-POOL -> worker_role: true, quantity: 1

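Lexicographical sorting produces a new ordered list (uppercase "N" sorts before lowercase "c" and "w"):

[0] NEW-POOL -> matched to existing index 0 (previously control)
[1] control -> matched to existing index 1 (previously worker)
[2] worker -> no existing index, created as new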
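The reordering can be reproduced with Python's sorted(), which compares plain ASCII strings by code point. That matches the order shown above; treating it as identical to Terraform's ordering for these keys is an assumption. The loop reports which existing pools moved to a different index.

```python
old_keys = sorted(["control", "worker"])
new_keys = sorted(["control", "worker", "NEW-POOL"])

# Uppercase "N" (code point 78) sorts before lowercase "c" (99) and "w" (119).
print(new_keys)  # ['NEW-POOL', 'control', 'worker']

# Report existing pools whose list position changed after the new key was added.
for key in old_keys:
    before, after = old_keys.index(key), new_keys.index(key)
    if before != after:
        print(f"{key}: index {before} -> {after}")
# control: index 0 -> 1
# worker: index 1 -> 2
```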

Result (incorrect):

Terraform sees index mismatches and plans to:
Modify existing pool at index 0 (previously control) to NEW-POOL configuration.
Modify existing pool at index 1 (previously worker) to control configuration.
Create a new worker pool at index 2.

Expected: Create one new NEW-POOL pool without modifying control or worker.
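The expected result corresponds to matching pools by name rather than by position. The hypothetical name_based_plan below models that desired behavior only; it is not how the provider currently diffs machine_pools. Keying the comparison on the map key means adding NEW-POOL produces a single create.

```python
def name_based_plan(old_pools: dict, new_pools: dict) -> list:
    """Diff pools by key instead of by list position (desired behavior, not the provider's)."""
    actions = []
    for key in sorted(new_pools.keys() - old_pools.keys()):
        actions.append(f"create {key}")
    for key in sorted(old_pools.keys() - new_pools.keys()):
        actions.append(f"destroy {key}")
    for key in sorted(old_pools.keys() & new_pools.keys()):
        if old_pools[key] != new_pools[key]:
            actions.append(f"modify {key}")
    return actions


old = {
    "control": {"control_plane_role": True, "quantity": 3},
    "worker": {"worker_role": True, "quantity": 2},
}
new = {**old, "NEW-POOL": {"worker_role": True, "quantity": 1}}
print(name_based_plan(old, new))  # ['create NEW-POOL']
```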

Example Terraform plan snippet (illustrative)

  # rancher2_cluster_v2.cluster_rke2 will be updated in-place
  ~ resource "rancher2_cluster_v2" "cluster_rke2" {
      ~ rke_config {
            # Index [0]: Existing "control" pool → incorrectly changed to "NEW-POOL"
          ~ machine_pools {
              ~ name               = "control" -> "NEW-POOL"
              ~ control_plane_role = true -> false
              ~ etcd_role          = true -> false
              ~ worker_role        = false -> true
              ~ quantity           = 3 -> 1
              ~ machine_labels     = {
                  ~ "nodepool" = "control" -> "worker"
                }
            }
            # Index [1]: Existing "worker" pool → incorrectly changed to "control"
          ~ machine_pools {
              ~ name               = "worker" -> "control"
              ~ control_plane_role = false -> true
              ~ etcd_role          = false -> true
              ~ worker_role        = true -> false
              ~ quantity           = 2 -> 3
              ~ machine_labels     = {
                  ~ "nodepool" = "worker" -> "control"
                }
            }
            # Index [2]: "worker" configuration added as a new pool (expected: only NEW-POOL is added)
          + machine_pools {
              + name               = "worker"
              + control_plane_role = false
              + worker_role        = true
              + quantity           = 2
            }
        }
    }

Plan: 0 to add, 1 to change, 0 to destroy.
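Because the whole cluster shows up as a single "1 to change", this pattern is easy to miss in a long plan. A pre-apply check can flag it. The sketch below is an illustrative example, not a Rancher tool: it reads the JSON form of a saved plan (terraform plan -out=plan.out, then terraform show -json plan.out > plan.json) and flags any rancher2_cluster_v2 machine pool whose name changes at the same index. It assumes the nested rke_config and machine_pools blocks appear as lists under change.before and change.after; verify that against your own plan JSON. The example plan is synthetic and mirrors the snippet above.

```python
import json


def load_plan(path: str) -> dict:
    """Read the JSON written by `terraform show -json <planfile>`."""
    with open(path) as f:
        return json.load(f)


def renamed_pools(plan: dict) -> list:
    """List (address, index, old_name, new_name) for machine pools renamed in place."""
    findings = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "rancher2_cluster_v2":
            continue
        change = rc.get("change", {})
        # Assumption: nested blocks are JSON lists, e.g. rke_config[0].machine_pools[i].
        before = ((change.get("before") or {}).get("rke_config") or [{}])[0]
        after = ((change.get("after") or {}).get("rke_config") or [{}])[0]
        old_pools = before.get("machine_pools") or []
        new_pools = after.get("machine_pools") or []
        # Compare only positions present in both lists; a rename here means a shifted index.
        for i, (old, new) in enumerate(zip(old_pools, new_pools)):
            if old.get("name") != new.get("name"):
                findings.append((rc["address"], i, old.get("name"), new.get("name")))
    return findings


# Synthetic plan mirroring the illustrative snippet above.
example_plan = {
    "resource_changes": [{
        "address": "rancher2_cluster_v2.cluster_rke2",
        "type": "rancher2_cluster_v2",
        "change": {
            "actions": ["update"],
            "before": {"rke_config": [{"machine_pools": [
                {"name": "control"}, {"name": "worker"}]}]},
            "after": {"rke_config": [{"machine_pools": [
                {"name": "NEW-POOL"}, {"name": "control"}, {"name": "worker"}]}]},
        },
    }]
}

for finding in renamed_pools(example_plan):
    print(finding)
```

A non-empty result (for a real plan, renamed_pools(load_plan("plan.json"))) is a signal to stop and fix the map keys before running terraform apply.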

Resolution

Stable ordering of keys is the simplest and most reliable workaround until a provider-level fix is available.

Use ordered keys (prefix each key with a number) in the map so that lexicographical sorting produces a stable list order that matches the order in which the pools were added. Example:

1-control:
  name: control
  control_plane_role: true
  quantity: 3
2-worker:
  name: worker
  worker_role: true
  quantity: 2
3-NEW-POOL:
  name: NEW-POOL
  worker_role: true
  quantity: 1
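Numeric prefixes are still compared as text, not as numbers. The short check below uses Python's sorted(), which compares these ASCII keys character by character, to show that the three keys above sort as intended and that an unpadded tenth pool would not. The key "10-extra" is hypothetical.

```python
keys = ["1-control", "2-worker", "3-NEW-POOL"]
print(sorted(keys))  # ['1-control', '2-worker', '3-NEW-POOL']

# Text comparison: "1" < "2", so a "10-..." key lands between "1-..." and "2-...".
print(sorted(keys + ["10-extra"]))
# ['1-control', '10-extra', '2-worker', '3-NEW-POOL']

# Zero-padded prefixes keep numeric order and lexicographical order aligned.
padded = ["01-control", "02-worker", "03-NEW-POOL", "10-extra"]
print(sorted(padded) == padded)  # True
```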

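When adding pools:
Give the new entry a key that sorts after every existing key (for example, the next unused number), so that existing pools keep their list positions. Do not insert a key that sorts between existing keys. Because prefixes are compared as text, "10-" sorts before "2-"; if you expect ten or more pools, zero-pad the prefixes (01-, 02-, ..., 10-). Review the terraform plan output to confirm that only the new machine pool is added, and test in a non-production environment before applying to production clusters.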
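Before applying, a quick sanity check is to confirm that the new set of keys leaves every existing key at its old index. keeps_existing_positions below is a hypothetical helper, not part of Terraform or the Rancher provider, and it relies on the same text ordering shown earlier.

```python
def keeps_existing_positions(old_keys, new_keys) -> bool:
    """True if every existing key keeps its index after sorting.

    This holds only when all added keys sort after the existing ones.
    """
    old_sorted = sorted(old_keys)
    new_sorted = sorted(new_keys)
    # The existing keys must form the unchanged prefix of the new sorted list.
    return new_sorted[:len(old_sorted)] == old_sorted


existing = ["1-control", "2-worker"]
print(keeps_existing_positions(existing, existing + ["3-NEW-POOL"]))   # True
print(keeps_existing_positions(existing, existing + ["1a-NEW-POOL"]))  # False
```

The second call fails because "1a-NEW-POOL" sorts between "1-control" and "2-worker", which would push the worker pool from index 1 to index 2.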