Azure AD - An error occurred logging in Server error while authenticating
Article Number: 000021599
Environment
- Rancher versions: 2.9.1, 2.9.2, 2.9.3
- Auth provider: Azure AD
Situation
After upgrading to Rancher 2.9.1, 2.9.2, or 2.9.3 with Azure AD as the main auth provider, users become unable to log in after a certain amount of time and receive the following message:
An error occurred logging in Server error while authenticating
This happens because the local Rancher cluster has a secret called `azuread-access-token` in the `cattle-global-data` namespace, and login information is appended to it whenever a user logs in. Over time, the secret grows until it reaches the Kubernetes maximum secret size of 1 MiB (1048576 bytes).
Note: When the secret reaches this limit, users are no longer able to log in to Rancher. To verify its size, run either of the following commands:
kubectl get secret azuread-access-token -n cattle-global-data -o jsonpath='{.data.*}' | base64 -d | wc -c
or
kubectl describe secret azuread-access-token -n cattle-global-data | grep bytes
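To see how close the secret is to the limit, the size check can be wrapped in a small script. This is a minimal sketch, not an official Rancher tool: the function names `usage_percent` and `secret_bytes` are hypothetical, and the `{.data.*}` JSONPath expression assumes the secret holds a single data key.

```shell
#!/bin/sh
# Kubernetes caps a Secret at 1 MiB (1048576 bytes).
LIMIT_BYTES=1048576

# usage_percent BYTES
# Prints the integer percentage of the limit that BYTES occupies.
usage_percent() {
  echo $(( $1 * 100 / LIMIT_BYTES ))
}

# secret_bytes
# Prints the decoded size of the secret's data in bytes.
# Requires kubectl access to the local Rancher cluster.
# Assumption: the secret has a single data key, so '{.data.*}' yields
# exactly one base64 string.
secret_bytes() {
  kubectl get secret azuread-access-token -n cattle-global-data \
    -o jsonpath='{.data.*}' | base64 -d | wc -c
}
```

With cluster access, `usage_percent "$(secret_bytes)"` prints the current usage; a value approaching 100 suggests logins are about to start failing.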
Cause
The `azuread-access-token` secret fills up with user login information until it hits the Kubernetes size limit. The Azure client login was using the access token cache, which led to additional tokens being cached.
Resolution
As a workaround, delete the `azuread-access-token` secret in the `cattle-global-data` namespace, then verify that it is gone. The secret is recreated when a user logs back in to Rancher, and its size should be smaller than before.
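The workaround can be sketched as a short shell function run against the local Rancher cluster. The function name `delete_token_secret` is hypothetical, and the check assumes no user logs in between the delete and the verification (a login would recreate the secret and make the check report it as still present).

```shell
#!/bin/sh
# delete_token_secret
# Deletes the azuread-access-token secret and confirms it is gone.
# Requires kubectl access to the local Rancher cluster.
delete_token_secret() {
  kubectl delete secret azuread-access-token -n cattle-global-data || return 1

  # After a successful delete, `kubectl get` should fail (NotFound).
  if kubectl get secret azuread-access-token -n cattle-global-data >/dev/null 2>&1; then
    echo "secret still present"
    return 1
  fi
  echo "secret deleted"
}
```

After the next Azure AD login, `kubectl describe secret azuread-access-token -n cattle-global-data | grep bytes` should show the recreated secret at a much smaller size.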
In the official patch, we changed the behavior to create a new client for every token authentication, which does not use the cache. This patch will be included in Rancher 2.10 and backported to 2.9, specifically 2.9.4 and later:
- https://github.com/rancher/rancher/issues/47672 (For 2.10)
- https://github.com/rancher/rancher/issues/47688 (2.9 backport)