Fleet Git Repos stuck in "Git Updating" state due to NTP sync issues
This document (000021827) is provided subject to the disclaimer at the end of this document.
Environment
Rancher version: 2.10.x
Situation
Consistent time synchronization between the upstream Rancher cluster and its downstream clusters, ideally through a reliable NTP server, is crucial for stable operation. Time discrepancies can lead to unforeseen issues, including the inability of Fleet to synchronize with deployed resources on downstream clusters.
Resolution
Configuring time synchronization (for example, chrony or another NTP client) on all nodes of the upstream and downstream clusters removed the clock discrepancies and resolved the Git Repo synchronization failures. The Git Repo status then returned to "Ready."
Cause
Fleet Git Repositories in a Continuous Delivery setup repeatedly transition to a "Git Updating" state and fail to synchronize automatically. A manual "Force Update" resolves the issue temporarily, which points to a breakdown in auto-synchronization. Notably, the fleet-controller and fleet-agent logs do not show any explicit error messages.
Observed Error in Rancher UI (After enabling Fleet Debug Log):
Job Failed. failed: 1/1
time="2025-04-30T09:58:25.679769+0000" level=info msg="Using in-cluster namespace" namespace=fleet-default
time="2025-04-30T09:58:25.680059+0000" level=info msg="Using in-cluster configuration"
time="2025-04-30T09:58:25.681524+0000" level=debug msg="Request Body" body="{\"kind\":\"DeleteOptions\",\"apiVersion\":\"fleet.cattle.io/v1alpha1\"}"
time="2025-04-30T09:58:25.681699+0000" level=debug msg="curl -v -XDELETE -H \"Accept: application/json,*/*\" -H \"Content-Type: application/json\" -H \"Authorization: Bearer <masked>\" -H \"User-Agent: fleet/v0.0.0 (linux/amd64) kubernetes/$Format\" https://10.43.0.1:443/api/v1/namespaces/fleet-default/secrets/ocx-xx-xxxx-001-xx-xxx-t-001-xxxxxxxx-web-ui"
time="2025-04-30T09:58:25.682309+0000" level=debug msg="HTTP Trace: Dial to tcp: 10.43.0.1:443 succeed"
time="2025-04-30T09:58:25.686097+0000" level=debug msg="DELETE https://10.43.0.1:443/api/v1/namespaces/fleet-default/secrets/ocx-xx-xxxx-001-xx-xxx-t-001-xxxxxxxx-web-ui 401 Unauthorized in 4 milliseconds"
time="2025-04-30T09:58:25.686143+0000" level=debug msg="HTTP Statistics: DNSLookup 0 ms Dial 0 ms TLSHandshake 2 ms ServerProcessing 1 ms Duration 4 ms"
time="2025-04-30T09:58:25.686149+0000" level=debug msg="Response Headers:"
time="2025-04-30T09:58:25.686156+0000" level=debug msg="Audit-Id: fe86d238-783-4232-a646-8bb165dbfa3"
time="2025-04-30T09:58:25.686161+0000" level=debug msg="Cache-Control: no-cache, private"
time="2025-04-30T09:58:25.686165+0000" level=debug msg="Content-Type: application/json"
time="2025-04-30T09:58:25.686173+0000" level=debug msg="Content-Length: 129"
time="2025-04-30T09:58:25.686182+0000" level=debug msg="Date: Wed, 30 Apr 2025 09:56:58 GMT"
time="2025-04-30T09:58:25.686226+0000" level=debug msg="Response Body" body="{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"Unauthorized\",\"reason\":\"Unauthorized\",\"code\":401}"
time="2025-04-30T09:58:25Z" level=fatal msg=Unauthorized
Observations:
- Bundles, resources, and associated clusters are reported as healthy.
- The debug log indicates a successful initial connection to the Kubernetes API server (10.43.0.1:443).
Error in Gitjob Pod (Triggered by Force Update):
{"level":"error","ts":"2025-03-27T07:22:02Z","logger":"gitops-status","msg":"Reconcile failed update to git repo status","controller":"GitRepoStatus","controllerGroup":"fleet.cattle.io","controllerKind":"GitRepo","GitRepo":{"name":"xxx-xx-test-001","namespace":"fleet-default"},"namespace":"fleet-default","name":"xxx-xx-test-001","reconcileID":"8e6da18a-e838-4be8-8855-692aec99eb89","generation":15,"commit":"2ce6239c21fa08a9d8746dd6c812f45xxxxxxxxx","conditions":[{"type":"Ready","status":"True","lastUpdateTime":"2025-03-27T06:26:03Z"},{"type":"GitPolling","status":"True","lastUpdateTime":"2025-03-25T09:35:15Z"},{"type":"Reconciling","status":"False","lastUpdateTime":"2025-02-21T20:44:49Z"},{"type":"Stalled","status":"True","lastUpdateTime":"2025-03-27T07:20:05Z","reason":"Stalled","message":"Job Failed. failed: 1/1time=\"2025-03-27T07:20:02Z\" level=fatal msg=Unauthorized\n"}]}
kube-apiserver Logs (During Force Update):
E0430 10:07:19.141522 1 authentication.go:73] "Unable to authenticate the request" err="[invalid bearer token, service account token is not valid yet]"
The "Unauthorized" error shown in the Rancher UI for the Git Repo, together with the kube-apiserver message "service account token is not valid yet", strongly suggests a token validity problem, most likely caused by time synchronization issues between the upstream Rancher cluster and the downstream clusters managed by Fleet.
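To illustrate why a clock difference can produce "not valid yet", the following is a minimal Python sketch, not the kube-apiserver's actual validation logic. It assumes the service account token is a JWT whose payload carries `iat`/`nbf`/`exp` claims, builds a purely hypothetical unsigned token for demonstration, and compares its `nbf` claim with the clock of a validator that runs 87 seconds behind the issuer. The helper names (`decode_jwt_claims`, `seconds_until_valid`) and all values are illustrative assumptions.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (inspection only)."""
    payload_b64 = token.split(".")[1]
    # base64url payloads omit padding; restore it before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def seconds_until_valid(claims: dict, now: float) -> float:
    """Positive result: the token's nbf (or iat) is in the future relative to `now`."""
    not_before = claims.get("nbf", claims.get("iat", 0))
    return not_before - now


def _b64url(obj: dict) -> str:
    """Helper used only to build the hypothetical demo token."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical token issued by a node whose clock reads `issuer_now`
issuer_now = 1_746_007_105
token = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"iat": issuer_now, "nbf": issuer_now, "exp": issuer_now + 3600}),
    "",  # empty signature: this demo token is not a real credential
])

# A validator whose clock is 87 seconds behind the issuer sees a "future" token
validator_now = issuer_now - 87
wait = seconds_until_valid(decode_jwt_claims(token), validator_now)
print(f"token becomes valid in {wait} seconds from the validator's point of view")
```

Running the same two helper functions against a real token (for inspection only, never for verification) and the current time on the API server node would show whether its `nbf` claim lies in the future there.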
Verification:
Checking the system time on the nodes of the upstream and downstream clusters confirmed that their clocks were out of sync.
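The debug log above already contains a hint of this discrepancy: the client-side timestamp of the "Date:" log line is 09:58:25, while the Date header returned by the API server reads 09:56:58 GMT. The following minimal Python sketch computes that difference. It assumes the Date header reflects the API server node's clock and the log timestamp reflects the clock of the node running the job; the function name `clock_skew_seconds` is illustrative only, and because HTTP Date headers have one-second resolution the result is approximate.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def clock_skew_seconds(client_log_time: str, server_date_header: str) -> float:
    """Return client time minus server time in seconds (positive: client is ahead)."""
    # Timestamp format used in the job log, e.g. 2025-04-30T09:58:25.686182+0000
    client = datetime.strptime(client_log_time, "%Y-%m-%dT%H:%M:%S.%f%z")
    # HTTP Date header, e.g. "Wed, 30 Apr 2025 09:56:58 GMT"
    server = parsedate_to_datetime(server_date_header)
    return (client - server).total_seconds()


# Values copied from the debug log line that prints the response Date header
skew = clock_skew_seconds(
    "2025-04-30T09:58:25.686182+0000",
    "Wed, 30 Apr 2025 09:56:58 GMT",
)
print(f"client clock is about {skew:.0f} seconds ahead of the API server")
```

For the values in this log, the two clocks differ by roughly 88 seconds, which is consistent with the time synchronization problem described here.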
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.