
How to test Alertmanager in rancher-monitoring

Article Number: 000022097

Environment

A Kubernetes cluster managed by Rancher v2.6+ with rancher-monitoring installed

Procedure

This guide demonstrates how to test Alertmanager and PrometheusRule configuration, to validate that alerts are sent successfully by Alertmanager.

With this objective in mind, and so that the test is self-contained, a webhook receiver is configured in Alertmanager. A webhook-receiver pod is deployed to receive these webhook alert requests and print them to stdout, so that they are visible in the Pod logs for verification. All of these resources are created in the cattle-monitoring-system namespace.

  1. Navigate to a Rancher-managed cluster with rancher-monitoring installed.
  2. Apply the following YAMLs:

    1. ConfigMap:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: webhook-receiver-configmap-script
      namespace: cattle-monitoring-system
    data:
      receiver.sh: |
        #!/bin/bash
        TIMEOUT_SEC=1
    
        echo "Starting nc listener with $TIMEOUT_SEC second timeout..."
    
        while true; do
          REQUEST_FILE="/tmp/request.log"
    
          echo "Waiting for connection..."
    
          # Sends a minimal HTTP 200 response to the client, while nc writes the
          # received request (headers and body) to the request file.
          (
            printf "HTTP/1.1 200 OK\r\nConnection: close\r\nContent-Type: text/plain\r\n\r\nRequest successfully logged.\n"
          ) | nc -l -p 8080 -w $TIMEOUT_SEC > $REQUEST_FILE
    
    
          # Prints the captured request to stdout
          echo -e "\n--- RECEIVED FULL REQUEST ---\n"
          cat $REQUEST_FILE
          echo -e "\n--- END OF REQUEST ---\n"
    
          # Removes the temp file before the next iteration
          rm -f $REQUEST_FILE
          sleep 0.1
        done
    
    2. Pod:

    apiVersion: v1
    kind: Pod
    metadata:
      name: webhook-receiver
      namespace: cattle-monitoring-system
      labels:
        app: webhook-receiver
    spec:
      containers:
      - name: receiver-container
        image: rancherlabs/swiss-army-knife:latest
        command: ["/bin/bash", "/script/receiver.sh"]
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: receiver-script-volume
          mountPath: /script
      volumes:
      - name: receiver-script-volume
        configMap:
          name: webhook-receiver-configmap-script
          defaultMode: 0744
    
    3. Service:

    apiVersion: v1
    kind: Service
    metadata:
      name: webhook-receiver-service
      namespace: cattle-monitoring-system
    spec:
      selector:
        app: webhook-receiver
      ports:
        - protocol: TCP
          port: 80
          targetPort: 8080
      type: ClusterIP
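
For example, assuming the three manifests above have been saved to local files (the file names below are illustrative), they can be applied and the receiver pod checked with kubectl commands along these lines:

    # Apply the ConfigMap, Pod and Service (file names are examples)
    kubectl apply -f webhook-receiver-configmap.yaml
    kubectl apply -f webhook-receiver-pod.yaml
    kubectl apply -f webhook-receiver-service.yaml

    # Confirm the pod is Running, then follow its log
    kubectl -n cattle-monitoring-system get pod webhook-receiver
    kubectl -n cattle-monitoring-system logs -f webhook-receiver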
    
  3. Ensure that the pod is up and tail its log (as sketched above); you should see a couple of lines stating that the netcat listener has started and is waiting for a connection. The Alertmanager alert configured below will be visible in these logs.
  4. Apply the following AlertmanagerConfig to configure Alertmanager to send any alerts with the label "severity=critical" to the webhook-receiver pod (refer to the Alertmanager configuration documentation for details). Note that the URL used is that of the Service created above; a sketch of how to verify the applied configuration follows the YAML:

apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: webhook-receiver-am-config
  namespace: cattle-monitoring-system
spec:
  receivers:
    - name: webhook-receiver-pod
      webhookConfigs:
        - url: http://webhook-receiver-service/
          sendResolved: true
  route:
    receiver: webhook-receiver-pod
    routes:
      - matchers:
          - name: severity
            value: critical
        receiver: webhook-receiver-pod
        continue: false
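
To confirm that the AlertmanagerConfig was created and merged into the running configuration, a check along the following lines can be used. The pod and container names assume the default rancher-monitoring Alertmanager StatefulSet, and amtool is assumed to be present in the Alertmanager image:

    # Confirm the AlertmanagerConfig resource exists
    kubectl -n cattle-monitoring-system get alertmanagerconfig webhook-receiver-am-config

    # Print the routing tree of the running Alertmanager; a route for
    # cattle-monitoring-system/webhook-receiver-am-config should be listed
    # (pod and container names are assumptions based on the default install)
    kubectl -n cattle-monitoring-system exec alertmanager-rancher-monitoring-alertmanager-0 \
      -c alertmanager -- amtool config routes --alertmanager.url=http://localhost:9093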
  5. Create a PrometheusRule with an alert expression. This example uses vector(1) as the expression, so that its value will always be "1" and the alert will be triggered continuously (an optional Prometheus-side check is sketched after the rule):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: test-rule
  namespace: cattle-monitoring-system
spec:
  groups:
    - name: test-rule
      rules:
        - alert: test-alert
          expr: vector(1)
          for: 0s
          labels:
            namespace: cattle-monitoring-system
            severity: critical
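
Before waiting on Alertmanager, the rule can optionally be verified on the Prometheus side. This sketch assumes the default rancher-monitoring-prometheus Service on port 9090, the same service and port that appear in the generatorURL of the example payload further below:

    # Confirm the PrometheusRule resource exists
    kubectl -n cattle-monitoring-system get prometheusrule test-rule

    # Port-forward to the Prometheus service in one terminal...
    kubectl -n cattle-monitoring-system port-forward svc/rancher-monitoring-prometheus 9090:9090

    # ...and in another terminal query the built-in ALERTS metric; a series with
    # alertstate="firing" confirms the alert is active on the Prometheus side
    curl -G 'http://localhost:9090/api/v1/query' --data-urlencode 'query=ALERTS{alertname="test-alert"}'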
  6. Wait for the alert to appear in the Alertmanager Alerts UI.
  7. Check the log of the webhook-receiver pod and observe that the test-alert alert is received, similar to the following:

Starting nc listener with 1 second timeout...
Waiting for connection...

--- RECEIVED FULL REQUEST ---

POST / HTTP/1.1
Host: webhook-receiver-service
User-Agent: Alertmanager/0.28.1
Content-Length: 1214
Content-Type: application/json

{"receiver":"cattle-monitoring-system/webhook-receiver-am-config/webhook-receiver-pod","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"test-alert","namespace":"cattle-monitoring-system","prometheus":"cattle-monitoring-system/rancher-monitoring-prometheus","severity":"critical"},"annotations":{},"startsAt":"2025-10-14T09:04:11.437Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"https://142.93.230.60.nip.io/k8s/clusters/c-m-d2xdbdjr/api/v1/namespaces/cattle-monitoring-system/services/http:rancher-monitoring-prometheus:9090/proxy/graph?g0.expr=vector%281%29\u0026g0.tab=1","fingerprint":"163a7e819a18ef74"}],"groupLabels":{"namespace":"cattle-monitoring-system"},"commonLabels":{"alertname":"test-alert","namespace":"cattle-monitoring-system","prometheus":"cattle-monitoring-system/rancher-monitoring-prometheus","severity":"critical"},"commonAnnotations":{},"externalURL":"https://142.93.230.60.nip.io/k8s/clusters/c-m-d2xdbdjr/api/v1/namespaces/cattle-monitoring-system/services/http:rancher-monitoring-alertmanager:9093/proxy","version":"4","groupKey":"{}/{namespace=\"cattle-monitoring-system\"}/{severity=\"critical\"}:{namespace=\"cattle-monitoring-system\"}","truncatedAlerts":0}

--- END OF REQUEST ---
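
Once testing is complete, the resources created for this test can be removed:

    kubectl -n cattle-monitoring-system delete prometheusrule test-rule
    kubectl -n cattle-monitoring-system delete alertmanagerconfig webhook-receiver-am-config
    kubectl -n cattle-monitoring-system delete service webhook-receiver-service
    kubectl -n cattle-monitoring-system delete pod webhook-receiver
    kubectl -n cattle-monitoring-system delete configmap webhook-receiver-configmap-script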

Following this method, it is possible to test Alertmanager and PrometheusRule configurations without needing a third-party application or configuring an external receiver, and to confirm whether alerts arrive as expected or are not being sent at all. If you are struggling to get an AlertmanagerConfig applied correctly, check the rancher-monitoring-operator pod logs to confirm that the syntax was accepted, check the Alertmanager pod logs for delivery errors, and evaluate the PrometheusRule expression in the Prometheus Query UI to confirm whether the alert should currently be firing.
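
As a sketch of those log checks (workload names assume the default rancher-monitoring installation and may differ if the chart was customized):

    # Operator logs: confirm the AlertmanagerConfig syntax was accepted
    kubectl -n cattle-monitoring-system logs deploy/rancher-monitoring-operator --tail=100

    # Alertmanager logs: look for notification or delivery errors
    kubectl -n cattle-monitoring-system logs alertmanager-rancher-monitoring-alertmanager-0 -c alertmanager --tail=100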