Troubleshooting high ingress request times
This document (000020014) is provided subject to the disclaimer at the end of this document.
Environment
- An RKE or RKE2 cluster
- Use of the Nginx Ingress Controller
Situation
This article aims to provide steps to gather and analyse data to help troubleshoot ingress performance issues.
Resolution
Review request times
To narrow down which requests are taking the longest, analysing the ingress-nginx logs is very helpful.
Retrieve requests with a high (>2s) upstream_response_time. This log field represents the time taken for the response from the upstream target - the pod endpoint in the service.
kubectl logs -n ingress-nginx -l app=ingress-nginx -f --tail=2000 | awk '/- -/ && $(NF-2)>2.0'
The same can be done for request_time, which represents the time taken to complete the entire request, including the above upstream_response_time.
kubectl logs -n ingress-nginx -l app=ingress-nginx -f --tail=2000 | awk '/- -/ && $(NF-7)>2.0'
Please adjust the time to suit, where >2.0 will filter for any times greater than 2.0 seconds.
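To view both timings side by side for slow requests, the two fields can also be printed together. This is only a sketch assuming the default ingress-nginx access log format, with request_time at $(NF-7) and upstream_response_time at $(NF-2) as in the commands above, upstream_addr at $(NF-4), and the request URI in field $7:
kubectl logs -n ingress-nginx -l app=ingress-nginx --tail=2000 | awk '/- -/ && $(NF-7)>2.0 {print "request_time="$(NF-7), "upstream_response_time="$(NF-2), "upstream="$(NF-4), $7}'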
Comparing the difference in timings between request_time and upstream_response_time can help to understand the issue further:
- Locate any potential upstream targets (pods), or the nodes these may be running on, that are frequently associated with a higher upstream_response_time
- If all upstream targets in a particular ingress/service are experiencing higher response times:
  - What dependencies does the application have? For example, external APIs, databases, other services, etc.
  - Investigate the application logs
  - Simulate the same requests directly to pods to bypass ingress-nginx, are they also slow? (see the example after this list)
- If the upstream_response_time is much lower than request_time, the time is being spent elsewhere; check for any tuning, performance or resource issues on the nodes
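For example, to map an upstream address from the logs back to a pod and node, and to time a request made directly against that pod, commands like the following can be used. This is only a sketch - the pod IP (10.42.1.23), namespace (my-namespace), pod name (my-app-pod) and port (8080) are placeholders to adjust for your environment:
# Find which pod (and node) an upstream_addr from the logs belongs to
kubectl get pods -A -o wide | grep 10.42.1.23
# Port-forward that pod and time a request made directly to it, bypassing ingress-nginx
kubectl -n my-namespace port-forward pod/my-app-pod 8080:8080 &
curl -o /dev/null -s -w 'connect: %{time_connect}s  start_transfer: %{time_starttransfer}s  total: %{time_total}s\n' http://127.0.0.1:8080/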
Note: The request_time metric is also used to create the ingress controller graphs when Cluster Monitoring is enabled.
Review request details
Along with the output in the previous step, it is also useful to analyse the request details, such as the request itself, source/destination IP address, response code, user agent, and the unique name for the ingress for common patterns.
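As a starting point for spotting patterns, slow requests can be grouped by the request URI. This is a sketch assuming the default access log format, where the URI is field $7 and request_time is at $(NF-7); it lists the most frequent URIs among requests slower than 2 seconds:
kubectl logs -n ingress-nginx -l app=ingress-nginx --tail=2000 | awk '/- -/ && $(NF-7)>2.0 {print $7}' | sort | uniq -c | sort -rn | head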
You may need to review these with the related application teams. For example, a request to retrieve a large amount of data or perform a complex query may genuinely take a long time; these can potentially be ignored.
Some requests may be opening a websocket, and if the service scales up/down regularly, a small number of upstream targets could hold long-running connections, causing an unfair distribution of load on these targets.
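To check whether requests are distributed unevenly, the number of logged requests per upstream address can be counted. Again, this is a sketch assuming the default access log format, with upstream_addr at $(NF-4):
kubectl logs -n ingress-nginx -l app=ingress-nginx --tail=2000 | awk '/- -/ {print $(NF-4)}' | sort | uniq -c | sort -rn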
It's also worthwhile to consider the time when the issue occurs, the number of pods in the service, performance metrics, and requests/limits in place. For example, do the requests occur during a peak load time? Is HPA configured to scale the deployment? Is monitoring data available to identify trends and correlate with the logs?
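A few quick checks can help answer these questions; my-namespace and my-service below are placeholders for the namespace and service behind the affected ingress, and the last command requires metrics-server to be available:
kubectl -n my-namespace get endpoints my-service
kubectl -n my-namespace get hpa
kubectl -n my-namespace top pods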
Check ingress-nginx logs
The previous steps focused on the requests themselves; it is also useful to exclude the access logs and confirm there are no fundamental issues with ingress-nginx itself.
The following command should exclude all access.log output, retrieving output from the ingress controller and the nginx error.log only.
kubectl logs -n ingress-nginx -l app=ingress-nginx -f --tail=100 | awk '!/- -/'
Please adjust the --tail flag as needed; this example retrieves the last 100 lines from each ingress-nginx pod.
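To narrow this down further to higher-severity entries, the standard nginx error log levels can be filtered for, for example:
kubectl logs -n ingress-nginx -l app=ingress-nginx --tail=1000 | grep -E '\[(warn|error|crit|alert|emerg)\]'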
Real-time view of all requests
Another option to get a broader overview is using a tool like goaccess. After installing the package, the following command feeds the ingress-nginx logs to goaccess for a real-time view.
kubectl logs -f -n ingress-nginx -l app=ingress-nginx --tail=2000 | goaccess --log-format="%h - - [%d:%t] \"%m %r %H\" %s %b \"%R\" \"%u\" %^ %T [%v]" --time-format '%H:%M:%S %z' --date-format "%d/%b/%Y"
Please adjust the history of logs with the --tail flag.
Measure requests to ingress-nginx
If you have ruled out the areas covered so far, it might be worthwhile to focus on the Load Balancer or network devices that provide client access to ingress-nginx.
The following articles contain curl commands to perform SNI-compliant requests and measure statistics. These requests can also be compared against the ingress-nginx logs (as above) to understand what portion of the time was spent with ingress-nginx handling the request.
You may also be able to obtain metrics from your Load Balancer or infrastructure to troubleshoot this further.
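As a minimal sketch (app.example.com is a placeholder for a hostname served by your ingress), the following curl command reports timing statistics for a request made through the Load Balancer; the total time can then be compared with the request_time logged by ingress-nginx for the same request:
curl -o /dev/null -s -w 'dns: %{time_namelookup}s  connect: %{time_connect}s  tls: %{time_appconnect}s  first_byte: %{time_starttransfer}s  total: %{time_total}s\n' https://app.example.com/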
Additional Information
- How to troubleshoot HTTP request performance with curl statistics
- How to troubleshoot SNI enabled endpoints with curl and openssl
Disclaimer
This Support Knowledgebase provides a valuable tool for SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.