How Do I Troubleshoot an Unhealthy Backend Server?

Symptom

If a client cannot access a backend server through a load balancer, the backend server is declared unhealthy. You can view the health check results for a backend server on the ELB console.

Background

To check the health of backend servers, dedicated load balancers use the IP addresses from the VPC where they work to send heartbeat requests to backend servers, while shared load balancers use IP addresses in 100.125.0.0/16.

Dedicated load balancers: For health checks to work, allow traffic from the VPC where the load balancer works to the backend servers.

Shared load balancers: For health checks to work, allow traffic from 100.125.0.0/16 to the backend servers.

  • Shared load balancers: If Obtain Client IP Address is enabled for a TCP or UDP listener, there is no need to configure security group rules and firewall rules to allow traffic from 100.125.0.0/16 and client IP addresses to backend servers.
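To make the two source ranges concrete, the following minimal sketch classifies a source address seen on a backend server. Only 100.125.0.0/16 comes from the text above; the function name and the example VPC CIDR block 192.168.0.0/16 are assumptions for illustration.

```python
import ipaddress

# Range used by shared load balancers for health checks (from the text above).
SHARED_ELB_RANGE = ipaddress.ip_network("100.125.0.0/16")


def classify_source(source_ip, vpc_cidr):
    """Return which kind of load balancer a source address could belong to.

    vpc_cidr is the CIDR block of the VPC where a dedicated load balancer
    works. The caller supplies it; the value used below is hypothetical.
    """
    ip = ipaddress.ip_address(source_ip)
    if ip in SHARED_ELB_RANGE:
        return "shared"
    if ip in ipaddress.ip_network(vpc_cidr):
        return "dedicated"
    return "other"


# Example with an assumed VPC CIDR block of 192.168.0.0/16.
print(classify_source("100.125.3.7", "192.168.0.0/16"))
print(classify_source("192.168.1.20", "192.168.0.0/16"))
```

A source address classified as "other" is neither in the shared range nor in the assumed VPC, so it would not be a health check source under these assumptions.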

If a backend server is considered unhealthy, ELB will not route traffic to it until it is declared healthy again.

If you change the weight of a healthy backend server to 0, the health check result of this server becomes Unhealthy.

  • When a backend server is detected as unhealthy, the load balancer will stop routing requests to this server.
  • If health checks are disabled, the load balancer will consider the backend server healthy by default and still route requests to it.
  • If Obtain Client IP Address is enabled for TCP and UDP listeners of both dedicated and shared load balancers, client IP addresses instead of IP addresses in 100.125.0.0/16 are used to communicate with the backend server.
  • ELB uses IP addresses in 100.125.0.0/16 to perform health checks and route requests to backend servers.
  • Traffic will not be routed to a backend server with a weight of 0, so the health check result for this backend server is not relevant.
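The routing rules in the list above can be summarized as a small decision function. This is only a sketch of the documented behavior, not ELB's implementation; the function and parameter names are hypothetical.

```python
def receives_traffic(health_check_enabled, healthy, weight):
    """Return True if the load balancer would route requests to the server."""
    if weight == 0:
        # Weight 0: no traffic is routed, whatever the health check result.
        return False
    if not health_check_enabled:
        # Health checks disabled: the server is considered healthy by default.
        return True
    # Health checks enabled: only healthy servers receive requests.
    return healthy


print(receives_traffic(health_check_enabled=True, healthy=False, weight=1))
print(receives_traffic(health_check_enabled=False, healthy=False, weight=1))
```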

Troubleshooting

Possible causes are listed here in descending order of probability.

Check these causes one by one until the cause of the fault is determined.

After you change the health check configuration, the changes take some time to take effect. The required time depends on the health check interval and timeout duration. You can find the health check results in the backend server list of the load balancer.

Figure 1 Troubleshooting process
Table 1 Troubleshooting process

Possible Cause                          Solution
--------------------------------------  ------------------------------------------------------------------------------
Backend server group                    Checking Whether the Backend Server Group Is Associated with a Listener
EIP or private IP address               Checking Whether an EIP or a Private IP Address Is Bound to the Load Balancer
Health check configuration              Checking the Health Check Configuration
Security group rules                    Checking Security Group Rules
Network ACL rules                       Checking Firewall Rules
Backend server listening configuration  Checking the Backend Server
Backend server firewall configuration   Checking the Firewall on the Backend Server
Backend server route configuration      Checking the Backend Server Route
Backend server load                     Checking the Backend Server Load
Backend server hosts.deny file          Checking the hosts.deny File

Checking Whether the Backend Server Group Is Associated with a Listener

Check whether the backend server group that the unhealthy backend server belongs to is associated with a listener.

Checking Whether an EIP or a Private IP Address Is Bound to the Load Balancer

  • Check this only when you add a TCP or UDP listener to the load balancer.
  • If you add an HTTP or HTTPS listener to the load balancer, health checks are not affected, regardless of whether an EIP or private IP address is bound to the load balancer.

If you add a TCP or UDP listener to the load balancer, check whether the load balancer has an EIP or private IP address bound.

If the load balancer has no EIP or private IP address bound, bind one.

When you create a load balancer for the first time, if no EIP or private IP address is bound to the load balancer, the health check result of backend servers associated with a TCP or UDP listener is Unhealthy. After you bind an EIP or private IP address to the load balancer, the health check result becomes Healthy. If you unbind the EIP or private IP address from the load balancer, the health check result is still Healthy.

Checking the Health Check Configuration

For dedicated and shared load balancers, click the name of the load balancer to view its details. Click Backend Server Groups and then click the name of the server group. On the Basic Information page, to the right of Health Check, click Configure. Check the following parameters:

  • Protocol: The protocol used for health checks.
  • Port: The port must be the one used on the backend server, and it cannot be changed. Check whether the health check port is in the listening state on the backend server. If it is not, the backend server will be identified as unhealthy.
  • Check Path: If HTTP is used for health checks, you must check this parameter. A simple static HTML file is recommended.
  • If the health check protocol is HTTP, the port and the path are used for health checks.
  • If the health check protocol is TCP, only the port is used for health checks.
  • If the health check protocol is HTTP and the health check port is normal but the backend server is still reported as unhealthy, change the path or change the health check protocol to TCP.
  • Enter an absolute path.

    For example:

    If the URL is http://www.example.com or http://192.168.63.187:9096, the health check path is /.

    If the URL is http://www.example.com/chat/try/, the health check path is /chat/try/.

    If the URL is http://192.168.63.187:9096/chat/index.html, the health check path is /chat/index.html.
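The mapping from URL to health check path in the examples above can be reproduced with the Python standard library. This is a minimal sketch; the function name is hypothetical, and the URLs are the ones from the examples.

```python
from urllib.parse import urlsplit


def health_check_path(url):
    """Return the absolute path part of a URL, or "/" if the URL has none."""
    path = urlsplit(url).path
    return path if path else "/"


for url in (
    "http://www.example.com",
    "http://192.168.63.187:9096",
    "http://www.example.com/chat/try/",
    "http://192.168.63.187:9096/chat/index.html",
):
    print(url, "->", health_check_path(url))
```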

Checking Security Group Rules

  • Access to the backend server from IP addresses in 100.125.0.0/16 must be allowed. This is because the load balancer communicates with backend servers using IP addresses from 100.125.0.0/16. After traffic is routed to backend servers, source IP addresses are converted to IP addresses from 100.125.0.0/16. In addition, the load balancer uses these IP addresses to send heartbeat requests to backend servers to check their health.
  • If you are not sure about the security group rules, change the protocol and port range to All for testing.
  • For UDP listeners, see How Does ELB Perform UDP Health Checks? What Are the Precautions for UDP Health Checks?
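If you have exported the inbound rules of the security group, a check like the following can tell whether any single rule admits the whole 100.125.0.0/16 range on the health check port. The rule format (a dict with "cidr", "protocol", "port_min", and "port_max" keys) is an assumption made for this sketch and is not an ELB or VPC API format.

```python
import ipaddress

HEALTH_CHECK_RANGE = ipaddress.ip_network("100.125.0.0/16")


def allows_health_checks(inbound_rules, protocol, port):
    """Return True if one inbound allow rule covers the range on the port.

    Each rule is a dict with hypothetical keys: "cidr", "protocol"
    ("tcp", "udp", or "all"), "port_min", and "port_max".
    """
    for rule in inbound_rules:
        source = ipaddress.ip_network(rule["cidr"], strict=False)
        if source.version != HEALTH_CHECK_RANGE.version:
            continue  # skip IPv6 rules; the health check range is IPv4
        covers_range = HEALTH_CHECK_RANGE.subnet_of(source)
        covers_protocol = rule["protocol"] in ("all", protocol)
        covers_port = rule["port_min"] <= port <= rule["port_max"]
        if covers_range and covers_protocol and covers_port:
            return True
    return False


rules = [{"cidr": "100.125.0.0/16", "protocol": "tcp", "port_min": 80, "port_max": 80}]
print(allows_health_checks(rules, "tcp", 80))
print(allows_health_checks(rules, "tcp", 8080))
```

The sketch only looks for one rule that covers the entire range. Several narrower rules that cover it jointly would be reported as not allowing health checks.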

Checking Firewall Rules

Checking the Backend Server

If the backend server runs a Windows OS, use a browser to access https://{Backend server IP address}:{Health check port}. If a 2xx or 3xx code is returned, the backend server is running normally.
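The same check can be scripted instead of using a browser. The sketch below assumes the backend serves plain HTTP on the health check port; for HTTPS, `http.client.HTTPSConnection` would be the counterpart. The function names, the default path, and the timeout are assumptions.

```python
import http.client


def is_healthy_status(status_code):
    """2xx and 3xx responses indicate that the backend is running normally."""
    return 200 <= status_code < 400


def probe_status(host, port, path="/", timeout=5.0):
    """Send one GET request and return the status code, or None on failure."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        # http.client does not follow redirects, so 3xx codes are visible.
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed response.
        return None
    finally:
        conn.close()
```

For example, `probe_status("192.168.63.187", 9096, "/chat/index.html")` would return the status code for the address used in the health check path examples, and `is_healthy_status` then interprets it.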

Checking the Firewall on the Backend Server

If a firewall or other security software is enabled on the backend server, the software may block the IP addresses in the VPC CIDR block or 100.125.0.0/16.

For dedicated load balancers, configure inbound firewall rules to allow traffic from the VPC where the load balancers work to backend servers.

For shared load balancers, configure inbound firewall rules to allow traffic from 100.125.0.0/16 to backend servers.

Checking the Backend Server Route

Check whether the default route configured for the primary NIC has been manually modified. If the default route is changed, health check packets may fail to reach the backend server.

Run the following command on the backend server to check whether the default route points to the gateway. For Layer 3 communications, the default route must point to the gateway of the VPC subnet where the backend server resides.

ip route

Alternatively, run the following command:

route -n

Figure 10 shows the command output when the backend server route is normal.

Figure 10 Example default route pointing to the gateway
Figure 11 Example default route not pointing to the gateway

If the command output does not contain a default route, or the default route does not point to the gateway, configure or modify the default route to point to the gateway.
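The check on the `ip route` output can also be automated. The following sketch extracts the gateway of the first default route from text in the usual `default via <gateway> dev <interface>` form. The sample output and the gateway address 192.168.0.1 are hypothetical, not taken from the figures.

```python
def default_gateway(ip_route_output):
    """Return the gateway of the first default route, or None if absent."""
    for line in ip_route_output.splitlines():
        fields = line.split()
        if fields[:1] == ["default"] and "via" in fields:
            idx = fields.index("via")
            if idx + 1 < len(fields):
                return fields[idx + 1]
    return None


def default_route_ok(ip_route_output, expected_gateway):
    """Return True if the default route points to the expected gateway."""
    return default_gateway(ip_route_output) == expected_gateway


# Hypothetical output of "ip route" on a backend server.
SAMPLE = """default via 192.168.0.1 dev eth0 proto dhcp metric 100
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.15 metric 100
"""

print(default_gateway(SAMPLE))
print(default_route_ok(SAMPLE, "192.168.0.1"))
```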

Checking the Backend Server Load

View the vCPU usage, memory usage, and network connections of the backend server on the Cloud Eye console to check whether the backend server is overloaded.

If the load is high, connections or requests for health checks may time out.

Checking the hosts.deny File

Verify that IP addresses from the VPC where the load balancers work and from 100.125.0.0/16 are not written to the /etc/hosts.deny file on the backend server.

For dedicated load balancers, verify that the IP addresses from the VPC where the load balancers work are not written into the file.

For shared load balancers, verify that IP addresses from 100.125.0.0/16 are not written into the file.
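A script can help spot such entries. The sketch below uses a simplified reading of the `daemon_list : client_list` syntax: it handles `ALL`, IPv4 prefixes such as `100.125.`, and net/mask or CIDR patterns, and it ignores host names, `EXCEPT` clauses, and IPv6. The function names and the sample file content are hypothetical.

```python
import ipaddress

SHARED_ELB_RANGE = ipaddress.ip_network("100.125.0.0/16")


def pattern_to_network(pattern):
    """Convert a client pattern to a network, or None if it is not IP-based."""
    if pattern.endswith("."):
        # Prefix form such as "100.125." matches every address starting with it.
        octets = pattern.rstrip(".").split(".")
        padded = octets + ["0"] * (4 - len(octets))
        pattern = ".".join(padded) + "/" + str(8 * len(octets))
    try:
        return ipaddress.ip_network(pattern, strict=False)
    except ValueError:
        return None


def blocking_lines(hosts_deny_text, network=SHARED_ELB_RANGE):
    """Return the lines whose client list overlaps the given IPv4 network."""
    hits = []
    for raw in hosts_deny_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        clients = line.split(":")[1].replace(",", " ").split()
        for pattern in clients:
            net = pattern_to_network(pattern)
            overlaps = (
                net is not None
                and net.version == network.version
                and net.overlaps(network)
            )
            if pattern.upper() == "ALL" or overlaps:
                hits.append(line)
                break
    return hits


# Hypothetical file content for illustration.
SAMPLE = """# hypothetical hosts.deny content
sshd: 10.0.0.5
ALL: 100.125.
"""

print(blocking_lines(SAMPLE))
```

For dedicated load balancers, pass the CIDR block of the VPC as the `network` argument, for example `blocking_lines(text, ipaddress.ip_network("192.168.0.0/16"))` with an assumed VPC CIDR block.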