Testing / troubleshooting the Zeus Load Balancer install
How can I check that I've set up the Load Balancer correctly?
Setting up the Load Balancer is fairly involved. As a result, there are several aspects of the install that can be tested. Check through the following areas in turn, to ensure that your setup is correctly installed.
Configuration files
A quick way to check that you have copied all the commkeys and loadbalancer config files across to all the machines in the cluster is to go to the clustering screen in the Admin Server, and click on the 'Summary' button. This brings up a page summarising any problems found with the cluster setup.
This page performs several tests on your load balancer configuration. It ensures that the Admin Server can communicate with all the machines (frontends and backends) in the cluster - you'll see error messages if any machine has the wrong commkey, for instance.
It also checks that the configuration of each machine is consistent with the configuration specified by the Admin Server. For instance, if you add a new backend machine in the Admin Server, but forget to copy across the updated config files to each frontend, then this will be detected.
If you are not using Zeus webservers as your backend machines, the summary page will be unable to perform many of its tests. It can still check that your load balancer(s) are configured correctly, however. Also, it is important to read the instructions for using non-Zeus servers carefully.
Checking webpage delivery
For this, you'll need to set up a running virtual server, so that there are webpages to be accessed on your cluster. From the Admin Server, click on the 'New server' button and create a simple virtual server. Use the following settings:
- Server Name: test (can be anything)
- Server address: enter the correct hostname of the publicly visible frontend (e.g. www.mysite.com)
- Document Root: / (or any existing readable directory)
Click the create server button, then click on the red traffic light to start the server running. Errors in your load balancer install will cause these stages to fail, as the Admin Server tries to send the configuration of this new site to all the frontends and backends. If the summary page did not find any problems with your install, then this stage should work fine.
If you are not using the Zeus webserver for your backends, at this point you must configure your own webservers, so that they are ready to deliver webpages. The virtual server that you have just configured will by default run on port 80. By default, the Zeus Load Balancer will contact the backend server on port 10080, i.e. an offset of 10000. So, either configure your backend webservers to listen on port 10080, or change the Load Balancer configuration.
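The default port mapping described above can be sketched in a few lines of Python. This is only an illustration and not part of the Zeus tools: the helper names backend_port and is_listening are hypothetical, and the sketch assumes the default offset of 10000 has not been changed in the Load Balancer configuration.

```python
import socket

# Default offset between the public port and the backend port, as described
# above. Assumption: the offset has not been changed in the balancer config.
DEFAULT_PORT_OFFSET = 10000


def backend_port(public_port, offset=DEFAULT_PORT_OFFSET):
    """Return the port the balancer would contact on a backend (hypothetical helper)."""
    port = public_port + offset
    if not 0 < port <= 65535:
        raise ValueError(f"port {port} is outside the valid TCP range")
    return port


def is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For a virtual server on port 80, backend_port(80) gives 10080. Calling is_listening with the hostname of one of your backends and that port then reports whether anything is accepting connections there.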
Now, start a web browser and try to access your newly-created site. You should be able to access a directory listing of the directory that you specified. Common errors:
Connection refused
This is most likely caused by errors in your DNS setup. Check that the server address given really does point to your frontend box(es).
Site down!
This error page is generated by the Load Balancer when it cannot communicate with any backend machines. The balancer itself is running OK, but your backend webserver(s) are misconfigured or not running.
If you are using non-Zeus backends, then the most likely problem is that the Load Balancer is attempting to contact the backend webservers on a port that the backends are not listening on. Confirm that your backend webservers are running by contacting them directly. Check that the balancer is contacting the correct port (see above).
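For the 'Connection refused' case above, one way to check the DNS setup is to compare what the server address resolves to against the addresses of your frontends. The sketch below is a hedged example only: dns_problems and resolve_ipv4 are hypothetical helpers, the addresses used with them are placeholders, and it assumes every frontend's public address should appear in the DNS entry.

```python
import socket


def resolve_ipv4(hostname, resolver=socket.gethostbyname_ex):
    """Return the sorted IPv4 addresses that hostname resolves to ([] on failure)."""
    try:
        _name, _aliases, addresses = resolver(hostname)
    except OSError:
        # socket.gaierror (lookup failure) is a subclass of OSError.
        return []
    return sorted(addresses)


def dns_problems(hostname, frontend_ips, resolver=socket.gethostbyname_ex):
    """Compare DNS results for hostname with the expected frontend addresses."""
    resolved = set(resolve_ipv4(hostname, resolver))
    expected = set(frontend_ips)
    if not resolved:
        return [f"{hostname} does not resolve"]
    problems = []
    for ip in sorted(resolved - expected):
        problems.append(f"{hostname} resolves to {ip}, which is not a listed frontend")
    for ip in sorted(expected - resolved):
        problems.append(f"frontend {ip} is missing from the DNS entry for {hostname}")
    return problems
```

An empty list means the server address points only at the frontends you listed. Any message returned is a starting point for checking the DNS entry, not a definitive diagnosis.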
If you can view the webpages, then your site is running correctly!
Checking automatic fail-over: backends
For this, you'll need at least two backend webservers configured; otherwise there won't be any machines for the Load Balancer to fall back on.
The simple way to test this is to just pull out the network cable on one of the backend machines. Make sure you pull out the correct one! If this isn't viable, then shut down the webserver on the backend machine manually.
Go to the clustering page on the Admin Server. The machine that you unplugged should be shown in red - the Admin Server has tried to contact this machine and failed.
Try to access the website that you created previously. Reload the page several times, or browse several files - this is to make sure at least one request will be sent to the unplugged backend. You should never get a 'connection refused' message.
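Reloading the page by hand can be replaced by a small script that requests the site repeatedly and counts how often no connection could be made. This is a minimal sketch using only the Python standard library; fetch_status and probe are hypothetical names, and the number of attempts is an arbitrary choice.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if no connection was made."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # refused, timed out, DNS failure and similar


def probe(url, attempts=20, fetch=fetch_status):
    """Request url repeatedly and summarise the outcomes."""
    results = [fetch(url) for _ in range(attempts)]
    return {
        "attempts": attempts,
        "connection_failures": sum(1 for status in results if status is None),
        "statuses": sorted({status for status in results if status is not None}),
    }
```

Calling probe with the address of your test site should report zero connection failures while at least one backend is still reachable. The status codes are listed so that any error pages stand out.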
Next, click on a frontend machine in the network display. This takes you to the 'Traffic Distribution' page, which shows how this particular balancer will share out traffic at this moment in time.
Each backend machine is listed, together with its response time and the percentage load. (Click on the manual icon on the page for more information.) The unplugged machine should have zero percentage load and be marked red.
Note: The Load Balancer only contacts backend machines when it receives page requests. Consequently, if a backend machine fails, this display will not mark the machine as dead and re-allocate its load until a page request has been sent to the dead machine. Over time, dead machines will be brought back into use. So, if all the backends shown are 'alive', try fetching a few more pages from the website.
There is another complication here: your web browser will have picked one of the two frontend boxes to speak to. The other frontend will not have received any requests. So, if you see no server marked in red, go back to the network display and click on the other frontend.
Try pulling out and plugging in different backends. If at least one backend machine is plugged in, then the webpage will be reachable.
Checking automatic fail-over: frontends
The two frontends, via the 'flipper' program, send heartbeats to each other over the network. If one machine loses network connectivity, the other takes over its IP address.
So, to test the failover, you must remove one of the machines from the network. The simplest way to do this would be to pull out the network cable, just as before with the backends. After removing the cable, examine the flipper log file ($ZEUSHOME/balancer/log/flipper). If the flipper is working correctly, both machines should log that they lost contact with each other, and therefore are taking over the public IP address of the other machine.
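To look at the most recent flipper log entries on each frontend, something like the following can be used. It is a sketch only: flipper_log_path and tail are hypothetical helpers, the path is built from the location given above, and no assumption is made about the format of the log messages.

```python
import os
from collections import deque


def flipper_log_path(zeushome):
    """Build the flipper log path named above from the ZEUSHOME directory."""
    return os.path.join(zeushome, "balancer", "log", "flipper")


def tail(path, count=20):
    """Return the last count lines of a text file, without trailing newlines."""
    with open(path, "r", errors="replace") as handle:
        # deque with maxlen keeps only the final lines while reading the file once.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]
```

With ZEUSHOME set in the environment, tail(flipper_log_path(os.environ["ZEUSHOME"])) returns the last 20 lines, which can then be printed and compared across the two frontends.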
From a separate machine, now try pinging both public IP addresses of the frontends. Both should still appear to be contactable. Try viewing the webpage again - it should still be accessible. View the cluster page on the Admin Server. The disconnected frontend Load Balancer will be marked in red.
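As a complement to ping, the check from a separate machine can also be scripted. Note that this sketch swaps in a different technique: it opens a TCP connection to the web port (80, the default for the test virtual server) rather than sending ICMP pings, so it tests whether the address is serving connections, not just whether it answers on the network. The helper names are hypothetical and the addresses you pass in are your own public frontend IPs.

```python
import socket


def tcp_reachable(address, port=80, timeout=3.0):
    """Return True if a TCP connection to address:port succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_frontends(addresses, port=80, check=tcp_reachable):
    """Return the addresses that did not accept a TCP connection."""
    return [address for address in addresses if not check(address, port)]
```

If the surviving frontend has taken over the other machine's address, unreachable_frontends called with both public IP addresses should return an empty list.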
If all these tests work as expected, you have a fully working load-balanced cluster. On the other hand, if any fail, please check the log files for the application (in $ZEUSHOME/balancer/log) for information about what may have gone wrong, and review the install instructions.