Load Balancer

I’m looking for some specifics regarding the operation of a load balancer.

I have a large amount of data that I need to distribute to a large number of clients in a short amount of time. I was thinking of setting up 5 servers with a load balancer in front of them to make sure the load is evenly distributed. Each client will pull the data down from the servers using rsync. (There will be no web traffic at all.)

First, I understand that a load balancer typically handles web traffic. Can it also handle rsync traffic reasonably well? I understand some load balancers deal with TCP traffic in general, but I’d like to make sure it won’t be a problem. I might also set it up to use rsync over SSH. Might that cause a problem with the load balancer?

Second, how exactly does the load balancer distribute the traffic? Does all of the traffic go directly through the balancer? Would I be introducing a single point of failure to my system? Or does the load balancer simply hand off each connection to one of the 5 servers directly, similar to round-robin DNS? Some load balancers can cache static data, which sounds attractive to me. But I’d rather not have the whole system go down if the load balancer crashes.

Any input would be appreciated.

Thanks!

Thought I’d share my recent discoveries regarding hardware-based load balancers, in the hope of helping out the throngs of people certain to search for help on this topic in the future.

First, it would appear that rsync over SSH could indeed be balanced by a load balancer.

However, it also seems that all traffic from all servers would go THROUGH the load balancer. So not only does it introduce a single point of failure, it also caps the total throughput of the system. For example, 5 servers with a 100Mbps NIC each could serve 500Mbps of data simultaneously. But with a load balancer on a 100Mbps uplink in the mix, total throughput is limited to 100Mbps, UNLESS I get a high-end load balancer with a gigabit uplink. But that is cost-prohibitive for the project I’m currently working on.
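
To make the numbers above concrete, here is a rough Python sketch of the arithmetic. The function name and the idea of modeling the balancer as a single uplink figure are my own simplifications, and it ignores protocol overhead and everything else that would eat into real-world throughput.

```python
def aggregate_throughput_mbps(server_nics_mbps, balancer_uplink_mbps=None):
    """Return the peak aggregate throughput in Mbps.

    With no balancer in the data path (balancer_uplink_mbps is None),
    clients can pull from every server at once, so the NIC rates add up.
    With an inline balancer, its uplink caps the total.
    """
    total = sum(server_nics_mbps)
    if balancer_uplink_mbps is None:
        return total
    return min(total, balancer_uplink_mbps)


servers = [100] * 5  # 5 servers, each with a 100Mbps NIC

print(aggregate_throughput_mbps(servers))        # no balancer in the path: 500
print(aggregate_throughput_mbps(servers, 100))   # 100Mbps balancer uplink: 100
print(aggregate_throughput_mbps(servers, 1000))  # gigabit balancer uplink: 500
```

So the gigabit balancer only helps because its uplink exceeds the combined 500Mbps of the servers behind it.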

So, I’m currently investigating a weighted DNS load-balancing solution.
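
For anyone wondering what "weighted" means here, this is a minimal Python simulation of the idea, not a real DNS server. The hostnames and weights are made up for illustration. Each incoming request is answered with one server, chosen with probability proportional to its weight.

```python
import random
from collections import Counter

# Hypothetical servers and weights: a higher weight means a larger share of clients.
WEIGHTED_SERVERS = {
    "rsync1.example.com": 3,
    "rsync2.example.com": 2,
    "rsync3.example.com": 2,
    "rsync4.example.com": 2,
    "rsync5.example.com": 1,
}


def pick_server(weighted_servers, rng=random):
    """Pick one server name with probability proportional to its weight."""
    names = list(weighted_servers)
    weights = [weighted_servers[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def simulate(weighted_servers, requests, seed=0):
    """Count how many of `requests` lookups each server would receive."""
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    return Counter(pick_server(weighted_servers, rng) for _ in range(requests))


if __name__ == "__main__":
    for name, count in sorted(simulate(WEIGHTED_SERVERS, 10000).items()):
        print(name, count)
```

Since the clients connect to the chosen server directly, no single box sits in the data path. Keep in mind that real DNS answers are cached by resolvers, so the actual split will be coarser than this simulation suggests.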

Any other thoughts on this would be appreciated…