Typical compression ratios for backup software?

I have a small computer network I manage, and we had a problem this week with a net worm that deposited a Trojan Horse in one of the boxes. This did a real good job of rubbing my nose in the fact that we’ve been quite careless about doing stuff like backups.

I’m going to propose a lot of stuff to the PTB that needs to change, and we’ll need some bucks to implement. What I have in mind is a stand-alone disk server box and a set of hard drives to slide in and out as backup media. What I don’t have a handle on is the needed capacity of those drives (i.e. how many I need to implement my proposed schedule). What I do know is that I need to be able to store 280 GB of data for image backups of 18 systems, and there are reasonably priced 500 GB drives.

What I don’t know is what a typical compression ratio is for backup software. Anyone know this, or can point me in the general direction of some information on the subject?

Implementing my proposed schedule at a 1:1 compression ratio will require six 500 GB drives, FWIW.
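For anyone who wants to sanity-check my arithmetic, this is the back-of-the-envelope calculation I’m working from (Python; the 10-set retention figure is just a placeholder for my rotation, not something I’ve settled on):

```python
import math

def drives_needed(data_gb, sets_retained, ratio, drive_gb=500):
    """How many removable drives a rotation needs.

    data_gb       -- size of one full image backup of all systems (280 here)
    sets_retained -- how many full sets the rotation keeps (10 is a placeholder)
    ratio         -- compression ratio, e.g. 2.0 means 2:1
    drive_gb      -- capacity of each removable drive
    """
    stored_gb = data_gb * sets_retained / ratio
    return math.ceil(stored_gb / drive_gb)

print(drives_needed(280, 10, 1.0))  # no compression -> 6 drives
print(drives_needed(280, 10, 1.5))  # a modest 1.5:1 -> 4 drives
```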

It totally depends on the type of data you’re backing up.

If you have 500GB worth of Word, Excel, TXT and HTML documents without images, then you’d be talking about 80-95% compression. Most digital media (like movies, JPEG images and MP3s) are already compressed, so for those files the savings are 0%… or even worse, the files grow by a few percent because the compression scheme adds some overhead. For some file types - like complex BMP images - the ratio is somewhere in between, say 50%.

In real life, most backups are a mixture of these, so it’s nigh impossible to guess how much compression you’ll get in your particular case until you actually compress the whole lot.

Just as a data point, Windows binaries (.exe, .dll, .ocx, etc.) will generally compress to about 50% if they’re not already compressed.
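If you’d rather measure than guess, a quick script like this (Python with zlib; the path at the bottom is a placeholder, point it at one of your own shares) will compress a sample of files and report the overall ratio it achieves:

```python
import os
import zlib

def sample_ratio(root, max_files=500):
    """Compress up to max_files files under root with zlib and
    report original vs. compressed size as an N:1 ratio."""
    original = compressed = count = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if count >= max_files:
                break
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    data = f.read(8 * 2**20)  # first 8 MiB per file is plenty for an estimate
            except OSError:
                continue  # skip locked or unreadable files
            original += len(data)
            compressed += len(zlib.compress(data, 6))
            count += 1
        if count >= max_files:
            break
    if compressed:
        print(f"{count} files: {original / 2**20:.1f} MiB -> "
              f"{compressed / 2**20:.1f} MiB  ({original / compressed:.2f}:1)")

sample_ratio(r"D:\UserData")  # placeholder path - use a representative share
```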

I happen to work with backup software and storage for a living. I echo what MyNameIsEarl mentioned - basically that it depends on the file being compressed. Certain files will compress more than others. However, what I’ve seen in most enterprises is hardware compression (done on the tape drives) rather than software compression, since software compression takes CPU cycles on the host machine and hardware compression tends to be better anyway.

So the basic message is that compression depends more on what is being compressed than on the software doing the compression. Just an FYI - tape drives typically advertise compression in the ballpark of 2:1 or 2.5:1, so I’d expect software compression to land somewhere lower than that.
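To translate those drive-style ratios into the percentages quoted earlier (just arithmetic, no measurements behind these figures):

```python
for ratio in (1.5, 2.0, 2.5):
    shrinks_to = 1 / ratio
    print(f"{ratio}:1 -> data shrinks to {shrinks_to:.0%} of its original size "
          f"({1 - shrinks_to:.0%} saved); 500 GB of media holds ~{500 * ratio:.0f} GB")
```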

I’m not sure you’re attacking the problem the right way. It’s your data that’s important, not the image of the machines. You might consider taking an image of one of the machines with something like Ghost and backing up the data only.

In my experience, tape drives actually manage about 1.5x compression with real-world data.

Don’t forget to consider usability. Use the KISS principle. You want to design the system so that a green trainee can use it. You may manage the system, but you have lots else to do, and who do you think is going to do it when you’re on holiday? Also, how much downtime and lost data can the organisation stand? Can you get by with a weekly full backup and daily incrementals, or does it need to be a full backup every night? If the latter, can your network bandwidth cope with it? Don’t forget about off-site storage. Your backups are no good if they’ve been destroyed in the fire that gutted the site. When I looked after schools, the school secretary took the tape home with her. Simple and effective.
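To put rough numbers on the full-versus-incremental question, here’s a quick sketch (the 5% daily change rate is purely an assumption; plug in whatever your systems actually churn):

```python
def compare_schedules(full_gb, daily_change=0.05):
    """Weekly storage and nightly network load for two simple schemes.

    full_gb      -- size of one full backup (280 GB in the OP's case)
    daily_change -- fraction of data that changes per day (assumed 5%)
    """
    nightly_fulls = full_gb * 7
    weekly_full_plus_incr = full_gb + full_gb * daily_change * 6
    print(f"Full every night:           {nightly_fulls:.0f} GB stored/week, "
          f"{full_gb:.0f} GB over the wire each night")
    print(f"Weekly full + incrementals: {weekly_full_plus_incr:.0f} GB stored/week, "
          f"{full_gb * daily_change:.0f} GB over the wire on incremental nights")

compare_schedules(280)
```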

If you’ve got 280 GB of data you’re looking at an LTO unit. See http://www.ultriumlto.com/

Hope this helps.

Compression ratios should depend only on the algorithms being used and their input. Hardware or software should have nothing to do with it.