Linux software RAID any good?

I just ordered three new hard drives for my office Linux box. I’ll have about one drive’s worth of valuable data that needs redundant storage, and one drive’s worth of non-valuable (easily replaced) data. I figure I have two options:

1: Set up the three drives as software RAID level 5 and dump everything in there, or

2: Mount them as regular drives and use rsync to mirror the contents of drive 1 onto drive 3 every night.

Which option (or any others) would you recommend? The PC is used for scientific data analysis, with lots of random access to large (up to 200MB) data files.

I have had a Linux software RAID-5 array (3x50GB SCSI UW2 disks) running for years now without a single problem. Current uptime is 330 days.

But if you need mirroring, why not use RAID mirroring and skip the rsync? Just mirror two drives and keep the last disk separate.

I spent some time trying to configure a software RAID-5 with acceptable performance on one of my NetBSD servers with 8x 18GB 10K RPM U160 SCSI drives.

NetBSD’s RaidFrame is very flexible. The code’s origin was a RAID simulator, and as a result it has many, many knobs: RAID level, queuing discipline, queue length, sectors per stripe unit, stripe units per parity unit, stripe units per reconstruction unit, etc.

Since the RAID volume was intended to be used as /home, I needed both read and write performance. I spent a week of evenings reconstructing the RAID with different parameters, running disk benchmarks, etc. Read performance ranged from good to great, but write performance generally sucked. My ultimate conclusion was that software RAID-5 is too I/O and compute intensive to use.

I replaced my SCSI controller with a Mylex hardware RAID card, set up a RAID-5 volume, and have been pleased ever since. Especially some 3 months later when one of my drives failed and everything continued working until I got the replacement. Writes are still not as fast as I’d like, but that shows up in benchmarks more than actual use.

I know that Linux != NetBSD, but if the bottlenecks are in the CPU and I/O bandwidth, I would expect similar (disappointing) results.

Thanks for the useful replies!

Good point, that’s an option. I wasn’t sure about the maturity of the RAID software and the difficulty of reconstructing the setup if one drive fails, but maybe I’m just worrying too much.

Interesting… Did you find that the write speed was slower than using the drives as non-RAID drives?

Write speed isn’t too important to me so I’ll still consider software RAID. I know hardware RAID is the best option but I can’t afford one right now - I used up my remaining budget for the drives.

If write speed isn’t critical, a software-based RAID-5 should be fine. In my experience, software-based RAID-5 is very slow when it comes to writes because of all the XOR operations it has to do to generate parity. The good news is that RAID-5 should help for the workload you described, provided there’s a very high read-to-write ratio.
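For anyone wondering what those XOR operations actually are, here is a minimal Python sketch of the idea, not how any real RAID driver is written. The block contents and the helper name xor_blocks are made up for illustration, and the layout assumed is one stripe on a three-drive array (two data blocks plus one parity block).

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a hypothetical three-drive RAID-5: two data blocks, one parity block.
d0 = b"\x0f\xf0\xaa"
d1 = b"\x33\x55\x0f"
parity = xor_blocks(d0, d1)

# If the drive holding d1 dies, XORing the surviving blocks rebuilds its contents.
rebuilt_d1 = xor_blocks(d0, parity)
```

The same XOR that generates the parity also recovers a lost block, which is why a RAID-5 array can keep running with one drive missing.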

Big Time.

In RAID-5, “small” writes require a read-modify-write cycle. Any disk sectors that make up the stripe unit being written to or the parity stripe unit that aren’t already cached in memory must be read; then the old data is subtracted from the parity; the new data is added to the parity unit; and finally the stripe unit and the updated parity stripe unit must be written to disk. This is just a huge amount of I/O and CPU processing that needs to be done for what might be a few bytes of data.
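Here is a rough Python sketch of that read-modify-write cycle, assuming a stripe with just two data units and plain XOR parity. The function names and byte values are hypothetical, and real implementations work on whole sectors with caching in between.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Compute the new parity for a write that touches a single stripe unit."""
    # "Subtract" the old data from the parity (XOR is its own inverse) ...
    parity_without_old = xor(old_parity, old_data)
    # ... then "add" the new data.
    return xor(parity_without_old, new_data)


# Existing stripe contents (illustrative values only).
d0, d1 = b"\x01\x02", b"\x10\x20"
parity = xor(d0, d1)

# Overwrite d0: the array reads old d0 and old parity, then writes new d0 and new parity.
new_d0 = b"\xff\x00"
new_parity = small_write_parity(d0, parity, new_d0)
```

So a single small write costs two reads and two writes when nothing is cached, and d1 is never touched.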

“Large” writes (bigger than the parity unit) can avoid a lot of the I/O overhead because the RAID knows that all the existing data will be overwritten. In this case, the new parity stripe unit can be computed from the data as it’s written. I don’t know whether the typical software RAID code is sophisticated enough to make this optimization.
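To put rough numbers on the difference, here is a small Python sketch under a deliberately simplified model: nothing is cached, partial writes always use read-modify-write, and full-stripe writes compute parity from the new data alone. The names full_stripe_parity and io_cost are hypothetical, and this is not a claim about what any particular RAID implementation does.

```python
from functools import reduce
from typing import Dict, List


def full_stripe_parity(new_units: List[bytes]) -> bytes:
    """Parity for a full-stripe write, computed from the new data only (no reads)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*new_units))


def io_cost(data_disks: int, units_written: int) -> Dict[str, int]:
    """Disk operations for one stripe under the simplified model described above."""
    if units_written == data_disks:
        # Full-stripe write: write every data unit plus the freshly computed parity.
        return {"reads": 0, "writes": data_disks + 1}
    # Partial write: read the old data units and old parity, then write them all back.
    return {"reads": units_written + 1, "writes": units_written + 1}


# An 8-drive array has 7 data units and 1 parity unit per stripe.
full = io_cost(data_disks=7, units_written=7)
small = io_cost(data_disks=7, units_written=1)
```

In this model the full-stripe write moves seven units of data with eight disk operations, while the small write needs four operations for a single unit.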

As blinx suggested, since you’re already thinking of copying the contents of one drive to another, a RAID-1 mirror may be the best bet. You’ll have redundancy; good read and write performance; and unlike rsync, the second drive will always be in sync.

One tip. Once you get things set up, deliberately fail a drive (my drives are in drive trays, all I needed to do was to turn a key to disconnect it) to get some hands-on experience on how to recover. When I had a real disk failure a few months later, I was happy that I had had that practice.

Thanks for the help. I’ll consider RAID-1.

I just noticed that the Promise SX4000 RAID card has a hardware XOR engine and Linux support for about $130, so I might be able to afford that. As best I can tell from online reviews, it’s not a 100% hardware solution, but it’s still faster than a 100% software solution.

I have used a software RAID-1 with no problems. There are several good how-tos for setting up RAID on Linux; I followed them and it worked as advertised. I prefer a RAID-1 setup because I like to have full redundancy, so if one drive craps out, I still have a complete system which can run until I add another drive. I am using this on systems which need to be up 100% of the time and it has been very reliable. Performance is not critical for the application I am running, so I don’t know what gives the best performance. I think the main question you have to ask is whether your tasks are disk-bound, i.e. whether the slowest part of your tasks is disk access. In that case, hardware is definitely the way to go.

My 2 cents on the subject.

Doctor