Some totally n00b RAID5 questions

I am a mere programmer, and I have been unwillingly thrust into the world of maintaining actual hardware. You know, those big, ugly boxes that my beautiful code runs on.

Anyhoo, we have this new server which has a hardware RAID controller and eight of those really cool hot-swappable drive bays. Four of the bays are filled and those four disks are set up in a RAID5. This works perfectly.

Now, our data set is growing fast, and we will likely need more disks in the future.

  1. If I put new disks in the remaining four empty bays, can they be added to the RAID5 without destroying the existing data? I’m guessing not.

  2. Assuming #1 is not doable, can I make the second set of four disks its own RAID5? I am guessing this probably depends on the specific RAID controller in the box. I will look that up if necessary.

  3. This machine isn’t actually live yet, since we’re still developing the code that will run on it, so I have a small time window in which I can wipe it and rebuild if I want. Is there an alternate configuration that would allow for more convenient future expandability, while also providing a small level of redundancy?

I am sure a bigger RAID guru will come along but IME, a RAID size is fairly static if you want redundancy.

If your RAID-5 is 4 500G drives, you get one usable 1500G volume with redundancy and read/write acceleration.

Additional storage would be provided by adding additional drives or additional arrays, or additional machines for your clustering datacenter types.

4 500G drives using RAID5 results in about 1500G of usable storage space, with about 500G used for parity. Data and parity are striped across all drives.

Writes are slower, reads can be faster.

This depends on the RAID controller. I loved the Compaq SMART Raid Array controllers. You could replace any model with a newer one, and it would still work. So when a customer with the entry-level device (no online RAID set expansion) wanted new drives added, I would go in with an advanced RAID controller (battery-backed, with internal memory, so even if the power went out the rebuild would resume on power-up), swap the controller, reboot the server, add the new drive, rebuild the array with the additional space, then put the old controller back. The RAID set info was stored on the controller, on the drives, and in the server BIOS; two out of three had to match.
So check with the manufacturer. If the RAID controller does not allow online expansion, get one that does, backup the old data and restore on the new controller.

You certainly can, but you lose more space if you do this, and you may have data distribution issues with two separate volumes.
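To put rough numbers on that space loss, here's a quick sketch assuming eight hypothetical 500G drives and the usual (n - 1) * s RAID5 capacity rule:

```python
# Space cost of two separate RAID5 sets vs one big one, assuming eight
# hypothetical 500 GB drives. RAID5 usable capacity is (n - 1) * s.

drive_gb = 500

one_8disk = (8 - 1) * drive_gb      # 3500 GB usable, 1 drive's worth of parity
two_4disk = 2 * (4 - 1) * drive_gb  # 3000 GB usable, 2 drives' worth of parity

print(one_8disk - two_4disk)  # 500 GB lost to the second parity drive
```

So with these assumed sizes, splitting into two arrays costs you one extra drive's worth of capacity, on top of the data-distribution hassle.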

As I say - if the RAID controller does not allow online expansion, upgrade to one that does. Or convince the powers that be to get an external SAN that gives the flexibility to add more storage as the data set grows.

Si

To give you some numbers that you can use to discuss with TPTB, the size of a RAID5 is **(n - 1) * s**, where **n** is the number of drives and **s** is the size of the smallest drive in the RAID.
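As a quick sketch of that formula (the drive sizes here are hypothetical), note that a mismatched larger drive only contributes the smallest member's capacity:

```python
# Illustrative sketch of the (n - 1) * s rule: s is the smallest drive,
# so mismatched drive sizes waste the extra space on the bigger members.

def raid5_size(drive_sizes_gb):
    """Usable RAID5 capacity given a list of member drive sizes."""
    n = len(drive_sizes_gb)
    if n < 3:
        raise ValueError("RAID5 needs at least 3 drives")
    s = min(drive_sizes_gb)
    return (n - 1) * s

print(raid5_size([500, 500, 500, 500]))   # 1500
print(raid5_size([500, 500, 500, 1000]))  # still 1500: the 1000G drive is
                                          # treated as a 500G member
```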

I don’t know much about expandable RAID arrays, and it looks like someone already addressed that, but the other thing to consider is the number of drives in the array. If the probability of a single drive failing within a length of time t is p(t), then the more drives, the higher the probability of a failure: 1 - [1 - p(t)]^n (close enough, I think). Worse is the chance of multiple failures, meaning kaput data: (1 - [1 - p(t)]^n) * (1 - [1 - p(t)]^[n - 1]) (again, close enough for the point).

Namely, I have seen simultaneous drive failures on a few occasions, and they’ve always been on RAID5s with many drives: two or three times with 8+, and only once with either 4 or 6 (I don’t remember exactly). To put it in perspective, I’m talking about thousands of servers over several years, many in poor conditions (e.g. salty damp air, poor cooling, shaking) with poor maintenance (i.e. the people responsible are inadequately trained and it is a collateral duty). Of course, the importance of this point depends on how vital the function of the server is.

A single failure is easily fixed with minimal impact, but multiple failures could mean restoring from backups. That is, if it’s not absolutely vital to have 100% uptime (where +99.9% meets the uptime requirement) and the server is in optimal conditions and well maintained, then it’s probably a moot point.
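The back-of-the-envelope formulas above can be sketched as follows; the 5% per-window failure probability is a made-up number purely to show how the risk grows with drive count:

```python
# Sketch of the poster's approximation: if each drive independently fails
# with probability p within some time window, the chance of at least one
# failure among n drives is 1 - (1 - p)^n, and the (rough) chance that a
# second drive also fails is that multiplied by 1 - (1 - p)^(n - 1).

def p_any_failure(p, n):
    """Probability of at least one drive failure among n drives."""
    return 1 - (1 - p) ** n

def p_double_failure(p, n):
    """Rough estimate of a second concurrent failure, as in the post.

    A real model would account for rebuild time after the first failure.
    """
    return p_any_failure(p, n) * p_any_failure(p, n - 1)

for n in (4, 8):
    print(n, round(p_any_failure(0.05, n), 4),
             round(p_double_failure(0.05, n), 4))
```

With the assumed 5% figure, the double-failure risk roughly quadruples going from a 4-drive set to an 8-drive set, which matches the anecdote above.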

I’ve not seen multiple failures on well monitored systems (except a stupid HP controller that failed the array pointing to two drives, but only one was actually dead - I just had to find the dead drive and all was well).
Given that there are eight slots available, the OP could suggest a 7-disk RAID array with a hot spare - you get an automatic rebuild onto the spare after the first failure, and can still survive a second failure. And a RAID system is NOT a replacement for a decent backup system.

Si

I cannot emphasise this enough. If you’re using 1 TB disks, you could have 6 or 7 TB of data, and backing that up is a non-trivial exercise.

Do not stint on your backups. The amount of money you’ll spend on proper backups will likely be significant, but it will be minuscule compared to the cost of not doing proper backups.

I’ve been doing storage systems for many, many years and multiple failures are not unheard of. The best way I know to make multiple failures in a raid-set is to take a power hit.

I had a bunch of 30-disk arrays that would routinely come up missing more than one drive per raidset after a power drop.

Drive arrays based on SATA drives are considered less reliable than the older SCSI ones but, in general, SATA gives you gobs more storage. It’s getting to be a common setup with SATA arrays to have an n+2 raidset with dual parity disks, so that three drive failures are necessary before data is lost.

I’d do dual parity, if your hardware supports it, before I’d do spare disk arrangements. Especially if you only have one raidset in the entire array. The spare is already spinning; might as well use it for some protection.
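For what it's worth, here's a hedged sketch comparing the two options for the OP's eight bays, again assuming hypothetical 500G drives:

```python
# Comparing layouts for eight hypothetical 500 GB bays: a 7-disk RAID5
# with a hot spare vs dual-parity (RAID6) across all eight. Both dedicate
# the capacity of one extra drive, but RAID6's second parity drive
# protects data immediately, while a hot spare only helps once the
# rebuild onto it completes.

drive_gb = 500
bays = 8

raid5_hot_spare = (bays - 1 - 1) * drive_gb  # 7-disk RAID5 + 1 spare
raid6_all_bays = (bays - 2) * drive_gb       # 8-disk dual parity

print(raid5_hot_spare, raid6_all_bays)  # same usable space: 3000 3000
```

Same usable capacity either way; the difference is that dual parity survives any two concurrent failures, while RAID5 plus a spare loses data if the second drive dies before the rebuild finishes.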

As to the ability to grow a raidset on the fly by adding more spindles: it depends on your RAID controller AND your OS. It does you no good to grow the effective disk size if you can’t also grow the file system that is encoded onto it.

To echo previous posters, all this is no substitute for backups. Per the Gartner group, IIRC, 80% of all data loss is due to fat-fingered humans, not hardware failure. It doesn’t do any good to protect the data from disk loss but not from an “rm -rf *” (or `del *.*`).

Our controller supports double parity but unfortunately we can’t afford to sacrifice the space at this point. (We do have a backup system on a separate NAS, though.) This is a small non-profit so they can’t buy new disks on a whim, which is why I want to plan ahead as much as possible.

Thanks for all the comments so far.

Dual-Parity is a bit of a new one on me - but I haven’t done big storage for a while. Seems smarter in the circumstances, but I’d be getting a bit antsy with my storage provider if multiple drives dropped off after a power failure. The SCSI drives I’m used to were pretty good in that regard. But then again, 30 disk arrays were not really on the radar for our clients - maybe in a SAN box, but only just.

Si

With large arrays, a nifty technique is to RAID both across the RAID box and down through the rack: each element raided across is one element down through the rack. That way you’re protected against both individual drive failure and failure of a whole RAID box.



```
R1=ABCDEFGHS
R2=ABCDEFGHS
R3=ABCDEFGHS
```


where Rn is a RAID box and S is the spare. You RAID down each column (all the As, all the Bs, etc), then you RAID across the resulting column-RAIDs (A+B+C+D+etc).

Quite wasteful, of course.
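To quantify "quite wasteful": here's a rough sketch assuming RAID5 at both levels (the post doesn't say which levels are used) and the counts shown in the diagram, i.e. three boxes, eight data columns, and one spare per box:

```python
# Rough waste estimate for the nested scheme above, assuming RAID5 both
# down each column (across the 3 boxes) and across the 8 column-RAIDs.
# The usable fractions of the two levels multiply.

boxes = 3          # rows R1..R3: column RAID5 keeps (boxes - 1) / boxes
columns = 8        # columns A..H: row RAID5 keeps (columns - 1) / columns
spares_per_box = 1

vertical = (boxes - 1) / boxes        # 2/3 survives column parity
horizontal = (columns - 1) / columns  # 7/8 survives row parity
usable_fraction = vertical * horizontal

total_drives = boxes * (columns + spares_per_box)  # 27 drives spinning
usable_drives = boxes * columns * usable_fraction  # ~14 drives of data

print(round(usable_fraction, 3), total_drives, round(usable_drives, 1))
```

Under these assumptions you keep roughly 7/12 of the non-spare capacity (about 14 drives' worth of data out of 27 spinning drives), which is the price of surviving a whole-box failure.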