RAID: how perfectly matched do the drives have to be?

Stoid · February 28, 2012, 12:40am

I have three perfectly matched Hitachi drives, model, manufacturer’s part number, month and year of production.

But I want at least 2-3 more drives in the array, and matching those three drives exactly is proving elusive, unless I want to spend way too much money.

If I get the same model and “close” part number, is that just terrible, or really no big deal?

As in, my three are:

Hitachi 500gb ****
Model: HDS725050KLA360
MPN: 0A32238
Nov 2005
Other similar drives are the same in except for MPNS, which are all over the map, really
**
0A31619 Dec 06

0A32904 Nov 06** **

0A32779 Jan 07** **

0A31004 Nov 05** **

0A32779 Aug 07** **

0A32780 Jan 07**
Etc. - if I stray from the exact part number, should I be more concerned with date of manufacture, or with how close the MPN is? (a 327 vs. 319, for instance?) Or is that degree of anal really unnecessary?

Musicat · February 28, 2012, 12:46am

I can’t answer your question, but have you considered a Drobo? Then you don’t need to worry about a match. You can even mix sizes.

(A Drobo is sort of a proprietary, super-RAID.)

drachillix · February 28, 2012, 1:28am

What kind of array are you trying to set up, RAID5, or JBOD (since you are using 3 drives)

RAID in general isn’t going to care about drives/firmware revisions mostly just what size the drives are. You are more likely to get more stability from matched sets, but not critical.

Stoid · February 28, 2012, 1:49am

RAID5- looking for speed and safety, and I would like to have 1.5-2 TB of workspace.

Stoid · February 28, 2012, 1:51am

Sounds great. And expensive. I’ve already invested in the setup I’m working on, but I’ll keep that in mind for down the road.

Digital_is_the_new_Analog · February 28, 2012, 2:11am

Back in the late 90s, I wrote SCSI drivers for RAID controllers. We used Seagate 4gb Baracuda drivers. (Yeah..things have grown since then). I could randomly choose from my drawer of 54 of them without worrying about them “matching” properly.

Later, we upgraded to using the 9gb ones. Same story.

If I had to mix manufacturers or models, I’d just try to get them close to the same size, and partition them to match the smallest of the lot.
Again..this was the late 90s. Technology has changed, but I got out of that business..Oddly, there isn’t a lot of call for deep RAID knowledge when designing a cell phone.

So…take it for what it’s worth.

-D/a

Musicat · February 28, 2012, 2:24am

Probably not as expensive as you might think. I got a Drobo FS (5 bay) model for $500 without drives. You can use from 1 to 5 SATA 3.5" drives, add them as you go, even hot swap. You can set the redundancy mode to protect against 1 drive failure or 2 simultaneous failures and as you add drives, it will resync to use all available capacity. Drives can be different sizes.

The unit I have now is five 3TB drives configured to allow for up to 2 failures, which reduces the data space from 15TB down to 8TB. This model interfaces thru an Ethernet network, but there are other schemes like USB, Firewire, etc.

Stoid · February 28, 2012, 3:28am

Umm, we have very different ideas of what is expensive, obviously. I bought two 5-bay e-sata RAID enclosures for $165 total. For both. And it hurt. The idea of spending $500 for a single driveless 5-bay enclosure gives me hives.

si_blakely · February 28, 2012, 8:12am

As already stated, you don’t need to match drives for RAID. Your raid set will be limited to an appropriate multiple of the size of the smallest drive in the set, and performance will be limited to the speed/transfer rate of the slowest drive, but that is generally not a major issue if all the disks are a reasonable speed.

Plenty of major manufacturers swap suppliers for hot-swap disks in the middle of support, and you just mix-and-match what is available.

So get the cheapest drives that match the specs of your current drives.

Si

Francis_Vaughan · February 28, 2012, 8:46am

Any need to worry about precise matching of drives disappeared when the drives gained significant cache, and controllers gained the ability to handle multiple outstanding requests (something that came late to the realm of personal computers.) Back when disk arrays were new, systems could be designed so that the disks rotated in lock-step and you were guaranteed that the heads all moved in unison. Even without this level of matching there was some gain to be had in guaranteeing that all the drives would complete an operation at about the same time. With internal bad block management this guarantee is harder to maintain anyway. Bottom line is that there is nothing to be gained with precise matching. Matching the rotation rate - and thus the dominant component of access latency will still help. Matching platter number and capacity also helps even out latencies, but that is about it. The problem you may face is that the newer drives are not used to their full potential because your older drives are the pacing item.

A RAID controller with a reasonable amount of local persistent cache will try to write stripes anyway, and the cache will even out performance .

All of this is only of concern of speed is the dominant driver. If it is data security then it matters very little.

Stoid · February 28, 2012, 6:30pm

Looking for both. I’ve become obsessed with doing everything I can to max my speed. I’m pretty happy with my new(to me) Mac Pro 8 core with 24g RAM, it’s already a nice big fat leap from the Spinning Ball of Death I’ve been dealing with for years. But my research about speed taught me that my drives are slowing me down, too. If and when I win the lottery I will create a RAID array made up of the biggest SSDs they make, in the meantime I have a 160gig SSD for booting and apps (which I haven’t installed yet because I got the wrong caddy… grr. Between my computer and my bike, the amount of time and money I blow getting stuff that doesn’t fit or work with my particular stuff…grrr…) and I’m going RAID5 for all my docs and storage.

But having lost a BRAND NEW 1.5TB drive out of the blue, beyond every tool I could find and even a new controller card, just dead dead dead and gone (WAH! Screw you, Seagate!) I’ve become more sensitive to protecting my data.

As I understand it, RAID 5 gives me both.

Thanks for the information, all, it’s evident that I’m in better shape than I thought. The salespeople at OWC had me convinced that I had to use identical drives or suffer the consequences. If size is really the main issue, I’m already golden because I’m up to my eyeballs in 500 gig drives already. And that makes me very happy because I’ve been nose-outta-joint over the fact that, according to OWC, if I ever wanted to add a drive or more to the array I’d have to start over. So I felt like I had to invest in as many drives as possible right now.

I am curious, though… if adding a drive means starting from scratch, is that what happens if a drive fails? If I lose one and I want as much space as I had before, am I going to be forced to back it all up, wipe it and set it up from scratch? Is that a function of the controller card (mine’s low-end, obviously)?

Stoid · February 28, 2012, 6:40pm

Meaning size, RPM and data rate? Anything I’m missing?

Caldazar · February 28, 2012, 8:13pm

Adding a drive to an existing, functional RAID5 array requires a rebuild of the array, unless your disk controller supports on-the-fly RAID5 expansion. Even if it does, you should still do a backup first for safety reasons.

Replacing a failed drive in a RAID5 set does not require a backup/restore operation. You physically remove the failed drive from the machine, replace the drive, and allow the RAID controller to rebuild the lost data on the new drive. That’s the point of RAID5; it stores additional parity information such that, if you lose one drive, it is able to use the data on the remaining drives to regenerate the lost data.

And the standard PSA: RAID is not a backup solution. Always perform regular backups of critical data, regardless of whether or not you are using RAID.

nesta · February 28, 2012, 8:22pm

The process for replacing a drive depends on the RAID controller and drive enclosure you use, but the simple answer is that you shouldn’t have to reinitialize the RAID if only one drive fails. That’s the whole point of RAID. All you should need to do is slap in a new drive of equal or greater size and the RAID controller can rebuild the data to the new drive without data loss.

If your controller supports it, and you have plenty of drives, you might want to configure one drive as a hot-spare. If one drive fails it will automatically start rebuilding to the hot-spare drive. If you aren’t closely watching for drive failures this can save you in case a second drive fails shortly after the first.

It should be stressed that RAID is not a substitution for proper backups. Although in theory it does make losing the data less likely it can still happen. With many cheap RAID implementations (grr Intel Matrix RAID) it almost seems to me that it causes more data loss than it prevents. If the controller freaks out and marks more than one disk offline, even if they are perfectly fine, it can result in the entire array being lost. With every extra drive in the array the chance of one of them failing goes up. For example, if you RAID0 (no redundancy) 3 drives it is then 3 times more likely that the array will be lost in a given time frame.

Quartz · February 28, 2012, 8:38pm

They’re 6 years old. Their reliability is suspect. Dump them and buy yourself a pair of 3TB drives and mirror them. It’ll be faster, quieter, and less costly to run.

Stoid · February 28, 2012, 9:15pm

But more costly to startup. Especially since I’ve already purchased these.

And after my 1.5tb loss I have no great affection for investing hundreds of dollars (and boatloads of data) in single drives that are perfectly capable of failing. The only drive that I have ever lost irretrievably, in the blink of an eye, was only a few months old. My old re-used drives just keep plugging along, and when I do have issues, they have been confined to minor sector failures. And if I do lose a 500gb irretrievably, its not as much data and it’s cheap to replace, free really considering what I’ve been told in this thread and how many I have. Investing $200 per 3tb drive becomes a costly matter if one dies.

pulykamell · February 28, 2012, 9:29pm

The general wisdom, from what I understand, is NOT to use drives from the same lot number. Apparently, drive failures seem to come in batches, so everything I’ve read about this topic suggests getting drives from different lot numbers to spread the risk.

And please make sure you are not using RAID5 as a substitute for properly backing up your data.

nesta · February 28, 2012, 9:52pm

Whether two new, large drives in RAID1 would be faster than older drives in RAID5 is debatable. Performance depends on how well the controller and associated drivers handle concurrent requests, and if the expected workloads can take advantage of this. In RAID1 you often basically have the speed of a single drive, where with RAID5 you sometimes have the combined read performance of all the drives in the array.

Newer drives generally have better performance due to higher densities. Hitachi claims the HDS725050KLA360 has a sustained read speed of 65 MB/s*, where for a current Western Digital 2 TB the claimed speed is 138 MB/s**. If you have a six drive RAID5, and a workload that takes advantage of it, you could theoretically get close to 400 MB/s.

The power consumption for those drives is probably going to average somewhere between 9 to 10 watts, so about a 40W difference between two drives and six. That’s 350 kWh, and at $0.15/kWh (prices may vary) is about $50 a year.

Hard drives are very expensive right now due to manufacturing problems. I recently checked the replacement cost of some 750 GB drives I bought two years ago, and they cost about double what I paid for them. Even if you only expect the older drives to last a year, it’s probably cheaper to use them for now and plan to replace them once drive prices come back down.

On a purely anecdotal note, I’ve been seeing a lot more drive failures in newer drives than I did three or four years ago. I think the higher densities and cheaper construction are reducing reliability. The old drives might outlast the new ones, depending on how roughly they have been previously used.

http://www.hitachigst.com/tech/techlib.nsf/techdocs/CE3F5756C827F35A86256F4F006B8AD4/$file/7K500v1.5.pdf
** http://www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-701276.pdf

drachillix · February 28, 2012, 9:54pm

a new drive will usually have at least a 3 year warranty with the manufacturer making a failure fairly cheap to replace (shipping), by the time the OEM warranty runs out 3TB drives will be a dime a dozen on ebay.

drachillix · February 28, 2012, 10:00pm

Also, pricey or not, a drobo is an excellent investment if you have a need for redundancy. Drobo units can also do things like shift RAID levels on the fly. I am eyeing one of the 8 disk arrays for my offsite storage systems rather than the current raid system.

Topic		Replies	Views
Would You Trust A 2tb Hard Drive? In My Humble Opinion	55	5647	October 15, 2009
Recommend a RAID array for home In My Humble Opinion	28	3564	October 1, 2013
Building a computer help: hard drives In My Humble Opinion	18	1294	April 26, 2006
DO NOT BUY A DROBO - it does NOT protect your data! The BBQ Pit	44	11507	December 17, 2011
Software RAID on Linux Factual Questions	18	1967	March 23, 2016

RAID: how perfectly matched do the drives have to be?

Related topics