Building a Linux fileserver – hardware requirements?

(Mods, since minimum requirements have a factual base, I put this here. If this should be IMHO, please move and accept my apologies.)

Hi folks,

I probably should have started here. I’ve posted to a couple hardware forums and the Ubuntu forum, but got no replies. In short, I’m looking to build a Linux file server, but since I’ve no hardware experience for this application I don’t know the basic specs I need. In other words, a quad-core, OC’d, water-cooled, 8 GIG machine is overkill, while keeping it running on my PIII 550 test box is a bit underkill. For a two-user environment, what kind of power do I need? Also, because Linux can be temperamental with hardware, any specific advice in that direction will be much appreciated. Oh, I’m also very new to Linux, if that makes a difference.

Below are some exhaustive details about its function – perhaps they will help, perhaps it’s too much. Hope you can help!

Here’s how it will fit into our home office:
Relevant hardware includes:
Mac (OS X)
WinXP box
Linksys WRT54G router (10/100)

Activity includes:
NAS file serving (1-3 MB Word documents, 50-100 MB graphics and Quark files) to both computers.
Testing Web page design and function (HTML/PHP/MySQL). This is for internal testing only – the box will never be seen outside the LAN.

That’s it. We basically just need a NAS, but I’d like to take advantage of the Web testing capabilities. No email serving, nothing else that I can think of.

On the Linux box, I’ll be running Ubuntu Server Edition (with LAMP), Samba, etc.

I would like two drives in there running RAID 1 (mirroring).

Our two priorities are reliability and speed.

Reliability is paramount. One of the reasons for posting is that I understand Linux can be temperamental with some hardware (I also don’t know the hardware requirements of running a file server, so the general compatibility thread, while helpful, doesn’t provide complete guidance for me). We run an editorial and graphic design consultancy from our home office, so uptime and stability are of central importance. I tend to use Asus boards in my personal builds, but am willing to change vendors if there is a reason (e.g., a noticeable difference in stability or better customer service).

Speed is important, but I have no idea how things compare for this use. I know I don’t need a whiz-bang processor, but don’t know at what point I’m sacrificing speed for cost savings. I want it to have enough processing power to run the server, and just a little bit of overhead to make me feel comfortable. Since my old PIII 550 learn-Linux platform does all of the above just fine, am I safe in assuming any currently available retail Intel processor will be more than enough? I’ve seen Celeron processors available at Newegg – will those be sufficient? Would a dual-core make any difference given the server’s limited use?

Since the router isn’t Gigabit capable, a Gigabit board is superfluous. Having it isn’t a problem, and may even be a good thing, since a future router replacement would then increase speed. But unless the price point is very similar, it’s not necessary.

RAID 1 on the board is essential (I think), as from what I’ve read software RAID is noticeably slower (feel free to disabuse me if that’s wrong). I don’t fully understand it yet, but a few sites discuss how some hardware RAID isn’t really true hardware RAID. Again, if there is a significant difference in speed (remember we’ll only be dealing with 50-100 MB files) I’d opt for the faster solution, though I don’t want to spend a bucketload on the difference.

In retrospect, I’m starting to wonder if RAID is worth the hassle (e.g., performance hit, configuration, not being able to read one drive separately from some controllers). I’m considering putting two drives in there and having a backup scheduled every fifteen minutes. There’s only two of us here, so at most it will only need to copy one or two files every increment. (If you think so, feel free to tell me this should be its own thread.)

I’ve always used Western Digital drives (have had great service experiences, which tends to make me loyal), but like with Asus I’ll switch (to Seagate?) if there is a significant difference.

I don’t need any high graphics capability, but I do have a working, just-pulled-to-upgrade video card (EVGA 7800 GTX PCI-e). If it will keep costs down (and Linux will work with it) I can use it. One benefit to the card is that it has an S-Video out, which means I can use my monitor’s PIP display – a convenience. I know S-Video is crapulent, but it will allow me to keep working on something and monitor progress on the Linux box with minimal input switching.

So, that’s it in a nutshell (I hope I didn’t go overboard with descriptions – I thought the more information the better). I’d of course like to keep the cost as low as possible, but don’t want to chintz out on this build at the expense of reliability or speed. Remember that this is just a two-person office, so hits to the server will be minimal – just us saving our work as we go along.

My budget is flexible, but given my assumption that I don’t need a lot of processing power, I don’t anticipate this reaching more than $300-500. I have an old Antec case, so a hundred bucks or so is saved (though I may get a new PSU to be safe).

Any thoughts will be much appreciated.

Thanks,

Rhythm

Honestly, for what you are doing, the PIII 550 probably is fast enough. Seriously. The processor isn’t processing the data. All it is doing is moving it around. Your main bottleneck is going to be the drives themselves (and the drive interface) and the Ethernet network. Since you’ve got a 10/100 router, there’s not much you can do with the Ethernet side of things either. If your router has the ability, hard-code the link speed to 100 in the router and also on the Linux box. Autodetecting the link speed slows things down a bit.
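
On the Linux side, forcing the link speed is a one-liner. A sketch, assuming the interface is eth0 and the ethtool utility is installed:

ethtool -s eth0 speed 100 duplex full autoneg off

Stick that in a startup script so it survives reboots.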

For reliability, you want a system that runs cool. You don’t want some whiz bang overclocked screamer. Those get good performance, but the excess heat makes them die an early death, and you don’t need that kind of processing performance anyway. Fast disks require more cooling or they will die an early death, too.

For the RAID, you are paying a lot of money to have a system that can handle a drive failure. The drives will be slower, even with hardware RAID, though in your application I don’t think the speed difference will be all that significant. The disadvantage of a backup-only approach is that if the drive goes poof, you have to rebuild the computer and then restore the backup. Whether or not you want a real RAID depends on how much that downtime is worth to you compared to the cost of a RAID system.

I was running a perfectly adequate Samba/NFS/torrent box on a 500 MHz, fanless, Via C3 mini-ITX board. I slapped a cheapo ATA-133 controller card in there with a bunch of big disks doing software RAID on Debian Etch.

The CPU rarely got above 3%. The only thing I ever thumped it with was stopping/starting torrent sessions for whatever reason.

Oh, and it booted off a CF card and ran headless. On 256 MB of RAM. With no swap file.

Honestly, any off-the-shelf NAS you buy is going to have lower specs than that.

So yeah, I’m pretty sure your PIII will be plenty, though I’d throw a USB2 or Firewire card in the box in case you want to do external backups.

In terms of RAID, you probably won’t notice a significant performance difference between a consumer-grade card and software RAID - your bottleneck is going to be the network. Software RAID has the added benefit of being hardware independent - what do you do if the card craps out, and you can’t find a replacement? It’s one thing if you’re dropping 500 bones on an IBM ServeRAID adapter or something - that part is always going to be available somewhere in their distribution channel. It’s another thing entirely to buy a cheapo SATA card from some fly-by-night operation in Guangzhou.
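
That hardware independence is very concrete with Linux’s md driver: if the box dies, you move the disks to another machine and reassemble. Roughly (device names will vary):

mdadm --assemble --scan
cat /proc/mdstat

The first command finds and reassembles any md arrays on the attached disks; the second shows their status.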

On the other hand, your wariness of the whole RAID idea is well founded. RAID IS NOT A BACKUP. If you accidentally delete a file, it’s gone unless you have backups, no matter what kind of array you’re running.

I think your disk layout should look something like this:

sda1: /
sda2: swap

sdb1 + sdc1: md0, your shared filesystem, software RAID1.

sdd1(a) and sdd1(b): Two external USB disks. Backups are performed weekly and moved off site on a rotating basis.
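
Setting up that md0 is only a couple of commands. A sketch, assuming those device names, an ext3 filesystem, and /srv/share as an example mount point:

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/sdc1
mkfs.ext3 /dev/md0
mount /dev/md0 /srv/share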

Your current box would be fine – and Linux is generally far more tolerant of hardware than other OSes.

One of the new Intel Atom mini-ITX boards would be really good.

I run a Linux server (using SME Server, an awesome distribution) with a 2 GHz processor, and it runs at 2%-3% almost all the time (it just peaks when running an AV scan). I use software RAID so I can swap the hardware if the system fails – that is really easy; I keep a spare system from my last upgrade as a backup.

If you use LVM you can do disk space upgrades and stuff (I replaced my disk drives with bigger drives and expanded the volumes on the fly from work via VPN).
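
The on-the-fly expansion itself is only a couple of commands. Roughly what I ran, with illustrative names (assuming a volume group called vg0 and an ext3 filesystem, which can be grown while mounted):

lvextend -L +100G /dev/vg0/share
resize2fs /dev/vg0/share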

Si

For what it’s worth: my fileserver has been a 500 MHz PIII running Slackware for the past 7 years. The only reasons I’ve been thinking about upgrading it are 1) it’s noisy and sitting in my office, and 2) it’s probably about time to switch to a more modern Linux distribution – I’m an Ubuntu convert.

So let me get this straight – as long as I have a circa-2008 board/CPU, I can’t really go wrong (assuming Linux compatibility)? As in a barest-of-features Asus board and whatever Celeron is still available at the retail level? As long as I’m buying new, I really don’t have to worry about underperformance? (New hardware is definitely in the cards, as I don’t want to be dependent on eight-year-old hardware, plus the SATA speed will help.) That would make it a bit easier, since all I’d have to do is look for an older board with a long history and strong reliability ratings, and I’ll be all set (as long as it’s on a compatibility list). Sorry if I’m being pedantic, but that makes choosing so … so … easy.

As for RAID/dual drives, I’ve two priorities: speed and recovery. (Normal backups are a different story; they’ll be handled by nightly jobs to a remote device.) I have two things for comparison. A few years ago, we were both working off of my WinXP box, and more recently, we’ve been working off a Linksys NAS200 in RAID 1. Other issues aside, there is a dramatic difference in save and load times between the Linksys and local. Clearly some difference is to be expected, but a ten meg file over a 100 Mbps LAN should be trivial, no?

My most recent, less-than-scientific test was transferring (via FExplorer copy) an 800 MB file back and forth from my desktop to the Linksys and to the testing box I built. The Linksys has two modern WD SATA drives in RAID 1; the test box has a single, eight-year-old WD EIDE drive. Copy times in both directions were the same: about four minutes to the Linksys, about two minutes to the test box. Is this confirmation bias, or does the RAID really have that much of an impact?

So, speed and recovery…

Speed:
I may be violating some Aesop’s Fable or other, but I’ve been given the impression that under any configuration, RAID 1 will cause a noticeable delay in every save/load. But am I trading one performance hit for another of equal or greater magnitude? A backup program will run resident, but with the price of RAM and nothing else running, I can’t imagine I won’t have the room for it to run without a hit. In addition, every fifteen minutes the box will be copying a few files from one drive to another. Will that matter if we have one of the files open at the time? What will happen if it’s in the middle of a copy when we hit save? What if we’re in the middle of saving and it’s time to run the backup? Does the relatively small size of the files (2 to 10 MB) make a difference?

Recovery:
My thought with a backup was to find a program that will make uncompressed/unarchived full backups of the files. I don’t know if I’m being naive in assuming something like that exists, but I don’t see a program limitation on it. If there’s a failure, we’d just switch login/mapping to the second drive, copy what we need locally and keep working at the desktop level for a while. After installing a new primary drive (no uber-rush, since we’ll still have our archive to copy from), mirror from the backup and we’re good to go again. I can even partition the backup drive and install a boot loader, so repairing the box will take minimal effort. (There’s also the thought of pulling the second drive and putting it in an external enclosure, booting a laptop into Knoppix and copying files to a USB key, but that seems unnecessary given the above.) Of course, is there a backup program that will save uncompressed files?

I think your biggest issue is going to be the RAID: Linux doesn’t really like software RAID – the HOWTO calls it fakeraid. A 3Ware or Areca controller will set you back a hundred bucks or so. You might like to consider Windows Home Server now that they’ve fixed the data corruption bug.

Not necessarily true.

‘fakeraid’ is a driver for certain purported RAID cards that are really dumb ATA cards with software RAID drivers. Kinda like the I/O version of a winmodem.

The ‘md’ kernel driver dispenses with the pseudo-hardware BS and does everything directly in software, for free. For a file server with a couple (or a couple dozen) users, it’s more than adequate.

Hell, that’s all that those Buffalo devices use, and they can run up to two or three grand.

There’s virtually no advantage to buying a RAID card if you’re using ATA, especially if you’re setting up a 100Mbit file server. Assuming you stick the disks on separate channels, even an ancient UATA-33 pair in md RAID1 is going to smoke the network link once the platters spin up.
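
If you want to confirm that, time a sequential read from the array and compare it to the roughly 12 MB/s that 100 Mbit Ethernet can carry. For example (assuming the array is /dev/md0):

hdparm -t /dev/md0

Even elderly disks will post numbers well above the network’s ceiling.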

Honestly, if you’d like to go cheap, I recommend finding an older fanless Via or Pentium-M board with a couple of PCI slots. Buy a nice efficient CPU, and you can save a bunch on power bills.

The real test is this:

date; cp ./bigfile /remote/server/ && sync; date

That call to ‘sync’ is important. It’s entirely possible the Linksys unit (which likely has very little RAM) was writing all changes immediately to disk, while your test box was spooling up all those I/O changes in the buffers. sync flushes the buffers.

It’s why you have to “safely remove” flash media from Windows - to make sure all the data’s actually written to the filesystem.

That’s not even getting into the cache or efficiency of each server’s network cards.

apt-get install rsync

man rsync

Or, more succinctly:

rsync -a /foo/ /bar/

That’ll copy only changed files from /foo to /bar (the trailing slash on the source means “the contents of /foo” rather than the directory itself).

ETA:

And finally, RAID1 should actually improve read times by a small, but measurable amount.

You know, I put it in the OP that I tried a couple other places but got nowhere. Why do I ever look elsewhere? Honestly, thanks so much for the guidance here.

Quartz: Windows Home Server might do the trick, but I made a huge mistake and looked Linux directly in the eyes. I knew I shouldn’t have, knew it at the time, but I just couldn’t help myself. All I wanted was to read a Linux-based drive in an external enclosure, and look where I ended up.

black rabbit: Not looking to go cheap per se, just not looking to overspend on capacity I don’t need. At the same time, I don’t want to underperform, hence one of the major questions of the OP. Given the problem-free running of older equipment noted in the thread, it seems like I can focus more on reliability than capacity – if everything out there (retail) is likely to have more capacity than I need, then I can (relatively) ignore it when putting components together.

Heh… “man rsync.” If I’m not mistaken, that’s a polite way of saying RTFM :)

So I’d use crontab (or just “cron” – why do people call it just cron?) to set that to run at whatever interval I want. I don’t have to worry about any fancy-schmancy backup program; that’ll monitor the file allocation table (or the Linux equivalent) for me. Could there be any conflict/problem if I have a file open at the time? If I’m saving when it’s writing or vice versa? Or is that where the fancy-schmancy program comes in?
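
So what I’m picturing is a single crontab line, something like this (paths are just placeholders for wherever the share and the second drive actually live):

*/15 * * * * rsync -a /srv/share/ /mnt/backup/

That is, every fifteen minutes, copy over anything that changed.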

What what what?! I thought RAID0 did that by reading multiple drives at the same time. Or have I had my RAIDs backwards this whole time? How does RAID1 improve things? I can see it being transparent (writing two equal-speed drives at the same time), but how can it improve things? Or does it improve reading only, while we’ll still have a lag in saving?

If I’m reading that right, then “There’s virtually no advantage to buying a RAID card if you’re using ATA, especially if you’re setting up a 100Mbit file server. Assuming you stick the disks on separate channels, even an ancient UATA-33 pair in md RAID1 is going to smoke the network link once the platters spin up.” means I should pretty much stop looking for a board with on-board RAID, stick the two drives in separate SATA channels, and start figuring out how to get Ubuntu to handle the RAID for me. That’ll mean both drives can act as a boot drive in case of failure, no scheduling to do, just set it up and go.
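
From what I’ve read, for both drives to actually be bootable I’d also have to install the boot loader on each disk, something like this (assuming the drives are sda and sdb; correct me if I’m wrong):

grub-install /dev/sda
grub-install /dev/sdb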

ETA: Actually, I think save time is key. Loading a file for two or two and a half seconds isn’t really noteworthy. It’s the middle of working on something, almost compulsive and frequent saving that gives the hourglass/beachball.

Because in RAID 1 you can read two separate parts of the file simultaneously. RAID 1 is slower on writing, because you have to write twice.

Sounds like you also want to upgrade your network to gigabit.

As noted, software RAID using the Linux md system is reliable, easy, and performs well. The Linux system I refer to above has been running in the same configuration on a variety of hardware since 2001. The OS has been upgraded several times, and the hardware a few times as PCs have fallen out of use or equipment has failed. Because it uses software RAID, I have never had an issue moving to a new motherboard. I’d like to see Windows Home Server cope with that. And when I had a physical disk failure last year, I just got two new bigger drives, replaced the dead one, and synced the drives. Then I replaced the smaller drive and swapped the master/slave. After boot-up and sync, I did an expansion to utilise the additional capacity. And (apart from the physical disk swaps) most of this was done with the system online and via VPN.
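
The md side of that disk swap was only a few commands, something like this from memory (device names illustrative; mine were IDE drives):

mdadm /dev/md0 --fail /dev/hdb1
mdadm /dev/md0 --remove /dev/hdb1
(physically swap in the new drive and partition it)
mdadm /dev/md0 --add /dev/hdb1
cat /proc/mdstat

The last command lets you watch the resync progress.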

I use DAR2 for backups, but for what you want, rsync looks fine. You can do volume shadows and snapshots using LVM, but you probably don’t need that.
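
For the record, an LVM snapshot is a one-liner (assuming a volume group called vg0 with some free space in it):

lvcreate --snapshot --size 1G --name share-snap /dev/vg0/share

Mount the snapshot read-only, back it up at leisure, then lvremove it.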

The reason that RAID1 can be faster on read is that one of the disks can be in a better position to access the data than the other, so on average you can get the data a bit quicker.

Get new hardware, but don’t overspec it. I use 1 GB of RAM in mine, but I run a number of Java-based services (SSLExplorer, OpenXchange, and others), and with the extra memory I don’t use any swap. I still like the new Atom boards – a 1.6 GHz low-power mini-ITX board for £45 ($88). It has only one Ethernet port built in and one PCI slot, though; I’d like two Ethernet ports.

Good luck

Si

Oh man oh man… first it was Linux… now it’s Gigabit.

See, here’s the thing. For other purposes, I was already going to get a Linksys WAP54G access point. So that’s about seventy bucks budgeted already. A WRT350N (4-port wireless gigabit router) is only $150, so while not a small jump, it’s certainly not so large as to not think about it. This assumes I can repurpose the WRT54G (regular wireless router) into acting like an access point. No need for gigabit there, just a wireless access point into which I can then plug a wired LAN device.

But since we’re not streaming or anything, will we notice a difference? Forgetting about overhead and other delays for a moment, if we save a 100 MB file over our current connection (the WRT is a 100 Mbps router), in theory, shouldn’t it just take one second? Doubling it to overly account for overhead, shouldn’t we still be saving a 100 MB file in just two seconds? Since most of our files are 10 MB or less, are we really talking about one to two tenths of a second? Would gigabit really turn that down to one hundredth of a second? From barely noticeable to completely transparent?

Something must be really screwy with my math or my understanding, though, since a 100 MB file (whether transferring to the test box, the NAS, or each other) takes much longer than just one to two seconds.

Also, don’t I have to be extra careful with cables in gigabit? Not that I don’t do my best to keep everything neat as it is, but isn’t gigabit much more sensitive to turns and whatnot? All cables are CAT6 if it makes a difference, and the longest we have in the office is about twenty feet.

Now, how does that speed difference/upgrade compare to the difference between saving to a primary-rsync-secondary setup and saving with RAID 1? Are the differences comparable?

It’s going to take at least 8 seconds to transfer/save a 100 megabyte file over 100base-T Ethernet.

I haven’t noticed any cabling problems with 1000base-T Ethernet. It seems to be pretty resilient. I had a cable go bad due to water getting inside the cable. It wouldn’t work at any rate, even 10base-T. The important thing for 1000base-T is to have 4 good wire pairs inside the cable. It uses all 4 pairs, unlike 100base-T and 10base-T.

1000base-T hardware is so cheap these days that I don’t see any point in buying 100base-T hardware.

Much more like it. Not that I understand why per se, but much closer to my reality.

Would gigabit be 1/10th that time?

In an ideal world, yes. In reality, there may be system bottlenecks that reduce the data transfer rate to something less than the theoretical maximum. Still, you can expect a major improvement.

You can use the WRT54G as an access point – just disable DHCP and plug it into the network via one of the client ports (not the internet side). It will work just fine as a pure access point.

Si

You might also get a dedicated network-attached storage box.

I personally have the DS207+. The boxes listed in that link come without drives, which you buy separately and install. These boxes will probably be quieter, they will use less power than a general-purpose computer, and they will be smaller. NAS boxes will probably also be cheaper than a general-purpose computer, unless you already have some of the components to build the dedicated machine.

If you’re going wireless, forget about gigabit speeds over the wireless link. But why not buy yourself a separate gigabit wired router? They’re dirt cheap, as are the NICs if the motherboard doesn’t support gigabit.

The answer is that files are measured in megabytes, while network traffic is measured in megabits (8 bits/byte, as I’m sure you know). Multiply (or divide) everything by 8, and don’t forget to account for some network packet overhead.
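
Worked through for your 100 MB example:

100 MB × 8 = 800 megabits
800 megabits / 100 Mbps = 8 seconds over 100base-T (the “at least 8 seconds” above)
800 megabits / 1000 Mbps = 0.8 seconds over gigabit

Real-world times will be longer once protocol overhead and disk speed get involved.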