Friday, March 30, 2007

A file server for home (Part 2)

In this second part of "A file server for home" I will introduce a few factors which I am currently considering in making decisions about what my file server will eventually look like.

Monetary constraints: That money would be a factor is so obvious it is almost overlooked. It is the single most important contributor in any business decision, and while one might be tempted to think that power users have a large budget for computer spending, there exists a certain culture among many hackers, particularly in the "over-clocking" community, in which one tries to make the most of the least by being clever instead of by throwing money at the problem. While I am not into "over-clocking", I can still respect the art from the sideline. I am looking for a cost-effective way to achieve resilience and stability, and the principle remains: I will try to build the best file server possible while spending as little as possible through (hopefully) being clever with configuration decisions. With this in mind I am going to try to do two things: I will attempt to reuse what I have, and when I cannot help it, I will buy components with an upgrade path and maximal life expectancy.

Physical constraints: One of the implied ideas of a file server is that it is going to house a number of disk drives in the chassis. Keeping in mind that I want to reuse some of my existing hard drives and want future expansion capability, it is clear that the chassis must not be too limited in the number of drives that it supports. I had a look at the cases that I have lying around in my garage, particularly a lovely, if somewhat older, full tower case. 
Unfortunately none of them are ATX form factor compatible. I will have to use a motherboard which supports SATA hard drives, which implies a fairly new board and therefore an ATX form factor, effectively eliminating any of the unused cases that I already have: hence the need for a new chassis. I started searching and reading reviews on web sites, and one thing that frustrated me was the limited space for installing hard drives in many cases - most top out at eight, sometimes no more than six drives are supported, and that is counting the bays to be used for CD/DVD and floppy-disk drives. Only the very expensive cases support ten or more disk drives. Additionally, I hoped to find a case in which the disk drives are side-accessible so that I would be able to service drives without having to resort to removing the graphics card, etc. As luck would have it I found someone who wanted to sell a case because it was too large and too heavy to carry around to LAN parties. The case in question turned out to be the Thermaltake VA8000. With a native capacity for 14 drives and the ability to be extended further by using special drive bay modules, it was an almost perfect fit for my requirements. On checking the condition of the case I found that there were some scratches on the side panel, the keys were missing and the front doors would no longer latch into the closed position. These are all really minor issues, and while cleaning the case I discovered that the little magnets meant to keep the doors from swinging open had simply fallen out and were stuck to metal parts elsewhere in the chassis, so that was easily sorted with a bit of crazy-glue. What is more, the case's monstrous size makes it easy to access the drives even with a bunch of cards installed, despite the fact that the drives are not side-accessible, so this chassis is going to form the perfect basis for my file server. 
In fact this case deserves a full review on its own, so watch this space for an upcoming post on this topic. I still need to finalize some of the other components for this server, but currently the items that I am considering are the following:
  • Motherboard: I have a Gigabyte GA-K8NXP-SLI motherboard which I am currently using as my desktop "workstation". The SLI part is way overkill for a server, but this motherboard sports no less than eight SATA ports and two IDE controllers, for a maximum of 12 drives.
  • CPU and Memory: Seeing that I will need to replace the motherboard in my desktop, and that the new motherboard is unlikely to use the same socket type, my existing processor and memory will be freed up and will serve just fine in the server. Therefore the server will inherit my existing AMD Athlon 64 3000+ processor and 2 GB of DDR400 memory.
  • Graphics card: This motherboard does not come with an on-board graphics card, so I will have to provide one. My existing card is a Gigabyte GV-3D1 dual 6600GT, which would be totally overkill for any server, but might not be compatible with whatever new motherboard I eventually get for my desktop. I will test the GV-3D1 on my new motherboard and if it works it will stay in my desktop and I will just use an old PCI graphics card in the server. Otherwise I will use it in the file server purely to prevent it from being wasted, and then I would need to start evaluating what to buy for my desktop, though that is an entirely different discussion in itself.
  • DVD-Writer: The drive I have right now is a very capable LG Lightscribe, but it is slower on some media formats, particularly the DVD "minus" variety of disks. I will buy an extra DVD-writer and then decide which one of the two goes into my desktop, putting the other one in the server. This is not something that needs to be decided now; it will depend on what my usage pattern turns out to be: will I be writing more disks on my desktop, or will I need the drive more often in the server to make backups?
  • Tape Drive: I am hoping to find a tape drive to ease the backup process, ideally either an LTO 3 or LTO 4 drive. This is something that I will add to the file server when I find a suitable device.
  • Hard Drives: The server will start off by inheriting the six disk drives currently in my desktop system, these being: an 80 GB IDE drive, a 320 GB IDE drive and four 160 GB SATA drives. More drives will be added over time as and when required.
  • Power Supply: Hard drives do not actually use huge amounts of power. It is typical for a drive to use around the 10W mark, while during peak usage or start-up the drives can use up to about 30W each. Hard disk spin-up is staggered, and if I assume that out of about 10 potential drives I am unlikely to often have more than one drive operating at peak performance at any one point in time, with maybe two more drives operating at around the 20W mark and the rest mostly idle at 10W, I arrive at around (7*10) + (2*20) + 30 = 140 W for the disk drives. The graphics card could potentially peak at anything from 20W for an old PCI card to around 100W if I were to use the GV-3D1 card (I don't expect to ever see the graphics card in the server peak, but I will use peak values here just to be safe). The CPU, RAM and motherboard chip set add another 150W during peak usage, bringing the total to around 400W. This is peak usage, but a 500W power supply gives me some headroom, and in any case you do not want to operate at the limit of a power supply's capacity, as that is a likely source of instability. A good power supply is something that I will need to purchase.
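Just to sanity-check the arithmetic above, here is a rough Python sketch of the same power budget. The wattage figures are my estimates from the text, not measured values, and the function name is purely illustrative:

```python
# Rough peak power-budget estimate for the planned file server.
# All numbers are the guesses discussed above, not measurements.

def power_budget():
    # Disk assumptions: of ~10 drives, one at peak (~30 W),
    # two moderately busy (~20 W each), seven idle (~10 W each).
    disks = 7 * 10 + 2 * 20 + 1 * 30   # = 140 W
    graphics = 100                     # worst case: the GV-3D1 at peak
    cpu_ram_chipset = 150              # CPU + RAM + chipset at peak
    return disks + graphics + cpu_ram_chipset

total = power_budget()
print(total)                           # about 390 W, i.e. "around 400W"
print(f"load on a 500 W PSU at peak: {total / 500:.0%}")
```

At roughly 80% of a 500W unit's rating during a worst-case peak, and far less in normal operation, the headroom argument above holds.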
The Gigabyte motherboard has one caveat that I need to mention: four of the SATA ports are provided by a Silicon Image SI3114 controller, which does something non-standard to the disks connected to it. What I have found is that with drives I have previously used on the SI3114, the partition table is not recognized when you try to use them on a different SATA controller. This also happens if I try to use one of these drives in an external USB enclosure like the NexStar III SATA enclosure that I have. On the other hand, disks which were previously partitioned while connected to the nVidia chipset's built-in SATA controller are recognized on other motherboards and in my external enclosure without any problems. The reason why I mention this is that if the motherboard should ever fail, I will be in trouble: these controllers are not present on many motherboards, and so, unless I can get another motherboard with the same type of controller, I would have a hard time reading the data from the disks that were connected to the SI3114 controller. I must admit that this is not something that I have been able to verify yet, but the risk is there.

Note: All of the motherboard SATA ports are set to operate in JBOD mode, which allows me to select how I want to set up the RAID in software, and also allows me to change the RAID layout later.

I also still have to make a final decision on what operating system to use. I would be a poor Solaris fan if I did not at least consider Solaris as an option, but in my experience it is just more suited to a server environment where the applications you want to use are unlikely to change much and are all well supported under Solaris. OpenSolaris offers me the same capability to set up zones and ZFS file systems, but many applications are easier to get running under OpenSolaris. 
Seeing that I will want to use the server for several other purposes besides simple file sharing, and because I have not yet identified the actual software that I will use, I will eliminate Solaris at this point simply because it would be too restrictive in future. If file serving were the only purpose, it would have been perfect. I will also eliminate Windows because it is not free, too difficult to configure, doesn't have enough free software, and is just generally a pain to work with. If I were to use Linux I would have trouble getting ZFS to work - the Linux ZFS port is not nearly stable enough for my liking. This means I will probably end up using OpenSolaris. The decision to use OpenSolaris is not yet cast in stone, but it looks like the right choice because I will be able to set up Zones/Containers and use ZFS features like snapshots and raid-z. The four 160 GB disks that I have will nicely make up one raid-z "stripe". For the rest of the disks I still need to investigate the optimal way to configure them. I am also considering getting the free Foundation Suite Basic disk management software, as I know it very well and it too provides simple and online management of disk space. In part three of this series I will go in depth into how I selected an operating system for this file server.
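As a rough sanity check on capacity: a raid-z vdev of n equal disks yields about (n - 1) disks' worth of usable space, since roughly one disk's worth goes to parity. A small Python sketch using the drive sizes mentioned above (the helper function is mine, purely illustrative, and real ZFS overheads are ignored):

```python
# Approximate usable capacity of the planned pool, in "marketing" GB.
# raid-z keeps about one disk's worth of parity per vdev.

def raidz_usable(n_disks, size_gb):
    """Rough usable capacity of a single raid-z vdev of equal disks."""
    return (n_disks - 1) * size_gb

stripe = raidz_usable(4, 160)   # the four 160 GB SATA drives
other = 80 + 320                # the two IDE drives; layout still undecided
print(stripe)                   # 480
print(stripe + other)           # 880, if the IDE disks are simply added unprotected
```

So the raid-z stripe alone gives roughly 480 GB of protected space, with the two IDE drives contributing the rest in whatever layout I eventually settle on.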

Monday, March 19, 2007

Choosing between Solaris, Linux and MS-Windows

Today I will take this very contentious, highly flame-war-provoking issue head on. Many people are likely to disagree with me, but that is OK - opinions are a big part of what makes the Internet tick. Naturally you need to start off by considering your specific requirements and the resources that you have available: many RISC-based computers support only a very specific Unix operating system, while many consumer devices such as web-cams and WiFi adapters are difficult if not impossible to get working with anything other than MS Windows. When multiple options exist, it also makes sense to take stock of your existing skills to support the chosen environment. To start off I would like to group operating systems into three categories as I see them. These are intentionally vague, and operating systems can migrate between the categories. The categories are:
  • The End-User operating system: The question "Does it do the job?" comes to mind. I will include MS Windows and MacOS into this category. For reasons I will get into soon, this includes the Server, Pro, and Home editions of MS Windows.
  • The Power-user operating system: "I'll hack it until it works the way I want it to work". This category without any doubt includes Linux.
  • The server operating system: This is for specifically supported hardware running specifically supported applications, and includes all the Unices, in particular Solaris because it is available for free and on non-RISC hardware, and probably FreeBSD as well.
I'll discuss each of these categories a bit more in depth, taking them in reverse order. In the enterprise data center we usually find many large, multi-processor servers designed and ratified to run specific operating systems, which in turn are ratified and supported for use with specific applications and databases. Vendors in this environment sell solutions where all the components making up the whole have been preselected, and include support for their solutions under a contract which often limits your freedom, for example by restricting you to specific versions of a specific operating system. In this environment, going with a supported operating system is the only sensible choice, and the big Unix vendors all have long lists of applications that can be run on their systems. The benefit of these types of solutions lies in hardware that is intelligent about the operating system: it can halt the operating system without terminating it, allows hardware to be re-configured while the operating system is running, can take crash-dumps when the system hangs, and presents a uniform device tree of the installed hardware to the operating system. This all provides a high level of stability and supportability, something which comes at a price, but it is worth it when the service delivery to your clients depends on the systems running without interruption. That is not to say that Linux and MS Windows servers do not have a place in the data center - the exact same rule holds true: select the platform for which your application is supported, whether it is a mainframe or a network vendor's appliance with an embedded firewall and authentication service. With Linux-based servers I often notice that the applications prefer an environment where a high degree of customizability is provided, and with Windows servers I often find that the applications have a keep-it-simple nature, at least as far as the server configuration is concerned. 
Linux fits squarely into the Power-user operating system category. Features like the ability to re-compile the operating system to eke out an extra ounce of performance, to directly modify the source code to get a specific result, and to change or replace any single component with one of your own choosing, together with a wonderful following of supporters eager to help one another, truly make this operating system a power-user's heaven. I find myself more productive when working on Linux: this is without a doubt because I have it set up exactly the way I like it and I have a huge collection of software tools installed and configured to help me maintain the system. It also helps that Linux is much more transparent in reporting what is going on "under the hood" - you can obtain accurate log files of system events, examine what processes are doing in fine detail, and get accurate reports of the state of every single component that the operating system controls. This all makes it easier to track down the root cause of problems, and in my experience using Linux is less of a hit-and-miss affair - if it works a certain way today, it will work the same way tomorrow and the day after. Things like program crashes are much more easily reproducible and, as a result, easier to resolve. All of this requires a certain level of commitment, and even enthusiasm to learn more about the workings of the computer and the operating system, though, and that is probably not for everyone. The other options available to "power users" with PCs are basically FreeBSD, Solaris, OpenSolaris, and Windows. Windows is often not transparent enough, and things do not always work the way Microsoft says they do. Solaris' hardware support is still somewhat limited, making it a difficult choice to justify. 
OpenSolaris is maybe still too much of a newcomer to be judged fairly, though good work is being done, albeit slowly, by projects such as Nexenta GNU Solaris, an OpenSolaris distribution providing the GNU tools instead of the Solaris defaults.

The End-user category is generally for people who don't care that their operating system limits their ability to troubleshoot faults, learn more about the computer or replace sub-components of the operating system. Many of these are company PCs used to run a selection of business applications, into whose support and development companies have invested a lot of time and money. 20 years ago Microsoft was probably deliberately lax in prosecuting pirates of their operating systems and application software. Most of these "pirates" were kids who were learning how computers worked, and by the time they made it into the workforce, Windows was what they expected on their computers and in the data centers. Today this large following of Windows users equates to a workforce which would pretty much need to be retrained if one suddenly wanted to switch to a different operating system. You could argue that the training is not so hard - many applications are web based these days, sending an email remains pretty much the same, etc. - but this only holds true in theory: the moment many users are faced with a new login screen that looks different, they feel lost. Suddenly the user name field becomes case sensitive, and your lost user turns into a frustrated user who hasn't even had a chance to try the applications yet. Worse, every application's menus are laid out differently, and that is only once they learn which application does what. Space characters in file names suddenly cause havoc, and the slash in directory paths is turned around. To the end user it feels as if nothing works the way they expect it to, and it takes months before they become productive once more. 
I must agree with A. Russel Jones, who concludes that for people who don't want to have to consider whether their applications will work with the window manager they have installed, learn how to enter commands in a terminal, or understand why security is a good thing when it seems to just make life harder, a Windows based PC is often the best option. Linux can be carefully configured to simulate this in a controlled environment where an IT department makes all the design decisions, installs all the software, and makes sure that all of the hardware in use is supported - when you have people with the necessary skills. Setting up such an ideal configuration takes time and effort, which means there is a price tag attached, and this could be more expensive than just running Windows. In the end business decisions always come back to the numbers: what are the benefits (over the alternative), and how do the long-term costs of the various options compare? The Pro and Server editions of MS Windows have the same basic limitations that the Home edition has: the Server edition is no more stable, powerful, transparent, manageable, or secure than the Home edition, and this puts it into the "End-user" category. It also fits in with the "keep-it-simple" tendency of this category, even if it is installed on a rack-mounted data center server which never crashes. The power user is probably best off: they have the most freedom to choose what operating system and applications they want to run and how they want to configure their computers, basically because they are able to support themselves when things stop working. So it boils down to the age-old adage: use the right tool for the job. I suspect that you will find that nine times out of ten the categories I described here hold true, though.

Disclaimer: This article necessarily relies heavily on generalization. 
Please keep that in mind if you decide to comment.

Tuesday, March 13, 2007

A file server for home (Part 1)

In this first part of "A file server for home" I will cover my situation and how I decided that I need a file server. In a follow-on "part 2" I will get into the actual implementation. I am the type of person who fixes their computer when it is not broken - for me it is all about learning about the computer: how it works, how it can work better, and what else I can make it do. Re-installing the operating system, upgrading, testing any interesting-sounding programs that I chance upon on the internet - it is all part of the game. All of this means that my data is at constant risk. I could make a mistake when I re-partition a disk, or a misbehaving piece of software could cause havoc with my data. To make matters worse, all of my data is at risk from the failure of an actual disk drive! And I often enough get my computer into a state where it is unable to boot for a few days - this instance being a good example. Now you must understand that the content of my hard drives represents many years' worth of collected shareware, downloaded documents, Linux install disk images, recorded logs of my online activities, stored email archives, my digital photos and much more. Replacing this would be all but impossible! The only answer is to take the data out of my computer and store it externally. The first question you may ask is: why don't I just make backups? Oh, I do... It is just that I fall behind in keeping my DVD-R based backups up to date, and in any case, CD-R and DVD-based backups have been known to be fallible. Each disk that I do make contains a little bit of everything - some photos, some documents, some shareware and what not else - which makes organizing the collection of backup disks another nightmare. What is more, a single full backup would take a stack of about 160 disks. So taking backups to DVD is simply not a viable option. 
A very simple alternative option would be to just get a Home NAS device, but I have discarded the idea as too expensive, too inflexible, and too limiting. There is an alternative which will better meet my needs: A dedicated file server. Actually the word "dedicated" needs to be qualified, as you will see soon enough. Building a file server PC provides me with the following benefits:
  1. I can build a DVD or tape drive into the file server to take backups directly, without going via the network, for the times when I need to dump some files onto removable media. This cannot be done with NAS devices.
  2. I can add many disk drives to a PC - my current motherboard supports 12 drives without the use of SATA port multipliers or even USB-based drives, and additional SATA controllers and/or port multipliers can be added for even more disk drives. Very few NAS devices actually support more than two drives or even allow you to extend a RAID array.
  3. Existing NAS devices share the data via SMB only. Being a Unix and Linux user I want to have directories shared via NFS and possibly other protocols in the future.
  4. I could install and run programs on a file server, for example to make the machine act as a streaming media server. Granted, some NAS devices already include this functionality, but you're limited to whatever functions it comes with - you don't have the freedom to replace the software with something of your own.
  5. A PC file server can potentially do other non-file-server things, such as being a print server, scan server, web server, mail server, Folding-at-home station, and of particular interest to me, a web proxy and DNS-caching server.
  6. A PC file server can have multiple network ports for better connectivity; it could even have a WiFi port.
  7. NAS devices do not commonly provide RAID-5 disk protection. A PC would support any level of RAID and hot-spare drives to boot. RAID-5 in software is often criticized as being "CPU-heavy", but in an idle file server I'd just be happy to give that processor something to do! Mirroring is too expensive for my "home-user" size budget.
  8. Many NAS devices don't support power-save or sleep modes for the drives. I believe that it is important to turn the drives off to extend their lifetime, particularly in an always-on server.
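To put a number on why mirroring feels too expensive for a home budget compared to RAID-5 (point 7 above), here is a tiny illustrative sketch. The `efficiency` helper is hypothetical, and real arrays have metadata and hot-spare overheads that this ignores:

```python
# Storage efficiency (usable fraction of raw capacity) for the two
# protection schemes discussed above, given n equal-size drives.

def efficiency(scheme, n):
    if scheme == "mirror":
        return 0.5             # every byte is stored twice
    if scheme == "raid5":
        return (n - 1) / n     # one drive's worth of parity, total
    raise ValueError(f"unknown scheme: {scheme}")

print(f"mirroring, any n:  {efficiency('mirror', 10):.0%}")
print(f"RAID-5, 10 drives: {efficiency('raid5', 10):.0%}")
```

With ten drives, mirroring surrenders half the raw capacity while RAID-5 gives up only a tenth, which is why I am happy to spend idle CPU cycles on parity instead.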
Essentially one of the advantages of a dedicated file server is that it does not need to be dedicated - it can do other things besides! One must of course also consider what you lose when you go for a computer acting as a file server rather than a nicely built NAS appliance:
  1. To some the physical appearance of an appliance is appealing, but in my case the server will sit tucked away in the corner under a desk, so I do not consider this important.
  2. NAS devices usually have some form of online disk replacement and array re-building, but I can afford the downtime to replace drives and to do maintenance.
  3. A NAS device is simple and probably fairly stable - you just press the button and voila ... it is running, while with a server you have some instability and, due to the added complexity, more chance for things to go wrong or stop working. Being the hacker that I am, this is possibly a good thing rather than a bad thing, but don't tell my wife.
  4. And finally, some may consider the portability of a NAS device an advantage.
There is one other important consideration, at least to me. With a PC-based file server I would be able to log in to the server and do file maintenance directly on it. Re-organizing the directory structure and layout with a NAS device would require me to copy the files back and forth between the device and a PC. In particular, moving files from one disk or partition to another will often require the files to be passed over the network twice. This would be slow, and in practice it would probably cause me to never actually get around to sorting my data (which is currently in a very sad state of disorganization). With a server I would be able to perform this organization directly on, and internally to, the server. This is important because I am more likely to actually do the data organization when it does not have to pass over the network the whole time, and as a result my data will be better organized and therefore ultimately more useful! I am now in the planning stage for my file server. It will probably have around 1TB of usable disk space to start off with, made up mostly from my existing hard drives. But I will talk about the design and implementation in part 2 of this little series, so keep watching this space! (In other words, add it to your RSS feed aggregator.)

Monday, March 12, 2007

Today is the first day of the rest of my life

The big question: what to put into my first blog entry. Of course, this came only after the other big question: choosing a home for my personal blog, a decision the gravity of which I never really considered before! Now, having gone through the seemingly simple process of selecting a blogging website, setting up a URL and picking a name for my blog, I realize just how serious a decision it is. And then, to make my blog into something real, I of course have to write this first entry. Gee! Who will read my blog? Prospective employers? Friends? Colleagues? Critics? I guess I will not be able to please everybody. And what tone to take? Make it formal, casual, serious, slapstick? I am sure that, as with much else in life, this blog will grow and take shape naturally, based on whatever inspires me each day. I have long thought that I should take the plunge and set up a blog, but every time I thought about it I stopped when it came to selecting a site - one might say I've been having commitment issues. Deciding whether to brave it on my own, setting up my own website for the extra freedom, or to take the simpler option of picking a specialist hosting service made it no easier. But in the end I decided to leave it to the experienced pros, and now here I am. I also only had vague ideas of what to call the blog, and after much consideration came up with the title "Initial Program Load". As the name suggests, this site will be mostly about computers. This is only natural, as computers are my great passion. But without a doubt other things will appear, including, but not limited to, some mad ravings about politics, events, and philosophy. How this all turns out remains to be seen. Now let the show begin.