Many, many enthusiasts/geeks like me have huge storage needs. I currently have the following data on my home network:
- 200GB of WMA 9 Lossless music (“backups” of my CD collection).
- 30GB of WMA 9 128kbps music (re-encoded from the above for access by my portable players).
- 1TB of home video captured from Hi8 tape to DV format (still trying to find a reliable way of batch encoding this to WMV 9; should be able to get it to 250GB and retain quality).
- 1TB of backed up DVDs; only a fraction of the DVD collection!
- 20GB of digital photos.
- ~200GB of other junk (Exchange database, documents, source code, backup of product CDs/installers, etc…)
Granted, I’m a super-geek, so this is an exceptional amount. But even if I encode the home movies, I’m easily at multiple TBs. If I forget about the DVD thing, it’s still more than a TB.
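For the curious, here is the back-of-the-envelope arithmetic behind those totals as a small Python sketch. The sizes are the ones from the list above; treating 1TB as 1000GB and the "other junk" as exactly 200GB are simplifying assumptions, and the dictionary keys are just labels I made up for illustration.

```python
# Sizes in GB, taken from the list above (1TB treated as 1000GB for simplicity).
collection_gb = {
    "lossless music": 200,
    "128kbps music": 30,
    "home video (DV)": 1000,
    "DVD backups": 1000,
    "photos": 20,
    "other junk": 200,  # listed as "~200GB", so this is approximate
}

# Everything as it sits on the network today.
total_now = sum(collection_gb.values())

# Same collection if the DV home video were encoded down to about 250GB.
total_after_encoding = total_now - collection_gb["home video (DV)"] + 250

# Today's collection with the DVD backups left out.
total_without_dvds = total_now - collection_gb["DVD backups"]

print(total_now, total_after_encoding, total_without_dvds)  # 2450 1700 1450
```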
What’s the best way to store all this stuff? A balance of reliability, performance, cost, and manageability is required. I have some thoughts on this, and that’s what this post is all about…
For the moment, forget that I’m a geek. Assume I’m a typical Dad with a 5 megapixel digital camera and a PC and let me tell you a story:
Some time after our daughter was born I accidentally recorded over two (yes, I made the mistake twice!) VHS tapes that were precious. One was of our wedding and the other was of my wife’s ultrasound with my daughter in her tummy. It’s bad enough that I destroyed those precious items, but it’s even worse that the shows I recorded over them were some Star Trek episode and Aliens IV. To this day, some 10 years later, my wife still gives me an unbelievable (but deserved!) amount of grief about this.
Prior to us having a digital camera (6 years ago?) I lost all data on my PC due to a stupid “user error”. I lost tons of stuff. But I can’t remember today what any of it was. It was important, but for the most part it was replaceable or repairable (tax records, money files, letters, email, etc.). I don’t think my wife even noticed.
If this same thing happened today, the impact would be very different: my PC now has precious data on it. Photos. 20GB of 5+ years of family photos. If I view these as precious, imagine what my wife thinks? Imagine the doghouse I’d be in if I allowed a PC failure of some sort to destroy those photos?!?!
For the first time in history, home PC users actually have irreplaceable, truly precious data on their machines! This is a big deal (I wonder if you have to be married with children to really understand this).
In my early days with computers I had lots of disk failures; I recall them mostly having to do with flaky Apple II floppy disk drives. But I remember telling people in the late 1990s that I had never had a hard drive fail. I considered myself lucky given the number of drives that had passed under my desk. However, starting a few years ago, I’ve had an increasing number of disks fail. In fact, in the last 9 months I have had 6 (yes, six) drives of varying manufacture and vintage fail. In addition, I’ve had one Western Digital and one Maxtor drive arrive DOA from the store.
I’m not alone. Do a Google Groups search for “hard drive failure” and spend 15 minutes reading the horror stories (like this).
I personally believe there are several factors at work here. The first is that the quality of “consumer” drives has gone down. As capacities have increased and margins have been squeezed, I believe the vendors have started cutting corners. Another factor is that many hard drives are now external. Drives are fragile mechanisms. When they are encased within a PC chassis they are relatively protected from bumps and drops, but not when they are external. The last factor is heat. Today’s drives have lots of heavy platters spinning really fast. Friction and all those watts mean heat generation. Hot drives are not happy drives. Those neato Maxtor and LaCie external disks with fan-less enclosures are death traps. Worse yet, people stack these things. Where does heat go? Up. Up to that poor drive at the top of the stack.
Another disturbing trend is for vendors to produce drive enclosures that contain multiple drives but look to the PC like a single drive. The LaCie Bigger Disk and Bigger Disk Extreme come to mind. These “disks” are really enclosures around 4 250GB IDE drives with a built-in controller that uses striping to make them appear as a single spindle. The problem here is that if any one of the 4 drives fails, you will likely lose ALL of the data on all 4 drives (because striping spreads every file across all of them). Thus the “drive” is roughly 4 times as likely to fail as a single disk!
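To put a rough number on that: if you assume each drive fails independently and with the same probability over some period, the chance that a striped set loses data is one minus the chance that every drive survives. Both of those assumptions are mine, and the 5% per-drive figure below is invented purely for illustration, not a measured failure rate. A minimal Python sketch:

```python
def striped_set_failure_probability(p_drive: float, n_drives: int) -> float:
    """Chance a striped set loses data, assuming independent, identical drives.

    With striping, a single failed drive is enough to lose everything, so the
    set only survives if all n drives survive.
    """
    return 1.0 - (1.0 - p_drive) ** n_drives


p = 0.05  # hypothetical per-drive failure chance over some period

single = striped_set_failure_probability(p, 1)
striped = striped_set_failure_probability(p, 4)
ratio = striped / single

print(f"single drive: {single:.3f}")     # single drive: 0.050
print(f"4-drive stripe: {striped:.3f}")  # 4-drive stripe: 0.185
print(f"ratio: {ratio:.2f}x")            # ratio: 3.71x
```

For small per-drive failure rates the ratio gets very close to 4, which is where the "4 times as likely" rule of thumb comes from.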
While it’s not a hardware failure, it is known that many consumer IDE/ATA drives ignore the “force write-cache flush” command in order to increase benchmark performance. Turning your PC off suddenly can result in data loss and corruption. NTFS is pretty robust to this (but not 100%). FAT is another story.
The fact is, hard drives fail. So you need to protect against that. There are multiple strategies, each with costs associated with them. Some of these strategies can be combined to increase the reliability of your data.
- Use high quality drives
- Use fault tolerance such as RAID 1, 5, or 10, or file-based replication
- Avoid external drives except for easily replaceable data and backups
- Implement an automated, regularly scheduled backup scheme
- Keep a copy of all precious data “off-site”
- Practice restoring from your backups
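As a taste of what "practice restoring" can look like, here is a minimal Python sketch that checks whether every file in a source folder exists in a backup folder with identical contents. It is only an illustration under my own assumptions: the function names `file_digest` and `verify_backup` are made up, it assumes the backup is a plain file-for-file copy (not an archive or image), and a real check would still involve actually restoring to a different disk.

```python
import hashlib
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1MB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(source: Path, backup: Path) -> List[str]:
    """List relative paths that are missing from the backup or differ from the source."""
    problems = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue  # skip directories
        rel = src.relative_to(source)
        dst = backup / rel
        if not dst.is_file() or file_digest(src) != file_digest(dst):
            problems.append(rel.as_posix())
    return problems
```

An empty list back from `verify_backup(Path("photos"), Path("backup/photos"))` (both paths hypothetical) means every source file has a byte-identical copy in the backup; anything else is a file to go investigate before you need it.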
I’ll be covering each of these strategies in depth in future installments of this post. I’ve already written about “managing your storage namespace” using DFS. I will have an installment where I expand on that as well.