Many, many enthusiast/geeks like myself have huge storage needs. I currently have the following data on my home network:
- 200GB of WMA 9 Lossless music (“backups” of my CD collection).
- 30GB of WMA 9 128kbps music (re-encoded from the above for access by my portable players).
- 1TB of home video captured from Hi8 tape to DV format (still trying to find a reliable way of batch encoding this to WMV 9; should be able to get it to 250GB and retain quality).
- 1TB of backed up DVDs; only a fraction of the DVD collection!
- 20GB of digital photos.
- ~200GB of other junk (Exchange database, documents, source code, backup of product CDs/installers, etc…)
Granted, I’m a super-geek, so this is an exceptional amount. But even if I encode the home movies, I’m easily at multiple TBs. If I forget about the DVD thing, it’s still more than a TB.
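For anyone who wants to run the same tally on their own collection, here is a rough sketch of the arithmetic. It assumes 1TB = 1000GB and treats the 250GB figure as the hoped-for size of the re-encoded home video; the dictionary keys and the `total_gb` helper are just names made up for this illustration.

```python
# Sizes in GB, taken from the list above (1TB is treated as 1000GB here).
collection_gb = {
    "lossless music": 200,
    "128kbps music": 30,
    "home video (DV)": 1000,
    "DVD backups": 1000,
    "photos": 20,
    "other": 200,
}

# Hoped-for size of the home video once it is re-encoded (an estimate).
ENCODED_VIDEO_GB = 250


def total_gb(items, encode_video=False, skip_dvds=False):
    """Sum the collection, optionally shrinking the video or dropping DVDs."""
    sizes = dict(items)  # work on a copy so the original stays untouched
    if encode_video:
        sizes["home video (DV)"] = ENCODED_VIDEO_GB
    if skip_dvds:
        sizes.pop("DVD backups", None)
    return sum(sizes.values())


print(total_gb(collection_gb))                     # everything as it is today
print(total_gb(collection_gb, encode_video=True))  # after re-encoding the video
print(total_gb(collection_gb, skip_dvds=True))     # ignoring the DVD backups
```

With these figures the three totals come out at 2450GB, 1700GB, and 1450GB.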
What’s the best way to store all this stuff? A balance of reliability, performance, cost, and manageability is required. I have some thoughts on this, and that’s what this post is all about…
For the moment, forget that I’m a geek. Assume I’m a typical Dad with a 5 megapixel digital camera and a PC and let me tell you a story:
Some time after our daughter was born I accidentally recorded over two (yes, I made the mistake twice!) VHS tapes that were precious. One was of our wedding and the other was of my wife’s ultrasound with my daughter in her tummy. It’s bad enough that I destroyed those precious items, but it’s even worse that the shows I recorded over them were some Star Trek episode and Aliens IV. To this day, some 10 years later, my wife still gives me an unbelievable (but deserved!) amount of grief about this.
Prior to us having a digital camera (6 years ago?) I lost all data on my PC due to a stupid “user error”. I lost tons of stuff. But I can’t remember today what any of it was. It was important, but for the most part it was replaceable or repairable (tax records, money files, letters, email, etc…). I don’t think my wife even noticed.
If this same thing happened today, the impact would be very different: my PC now has precious data on it. Photos. 20GB of 5+ years of family photos. If I view these as precious, imagine what my wife thinks? Imagine the doghouse I’d be in if I allowed a PC failure of some sort to destroy those photos?!?!
For the first time in history, home PC users actually have irreplaceable, truly precious data on their machines! This is a big deal (I wonder if you have to be married with children to really understand this).
In my early days with computers I had lots of disk failures; I recall them mostly having to do with flaky Apple II floppy disk drives. But I remember telling people in the late 1990s that I had never had a hard drive fail. I considered myself lucky given the number of drives that had passed under my desk. However, starting a few years ago, I’ve had an increasing number of disks fail. In fact, in the last 9 months I have had 6 (yes, six) drives of varying manufacture and vintage fail. In addition, I’ve had one Western Digital and one Maxtor drive arrive DOA from the store.
I’m not alone. Do a Google Groups search for “hard drive failure” and spend 15 minutes reading the horror stories (like this).
I personally believe there are several factors at work here. The first is that the quality of “consumer” drives has gone down. As capacities have increased and margins squeezed, I believe the vendors have started cutting corners. Another factor is that many hard drives are now external. Drives are fragile mechanisms. When they are encased within a PC chassis they are relatively protected from bumps and drops, but not when they are external. The last factor is heat. Today’s drives have lots of heavy platters spinning really fast. Friction and all those watts mean heat generation. Hot drives are not happy drives. Those neato Maxtor and Lacie external disks with fan-less enclosures are death traps. Worse yet, people stack these things. Where does heat go? Up. Up to that poor drive at the top of the stack.
Another disturbing trend is for vendors to produce drive enclosures that contain multiple drives but look to the PC like a single drive. The Lacie Bigger Disk and Bigger Disk Extreme come to mind. These “disks” are really enclosures around 4 250GB IDE drives with a built-in controller that uses striping to make them appear as a single spindle. The problem here is that if one of the 4 drives fails, you will likely lose ALL of the data on all 4 drives (because of the nature of striping). Thus the “drive” is roughly 4 times as likely to fail as a single disk!
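To put a rough number on the striping risk, here is a small sketch. It assumes the drives fail independently and that each has the same chance of failing over some period; the 5% figure is purely hypothetical, picked only to make the arithmetic concrete, and `stripe_failure_probability` is a name invented for this example.

```python
def stripe_failure_probability(p_single, n_drives):
    """Chance that a striped set loses its data, i.e. that at least one
    of n_drives fails, assuming independent failures with the same
    per-drive probability p_single."""
    return 1.0 - (1.0 - p_single) ** n_drives


# Hypothetical per-drive failure probability over some period.
p = 0.05

single = stripe_failure_probability(p, 1)
striped = stripe_failure_probability(p, 4)

print(f"single drive:   {single:.3f}")
print(f"4-drive stripe: {striped:.3f}")
print(f"ratio:          {striped / single:.2f}x")
```

For small per-drive probabilities the ratio sits just under 4, which is where the "4 times as likely" rule of thumb comes from. Real drives in one hot enclosure probably do not fail independently, so treat this as a ballpark only.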
While it’s not a hardware failure, it is known that many consumer IDE/ATA drives ignore the “force write-cache flush” command in order to increase benchmark performance. Turning your PC off suddenly can result in data loss and corruption. NTFS is pretty robust to this (but not 100%). FAT is another story.
The fact is, hard drives fail. So you need to protect against that. There are multiple strategies, each with costs associated with them. Some of these strategies can be combined to increase the reliability of your data.
- Use high quality drives
- Use fault-tolerance such as RAID 1, 5, or 10 or file based replication
- Avoid external drives except for easily replaceable data and backups
- Implement an automated regularly scheduled backup scheme
- Keep a copy of all precious data “off-site”
- Practice restoring from your backups
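As a taste of what "automated backup" plus "practice restoring" can look like in the small, here is a minimal sketch: it copies a folder to a backup location and then checks every file in the copy against the original by checksum. The function names (`back_up`, `verify`, `sha256_of`) are made up for this illustration, it needs Python 3.8 or later, and it is not a substitute for a real backup tool (no versioning, no handling of deleted files, no off-site copy).

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destination):
    """Copy the whole source tree into destination, overwriting old copies."""
    shutil.copytree(Path(source), Path(destination), dirs_exist_ok=True)


def verify(source, destination):
    """Return the relative paths whose backup copy is missing or differs."""
    source, destination = Path(source), Path(destination)
    problems = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = destination / rel
        if not dst_file.is_file() or sha256_of(src_file) != sha256_of(dst_file):
            problems.append(str(rel))
    return problems
```

A scheduled job could call `back_up(photos, backup_drive)` and then raise an alarm if `verify(photos, backup_drive)` returns anything other than an empty list; both folder names here are placeholders.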
I’ll be covering each of these strategies in depth in future installments of this post. I’ve already written about “managing your storage namespace” using DFS. I will have an installment where I expand on that as well.
my "digital empire" is nowhere near as expansive, but i experienced a potentially earth-shattering HD crash last year that has me rethinking my entire storage/archival strategy.
platters are delicate enough; when companies cram 4 of them into a single enclosure, it’s a disaster waiting to happen.
i haven’t fully thought this out yet, but i see my long term "precious data" solution as having two critical components: redundancy and optical storage. "live" data (that is, data that is still being added to, modified, or constantly accessed) needs to live on a 0+1 raid array; true archival material (photos i’ve scanned, videos i’ve captured, broadcasts i’ve recorded) should live in a DVD jukebox.
are there any good solutions for firewire raid?
It is a mistake to archive precious material to dvds or cds. These degrade.
The problem is that tape hasn’t kept up and changers are too balky to keep running hands off.
I looked at building a SAN using iSCSI and that just doesn’t seem quite ready for prime time. One more system to fail.
Love to see where you are going with this…
The points are easy to state, but the answers are difficult to find:
* Implement an automated regularly scheduled backup scheme
Where do you back up tons of gigs? Tapes? No. DVDs? No. Hard drive? Maybe, but you need to double your capacity to accommodate the data.
* Keep a copy of all precious data "off-site"
Again, where? Removable hard drives? Bulky. Maybe an internet backup scheme.
* Practice restoring from your backups
Here you need another system to test on and enough free time to do it. It’s a lot of $$$.
Good post, this is an important topic as bulky personal media continues to fill our hard drives. I think I have a partial solution to the problem, follow the link below for more info:
Have you considered using one of the tools that monitor the S.M.A.R.T. data stored on most modern drives? This information is supposed to be a predictor of a failing drive.
A lot more of our company staff are using laptops and we are getting regular drive failures on these.
As above, I’ve found CDs and DVDs to be too unreliable for backup. A couple of CDs from a few years back have already started to corrupt.
The most reliable format for backing up is to place the data on many computers. For example my media collection is on my server, pc and laptop. So during a problem one of these devices should be able to restore the data. But it’s still an expensive option.
Apart from the above the most reliable method for backing up in my experience is the Iomega Zip Drive. However 100, 250 and 750MB don’t really dent any normal sized media collection.
Just built a 1tb md raid5 array on a Linux box to rsync everything important onto. Works great, samba really integrates nicely with the rest of the house network.
Now I need a second array for MythTV
> Have you considered using one of the tools that monitor the S.M.A.R.T. data
> stored on most modern drives. This information is supposed to be a predictor
> of a failing drive.
I hadn’t thought of this. According to Wikipedia, the indicators SMART checks for can predict 60% of drive failures.
I assume that modern OSes or BIOSes would already recognize and alert the user that something bad is going down? Anyone know if XP already supports SMART monitoring?
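For anyone who wants to poll SMART themselves rather than wait for the OS to speak up, here is a rough sketch built on the `smartctl` tool from the smartmontools package. It assumes smartmontools is installed, that the script runs with enough privileges to query the drive, and that the drive is an ATA disk whose `smartctl -H` report contains the usual "self-assessment test result" line. The function names and the `/dev/sda` device path are just examples.

```python
import shutil
import subprocess


def drive_looks_healthy(report):
    """Interpret the text printed by `smartctl -H`.

    Returns True if the overall self-assessment says PASSED, False if it
    says anything else, and None if no such line is found (for example
    when SMART is unsupported or the report has a different format)."""
    for line in report.splitlines():
        if "self-assessment test result" in line:
            return "PASSED" in line
    return None


def check_drive(device):
    """Run `smartctl -H` on a device and interpret its output.

    Returns None if smartctl is not installed."""
    if shutil.which("smartctl") is None:
        return None
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes to signal problems
    )
    return drive_looks_healthy(result.stdout)
```

Calling `check_drive("/dev/sda")` from a scheduled job and emailing yourself on anything other than `True` would be one way to use it. A passing result is no guarantee, since SMART does not catch every failure.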
I’ve been using Magneto-Optical Disks (MO) and a drive from Fujitsu since 1994. No failure with any disk since that time. 2.3GB on one disk is more than enough for really precious data, such as family pics, working files etc. I have 15 of them.
IMHO the best of the best so far, comparing reliability and price.