fail-safe

About a year ago, I spent a good deal of time reorganising my digital photo archive. Since then I’ve stuck to the same methodology: keeping images on three different devices, including a server, and archiving every SD and CompactFlash card in a fire safe once it is full. Some might say storing every image in four locations is overkill, but in my experience, with technology you can never be too safe. I was pretty certain it would take an extreme set of circumstances to cause me to lose any important data. That is, until this week, when the main server crashed.

The Synology NAS is at the heart of my network and is the repository for the important files from all computers and cameras in the household. It’s configured as RAID 5, a level widely used by IT professionals because of its redundancy and reliability. In RAID 5, data and parity information are striped across all the drives in the array, so that if one drive fails, its contents can be reconstructed from the remaining drives. On replacing the defective drive, the system rebuilds the array onto the new disk, restoring full operation and once again fully protecting the data. There is a weakness in the system and it is this: when a drive fails, the system is utterly dependent on the remaining drives staying fully operational. If another drive fails before the replacement is fully integrated into the array, all data are lost, including whatever sits on the disks that still work. Modern disk drives are pretty reliable, so the chances of two drives failing in the same window are remote. On Monday this week, however, that remote possibility became a reality: a second drive developed errors whilst the array was being rebuilt, and I lost everything on the server.
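
For anyone curious how the parity trick works, here is a minimal Python sketch. It is an illustration of the XOR principle only, not how the Synology implements it: the four-drive layout and the block contents are made up, and a real RAID 5 array rotates the parity block across the drives from stripe to stripe.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical four-drive array: three data blocks plus parity.
data = [b"PHOT", b"OS20", b"11!!"]
parity = xor_blocks(data)

# One drive fails: its block is rebuilt from the two survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# A second drive fails during the rebuild: the one surviving data block plus
# parity only yields the XOR of the two lost blocks, not the blocks themselves.
leftover = xor_blocks([data[2], parity])
assert leftover == xor_blocks([data[0], data[1]])
```

The last two lines are the whole problem in miniature: with two blocks missing from every stripe there is one equation and two unknowns, so nothing on the volume can be reconstructed.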

I had previously backed up the NAS to an external drive, but since upgrading its drives and increasing the overall size of the volume, I’ve not had a large enough external drive to do this. I thought the built-in redundancy of RAID 5 would cover me. That was the big mistake. Thankfully, I have not lost all my data. The important stuff (documents, photos, music and films) is stored on other computers too, but it is spread out over several machines and drives and was only brought together on the central server. A small number of items, installers and some documentation, were only stored on the NAS and they’re gone for good. Putting the server back together has taken the best part of a week. It’s Saturday now, the disk drive failed on Monday evening, and I still have more work to do. The bulk of the important stuff has been restored and is once again backed up, but I’ve still got more to transfer and the server to finish configuring, which will take a few more days (not full time).

I also paid the price for parsimony: not all disk drives are equal. I’d used Samsung 1000GB desktop drives in the NAS, a good balance between price and performance. But these are desktop drives, built down to a price. I should have used enterprise-rated drives, which are more expensive, higher performance and usually lower capacity; with an average MTBF of 1.2 million hours, these drives should last a good while longer. So that’s what I’ve done now: four new enterprise drives. I’ve gone for lower capacity too; the NAS now has just on 1TB of storage. Previously it had double that, but I found I only used about one third of its total capacity, and a significant portion of that was a digital dumping ground for stuff I wasn’t sure of but wanted to get off other machines. All this extra stuff meant it took a whole lot longer to back up, check, search, maintain etc. Data storage may be cheap, but time doesn’t get any cheaper, and I ended up spending longer than I really needed simply maintaining data I didn’t need to hang onto. The smaller-capacity NAS means I need to be more selective about what gets backed up, but if it saves me time in the long run, that’s worth it too. And now the NAS is smaller, I can back the whole thing up to an external drive again, and I’ve scheduled weekly backups to alternate disks from now on. I’m not going to get caught out like that again.
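
As a rough guide to what an MTBF figure means in practice, here is a small Python sketch that converts it into a yearly failure probability. It rests on assumptions that are mine rather than the manufacturer’s: a constant failure rate (the usual exponential model), drives that fail independently of each other, and round-the-clock running. Real drives tend to do worse than the datasheet suggests, so treat the result as a best case.

```python
import math

HOURS_PER_YEAR = 24 * 365

def annual_failure_probability(mtbf_hours: float, drives: int = 1) -> float:
    """Chance that at least one of `drives` fails within a year,
    assuming a constant failure rate and independent drives."""
    return 1 - math.exp(-drives * HOURS_PER_YEAR / mtbf_hours)

single = annual_failure_probability(1.2e6)           # one enterprise drive
array = annual_failure_probability(1.2e6, drives=4)  # any of the four in the NAS
print(f"one drive: {single:.2%}, any of four: {array:.2%}")
```

On those assumptions a single drive has well under a 1% chance of failing in a year, and the chance of any one of four failing is roughly 3%. Small, but not nothing, which is why the weekly backups matter more than the drive rating.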

So it’s been an expensive week, financially and in time, but I’m just about back on track again, with a more robust system (I hope). The cloud may seem to be the answer to all these problems, but for now it’s not: we simply don’t have the fat data pipes to back up large amounts of data. The half-terabyte that I restored took more or less a week over gigabit Ethernet; over 40Mbit/s broadband (about the best we can get right now) it’s going to take a whole lot longer. With Apple’s OS X Lion and cloud syncing, things are moving in the right direction, but it only allows 5GB of data and the 1,000 most recent images, nowhere near enough when I have 50MB scans from my Hasselblad. For the next few years at least, it’s going to be NAS, drives and cables.
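
To put a number on the gap between the two pipes, here is a back-of-envelope Python sketch. The assumptions are mine: the half-terabyte is taken as 500GB in decimal units, both links run flat out at their headline rate, and the 40Mbit/s figure applies in the direction of the transfer (on most broadband the upload side is slower). These are best-case line-rate figures, not a prediction of real elapsed time; protocol overhead, disk speed and hands-on work all add to it.

```python
def transfer_hours(size_bytes: float, link_bits_per_second: float) -> float:
    """Hours needed to move size_bytes over a link running at full line rate."""
    return size_bytes * 8 / link_bits_per_second / 3600

HALF_TERABYTE = 500e9  # assumption: decimal units
GIGABIT_LAN = 1e9      # bits per second
BROADBAND = 40e6       # bits per second, headline rate

lan_hours = transfer_hours(HALF_TERABYTE, GIGABIT_LAN)
wan_hours = transfer_hours(HALF_TERABYTE, BROADBAND)
print(f"LAN: {lan_hours:.1f} h, broadband: {wan_hours:.1f} h, "
      f"ratio: {wan_hours / lan_hours:.0f}x")
```

Whatever the real-world overheads, the ratio holds: the broadband link is 25 times slower than the local network, so anything that takes days in the house takes weeks over the wire.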
