Monday, June 29, 2015

The Boring Topic of Designing a Backup System For Your Studio

[This post was prompted by helping a friend try to recover the data from her failed disk server, so the annoying details of this problem, and the necessity of dealing with them, are on my mind.]

As part of a series on designing, building and running a small computer animation studio, we are going to have to discuss backups. I will try to break the topic into small pieces because, frankly, it is a real bore. We did not start using computers for the joy of making backups; like taking out the garbage, it is not our first choice of how to spend our time. Furthermore, it turns out that there are choices to be made here, and real design issues. I am sorry about that. It just is.

When people started using computers, probably no one told them that they were now expected to be responsible adults about caring for their data, or else risk losing it. But all of us who have been using computers for a while know this only too well. You can learn from our mistakes and save yourself a lot of trouble.

When you drive a car, you are expected to learn how to drive safely. When you work at a real corporation or a university, your professional work is likely already being carefully backed up and protected, at least to some extent. But the rest of us, at small companies or on our own, have to put our own system in place.

Keep in mind that hard drives, big or small, solid state or otherwise, are not intended to be perfect. They have a known failure rate: the manufacturer knows that some of its disks will fail, but only on the level of probability. Disks are made in batches, and the failure rate of the disks within a batch is estimated, since that estimate is part of setting a warranty for the drives. But disk failure is not the only cause of data loss.
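To make "known only on the level of probability" a little more concrete, here is a small sketch. It assumes each drive fails independently with the same annualized failure rate p, in which case the chance that at least one of N drives fails within a year is 1 - (1 - p)^N. The 2% rate and the drive counts are hypothetical illustrations, not manufacturer figures, and drives from the same batch may well fail in correlated ways that this simple model ignores.

```python
def prob_any_failure(afr: float, drives: int, years: float = 1.0) -> float:
    """Probability that at least one of `drives` fails within `years`.

    Hypothetical model: every drive fails independently with the same
    constant annualized failure rate `afr` (a fraction between 0 and 1).
    """
    if not 0.0 <= afr <= 1.0:
        raise ValueError("afr must be between 0 and 1")
    # Chance that a single drive survives the whole period.
    survive_one = (1.0 - afr) ** years
    # Chance that every drive survives; anything else means at least one failure.
    return 1.0 - survive_one ** drives


if __name__ == "__main__":
    # Assumed 2% annualized failure rate per drive (illustrative only).
    for n in (1, 4, 12, 48):
        print(f"{n:3d} drives: {prob_any_failure(0.02, n):.1%} chance of a failure this year")
```

Under these assumed numbers, a single drive has a 2% chance of failing in a year, but a studio with a dozen such drives has roughly a one-in-five chance that at least one of them fails.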

So here are some basic definitions and principles. In later posts we will go over some of the design choices you may (and probably will) have to make when you design your studio.

For those of you who think I am less creative because I worry about such things, please go fuck yourself. Thank you.

1. The place where you do your professional work might be called your office, or it might be called a studio. A studio can be for one person or 1,000 people. The work might be your personal artwork, or your personal financial records, or it might be a very expensive collaborative technology and creative project with a $100M budget.

2. All of these offices and studios need to give some thought to how much protection their data needs in case of disaster, how likely a disaster is, and what it would cost them to lose one day's work, one month's work, one year's work, etc.

3. The goal of a so-called backup system is to provide a level of protection for your data if disaster strikes for any reason, whether by computer malfunction, act of God, or human error.

4. No backup system is perfect, but different backup systems provide different levels of security at different costs, where cost means some mix of capital outlay, ongoing expenses, attention that must be paid to maintaining the system, technical expertise and so forth.

5. A simple backup system well executed is better than a technically complex system that is over the head or beyond the needs of the intended user. An expensive or technically complex backup system that is not well implemented or maintained may be worse than no backup system at all.

6. A backup system is holistic: together, its parts provide a level of protection. If some of the pieces work and some do not, you may still have some protection. That's the plan. But, generally speaking, it is of course better if all the pieces work.

7. Backup systems are usually layered; that is, you have more than one protection, so that if one fails you do not lose all your data but can fall back to another level. Generally this is implemented as a system to improve the reliability of the main file servers, combined with discrete backups from earlier periods saved in a vault.

8. Backup systems are probabilistic. There is a probability of disaster, and a probability that any one backup will not be readable. No backup system is perfect, but a good backup system will make losing all your data much less likely.

9. Backup systems must be tested before they are used or you run the risk of not finding out that there was a problem until it is too late. This is an extremely common occurrence.

10. No one but you can judge whether this effort, these costs, and so forth are worthwhile. Only you know what this data is worth.

and finally,

11. I have found over the years that I never had too many backups.
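Points 7 and 8 can be put into rough numbers. Suppose you have several layers (say, a main file server, a local backup, and an earlier copy in a vault) and each layer has some probability of being lost or unreadable at the moment you need it. If the layers fail independently, the chance of losing everything is the product of those probabilities. The layer names and per-layer figures below are hypothetical. Independence is the big caveat: a fire or flood that destroys the server and an on-site backup together makes those layers fail as one, which is the point of keeping a copy in a separate vault.

```python
from math import prod


def prob_lose_everything(layer_failure_probs: dict[str, float]) -> float:
    """Probability that every protection layer fails at the same time.

    Assumes the layers fail independently of one another; correlated
    disasters (fire, theft, one bad power surge) break this assumption.
    """
    for name, p in layer_failure_probs.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name}: probability must be between 0 and 1")
    return prod(layer_failure_probs.values())


if __name__ == "__main__":
    # Hypothetical chance that each layer is lost or unreadable
    # when a restore is actually needed.
    layers = {"file server": 0.05, "local backup": 0.10, "vault copy": 0.10}
    for k in range(1, len(layers) + 1):
        subset = dict(list(layers.items())[:k])
        print(f"{k} layer(s): {prob_lose_everything(subset):.4%} chance of losing everything")
```

With these assumed numbers, each added layer cuts the risk tenfold. It also illustrates point 9: an untested backup whose real chance of being unreadable is close to 1 multiplies the risk by nearly 1, which means it adds almost no protection.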

In a later post we will go over some fundamental design choices and the kind of risks you will need to protect against.
