[This post was prompted by helping a friend try to recover the data
from her failed disk server. So the annoying details of this problem,
and the necessity of dealing with these issues, are on my mind.]
As part of a series on designing, building and running a small computer
animation studio, we are going to have to discuss backups. I will
try to break it into small pieces because, frankly, it is a real bore.
When we started using computers, we did not do so for the joy of making
backups. It is like taking out the garbage: it's not our first choice
of how to spend our time. Furthermore, it turns out that there are
choices to be made here, and real design issues. I am sorry about
that. It just is.
When people started using computers, probably no one told them that they were now expected to be responsible adults about how they cared for their data or run the risk of losing it. But all of us who have been using computers for a while know this only too well. You can learn from our mistakes and save yourself a lot of trouble.
When you drive a car, you are expected to learn how to drive safely.
When you work at a real corporation or a university, it is likely that your professional work is already being carefully backed up and protected, at least to some extent. But the rest of us, at small companies or on our own, have to put our own system in place.
Keep in mind that hard drives, big or small, solid state or otherwise, are not intended to be perfect. They have a known failure rate, but even though the manufacturer knows that some of their disks will fail, they only know this as a probability, not which disks or when. Disks are made in batches, and the failure rate of disks within a batch is estimated, since that is part of creating a warranty for the drives. But disk failure is not the only cause of data loss.
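To put a number on "a known failure rate", here is a minimal sketch of the arithmetic. The function name and the 2% annual failure rate are made up for illustration and are not taken from any manufacturer. It also assumes disks fail independently of each other, which may not hold for disks from the same batch.

```python
def prob_at_least_one_failure(annual_failure_rate: float, num_disks: int) -> float:
    """Chance that at least one of num_disks fails within a year.

    Assumes every disk has the same annual failure rate and that
    failures are independent (a simplification).
    """
    if not 0.0 <= annual_failure_rate <= 1.0:
        raise ValueError("annual_failure_rate must be between 0 and 1")
    if num_disks < 0:
        raise ValueError("num_disks must not be negative")
    # Probability that every disk survives, subtracted from 1.
    return 1.0 - (1.0 - annual_failure_rate) ** num_disks


# Hypothetical 2% annual failure rate, chosen only for illustration.
for n in (1, 4, 12, 24):
    print(f"{n:2d} disks: {prob_at_least_one_failure(0.02, n):.1%}")
```

The point is only that the more disks you run, the more likely it is that one of them fails in any given year, even when each individual disk is quite reliable.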
So
here are some basic definitions and principles. In later posts we
will go over some of the design choices you may have to make, are
likely to have to make, when you design your studio.
For those of you who think I am less creative because I worry about such things, please go fuck yourself. Thank you.
1.
The place where you do your professional work might be called your
office, or it might be called a studio. A studio can be for one
person or 1,000 people. The work might be your personal artwork,
or your personal financial records or it might be a very expensive
collaborative technology and creative project with a $100M budget.
2.
All of these offices and studios need to have given some thought to
how much protection they need to give their data in case of disaster,
what is the likelihood of disaster, how much it is worth to them to
lose one day's work, one month's work, one year's work, etc.
3.
The goal of a so-called backup system is to provide a level of
protection for your data if disaster strikes for any reason, whether
by computer malfunction, act of God, or human error.
4.
No backup system is perfect, but different backup systems provide
different levels of security at different costs, where cost means
varying amounts of capital, ongoing expenses, attention that must
be paid to maintaining the system, technical expertise and so forth.
5.
A simple backup system well executed is better than a technically
complex system that is over the head or beyond the needs of the
intended user. An expensive or technically complex backup system
that is not well implemented or maintained may be worse than no
backup system at all.
6.
A backup system is holistic. Taken together, its pieces provide a level of protection. If some of the pieces work and some do not, you may still have a level of protection. That's the plan. But, generally speaking, it is of course better if all the pieces work.
7.
Backup systems are usually layered, that is, you have more than one
protection so that if one fails you do not lose all data, but can
fall back to another level. Generally this is implemented as a system to improve the reliability of the main file servers, combined with discrete backups from earlier periods saved in a vault.
8.
Backup systems are probabilistic. There is a probability of
disaster, and a probability that any one backup will not be readable.
No backup system is perfect, but a good backup system will make
losing all your data much less likely.
9.
Backup systems must be tested before they are used or you run the
risk of not finding out that there was a problem until it is too
late. This is an extremely common occurrence.
10.
No one but you can judge whether this effort, these costs, and so
forth are worthwhile. Only you know what this data is worth.
and
finally,
11.
I have found over the years that I never had too many backups.
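Points 7 and 8 can be made concrete with a little arithmetic. The sketch below is only an illustration: the function name, the layer names and the failure probabilities are all invented, and it assumes the layers fail independently. A fire that takes out the server and the backup sitting next to it breaks that assumption.

```python
from math import prod


def prob_total_loss(layer_failure_probs) -> float:
    """Probability that every layer fails at the same time.

    Assumes the layers fail independently of one another, so the
    individual probabilities can simply be multiplied.
    """
    probs = list(layer_failure_probs)
    if not probs:
        return 1.0  # no layers at all means no protection
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must be between 0 and 1")
    return prod(probs)


# Hypothetical numbers, chosen only to show the effect of layering.
layers = {
    "main file server": 0.05,
    "local backup disk": 0.10,
    "older copy in the vault": 0.10,
}
print(f"chance of losing everything: {prob_total_loss(layers.values()):.4%}")
```

Each extra layer multiplies the chance of total loss by a number smaller than one, which is the arithmetic behind point 11.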
In
a later post we will go over some fundamental design choices and the
kind of risks you will need to protect against.
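In the meantime, point 9 (test your backups) is the one you can act on today. Here is a minimal sketch that compares a source directory against a backup copy by checksum. The function names are my own, it assumes the backup is a plain copy of the directory tree rather than an archive, and it is only meaningful if the source has not changed since the backup was made.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir: Path, backup_dir: Path) -> list:
    """List files that are missing from the backup or differ from the source.

    An empty list means every file under source_dir has an identical
    copy at the same relative path under backup_dir.
    """
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue  # skip directories and other non-file entries
        rel = src.relative_to(source_dir)
        copy = backup_dir / rel
        if not copy.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif sha256_of(src) != sha256_of(copy):
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

You would call it as `verify_backup(Path("/studio/projects"), Path("/mnt/backup/projects"))`, where both paths are made-up examples. A check like this is not a substitute for actually restoring files to a scratch location and opening them, but it will catch a backup that silently stopped copying.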