Let’s say you have a server. This server is full of people and books and lots of other stuff. Now let us say that the building that houses your server has something wrong with it, resulting in frequent power outages. These outages typically occur when the building is not open to the public. Each one drains the UPS batteries that kick in when the power goes out, to the point where the batteries no longer hold anything close to the charge they should.
Essentially we’re playing Russian roulette with our servers. Last week the system finally managed to find a bullet.
Ok, not a problem. We have backup images, right? Oops! No we don’t. A previous outage (in February) took out our backup service and it never came back. Pulling out some major trickery (with the aid of Microsoft Tech Support), our IT guy manages to pull the important database information off the D: partition of our RAID array. The C: partition and our whole OS are toast, and I seriously don’t want to think about what would have happened if we had lost the D: partition too. Once we’re sure we have the data, we begin to restore our system from an old backup.
This takes nearly 48 hours. Once we have a viable OS and a “stable” server, we contact the support staff at our ILS vendor (ILS being the Integrated Library System, which stores and records all the people, books, and their interactions) so they can get in and restore everything. Which they do… except they don’t add in any of the data we salvaged from the server, meaning our system is missing a month’s worth of data. We contact them again, they realize their mistake and have another go at it… and do it wrong again. Finally, on the third try, they manage to get it right.
At this point the library has been running checkouts and checkins in an offline mode (it logs each transaction for upload once the system is restored; handy but not perfect) for 4 days. By the time everything is back in and we’ve reconstituted the multitude of reports that make the whole system run, it is roughly 3 o’clock on Friday afternoon. We bring everything online, upload the transactions (with a multitude of errors), and are finally back up and running.
Of course there are plenty of inconsistencies as a result of those errors, but they are at least manageable. I should also mention that we’ve had 3 more power outages since Friday, and our phone system just came back today. We apparently have some sort of Magic Power Device™ on order from our electricians that will tell us why the main breaker keeps getting tripped. In the meantime we are left twisting in the wind and once more spinning that chamber.
Now I’m getting the hell out of Dodge for a couple of days. Tomorrow I head north to the icy climes of Boston to enjoy 3 days of geeky nirvana with some friends at PAX East 2013!
I’ll be back next week with some reviews and hopefully some thoughts on my 5th PAX experience (2 in Seattle and 2 in Boston before this).