The Disaster Disaster Recovery Plan

Once upon a time, I worked for a company that issued a mandate to update our disaster recovery plans.  Seeing as how most of the company didn’t have them, it really meant creating disaster recovery plans.  Due to a confluence of events, I happened to be the senior guy in our department who wasn’t a manager, and so it fell to me to craft our plans.

As with any good company, we did have some preparations in place even if we didn’t have a formal plan.  We had an off-site backup data center we could switch over to should a service at the primary site go offline.  That site held an approximate copy of our primary site.  Approximate in that the backup site was intended to limp the company along until the primary site could be restored, not to become the new primary.  So where the primary site had 3 servers doing a task, the backup site had 1, enough to do the job but not enough to do it without frustration.

So began a series of meetings between departments as we updated and created our recovery plans; budgets were outlined and all the ducks were put in a row.  In one of these meetings, I noticed a flaw in another department’s plan.  I brought it up, but since I’d only been there a year and was dealing with people who’d been with the company for 5, 10, even 25 years, I was ignored.  I was told, “You don’t know what you’re talking about.”  So I decided to make sure I was right.  I began researching the other plans and looking for flaws.

My own department’s plan was simple: money.  We needed to spend the money to duplicate our server functions.  I called the companies we licensed software from and got the okay, and the licenses, to maintain a fail-over site without spending any more money on licensing.  I wrote up hardware orders so that where we had 3 servers doing a task at the primary location, we would have 3 servers waiting to take over if the primary failed.  With my plan done, I had plenty of time to look at other people’s work.  So I did, and I made a nuisance of myself, sending emails and showing up to meetings I wasn’t invited to in an effort to make our company’s disaster recovery actually capable of recovery.  Eventually, I was told to stop.  The message was clear: I was to focus on my own plan and leave everyone else alone.

I dove back into my disaster recovery plan.  You see, because of the flaws in the other plans, my original plan wouldn’t work.  It had to be redone.  I went to my boss and made one request.  On presentation day, I wanted to go last.

The day came to show off our new plans.  I sat in the back and waited through each department.  One by one they went to the podium and showed charts and laid out plans that illustrated they were well on their way to being ready, each department patting themselves on the back.  Finally, it was my turn.  I got up and started handing out my plan.  It was very short.  A cover sheet and then just two pieces of paper beneath that.  As I made my way through the room people began muttering to each other.  I got to the podium and said, “If you will turn to page one of the packet I’ve handed out you will clearly see the full extent of my disaster recovery plan.”

It was a copy of my resume.

“If the primary data center were to go offline, I would, in reaction to this disaster, begin sending out copies of my resume in an effort to find another job, because I certainly wouldn’t want to work here anymore.”  I could see my boss turning red with rage.  I could also see the managers of other departments shooting dirty looks at me.  Then I opened my PowerPoint presentation.  I quickly showed the single page of my real disaster recovery plan: buy servers, install software, use the extra license keys I’d already obtained.  Then I showed how my plan would still fail due to a flaw in Department X’s plan.  Then I showed that without fixing another flaw in Department Y’s plan, Departments A, B and C would fail.  And then I showed how Department M had overlooked a critical piece of hardware for which there was no backup, which rendered everyone else’s plans moot, because the only working mainframe terminal at the backup site would be the one hooked directly to the mainframe.  Their plan actually had them unhooking a piece of equipment, loading it on a truck, and driving it nearly two thousand miles to the backup site rather than purchasing a duplicate, probably because it was extremely expensive.  “So, as you can clearly see, my only reasonable course of action, since I was instructed not to involve myself in the affairs of other departments, is to find another job.”

The fallout from that meeting was huge.  First, I got yelled at.  Then, I got apologized to as they discovered I was right.  Eventually, new plans were drawn up and big money was spent, but our recovery plan was actually capable of recovering from disaster.  To date, that company has not had a disaster from which recovery was needed, but that’s not the point.  The point is that each and every department concerned themselves only with their own particular areas, and no one had been assigned the task of looking at the places where they relied on another department.  Each one was happy to be able to say, “We have a backup for our functions,” and didn’t bother to examine whether the slot their tab was supposed to fit into was actually covered; they just assumed it was someone else’s responsibility and would be handled.

Since then, I’ve always tried to keep an eye on the big picture when I do things.  And I try to be open to suggestions and criticism from others on the off-chance that I’ve missed something big because I’m too close to it.  Beyond that, there is no point to this story other than that I just like to share it.
