I have been in several situations where things went wrong and a lot of people were affected. Some key takeaways from those situations:

  • This is when it pays off to have a plan B: a runbook, a failover plan, a list of DR-ready apps, a list of apps ranked by business criticality, etc. Anything you have prepared beforehand will be very helpful.
  • The first priority is to get the system up and running again. Don’t get into a discussion about what went wrong, who is responsible, etc. Focus on the solution.
  • Be honest. If you don’t know something, say so. Ask for help and ask questions to clarify the situation.
  • Don’t hope for a miracle, such as someone walking in who knows everything.
  • If no one can give an ETA, assume the worst case and start working on failover.
  • Make sure you don’t lose anything. If you have to destroy something, make sure you have a copy of the information first. Make notes, record the screen, keep the history of commands, etc.
  • RCA (root cause analysis) is important, but it is for later. It will take a lot of time afterwards.
  • Microsoft support is good, though you need to push and emphasize the severity of the issue. Try to get the product group involved as soon as possible if the issue is on Microsoft’s side.
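
The point about making notes and keeping the history of commands can be sketched as a small timestamped log. This is only a minimal illustration, not a tool I used during those incidents: the `IncidentLog` class, its method names, and the log file path are all hypothetical.

```python
import datetime
import subprocess
from pathlib import Path


class IncidentLog:
    """Append-only, timestamped incident notes (hypothetical helper)."""

    def __init__(self, path):
        self.path = Path(path)

    def note(self, text):
        # Prefix every entry with a UTC timestamp so the timeline
        # can be reconstructed later for the root cause analysis.
        stamp = datetime.datetime.now(datetime.timezone.utc).isoformat(
            timespec="seconds"
        )
        line = f"{stamp} {text}"
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(line + "\n")
        return line

    def run(self, command):
        # Run a command given as a list of arguments and record the
        # command itself, its exit code, and everything it printed.
        self.note("CMD " + " ".join(command))
        result = subprocess.run(command, capture_output=True, text=True)
        self.note(f"EXIT {result.returncode}")
        for out_line in (result.stdout + result.stderr).splitlines():
            self.note("OUT " + out_line)
        return result
```

During an incident you would call `log.note("started failover of app X")` for manual observations and `log.run([...])` for commands, so that the file holds one ordered record of what was seen and done.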

Updated: