Ouch! I just spoke with one of the Bluehost tech support engineers, inquiring about the mysterious outage of not just the NITA Online and the NNIC servers, but even Bluehost’s domain name servers, which is quite rare and indicates a major incident. Well, I wasn’t off the mark: indeed, Bluehost has been hit with an unprecedented power outage, knocking even their own computers offline. The tech support engineer himself didn’t have access to a computer, which also explains why they couldn’t put an alert up on their server status page… Some extremely unusual collision of circumstances must have taken place to reach that level of snafu, given the multiple layers of fall-back assets and redundancies that their datacenter has. I surely hope that they won’t be forced to fall back on a system-wide restore from regular system backups (typically those run on a weekly cycle), so they won’t have to contend with too much fall-out and indubitably nervous customers. But I sure as heck am curious about the circumstances of such a catastrophic datacenter outage.
So, let’s just sit tight and keep our fingers crossed for an expedient resolution, as their team is working hard on getting everything back on its feet again.
Update: it looks like it was a controlled shutdown on account of Provo City Power, who are (ahem) replacing Bluehost’s transformer. Yay foresight, planning and proactive communication (e.g. via their own Twitter account)
Update2: the transformer was replaced somewhat hastily, as it appears the old one had a bad case of being on fire. Meaning, there is little that foresight could have done here. Except, of course, using their Twitter account to spread the good news of being back at around 11pm PDT. And not leaving Twitter languishing to become a point of mockery, such as this little gem in their one and only tweet to date. I’m not kidding, this is what it says:
Bluehost.com is embracing twitter
As I said at the beginning of this post… Ouch!
Update3: so the greatest mystery up to this point – i.e. how could they man the phones in the middle of darkness – has been solved: they were allowed to keep the UPS online for their phone system. I’m not sure people understand what is involved when a transformer has to be taken offline, though: on Twitter tons of FUD and boiling emotions fly past the fact that you have to take the entire system offline to swap out and reconnect the new transformer. And no, they’re not $800 apiece.
Update4: the Bluehost team has manned their Twitter account, at last. One of the first tweets is an understatement:
@hawaiihypnosis I’m sorry that we didn’t warn you that transformer was going to explode, but we are working on a new predictive algorithm 🙂
I don’t think the failure to predict is much of an issue; the somewhat wanting zest to jump onto Twitter and address the wailing and teeth-gnashing masses is. No, really, it’s 2010!
Update5: what I find remarkable is how three hours of communication silence can be unacceptably long today, while just five years ago it would have been mostly fine. As I said in my previous update, it is a very different, social-media-dependent (or is that addicted?) world. This isn’t Kansas, Toto…
Update6: the last before bedtime… as I suggested to Bluehost, I hope people realize how vulnerable they are to the “stuff happens” reality. It’s not too hard to set up a crontab entry on the server to roll a daily backup of the database(s) and then fork it over to a remote host, and/or send it via email (e.g. Google mail, with its insane storage capacity), so as to always have a recourse. Typically, static files are easy to back up every now and then via FTP; it’s the DB that is the problem, especially when you’re running a site on a CMS like WordPress, Joomla, Drupal, Movable Type or whatnot. It would be good to set up a tutorial on how to do that, if only to lessen the almost inevitable panic if/when the site goes down.
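For the curious, here is a rough sketch of what such a nightly database backup could look like in Python. It is only an illustration, not anything Bluehost provides: it assumes a MySQL database with credentials stored in ~/.my.cnf, that mysqldump and scp are available on the server, and that SSH keys are already set up for the remote host. The script name, the database name, the remote address and the seven-day retention are all made up for the example.

```python
#!/usr/bin/env python3
"""Nightly database dump, meant to be run from cron (illustrative sketch)."""
import datetime
import gzip
import pathlib
import shutil
import subprocess
import sys

BACKUP_DIR = pathlib.Path.home() / "db_backups"  # assumption: any writable directory
KEEP = 7                                         # assumption: keep one week of dumps


def backup_path(db_name, when, backup_dir=BACKUP_DIR):
    """Return a dated path such as .../wordpress-2010-09-17.sql.gz."""
    return pathlib.Path(backup_dir) / f"{db_name}-{when:%Y-%m-%d}.sql.gz"


def dump_database(db_name, target):
    """Stream mysqldump output into a gzip file.

    Credentials are expected in ~/.my.cnf, so no password is passed here.
    """
    target.parent.mkdir(parents=True, exist_ok=True)
    cmd = ["mysqldump", db_name]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc, \
            gzip.open(target, "wb") as out:
        shutil.copyfileobj(proc.stdout, out)
    # Leaving the with-block waits for mysqldump, so returncode is set here.
    if proc.returncode != 0:
        target.unlink(missing_ok=True)  # do not keep a truncated dump
        raise RuntimeError(f"mysqldump exited with status {proc.returncode}")


def ship_offsite(target, remote):
    """Copy the dump to another machine with scp (SSH keys assumed to be set up)."""
    subprocess.run(["scp", "-q", str(target), remote], check=True)


def prune(db_name, keep=KEEP, backup_dir=BACKUP_DIR):
    """Delete all but the newest `keep` dumps and return the deleted paths.

    The dated file names sort chronologically, so a plain sort is enough.
    """
    dumps = sorted(pathlib.Path(backup_dir).glob(f"{db_name}-*.sql.gz"))
    stale = dumps[:-keep] if keep > 0 else dumps
    for old in stale:
        old.unlink()
    return stale


def run_backup(db_name, remote=None):
    """Dump, optionally copy off the server, then tidy up old local dumps."""
    target = backup_path(db_name, datetime.date.today())
    dump_database(db_name, target)
    if remote:
        ship_offsite(target, remote)
    prune(db_name)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: db_backup.py DATABASE [USER@HOST:DIR]")
    else:
        run_backup(sys.argv[1], sys.argv[2] if len(sys.argv) > 2 else None)
```

A crontab entry along the lines of `30 3 * * * python3 $HOME/bin/db_backup.py wordpress backup@example.org:db_backups/` would then run it every night at 3:30am (the path, database name and remote are placeholders). The dump is copied off the server before the old local copies are pruned, so a failed transfer never costs you an existing backup.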
Update7: (yawn – barely awake) it’s now about six hours since my last update (9/18 @ 6:10am PDT) and meanwhile, the Bluehost datacenter has been powered up again, so the sequence of booting and bringing all services back up has been in full swing. Undoubtedly there will be an unfortunate few who will have to exert some more patience, but at least the overwhelming majority of sites is back up again. And it’s Friday. Woohoo!