We're Back(ish)

Discussion in 'Off-Topic' started by tweakmonkey, Mar 8, 2009.

  1. couturedharlot

    couturedharlot couture with a chaser

    Messages:
    1,868
    Trophy Points:
    51
    Location:
    Somewhere
    Just wanted to say "thanks Dan!" for all of your hard work getting the forums back online. I'm lost when the forums are down. You guys are my big online family :)
  2. MSP

    MSP Haunting a dead forum...

    Messages:
    29,575
    Trophy Points:
    78
    Well, once things settle down I propose we take some time and put together a disaster recovery plan seeing as how this is like the 3rd time this has happened. I'm seated on a council at work that's in charge of putting together a BCP (business continuity plan), so I've got a little experience with it.

    For starters we need some emergency means of communication. A simple email/IM distribution list would suffice; in fact I had one prepared from our last outage. It only had half a dozen people on it though, so it was of pretty limited use.

    The database backups you were already doing seem to be in order, but all of the configuration and manual rebuilding? I'm guessing that you had to build the OS from scratch? Anyway, my second recommendation would be that once you get things back the way you like, you create a system restore DVD.

    And finally the completed disaster plan.

    1) Notify user community of disaster.
    2) Rebuild OS and applications with restore disc.
    3) Restore data with data backups.
    4) Notify user community of recovery.
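
    The four steps above could be sketched as a small runbook script, something like this. The function names are made up for illustration; the real actions would be whatever actually sends the notice, restores the image, and loads the backup. The one design point is that the steps run in order and stop at the first failure, since each depends on the one before it.

```python
from typing import Callable, List, Tuple

# Each step is (description, action). The actions are hypothetical
# placeholders that return True on success and False on failure.
Step = Tuple[str, Callable[[], bool]]


def build_plan(notify_outage: Callable[[], bool],
               rebuild_system: Callable[[], bool],
               restore_data: Callable[[], bool],
               notify_recovery: Callable[[], bool]) -> List[Step]:
    """Pair each step of the plan with the action that carries it out."""
    return [
        ("Notify user community of disaster", notify_outage),
        ("Rebuild OS and applications with restore disc", rebuild_system),
        ("Restore data with data backups", restore_data),
        ("Notify user community of recovery", notify_recovery),
    ]


def run_plan(steps: List[Step]) -> List[str]:
    """Run the steps in order and stop at the first one that fails.

    Returns the descriptions of the steps that completed.
    """
    completed = []
    for description, action in steps:
        if not action():
            break  # later steps depend on earlier ones, so do not continue
        completed.append(description)
    return completed
```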


    I would think that once hardware was in place you could have the site back up within a few hours with minimal effort. There's a lot more we could get into, such as the user account info that was presumably compromised. Maybe an automatic reset of everyone's login credentials, notification of security breach, etc. I've already changed my password, and I recommend you all do likewise.
  3. tweakmonkey

    tweakmonkey Webmaster Staff Member

    Messages:
    7,865
    Trophy Points:
    78
    I agree. I do have one emergency contact list: every forum member's email address. I had it at home on my PC, but didn't have it when the event happened (I was 400 miles from home and only had my iPhone). I tried to talk James K through recovering the contact list, but by then it was hopeless because the ISP shut down the connection to our server as soon as the RSA was involved (reported as a phishing scam being hosted on Tweak3D.Net).

    Next time there's any kind of major outage, I will email every person who's registered at Tweak3D and tell them. I normally avoid doing this outside the VB backend, because that also emails people who specifically asked not to be emailed. Also, I'll travel with a notebook and a flash drive with the data required to do this.
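
    For the offline copy of the list, a rough sketch of filtering it so opted-out members are skipped might look like the following. The `Member` fields (`allows_email`, `core`) are assumptions for illustration and are not the actual forum database columns.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Member:
    username: str
    email: str
    allows_email: bool   # hypothetical flag: False if the member opted out
    core: bool = False   # hypothetical flag: on the small "core members" list


def outage_recipients(members: Iterable[Member], core_only: bool = False) -> List[str]:
    """Return de-duplicated addresses to notify, skipping opted-out members."""
    seen = set()
    recipients = []
    for member in members:
        if not member.allows_email:
            continue  # respect the opt-out even during an outage
        if core_only and not member.core:
            continue
        address = member.email.strip().lower()
        if address and address not in seen:
            seen.add(address)
            recipients.append(address)
    return recipients
```

    Calling it with `core_only=True` gives the smaller list; the default gives everyone who has not opted out.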

    This was done as a preventative measure. We couldn't tell exactly where the problem started, and the administrator insisted I not make anything live until the latest version of the software was installed. Once I was able to access the server, the forum was up and running in about 40 minutes. The major downtime was because the server was shut down by the hosting firm and the RSA because of abuse.

    The server never actually crashed.

    The OS was still intact. Just the tweak3d.net/ portion of things (the CMS, forum software, user images, etc.) was so infected with junk/scripts, mailing out 100k+ emails at a time, etc., that we decided it was best to delete it all and gradually migrate it back over. The most recent backup I had was only a few days old, but I chose to install the latest version of vBulletin as it was due anyway and I didn't want to re-infect anything.

    The only problem with a single recovery image is that this site changes every hour, even every minute (new posts, signature changes, etc.). I guess we could have a recovery image prepared every week (or even every day), but they get pretty big. My biggest worry is losing the data (which I back up daily), and this was an extremely unlucky sequence of events that resulted in more downtime than most others. The problem wasn't the backup, but the hosting's response to the abuse incident.
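
    One way to keep the size of frequent backups manageable is a retention rule that keeps every recent daily backup plus one per week further back. A minimal sketch, assuming backups are identified by date; the 7-day and 4-week defaults are arbitrary examples, not what is used here.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, Set, Tuple


def backups_to_keep(backup_dates: Iterable[date], today: date,
                    daily_days: int = 7, weekly_weeks: int = 4) -> Set[date]:
    """Keep every backup from the last `daily_days` days, plus the newest
    backup in each ISO week within the last `weekly_weeks` weeks."""
    daily_cutoff = today - timedelta(days=daily_days)
    weekly_cutoff = today - timedelta(weeks=weekly_weeks)
    keep: Set[date] = set()
    newest_per_week: Dict[Tuple[int, int], date] = {}
    for day in sorted(set(backup_dates)):
        if day > today:
            continue  # ignore anything dated in the future
        if day > daily_cutoff:
            keep.add(day)
        if day > weekly_cutoff:
            year, week, _ = day.isocalendar()
            # dates are visited in ascending order, so the last one wins
            newest_per_week[(year, week)] = day
    keep.update(newest_per_week.values())
    return keep
```

    Anything not in the returned set is a candidate for deletion.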

    The downtime may have been less with a different host, but they've been very helpful bringing things back up at this point. I don't know how other companies might have responded. Maybe that's something you've got more experience with and could chat with me about, as I'm just not sure.

    Yes, this is a good idea. I should do this anyway. I think resetting everyone's password is built into vBulletin. And I will send a final "We're up and running" email once I fix the front-loader tonight.
  4. tweakmonkey

    tweakmonkey Webmaster Staff Member

    Messages:
    7,865
    Trophy Points:
    78
    Thanks, it was a stressful weekend, that's for sure. I'm glad it's mostly over now. :)
  5. mistawiskas

    mistawiskas kik n a and takin names

    Messages:
    30,067
    Trophy Points:
    98
    Location:
    Rogue Valley Oregon
    I'm with Emily (well, not with Emily but... you know what I mean :) ) in appreciating the time and sweat Dan has put into disaster recovery. Thanks Dan.
  6. MSP

    MSP Haunting a dead forum...

    Messages:
    29,575
    Trophy Points:
    78
    Well, I'd be happy to help any way I can. I could help to handle the communication bits easily enough.
  7. tweakmonkey

    tweakmonkey Webmaster Staff Member

    Messages:
    7,865
    Trophy Points:
    78
    Cool man, maybe we should arrange a soft emergency contact list instead of the big pappa (that emails everyone at once), so the core members are notified. I'm assuming people would prefer this to a blog or something. I don't think we're unreliable enough (typically) to justify that anyway. :D
  8. crowchaser

    crowchaser not to be taken seriously

    Messages:
    2,207
    Trophy Points:
    53
    Location:
    Waukesha, WI
    thanks for your hard work dan, i would thank you but looks like that function isn't available yet