EDIT3: Maintenance for tonight is done as of December 17, 0400 UTC/GMT. Thank you, everyone! Since the right parts didn't come in, we'll try to do the final bit of maintenance tomorrow. Even though we're having the parts overnighted, I'm not sure exactly when they'll arrive. Once they arrive, I'll put up another post, but it'll probably be within an hour of us doing the work.
EDIT2: The correct parts did not arrive in time. We will proceed with part of our network maintenance but will have to take another maintenance window tomorrow. Sorry everyone! The times for this window will *not* change though as we'll take the time for extra testing.
EDIT: To convert UTC/GMT to your local timezone, you can check out
http://www.worldtimeserver.com/convert_time_in_UTC.aspx or
http://timeanddate.com/worldclock/meeting.html. There are plenty more sites out there if you don't like those two; just search for something like "convert UTC time zone" in your search engine of choice.
The site will be intermittently unavailable from 02:00 to 04:00 UTC/GMT on December 17... which is about 7 hours from this posting!
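If you'd rather script the conversion than use a website, a couple of lines of Python will do it too (note: the year below is my assumption, since this post doesn't actually state one):

```python
from datetime import datetime, timezone

# Start of the maintenance window in UTC (assuming 2005; the post
# doesn't give a year).
start_utc = datetime(2005, 12, 17, 2, 0, tzinfo=timezone.utc)

# astimezone() with no argument converts to your machine's local timezone.
print(start_utc.astimezone().strftime("%Y-%m-%d %H:%M %Z"))
```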
Sorry for the relatively late notification; we're still waiting on some last-minute parts to arrive, so it's possible this might not even happen today. I'll edit this post with updates, as well as
status.livejournal.org, so check there for what's going on if you can't reach the main livejournal.com site.
Our main goals tonight are:
1) Redo our core network infrastructure for performance. (Parts didn't come in time.)
2) Enable BGP with another ISP for redundancy.
Remember right after the move, when LiveJournal was really slow in loading pages? A large part of that was due to choosing (ok, ok, *I* chose... < insert chagrined and very sad face here >) a switch module that had gigabit interfaces but *shared* a single gigabit connection to the backplane among every 4 interfaces! That's akin to having 4 large pipes funnel down into just 1, which equals a big bottleneck as our traffic suddenly fights for effectively 1/4 of the capacity it needs. Input queues were constantly overwhelmed, and packets were getting dropped on the floor and retransmitted; it was ugly.
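To put numbers on it, here's the back-of-the-envelope math (just a sketch; the 4-ports-to-1-gigabit figures are the only thing taken from the description above):

```python
def oversubscription_ratio(ports, port_gbps, backplane_gbps):
    """Ratio of offered front-panel capacity to available backplane capacity."""
    return (ports * port_gbps) / backplane_gbps

# 4 gigabit ports all sharing a single 1 Gbps path to the backplane:
ratio = oversubscription_ratio(ports=4, port_gbps=1, backplane_gbps=1)
print(ratio)  # 4.0 -- each port effectively gets 1/4 of its rated capacity
```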
We had a temporary workaround where we bypassed these modules and plugged the distribution switches directly into each other (ether-channelled/trunked), carrying just specific VLANs. That seemed to work ok, but we need to get rid of these cables running back and forth and go back to our original design.
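For the curious, the workaround looked roughly like this in IOS-style config (the interface names, channel number, and VLAN IDs here are made up for illustration, not our actual config):

```
! On each distribution switch: bundle two direct links into an
! EtherChannel and trunk only the specific VLANs we needed across it.
interface range GigabitEthernet1/1 - 2
 channel-group 1 mode on
!
interface Port-channel1
 switchport
 switchport mode trunk
 switchport trunk allowed vlan 10,20,30
```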
BGP. Well, we had uRPF checks enabled, but they were in strict mode, and we were seeing asymmetric routing because of our use of AS-path prepending inbound and local preference outbound. I really think loose mode is what will solve the problem of some of you not being able to reach us back when we were multi-homed.
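The strict-vs-loose difference boils down to one keyword in IOS-style config (interface name below is illustrative):

```
! Strict mode (what we had) drops packets arriving on an interface that
! isn't the router's best return path to the source -- which bites you
! when routing is asymmetric:
interface GigabitEthernet2/1
 ip verify unicast source reachable-via rx

! Loose mode only requires that *some* route back to the source exists
! in the routing table, regardless of which interface the packet came in on:
interface GigabitEthernet2/1
 ip verify unicast source reachable-via any
```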