Some Observations on LJ’s Downtime

[This entry was actually written late Friday night, January 14, 2005]

By now, everyone knows what has happened at LiveJournal. There is an interesting discussion about it over at Slashdot (I really think we should LiveJournal-them). A few quotes that stood out:

A great disturbance in the Force… (Score:4, Funny)
by YowzaTheYuzzum on Friday January 14, @07:59PM
… as if millions of teenage girls suddenly cried out in terror and were suddenly silenced.

According to Brad (user bradfitz on Slashdot), all the servers came back up when the power came back. However, they intentionally don’t have the databases start on boot, because after a power blip they want to do an integrity check first. As of 8:39, all of his whiteboards were full of boxes for each database cluster, listing the machines in that cluster, which had passed their checksum tests, which had replayed their replay/undo logs, where in the binlogs each was writing, reading, and executing, and so on. He indicated that he didn’t want to put a machine back in and find out a week later that a database page was corrupt because the battery-backed write-back cache on the RAID card didn’t work as advertised.
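For the curious, here is how I picture that whiteboard as a tiny Python sketch. To be clear, this is only my guess at the bookkeeping and not anything LiveJournal actually runs; the class, the function, and the machine names are all made up for illustration. The idea is simply that a machine goes back into rotation only after it has passed its checksum test and replayed its logs.

```python
from dataclasses import dataclass


@dataclass
class MachineStatus:
    """One box on the whiteboard (hypothetical structure)."""
    name: str
    checksum_passed: bool = False  # integrity check finished cleanly
    logs_replayed: bool = False    # replay/undo logs have been applied

    def safe_to_return(self) -> bool:
        # Both checks must pass before the machine goes back in service.
        return self.checksum_passed and self.logs_replayed


def machines_ready(cluster):
    """Return the names of machines that have cleared every check."""
    return [m.name for m in cluster if m.safe_to_return()]


# Made-up cluster: one machine done, one still replaying, one untouched.
cluster = [
    MachineStatus("db1-a", checksum_passed=True, logs_replayed=True),
    MachineStatus("db1-b", checksum_passed=True, logs_replayed=False),
    MachineStatus("db1-c"),
]
print(machines_ready(cluster))
```

Everything defaults to "not checked", so a machine nobody has looked at yet can never slip back into service by accident.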

I’m glad Brad and his crew have put so much effort into this; this user thanks you.