altivo: Rearing Clydesdale (angry rearing)
[personal profile] altivo
It's difficult for me to understand just what this problem is that LJ and SixApart have with power. The technology exists and is well understood, yet power outages that last only minutes leave them fumbling for hours. It seems to have always been this way. Every time, their backup power system fails them. Every time they say they will find out what the problem was and fix it. And the next time it's just the same.

I can understand how an uncontrolled shutdown could make restarting such a large network of machines into a tricky task, but that too needs better planning and should not take so long. The real issue is avoiding the uncontrolled shutdown. It would seem that they can't manage to sustain power for five minutes, even though that really should be all it takes to achieve a controlled and orderly close that would allow a quick and efficient restart.

Date: 2007-07-25 11:32 am (UTC)
From: [identity profile] ducktapeddonkey.livejournal.com
Did you also notice that LJ was clearly not the priority when it came to restoring services?

Although, I suppose one should expect that.

It's funny when I'm working in one of the factories I frequent, and the power goes out. Big automated machines don't always recover well from shutting down like that either. Although all it would take is a few more battery backups that could detect when they've been engaged. But I guess they just don't want to spend the money.

Date: 2007-07-25 11:34 am (UTC)
From: [identity profile] duncandahusky.livejournal.com
Well, apparently part of the problem is that it wasn't just a plain old outage - bang, and then you're off. Since the power was bouncing up and down, I could see how that could wreak havoc. I suspect, though, that heads are going to roll at 365 Main because of this.

Date: 2007-07-25 11:34 am (UTC)
From: [identity profile] shadow-stallion.livejournal.com
Tivo, it actually is not that surprising to me. I have a dear friend who works in that industry, and after hearing the horror stories of how places are mismanaged and equipment is not allocated properly, this prolonged outage comes as no real surprise to me. Yes, in this day and age things like this should not happen, but you have to remember there is a human element, and a good many of the managers in charge are idiots.

Date: 2007-07-25 11:44 am (UTC)
ext_39907: The Clydesdale Librarian (Default)
From: [identity profile] altivo.livejournal.com
Yes, I did notice the low priority given to LJ. That's a puzzle too. Don't they each have their own staff? Shouldn't both be handling recovery operations simultaneously?

Date: 2007-07-25 11:47 am (UTC)
ext_39907: The Clydesdale Librarian (Default)
From: [identity profile] altivo.livejournal.com
Bouncing up and down repeatedly is the normal mode for an outage where I am. Commonwealth Edison is utterly incompetent too. Even simple UPS systems handle that with aplomb. No, the problem I perceive is a lack of adequate disaster planning and disaster recovery procedure.

Sure, that kind of outage is going to take you down. I don't dispute that. But it isn't that hard to ensure that you go down quickly and smoothly, so that you can recover quickly rather than needing hours of fiddling and intervention.

Date: 2007-07-25 11:50 am (UTC)
ext_39907: The Clydesdale Librarian (Default)
From: [identity profile] altivo.livejournal.com
Oh I agree that budget cutting and penny pinching are factors in this, as well as facility mismanagement and probably even downright lies about what the facilities can deliver or are delivering. However, it still comes back to the LJ software and hardware design, which evidently isn't set up to facilitate quick recovery from an outage. Every time this happens, they need hours to recover rather than minutes, and that's a design failure.

Capitalism...

Date: 2007-07-25 12:03 pm (UTC)
From: [identity profile] avon-deer.livejournal.com
The answer is in the question. This is just part of the problems that LJ has when it comes to dealing with its customers. The rot set in when SixApart took over the place. When they brought in "sponsored communities" and "advertising supported journals" the alarm bells were ringing quite loudly for most people. Then they had the censorship thing in June 2007, which, however much they deny it, WAS just caving in to fundies. SixApart are gearing the place up for stock market listing, and "streamlining the operation" is as much a part of this process as "getting rid of embarrassing customers".

Date: 2007-07-25 01:27 pm (UTC)
ext_87: Custom symbol (Default)
From: [identity profile] tango.livejournal.com
I'm of the opinion that 6A, Second Life, Craig's List, Yelp, and the other services co-located at the 365 Main facility need to consider new locations.

Date: 2007-07-25 01:31 pm (UTC)
From: [identity profile] hartree.livejournal.com
They've had this sort of thing happen before. At the time it sounded like much of the reason it took so long to bring it back up was that LJ hadn't been designed at the start to be a high availability system. It was a MySQL hack that took on a life of its own. They've moved to a lot more reliable systems and layout, but trying to keep the various parts consistent in the face of a major failure appears to give them a lot of trouble still. Whenever the whole thing goes down, they seem to take several hours to check consistency across it before it goes live again.

Re: Capitalism...

Date: 2007-07-25 01:34 pm (UTC)
ext_39907: The Clydesdale Librarian (Default)
From: [identity profile] altivo.livejournal.com
Well, yes, of course, I knew things would go bad as soon as SixApart took over, and the evidence is clear now. I note that LJ got negative priority in the recovery, and I agree with you that the censorship pogrom was inexcusable.

I also notice that Brad himself has been more and more absent where he used to be an active presence. I don't know if he is now a millionaire, but I doubt it. Whatever happened, SixApart has managed to successfully separate him from LJ to the extent that he had no influence on the witch hunt in May and June and isn't even trying to make peace with users now. So in essence, there is no one left there who much cares what happens to LJ. They'll just keep milking it for whatever money they can suck out of it until it runs dry.

The question is, what do we do about it? What can we do about it?

Date: 2007-07-25 01:37 pm (UTC)
ext_39907: The Clydesdale Librarian (Default)
From: [identity profile] altivo.livejournal.com
They've already moved at least twice in the time I've been an LJ user. The same problems follow, wherever they go. It's undoubtedly a failed priority. Power failures are inevitable. They happen. Not being prepared to deal with one is a failure on SixApart/LJ's part, and can't be blamed on the facility. The failure of backup power systems may be the facility's fault all right, but that's one of the things that should always be planned and prepared for.

The fact that LJ users received NO priority in the recovery, and only SixApart products were targeted for fast reactivation shows us where the corporate loyalties are based. They've demonstrated this repeatedly since the time 6A took over.

Date: 2007-07-25 01:40 pm (UTC)
ext_39907: The Clydesdale Librarian (Default)
From: [identity profile] altivo.livejournal.com
They've had this sort of thing happen before.

Indeed they have. Repeatedly. And that's my point. Once I can chalk up to inexperience. Twice I can allow because it does take time to resolve all the issues in something this large. But we're way past four or five times. This is now inexcusable, and SixApart's apparent low priority for LJ recovery is evidence that it won't get better. I contend that LJ is now just a cash cow for them.

Re: Capitalism...

Date: 2007-07-25 02:54 pm (UTC)
From: [identity profile] avon-deer.livejournal.com
The only thing we can do is attempt our own buyout. Unlikely of course. We'll just hope that the Facebook/MySpace transformation by salami tactics stops here.

Date: 2007-07-25 03:06 pm (UTC)
From: [identity profile] captpackrat.livejournal.com
I take it these folks have never heard of generators? I find it astounding that a co-loc facility would not have a huge UPS to sustain things until a backup generator could be started.

Date: 2007-07-25 03:25 pm (UTC)
ext_39907: The Clydesdale Librarian (Default)
From: [identity profile] altivo.livejournal.com
From the veiled comments made every time this happens, it sounds as if the 365 Main facility does indeed have power backup facilities. They must be so poorly set up or maintained that they fail to perform when needed. When I worked in mainframe datacenters that had generator facilities, I learned that those things require constant maintenance and testing. You don't just install them and let them sit until they are needed, or they will fail every time.

Probably it's a larger scale version of what happens with small networks that rely just on battery backup. The UPS is the very last thing considered when upgrades or tests are performed. Consequently, it is often inadequate to provide the necessary power when a need actually arises. Backup is a cost, not a source of income. Increasing your rack capacity and network bandwidth produces a direct increase in income. So... they always put off adding more power facility until it is too late. It's the basic defect of capitalist economics: short term thinking.

Re: Capitalism...

Date: 2007-07-25 03:34 pm (UTC)
deffox: (I see)
From: [personal profile] deffox
The question is, what do we do about it? What can we do about it?

I'm going to do pretty much the only thing I'm able to do. I'm going to let my paid account lapse to free when it comes due in September.

Getting the lowest priority on recovery for this repeating problem shows it's not worth paying for.

There is also a habit of allowing things like comment notification emails to completely break down before doing a system upgrade. The 'fast' servers that a paid account supposedly allows still have timeouts. They can't be unaware of load issues.

Powers that B

Date: 2007-07-25 04:04 pm (UTC)
From: [identity profile] gabrielhorse.livejournal.com
I'm tempted to draw a parallel between this and our government's claims about our security and the War on Terror... Nah, I'm in a good mood for a change...

Date: 2007-07-25 04:13 pm (UTC)
hrrunka: Attentive icon by Narumi (Default)
From: [personal profile] hrrunka
A friend pointed me at this report, which mentions that 365 Main seem to rely on flywheel systems to keep things running just long enough for the generators to pick up the load. He added some comments about "cutting corners"...

Re: Capitalism...

Date: 2007-07-25 05:48 pm (UTC)
ext_39907: The Clydesdale Librarian (Default)
From: [identity profile] altivo.livejournal.com
Unfortunately, that isn't likely to have much impact. Compared to the advertising revenues they probably get now from the "plus" accounts, paid user subs at only $25 a year are not all that significant. At least, not enough to bother SixApart. My suspicion is that SixApart never really intended to truly support LJ, but rather looked at it as an opportunity to purchase a user population that they could eventually transfer to their own products.

Date: 2007-07-25 06:02 pm (UTC)
ext_39907: The Clydesdale Librarian (Default)
From: [identity profile] altivo.livejournal.com
Interesting article. I wouldn't consider flywheel generators to be cutting corners, but rather cutting edge. Of course, those were already there when 365 Main bought the facility, so they get no credit for the green alternative. The problem appears to be Diesel generators that didn't start up automatically as they are supposed to do, and that's nothing new. In my experience, those things fail in just that way about half the time. Diesel engines are not exactly "quickstart" gadgetry.

My advice to LJ and other non-essential services that rely on large colocations like this would be to design and implement a rapid shutdown that triggers cleanly at the FIRST sign of trouble. Better to have an orderly down time and a clean restart than rely on crossed fingers and witchcraft to keep you going until the power returns or stabilizes. Even a 15 minute delay in shutdown is usually too long. Because LJ needs some sort of consistent synchronization between its many data clusters, this is critical. If a five minute power outage means a two hour downtime, that's fine. At least a simple restart is likely. Their problem seems to be that they have to manually resynchronize everything before they can allow a full restart and access to the database. I'd rather have a two hour outage followed by a quick restart than an eight hour outage with a lot of people running around firefighting to get it to restart.

Re: Powers that B

Date: 2007-07-25 06:03 pm (UTC)
ext_39907: The Clydesdale Librarian (Default)
From: [identity profile] altivo.livejournal.com
*nuzzle* Yeah, don't let my ranting ruin a good mood. You can use that.

Date: 2007-07-25 07:57 pm (UTC)
hrrunka: Attentive icon by Narumi (sparks)
From: [personal profile] hrrunka
Flywheels aren't exactly a new idea (the Mullard Radio Astronomy Observatory was using one thirty years ago to ensure that the dishes of the Ryle Telescope could all be stowed safely after a power failure), but they have to be big enough, or they're just a waste of energy. If you're using them to tide you over while your generators get to full power then you'd better be sure they'll get there in the time they've got...

Date: 2007-07-25 08:04 pm (UTC)
From: [identity profile] duskwuff.livejournal.com
Starting LJ back up was done in parallel with the rest of the services; the LJ team just didn't provide minute-to-minute updates like the other teams did.

Date: 2007-07-25 08:12 pm (UTC)
ext_39907: The Clydesdale Librarian (Default)
From: [identity profile] altivo.livejournal.com
Exactly. One large datacenter I worked for in the 80s had four backup generators. Two were Diesel, two were gasoline engines. When the power went out, guess which ones started? The gasoline engines were always right on and in time. At least one of the Diesels would always fail to catch.

These days, generators of that sort could also be powered with natural or LP gas, and those I've been around (only a couple) have always started up very promptly and without intervention.

No, flywheels aren't a new idea. However, thinking of them as a green alternative does seem to be relatively new.

The real culprit in this scenario, though, is SixApart/LJ. This problem has recurred enough times that you'd think they'd make a faster shutdown and faster recovery a top priority instead of developing new graphical gimmicks to stick into other people's profiles.

Never trust the datacenter to provide uninterrupted power. Always have a controlled and effective recovery plan.

Date: 2007-07-25 08:14 pm (UTC)
ext_39907: The Clydesdale Librarian (Default)
From: [identity profile] altivo.livejournal.com
Well, they still look bad then because they didn't at least provide hourly updates and worse because this has happened so many times and they still don't have a faster way to get back up and running.

Date: 2007-07-25 08:43 pm (UTC)
From: [identity profile] duskwuff.livejournal.com
Three words: "Database integrity checks". MySQL doesn't like going down unexpectedly.

Date: 2007-07-25 08:59 pm (UTC)
ext_39907: The Clydesdale Librarian (Default)
From: [identity profile] altivo.livejournal.com
I understand that. But if MySQL can't be configured to perform those checks more quickly and efficiently, then the underlying structure needs to be re-evaluated. No database likes going down unexpectedly, and I've worked on some gigantic ones, believe me. We never would have considered an eight hour recovery period to be acceptable.

The "going down unexpectedly" part is also preventable. Evidently they continue to trust the power backup hardware to work for more than a few seconds, and it continues to fail. Therefore, they should be shutting down immediately on the first warning, and not crossing fingers and waiting. Even those flywheel generators must give them a minute or two warning. If it takes longer than that to stop all further updates and commit all writes, the code needs redesigning.
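To make the idea concrete, here is a rough Python sketch of a "shut down on the first warning" policy: refuse new updates the moment backup power engages, then commit whatever is pending inside a fixed time budget. Everything here is hypothetical and for illustration only. `WriteBuffer`, `on_power_event`, and the 60-second default budget are made-up names and numbers, not anything from LJ's actual code or from MySQL.

```python
import time
from collections import deque


class WriteBuffer:
    """Toy stand-in for a database node holding uncommitted writes (hypothetical)."""

    def __init__(self):
        self.accepting = True      # becomes False once shutdown has begun
        self.pending = deque()     # writes accepted but not yet durable
        self.committed = []        # writes that are safely "on disk"

    def submit(self, record):
        """Accept a write unless shutdown has started. Returns True if accepted."""
        if not self.accepting:
            return False
        self.pending.append(record)
        return True

    def shutdown(self, budget_seconds, clock=time.monotonic):
        """Stop new writes, then commit pending ones within the time budget.

        Returns True if every pending write was committed (a clean shutdown),
        False if the budget ran out with writes still pending.
        """
        self.accepting = False
        deadline = clock() + budget_seconds
        while self.pending and clock() < deadline:
            self.committed.append(self.pending.popleft())
        return not self.pending


def on_power_event(node, on_backup_power, budget_seconds=60.0):
    """Shut down at the first sign of trouble instead of waiting it out.

    The 60-second default is an assumed figure, not a measured one.
    """
    if on_backup_power:
        return node.shutdown(budget_seconds)
    return None  # mains power is fine, so keep running
```

If `shutdown` comes back False, the budget was too short for the backlog, which is exactly the case where I'd say the design needs rework rather than more crossed fingers.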

Date: 2007-07-25 09:11 pm (UTC)
From: [identity profile] duskwuff.livejournal.com
The flywheel conditioners at 365 Main provide maybe 10 seconds of backup power. It's only supposed to be just enough for the diesels to start up. And due to the way the conditioners work, if you can't get the diesels started within those 10 seconds, you're hosed, because the flywheels are too heavy to start using the diesel engine alone.

Date: 2007-07-25 09:22 pm (UTC)
ext_39907: The Clydesdale Librarian (Default)
From: [identity profile] altivo.livejournal.com
If the safety margin is that short, then the facility is simply inadequate for a database the size of LJ, and those who say another move is in order are correct. However, I still maintain that LJ is responsible for a design that should jolly well be improved to make this easier.

Date: 2007-07-25 11:56 pm (UTC)
From: [identity profile] cabcat.livejournal.com
Does LJ generate much revenue, I wonder? If it's a low priority, it's always more likely to hit the back burner while other, more profitable systems are brought back first.

As a free account holder I can't complain about any drop in service, but the people on paid accounts must be chewing their tails.

Date: 2007-07-26 02:18 am (UTC)
ext_39907: The Clydesdale Librarian (Default)
From: [identity profile] altivo.livejournal.com
Well, I'd like to chew someone's tail, or perhaps better yet, kick their tail good. And I'd start with the CEO of SixApart because I haven't yet forgiven him for that absurd witchhunt he allowed to happen a few weeks ago.

And I'm just a regular paid member. The people who have bought permanent accounts have a right to be really angry. It's not as if this were the first or second time. We've seen the same problem repeatedly, and every time it takes them hours, even most of a day, to recover from it.

Re: Powers that B

Date: 2007-07-26 01:15 pm (UTC)
From: [identity profile] gabrielhorse.livejournal.com
*smiles, closes eyes pleasantly* Thanks... I'm beginning to get a lot of help & comfort from people in the outside world, which is a welcome soothing balm for the near constant sting of the whip of Standardized Justice I've been feeling lately. I suppose if I survive this ordeal, I'll be much stronger from it *tenses up as I rotate my shoulders* Uh, it's quite a load, but I seem to be adapting... at least, for now... I'm gonna stay here a while, and doze- don't mind me... *instantly drifts off to sleep*

Re: Powers that B

Date: 2007-07-26 02:42 pm (UTC)
ext_39907: The Clydesdale Librarian (Default)
From: [identity profile] altivo.livejournal.com
*slips a pillow under your head*

I wish I could be more helpful than just offering advice. I sense that if there was an actual lesson to be learned from your encounter with law, it has been learned all right. Unfortunately, there seem to be many who, rather than learn from such things, resist the learning and just try to "get even." That is certainly a disastrous choice and I'm glad you aren't going that way. Hold onto your resolve and integrity, and you'll find a path.

Re: Powers that B

Date: 2007-07-26 03:42 pm (UTC)
From: [identity profile] gabrielhorse.livejournal.com
*talks in my sleep*
Yeah, I noticed that just a few days ago- I surprised one of the officers I was being instructed by because I was paying attention and asked her a specific question about one of my probation conditions. I threw her for such a loop, she asked me, "where did you see that?" Uh, fourth line down.

I have no desire to cop an attitude... I know that now is the time circumstances have caused everyone to look in my direction with a dubious glare, and I am choosing to show them all the things in me they've casually overlooked all these years. Showing them they misjudged me is far more satisfying than dwelling on hateful spite.

I think the initial lesson I learned during my 35-day incarceration after my initial arrest... I had plenty of time to think. I think if people knew who I was, they'd see putting me in with a bunch of people who need direction wouldn't suit their purposes very well ;)

Re: Powers that B

Date: 2007-07-26 04:05 pm (UTC)
ext_39907: The Clydesdale Librarian (Default)
From: [identity profile] altivo.livejournal.com
Absolutely. And therein lies the entire failure of the American justice system, "law and order" version. Of course any time I suggest such a thing I'm immediately labeled a "bleeding heart liberal" and henceforth anything more I have to say is automatically discarded.

Without knowing or needing to know anything more than I know (which is little, admittedly) I can tell that you are worth a lot more than you've been given credit for.
