February 18th, 2008 — Capacity Author: Joe
I haven’t written about the Amazon storage service outage here before now, but I have been thinking a lot about what we can all learn from it. First, a few details: the Amazon S3 storage solution had issues from 3:30am PT to 6:48am PT on 2/15. The issue manifested itself as a “large” increase in authenticated calls to the S3 service. The real problem is that the team didn’t know this was coming until it was too late. To resolve the problem, the Amazon team moved in additional capacity to handle the increase in authenticated requests.
I can certainly feel for the Amazon team; being caught off guard is NOT a good feeling. So what monitoring is missing from your environment? This should be an opportunity for all of us to think about the little service that everything relies on and that could cripple the environment. Monitoring, trending and basic capacity planning are critical to the health of all our applications. We have been working much more closely with our engineering teams than ever before to instrument all parts of the applications supporting our sites via JMX. Call it what you want, and I don’t like the word, but it feels like a good time for a basic monitoring audit.
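For anyone starting that audit, here is a minimal sketch of the kind of check that might catch a sudden jump in request volume before it becomes an outage. It is only an illustration: the function name, the 12-sample window, the 3-sigma threshold and the 5% floor are all assumptions of mine, not anything Amazon has described, and the samples are assumed to be per-interval request counts pulled from whatever you already collect.

```python
from collections import deque
from statistics import mean, stdev


def find_spikes(samples, window=12, threshold=3.0, min_spread_ratio=0.05):
    """Return the indexes of samples that sit far above the recent baseline.

    All parameters are illustrative defaults (assumptions, not tuned values):
    window           - how many previous samples form the baseline
    threshold        - how many "spreads" above the baseline counts as a spike
    min_spread_ratio - floor for the spread, as a fraction of the baseline,
                       so a very flat history does not alert on tiny changes
    """
    history = deque(maxlen=window)
    spikes = []
    for index, value in enumerate(samples):
        # Only judge a sample once a full window of history exists.
        if len(history) == window:
            baseline = mean(history)
            spread = max(stdev(history), min_spread_ratio * baseline)
            if value > baseline + threshold * spread:
                spikes.append(index)
        # Note: spikes are kept in the history, so a sustained increase
        # eventually becomes the new baseline and stops alerting.
        history.append(value)
    return spikes


if __name__ == "__main__":
    # Hypothetical per-interval request counts with one obvious jump.
    requests = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 100, 250, 101]
    print(find_spikes(requests))
```

The point is less the math than the habit: pick the handful of counters everything depends on, and make sure something is comparing them against their own recent history.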
December 12th, 2007 — Uncategorized Author: Mistamista
May 4th, 2007 — Stuff Author: Zach
Today is Friday. No-change Friday, to be more specific. And not because we don’t like to work (although I’ll confess that we like to slack as much as the next guy). Today is no-change Friday because Fridays are special. Fridays are for Sneakily Making Crappy Things Better.
Now, that might just lead to a whole host of new questions in your mind. Things like “Why are things crappy?” or “Why would you have to sneak improvements?” Well, if you have to ask any of those things then you’ve probably never worked in Operations.
Things are crappy here for the same reason they are everywhere else–our work is never “done.” Projects begin and end, events come and go, but no matter what we still have work to do. The core services we support are still around and they can be better. Now, that’s not to say that they’re in such bad shape today; crappy is a relative term. But they can always be better.
And as for the sneaking, well, sometimes there just isn’t enough time in the day to get everything done. And there’s nothing sexy about fixing old problems or iteratively improving performance. So, you do it quietly and while no one is watching.
And that’s what Fridays are for. Spending time trying to make things better. Sometimes that’s adding some fancy new monitoring widget, other times it’s sitting down as a team and talking through some piece of architecture. This afternoon it seems to be finally updating this blog. So, whatever you do today ask yourself: am I sneakily making crappy things better?
March 24th, 2007 — Links Author: Joe
Well, I don’t want to make this blog about aggregating others’ blogs, but I came across a great post regarding metrics this afternoon that I think we should all read.
How and what to measure via Ask the Wizard
March 13th, 2007 — What? Author: Joe
Trending is all the rage, and rightfully so. To put it as simply as possible, trending allows us to know what is going to break before it does. We have been using tools like cacti and HitBox for some time now to try to predict what to expect in our upcoming events.
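To make "know what is going to break before it does" a little more concrete, here is a rough sketch of the idea: fit a straight line to recent samples of a metric and estimate how many more sampling periods remain before it crosses a capacity limit. This is not how cacti or HitBox work internally; the function names, the sample data and the assumption that growth is roughly linear are all mine, for illustration only.

```python
def fit_line(values):
    """Least-squares fit of y = slope * x + intercept, with x = 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def periods_until_capacity(values, capacity):
    """Estimate how many periods after the last sample the trend hits capacity.

    Returns None when the trend is flat or falling, and 0.0 when the fitted
    line says the limit has already been reached.
    """
    slope, intercept = fit_line(values)
    if slope <= 0:
        return None
    crossing = (capacity - intercept) / slope
    return max(0.0, crossing - (len(values) - 1))


if __name__ == "__main__":
    # Hypothetical weekly peak utilization (percent) of some shared resource.
    weekly_peak = [10, 20, 30, 40, 50]
    print(periods_until_capacity(weekly_peak, capacity=100))
```

A straight line is the crudest possible model, and event traffic rarely behaves that politely, but even this much is enough to turn a graph into a date you can plan around.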
One thing I have always struggled with is the bigger picture. What does the trend look like over the year or into the next? Are we, as a whole, gaining ground or losing it? What makes up those metrics? A new tool similar to Amazon’s Alexa has come out recently, called compete.com. Compete really isn’t that different, but it does something important for me: it reinforces the data that Alexa has had a lock on for years. Using this data I can confidently say yes, it does appear we are down 18% year over year.
The next step in all this is translating that into operations speak. So this means we can reduce our footprint by 18%? Well, no, not exactly: we are serving more dynamic content than ever, and oh, did we mention the new video service launching next week? This is in part what makes operations interesting: trying to find better ways to help scale, but with reduced cost, smaller footprints, and oh right, less POWER. Sounds like a topic for another post…
March 12th, 2007 — What? Author: david
Good movie quote, recently brought to me by a colleague when discussing the virtues of accountability and owning the health, maintenance and restoration for systems we are responsible for. Broader topic was “monitoring.”
I don’t expect magic, in buckets or within the context of goodness. I do expect that if we are really empowered to engineer and manage systems, we need to ensure they are built correctly and alert us on state and utilization. Empowerment = given the funds we request to build and instrument correctly. Thus far, I haven’t received any resistance to properly teed-up requests.
Thus…a good quote: “I don’t wanna hear about no motherf***in’ ifs. All I wanna hear from your ass is, You ain’t got no problem, David (Jules). I’m on the motherf***er!”
March 11th, 2007 — Stuff Author: Mistamista
Operations Catch Phrases:
4. That’s just like that Seinfeld episode where Jerry said…
3. Mainstreaming Web 2.0 Trends
2. I’m no network engineer, but…
1. Magic buckets of goodness
March 11th, 2007 — Non-Events Author: Zach
I woke up this morning and my alarm clock proudly displayed the time. I was so bummed.
I looked across the room at the digital atomic clock + thermometer on the bookcase. Perfectly correct as usual. Damn.
My last hope: the old-looking analog clock that actually has an internal atomic radio receiver. It, too, was right.
I had been so convinced that this was really going to be it. Y2K was such a huge letdown; I really thought that this time would be different. This time it wasn’t some simple 2-digit vs. 4-digit date thing. THIS was Daylight Saving Time! Something this sacred simply had to wreak havoc on our technology. How would it deal with this change? Surely my TiVo was going to record the wrong show, the power would go out, and my car would not start. Yet to my ever-living disappointment, all that happened was that my heat turned on an hour late so the house was a little cold when I got up this morning. So that’s how I began my day: coldly disappointed.