Amazon flood gates open.

Two huge new features were announced today for EC2. The first is Elastic IPs, which is basically the static IP solution everyone has been waiting for, but better! Elastic IP is a 1:1 NAT solution. What is so cool about this is that you can dynamically remap your static IP to a different running instance, creating a poor man’s HA solution. The second feature is Availability Zones. This allows you to launch instances in isolated zones that Amazon describes as “distinct locations engineered to be insulated from failures in other zones.” The next step is allowing region-specific selection as well; currently you are limited to selecting a zone within the region defined for your account. This provides a huge increase in availability and will certainly make organizations take another hard look at what Amazon has to offer to extend or augment their existing facilities.
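To make the poor man’s HA idea concrete, here is a minimal sketch of the remapping logic in Python. It is not Amazon’s code: the function name `remap_elastic_ip`, the instance IDs and the `is_healthy` and `associate` callables are all made up for illustration. In a real setup `associate` would wrap whatever call your EC2 tooling uses to associate an address with an instance, and `is_healthy` would be your own health check.

```python
from typing import Callable, Iterable, Optional


def remap_elastic_ip(
    elastic_ip: str,
    current_instance: str,
    standby_instances: Iterable[str],
    is_healthy: Callable[[str], bool],
    associate: Callable[[str, str], None],
) -> Optional[str]:
    """Move elastic_ip to the first healthy standby if the current holder is down.

    Returns the ID of the instance holding the address afterwards,
    or None if no healthy instance could be found.
    Both callables are hypothetical hooks supplied by the caller.
    """
    if is_healthy(current_instance):
        # The current holder is fine, so leave the mapping alone.
        return current_instance

    for candidate in standby_instances:
        if is_healthy(candidate):
            # Assumption: associate(ip, instance_id) remaps the address.
            associate(elastic_ip, candidate)
            return candidate

    # Nothing healthy to fail over to; the caller should raise an alert.
    return None


if __name__ == "__main__":
    # Fake health data and a fake associate call, for demonstration only.
    healthy = {"i-primary": False, "i-standby1": False, "i-standby2": True}
    holder = remap_elastic_ip(
        "203.0.113.10",
        "i-primary",
        ["i-standby1", "i-standby2"],
        is_healthy=lambda instance: healthy[instance],
        associate=lambda ip, instance: print(f"remap {ip} -> {instance}"),
    )
    print("address now held by:", holder)
```

Passing the health check and the associate call in as functions keeps the failover decision separate from the AWS plumbing, so the logic can be exercised without touching a real account.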

What we can all learn from the Amazon outage.

I didn’t write about the Amazon storage service outage here before now, but I have been thinking a lot about what we all can learn from it. First, a few details: the Amazon S3 storage solution had issues from 3:30am PT to 6:48am PT on 2/15. The issue manifested itself as a “large” increase in authenticated calls to the S3 service. The real problem is that the team didn’t know this was coming until it was too late. To resolve the problem, the Amazon team moved in additional capacity to handle the increase in authenticated requests.

I can certainly feel for the Amazon team; being caught off guard is NOT a good feeling. So what monitoring is missing from your environment? This should be an opportunity for all of us to think about the little service that everything relies on and that could cripple the environment if it failed. Monitoring, trending and basic capacity planning are critical to the health of all our applications. We have been working much more closely with our engineering teams than ever before to instrument all parts of the applications supporting our sites via JMX. Call it what you want, and I don’t like the word, but it feels like a good time for a basic monitoring audit.
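As a rough illustration of what trending and basic capacity planning can look like, the sketch below fits a straight line to evenly spaced samples of a metric (say, authenticated requests per second, one sample per hour) and estimates how long until the trend reaches a capacity limit. The function name `hours_until_capacity`, the hourly spacing and the numbers are assumptions for the example; real traffic is rarely this linear, and a sudden spike like the one described above would still need threshold alerts on top of this.

```python
from typing import Optional, Sequence


def hours_until_capacity(samples: Sequence[float], capacity: float) -> Optional[float]:
    """Estimate hours from the last sample until a linear trend hits capacity.

    samples: metric values taken one hour apart, oldest first (assumed spacing).
    Returns 0.0 if the latest sample is already at or over capacity,
    None if the trend is flat or falling, otherwise the estimated hours left.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")

    # Ordinary least-squares fit of y = intercept + slope * x, with x = 0..n-1.
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    if samples[-1] >= capacity:
        return 0.0
    if slope <= 0:
        return None  # not growing, so the line never reaches capacity

    crossing_x = (capacity - intercept) / slope
    return max(0.0, crossing_x - (n - 1))


if __name__ == "__main__":
    # Made-up hourly request rates and a made-up capacity of 1000 req/s.
    rates = [100.0, 200.0, 300.0, 400.0]
    print("hours until capacity:", hours_until_capacity(rates, capacity=1000.0))
```

Running a check like this against each critical service on a schedule, and alerting when the estimate drops below the time it takes to add capacity, is one simple way to avoid finding out too late.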