So yesterday we were racking our brains trying to figure out where a 300% increase in requests per second was coming from on an app that was only seeing a 30% increase in page views. Following our rules, we started with “why is the DB so slow?” but soon realized something else was going on. One of our engineers, while using Fiddler, noticed a bug in the Flash component: on mouseover it made a call to /, the root of the app, for no reason. Given the way the app was laid out, this would account for a huge number of unnecessary requests, somewhere in the neighborhood of 3000/sec at peak.
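To put a rough number on that mismatch, here is a minimal Python sketch. It is not the tooling we actually used. The function names are made up for illustration, and it assumes you can get a list of requested paths out of an access log or a Fiddler capture. It shows how much requests per page view grew, and which paths are eating the requests.

```python
from collections import Counter


def requests_per_view_growth(request_increase_pct, view_increase_pct):
    """Return the factor by which requests per page view grew.

    A 300% increase means 4x the original rate; a 30% increase means 1.3x.
    """
    request_factor = 1 + request_increase_pct / 100
    view_factor = 1 + view_increase_pct / 100
    return request_factor / view_factor


def top_paths(paths, n=3):
    """Return the n most requested paths with their counts.

    `paths` is a hypothetical list of request paths pulled from a log.
    """
    return Counter(paths).most_common(n)
```

With the numbers above, `requests_per_view_growth(300, 30)` comes out a little over 3, meaning each page view was generating roughly three times the requests it used to. A tally like `top_paths` is one quick way to see whether a single path, such as /, is behind the jump.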
This got me thinking: what kind of QA would find this? Is it peer review, classic code review that includes the design portion, or should this be part of our role? We run our shop very much like a startup; the work is primarily event driven, so we don’t have clearly defined classic development cycles. What this did show me is that designers are designers and developers are developers. Many can do both, but sometimes it really is best to separate the functions.
In our org I believe we should have a technical QA team that works with the operations team, ripping apart and combing through the final product from an engineering and technical production standpoint. I think this would provide the best level of accountability between the two teams and formalize the release without sacrificing the startup feel. Of course we would need to officially work this into the timeline, but it would leave the core teams free to focus on building the best possible products.
How can you know when something is about to go wrong if you can’t see it? We finally closed the loop today on some MSSQL trending we have been missing for a very long time. Being able to watch things like table scans/sec, batch reads & writes/sec, and transactions/sec is huge during an event. As much as we drill into folks’ heads the importance of communicating changes, it is still too easy for a simple change to have an unexpected impact on something like a DB. As I noted the other day, it is almost always the DB or the file system. We have our share of issues that aren’t, but many times we have chased our tails because we lacked trending on the DB, and in the end it was something stupid like a dropped index.
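For a sense of what that trending boils down to, here is a minimal Python sketch. It is not our actual setup. It assumes you sample cumulative counter values on a fixed interval (SQL Server exposes its "/sec" counters as running totals in `sys.dm_os_performance_counters`, so a rate is the difference between two samples divided by the elapsed time). The function names, the baseline values, and the 3x threshold are all assumptions for illustration.

```python
def per_second_rates(prev, curr, elapsed_seconds):
    """Turn two snapshots of cumulative counters into per-second rates.

    `prev` and `curr` map counter names to cumulative values taken
    `elapsed_seconds` apart. Counters missing from either snapshot are skipped.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return {
        name: (curr[name] - prev[name]) / elapsed_seconds
        for name in curr
        if name in prev
    }


def flag_spikes(rates, baseline, factor=3.0):
    """Return the counters whose rate exceeds `factor` times their baseline.

    `baseline` is a hypothetical map of "normal" rates for each counter.
    """
    return sorted(
        name
        for name, rate in rates.items()
        if name in baseline and rate > factor * baseline[name]
    )
```

A dropped index is the kind of thing this would surface: the scan rate jumps well above its baseline while transactions/sec stays roughly where it was.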