So-called software glitches often take the blame when a major technology failure occurs.
That was the case on Wednesday as the New York Stock Exchange halted trading for nearly four hours because of an “internal technical issue.” The shutdown came on the heels of “network connectivity issues” that grounded United Airlines flights earlier in the day.
In the immediate aftermath of high-profile events like these, it’s not always possible to pinpoint a single cause. There is usually some detective work to do, and it becomes the job of internal IT departments to understand what went wrong, and more importantly, how to prevent costly major outages in the future.
Why Software Testing is Critical
NYSE and United were both quick to point out that their respective shutdowns were not the result of a security breach or outside attack.
In reality, many software glitches that bring critical systems to a halt are internal—and preventable. That’s where enterprise-wide quality assurance and testing best practices come into play.
NYSE officials say they were deploying new software that seemed to have compatibility issues with other software used for trading. While I don’t have insight into their QA practices, it’s important that changes to software or hardware be properly tested in a representative test environment with appropriate test data.
When the test environment and test data don’t adequately match the production environment, the data may not flow as expected in real world conditions and a system outage can occur.
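One way to catch that kind of mismatch early is to compare the settings of the two environments before testing starts. The Python sketch below is only an illustration of the idea: the setting names and values are made up, and a real comparison would read them from your own configuration sources.

```python
def find_environment_drift(production, test):
    """Return settings whose values differ between the two environments."""
    drift = {}
    for key in sorted(set(production) | set(test)):
        prod_value = production.get(key, "<missing>")
        test_value = test.get(key, "<missing>")
        if prod_value != test_value:
            drift[key] = (prod_value, test_value)
    return drift


# Hypothetical settings, invented for this example.
production_env = {"db_version": "12.4", "gateway_protocol": "v2", "max_connections": 500}
test_env = {"db_version": "12.4", "gateway_protocol": "v1", "max_connections": 50}

for name, (prod_value, test_value) in find_environment_drift(production_env, test_env).items():
    print(f"{name}: production={prod_value!r}, test={test_value!r}")
```

Any setting the check reports is a place where test results may not carry over to production.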
Without proper planning, backing out the problematic changes can be a difficult and time-consuming process. When the systems handle high transaction volume, as the NYSE’s do, that volume can make the issues more difficult to sort out.
Tips to Prevent Outages
Many of my suggestions below may seem like common sense, and they are! But many large companies still underestimate the value of quality assurance and testing, and thus put themselves at greater risk of being the next United or NYSE.
Here are five recommendations to reduce the risk of software outages.
1) Create a comprehensive test plan that includes a logically similar test environment and realistic test data under normal and high transaction conditions.
2) Automate regression tests to quickly and consistently ensure that changes to the system don’t negatively impact existing functionality.
3) Make business-critical software changes during maintenance windows or slow business times if possible.
4) Plan and build a back-out plan for situations when you need to roll back to a stable environment.
5) Work with professional software testers and avoid cutting corners by using lesser-skilled resources. Experienced testers supported with sufficient resources will always save money in the long run by preventing costly software outages that can debilitate your company.
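To make recommendations 2 and 4 a little more concrete, here is a minimal Python sketch of how automated regression checks and a back-out step can work together. Everything in it is hypothetical: the system is a plain dictionary, and the check functions stand in for real regression tests. Treat it as an illustration of the idea, not a deployment tool.

```python
def deploy_with_rollback(system, new_version, regression_checks):
    """Apply new_version, run every check, and back out if any check fails.

    Returns the names of the failed checks (an empty list means the change stayed).
    """
    previous_version = system["version"]  # remember the stable state first
    system["version"] = new_version
    failures = [check.__name__ for check in regression_checks if not check(system)]
    if failures:
        system["version"] = previous_version  # back out to the known-good version
    return failures


# Hypothetical regression checks; real ones would exercise actual functionality.
def order_routing_still_works(system):
    return system["version"] in {"1.0", "1.1"}


def login_still_works(system):
    return True


demo_system = {"version": "1.0"}
failed = deploy_with_rollback(
    demo_system, "2.0", [order_routing_still_works, login_still_works]
)
print("failed checks:", failed)
print("running version:", demo_system["version"])
```

The point of the design is that the stable version is recorded before anything changes, so the back-out path exists before it is needed.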
If you or your company need expert consulting around quality assurance and testing, I hope you won’t hesitate to reach out to me here at Run Consultants.
Mike Cooper is the VP of Quality Assurance and Testing for Run Consultants