There are many good definitions of Six Sigma on the internet that I won’t repeat here. What follows is a brief description of how the Six Sigma DMAIC (Define, Measure, Analyze, Improve, Control) project type can be applied to a typical software load testing project.
The Six Sigma DMAIC project begins with a problem definition, or improvement opportunity, together with project goals. For load testing our problem definition will relate to either lack of stability or lack of capability.
A lack-of-stability problem definition would look like this:
The current “Place order” transaction is not stable, as evidenced by the system response time degrading over time.
Note: In this section the results of a previous stability test could be given, which would go further into detail regarding the specific workload and target configurations.
The matching project goal would be: identify the cause of the response time degradation, address the issue and implement a stable process for the “Place order” transaction.
In practice the above DMAIC project definition would be more detailed, with specific requirements for the workload and environment, but in essence this type of load testing project seeks to bring an unstable (out of control) process to a stable state. Once the process is stable, capability becomes the focus of DMAIC improvement projects. For example, if the “Place order” transaction was found to be stable but gave an average response time of 20 seconds against a target (specification) average of 8 seconds, then the project definition and goals would address this capability issue. Stability (and capability) DMAIC load testing projects can address measures as broad as overall response time or more specific issues, for example network bandwidth, memory or CPU usage.
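The distinction between stability and capability can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample data and the `max_slope` threshold are all hypothetical, and a simple least-squares trend stands in for whatever stability test your organization actually uses.

```python
from statistics import mean


def trend_slope(samples):
    """Least-squares slope of response time against sample order."""
    n = len(samples)
    x_bar = (n - 1) / 2
    y_bar = mean(samples)
    numerator = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(samples))
    denominator = sum((i - x_bar) ** 2 for i in range(n))
    return numerator / denominator


def classify(samples, target_avg, max_slope):
    """Label a run as unstable, stable but not capable, or stable and capable."""
    # Assumption: a rising trend above max_slope (seconds per interval)
    # is treated as degradation, i.e. an unstable process.
    if trend_slope(samples) > max_slope:
        return "unstable"
    if mean(samples) > target_avg:
        return "stable but not capable"
    return "stable and capable"


# Hypothetical response times in seconds, one per test interval.
degrading = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
slow_but_flat = [20.0, 20.1, 19.9, 20.0, 20.1, 19.9]

print(classify(degrading, target_avg=8.0, max_slope=0.1))
print(classify(slow_but_flat, target_avg=8.0, max_slope=0.1))
```

The first run would call for a stability project and the second for a capability project against the 8 second target.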
Following the problem definition, specific measures are taken to further characterize the issue. This step can be omitted if the measures are already available from previous benchmarks or production monitoring. All Six Sigma projects are driven by data, which serves as evidence of both the issue statement and the improvement. In Six Sigma projects, including Six Sigma for load testing, it is not enough to generalize the issue or goals with statements such as “the current response time is unacceptable, please improve it”. The Measure phase of the DMAIC load testing project takes only broad measures, which characterize the issue and serve as a reference point for future comparisons (to determine whether the project improvement goals were met). The next phase of DMAIC (Analyze) might take more detailed measurements as the root cause of the issue is identified.
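As a rough sketch of what such a reference point might look like, the snippet below summarizes a set of response times and later checks a new run against the target. The function names and the sample figures are hypothetical; the statistics chosen (mean, standard deviation, 95th percentile) are one reasonable set of broad measures, not a prescribed one.

```python
from statistics import mean, quantiles, stdev


def summarize(samples):
    """Broad summary of response times (seconds) for use as a reference point."""
    return {
        "count": len(samples),
        "mean": mean(samples),
        "stdev": stdev(samples),
        # 95th percentile; "inclusive" keeps the value within the observed range.
        "p95": quantiles(samples, n=100, method="inclusive")[94],
    }


def goal_met(summary, target_avg):
    """True when the summarized average meets the target average."""
    return summary["mean"] <= target_avg


# Hypothetical measurements, before and after an improvement.
baseline = summarize([19.5, 20.5, 21.0, 19.0, 20.0])
improved = summarize([7.5, 8.0, 7.0, 8.5, 7.5])

print(baseline)
print(goal_met(improved, target_avg=8.0))
```

Keeping the baseline summary alongside the project definition makes the later before-and-after comparison a matter of data rather than opinion.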
The Analyze phase of DMAIC seeks to identify the root cause of the issue (defined in the problem statement). For load testing projects it is useful to produce process maps of the transaction being analyzed. These process maps should break the transaction down into physical units of execution (for example the user interface, web server, API code modules, database SQL statements, and the communication paths between these units of execution). The process maps depict the smallest units of work (including communication lines) that can be isolated for analysis. Following the processing breakdown, the time spent in each unit of work, together with the resource usage in each unit, should also be analyzed. This step might involve taking further measures and follows a divide and conquer strategy in which the main root cause of the bottleneck (or lack of stability) is isolated. Note that the Analyze phase is only concerned with identifying the root cause, not with analyzing potential remedies, which is the concern of the next project phase (Improve).
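The time breakdown described above can be sketched as follows. The unit names and timings are made up for illustration, and `time_breakdown` is a hypothetical helper; in a real project the figures would come from your own instrumentation or profiling tools.

```python
def time_breakdown(unit_times):
    """Return (unit, seconds, share of total) tuples, largest first."""
    total = sum(unit_times.values())
    rows = [(unit, secs, secs / total) for unit, secs in unit_times.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical average seconds spent per unit of work for one transaction.
place_order = {
    "user interface": 0.4,
    "web server": 0.6,
    "API code modules": 1.5,
    "database SQL": 16.0,
    "network": 1.5,
}

for unit, secs, share in time_breakdown(place_order):
    print(f"{unit:<18} {secs:6.2f}s {share:6.1%}")
```

Sorting by time spent puts the most likely bottleneck at the top of the list, which is where the divide and conquer analysis would continue.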
Having identified the root cause of the issue (the problem statement in the project definition), the Improve phase seeks to make improvements that address the project goals. This phase will potentially involve building prototypes and performing experiments to verify whether an improvement was indeed achieved. For some projects (for example lack of stability due to a memory leak) the Analyze phase will be the most difficult, as the faulty component(s) must be identified. For other projects, e.g. where the issue is a long running SQL query (which is easily identified in the Analyze phase), the Improve phase will be the more difficult, as alternatives are investigated and tested. In the context of load testing for Six Sigma projects, an organization should build (and maintain) a library of best practices for its given stack. For example, if PHP is being used then, as well as keeping up to date with the latest versions, the load testing architects should continually research countermeasures to PHP performance issues, such as the use of a PHP accelerator. The same applies to databases, web servers and other programming languages. Of all the phases in the load testing DMAIC, the Improve phase (closely followed by Analyze) is where deep knowledge of the subject matter will yield the greatest returns. This is not to diminish the value of (or work involved in) the other phases, but typically the cause of the issue is readily identified while its resolution requires expert knowledge of the organization’s technical architecture.
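One way to check whether an experiment produced a real improvement, rather than run-to-run noise, is to compare the before and after samples statistically. The sketch below computes Welch's t statistic using only the standard library; the sample data is hypothetical, and treating a value well above 2 as a meaningful improvement is a rough rule of thumb rather than a formal significance test.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(before, after):
    """Welch's t statistic; positive when 'after' is faster on average."""
    standard_error = sqrt(
        variance(before) / len(before) + variance(after) / len(after)
    )
    return (mean(before) - mean(after)) / standard_error


# Hypothetical response times in seconds from two load test runs.
before_fix = [20.0, 21.0, 19.0, 20.0, 22.0]
after_fix = [8.0, 9.0, 7.0, 8.0, 9.0]

print(f"t = {welch_t(before_fix, after_fix):.1f}")
```

A statistics package would give a proper p-value; the point here is only that the improvement claim should rest on a comparison of measured samples.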
The Control phase of DMAIC ensures that the improved process remains stable and capable when running in production. For a Six Sigma load testing initiative, the Control phase means continuously monitoring production for performance issues. Control charts for key process performance indicators are well suited to this monitoring task. In addition, the stability and capability of the overall system’s performance should be monitored and subjected to continuous process improvement. Performance regression testing should be performed as a matter of routine, ideally during nightly Continuous Integration (CI) runs alongside automated functional testing. The results of such monitoring (Control) will feed into other DMAIC project definitions (Define), thus forming a continuous improvement cycle inspired by Deming’s classic Plan-Do-Check-Act model.
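A minimal sketch of such a control chart is shown below, assuming an individuals (XmR) chart, where the limits are set at the baseline mean plus or minus 2.66 times the average moving range. The function names and data are hypothetical, and other chart types may suit your indicators better.

```python
from statistics import mean


def control_limits(baseline):
    """Individuals (XmR) chart limits: mean +/- 2.66 * average moving range."""
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    center = mean(baseline)
    spread = 2.66 * mean(moving_ranges)
    return center - spread, center, center + spread


def out_of_control(points, limits):
    """Return (index, value) pairs that fall outside the control limits."""
    lower, _, upper = limits
    return [(i, p) for i, p in enumerate(points) if p < lower or p > upper]


# Hypothetical average response times (seconds) from a stable baseline period.
baseline = [8.0, 8.2, 7.9, 8.1, 8.0, 7.8, 8.2, 8.0]
limits = control_limits(baseline)

# Hypothetical new observations from production or nightly CI runs.
flagged = out_of_control([8.1, 9.5, 8.0], limits)
print(limits)
print(flagged)
```

The same check could act as a simple performance regression gate in a nightly CI run, failing the build when any new point falls outside the limits.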