Mutation testing is a software testing method which involves modifying the system code in small changes, yielding a faulty system, a mutant of the original one. These changes mimic typical errors that a programmer could have made. The goal is to find weaknesses in a test suite and to develop more effective one.
Mutation analysis is typically applied on program code. However, in this post we discuss mutation testing from design model based testing point of view. This approach mutates the design (or system) model and then compares the mutants with the original model to automatically generate test cases and evaluate coverage.
Mutation analysis process induces faults in to the system by creating many different versions of the system each containing one fault. The basic idea is then to see if the test cases can detect these faults i.e. the test cases are used to execute these faulty systems in order to distinguish the faulty system from the original system. The main objective is to select such a set of test cases that can be used to detect errors. Faulty systems are mutants of the original system and a mutant is said to be killed by distinguishing the output of the mutant from that of original system. In other words, a test case that detects a fault injected thru a mutation kills the given mutant and if the test suite kills all the mutants, it potentially detects many unknown faults in the implementation. Mutants that are not killed are said to be alive and if the mutant produces the same output as the original system, they are equivalent and cannot be killed.
Mutation testing and analysis provides a testing criterion rather than a testing process. Testing criterion is a set of rules that impose some requirements on the test cases. We use coverage as a measure of the extent in which the criterion is satisfied; for example reaching a statement is a requirement for statement coverage while killing mutants is the requirement for mutation. Coverage is measured in terms of the requirements and it is defined to be the percent of requirements that are satisfied. Testing criteria gives requirements of how much testing is actually needed and we use mutation testing as a way to measure the quality of the test cases.
Mutants represent likely faults that the programmer could have made by a mistake. This is first of the two hypothesis in which mutation testing is based on and states that most software faults introduced by experienced programmers are due to small syntactic errors. Therefore each mutant introduces a very small change, i.e. a fault, to the system compared to the original one. Mutants are limited to simple changes to the system on the basis of coupling effect–our second hypothesis–meaning that complex faults are coupled with simpler ones in a way that test cases that detect all the simple faults will also detect the complex faults. Mutated versions are created by applying mutation operators that are rules applied to the system to create mutants. These are simple syntactic or semantic transformation rules: deleting an assignment expression and replacing or inserting new operators to create syntactically legal expressions are examples of typical mutation operators.
The mutation score (or mutation adequacy) is calculated once all the test cases have been generated. The mutation score is the ration of dead mutants over total number of non-equivalent mutants:
mutation score = number of killed mutants / total number of non-equivalent mutants
As with any other testing criteria, the goal here is to raise the mutation score to 100% which is indication that all the mutants have been detected and covered by the test cases. This ratio can be increased by adding more test cases that detect live mutants: live mutants reveal inadequacy in the test suite. Equivalent mutants are not counted in the mutation score (Detection of equivalent mutants is one of biggest obstacles for practical usage of mutation testing. This, in general, is undecidable problem requiring trace inclusion check or constraint resolving to find contradiction). It is worth noting that mutation score gives some indications about the extent of the test suite even if a tester manage not to find defects in the implementation using the set of test cases.
Mutation Operators
A mutation operator is simply a syntactic or semantic transformation rule like replacing operator == with !=. The mutants are generated by applying these mutation operators to the original system. Application of these mutation operators should model potential faults that a programmer could have accidentally make, therefore it is of great importance to identify different kinds of faults. Note that ultimately the number and type of mutation operators influences the number and extent of tests that are produced.
Mutation operators are limited to simple changes to the system on the basis of coupling effect. This says that simple faults are coupled with complex faults in a way that test suite that detects simple faults will also detect complex faults. This was first hypothesized in late 70’s and demonstrated theoretically in mid 90’s.
Typical mutation operators are for example replacing Boolean operators in expressions with different syntactically valid Boolean operators, deleting assignment statements, deleting transitions in state chart diagrams, etc.
An application of a mutation operator is probably best explained by an example. Assume that we are modeling a behavior for checking that an integer field of an input message must hold a value that is greater or equal to 0. This we model with a simple if statement as follows:
if (msg.value >= 0) { YES(); } else { NO(); }
Let’s further assume that we have two test cases for testing this functionality, one with value -1 and another with 1. Given the correct logic, we should see NO() being triggered when the system is stimulated with value of -1 and YES() with value 1.
As mentioned earlier, the idea and intuition with mutation testing is that we aim to mimic typical programming errors. The programmer of the application might have for example accidentally typed > instead of >=. Or she could have written ==. The idea now is to find these types of programming errors and make sure that the application is implemented correctly. Now let’s assume that the programmer accidentally typed > instead of >= as follows:
if (msg.value > 0) { YES(); } else { NO(); }
If we run our two tests against this faulty system, we see the same behavior as with the correct logic. That is because the application behaves exactly the same given these two values. It is only when we stimulate the system with value of 0 we see a difference in the behavior. That is, when system is stimulated with 0, the correct logic triggers behavior YES() while the faulty one NO().
Mutations for Test Generator
Mutation analysis is usually applied to actual program code to automatically validate tests and in some rare cases even to generate unit tests and like. However, mutation analysis can be also applied in specification or design model driven automatic test generation. This approach mutates the specification instead of real program code and then applies techniques to automatically design test cases to compare the mutants with the original specification to automatically design and generate tests and to evaluate coverage.
When mutation analysis is applied for design model driven automated test design, it can be used to (1) validate test cases and/or to (2) generate test cases.
Essentially when mutation analysis is used to validate test cases we first generate test cases without any mutants. Then mutation operators are applied to the model in order to generate mutants. Test cases are then executed against on each mutant and if the output of the mutant differs from the original output (the correct one), the mutant is marked as being dead.
When mutation analysis is used to generate test cases, the test generation tool also generates tests that kill the mutants.
After these steps, we need to identify equivalent mutants (i.e. those mutants that cannot be killed due to the fact that they are semantically equal and they produce the same output as the original system) in order to provide a precise and accurate mutation score. This, in general, is undecidable problem requiring trace inclusion check or constraint resolving to find contradiction. Finally, mutation score is calculated and presented.