Test generation from system models is computationally very hard: just generating input sequences that cover all the statements of a system model is an undecidable problem, meaning it can never be solved completely. This does not mean there couldn’t be an algorithm that handles most of the industrially relevant problem instances, but it hints that producing such an algorithm is a challenging engineering problem.
Over the years, one of the single biggest complaints against using a computer to generate tests from a system model is that it simply takes too much time or, even worse, that nothing ever comes out. We’ve had our fair share of these complaints. Rest assured, however, that we take scalability issues very seriously and are constantly investing time in the research and development of more efficient algorithmic approaches to automated test design.
In this post I’m detailing a bit of what happens “under the hood” in the actual test generation core and how Conformiq has evolved our flagship product, Conformiq Designer, to meet the needs of the industry.
The core of Conformiq Designer is a semantics driven, symbolic execution based test generation algorithm. The algorithm traverses a part of the (usually infinite) state space of the system model. The explored part is itself infinite, yet still only a part of the whole state space. Conformiq Designer searches this part of the state space for testing goals called “checkpoints”. The number of checkpoints in a model depends on the testing heuristics selected by the user. For example, selecting “transition coverage” causes each state chart transition (an arrow) to become a checkpoint. The tool then constructs tests from the explored part of the state space by selecting paths that lead to checkpoints and converting those paths to tests. Every input on an execution path (to the system model) becomes a test stimulus (to be sent to the real system), and every output on an execution path becomes an expected output (to be verified during test execution).
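To make the path-to-test conversion concrete, here is a minimal sketch in Python. The `Event` and `path_to_test` names are illustrative assumptions of mine, not Conformiq Designer’s actual internals: an execution path is a sequence of input and output events, and the conversion simply maps model inputs to stimuli and model outputs to expected responses.

```python
from dataclasses import dataclass

@dataclass
class Event:
    direction: str   # "in" = input to the model, "out" = output from the model
    message: str

def path_to_test(path):
    """Convert one explored execution path into a concrete test case:
    model inputs become stimuli sent to the real system, model outputs
    become expected responses verified during test execution."""
    steps = []
    for event in path:
        if event.direction == "in":
            steps.append(("send", event.message))    # test stimulus
        else:
            steps.append(("expect", event.message))  # expected output
    return steps

# A path that reaches some checkpoint, e.g. a covered transition:
path = [Event("in", "Connect"), Event("out", "Ack"), Event("in", "Data")]
print(path_to_test(path))
# [('send', 'Connect'), ('expect', 'Ack'), ('send', 'Data')]
```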
Now, because system models are “open” (meaning they communicate with an unspecified environment — with the system model driven approach, the user doesn’t need to describe the environment of the system or the tester), the tool needs to generate inputs to the model so that it can be explored. One could generate the inputs “randomly”, as some random testers do, but it is easy to demonstrate that this approach does not scale. This is why Conformiq Designer uses an algorithmic technique called constraint solving to handle the unspecified inputs efficiently. It is also why the approach is called symbolic execution: input messages are represented internally as symbolic values that are fixed only later, by constraint solving.
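The idea can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not the tool’s algorithm: a symbolic input is left unfixed while one branch of the model is followed, each branch condition becomes a recorded constraint, and a (deliberately naive) solver later picks a concrete value satisfying all of them. Real tools use dedicated constraint or SMT solvers rather than brute-force enumeration.

```python
def explore_branch():
    """Symbolically follow one branch of a hypothetical model containing
    'if x > 10 and x % 2 == 0: reach_checkpoint()'. Instead of fixing x,
    record the branch conditions as constraints on the symbolic value."""
    constraints = []
    constraints.append(lambda x: x > 10)
    constraints.append(lambda x: x % 2 == 0)
    return constraints

def solve(constraints, domain=range(0, 1000)):
    """Naive constraint solver: brute-force search over a small domain.
    This is exactly the kind of enumeration a real solver avoids."""
    for candidate in domain:
        if all(c(candidate) for c in constraints):
            return candidate
    return None  # no satisfying input: the branch is infeasible

value = solve(explore_branch())
print(value)  # 12 — the first value that is both > 10 and even
```

A random tester, by contrast, would have to stumble on a value satisfying both conditions by chance; as conditions accumulate along a path, the probability of that shrinks rapidly, which is why the random approach does not scale.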
Now, the flip side of the coin is that algorithms based on state space enumeration are all computationally difficult and typically suffer from a problem known as state space explosion: the state or search space becomes astronomically large compared to the size of the program itself. For example, a simple program that just increments a 32-bit counter sequentially has a state space of about 4.3 billion states. A symbolic state space exploration procedure, like the one in Conformiq Designer, alleviates part of that problem but does not solve it completely. Even worse, the actual state space of a Conformiq Designer system model is typically infinite, although in practice it does not matter whether it is mathematically truly infinite or only extremely large.
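The counter example and the benefit of symbolic exploration can be illustrated directly. This sketch is mine, not a description of Conformiq Designer’s internals: an explicit enumerator would have to visit every counter value one by one, while a symbolic representation can stand for all of them with a single variable plus a constraint.

```python
# Explicit enumeration: one state per counter value.
explicit_states = 2 ** 32
print(explicit_states)  # 4294967296 — roughly 4.3 billion states

# Symbolic representation: one "state" covering all of them,
# expressed as a variable together with a range constraint.
symbolic_state = ("counter", "0 <= counter <= 2**32 - 1")
print(symbolic_state)
```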
Conformiq Designer’s algorithm selects which paths to expand and to what extent; it has been carefully tuned over the last decade, and its details are a trade secret. The algorithm has also been carefully crafted into not only a multi-threaded but a fully distributed, parallel test generation algorithm that deterministically generates the same set of test cases regardless of the number of processor cores, their speed, and their load. It parallelizes relatively efficiently, cutting test generation time by up to 90% when scaling from one to sixteen processor cores, and even more when deployed on a computation cluster with tens or even hundreds of cores. This is important because it improves the productivity of model driven test engineers by cutting down the wait time before they see their newly generated tests.
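One common way to achieve that kind of determinism — an illustrative sketch under my own assumptions, not Conformiq’s actual scheme — is to give every work item a stable identity, let any number of workers process items in any order, and merge the results by that identity rather than by completion time:

```python
from concurrent.futures import ThreadPoolExecutor

def generate_test(checkpoint_id):
    # Stand-in for exploring a path that reaches one checkpoint.
    return (checkpoint_id, f"test_for_checkpoint_{checkpoint_id}")

def generate_all(checkpoints, workers):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(generate_test, checkpoints))
    # Sorting by the stable checkpoint id makes the final test suite
    # independent of worker count, scheduling order, and machine load.
    return [test for _, test in sorted(results)]

checkpoints = [3, 1, 4, 5, 9, 2, 6]
assert generate_all(checkpoints, workers=1) == generate_all(checkpoints, workers=8)
```

The design choice here is that determinism comes from the merge step, not from the execution order, so the same test suite comes out whether the run uses one core or a cluster.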
Every now and then we still run into models that cause a lot of headaches for our technology, but not that often. As discussed above, the algorithmic complexity is such that we will never have an approach that solves all of these problems, but from what we see right now, we have come a long way in making the core strong enough to handle the majority of real, practical industrial problems — which I think is quite an achievement.