January 24, 2020

The Art of Embedded System Testing

In my life I’ve worked on many software systems, including cloud and enterprise-level data management, but my favorites have been embedded systems. Such systems present lots of cool problems to solve, but perhaps the most intriguing are those related to testing them. This post deals with the art form that is embedded system testing.

Testing is often mistaken for a science

I once heard a comedian say, “The easiest person to jive is yourself,” and it is one of the grains of truth I have sifted out over my life. Developers often jive themselves into believing that following a few simple recipes is all it takes to get to well-tested code. These include:

  • Unit test everything you do
  • Do continuous integration testing
  • Achieve 100% test coverage

The fact is that none of these, in and of themselves or in aggregate, will achieve a completely tested system. In fact, a completely tested system is unachievable for any moderately complex piece of software.

To understand why these things are insufficient, it is necessary to understand the actual problem being faced. Early in the development of computer science, academics postulated and proved results such as the Halting Problem and Rice’s Theorem, which show that non-trivial properties of a program’s behavior cannot, in general, be decided; a moderately complex system cannot be proven to be without defect. No matter how much testing is applied to a system, there will always be a finite probability of undiscovered bugs within it.

The problem comes down to the number of states. Each new variable adds a new dimension of states to the system, with a range determined by its capacity. Each new decision block creates new transition pathways through those states. Taken in aggregate, the number of possible state transitions in a system can easily exceed the number of fundamental particles in the universe. Exercising the entire set of possible transitions is well past the limits of mortality.
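To put rough, purely illustrative numbers on it: even a handful of 32-bit variables yields more value combinations than the commonly quoted estimate of about 1e80 fundamental particles, and that is before counting the transition pathways between those values.

    variables = 9                         # nine independent 32-bit variables
    states_per_variable = 2 ** 32         # each can take ~4.3 billion values
    total_states = states_per_variable ** variables
    print(f"{total_states:.1e}")          # ~5.0e86, already past the ~1e80 particle estimate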

Within this set of state transitions, many types of bugs can exist. There are common logical bugs, where the correct logic is simply not encoded. In embedded systems especially, there are also asynchronous bugs that arise as the timing between threads and events varies. Moreover, since embedded systems usually run continuously, resource exhaustion issues, such as running out of memory or thread handles, also arise.

To try to isolate such defects, many forms of testing have been created, such as:

  • Unit testing
  • Integration testing
  • Black box testing
  • White box testing
  • Coverage testing
  • ...

Each of these covers a different range of the state transition matrix, but all fail to cover the entire set of possible defects, given the vast size of that matrix.

Even coverage testing, which on its surface seeks to cover every line of code, ultimately cannot cover every line in every possible state. Passing through a specific line of code may only expose a bug in a small set of states, yet most coverage testing only attempts to reach each line once.
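To make that concrete, here is a contrived sketch (the function and its defect are invented for illustration) where a test suite reaches 100% line and branch coverage and still never visits the one state that triggers the bug.

    def average_rate(total_bytes, elapsed_seconds):
        if elapsed_seconds < 0:
            raise ValueError("negative duration")
        return total_bytes / elapsed_seconds    # defect: elapsed_seconds == 0 still divides by zero

    # These two checks execute every line and both branches, so coverage reports 100%,
    # yet the zero-duration state that would crash in the field is never exercised.
    assert average_rate(1000, 2) == 500
    try:
        average_rate(1000, -1)
    except ValueError:
        pass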

Thus it is my assertion that testing should not be reduced to a strict science with rules and formulas to follow. A good test suite is more a work of art, where there are good and bad practices but no ultimate standard. In the end, you can find beauty more easily than you can find truth.

Economics of testing

To this point we have established that it is not possible to test every state of the system. However, it is clearly important to test some set of those states. To understand where scarce resources should be applied, economics is a useful tool to bring in.

There are finite but varied costs that can be associated with any particular defect in a system. Some can rise to the level of very costly lawsuits, while others present themselves as mere cosmetic blemishes. Still, each has some cost that can be associated with it. It is good practice to understand which potential bugs would be the most costly and to bring specific focus to those.

At the same time, there is a cost to testing for any particular type of bug within the system. This can vary from simple developer hours to the cost of developing tools and hardware. In general, there is a relationship between hours spent testing and the number of defects that can be asserted not to exist. Take care not to spend more going after a bug than the bug may ultimately cost.

Running over the top of all of this are the costs associated with delaying the release of new software. These include both the costs of continued test and development and the opportunity costs of a delay to market.

There, between the pinnacle of our aspirations and the pit of our means, we find ourselves in a perpetual search for the line of “good enough”. There is no magical method to find all defects, and no general rule for where or when to quit... Each project must come to a resolution that is unique to its particular circumstances. This too speaks to testing being more of an art.

Valuing human resources

The most valuable test resources an organization has are human. These include not only developers, but QA and Field Engineers as well. Each brings unique perspectives to the test solution, and each should be applied in ways that maximize their ability to contribute to the art.

Developers should not only focus on testing their own code but also be involved in making it easier for Field Engineers to collect critical diagnostic information, and in limiting the amount of rote, repetitive time that QA spends retesting releases.

While it is very important to have QA resources evaluating each release, it is equally important not to fall into using them as automated test surrogates. If a test can be performed as a rote test, it should (for the most part) be included in an automated regression suite. These are the kinds of things that CI tools excel at doing. QA Engineers should spend more of their time targeting what has not been tested. When defects are found, they should also be in the loop to confirm that the tests written for those defects are sufficient to catch them going forward. After that, rely on the automated tests to assert those issues and get back to testing what has not been tested. These are the things that human-directed testing excels at doing.

Orchestration is important

All of the testing methods described so far are good and should be brought together in an orchestration based on the “good enough” line for a given project. No single strategy should be pursued at the expense or exclusion of any other, because no single strategy will find all bugs, and any one method can consume all test resources and still not be complete.

Don’t be a purist. Many a time I have asked colleagues or forums how I could build this or that feature into a unit test and received a scolding for not doing unit testing correctly, that what I wanted to do was not proper for unit tests... John Madden (the famous American football coach) was once asked by his players whether it was better to win pretty or win ugly... His answer was “Just win, baby!” That’s my philosophy on testing: “Just test, baby!” Pretty, proper, ugly... it doesn’t matter. If it asserts whether a feature of the code is working correctly, it’s beautiful in its own way.

The most important things to bear in mind when testing are:

Test first! Also known as test-driven development. Before doing any coding, begin with a test that shows the feature doesn’t currently work, then fix the code until the test passes. This builds up a framework of testing that asserts the code works. It is easier to build a well-thought-out test regime if it’s done every day and every time something is coded. If you wait until the end to build tests, time becomes the adversary of good testing.
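As a minimal sketch of that loop (the checksum feature here is hypothetical): the test below is written first and fails, and the implementation is then filled in until it passes.

    import unittest

    def checksum(payload: bytes) -> int:
        return sum(payload) & 0xFF              # written only after the test below was failing

    class TestChecksum(unittest.TestCase):
        def test_checksum_wraps_at_one_byte(self):
            self.assertEqual(checksum(b"\xff\x02"), 0x01)

    if __name__ == "__main__":
        unittest.main()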

Don't repeat the same mistakes! When fixing bugs, first code a test that demonstrates the bug you’re fixing, then fix the bug so the test passes... This makes sure that, going forward, the same mistake will not be repeated.

Test repeatedly! Also known as continuous integration testing. Use CI tools to ensure that every time you commit new code to source control, all applicable previous tests for that module are rerun, asserting that regression (at least against the current suite of tests) has not occurred. Rely on this to catch old bugs instead of having QA retest for them on every release.

Keep on testing! When curiosity or conversation takes you to an “I wonder what would happen if” moment, write a test and find out... Many a strange and subtle bug has been discovered this way.

Design for test! As much as possible, design systems so that modules connect to one another via well-designed interfaces. Those interfaces should be easy to replace with stubs, so that each module can be tested against convenient simulations of what the code would see in the actual runtime environment.
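One way this can look in practice (the serial-port interface and frame format below are invented for illustration) is a module that only ever talks to a narrow interface, so a stub can replay canned data in place of real hardware.

    class SerialPort:
        """Narrow interface the module depends on; the real one talks to hardware."""
        def read_frame(self) -> bytes:
            raise NotImplementedError

    class StubSerialPort(SerialPort):
        """Test stub that replays canned frames instead of touching hardware."""
        def __init__(self, frames):
            self._frames = list(frames)
        def read_frame(self) -> bytes:
            return self._frames.pop(0)

    def count_sync_frames(port: SerialPort) -> int:
        count = 0
        while True:
            frame = port.read_frame()
            if not frame:                        # empty frame marks end of input
                return count
            if frame.startswith(b"\x7e"):        # 0x7E chosen arbitrarily as a sync byte
                count += 1

    # The module under test never knows it is talking to a stub.
    stub = StubSerialPort([b"\x7e\x01", b"\x00\x02", b"\x7e\x03", b""])
    assert count_sync_frames(stub) == 2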

Record field conditions! In our systems, in addition to standard logging, we also log every event that is sent out of or comes into our system via one of its IO interfaces. This data is always returned to us with any field support ticket. When fixing the associated bugs, we often take these recordings of IO events seen in the field and play them back as part of the test we create for the fix. This not only makes writing tests easier, it also makes the tests track real-world conditions much more closely.
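What that replay might look like in a test (the JSON log format and the frame counter below are assumptions for illustration, not our actual tooling): parse the recorded events and feed the inbound payloads through the same decoding path the device uses.

    import io
    import json

    class FrameCounter:
        """Stand-in for the real protocol decoder."""
        def __init__(self):
            self.frames = 0
        def feed(self, payload: bytes):
            self.frames += payload.count(b"\x7e")     # count sync bytes as a proxy for frames

    def replay(log_file, decoder):
        # Assumed log format: a JSON list of {"dir": "in"|"out", "payload": hex string}
        for event in json.load(log_file):
            if event["dir"] == "in":
                decoder.feed(bytes.fromhex(event["payload"]))
        return decoder

    recorded = io.StringIO('[{"dir": "in", "payload": "7e017e02"}, {"dir": "out", "payload": "ff"}]')
    assert replay(recorded, FrameCounter()).frames == 2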

Test platforms

Our team’s code base is a multi-module collection of C and Python 3 code, so we write most of our tests using a home-grown C unit testing framework and a Python unit testing framework that often calls our C code through the ctypes library (a small ctypes sketch appears after the list below). It is important to note that we use unit test frameworks not only to do classical unit testing on small modules but also to do integration testing of module interactions. The unit test frameworks just provide a means of organizing and launching a collection of tests. We divide our testing into a suite of overlapping and independent testing platforms as follows:

Developer test. All tests in the suite need to be runnable by developers, so it is a good idea to begin with being able to run them on developer machines. In our case this is OS X, and since we ultimately compile our C code for an ARM Linux based target, we maintain a small set of definitions that tell us how to emulate the various differences when running on the developer system. I like to think that this level of cross-OS testing also makes our tests more likely to discover various types of subtle system bugs.

CircleCI. We also use GitHub for our source code repositories, so CircleCI is a natural fit for running our test frameworks as part of continuous integration testing. This also gives us yet a third platform to compile and test on, since Ubuntu 16 is the standard there.

Synthesized Environment. Some features of our system, such as IO processing and overlapping events, just don’t lend themselves to testing within CircleCI. For these we have created a special harness that lets us connect our hardware back to back, so that one unit serves as an IO simulator while the other functions as the Unit Under Test (UUT), with both driven concurrently via SSH. We use the field-recorded scripts to generate the conditional sequences we expect to see in the field and play them back to verify that everything is correctly decoded by the UUT. We can also use this system to run long-duration tests of the IO processors, looking for resource contention and rare asynchronous issues. This system is connected to a dedicated local PC that detects commits to the GitHub repositories and runs the tests as part of the CI process each time new code is committed.
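As promised above, here is a rough sketch of what calling our C code from the Python test suite via ctypes can look like; the library name, the crc16() signature, and the expected value are assumptions for illustration, not our actual code.

    import ctypes
    import unittest

    # Hypothetical shared library built from the C modules under test.
    lib = ctypes.CDLL("./libprotocol.so")
    lib.crc16.argtypes = [ctypes.c_char_p, ctypes.c_size_t]
    lib.crc16.restype = ctypes.c_uint16

    class TestCrc16(unittest.TestCase):
        def test_known_vector(self):
            payload = b"123456789"
            # 0x29B1 is the published CRC-16/CCITT-FALSE check value for "123456789"
            self.assertEqual(lib.crc16(payload, len(payload)), 0x29B1)

    if __name__ == "__main__":
        unittest.main()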

Conclusion

These are just notes and anecdotes. There is no roadmap to good testing; it's a journey, not a destination, one that is filled with successes and failures. Be bold in pursuit of the pinnacle you aspire to, pragmatic in contemplation of your means, and contemplative in finding that line of acceptability... Just Test Baby!