The complexity of the software employed in many medical devices means that ensuring their safety requires complementing testing with a combination of other techniques such as design validation, implementation validation, and remaining-fault estimation.
Device manufacturers are morally, legally, and financially bound to ensure that their products do no harm. But despite the enormous efforts manufacturers invest in validating the safety of their devices, failures continue to appear. For example, the FDA reports that between 1990 and 2000, 200,000 pacemakers were recalled in the United States because of software, and that between 1985 and 2005 there were 30,000 deaths and 600,000 injuries from medical devices, of which 8% were attributable to software.1
Some of the best-known safety-related standards address functional safety—safety that relies on the continued operation of a (software) system to ensure that persons, property, and the environment are kept free from unacceptable risk or harm. IEC 61508 (electrical, electronic, programmable), ISO 26262 (automotive), and the CENELEC EN 5012x series (rail transportation) all deal with functional safety.
In contrast, IEC 62304, which is becoming the de facto global standard for medical device software life cycle processes, does not address functional safety. Instead, it addresses the “framework of life cycle processes with activities and tasks necessary for the safe design and maintenance of medical device software” and, through ISO 14971, the risk management associated with those processes.2
Because IEC 62304 doesn’t address functional safety, it doesn’t define numerical values for acceptable failure rates. Conformity to IEC 62304 therefore doesn’t imply a safety integrity level (SIL), as conformity to IEC 61508 does; a claim of conformity to IEC 61508 is meaningless without a SIL.
Although IEC 62304 sets out the processes required to produce a compliant device, it is not clear how the quality of those processes relates to the quality of the device produced. A study produced by the National Research Council in 2007 found:
… Moreover, there is sometimes an implicit assumption that adhering to particular process strictures guarantees certain levels of dependability. The committee regards claims of extraordinary dependability that are sometimes made on this basis for the most critical of systems as unsubstantiated, and perhaps irresponsible.1
Martyn Thomas, one of the authors of the National Research Council report, explained the role of process in producing a safe product:
What makes good processes essential is the confidence that they provide in their outputs: [W]ithout well-defined testing processes you cannot interpret the results from testing, without strong version control you cannot have confidence that the product that was tested is the same as the one that was inspected...So strong, auditable processes are necessary if we are to have confidence in the product, but the fact that a process has been followed in developing a product does not tell you anything about the properties of the product itself.3
This statement indicates that complying with the development processes described in IEC 62304 does not exonerate us from also performing the necessary analysis to ensure that our product is safe. The question is: What analysis can we perform?
In theory, any software system is deterministic. The number of states it may assume and the number of possible transitions between those states are finite. In practice, however, the increasing use of multithreaded code and the decreasing cost of multicore processors have led to a rapid increase in the number of states that a software system can assume. These systems are now so complex that they can only be treated as being nondeterministic. We cannot know or predict all their possible states and state changes.
This fact renders exhaustive testing impossible, and so, of necessity, testing has become a statistical activity. Additionally, the large number of states and trajectories through that state space means that most of the bugs that remain in the code after module testing are not Bohrbugs, which can be reproduced, but are so-called Heisenbugs, which occur because of subtle race conditions between threads. Heisenbugs are typically not reproducible because the tester can determine neither the precise (multidimensional) state that lay at the start of the error, nor the trajectory from that state that led to the failure. Even if these states and trajectories could be determined, reproducing them in a controlled environment would be impossible.
In Medical device software—Part 1: Guidance on the application of ISO 14971 to medical device software, AAMI cautions that a pitfall to avoid is “[d]epending on testing as a risk control measure.”4
A system’s dependability is its ability to respond correctly to events in a timely manner, for as long as required. That is, it is a combination of the system’s availability (how often the system responds to requests in a timely manner) and its reliability (how often these responses are correct).
Safety is inseparable from dependability (whether reliability or availability is more important for safety depends on the nature of the system), and increased dependability means increased development and product costs. Therefore it is essential to develop a system that is just sufficiently dependable. Developers must explicitly state their claimed dependability, use the necessary tools to design to a minimum level of acceptability, and produce evidence to demonstrate the claimed dependability in the final product. These steps require significant expertise on the part of system architects, designers, and implementers:
Building software is hard; building dependable software is harder...the requirement to produce evidence is highly demanding and likely to stretch today’s best practices to their limit. It will therefore be essential that developers are familiar with best practices and deviate from them only for good reason.5
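As a rough illustration of how availability and reliability combine, consider the following calculation. The numbers are invented for the example, and treating the two figures as independent is an assumption made for the sake of the arithmetic, not a claim from this article:

```python
# Illustrative figures only (assumptions, not from the article).
availability = 0.999    # fraction of requests answered in a timely manner
reliability = 0.9999    # fraction of timely responses that are correct

# Assuming independence, the probability that any given request
# receives a timely AND correct response is the product of the two:
dependable = availability * reliability
print(dependable)       # ≈ 0.9989
```

A system architect would work this logic in reverse: starting from the dependability the device must claim, derive the availability and reliability budgets each component must meet.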
To build a safe system, developers must begin with the premise that all software contains faults, and these faults may lead to failures.
Failures are the result of a chain of circumstances that start with a fault introduced into a design or implementation. Faults may (but often do not) lead to errors, and errors may (but often do not) lead to failures (see Table I).
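The chain can be sketched in code. This example (the function and its inputs are hypothetical, invented for illustration) contains a fault that lies dormant for most inputs, produces an error only for some, and produces a visible failure only when the erroneous value reaches the caller:

```python
def max_reading(readings):
    """Return the largest of a list of sensor readings.

    The fault: the scan starts at index 1, silently ignoring readings[0].
    """
    best = readings[1]              # fault: should be readings[0]
    for r in readings[1:]:
        if r > best:
            best = r
    return best

# Usually the fault causes no error: the maximum rarely sits in slot 0.
print(max_reading([3, 7, 5]))       # → 7: fault present, but no error
# An error (a wrong internal value) arises only when readings[0] is the
# maximum, and it becomes a failure only when that value is acted upon.
print(max_reading([9, 7, 5]))       # → 7: the error is now a visible failure
```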
Working from the premise that all faults may lead to failures, developers must include multiple lines of defense when building a safe system:
Testing. Testing is designed to detect faults in the design by noting the errors and failures that those faults can cause. For the reasons previously discussed, testing is of primary importance in detecting Bohrbugs, which are solid, reproducible bugs that remain unchanged even when a debugger is applied. Figure 1 illustrates how, in a waterfall development, the testing levels correspond to the architecture and design levels. As one moves up the right-hand side of the V, testing becomes less complete and more statistical. With Heisenbugs, the tester may conclude that a failure has been detected but find it impossible to identify the associated fault; in general, the failure may appear long after the error occurs.
Design Validation. While testing can increase confidence in a device, it cannot provide convincing evidence that the system is fault free. Formal design validation can provide this evidence for some components.
To illustrate the difference between testing and design validation, consider the simple, two-threaded program shown in Figure 2.6 Here, two threads increment the global variable x. Reading the code, it appears that, depending on how the threads interleave during their execution, x will have a value between 10 and 20 at the end of the program. The author has tested this program 10,000 times and, as expected, each time a value between 10 and 20 has resulted. For many years, this example was used as a class exercise—until an error was discovered. In fact, x can end up with values as small as 2. Now the same example is used to demonstrate the human inability to understand the complexity of multithreaded code and the inability of testing (even 10,000 tests) to detect unlikely but possible cases.
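Figure 2 itself is not reproduced here, but a program of the same shape can be sketched in Python. This is an approximation of the example, not a copy of it; the explicit temporary variable and the yield simply make the non-atomic read-modify-write visible:

```python
import threading
import time

x = 0                   # shared variable, incremented by both threads
ITERATIONS = 10

def worker():
    global x
    for _ in range(ITERATIONS):
        temp = x        # read the shared variable...
        time.sleep(0)   # ...yield, inviting the scheduler to interleave...
        x = temp + 1    # ...write back: may overwrite the other thread's update

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(x)   # anywhere from 2 to 20, depending on the interleaving
```

Running this repeatedly usually yields values near 20, occasionally fewer; the pathological low values exist but are precisely the cases random testing almost never produces.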
While testing cannot find a fault in this program (which, incidentally, contains more than 77,000 states), formal design validation using a technique such as linear temporal logic (LTL) model checking can immediately find a sequence of 90 steps that leads to x finishing with a value of 2.
Formal design validation tools such as Spin can provide counter-examples of this type or proofs of correctness of the algorithm and protocol designs. They can be used retrospectively or during the design phase, as indicated in Figure 1.
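To convey why exhaustive state exploration succeeds where testing fails, the interleavings of the two-thread increment can be enumerated by brute force. This sketch is not Spin (which works from a Promela model and LTL properties); it simply visits every reachable state, using two increments per thread instead of ten so the state space stays tiny:

```python
# Each iteration of a thread is an atomic load (r = x) followed by an
# atomic store (x = r + 1). A thread's progress is counted in such steps.
N = 2                    # increments per thread (kept small for enumeration)
STEPS = 2 * N

def step(k, r, x):
    """Advance one thread by one atomic step; return (k', r', x')."""
    if k % 2 == 0:               # even step: load the shared variable into r
        return k + 1, x, x
    return k + 1, r, r + 1       # odd step: store r + 1 back into x

def explore():
    """Depth-first search over every interleaving of the two threads."""
    finals, seen = set(), set()
    stack = [(0, 0, 0, 0, 0)]    # state = (k1, r1, k2, r2, x)
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        k1, r1, k2, r2, x = state
        if k1 == STEPS and k2 == STEPS:
            finals.add(x)        # both threads finished: record final x
            continue
        if k1 < STEPS:           # schedule thread 1 for the next step
            nk, nr, nx = step(k1, r1, x)
            stack.append((nk, nr, k2, r2, nx))
        if k2 < STEPS:           # schedule thread 2 for the next step
            nk, nr, nx = step(k2, r2, x)
            stack.append((k1, r1, nk, nr, nx))
    return finals

finals = explore()
print(sorted(finals))   # → [2, 3, 4]
```

Even at this reduced scale, the enumeration finds the counterintuitive minimum of 2; naive reading of the code suggests the result must lie between N and 2N.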
Implementation Validation. If we are convinced that our design is fault-free, we must turn our attention to the implementation, the code itself, and produce evidence that it is correctly implemented.
This validation is normally achieved through invariants and contracts included in the code, through deep static analysis or through the symbolic execution tools that lie on the borderline between static and dynamic analysis. Symbolic execution is particularly powerful, though it demands significant computing power. While dynamic analysis (testing) can follow only a single trajectory through a system’s state space at a time, symbolic analysis can follow all possible trajectories simultaneously, producing test cases and seeking error behavior.
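The flavor of symbolic execution can be conveyed with a toy sketch. The function, the hand-written path conditions, and the brute-force "solver" below are all invented for illustration; real tools derive the path conditions automatically from the code and discharge them with an SMT solver rather than a search:

```python
def device_check(n):
    """Hypothetical function under analysis: one narrow input range is buggy."""
    if n > 100:
        if n < 104:
            raise RuntimeError("dose out of range")   # the buggy path
        return "high"
    return "normal"

# One path condition per path through device_check, written out by hand.
# A symbolic executor would accumulate these while exploring the branches.
paths = {
    "normal": [lambda n: not (n > 100)],
    "high":   [lambda n: n > 100, lambda n: not (n < 104)],
    "error":  [lambda n: n > 100, lambda n: n < 104],
}

def solve(conds, domain=range(-1000, 1000)):
    """Stand-in for an SMT solver: find any input satisfying all conditions."""
    return next(n for n in domain if all(c(n) for c in conds))

# One concrete, generated test case per path, including the error path
# that random testing over a large input space would be unlikely to hit.
test_cases = {name: solve(conds) for name, conds in paths.items()}
print(test_cases)
```

The point of the sketch is the shape of the result: the analysis covers all three paths simultaneously and emits a witness input for each, including the rare failing one.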
Remaining Fault Estimation. While many functions have been proposed for estimating the number of faults remaining in a system from its failure history, these functions are difficult to apply to any particular system. Fault injection is a statistical technique for determining, at any point in the testing cycle, the number of significant faults remaining in the system.
With fault injection, we deliberately introduce bugs that share the characteristics of the possible remaining (unknown) bugs. The proportion of these injected bugs that testing subsequently detects provides a statistical estimate of the proportion of real faults detected, and hence of the number still remaining.
The chief difficulty with fault injection is creating artificial faults with the same characteristics as the unknown faults. As with random tests, the results of fault injection require careful statistical analysis.
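One common form of the arithmetic, the capture-recapture estimate sometimes called bebugging, can be sketched as follows (the formula is standard practice, assumed here rather than prescribed by this article, and it rests on the assumption just discussed: that seeded faults are as findable as real ones):

```python
def estimate_remaining(seeded, seeded_found, real_found):
    """Estimate the number of real faults still in the system.

    If the seeded faults resemble the unknown real ones, the fraction of
    seeded faults that testing detected estimates the fraction of real
    faults detected, from which the remaining count follows.
    """
    if seeded_found == 0:
        raise ValueError("no seeded faults found; cannot estimate")
    detection_rate = seeded_found / seeded            # e.g. 50/100 -> 0.5
    estimated_real_total = real_found / detection_rate
    return estimated_real_total - real_found          # estimated still hidden

# Seed 100 faults; testing finds 50 of them plus 10 real faults:
print(estimate_remaining(seeded=100, seeded_found=50, real_found=10))  # → 10.0
```

The estimate is only as good as the seeding: if the injected faults are easier to find than the real ones, the detection rate is inflated and the remaining-fault count is underestimated.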
Concerns about commercial off-the-shelf (COTS) software in medical devices are often based on the idea that COTS implies software of unknown provenance (SOUP). It is worth noting, however, that IEC 62304 assumes that SOUP will be used in medical device software.2 If access to the source code, fault histories, and in-use histories for the COTS or SOUP components is available, using such components may be the best solution for many medical projects.
Developers must, of course, look carefully at what the COTS vendor presents to support its product’s dependability claims: the claims themselves, processes, expertise, design validation, statistical and fault tree analyses, proven-in-use data, design artifacts, and safety manual. Developers must ask: What proof does the software vendor provide that its product supports the safety requirements for this device? And in addition to the standard questions about functionality, features, cost, support, and so on: Will this COTS software support our getting approval for our medical device?
The complexity of the software employed in many medical devices has rendered traditional methods (testing) inadequate for demonstrating their safety. Multithreaded code (particularly when executed on a multicore processor) has made Heisenbugs the predominant cause of failure in the field, and, because testing is ineffective at isolating Heisenbugs, developers must complement testing of software systems with a combination of other techniques to produce the necessary evidence of the safety of the system.
The use of a particular design and implementation process does not guarantee a safe product and does not absolve the developer from additionally using techniques such as formal proof of key algorithms or protocols, deep static analysis of code, programming by contract, and symbolic execution to demonstrate the validity of the architecture, design, and implementation.
1. D Jackson et al., eds., Software for Dependable Systems: Sufficient Evidence? (Washington, DC: National Academies Press, 2007), 23.
2. IEC 62304:2006, “Medical Device Software—Software Lifecycle Processes,” (Geneva: International Electrotechnical Commission, 2006).
3. M Thomas, “Engineering Judgement,” in Proceedings of the 9th Australian Workshop on Safety Critical Systems and Software (Brisbane, Australia: Australian Safety Critical Systems Association, 2004).
4. ANSI/AAMI/IEC TIR80002–1:2009, “Medical Device Software—Part 1: Guidance on the Application of ISO 14971 to Medical Device Software,” (Arlington, VA: AAMI, 2009), 55.
5. National Research Council, op. cit.
6. M Ben-Ari, Principles of the Spin Model Checker (New York: Springer, 2008).
Chris Hobbs is senior developer of safe systems at QNX Software Systems Limited (Ottawa, ON, Canada).