Feature

Incorporating Static Analysis into the Development Process


Advanced static analysis tools are designed to enable successful product development. OEMs should learn how these tools work, what they are capable of doing, and how to integrate them into a process.

Static analysis is a technique used to explore a given software system without actually executing the software.1 Several static analysis tools are available, and each meets slightly different needs. Regardless of the tool chosen, adopting static analysis is a quality-improvement initiative, and it requires a deployment plan designed to ensure a successful rollout. This article describes best practices for adopting a static analysis tool and successfully incorporating it into the development process.

 

Background

 

Static analysis tools have been used for decades to improve software quality. However, the first-generation tools (e.g., the Lint family) were of limited effectiveness: they were not good at finding the important defects and had various usability issues, including very high false-positive rates. The latest generation of tools, most often referred to as advanced static analysis tools, employs sophisticated techniques to find bugs in a reasonable amount of time with a low rate of false positives. These tools typically share the following features:

 

• They find serious bugs such as buffer overruns, race conditions, resource leaks, and null pointer exceptions (a concrete example follows this list).

 • They identify suspicious artifacts such as unreachable code, contradictory assumptions, and inconsistent treatment of return values. These often correlate with defects because they indicate programmer confusion.

 • They report helpful information about each error: the point at which it occurs, the path taken through the program to that point, and important values and conditions along the path.

 • They determine which source files to analyze by integrating closely with the regular build system.

 • They provide warnings written to a centralized database.

 • They have a user interface that allows the user to attach annotations to each warning, such as whether the warning is a true or false positive, or a freeform note. Warnings can be assigned to programmers for further investigation or correction. These annotations persist through subsequent analyses, so if, for example, a warning is dismissed as unimportant, it is not reported again.

 • They provide analysis that is highly configurable by an end-user, including allowing additional checks to be written.
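
To make the first item concrete, consider the following hypothetical C fragment (it is not taken from the article or its figure). The defect is path-sensitive: the overrun occurs only when len equals the buffer size, which is exactly the kind of bug that first-generation tools tended to miss but that an advanced tool reports together with the triggering path.

    #include <string.h>

    void copy_name(const char *src, size_t len)
    {
        char buf[32];
        if (len <= sizeof(buf)) {     /* off-by-one: the test should be '<' */
            memcpy(buf, src, len);
            buf[len] = '\0';          /* overruns buf when len == 32 */
        }
    }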

 

The best practices described in this article apply specifically to advanced static analysis tools.

 

Where Static Analysis Fits In

 

The author assumes that a fairly standard software development environment is in place—programmers interact with a source-code management repository, and a centralized build system compiles and builds executable artifacts before passing them on to be tested. Bugs are easier and less expensive to fix the earlier they are found, so the best time to apply static analysis is as soon as possible—even as code is being written.

 

Figure 1. A static analysis tool extension that implements a custom check.
Programmers can benefit from using static analysis as soon as their code compiles, even before any test cases have been written. The preferred approach is to encourage programmers to analyze their code on a daily basis, and then to require that all code be analyzed before it is committed to the source-code repository. In addition to finding bugs early, this approach has two additional benefits. First, programmers become familiar with the tool and how it responds to changes in how the code is written. This encourages them to write code that is easier for the tool to analyze, thereby reducing false positives and false negatives. Code written this way is easier for humans to reason about, too. Second, if a bug is eventually found that the tool did not detect, programmers are encouraged to write extensions to the tool so that similar bugs can be caught in future scans. Tool customization is discussed later in this article.

 

Although use by individual programmers is encouraged, the greatest benefit to the team is realized when the static analysis scan is integrated with the centralized build system. One approach is to schedule an analysis to run overnight whenever changes have been made to the source code. This is appropriate for large programs for which a full analysis can take several hours to complete. All of the mainstream tools also support incremental analyses, in which the time to reanalyze the code is approximately proportional to the size of the changes since the most recent analysis. Another approach, therefore, is to trigger an analysis automatically when a programmer commits a change to the source-code repository. This allows any new warnings to be correlated automatically with the usually small set of changes that caused them.

 

Deployment

 

To deploy a static analysis tool, the following steps are recommended:

 

• Set up a server to collate and manage the analysis results.

 • Integrate the tool with the build system.

 • Conduct an initial analysis of the code and sanity-check the results. Tune configuration parameters and repeat until satisfied.

 • Perform a baseline scan.

 • Configure automatic activities, including periodic scans and e-mail notifications.

 • Train users.

 • Implement custom checks (optional).

 

Setting Up a Server. Static analysis tools typically have a three-tier architecture. Analysis processes run on the same machine that compiles the source code. These processes generate results that are communicated to a hub server. This hub collates and filters the incoming results, and stores them in a database. The hub also provides a user interface for viewing and managing analysis results. Often it allows users to connect with it using a standard Web client. Normally there are many analysis processes but only one hub and one database. All of the processes can be hosted on different machines as long as they can communicate over a network. The key decision is where to host the hub and database. Static analysis tools can generate large quantities of data, so the database should be located on a machine with ample storage available. Most tools ship with standard off-the-shelf databases (e.g., Postgres) and can integrate with other mainstream databases too, so traditional database management techniques apply.

 

Integration. The next step is to confirm that the tool is properly integrated with the build system. Static analysis tools work by parsing code in the same way the regular compiler would parse it, so the tools must be aware of which compilers are being invoked and how. Most tools work out of the box with many different compilers, but if a nonstandard compiler is being used, then the static analysis tool must be configured to recognize that compiler by telling it:

 

• The name of the compiler executable.

 

• The command-line parameters that the compiler accepts and how to interpret them.

 

• How to interpret any nonstandard syntax that the compiler accepts.

 

Any parse failures reported by the analysis tool can usually be resolved by following the preceding three configuration steps.
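
As an illustration of the third item, embedded compilers commonly accept nonstandard syntax along the following lines (the forms shown are representative vendor extensions; the exact syntax varies by compiler). Until the analysis tool is configured to recognize them, each such line is a likely source of parse failures:

    __interrupt void timer_isr(void);        /* vendor keyword marking an interrupt handler */
    volatile unsigned char PORTA @ 0x0038;   /* '@' places a variable at an absolute address */
    #pragma DATA_SEG SHORT_RAM               /* vendor-specific pragma selecting a memory segment */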

 

Initial Scans. The first analysis performed may be less than optimal in some ways. There may be false positives, and possibly some false negatives too. These tools have many different configuration options. The goal of the initial scans is to maximize the benefit of using the tool by converging on a good set of values for these options.

 

The first step is to decide on the set of warning classes to report. The best classes for safety-critical code are not the same as the best for a desktop game, for example. Configuration parameters are used to selectively enable and disable checks for different classes.

 

The reported warnings should then be reviewed to identify obvious false positives. Often these are a consequence of source code being unavailable, such as when a program uses an application programming interface (API) to a library that is only available in object code. Analyses usually ignore calls to such functions. If any of these functions can cause the program to exit prematurely, then they can be a source of false positives: the analysis will wrongly consider some paths subsequent to the function call to be feasible, and any conclusions it draws along those paths will be invalid.
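
A common case involves a library routine that never returns. In the hypothetical sketch below, fatal_error is assumed to be available only in object code; because the analysis cannot see that the function terminates the program, it assumes execution may continue past the call and wrongly warns that p may be dereferenced while null:

    extern void fatal_error(const char *msg);  /* source unavailable; never returns */

    int get_value(const int *p)
    {
        if (p == 0)
            fatal_error("null input");  /* the analysis cannot tell that this exits */
        return *p;                      /* false positive reported here: "p may be null" */
    }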

 

The recommended way to handle missing source code is to write models for critical functions. A model is a stub of code that captures the important semantics of a function. Good models will both reduce false positives and increase true positives. Good tools ship with models for popular libraries (for example, the C library). When models are needed, it is typically the case that a few will have a significant influence on the results; there is no need to model everything.
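
Continuing the hypothetical example above, a model for fatal_error need only capture the one fact that matters to the analysis: no path continues past a call. A minimal sketch follows; the exact mechanism for supplying models varies from tool to tool:

    #include <stdlib.h>

    /* Model for fatal_error: ignores the message and exits, so the
       analysis prunes all paths following a call to it. */
    void fatal_error(const char *msg)
    {
        (void)msg;
        exit(1);
    }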

 

Baseline Scan. Applying a static analysis tool is simple if done before any code is written. It is more likely, however, for a tool to be adopted after code already exists, either because code is being reused or because the programming phase has already started. In that case, the first run of the tool may report hundreds of warnings. Each warning must be inspected manually, so dealing with all of them at once can be daunting. Programmers often react negatively to being handed a list of warnings that they must address immediately. A better approach is to roll out the tool in a way that requires programmers to deal immediately only with new warnings. A good technique is to tag all of the results generated by the baseline scan as legacy warnings and to set up the default filter so that these are not shown. The legacy warnings can then be addressed as a background activity.

 

Automation. All tools provide various ways to process warnings automatically. One useful application is to automatically assign some warnings to specific programmers. For example, suppose that programmer Alice is the lead developer for a wireless communications module, and that the code is located in a directory named wireless. A configuration option can be set so that if a warning is reported in any file within that directory, it is automatically assigned to Alice. Similarly, Bob may be the team expert at finding and fixing memory leaks, so the tool could be configured to automatically assign those warnings to him.

 

Another opportunity for automation is to integrate with bug tracking tools such as the open-source Bugzilla (www.bugzilla.org). It is possible to have all warnings automatically reported to the bug tracking system, but this is not recommended because there may be false positives among the warnings. Instead, it is better to allow the user inspecting a warning to easily select an option to submit the warning to the tracking system. Proper integration couples the analysis-tool warning with the bug-tracker entry so that it is easy to jump from one to the other.

 

Training. Training users is arguably the most essential component in a successful rollout. Sometimes programmer resistance is a barrier to successful adoption. Understandably, programmers do not like having flaws in their code pointed out to them, especially in an open forum, and some may have had negative experiences with the earlier generation of static analysis tools. The primary focus of any training should be to demonstrate the value that the tool can bring. The best way to do this is to have hands-on exercises in which programmers can first use the tool to explore confirmed positive results. Although most reports are easily understood by programmers, some can be quite subtle, and it takes practice to be good at interpreting them consistently and accurately. If no in-house expertise has been developed, then the tool vendors themselves are probably the best sources of training materials.

 

Custom Checks. Almost all software development efforts have project-specific rules or idioms that should be followed to avoid errors. These can range from simple coding standards, through rules for how a particular API should be used, to complex cross-cutting idioms such as those for logging or exception handling. Static analysis tools can often be extended to implement checkers for such rules. Customization mechanisms vary widely and also differ depending on the kind of checker to be implemented. In one tool (CodeSonar, which is used by the author), a check on the use of an API may be implemented by writing replacement functions. This is good for checking properties of parameters (e.g., parameter p must never be zero), or for checking that functions are called in the right order.

 

For example, see Figure 1 for an extension that implements a static check for a simple property. The analysis proceeds as follows: when it sees a call to function foo, it treats it as if it were a call to csonar_replace_foo, and similarly for bar. As the analysis explores paths, it keeps track of the program state, and if it sees a call to csonar_trigger where the condition given may be satisfied, then a warning is issued.
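
The code in Figure 1 is not reproduced here, but the following minimal sketch is written in the same style. It checks two hypothetical rules: foo must be called before bar, and bar's parameter must never be zero. The csonar_trigger(condition, message) signature and the handling of the original calls are assumptions for illustration; consult the tool's documentation for the exact extension API.

    extern void foo(void);
    extern void bar(int p);

    /* Assumed extension primitive: issues a warning if 'condition' may hold. */
    void csonar_trigger(int condition, const char *message);

    static int foo_called;  /* state tracked along each analyzed path */

    /* The analysis treats every call to foo() as a call to this function. */
    void csonar_replace_foo(void)
    {
        foo_called = 1;
        foo();  /* whether the replacement re-invokes the original is tool-specific */
    }

    /* Likewise for bar(). */
    void csonar_replace_bar(int p)
    {
        csonar_trigger(!foo_called, "bar() may be called before foo()");
        csonar_trigger(p == 0, "bar() may be called with p == 0");
        bar(p);
    }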

 

The API for writing extensions in this way allows programmers to write extensions as if they were writing dynamic checks for a property—a technique they are already familiar with.
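
To see the analogy, compare the extension above with the dynamic check a programmer might already write for the same property (a hypothetical wrapper). The static version evaluates the condition over all paths the analysis explores, rather than only over the executions that happen to occur:

    #include <assert.h>

    extern void bar(int p);

    void checked_bar(int p)
    {
        assert(p != 0);  /* fires only on executions that actually reach it */
        bar(p);
    }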

 

Other techniques allow users to write checks on properties such as program structure, control flow, and identifier naming, among others. The analysis tools essentially give the authors of a checker access to all of the underlying abstract representations, which makes writing checkers relatively simple.

 

Conclusion

 

Advanced static analysis is a powerful technique for finding programming errors. Such tools yield the most benefit when they are incorporated into the standard development cycle and customized to check project-specific rules.

 

Medical device manufacturers that adopt these tools early in their processes may see the most benefit, because programmers learn to write code with static analysis in mind, avoiding errors up front and costly corrections later.

 

 
References

1. Raoul Jetley and Ben Chelf, “Diagnosing Medical Device Software Defects Using Static Analysis,” Medical Device & Diagnostic Industry 31, no. 5: 72–83.

Paul Anderson is vice president of engineering at GrammaTech (Ithaca, NY). He can be reached at paul@grammatech.com.