I originally wrote this literature review on Software Inspections, with an emphasis on empirical evidence, in December 2007 for a course on Software Verification and Validation. I wanted to publish it on the web because a good portion of the literature that I reviewed is inaccessible to software practitioners and the general public. (One has to pay exorbitant fees to get access to the full-text journal articles and some of the articles that were previously published on the web have since disappeared.)
Abstract
The software engineering community, researchers and practitioners, accept software inspections as a basic and essential tool in conducting software verification and validation activities. In terms of untestable work products such as requirements specifications and design documents, inspections may be one of the few available tools to conduct an effective verification and validation of these work products. However, many in the community want to know how effective inspections actually are, in empirical terms, and how to make them both more effective and efficient in detecting the maximum number of errors with a minimal amount of effort or cost.
In this review of the literature on inspection technologies, the Fagan Inspection is used as a baseline for study and evaluation as it is the most well known inspection technique. Since Fagan (1976), others have noted problems and limitations of the Fagan Inspection technique and proposed their own inspection techniques. Two major variants include Active Reviews by Parnas and Weiss (1985) and Phased Inspections by Knight and Myers (1991). More recent inspection techniques focus on the changing nature of the software industry, in terms of how software teams work and the types of software being developed. Specifically, recent variations on inspections address the impact of globalization (distributed teams, outsourcing, offshoring) and the advent of the World Wide Web (distributed technologies, decentralized architectures). All variants attempt to address flaws in the Fagan Inspection technique with the hopes of making inspections more effective and efficient. Although publications describing the varying inspection techniques claim effectiveness through a case study of a single instance or application, few are able to show the improvement empirically.
Although there is a lack of empirical data that covers broad applications to prove the effectiveness of inspections and to calculate the return on investment of inspection activities, the varying inspection research and techniques still contribute to the larger verification and validation toolkit. Software engineers and software project managers are currently not able to use empirical data to make and justify their decisions on the varying inspection techniques, but they are able to pick and choose from a rich variety of options to suit their application. However, researchers are continuing to improve upon inspection techniques and study how to reliably measure and compare techniques to provide empirical data for decision making. There is no consensus on a “best” inspection technique, but rather the consensus among the research community is that there is still a lot of further study to be done to better understand inspections.
Software Inspections
Experienced software engineers and software project managers recognize that the earlier an error is detected in the software lifecycle, the less costly it is to resolve. As a result, software engineers and project managers plan verification and validation activities into the development lifecycle to detect errors as early as possible. One of the best known verification and validation activities is the software inspection, a staple of many verification and validation toolkits and plans. The Software Engineering Institute (2001) defines software inspection as a disciplined engineering practice for detecting and correcting defects in software artifacts, and preventing their leakage into field operations. Where testing can only verify executable work products, software artifacts that can be inspected include requirements specifications, software designs, source code, test plans, and test procedures. The goal of this paper is to present software engineers and software project managers with a survey of available inspection techniques and an empirical evaluation of their effectiveness, so that they can not only identify inspection as a necessary activity in a verification and validation plan, but also choose the techniques that are most appropriate for their projects.
Fagan Inspection
The Fagan Inspection is the most commonly cited software inspection technique in software engineering literature. In a sense, the Fagan Inspection is a standard software inspection technique as its essence is included in the IEEE Standard for Software Reviews (1997). Developed by M. E. Fagan in 1972 at IBM, the Fagan Inspection consists of six steps: (1) Planning, (2) Overview, (3) Preparation, (4) Inspection, (5) Rework, and (6) Follow-Up. Compared with walkthroughs, which are informal and consist only of two steps, Preparation and Walkthrough, Fagan Inspections are a more formal alternative requiring an inspection team, with each member playing a role: moderator, designer, coder, or tester.
Fagan’s original work included an empirical evaluation of his documented inspection technique in finding errors. Fagan (1976) showed that inspections caught 82 percent of the errors in the application under case study. Although the findings proved successful for that particular case, Fagan later showed in follow-up work that the effectiveness of an inspection depends on how the inspection was conducted. Fagan (1986) noted that well-conducted inspections resulted in the discovery of 82 and 93 percent of all defects in two specific case studies. However, in cases where inspection moderators were not trained or inspection was conducted on a limited portion of the development lifecycle, defect discovery rates were slightly over 50 percent. Fagan (1986) considered these reduced rates to be a success given the shortcomings of the inspections performed with limitations, but the high rate of variation between moderately successful and highly successful inspections served as an indication to others that inspection techniques could be improved to provide more consistency in finding defects.
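The percentages above are ratios of the defects caught by inspection to all defects known for the product. A minimal sketch of that arithmetic follows, assuming that the total is the sum of defects found during inspection and defects found afterwards (in testing or in the field). The function name and the sample counts are hypothetical and are not Fagan's data.

```python
def detection_effectiveness(found_by_inspection: int, found_later: int) -> float:
    """Return the percentage of all known defects that inspections caught.

    Assumption: the total defect count is the sum of defects found by
    inspection and defects found later (testing or field use).
    """
    total = found_by_inspection + found_later
    if total == 0:
        raise ValueError("no defects recorded; effectiveness is undefined")
    return 100.0 * found_by_inspection / total


# Hypothetical counts, chosen only to reproduce an 82 percent rate.
rate = detection_effectiveness(found_by_inspection=82, found_later=18)
print(f"{rate:.0f} percent of known defects were found by inspection")
```

Note that the ratio can only be computed in hindsight, once later testing and field use have revealed the defects that the inspection missed.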
Active Reviews
Parnas and Weiss (1985) developed Active Reviews, an alternative inspection technique that addresses the limitations of Fagan Inspections. In describing the motivation behind Active Reviews, Parnas and Weiss (1985) detail the problems of traditional (Fagan) inspections, including: reviewers being overloaded with information and not having sufficient time to work with it, reviewers not being familiar with all of the requirements and constraints imposed on the artifact under inspection, and reviewers attempting to examine all of the artifact without focusing on the parts they might be most familiar with. In an attempt to address these shortcomings, the Active Review technique involves: inspectors working on a single technical area, inspectors working alone, authors of the artifact supplying questionnaires to the inspectors to check comprehension, all of which results in individual discussions between each inspector and the author to arrive at agreed-upon feedback that results in product rework.
Although Parnas and Weiss (1985) mention in their conclusions that they were convinced that Active Reviews achieved their review objectives and that the problems with Fagan Inspections had been addressed, there was no experimental study conducted. That is, there was no formal experiment that compared Active Reviews against a control or baseline method of Fagan Inspections. Despite no empirical evidence existing that shows the effectiveness of Active Reviews, a search of the literature today (in the ACM Digital Library, CiteSeer, or IEEE Computer Society Library) reveals many dozens of papers citing Parnas and Weiss (1985). In essence, the software engineering community generally accepts Active Reviews as an effective practice based on a logical and sensible rationale, even though no empirical evidence shows it to be more effective than Fagan Inspections.
Phased Inspections
Following on Fagan’s work, Knight and Myers (1991) developed an improved inspection technique, Phased Inspections. Knight and Myers recognized several shortcomings in the Fagan Inspection technique, many of them similar to what Parnas and Weiss (1985) found. Knight and Myers realized that there is generally too much material to check in a single inspection activity. Instead of holding one inspection meeting as in the Fagan Inspection, the Phased Inspection technique conducts an inspection as a series of tightly focused steps (phases).
Each inspection phase may involve either a single or multiple inspectors and each phase is concerned with inspecting only one specific quality goal. The inspector responsible for checking for a particular goal assumes that other quality goals either have been checked by other people in a previous inspection phase or that they will be checked in an upcoming inspection phase. This allows for inspectors, individual or group, to remain focused on reviewing one specific aspect of a work product instead of having to try to examine the entire product. For example, a Phased Inspection of source code may consist of four inspection phases: a source code format inspection, a documentation (code comment) inspection, a module-internal logic inspection, and a module-external interface inspection. In a Fagan Inspection, all of the concerns would be addressed in one meeting and source code formatting issues may end up distracting attention away from module-internal issues. The premise behind Phased Inspections is to provide more focus in each of the inspection phases to result in a more effective overall inspection effort.
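The structure described above, where each phase checks exactly one quality goal and ignores the others, can be sketched as a small pipeline. This is only an illustration under stated assumptions: the `Phase` type, the two toy checks, and their rules (a 79-character line limit, a comment before every function) are hypothetical and do not come from Knight and Myers. The logic and interface phases are omitted because they depend on human judgment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Phase:
    """One inspection phase, responsible for a single quality goal."""
    name: str
    check: Callable[[str], List[str]]  # returns findings for that goal only


def check_format(source: str) -> List[str]:
    # Hypothetical formatting rule: no line longer than 79 characters.
    return [f"line {number}: longer than 79 characters"
            for number, line in enumerate(source.splitlines(), start=1)
            if len(line) > 79]


def check_comments(source: str) -> List[str]:
    # Hypothetical documentation rule: a comment must precede each function.
    lines = source.splitlines()
    findings = []
    for index, line in enumerate(lines):
        if line.lstrip().startswith("def "):
            previous = lines[index - 1].lstrip() if index > 0 else ""
            if not previous.startswith("#"):
                findings.append(
                    f"line {index + 1}: function has no preceding comment")
    return findings


def run_phased_inspection(source: str, phases: List[Phase]) -> Dict[str, List[str]]:
    """Run the phases in order, recording findings separately per phase."""
    return {phase.name: phase.check(source) for phase in phases}


sample = "def add(a, b):\n    return a + b\n"
phases = [Phase("format", check_format),
          Phase("documentation", check_comments)]
report = run_phased_inspection(sample, phases)
for name, findings in report.items():
    print(name, findings)
```

Keeping the findings separate per phase mirrors the premise of the technique: a formatting problem is reported in the format phase and cannot distract an inspector who is reviewing documentation or logic.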
Although Knight and Myers (1991) conclude that Phased Inspections are an improvement over the well-accepted Fagan Inspection, they mention only that preliminary data supports their conclusion. The preliminary data was the result of an experiment conducted for a master’s thesis, but the data does not appear in a published peer-reviewed journal. As with Active Reviews, Phased Inspections have some level of acceptance in the software engineering community based on the number of citations in the literature on inspections, but no empirical evidence supports inspections of any form being predictably effective.
Recent Trends in Inspections
Recognizing that major improvements could be made to the traditional Fagan Inspection technique, more recent literature presents both techniques specific to the domain of the system and techniques arising from the evolving nature of software projects. In the last decade, the software industry has shifted from developing all applications in-house to outsourcing part or most of its software development projects. In addition, software development teams within an organization may not be co-located, but rather be distributed across the country or around the world. The nature of the software applications themselves has evolved in the past decade as well: instead of having standalone applications that are completely self-contained, many applications now consist of multiple components and use services external to the application. Examples include applications developed using rapid development frameworks and code generators (as in Microsoft Visual Studio), applications that use open-source components, and applications that are developed for the World Wide Web. Each of these examples presents a case where conducting a traditional inspection only makes sense if the inspection is tailored particularly for the application.
Stellman and Greene (2006) detail a variant of the Fagan Inspection technique for software development in situations that involve outsourcing as most inspection techniques described in the literature assume that all software is developed in house. Similarly, Tom Gilb, an author of a textbook on classical (Fagan) inspection techniques (Gilb, Graham, & Finzi, 1993), now has new material that describes inspection techniques for Agile software projects (Gilb & Gilb, 2007). Additional research in Agile Software Methods (Phongpaibul & Boehm, 2006) shows that a separate inspection activity may not be necessary, but may rather be counter to the objectives of the Agile Methodology. Some instances of the Agile Methodology call for Pair Programming (Cockburn & Williams, 2000), where programmers work together as a pair while they are coding. The rationale is that one person will be writing code, and the other will be both helping and checking as the code is being written. In essence, the second person of the pair acts as an inspector (as in Fagan Inspections) while the code is being written and asking questions (as in Active Reviews). Since these authors claim that Pair Programming is in effect combining the coding activity with the inspection activity, a separate inspection activity or process is not needed. Although this claim may be true for code, traditional inspections may still play a role in the Agile Methodology as only programming is done in pairs. That is, there are still other artifacts and work products developed in the Agile Methodology (artifacts that correspond to traditional requirements specifications, design documentation, and test plans) that still need to be verified and validated.
The Search for Empirical Evidence
Although the results of Fagan’s original study are often cited, with inspections detecting approximately 50 to 90 percent of software errors, few studies since then have been able to state conclusions based on empirical evidence rather than on the results of a case study. This is likely because most studies, including Fagan’s original study, are conducted on real-world production applications, where establishing the components necessary for a valid experimental design is difficult. For instance, in a production project, few organizations would have the resources allocated or the motivation to conduct a parallel inspection effort just to make comparisons. Likewise, an academic study of a theoretical application would not reflect the scale and issues involved in developing production software. In either case, results obtained from a case study, including Fagan’s original study, reflect the situation under study, including the software development process and the people employed on the project.
Given these limitations, there have been some recent studies that attempt to provide empirical data to help evaluate the effectiveness of inspections. Porter, Siy, Mockus, and Votta (1998) focus on creating a model not to evaluate inspections directly, but rather to identify the sources of variation in inspection activities. Identifying the sources of variation gives software engineers a greater understanding of what specifically determines the effectiveness of a software inspection effort, both to maximize the defect detection rate and to make the defect detection rate more consistent and predictable. Porter et al. (1998) found that the inputs to the inspection activity created the most variation in an inspection effort, compared to the process by which the inspection was conducted and the structure of the inspection team.
In addition to evaluating inspections from a software engineering point of view, Biffl and Halling (2003) conducted a cost-benefit analysis on software inspections to provide more perspective from a software project management point of view. Although many software engineers would agree that inspections are an essential and beneficial tool in the verification and validation process, inspections are expensive activities in a project due to the number of people involved in an inspection team and the amount of time spent preparing and conducting the inspection activities. A cost-benefit analysis views inspection activities as an investment in a software project. That is, Biffl and Halling (2003) sought to understand how to allocate sufficient resources to perform inspections that result in beneficial returns through effective and early defect detection, while not overspending effort and resources that result in diminishing returns. Although the authors did not explicitly state it in their conclusions, it is evident that the effectiveness-versus-cost relationship of software inspections follows the Pareto principle, better known as the 80-20 rule. This observation is supported by Biffl and Halling’s (2003) conclusion that investing a moderate amount into a software inspection effort results in significant detection of defects, while investing beyond that amount results in limited gains for the effort spent.
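The diminishing-returns shape of that relationship can be illustrated with a simple probability sketch. This is not Biffl and Halling's model: it assumes, purely for illustration, that each inspector independently finds any given defect with the same probability, and the value 0.4 is a hypothetical figure rather than measured data.

```python
from typing import List


def team_detection_rate(p_single: float, inspectors: int) -> float:
    """Chance that at least one of `inspectors` finds a given defect.

    Assumption: inspectors work independently and each finds the defect
    with the same probability `p_single`.
    """
    return 1.0 - (1.0 - p_single) ** inspectors


def marginal_gains(p_single: float, max_inspectors: int) -> List[float]:
    """Extra detection rate contributed by each additional inspector."""
    rates = [team_detection_rate(p_single, n)
             for n in range(max_inspectors + 1)]
    return [rates[n] - rates[n - 1] for n in range(1, max_inspectors + 1)]


# Hypothetical single-inspector detection probability.
gains = marginal_gains(p_single=0.4, max_inspectors=6)
for size, gain in enumerate(gains, start=1):
    total = team_detection_rate(0.4, size)
    print(f"inspector {size}: adds {gain:.1%}, team total {total:.1%}")
```

Under these assumptions each added inspector costs roughly the same effort but contributes less than the one before, which is the pattern a project manager would weigh when sizing an inspection team.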
Conclusions
As versatile as software inspections are throughout the development lifecycle, the effectiveness of inspections in detecting errors is not necessarily guaranteed by their execution alone. The effectiveness of a software inspection depends on how the inspection is conducted. The majority of the literature on software inspections shows ways to improve how software inspections are performed, using the Fagan Inspection as the baseline. However, few facts about inspections have been proven empirically through solid experimental design.
Researchers and practitioners often cite Fagan’s original defect detection rates of 50 to 90 percent, or similarly state that inspections will find the majority of defects in work products. However, few have shown improvements on Fagan’s original results, either by increasing the percentage of errors detected or by narrowing the variance between moderately and highly effective inspections. Instead, improvements have focused on matching inspection techniques to the current state of the software industry in terms of how software teams work together (processes) and the types of software they are working on (products). This may be discouraging for some: without solid empirical evidence, one wonders how to make decisions on inspections in verification and validation activities.
There is no silver bullet when it comes to software inspections, and as of today, there is no “best” inspection technique and no easy formula to calculate the return on investment of conducting inspections. However, what software engineers and software project managers have at their disposal is a library of software inspection techniques they can add to their toolkit and many rules of thumb on when to conduct inspections and how much effort should be put into them. As with many other issues in software engineering, the software engineer and project manager should view inspections as a customizable tool for detecting errors, and detecting them early in the lifecycle, to help reach the overall goal of delivering a high quality product at the lowest possible cost. Doing so means first assessing the situation, knowing the inspection tools available, and choosing (or synthesizing) the right ones for the situation.
References
Biffl, S., & Halling, M. (2003). Investigating the Defect Detection Effectiveness and Cost Benefit of Nominal Inspection Teams. IEEE Transactions on Software Engineering, 29(5), 385-397.
CeBASE: NSF Center for Empirically Based Software Engineering. (2003). eWorkshop on Software Inspections and Pair Programming. Retrieved on December 1, 2007, from http://www.cebase.org/www/home/index.htm.
Cockburn, A., & Williams, L. (2000). The Costs and Benefits of Pair Programming. Proceedings of the First International Conference on Extreme Programming and Flexible Processes in Software Engineering.
Fagan, M. E. (1976). Design and Code Inspections to Reduce Errors in Program Development. IBM Systems Journal, 15(3).
Fagan, M. E. (1986). Advances in Software Inspections. IEEE Transactions on Software Engineering, 12(7).
Gilb, T., Graham, D., & Finzi, S. (1993). Software Inspection. Boston, MA: Addison-Wesley.
Gilb, T., & Gilb, K. (2007). Agile Inspection. Retrieved on December 1, 2007, from http://www.spipartners.nl/data/train_course/gilbinspect_agile_en.php.
IEEE. (1998). IEEE Std 1028-1997: Standard for Software Reviews. Los Alamitos, CA: IEEE Computer Society Press.
Knight, J. C., & Myers, E. A. (1991). Phased Inspections and Their Implementation. ACM SIGSOFT Software Engineering Notes, 16(3), 29-35.
Parnas, D. L., & Weiss, D. M. (1985). Active Design Reviews: Principles and Practices. Proceedings of the 8th International Conference on Software Engineering.
Phongpaibul, M., & Boehm, B. (2006). An Empirical Comparison Between Pair Development and Software Inspection in Thailand. Proceedings of the 2006 ACM/IEEE International Symposium on Empirical Software Engineering.
Porter, A., Siy, H., Mockus, A., & Votta, L. (1998). Understanding the Sources of Variation in Software Inspections. ACM Transactions on Software Engineering and Methodology, 7(1), 41-79.
Software Engineering Institute. (2001). Software Inspections. Retrieved on December 1, 2007, from http://www.sei.cmu.edu/str/descriptions/inspections_body.html.
Stellman, A., & Greene, J. (2006). Applied Software Project Management. Sebastopol, CA: O’Reilly Media.