Evaluating Software Design Processes by Analyzing Change Data Over Time

This paper presents analyses of early design and code change data from the Software Cost Reduction (SCR) project, a well-reported effort conducted at the Naval Research Laboratory from 1978 to 1988. The analyses are mostly time-based studies of the change data and relationships between the data and SCR personnel activity data. This analytical approach seems to allow useful insights into software design processes even when data are limited to a single software project. It also enables project personnel to notice favorable or unfavorable patterns with respect to project goals during the course of the project. Some analyses of the change data show patterns consistent with a major goal of the SCR project: the design and development of easy-to-change software. Specifically, most changes took a day or less to uncover and resolve, and the majority of changes updated at most one module. Moreover, these percentages remained fairly stable. Also, no positive relationship appeared between error-correction effort and the number of days that an error remained in the SCR design documentation. Other analyses suggest that consistency may have been temporary. For example, the analyses suggest a stepwise growth in average change effort, and an increasing percentage of changes resulted in module interface updates. Certain specific ratios between SCR change data and personnel activity data show promise as possible indicators of design incompleteness. The ratios are based on data of the kinds that are typically collected on software projects.


I. INTRODUCTION

Basili and Weiss describe a methodology for collecting valid software engineering data [2]. The intent is to capture data that can yield insights into software development and maintenance processes, that help confirm or reject claims made for different software engineering technologies, and that point to better techniques for prevention, detection, and correction of errors. Since the 1970's, their methodology has been applied to a few projects at the Naval Research Laboratory (NRL). The application has been limited for a number of reasons. One is that such data collection tends to be time consuming and costly. Indeed, a major effort can add as much as 5-15% overhead to a project [9]. A second reason is that there is a major limitation to the goal-directed data collection approach in actual development environments: the inability to isolate the effects of single factors. As a consequence, project managers have been less than enthusiastic about data collection.
A result of the limited application is that we have data for a few projects that differ greatly in staffing, goals, and applications. Further, for some projects, our data are incomplete because the collection efforts were terminated before project completion. This has caused difficulties in analyzing and reporting on the data; for example, we often cannot generate summary statistics at the end of a project and compare them with similar statistics from past projects. Furthermore, when we do produce summary statistics, they have provided us with little insight into the software design processes and have proven difficult to compare with similar statistics published in the open literature.
An approach we have adopted for dealing with these difficulties is to view and analyze software engineering data over time. Often, time-based measures allow project personnel to detect favorable and unfavorable trends even before project completion. An added advantage is that the underlying data sets can be subjected to statistical techniques that can highlight trends and potential relationships between measures. As part of our approach, we have established some guidelines for presentation of such analyses. One guideline is to avoid a jittery graph by plotting the cumulative value of a measure instead of its incremental values, which tend to vary greatly between time periods. A second is to avoid changing the historical pattern of a graph associated with a change in a measure's range by plotting a measure as a percentage of the total. A percentage plot is, of course, a special kind of ratio plot. Accordingly, a third guideline is to encourage comparison between different components of a software design and to highlight relationships between different data by examining ratios between data, specifically change data and personnel activity data. In short, our experience has reemphasized the importance of observing and understanding system dynamics as an approach to understanding software development and evolutionary processes, an approach that Belady and Lehman [3] discussed in the 1970's and that has been illustrated more recently by Grady in 1987 [16].
This paper illustrates the above ideas and the time-based approach to the analysis of software engineering design change data. It presents analyses of design changes proposed and made by software development engineers who worked on the Software Cost Reduction (SCR) project at NRL. (U.S. Government work not protected by U.S. copyright.)
There are five sections in the paper. The remainder of this section contains a brief overview of NRL's SCR project and Software Technology Evaluation project. The second section is a description of the techniques and strategies that were used in collecting and categorizing the data. The third section is a detailed discussion of the change and error data. The final two sections contain the analyses of the data and their possible implications.
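The three presentation guidelines described above (cumulative plots, percentage-of-total plots, and ratio plots) can be sketched in a few lines of Python. The function names and the monthly data below are ours and purely illustrative, not from the paper:

```python
from itertools import accumulate

def cumulative(series):
    """Guideline 1: plot running totals rather than jittery per-period values."""
    return list(accumulate(series))

def percentage_of_total(series):
    """Guideline 2: plot each cumulative value as a percentage of the grand
    total, so the graph's shape survives a change in the measure's range."""
    totals = cumulative(series)
    grand_total = totals[-1]
    return [100.0 * t / grand_total for t in totals]

def ratio_over_time(numerator, denominator):
    """Guideline 3: plot the ratio of two cumulative measures, e.g. changes
    uncovered during design versus design hours expended."""
    num, den = cumulative(numerator), cumulative(denominator)
    return [n / d for n, d in zip(num, den)]

# Hypothetical monthly data: CRF's submitted and design hours expended.
crfs_per_month = [3, 7, 2, 9, 4]
design_hours_per_month = [120, 150, 90, 200, 110]

print(cumulative(crfs_per_month))
print(percentage_of_total(crfs_per_month))
print(ratio_over_time(crfs_per_month, design_hours_per_month))
```

The three series produced here correspond to the three kinds of plot the paper uses throughout (e.g., Figs. 10-18).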

A. The Software Cost Reduction Project
The Software Cost Reduction Project began in 1978 at NRL as a cooperative effort with the Naval Weapons Center (NWC). The purpose was to redevelop version 2 of the Operational Flight Program for the A-7E aircraft using improved software technology [10]. Two major goals were to 1) demonstrate the feasibility of using selected software engineering techniques in developing complex, real-time software, and 2) provide a model for later NWC software designers [24]. Software engineering techniques such as formal requirements specification [19], information hiding [22], abstract interfaces [23], and cooperating sequential processes [13] were prominent among the technologies applied. The claimed advantage of these technologies was that they facilitate the development of software that is easy to change and maintain.
A complete discussion of the project's software requirements was provided by Heninger et al. [17]. Britton and Parnas provided a detailed description of the module design structure [5]. Fig. 1 presents an example of a module interface specification (i.e., a design specification) taken from a specification for the device interface module [21]. A standard organization for such specifications was described by Clements et al. [11]. The SCR project terminated at the end of 1987 after implementing three subsets of the operational flight program requirements. The subsets were evaluated and tested using ground-based test facilities at NWC.

B. The Software Technology Evaluation Project
The data reported here were collected and analyzed by researchers working on the Software Technology Evaluation (STE) project, which was an NRL project separate from the SCR project in terms of goals, staffing, and funding. The goal of the STE project was to evaluate alternative software development technologies. A major task of the STE project, therefore, was to provide the basis for an objective evaluation of the methodology used in the SCR project.
The approach followed in the STE project was to monitor, evaluate, and compare software development technologies used in different software projects. The monitoring and evaluating processes consisted of goal-directed data collection and analysis techniques [2]. For the SCR project, data were collected in three areas; the two drawn on in this paper are personnel activity and design and code changes.

[Fig. 1 (excerpt of the example module interface specification): "I. Introduction. There are two visual indicators controlled by the OFP on the A-7E aircraft; one that can, and one that cannot, be seen by the pilot during flight. These are currently labeled 'IMS Non-Aligned' and 'Auto-CAL', respectively. Each can be on steady, on blinking, or off."]

II. COLLECTION OF CHANGE DATA

From 1980 until early 1985, SCR project engineers reported design and code problems, suggested design changes, and logged their modification activity to baselined (i.e., published and change-controlled) interface specifications, pseudocode, and TC2 code on Change Report Forms (CRF's). An example of a completed CRF is presented in Fig. 2. There were two reasons for this procedure. First, it was required as part of the SCR project's configuration management (CM) procedures. Second, such data were needed by STE researchers for evaluation.

STE researchers validated primarily those CRF's that were resolved either by official acceptance and incorporation into the baselined documentation, or by official rejection of the proposed change. Ideally, validation should have been a continuing activity that occurred as CRF's were generated and resolved. Validation of SCR CRF's, however, tended to be an aperiodic activity in which large groups of CRF's were validated at one time. The validation consisted of checking completeness, accuracy, etc. It often included discussions with persons who submitted the CRF's, authors of affected documents, and SCR CM personnel.

A major validation point concerned what constituted a design or code change. Basically, the view taken was that a change was conceptual; that is, one should have been able to state a proposed change in a simple declarative sentence, and the change may comprise alterations to one or more baselined interface specification or implementation documents. In addition, a change described in one CRF that was similar to a change in a CRF resolved and implemented in an earlier baseline (i.e., a change that required completion or correction of earlier baselined alterations) was considered a unique or new change. Thus, a change was to have a unique basis: error correction, adaptation to outside change, improvement, or other (see Fig. 2). The notion of basis followed the scheme presented by Swanson [27]. A proposed change that was rejected obviously resulted in no alterations.

This definition of a design or code change caused problems. Occasionally a CRF was submitted that incorporated more than one change, and different engineers sometimes submitted the same change on different CRF's. For example, it was not unusual for a CRF to describe two conceptual changes, as in the following: "The last sentence of the description is ambiguous. Replace it with .... Note also that the word descriptor is misspelled." A workable and reasonable solution used by STE researchers for dealing with these situations was to split submitted CRF's that incorporated more than one change into an appropriate number of CRF's, such that each described a single change. Multiple CRF's that described identical changes were consolidated into one CRF. One result of this policy was that there was not a one-to-one correspondence between submitted CRF's and validated CRF's. The other result was, of course, that there was a one-to-one correspondence between proposed changes and validated CRF's.
There were other sections of the CRF that caused difficulties. One was the basis of an accepted change. A problem was that it was not sufficient to define an error as a discrepancy between a specification and its implementation. For example, an inadequate interface design was considered an error; an adequate interface design needing enhancements was considered an improvement. The only reasonable solution to this problem was to let SCR lead engineers decide in such situations. Another problem was determining whether or not a change was a correction or completion of an earlier change that was already incorporated in a baseline. The fact was that, after a long period of time or after many versions of a document, authors frequently forgot earlier changes that had addressed the same issues presented in current CRF's. For each of the CRF's reported in this study, STE researchers reviewed all versions of all documents baselined prior to resolution of the CRF and discussed all questions with lead SCR engineers. This was a laborious process but was necessary to ensure that correction or completion errors were properly identified.
Lastly, the SCR project's CM procedures were not perfect. Validators found a few CRF's that were not resolved but, nevertheless, were implemented in published specifications. The only reasonable solution for this was to resolve such CRF's with the date of the latest baselined specification and to submit CRF's for remaining aspects of the change. Validators also found modifications for which there were no corresponding CRF's. The policy for this was to submit CRF's and record them as immediately resolved with the date of issue of the appropriate baselined specifications.

III. OVERVIEW OF EARLY SCR CHANGE DATA

A. General
This paper is a summary of 325 validated CRF's that were resolved by January 1984, the date after which CRF's were no longer validated by STE researchers. By January 1984, engineers had submitted 424 CRF's. The 325 CRF's reported here map to 296 (70%) of those submitted and resolved by SCR CM personnel by that date. Figs. 3 and 4 are profiles of resolution activity for the CRF's. By January 1984, approximately 47 500 person hours had been expended on the SCR project. The 400 hours of resolution effort accounted for approximately 1% of project activity. Table I presents the distribution of the CRF's categorized by the originators' activities when the CRF's were generated.
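The summary percentages above follow directly from the reported counts. As a quick arithmetic check (all figures are taken from the text; the variable names are ours):

```python
submitted = 424          # CRF's submitted by January 1984
mapped = 296             # submitted CRF's covered by the 325 validated CRF's
project_hours = 47_500   # approximate SCR project hours by January 1984
resolution_hours = 400   # effort spent resolving the reported CRF's

print(round(100 * mapped / submitted))                   # about 70% of submitted CRF's
print(round(100 * resolution_hours / project_hours, 1))  # about 0.8%, i.e. roughly 1%
```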
A large proportion of CRF's originated during design activity. In addition, by January 1984 only 15% of SCR project hours had been spent on pseudocoding, coding, and testing activities. This means the changes reviewed in this study can be characterized as changes that are typically proposed and made early in software development, in contrast with changes reported elsewhere [1], [15], [29]. The remaining 297 accepted CRF's resulted in modifications to 47 baselined module interface specifications, most of which are packaged in two documents. No module implementation documents (which include pseudocode) or code were affected for the simple reason that none were baselined prior to January 1984. This limit of impact to interface specifications means that the 297 changes can be further characterized as early design changes.
The bases for the 297 accepted changes are presented in Table II. None of the changes were the result of changes to the software requirements specification. This can probably be attributed to the following: 1) an extensive requirements specification was generated prior to design [17], 2) the requirements specification has been shown to be relatively error free and remarkably free of ambiguities [8], 3) as noted earlier, the changes reported can be characterized as early changes, and 4) the SCR project was redeveloping software for a fixed operational version of the A-7E flight software.
The percentage of error corrections (see Table II and Fig. 7) is higher than the range (40-64%) reported by Basili and Weiss [1], [29], but it is far lower than the 96% figure reported by Shooman and Bolsky [26], and it was decreasing. The proportion of total CRF effort spent on error corrections (Fig. 8), even though decreasing, sharply contrasts with the 17% figure reported by Lientz and Swanson. It should be noted again, however, that the SCR requirements document change data are not included in this summary. The proportion of error corrections that involved completing or correcting a prior change (see Fig. 9) is large as compared to the 6-12% range of figures reported by others [1], [28], [29] and seems to be increasing in a step fashion. The 12% figure is computed from data presented by Weiss [28] and by Weiss and Basili [29]. This large proportion could be the result of the many hours spent by STE and SCR engineers in assuring the correct identification of correction and completion errors.

B. The SCR Ease-of-Change Goal
A major objective of the SCR project was to produce software design, code, and a documentation set that could be used to scope and to implement changes easily. The SCR design and code change CRF was designed explicitly to collect data to try to evaluate achievement with respect to this objective.
Fig. 10 presents the distribution of effort required for understanding and incorporating the 297 accepted changes into the SCR project's design documentation set; Fig. 11 presents the distribution for error corrections only. Only one of the 28 rejected CRF's was rejected because the proposed change was deemed not worth the effort. Most changes (81%) took an hour or less to understand and resolve; 98% took a day (i.e., 8 person hours) or less. Eighty-six percent of the error corrections took an hour or less to understand and resolve; 99% took a day or less. Although the data presented in Figs. 10 and 11 exhibit downward trends, these data seem to suggest that, for early changes and error corrections, SCR engineers were meeting their major objective. For errors uncovered and corrected late in the life cycle of a NASA/Goddard Software Engineering Laboratory project, Basili and Perricone [1] report that 36% of the error corrections took an hour or less and 55% took a day or less. For errors uncovered and corrected late in the Wuhan University Problem Analysis Diagram Translator project, Xu reports that 24% of the error corrections took an hour or less and 80% took a day or less [30].

Fig. 12 presents the cumulative average effort for all SCR changes and error corrections. There appeared to be a step growth in cumulative average change effort as the SCR project proceeded. This is consistent with Boehm's data, which show an exponential growth in the cost to fix or change software for successive phases of the software life cycle [4]. Although consistent, the average change effort for the early SCR design changes nevertheless seems quite small.

Fig. 13 presents the effort for an error correction based on the number of days that the error was in the system. The figure "days in system" is the difference between the CRF resolution date and the earliest issue date for the interface specifications containing the error. Boehm's data imply that the longer an error remains undetected and uncorrected in a system, the greater the cost of the eventual error correction. Surprisingly, this effect does not appear in the SCR data; the correlation between days in system and average effort is 0.07, which is not significant at the 0.05 level. There may be several reasons for this. The first is that SCR requirements change data are not included here. The second is that the changes reported here can be considered to be only design-phase changes, and more of the SCR project's life cycle might have had to pass before any relationship appeared. The third is that there were many very low effort changes. And the fourth, of course, is that the SCR methodology may, indeed, have lessened the impact of long-term unresolved errors!

The information hiding principle was used in the SCR project for identifying and specifying a hierarchy of design modules [25]. A module was supposed to hide a likely changeable aspect of the A-7E flight software. This meant that a module's interface specification had to be written such that the hidden information was not revealed; that is, a module's hidden information was available only to the implementors of that module. The anticipated result was that, when an expected change occurred, only one or two low-level module implementations (i.e., no interfaces) would need modification.

Fig. 14 presents the distribution for the number of lowest-level modules updated by changes (i.e., the ripple effect of changes). Such modules were considered to be "updated" if their interface specifications (implementation documents, or code) were updated, unless the updates were to ancillary items such as indexes and tables of contents. Most early SCR changes (90%) updated zero or one modules, and this percentage is relatively constant. The data presented in Fig. 15 are a special case of the data presented in Fig. 14. Fig. 15 presents the distribution for the number of lowest-level modules which had interface specifications updated (i.e., interfaces updated because of changes). A module interface is considered to be "updated" if a change to its specification (or implementation document, or code) caused, or would have conceivably caused, a change to programs of other modules that use, or would eventually use, capabilities provided by the module. Examples of interface updates are the modification of a parameter type and the addition of a sysgen parameter. The percentage of early SCR changes that resulted in interface updates (56%) was growing. The percentage of changes updating two or more interfaces (12%) was also growing. These latter trends seem to suggest that a greater ripple effect and a more uniform distribution of change effort could have been expected later in the SCR project.
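The distributions in Figs. 10-15 are simple tabulations over validated CRF records. A minimal sketch, using a hypothetical record layout of our own invention (effort in person hours, modules updated, module interfaces updated), might look like this:

```python
from collections import Counter

# Hypothetical validated-CRF records: (effort in person hours, modules
# updated, module interfaces updated). The data are illustrative only.
crfs = [
    (0.5, 1, 0), (1.0, 0, 0), (2.0, 1, 1), (0.25, 1, 0),
    (8.0, 2, 1), (0.5, 0, 0), (1.0, 1, 1), (16.0, 3, 2),
]

def share(predicate):
    """Percentage of changes satisfying a predicate, as in Figs. 10-15."""
    return 100.0 * sum(1 for c in crfs if predicate(c)) / len(crfs)

hour_or_less = share(lambda c: c[0] <= 1.0)    # cf. the paper's 81% figure
day_or_less = share(lambda c: c[0] <= 8.0)     # a day = 8 person hours
low_ripple = share(lambda c: c[1] <= 1)        # zero or one module updated
interface_dist = Counter(c[2] for c in crfs)   # interfaces updated per change

print(hour_or_less, day_or_less, low_ripple, dict(interface_dist))
```

Tracking these percentages period by period, rather than only at project end, is what lets the trends (e.g., the growing share of interface updates) become visible.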

C. Change Data Related to Personnel Activity Data
SCR project engineers reported their activity weekly using activity forms designed by STE researchers (see Norcio and Chmura [20]). The design and code change data can be related to the collected personnel activity data because origination activity was captured for each CRF (see Fig. 2). Fig. 16 presents the ratio of the cumulative changes uncovered during a specific SCR activity (i.e., design, code, and test) to the cumulative project hours expended on that activity. Fig. 17 presents the ratio of cumulative hours for changes uncovered during an activity to the cumulative project hours expended on the activity. Interestingly, both show a similar pattern. Coding activity, which also includes pseudocoding activity, was the most "efficient" way of uncovering needed modifications and errors, followed closely by testing activity. But this was true only initially. In the long run for the SCR project, it seems that design, code, and test activity were all equally efficient in terms of uncovering the need for changes. It should be noted, however, that the amounts of coding (6504 hours) and testing (1188 hours) that had accumulated by January 1984 were small compared to the amount of design (21 742 hours).
The ratio of cumulative error corrections to cumulative project work months and the ratio of cumulative accepted changes to cumulative project work months appear in Fig. 18 (one work month equals 160 person hours). Although the ratios appear to be increasing, both are small compared to the data reported by Weiss and Basili [29], who report approximately 2-3 error corrections per work month.
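The Fig. 18 ratios reduce to a one-line computation once cumulative counts and hours are available. A sketch with hypothetical month-end snapshots (the data are ours, not the SCR figures):

```python
WORK_MONTH_HOURS = 160  # one work month = 160 person hours, as in the paper

def per_work_month(cumulative_count, cumulative_hours):
    """Ratio of cumulative error corrections (or accepted changes) to
    cumulative project work months, as plotted in Fig. 18."""
    return cumulative_count / (cumulative_hours / WORK_MONTH_HOURS)

# Hypothetical snapshots: (cumulative error corrections, cumulative hours).
snapshots = [(10, 8_000), (25, 20_000), (60, 47_500)]
ratios = [per_work_month(n, h) for n, h in snapshots]
print(ratios)  # values well under the 2-3 corrections per work month of [29]
```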

IV. DATA ANALYSES
In previous analyses of SCR personnel activity data, Norcio and Chmura discovered that one ratio between two subactivities of SCR design activity correlates significantly over time with the cumulative design hours for a module [20]. The ratio is between an SCR module's cumulative design discussing hours and its cumulative design creating hours. The ratio has been referred to as the progress indicator ratio (PIR). When the release dates for module specification baselines are examined with respect to a graph of the PIR, patterns are readily apparent that may indicate relative instability of the module interface specification. SCR module interface specifications were rarely updated more than twice after this ratio became "stable" (i.e., showed small monthly change). In other words, if a specification baseline is issued before the ratio rises sharply or during a sharp rise, such a pattern seems to suggest that the baseline is probably far from complete.
A major complication with the PIR is that it requires a data collection scheme that accurately captures intricate information about personnel activity during the design process. Even though this seems possible to do accurately [7], it is fair to say that few software development efforts could readily afford and tolerate the collection operation. Because many design efforts routinely record software change data, we have looked at the SCR change data for information similar to that provided by the PIR. Fig. 16 suggests an alternative: a ratio between the cumulative CRF's uncovered during design of a module and the cumulative design hours for the module. It is an attractive alternative because intuition suggests that a module's interface design might be unstable while its designers are generating and resolving CRF's.
Table III lists some of the second-level modules of the multilevel hierarchy of information-hiding modules resulting from the SCR design activity [5]. These modules had interface specifications with one or more baselines by January 1984. For each of the modules, two time-based ratios between the number of CRF's resulting from that module's design activity and the module's cumulative design hours can be computed and plotted. One ratio is based upon the CRF date of origin; the other, on the date of resolution. Table IV is a summary of the data underlying these ratios for the modules listed in Table III.

A. Date of Origin Ratio
For each module, the date of origin ratio (DOOR) is defined as the ratio of the cumulative CRF's, by date of origin, uncovered during design of the module to the cumulative design hours for the module. DOOR's for SCR modules are presented in Figs. 19-23. The vertical lines in these figures indicate issue dates for module specification baselines. Pearson product moment correlation coefficients (r) and coefficients of determination (r²) between the DOOR's and the original PIR's for each module with ten or more CRF's are presented in Table V [14]. The time period over which correlations are computed begins with the date of origin of the earliest CRF as presented in Table IV.
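A DOOR series can be computed directly from CRF origin dates and logged design hours. The following sketch assumes sorted inputs; the dates and hours are illustrative values of our own choosing, not SCR data:

```python
from bisect import bisect_right
from datetime import date

def door(crf_origin_dates, design_hour_log, sample_dates):
    """Date-of-origin ratio (DOOR): cumulative CRF's uncovered during design
    of a module (counted by date of origin) over the module's cumulative
    design hours, sampled at the given dates.

    crf_origin_dates: sorted dates on which design CRF's originated
    design_hour_log:  sorted (date, hours) entries of design effort
    """
    ratios = []
    for d in sample_dates:
        n_crfs = bisect_right(crf_origin_dates, d)
        hours = sum(h for day, h in design_hour_log if day <= d)
        ratios.append(n_crfs / hours if hours else 0.0)
    return ratios

# Hypothetical data for one module.
origins = [date(1981, 2, 10), date(1981, 3, 5), date(1981, 3, 20)]
hours_log = [(date(1981, 1, 31), 40.0), (date(1981, 2, 28), 60.0),
             (date(1981, 3, 31), 50.0)]
print(door(origins, hours_log, [date(1981, 2, 28), date(1981, 3, 31)]))
```

Replacing the origin dates with resolution dates in the same computation yields the DORR of the next subsection.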
As can be seen in Table V, the correlation between the DOOR and the PIR for the FD module is negative. This is not a problem. It merely means the two ratios are slightly oscillating in opposite directions. The important and significant point is that r² is necessarily positive and significantly high, which means that both ratios are behaving in very similar fashions.

(Even though the number of CRF's for the AT module is only 2, these data are reported here and in Figs. 19 and 24 for completeness. The data for this module were not used in the subsequent statistical analyses.)

B. Date of Resolution Ratio
The date of resolution ratio (DORR) is the same as the DOOR except that the CRF date of resolution is used rather than the date of origin. DORR's for SCR modules are presented in Figs. 24-28. Again, vertical lines indicate baseline issue dates. Pearson product moment correlation coefficients (r) and coefficients of determination (r²) between the DORR's and the original PIR's for each module with ten or more CRF's are presented in Table VI [14]. The time period over which correlations are computed is the same as for the DOOR.
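The statistics in Tables V and VI are ordinary Pearson coefficients between two ratio series sampled at the same dates. A self-contained sketch (with made-up series) also shows why a negative r, as for the FD module, is harmless: squaring r discards the sign, so r² measures the strength of the relationship either way.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length series,
    e.g. a module's DOOR (or DORR) and its PIR sampled at the same dates."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Two hypothetical ratio series that track each other closely but move in
# opposite directions: r is strongly negative, yet r squared is near 1.
pir = [0.2, 0.4, 0.5, 0.7, 0.9]
door_like = [0.9, 0.7, 0.55, 0.4, 0.2]
r = pearson_r(pir, door_like)
print(r, r * r)
```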

C. Possible Implications
Analyses of the design CRF data suggest that, in some cases, fairly simple change and personnel activity data may be used as an alternative to the originally proposed PIR. The DOOR's and the DORR's for modules with a significant number of design changes show a strong relationship to the original PIR's. The DOOR explains 52, 97, and 46% of the variation in the original PIR's for the DI, EC, and FD modules; the DORR, 49, 94, and 50%.
When issue dates for published baselines are superimposed upon the DOOR and DORR plots, patterns reminiscent of those observed with the original PIR are observed. Baselines that appear during times of instability in the DOOR or DORR are soon followed by other baselines. For module designs that have been specified with only one or two baselines, one sees a prior instability in the DOOR and DORR, a downward trend, issuance of the baseline, and then relative stability. For other modules, this pattern is lacking for one or more of the earlier baselines. In other words, the DOOR and DORR both may indicate the incompleteness of interface specifications. If these ratios have not surged and then turned downwards prior to the appearance of a baseline and subsequently stabilized, then the design of the module's interface may not be complete, irrespective of the claims of software engineers and the information in published documents.
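The surge, downturn, and stabilization pattern described above can be mechanized as a rough heuristic over a ratio series. The window and tolerance thresholds below are our own illustrative assumptions, not values from the paper:

```python
def looks_stable(ratio_series, window=3, tolerance=0.05):
    """Heuristic: a DOOR/DORR-style ratio has 'stabilized' if, over the last
    `window` samples, it moved by less than `tolerance` relative to its
    current value, after having peaked and turned downward earlier on."""
    if len(ratio_series) < window + 2:
        return False
    peak = max(ratio_series)
    recent = ratio_series[-window:]
    current = recent[-1]
    surged_and_fell = (peak > current and
                       ratio_series.index(peak) < len(ratio_series) - window)
    settled = max(recent) - min(recent) <= tolerance * max(current, 1e-9)
    return surged_and_fell and settled

# A ratio series that surges, falls, then flattens: a baseline issued now
# is less likely to be premature.
print(looks_stable([0.01, 0.03, 0.06, 0.05, 0.031, 0.030, 0.030]))

# A series still rising: a baseline issued here is probably incomplete.
print(looks_stable([0.01, 0.02, 0.04, 0.07, 0.11]))
```

As the paper cautions, such a signal is only suggestive: a ratio may flatten for unrelated reasons, such as personnel being reassigned.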

V. SUMMARY AND CONCLUSIONS
A study of the SCR project's early change data and analyses of time-based relationships show the following.
1) There was a high proportion of error corrections and error correction effort, although time-based plots of these statistics show that both were on the decrease.
2) The percentage of error corrections that involved completing or correcting a prior change was far higher than has ever been reported, and this percentage was increasing.
3) The percentage of changes that took a day or less to resolve was extremely large, but was decreasing. Consistent with this decrease were a stepwise growth in average change effort, a growth in the percentage of changes that involved modifying module interfaces, and a growth in the percentage of changes involving two or more module interfaces.
4) Surprisingly, no relationship was shown between change effort and the number of days that an error existed in the documentation.
5) Coding activity, followed by testing activity, was the most efficient way of uncovering needed modifications and error corrections. In the long run, however, it seems that design, code, and test activity were all equally efficient.
Analyses of the design CRF data and their relationships to personnel activity data show two ratios that may be useful to design managers in assessing the progress of the software design process. Referred to as the DOOR and the DORR, the ratios exhibit patterns seemingly related to the incompleteness of interface specifications. If these ratios have not surged and then turned downwards prior to the appearance of a baseline and subsequently stabilized, then it would not be surprising to see several more specification baselines in the future. The ratios are attractive alternatives to the earlier-reported PIR because they are based on simple design activity data and on change data close to the kinds typically collected on software projects.
There are some drawbacks to the DOOR and the DORR as potential indicators of design progress. One is that they are later indicators as compared to the original PIR. Another is that they depend heavily on the responsiveness and timeliness of a project's change control process. If changes are not resolved promptly, any potential relationships between these ratios and design progress may be weakened.
It must be noted that we do not claim that the DOOR or the DORR is a measure of the completeness of an interface design. There may be many reasons why the ratios stabilize (e.g., personnel have been assigned to another module or have taken vacations). For SCR modules, however, the ratios do show readily apparent patterns that differ strikingly between modules with a history of many specification baselines and those without such a history.

Fig. 18. Ratios of cumulative accepted CRF's and error corrections to cumulative project months.

Fig. 19. Date of origin ratio for AT.