PROBLEM
Existing research has reported recurring bug fixes, i.e. bug-fixing changes that are identical or similar. In this project, our ultimate goals are:
1. Understanding the nature and the characteristics of recurring/similar bug fixes.
2. Leveraging such understanding to help developers in the bug-fixing process.
APPROACH
1. Key philosophy
To explain the existence of recurring fixes, FixWizard is based on the following philosophy:
Due to the common practice of software reuse, a large object-oriented program tends to contain several objects that play similar roles, i.e. provide similar functions and/or perform similar interactions with other objects. Those functions and interactions are realized by methods and classes that have similar code and/or are used similarly in several usage scenarios. When such functions and interactions need to be changed (e.g. due to bug fixing), the corresponding code/usages are changed in a similar manner, thus creating recurring bug fixes. We call such program entities (methods/classes) with similar usages/interactions/functions code peers.
In brief, the key philosophy/hypothesis in FixWizard is "code peers tend to have similar changes, thus causing recurring fixes".
2. Empirical study
To verify our hypothesis and analyze the characteristics of recurring fixes, we have conducted a two-part empirical study. In the first part, a group of 7 experienced programmers examined all fixing changes of the subject systems and manually identified the similar ones. Then, in the second part, we analyzed their reports to characterize such recurring fixing changes and the enclosing code units. From the empirical study, we observed that:
- A considerable portion of fixing changes (17%-45%) is actually recurring. A large percentage of recurring fixes (85%-96%) was made at the same revision.
- A large percentage of recurring fixes occurs on code peers, i.e. methods/classes having similar functions/usages and/or interactions. Peer classes tend to have several peer methods. To represent usages, we use our graph-based representation models developed in GrouMiner, see ref. [FSE 2009].
- Code peers and their recurring changes involve similar usages (e.g. method invocations, usage orders, etc).
- Code peers often have similar structure and/or names, and are related via inheritance or interface.
Such observations confirm our hypothesis on code peers and recurring fixes. They also provide important characteristics to build an automatic tool to identify such code entities and the recurring changes, and an automatic tool to recommend a fix to other code peer locations.
The detailed report and some interesting examples from our empirical study can be found here.
3. Prototype tool
Based on the empirical study of code peers and recurring fixes, we developed FixWizard, a prototype tool that supports developers with recurring bug fixes. The main task of FixWizard is to identify the code peers in the program; when a code unit is fixed, FixWizard recommends similar fixes to its peers (if any exist). The working process of FixWizard is illustrated in Figure 1.

Figure 1: FixWizard's working process
Identification of Code Peers
FixWizard identifies code peers as "code units (methods/classes) having similar object usages, internally and/or externally":
- Similar internal object usages mean that, within their bodies, the two code units call methods of other classes in a similar manner (i.e. they interact with other objects in a similar way).
- Similar external object usages mean that the two code units are used by other objects, in other scenarios, in a similar way.
Object usages are represented by our graph-based models developed in the GrouMiner project (ref. [FSE 2009]). Similarity among such graph-based models is measured with our vector-based approach Exas (refs. [ICSE 2009, FASE 2009]).
The first step of code peer identification is finding the candidates. To avoid pair-wise comparison among all methods/classes, we use heuristics to find candidates. That is, classes/methods with any one of the following properties are considered code peer candidates:
- Similarity in the structure of source code (structural code clones), or
- Similarity in naming, inheritance, or interfaces, or
- Similarity in past changes, i.e. code units that have actual recurring fixing changes.
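The candidate heuristics above can be illustrated with a minimal sketch. It is not FixWizard's implementation: the `CodeUnit` record, the bucket keys, and the crude "last name token" rule for naming similarity are all assumptions made for illustration. The idea shown is only that units are grouped by shared properties, so that later comparisons are limited to units falling into the same group.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class CodeUnit:
    """Hypothetical summary of one method or class (not FixWizard's data model)."""
    name: str
    supertypes: frozenset = frozenset()    # inherited classes / implemented interfaces
    clone_group: Optional[str] = None      # id assigned by some structural clone detector
    past_fix_ids: frozenset = frozenset()  # recurring-fix groups the unit took part in


def last_name_token(name):
    """Return the last camelCase token, lowercased ("ButtonPainter" -> "painter")."""
    tokens = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return tokens[-1].lower() if tokens else name.lower()


def bucket_keys(unit):
    """Yield one key per property that can make a unit a peer candidate."""
    yield ("name", last_name_token(unit.name))  # assumed stand-in for naming similarity
    for supertype in unit.supertypes:
        yield ("supertype", supertype)
    if unit.clone_group is not None:
        yield ("clone", unit.clone_group)
    for fix_id in unit.past_fix_ids:
        yield ("fix", fix_id)


def candidate_pairs(units):
    """Pair up only units that share a bucket, instead of comparing all pairs."""
    buckets = defaultdict(list)
    for unit in units:
        for key in bucket_keys(unit):
            buckets[key].append(unit.name)
    pairs = set()
    for names in buckets.values():
        for a, b in combinations(sorted(set(names)), 2):
            pairs.add((a, b))
    return sorted(pairs)
```

For example, two classes implementing the same interface, or two methods that took part in the same past recurring fix, end up in one bucket and are returned as a candidate pair; a unit that shares nothing with the others is never paired.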
Among such candidates, FixWizard compares the object usage scenarios in which they are involved. To do such comparisons, FixWizard uses the vector-based similarity measure Exas (ref. [FASE 2009]) on the representation graphs of usages (developed in GrouMiner, ref. [FSE 2009]). Candidates having similar object usages are considered code peers.
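To give a feel for vector-based comparison of usages, here is a simplified sketch. It does not reproduce Exas or the GrouMiner graph model: a usage is reduced to a plain sequence of call names, the features (single calls and consecutive call pairs) and the normalized 1-norm similarity are assumptions chosen for brevity, and the function names are hypothetical.

```python
from collections import Counter


def usage_vector(calls):
    """Count single calls and consecutive call pairs in one usage scenario."""
    features = Counter(calls)
    features.update(zip(calls, calls[1:]))
    return features


def similarity(u, v):
    """Return 1 - (1-norm distance / total size): 1.0 if identical, 0.0 if disjoint."""
    total = sum(u.values()) + sum(v.values())
    if total == 0:
        return 1.0  # two empty usages are treated as identical
    distance = sum(abs(u[k] - v[k]) for k in u.keys() | v.keys())
    return 1.0 - distance / total
```

Under this sketch, two candidates would be treated as code peers when the similarity of their usage vectors exceeds some chosen threshold; the threshold itself would have to be tuned and is not taken from the source.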
Recommendation of Fixes on Code Peers
When an actual fix is made to a code unit X, FixWizard first identifies the code peer(s) of that code unit (via code peer identification).
Then, to recommend the fix made to X to its code peer Y, FixWizard will:
- Determine the structural changes of X as tree edit operations (our tree edit script algorithm Treed, ref. [ASE 2009], is used to derive the operations),
- Identify the changed parts of the object usage graph for X, and map them to changed sub-trees in X's AST,
- Map the changed subtrees of X to the corresponding ones in Y,
- Map the nodes in changed subtrees of X to the changed nodes in Y, and
- For each mapped node in X, derive a similar edit operation (if any) for the corresponding element in Y.
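The last step of the list above, deriving edit operations on Y from those on X, can be sketched as follows. This is an illustration under strong assumptions, not FixWizard's algorithm: Treed and the usage-graph mapping are not reproduced, the edit script for X and the X-to-Y node mapping are taken as given inputs, and the `Node`, `EditOp`, `recommend`, and `apply_ops` names are hypothetical.

```python
from dataclasses import dataclass, field, replace


@dataclass
class Node:
    """Hypothetical AST node with a unique id."""
    uid: str
    label: str
    children: list = field(default_factory=list)


@dataclass(frozen=True)
class EditOp:
    """Hypothetical tree edit operation."""
    kind: str          # "update", "insert" or "delete"
    target: str        # uid of the affected node (the parent for "insert")
    label: str = ""    # new label for "update" / "insert"
    index: int = 0     # child position for "insert"
    new_uid: str = ""  # uid of the node created by "insert"


def recommend(ops_on_x, x_to_y):
    """Translate each edit on X into an edit on Y when its target node is mapped."""
    return [replace(op, target=x_to_y[op.target])
            for op in ops_on_x if op.target in x_to_y]


def index_tree(root):
    """Map uid -> (node, parent) for every node of the tree."""
    table, stack = {}, [(root, None)]
    while stack:
        node, parent = stack.pop()
        table[node.uid] = (node, parent)
        stack.extend((child, node) for child in node.children)
    return table


def apply_ops(root, ops):
    """Apply the recommended operations to Y's tree in place and return its root."""
    table = index_tree(root)
    for op in ops:
        node, parent = table[op.target]
        if op.kind == "update":
            node.label = op.label
        elif op.kind == "insert":
            child = Node(op.new_uid, op.label)
            node.children.insert(op.index, child)
            table[child.uid] = (child, node)
        elif op.kind == "delete" and parent is not None:
            parent.children.remove(node)
    return root
```

Edits whose target in X has no counterpart in Y are simply dropped, which mirrors the "(if any)" qualifier in the step above. A real tool would present the resulting operations to the developer as a recommendation rather than apply them automatically.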
An evaluation of the tool, with the latest results, can be found here.
References
[FSE 2009] Tung Thanh Nguyen, Hoan Anh Nguyen, Nam H. Pham, Jafar M. Al-Kofahi, and Tien N. Nguyen, Graph-based Mining of Multiple Object Usage Patterns. In Proceedings of the 7th Joint meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE 2009). Distinguished Paper Award.
[ASE 2009] Tung Thanh Nguyen, Hoan Anh Nguyen, Nam H. Pham, Jafar M. Al-Kofahi, and Tien N. Nguyen, Clone-aware Configuration Management. To appear in Proceedings of the 24th ACM/IEEE International Conference on Automated Software Engineering (ASE 2009).
[ICSE 2009] Nam H. Pham, Hoan Anh Nguyen, Tung Thanh Nguyen, Jafar M. Al-Kofahi, and Tien N. Nguyen, Complete and Accurate Clone Detection in Graph-based Models. In Proceedings of the 31st ACM/IEEE International Conference on Software Engineering (ICSE 2009), pp. 276-286.
[FASE 2009] Hoan Anh Nguyen, Tung Thanh Nguyen, Nam H. Pham, Jafar M. Al-Kofahi, and Tien N. Nguyen, Accurate and Efficient Structural Characteristic Feature Extraction for Clone Detection. In Proceedings of the 12th International Conference on Fundamental Approaches to Software Engineering (FASE 2009), pp. 440-455.