Problem Statement

Model-Driven Engineering (MDE) has become an important development framework for many large-scale software systems. MATLAB/Simulink is a popular MDE tool for designing and modeling software in many products, from small electronic control software to large-scale flight control systems. In Simulink, models are the primary software artifacts. A model is a collection of logical entities that describes a system at multiple levels of abstraction and from a variety of perspectives.

A previous study showed that, because models are built in graphical editors, cloned fragments of Simulink models often occur in a project. Cloned fragments are fragments of Simulink models that match exactly or are similar to one another. Like traditional source code clones, clones in Simulink models require additional effort for maintenance and management. For example, a change to one fragment must be repeated for every occurrence of the clone. Therefore, detecting clones in models is as important as detecting clones in traditional source code.

Unfortunately, there has been very little work on detecting clones in models. ConQAT represents the state of the art in clone detection for MDE. However, it has several limitations. The most important limitation is its inaccuracy and low degree of completeness in detection. The authors reported that several clones were not detected (e.g., small clones that are covered by larger clone pairs). They also reported that many clones detected by ConQAT are not interesting to developers even though they are clones according to ConQAT's definition. Several detected clone groups are inaccurate and do not carry much meaning for developers. Another key limitation is that ConQAT's algorithm tends to find clones that are as large as possible. Such clones are sometimes too large to be useful and do not correspond well to copy-pasted fragments. Users are easily confused when ConQAT reports such large clones in a graphical editor. Finally, ConQAT cannot detect approximate clones, in which two parts of a model have slight differences. These cases occur often when users copy a fragment and then modify it.

It is therefore necessary to have a method that provides a more accurate, complete, and scalable solution for the detection of clones in models.