Zhenchang Xing - JDEvAn

Updated on: January 2008

JDEvAn (Java Design Evolution and Analysis) is the research prototype that I developed during my PhD studies at the University of Alberta. I use it to implement and evaluate my PhD work on design-evolution analysis in support of evolutionary software development. The goal of JDEvAn is to integrate with the development environment and to enable investigating the change patterns of object-oriented software evolution, exploring the motivations behind them, and guiding future development and maintenance activities.

JDEvAn consists of a backend database and a frontend Eclipse plugin. The very first version of JDEvAn can be downloaded below; it includes the backend database templates and extensions and the frontend Eclipse plugin. The frontend plugin implements the UMLDiff algorithm; the change-tree visualization and navigation of UMLDiff change facts; the interactive correction of erroneously reported changes and the interactive recovery of missed changes when inspecting the UMLDiff results via the change trees; and queries for design-change patterns, such as instances of refactorings. The basic UMLDiff algorithm was published in [9]. The current implementation is the revised algorithm published in [2], which adds features such as taking lexical Javadoc similarity and transitive-closure usage similarity into account when necessary, using the inheritance, containment, and usage dependencies to guide the identification of renamings and moves, and running multiple rounds of renaming/move recognition. In addition to the change-tree visualization, JDEvAn has been equipped since December 2006 with a separate UML-style visualization component, JDEvAn Viewer, which allows interactive exploration and analysis of the UMLDiff change facts and the detected change patterns.

Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/jdevan_src_all.zip
Requirements: PostgreSQL 7.4.5 or later, Eclipse 3.1, Java 1.5.
NOTE: This is the initial version of JDEvAn, including the database template, the database server-side extension for computing transitive-closure relations, and the plugin source code. I am still writing the End User License; basically, JDEvAn is for academic purposes and will be released under some kind of GNU license. I am still working on the tool: the code has not yet been cleaned up and may still undergo substantial changes. Furthermore, memory and time performance are not my major concerns at the current stage. Some tasks, such as the computation of transitive closure and the UMLDiff process, may take several hours or even run overnight, depending on the machine on which you run the database and the plugin. However, it works on all my case studies so far. The machine I am using now is an Intel Centrino 1.6 GHz with 768 MB of physical memory; JDEvAn’s database (PostgreSQL 7.4.5) runs on a Linux workstation (the VMWare guest operating system), and its frontend Eclipse plugin runs on Windows XP Professional on the same machine.
UPDATE:
1. December 20, 2006. The entity table defines one more column, "handle". In the JDEvAn plugin source, the FieldContext, MethodContext, and TypeContext classes have been modified to record the handle of the reverse-engineered Java elements. See the update explanation of the entity table for detailed information.
2. December 20, 2006. The classes UMLDiffer, Field, Callable, and RefType have been modified to allow the computation of the attribute/usage differences of isfromsrc=false elements and the usage differences of newly added/removed elements, in order to simplify the queries used by JDEvAn Viewer.
3. December 20, 2006. Fixed a fact extraction bug with the extra dimensions of parameter/variable/field declaration (type identifier {[]}).

In the early implementation of JDEvAn (which depended on a Prolog-like factbase), I wrote Java programs within the JDEvAn Eclipse plugin to aggregate the elementary change facts reported by UMLDiff at the class and system level in order to construct class- and system-evolution profiles, on which third-party tools, such as WinPhaser and the Weka toolkit, were used to perform subsequent sequential and data-mining analyses. The analysis results were then displayed in JDEvAn's class- and system-evolution views (such as the class evolution histogram, the system evolution matrix, etc.). Note that this early version of JDEvAn was never released, but some of its screenshots appeared in some of my early publications. These third-party tools do not scale well as my case studies become larger, and they limit the analyses I can conduct on large-scale industrial software systems, such as Eclipse. I am now exploring the use of OLAP (On-line Analytical Processing) to address the scalability issue and enable multilevel, multidimensional evolution analysis on large software systems. I have therefore removed the class- and system-evolution analysis components from JDEvAn. In the future, I expect they will be designed, implemented, and delivered as OLAP models built on JDEvAn's database, which stores the ground facts about the design models and their evolution.

The following are the results of two of my empirical case studies, HTMLUnit and JFreeChart.

Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/exampledatabase/htmlunit_tmpl.zip
Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/exampledatabase/jfreechart_tmpl_094_056.zip
Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/exampledatabase/jfreechart_tmpl_0911_094.zip
Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/exampledatabase/jfreechart_tmpl_0917_0911.zip
Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/exampledatabase/jfreechart_tmpl_100_0917.zip
NOTE: These are dumped databases in which every process except UMLDiff has been completed.
UPDATE: December 20, 2006. NOT yet updated with the new column "handle" of entity table.

Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/exampledatabase/htmlunit_r40_m30.zip
Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/exampledatabase/jfreechart_094_056_r40_m30.zip
Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/exampledatabase/jfreechart_0911_094_r40_m30.zip
Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/exampledatabase/jfreechart_0917_0911_r40_m30.zip
Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/exampledatabase/jfreechart_100_0917_r40_m30.zip
NOTE: These are dumped databases with the UMLDiff renaming and move thresholds set to 0.4 and 0.3 respectively.
UPDATE: December 20, 2006. NOT yet updated with the new column "handle" of entity table.

Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/exampleanalysisset/htmlunit_tmpl.zip
Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/exampleanalysisset/jfreechart_tmpl_094_056.zip
Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/exampleanalysisset/jfreechart_tmpl_0911_094.rar
Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/exampleanalysisset/jfreechart_tmpl_0917_0911.rar
Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/exampleanalysisset/jfreechart_tmpl_100_0917.zip
NOTE: These are the frontend analysis set configuration files and the corresponding directories that should be put under workspace/.metadata/.plugins/cs.ualberta.jdevan.

All my case studies have been done either on a single machine that runs both the host operating system and the guest operating system under VMWare, or on two machines on a local area network. Recently, I have been working at NRC-IIT in Ottawa. I found that using the JDEvAn frontend plugin with a remote JDEvAn database (the frontend plugin runs on an NRC machine, while the JDEvAn database runs on my university machine) is much slower than running them on the same machine or on machines on a LAN, because of the much lower data transfer speed between the frontend plugin and the backend database. This is especially noticeable during the fact extraction process and the usage-dependency differencing step of the UMLDiff process. However, the time required to execute SQL queries and retrieve query results seems normal. I do not yet know exactly what causes this problem.

Acknowledgement: I would like to thank my thesis supervisor, Professor Eleni Stroulia, for all her thoughtful assistance and excellent advice and suggestions on the development of the JDEvAn tool.

Backend database

JDEvAn stores all the extracted ground model facts, the UMLDiff changes, and the analysis results in PostgreSQL databases (version 7.4.5 or later). Each analysis set has its own database.

Transitive closure computation

Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/backend/jdevan_transclosure.zip
NOTE: You need to compile and link the program and then put jdevan.so and jdevan.sql in the PostgreSQL libdir and datadir respectively. The Makefile depends on the PostgreSQL source. My working directory is postgresql_sourceroot/contrib/jdevan/. If you do not want to modify the Makefile, you probably need to put the .c, .h, and .sql.in files under that directory structure. Furthermore, you need to configure and compile the PostgreSQL source before compiling the jdevan transclosure server extension.

Relational databases lack recursive computation capabilities, which are essential for computing the transitive closure of the various relations among entities. To address this, Simon’s transitive closure algorithm has been implemented as a PostgreSQL server-side extension (written in C). At the end of the fact-extraction process, it computes the transitive closure of the containment and inheritance hierarchies, of the field read/write, method call, and class creation relations, and of the class/interface usage relations; the results are stored in the transclosure table.

The extension can be called as follows: SELECT * FROM transclosure('tableorview_name', 'firstcolumn_name', 'secondcolumn_name', iscyclic) AS tc(id1 INTEGER, id2 INTEGER). I think this extension can actually be applied to compute the transitive closure of any relation, as long as you have a table or view with at least two columns of type integer. These two columns define a relation, such as parent-child or a usage dependency, between two entities. You also need to specify whether the relation is cyclic or not. The function returns all the transitive closure pairs of entities.

Sometimes, the environment variable $libdir is not properly defined during database installation, so that $libdir/jdevan.so cannot be located when the transclosure() server extension is executed. The problem can be resolved by replacing $libdir in the jdevan_template.sql file with the directory that contains jdevan.so, and then recreating and reimporting the jdevan_template database. You can test whether transclosure() works as follows:

First, create a test database with jdevan_template as its template. On the psql command line, enter CREATE DATABASE testdb WITH TEMPLATE = jdevan_template. Alternatively, on the Linux command line, run createdb testdb and then psql testdb -f jdevan_template.sql.

Secondly, run psql testdb and insert a few rows into the relation table, such as INSERT INTO relation(type, eid1, eid2) VALUES(1, 1, 2), INSERT INTO relation(type, eid1, eid2) VALUES(1, 2, 3), etc.

Finally, on the psql command line, execute SELECT * FROM transclosure('relation', 'eid1', 'eid2', false) AS tc(id1 integer, id2 integer) ORDER BY id1, id2. The last parameter should be true if the data you entered in the second step has cycles. If transclosure() has been set up properly, you should see its output rows.
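To know which pairs to look for in the output of the test above, the transitive closure of a small relation can be worked out with a few lines of code. The Python sketch below is neither Simon's algorithm nor the C extension; it is a plain depth-first reachability computation, and the function name transitive_closure is made up for illustration. It only assumes the same input shape as transclosure(): pairs of integer ids.

```python
from collections import defaultdict

def transitive_closure(pairs):
    """Return all (id1, id2) pairs such that id2 is reachable from id1.

    Naive depth-first search from every source node. Cycles are safe
    because each node is visited at most once per start node.
    """
    successors = defaultdict(set)
    for src, dst in pairs:
        successors[src].add(dst)

    closure = set()
    for start in list(successors):
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(successors.get(node, ()))
    return sorted(closure)

# The two example rows from the second step: (1, 2) and (2, 3).
print(transitive_closure([(1, 2), (2, 3)]))  # [(1, 2), (1, 3), (2, 3)]
```

For the two example rows, one would therefore expect the derived pair (1, 3) in addition to the two direct pairs. How the server extension reports pairs on cyclic input is not something this sketch establishes.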

Reference: K. Simon. An improved algorithm for transitive closure on acyclic digraphs. Theoretical Computer Science 58, Automata, Languages and Programming, 376-386, 1986.

The jdevan_template database

The jdevan_template database defines the meta information and the initial state for every individual analysis set database. The analysis set databases are created from within the frontend Eclipse plugin using the "CREATE DATABASE analysisset_name WITH TEMPLATE = jdevan_template" command. They are then populated with the ground model facts and the UMLDiff results.

Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/backend/jdevan_template.sql
NOTE: The jdevan_template database must be created before you start any real work (on the Linux command line, run "createdb jdevan_template" and then "psql jdevan_template -f jdevan_template.sql").
UPDATE:
1. December 20, 2006. entity table defines one more column "handle". See update explanation of entity table.
2. December 20, 2006. Populated with queries defined in JDEvAn Viewer's visualize.sql (see JDEvAn Viewer).

The purpose and usage of the database tables are as follows:

pjsnapshot table: The pjsnapshot table contains the system snapshot information (it corresponds to the <system ...> node of the analysis set). It also records what has been done for a given snapshot: classusedone = true if "Aggregate class usage" is complete; tcdone = true if "transitive closure computation of relations" is complete; pkusedone = true if "Aggregate package/subsystem usage" is complete; umldiffdone = true if "UMLDiff the given snapshot and its immediate predecessor" is complete. sysevodone and discdone are no longer used. There is a predefined virtual snapshot 0 that contains nothing.
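As an illustration of how these flags could be read, the following Python sketch picks the next pending step for a snapshot. It is not part of JDEvAn: the function next_step and the dictionary representation of a pjsnapshot row are hypothetical, and it assumes the steps are always run in the order Aggregate class usage, Compute transitive closure, Aggregate package/subsystem usage, UMLDiff.

```python
# Flag column -> plugin action that sets it, in the assumed execution order.
STEPS = [
    ("classusedone", "Aggregate class usage"),
    ("tcdone", "Compute transitive closure"),
    ("pkusedone", "Aggregate package/subsystem usage"),
    ("umldiffdone", "UMLDiff"),
]

def next_step(snapshot_row):
    """Return the first action whose flag is not yet true, or None if all are done.

    snapshot_row is a plain dict standing in for one pjsnapshot row;
    a missing flag is treated as false.
    """
    for flag, action in STEPS:
        if not snapshot_row.get(flag, False):
            return action
    return None

print(next_step({"classusedone": True, "tcdone": False}))
```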

entity table: Each extracted entity has a unique id, which is referenced in the relation, transclosure, status, statustc, and some other tables. Each entity is described in terms of its category (in bitwise form, defined in the entity_category table); its name; its visibility and non-access modifiers (in bitwise form, defined in the modifier table; I also use this field to remember whether the entity is deprecated and/or javadoced, whether a field is initialized, and whether a method is overloaded); whether it belongs to the system source code or to a library [isfromsrc]; its location in the source file [srcloc, a byte stream that stores the information used to open the Eclipse Java source editor and highlight the entity]; the version of the system to which it belongs [snapshotid]; and its javadoc (all the material before the first explicit doc tag; if it is an empty string, the entity is considered not javadoced). The name of an array type has the form “basetype_qualifiedname.dimension”. The name of a package, named class, interface, or field is its declared identifier. The name of a method or constructor has the form “identifier(paramtypetype_list)”. JDEvAn’s fact extractor assigns names to anonymous classes and initializers as follows: for anonymous classes, “new supertype_identifier”; for class initializers, “{classidentifier_initializer}”; for field initializers, “{field_identifier=…}~initializerstring”. Finally, a fully qualified prefix is added in front of the names of library entities.
IMPORTANT UPDATE: December 20, 2006. The entity table defines one more column, "handle", which holds the string representation of the handle of the Eclipse Java element (IJavaElement.getHandleIdentifier()) being reverse-engineered. These handles are used to recreate the corresponding Java elements in order to bring up the Eclipse Compare Dialog, which shows the text comparison of mapped fields, methods, or constructors from within JDEvAn Viewer (see Figure 8 in JDEvAn Viewer). If you are a new user of JDEvAn and JDEvAn Viewer, this will not affect you. If you already have some JDEvAn databases and want to use them with JDEvAn Viewer, you have to modify the entity table manually (ALTER TABLE entity ADD COLUMN handle text; ALTER TABLE entity ALTER COLUMN handle SET DEFAULT ''). However, this is only a workaround that does not actually populate the handle column, so double-clicking a mapped field/method/constructor in JDEvAn Viewer will not bring up the Eclipse Compare Dialog. If you really want to use that feature of JDEvAn Viewer, you have to redo the fact extraction process.

relation table: The relation table contains tuples of the form (relation_type, v1, v2), where v1 and v2 are entity ids and relation_type is a UML dependency between them (defined in the relation_type table). The number of occurrences of each distinct usage relation (including use/classuage/pkgusage/subsystemuse, creator, throw/actualthrow/catchexception) and localvariable relation is recorded in the relcount table. All relation types are straightforward except the following:
1) A callable entity (method or constructor) has a methodparam relation with each of its parameter entities, and each parameter entity has a paramtype relation with a type entity. A callable entity also has a methodparamtype relation with each of its distinct parameter types (there may be fewer of these than parameters). This relation exists only for efficient access; the same information can always be obtained by retrieving methodparam and then paramtype.
2) A block entity may have throw (callables only), actualthrow, and catchexception relations: throw is for the throws clause, actualthrow is for the throw statement, and catchexception is for the catch clause.
3) A block entity has a localvariable relation with each of its distinct local variable types.
4) The usage relations of initializer entities and of the members of anonymous classes are transitively passed to their containing named entities; such relations are combined bitwise with transusage to distinguish them from normal usage relations.

transclosure table: The transclosure table contains the transitive closure relations of the containment (type=1) and inheritance (type=6) hierarchies, of the field/method usage relations (type=33272), and of the class/interface usage relations (type=8192).
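The type values listed for the transclosure table look like bit masks, which would be consistent with the "bitwise form" used elsewhere in the schema; that reading is an assumption here, and the meaning of each individual bit is defined in the relation_type table, not in this sketch. Under that assumption, the following Python snippet (the helper set_bits is made up for illustration) decomposes each value into the powers of two it is built from.

```python
def set_bits(value):
    """Return the powers of two that add up to value, smallest first."""
    return [1 << i for i in range(value.bit_length()) if value & (1 << i)]

for type_value in (1, 6, 33272, 8192):
    print(type_value, set_bits(type_value))
# 1 [1]
# 6 [2, 4]
# 33272 [8, 16, 32, 64, 128, 256, 32768]
# 8192 [8192]
```

Read this way, type=6 and type=33272 would each cover several elementary relation types at once, while type=1 and type=8192 are single bits.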

generaloverrides/overrides/implements/topmostoverrides tables: These tables contain the override+implement, override-only, implement-only, and topmost-override relations between two method entities.

status, statustc, wronggeneralmatch, similaritypoint tables: The UMLDiff changes are stored in the status table in the form <scategory, stype, prev, next>. scategory and stype represent the different categories and types of structural changes, as defined in the scategory and stype tables. prev and next are integer arrays containing the id of the entity and, if necessary, the id of the related entity and their attributes. The statustc table records the evolution traces of mapped entities. When inspecting change trees, you can manually remove erroneously identified general-matches, in which case the status entry is copied to the wronggeneralmatch table for later investigation; you can also manually identify missed general-matches, in which case the matchpoint field of the new status entries is -1. The similaritypoint table contains all the information used to compute the matchpoint of identified renamings and moves, which you can inspect for clues about, for example, why a renaming was erroneously reported.

Other tables and views: There are some other tables and views that are used internally by JDEvAn, which ordinary JDEvAn users need not be concerned with.

Frontend Eclipse Plugin

The JDEvAn frontend has been developed as an Eclipse plugin. This first version has only been tested with Eclipse 3.1.0 on Windows XP. However, I do not think anything in it (or at least not much) is specific to the Windows platform.

Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/frontend/jdevan_plugin.zip
NOTE: You need to compile the source code, export jdevan.jar, and copy it, together with plugin.xml and the icons and lib directories, to eclipse_install_dir/plugins/cs.ualberta.jdevan/.
UPDATE:
1. December 20, 2006. The classes FieldContext, MethodContext, and TypeContext have been modified to record the handle of the reverse-engineered Java elements. See the update explanation of the entity table. Furthermore, the classes UMLDiffer, Field, Callable, and RefType have been modified to allow the computation of the attribute/usage differences of isfromsrc=false elements and the usage differences of newly added/removed elements, in order to simplify the queries used by JDEvAn Viewer.
2. December 20, 2006. Fixed a fact extraction bug with the extra dimensions of parameter/variable/field declaration (type identifier {[]}).

The analysis set

You need to define an analysis set for the project whose evolution you want to analyze. An analysis set is defined in an XML file. It contains the database connection information, the UMLDiff parameters, and the system versions to be compared. Different analysis sets can have completely different settings. The following are two examples of analysis sets: one for the 11 versions of the HtmlUnit project, the other for versions 2.1.3 and 3.0 of the Eclipse JDT-related plugins.

Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/frontend/htmlunit_tmpl.xml
Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/frontend/jdt_30_213_tmpl.xml
Download: http://www.cs.ualberta.ca/~stroulia/Zhenchang_Xing_Old_Home/jdevan/frontend/jdt_30.txt
NOTE: The processed attribute indicates whether the fact extraction process for a given subsystem/package is complete. It should initially be false and will be set to true by the JDEvAn plugin after the fact extraction of the corresponding subsystem/package is complete.

The analysis set name (not file name) is also the database name to be created. The database server and port, the database user name and password (plain text, not encrypted for now) are also specified.

The NameSimilarityMetric specifies which one of three name-similarity metrics will be used by UMLDiff: 1 = the common adjacent character pairs; 2 = the character-based longest common subsequence; 3 = the word-based longest common subsequence (the sequence of words is produced by splitting the declared identifiers using underscores, dots, and case switching as delimiters). The RenameThreshold and MoveThreshold are the minimum similarity values that two entities in the two compared versions must reach in order to be considered the same conceptual entity renamed or moved. UMLDiff allows multiple rounds (RenameRound and MoveRound) of renaming and/or move identification in order to recover as many renamed and moved entities as possible. The similarity of their Javadoc comments (JavadocSimilarity) may also be taken into account when comparing two entities, if the compared entities have an initial similarity above the JavadocThreshold; this prevents entities with very low name and structure similarity from qualifying as renamings or moves just because of their Javadoc comments. The similarity of transitive-closure relations (TransclosureSimilarity) between two compared entities may also be used to assess similarity. The DiffUsage and DetermineMoveBehavior parameters tell UMLDiff whether or not to compute the usage-dependency differences for all entities and to analyze the redistribution of method behavior (in terms of field read/write, method call, and class instantiation) at the end of the UMLDiff process.
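To make the three name-similarity metrics more concrete, here is a minimal Python sketch of one plausible reading of each. It is not UMLDiff's implementation: the function names, the lower-casing, the Dice-style normalization of the character-pair count, the normalization of the LCS length by the longer input, and the exact word-splitting rule are all assumptions made for illustration.

```python
import re

def char_pairs(s):
    """All adjacent character pairs of s, in order."""
    return [s[i:i + 2] for i in range(len(s) - 1)]

def pair_similarity(a, b):
    """Metric 1 sketch: shared adjacent character pairs (Dice coefficient)."""
    pairs_a, pairs_b = char_pairs(a.lower()), char_pairs(b.lower())
    if not pairs_a and not pairs_b:
        return 1.0 if a.lower() == b.lower() else 0.0
    remaining = list(pairs_b)
    common = 0
    for pair in pairs_a:
        if pair in remaining:
            remaining.remove(pair)  # each pair of b can be matched only once
            common += 1
    return 2.0 * common / (len(pairs_a) + len(pairs_b))

def lcs_length(xs, ys):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(ys) + 1)
    for x in xs:
        cur = [0]
        for j, y in enumerate(ys, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def lcs_similarity(xs, ys):
    """Metrics 2 and 3 sketch: LCS length divided by the longer input length."""
    longest = max(len(xs), len(ys))
    return lcs_length(xs, ys) / longest if longest else 1.0

def split_words(identifier):
    """Split on underscores, dots, and case switches; return lower-case words."""
    words = []
    for part in re.split(r"[_.]+", identifier):
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z0-9]+", part))
    return [w.lower() for w in words]

print(split_words("getLongDescAttribute"))   # ['get', 'long', 'desc', 'attribute']
print(lcs_similarity("getName", "getNames"))  # character-based: 0.875
print(lcs_similarity(split_words("getLongDescAttribute"),
                     split_words("getLongDescription")))  # word-based: 0.5
```

With a RenameThreshold of 0.4, for example, both pairs above would pass the name-similarity test under these particular normalizations; the real UMLDiff also weighs structural similarity, which this sketch ignores.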

One analysis set may include one or more system snapshots (these can be major releases or just snapshots checked out from a repository). The system (you can think of a system as a special top-level subsystem) and subsystem elements are used to organize nested subsystems and/or packages. You can define the subsystems freely, but in the end some subsystems must correspond to the Eclipse projects opened in the Eclipse workspace. The HtmlUnit example is the simplest one. It defines 11 system snapshots, which correspond to 11 Eclipse projects. All the packages (<package name="*" ...) in these projects are direct children of the corresponding system. The Eclipse JDT example is a bit more complicated. It defines two system snapshots, for versions 2.1.3 and 3.0, which contain six and nine second-level subsystems respectively. These second-level subsystems correspond to the different JDT plugin projects in the Eclipse workspace. They may also contain nested subsystems. In the end you must specify which package(s) each subsystem contains; you cannot use <package name="*" ... here. I know this is tedious, so I can provide a little help: modify plugin.xml to uncomment the "Generate Analysis Set Initial File" action, and then define an Eclipse working set that includes the projects. This action outputs a text file (like jdt_30.txt) in the directory workspace/.metadata/.plugins/cs.ualberta.jdevan/, whose contents you can copy and paste into the analysis set file. It would not be difficult to output the analysis set file directly in XML format; I will probably add that later. The subsystem/package hierarchy can be completely different from the way in which the Eclipse projects are organized.

JDEvAn supports incremental analysis. When you have new releases or snapshots, you can simply add them at the end of the analysis set file. JDEvAn will only do the fact extraction, the UMLDiffing, and the analysis for the newly added versions. Removing or adding one or more snapshots between two existing snapshots is possible, but for now it can only be done manually (by editing the database directly); the JDEvAn Eclipse plugin offers no direct support for it at the current stage. However, I think this is a desirable feature. For example, when the UMLDiff results between two major releases show poor precision and recall, one may want to add some monthly snapshots in between (thus shortening the time lapse between compared snapshots) to get better results about what has changed and how. Conversely, if there are not many changes between several consecutive snapshots, one may want to remove the snapshots in between in order to decrease the database size and get better query performance. I may include support for this feature in the future.

The fact extraction and UMLDiff

Given an analysis set, you should have the corresponding Eclipse projects opened in the workspace as follows:

[Figure: htmlunit_eclipseproject]

To use the JDEvAn plugin, open the Design Evolution Analysis perspective and activate the Design Evolution Analysis view. Then go through the following steps: Open analysis set -> Aggregate class usage -> Compute transitive closure -> Aggregate package/subsystem usage -> UMLDiff. Some tasks are potentially time-consuming. You can cancel the current task at any time. However, to maintain data integrity, the JDEvAn plugin only checks for cancellation requests at certain points: after it finishes fact-extracting the current source file, after it finishes aggregating usage or computing transitive closure for the current snapshot, and after it finishes UMLDiffing the current two snapshots. Relying on the job feature of Eclipse 3.1, JDEvAn can run as many analysis jobs as you want (of course, you need a very good machine to do that; you have to consider things like physical memory, database performance, etc.). Different jobs will not interfere with one another.
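The cancellation behaviour described above, where a request only takes effect once the current unit of work has finished, can be sketched generically. The Python below is not JDEvAn's code: process_units, handle_unit, and the use of threading.Event as the cancellation flag are assumptions chosen for illustration.

```python
import threading

def process_units(units, handle_unit, cancel_event):
    """Process units one by one, honouring cancellation only between units.

    A unit stands for one source file, one snapshot, or one pair of
    snapshots. handle_unit is never interrupted midway, so the data
    written for a unit is always complete.
    """
    done = []
    for unit in units:
        handle_unit(unit)           # runs to completion
        done.append(unit)
        if cancel_event.is_set():   # cancellation is checked only here
            break
    return done

# Example: a cancel request arrives while the second unit is being handled.
cancel = threading.Event()
finished = process_units(
    ["A.java", "B.java", "C.java"],
    lambda unit: cancel.set() if unit == "B.java" else None,
    cancel,
)
print(finished)  # ['A.java', 'B.java']
```

The design choice is that the second unit is still completed and recorded even though the request arrived in the middle of it, which is what keeps the stored data consistent.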

After each step, some database tables will have changed considerably. I found that running the PostgreSQL command "vacuum analyze affected_table_name" can improve database performance.

Action | Affected tables
Open analysis set | entity, relation, relcount, access_loc
Aggregate class usage | relation, relcount
Compute transitive closure | transclosure, generaloverrides, overrides, implements, topmostoverrides
Aggregate package/subsystem usage | relation, relcount
UMLDiff | status, statustc, similaritypoint, diffusage, containchange, inheritchange, status_umldiff
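The mapping from actions to affected tables can be turned into the vacuum statements to run after each step. The Python sketch below only builds the SQL text and does not connect to a database; the names AFFECTED_TABLES and vacuum_statements are made up for illustration.

```python
# Action -> tables it alters heavily, copied from the list above.
AFFECTED_TABLES = {
    "Open analysis set": ["entity", "relation", "relcount", "access_loc"],
    "Aggregate class usage": ["relation", "relcount"],
    "Compute transitive closure": [
        "transclosure", "generaloverrides", "overrides",
        "implements", "topmostoverrides",
    ],
    "Aggregate package/subsystem usage": ["relation", "relcount"],
    "UMLDiff": [
        "status", "statustc", "similaritypoint", "diffusage",
        "containchange", "inheritchange", "status_umldiff",
    ],
}

def vacuum_statements(action):
    """Return one VACUUM ANALYZE statement per table affected by the action."""
    return ["VACUUM ANALYZE %s;" % table for table in AFFECTED_TABLES[action]]

for statement in vacuum_statements("Aggregate class usage"):
    print(statement)
# VACUUM ANALYZE relation;
# VACUUM ANALYZE relcount;
```

The printed statements can be pasted into psql while connected to the analysis set database.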

[Figure: open_analysiset]

Open analysis set displays an open-file dialog that lets you choose an analysis set file. The file can be stored anywhere, but I normally place analysis set files in the workspace/.metadata/.plugins/cs.ualberta.jdevan directory. (The first time you open the Design Evolution Analysis perspective, it creates the subdirectory cs.ualberta.jdevan in the workspace/.metadata/.plugins directory.) After you select and open a file (there is no validation, so you must make sure it is in the right format), JDEvAn attempts to connect to the database as specified; if a database with the given name does not exist, it creates the database using jdevan_template. If there are any projects that have not yet been fact-extracted, the fact extraction process starts automatically. Finally, a tabbed view (containing the General Query view, the Inheritance change-tree view, the Containment change-tree view, and the Refactorings view) is opened as follows. These views are explained in some detail below. You can then run Aggregate class usage, Compute transitive closure, and Aggregate package/subsystem usage. You must complete all these steps before starting the UMLDiff process.

[Figure: htmunit_tabbedivew]

If you want to test different UMLDiff threshold settings on the same analysis set, you do not need to repeat the above process. What I do is the following: make a template database, such as htmlunit_tmpl, in which every process except UMLDiff has been completed; dump the database to a file (pg_dump htmlunit_tmpl > htmlunit_tmpl.out); then create another database (createdb htmlunit_r40_m30) and import the data from the dump file (psql htmlunit_r40_m30 -f htmlunit_tmpl.out). In this way, you get an exact copy of the database, as if you had gone through the whole fact extraction, aggregation, and transitive closure computation process again. You also need to duplicate htmlunit_tmpl.xml and the htmlunit_tmpl subdirectory in workspace/.metadata/.plugins/cs.ualberta.jdevan and rename the copies to htmlunit_r40_m30. Finally, in the analysis set configuration file htmlunit_r40_m30.xml, change the name of the connected database to htmlunit_r40_m30 and specify the desired UMLDiff configuration.
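The copy procedure above can be scripted. The following Python sketch only assembles the three shell commands quoted in the text and does not run them; the helper names clone_commands and variant_name are hypothetical, and the "_rNN_mNN" naming rule is inferred from the example database names (rename threshold 0.4 and move threshold 0.3 giving r40_m30).

```python
import shlex

def variant_name(base, rename_threshold, move_threshold):
    """Build a name such as htmlunit_r40_m30 from the two thresholds."""
    return "%s_r%d_m%d" % (
        base, round(rename_threshold * 100), round(move_threshold * 100))

def clone_commands(template_db, new_db):
    """Return the shell commands that copy template_db into new_db."""
    dump_file = template_db + ".out"
    return [
        "pg_dump %s > %s" % (shlex.quote(template_db), shlex.quote(dump_file)),
        "createdb %s" % shlex.quote(new_db),
        "psql %s -f %s" % (shlex.quote(new_db), shlex.quote(dump_file)),
    ]

target = variant_name("htmlunit", 0.4, 0.3)
for command in clone_commands("htmlunit_tmpl", target):
    print(command)
# pg_dump htmlunit_tmpl > htmlunit_tmpl.out
# createdb htmlunit_r40_m30
# psql htmlunit_r40_m30 -f htmlunit_tmpl.out
```

Duplicating and renaming the XML file and the subdirectory in the workspace still has to be done separately, as described above.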

UMLDiff is a potentially long computation. Turning on JavadocSimilarity and TransclosureSimilarity makes the UMLDiff process take longer to complete. If you have many system snapshots to UMLDiff, you probably need to vacuum analyze the affected tables in the middle of the UMLDiff process. However, you do not need to wait until all the snapshots have been UMLDiffed: you can let the UMLDiff process Run in Background (an Eclipse feature) and start inspecting the change trees and analyzing the UMLDiff results for the snapshots that have already been UMLDiffed.

The visualization and analysis

NOTE: For a detailed discussion of the various types of design-evolution analyses, please read my recent publications.

To enable an intuitive means of communicating and inspecting all the design-change facts produced by UMLDiff, I have developed the change-tree visualization. Furthermore, various types of queries have been defined (the user can define their own) to summarize and analyze the reported changes.

The change-tree visualization

There are two types of change trees, inheritance and containment, which are essentially the same but follow the inheritance and containment spanning trees of the software model respectively. Note that JDEvAn is also equipped with a separate UML-diagram visualization component, JDEvAn Viewer. It is easier to lay out, navigate, and browse a large number of model elements and their changes in the change-tree visualization than in JDEvAn Viewer. The change-tree visualization is useful during the interactive inspection of UMLDiff results, for correcting erroneously reported changes and identifying missed renamings/moves. However, JDEvAn Viewer's UML-style visualization is much more intuitive than the change-tree visualization for inferring the underlying rationale that motivates the design evolution.

As an Eclipse plugin, JDEvAn reuses and extends the visualization of Eclipse’s Java DOM model. Therefore, in change tree, consistent with IDE’s convention, the different icons to the left of each node represent the different object-oriented entities: package, class, interface, field, and method/constructor. Their different colors represent the entity’s visibility. The top-right adornment shows the attributes of the entity, for example, abstract, constructor, static, final, etc. The bottom-right adornment represents method override or implementation. The only extension is the bottom-left adornment that represents the UMLDiff result of a particular entity between two compared versions: it can be the plus sign for add, minus sign for remove, 01 for rename, arrow with a minus sign for move out from source, arrow with a plus sign for move into target, and the star sign indicating extracting or inlining operation, in terms of field-read/write, method-call, and/or class-instantiation, from or into the specific mapped method/constructor. The question mark icons with yellow color are message node, just for displaying some information. The questions mark icons with red color are query node (when expanded, a query will be executed. The returned entities or information will then be displayed).

[Figure: containment change-tree]

The above figure shows a partial containment change-tree between releases 1.3 and 1.4 of the HtmlUnit project (the root of the change-tree indicates the two compared snapshots). The class HtmlFrame existed in both release 1.3 and 1.4. Its method getLongDescAttribute() was moved out to the class BaseFrame (see the third bracket on the status bar); it was actually pulled up into a newly extracted superclass, as the inheritance change-tree below shows. The class HtmlFrame implemented the java.lang.Cloneable interface in both releases, but it no longer implemented the interface WebWindow in release 1.4. The outgoing method call of the moved method getLongDescAttribute() was exactly the same in both versions (both called HtmlElement.getAttributeValue(String) once). The id of the selected entity is shown in the first bracket on the status bar. The second bracket shows the number of renamed, moved-in, and moved-out entities contained in the selected entity. For example, if you select the package com.gargoylesoftware.htmlunit.html, the second bracket will be {R:2,MI:33,MO:32}, which indicates that two direct or indirect children of the selected package were renamed, 33 children were moved in from somewhere else, and 32 children were moved out from their original place.

[Figure: inheritance change-tree]

The above figure is the inheritance change-tree between release 1.3 and 1.4. In the inheritance view, you can clearly see the Extract Superclass refactoring, involving the new superclass BaseFrame, the two existing subclasses HtmlFrame and HtmlInlineFrame, the modifications of their inheritance (see the status bar), and the fields/methods pulled up between them. Note that the second bracket summarizes that 17 children of the class HtmlFrame were moved out.

By right-clicking in the change-tree view, you can: navigate the change-trees; refresh the selected tree node (after manually correcting erroneously reported changes or identifying missed renamings and/or moves); toggle to show only changed children (we say an entity is changed iff it is modified in some way or at least one of its direct or indirect children is modified in some way); and compare the usage differences of the selected entity as it evolves from one version to the next. With DiffUsage="true", UMLDiff computes the usage differences at the end of the UMLDiff process, which is a time-consuming step. You can turn it off by setting DiffUsage="false" in the analysis set configuration file and request the computation on demand when inspecting the change-tree. However, the usage differences are a prerequisite for detecting behavior moved among methods and constructors, so if you want UMLDiff to analyze the movement of operation behavior, you have to enable DiffUsage. The usage relations of removed entities are all removed, while the usage relations of new entities are all newly added. But the removal or addition of a usage relation does not imply the removal or addition of the related entity. A small "(-)" or "(+)" at the end of a usage relation indicates that the related entity was also removed or newly added respectively. No sign indicates that the related entity exists in both compared versions, although the usage dependency between the selected entity and it may have been removed, newly added, increased, or decreased.

When inspecting the change-trees, you may identify some instances of erroneously reported matches, renamings, and/or moves. JDEvAn lets you correct them on demand. In the change-tree viewer, right-click on the entity whose change status you think is wrong, and select “Correct wrong XXX” in the context menu. JDEvAn will programmatically correct the selected wrong instance, all the consequently wrong instances, and all the dependencies it refers to or is referred to by. For example, if a class is erroneously identified as renamed, then none of the matches/renamings of its fields and/or methods make sense. If you choose to correct the erroneously reported class renaming, JDEvAn will programmatically correct the wrong class renaming and all the matched/renamed fields/methods in that class. The match status of usage dependencies, such as class creation, will also be altered so that one is removed and the other is added. For the moves of fields/methods, you can select a pair of source-target entities and correct just that particular instance. Or, if the move of a field/method is completely wrong wherever it comes from or goes to, you can select just the source or the target and JDEvAn will correct all the moves related to the selected entity.

You may also find some instances of matches, renamings, and moves that are missed by UMLDiff. You can select a pair of such entities and choose "Link As XXX" in the context menu to assert the missed instances as matches, renamings, or moves in the database. This is just like running UMLDiff in a small context. That is, instead of starting from the system snapshots, UMLDiff is given the two selected entities (for example, a missed pair of renamed classes) and then recursively finishes the differencing process for their children.

In addition to correcting erroneous and missed instances individually on demand, you can collect all the erroneous and missed instances in the files wronggeneralmatch and missedgeneralmatches respectively and put them in the corresponding analysis set directory, such as workspace/.metadata/.plugins/cs.ualberta.jdevan/htmlunit_r40_m30. After opening the corresponding analysis set, you can then do "Batch correct erroneous and missed instances". You can also correct erroneous and missed behavior movement in a similar way, by collecting the instances of erroneously reported and missed moving behavior in the files wronglogicmoves and missedlogicmoves respectively. Some examples of such files are contained in the analysis set directories zipped in htmlunit_tmpl.zip and jfreechart_tmpl_xxx_xxx.zip.

[Figure: source editor opened by double-clicking an entity]

Double-clicking an entity tree node (if the entity's isfromsrc=true) opens the source editor and highlights the selected entity. For general-matched entities, both the before and after versions are opened.

The general queries

The change-tree visualizations provide an intuitive way to see what has been changed and how. They are very useful for inspecting UMLDiff results. However, I normally start with some general queries to get an overview of the model and change facts. Some of the queries are integrated in the change-tree views and are executed when you select an entity or expand a query node (as discussed above). More pre-defined queries can be issued from within the General Query view.

[Figure: General Query view]

The General Query view is simply a table view. It works much like any database frontend: you issue queries from it and the results are displayed in a table. Some queries are built into JDEvAn for checking various types of UMLDiff results and for identifying instances of refactorings and some code/evolution smells. The above figure shows the result of the Extract Superclass query, which returns the superclass, the subclass, the moved-out field/method, and the corresponding moved-in field/method (not all columns are visible here). The refactoring involving BaseFrame, HtmlFrame, and HtmlInlineFrame is highlighted.

You can also create your own queries with New Query..., which simply displays a dialog for you to enter the SQL queries. Some more query examples are given below.

For an entity whose id=x, find me what changes have been made to its successor in snapshot 5.
SELECT s.scategory, s.stype, s.prev, s.next
FROM statustc stc, entity e, status s
WHERE stc.prev=ARRAY[x] and stc.next[1]=e.id and e.snapshotid=5 and s.next[1]=e.id

Find me all the newly added subclasses (may be indirect) of the class whose name starts with "Html" in snapshot 7
SELECT qualifiedname(e1.id)
FROM status s, entity e1, extendtc etc, entity e2
WHERE s.scategory=1 and s.stype=1 and s.next[1]=e1.id and e1.category=16 and etc.eid1=e1.id and etc.eid2=e2.id and e2.category=16 and e2.name like 'Html%' and e2.snapshotid=7

Find all the renamed methods that no longer take a parameter of the removed class whose name is "xxx" (I do not know which snapshot it belongs to)
SELECT e.snapshotid, qualifiedname(e.id)
FROM status s1, status s2, entity e
WHERE s1.scategory=65536 and s1.stype=2 and s1.prev[3]=s2.prev[1] and s2.scategory=1 and s2.stype=2 and s2.prev[1]=e.id and e.name='xxx'

If you find that some of your queries are very useful, please send them to me. I will make them available to others who might also be interested. Thanks.

The refactoring visualization

This tab view is obsolete. The UML-diagram-style visualization of UMLDiff change facts and the detected refactorings has been implemented in a separate Eclipse plugin, JDEvAn Viewer.


 
© 2008 Zhenchang Xing. All rights reserved.