Tuesday, June 1, 2010

Duplicate bug report identification results



The duplicate bug report analysis was done using linguistic features [1] alone. Let us look at the process in detail below.

Input : Two input configurations were compared. In the first, the fields "Summary", "Description" and "Comments" were all considered; this is denoted as SDC (Summary, Description and Comments). In the second, the field "Comments" was left out, leaving "Summary" and "Description" alone; this is denoted as SD (Summary and Description).
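The two input configurations can be sketched as follows. This is only an illustration: the dictionary keys "summary", "description" and "comments" and the helper name build_text are hypothetical, since the post does not say how the reports were stored.

```python
def build_text(report, include_comments=True):
    """Join the fields of one report into a single text.

    include_comments=True  gives the SDC input.
    include_comments=False gives the SD input.
    The field names used here are hypothetical.
    """
    parts = [report.get("summary", ""), report.get("description", "")]
    if include_comments:
        # Comments are assumed to be stored as a list of strings.
        parts.extend(report.get("comments", []))
    # Skip empty fields so the result has no blank lines.
    return "\n".join(p for p in parts if p)


report = {
    "summary": "Login fails",
    "description": "Error on submit",
    "comments": ["Same here", "Fixed in trunk"],
}
sdc_text = build_text(report)                          # SDC
sd_text = build_text(report, include_comments=False)   # SD
```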

Preprocessing : Stop words were removed and stemming was performed. To observe the impact of stemming, results are compared with and without stemming.
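A minimal sketch of this preprocessing step is given below. The stop word list and the suffix-stripping stemmer are tiny stand-ins chosen for illustration; the post does not say which stop word list or stemmer was actually used, and a real stemmer (for example a Porter stemmer) would behave differently.

```python
import re

# Tiny illustrative stop word list (an assumption, not the list used in the post).
STOP_WORDS = frozenset(
    {"a", "an", "and", "the", "is", "in", "it", "of", "on", "to", "when"}
)


def naive_stem(token):
    """Strip one common English suffix. A crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters of the word after stripping.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, stem=True):
    """Lowercase, tokenize, drop stop words and optionally stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens] if stem else tokens


with_stemming = preprocess("The page crashes when saving patients")
without_stemming = preprocess("The page crashes when saving patients", stem=False)
```

The stem flag corresponds to the "Stemming" column (Yes/No) in the results.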

Feature vector : Both term frequency (TF) and term frequency-inverse document frequency (TF-IDF) weightings are used, and the two are compared.
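The two weightings can be sketched as below. Several details are assumptions: the post does not give the exact IDF formula or the similarity measure, so this sketch uses the common log(N / df) variant and cosine similarity, and all function names are hypothetical.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Raw term counts for one report."""
    return dict(Counter(tokens))


def idf_weights(token_lists):
    """IDF per term as log(N / df), one common variant (an assumption)."""
    n = len(token_lists)
    # Document frequency: the number of reports that contain each term.
    df = Counter(t for tokens in token_lists for t in set(tokens))
    return {t: math.log(n / d) for t, d in df.items()}


def tfidf_vector(tokens, idf):
    """Term counts scaled by IDF; terms missing from idf get weight 0."""
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokens).items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    ["login", "error", "page"],
    ["login", "error", "form"],
    ["login", "patient"],
]
idf = idf_weights(docs)
tf_similarity = cosine(tf_vector(docs[0]), tf_vector(docs[1]))
tfidf_similarity = cosine(tfidf_vector(docs[0], idf), tfidf_vector(docs[1], idf))
```

In this toy corpus "login" occurs in every report, so its IDF is zero and it no longer contributes to the TF-IDF similarity, while it still counts fully under plain TF.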

The OpenMRS dataset had 83 duplicate reports out of 2315. Each pair, consisting of a duplicate report and the report which it duplicates, was taken manually from the OpenMRS TRAC page. 7 of the 83 reports do not mention their pair, i.e., they are just marked as duplicate. Also, 2 of the 83 duplicate more than one report. So we finally have a dataset of 78 pairs of duplicate reports. The evaluation metric is chosen to be similar to [1]. For each report, the 5, 10 and 15 most similar reports are predicted; this number is denoted by "K". If the report's duplicate is part of the prediction, it is counted as a hit; otherwise it is counted as a non-hit. The ratio of hits is given in the following table.



S.no  TF/TF-IDF  Stemming  SDC/SD  K   Ratio of hits
1     TF         No        SDC     5   0.346
2     TF         No        SDC     10  0.372
3     TF         No        SDC     15  0.423
4     TF-IDF     No        SDC     5   0.474
5     TF-IDF     No        SDC     10  0.55
6     TF-IDF     No        SDC     15  0.55
7     TF         Yes       SDC     5   0.5
8     TF         Yes       SDC     10  0.576
9     TF         Yes       SDC     15  0.5897
10    TF-IDF     Yes       SDC     5   0.55
11    TF-IDF     Yes       SDC     10  0.60
12    TF-IDF     Yes       SDC     15  0.63
13    TF         No        SD      5   0.37
14    TF         No        SD      10  0.397
15    TF         No        SD      15  0.410
16    TF-IDF     No        SD      5   0.487
17    TF-IDF     No        SD      10  0.538
18    TF-IDF     No        SD      15  0.564
19    TF         Yes       SD      5   0.474
20    TF         Yes       SD      10  0.487
21    TF         Yes       SD      15  0.512
22    TF-IDF     Yes       SD      5   0.512
23    TF-IDF     Yes       SD      10  0.63
24    TF-IDF     Yes       SD      15  0.67
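The hit-ratio evaluation can be sketched as follows. This is an illustration under assumptions, not the script used for the table: it assumes each pair is (duplicate report id, id of the report it duplicates), that the duplicate is used as the query, and that similarity is cosine similarity over sparse vectors. The names top_k and hit_ratio and the toy data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k(query_id, vectors, k):
    """Ids of the k reports most similar to the query, excluding the query."""
    scored = [
        (cosine(vectors[query_id], vec), rid)
        for rid, vec in vectors.items()
        if rid != query_id
    ]
    # Highest similarity first; ties are broken by the smaller report id.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [rid for _, rid in scored[:k]]


def hit_ratio(pairs, vectors, k):
    """Fraction of pairs whose second report is in the top k for the first."""
    hits = sum(1 for dup, orig in pairs if orig in top_k(dup, vectors, k))
    return hits / len(pairs)


# Toy data: report id -> term weights (TF or TF-IDF).
vectors = {
    1: {"login": 1.0, "error": 1.0},
    2: {"login": 1.0, "error": 1.0, "page": 1.0},
    3: {"patient": 1.0},
    4: {"patient": 1.0, "form": 1.0},
}
pairs = [(1, 2), (3, 4)]
ratio_at_1 = hit_ratio(pairs, vectors, k=1)
```

Running hit_ratio with k set to 5, 10 and 15 for each combination of weighting, stemming and input fields would give one row of the table per run.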


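From the results, TF-IDF with stemming performs better than TF and better than the runs without stemming. Also, the use of comments does not improve accuracy much, and in some cases it even degrades performance slightly (for example, TF-IDF with stemming at K = 15 gives 0.63 with SDC and 0.67 with SD).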

[1] Runeson, P., Alexandersson, M., and Nyholm, O. 2007. Detection of Duplicate Defect Reports Using Natural Language Processing. In Proceedings of the 29th International Conference on Software Engineering (May 20-26, 2007). IEEE Computer Society, Washington, DC, 499-510. DOI: http://dx.doi.org/10.1109/ICSE.2007.32
