International Journal of Information Technology and Computer Science (IJITCS)

ISSN: 2074-9007 (Print), ISSN: 2074-9015 (Online)

Published By: MECS Press

IJITCS Vol.4, No.8, Jul. 2012

Using Negative Binomial Regression Analysis to Predict Software Faults: A Study of Apache Ant

PP. 63-70


Liguo Yu

Index Terms

Complexity Metrics, Software Faults, Negative Binomial Regression Analysis


Negative binomial regression has been proposed as an approach to predicting fault-prone software modules. However, little work has been reported on the strengths, weaknesses, and applicability of this method. In this paper, we present an in-depth study of the effectiveness of negative binomial regression in predicting fault-prone software modules under two different conditions: self-assessment and forward assessment. The performance of the negative binomial regression model is also compared with that of another popular fault prediction model, binary logistic regression. The study is performed on six versions of an open-source object-oriented project, Apache Ant. The study shows that (1) the performance of forward assessment is better than, or at least the same as, the performance of self-assessment; (2) in predicting fault-prone modules, the negative binomial regression model could not outperform the binary logistic regression model; and (3) negative binomial regression is effective in predicting multiple errors in one module.
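As a rough illustration of why a count model can address both fault-proneness and multiple faults per module, the sketch below evaluates a negative binomial distribution in the common NB2 parameterization (mean mu, dispersion alpha, variance mu + alpha * mu^2) with a log link. The parameterization, the function names, and the coefficient and metric values are assumptions chosen for illustration; the abstract does not state which form or which predictors the paper uses.

```python
import math


def nb2_pmf(y: int, mu: float, alpha: float) -> float:
    """P(Y = y) for a negative binomial with mean mu and dispersion alpha (NB2)."""
    r = 1.0 / alpha          # "size" parameter of the distribution
    p = r / (r + mu)         # per-trial success probability
    log_pmf = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(p) + y * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def prob_at_least(k: int, mu: float, alpha: float) -> float:
    """P(Y >= k): chance that a module contains at least k faults."""
    return 1.0 - sum(nb2_pmf(y, mu, alpha) for y in range(k))


def predicted_mean(intercept: float, coefs: list, metrics: list) -> float:
    """Log-link mean: exp(intercept + sum of coefficient * metric)."""
    return math.exp(intercept + sum(b * x for b, x in zip(coefs, metrics)))


# Hypothetical fitted coefficients and complexity metrics for one module.
mu = predicted_mean(-1.0, [0.05, 0.01], [20.0, 30.0])
alpha = 0.5  # hypothetical dispersion estimate

print("expected faults:", mu)
print("P(at least 1 fault):", prob_at_least(1, mu, alpha))   # fault-prone or not
print("P(at least 2 faults):", prob_at_least(2, mu, alpha))  # multiple faults
```

A binary logistic model yields only the first of these probabilities, whereas a count model yields the whole distribution, which is what makes a statement about multiple faults in one module possible.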

Cite This Paper

Liguo Yu, "Using Negative Binomial Regression Analysis to Predict Software Faults: A Study of Apache Ant", International Journal of Information Technology and Computer Science (IJITCS), vol.4, no.8, pp.63-70, 2012. DOI: 10.5815/ijitcs.2012.08.08

