Application of PSO, Over Sampling, and AdaBoost Random Forest for Predicting Software Defects

Richky Faizal Amir, Irwan Agus Sobari, Rousyati Rousyati

Abstract


Software metrics datasets are generally imbalanced. Class imbalance in a dataset can reduce the performance of software defect prediction models, because the models tend to predict the majority class for minority-class instances. This study uses the National Aeronautics and Space Administration (NASA) Metrics Data Program (MDP) dataset. In the pre-processing stage, the Particle Swarm Optimization (PSO) method is proposed to address attribute problems in the training data, and the Random Over Sampling (ROS) resampling method is proposed to deal with class imbalance. The study proposes that Random Forest combined with AdaBoost can estimate the defect level of software from training data. The results indicate that the Resampling + AdaBoost + Random Forest algorithm can predict software defects with an average accuracy of 94.70% and an AUC of 0.939, whereas the PSO + Random Forest algorithm reaches an average accuracy of only 89.60% and an AUC of 0.636. The two models therefore differ by 5.10 percentage points in accuracy and by 0.303 in AUC. A statistical test shows that the difference between the proposed model and the Random Forest model is significant, with a p-value of 0.036, below the alpha level of 0.05.

Keywords: Imbalanced Class, Resample, Particle Swarm Optimization, Random Forest, Adaboost, Software Defect
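
The Random Over Sampling step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `random_over_sample`, the fixed seed, and the toy data are assumptions made for illustration, and only the Python standard library is used. The idea is to duplicate randomly chosen minority-class rows until every class is as large as the majority class.

```python
import random
from collections import Counter


def random_over_sample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    # Start from the original data so no existing row is lost.
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw with replacement from the existing rows of this class.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 8 non-defective modules and 2 defective ones.
rows = [[i, i * 2] for i in range(10)]
labels = ["clean"] * 8 + ["defective"] * 2

balanced_rows, balanced_labels = random_over_sample(rows, labels)
print(Counter(balanced_labels))
```

After the call both classes hold eight rows. Over-sampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.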








DOI: https://doi.org/10.31294/ijse.v6i2.9258

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

 

ISSN : 2714-9935 


Published by LPPM Universitas Bina Sarana Informatika

Jl. Kramat Raya No.98, Kwitang, Kec. Senen, Kota Jakarta Pusat, DKI Jakarta 10450


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License