Application of Random Over-Under Sampling and Random Forest for Credit Scoring Classification

Akhmad Syukron, Agus Subekti


Abstract

Credit scoring has become one of the main ways for financial institutions to assess credit risk, improve cash flow, reduce potential risk, and make managerial decisions. One problem in credit scoring is class imbalance in the dataset. Class imbalance can be addressed by resampling: oversampling, undersampling, or a hybrid that combines both approaches. This study proposes applying Random Over-Under Sampling together with Random Forest to improve classification accuracy for credit scoring on the German Credit dataset. Test results show that classification without resampling yields an average accuracy of about 70% across all classifiers. Random Forest achieves higher accuracy than several other methods, at 0.76 (76%). Applying Random Over-Under Sampling before Random Forest raises accuracy by 14.1 percentage points, to 0.901 (90.1%). These results indicate that resampling with Random Over-Under Sampling effectively improves the accuracy of Random Forest on imbalanced classification for credit scoring on the German Credit dataset.

Keywords: Class Imbalance, Credit Scoring, Random Forest, Classification, Resampling, Random Over-Under Sampling
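
The hybrid resampling idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `random_over_under_sample`, the choice to balance every class to the mean class size, and the toy 700/300 data are all assumptions made for illustration. The majority class is randomly undersampled and the minority class is randomly oversampled by duplication.

```python
import random
from collections import Counter


def random_over_under_sample(rows, labels, target_per_class=None, seed=0):
    """Balance classes by randomly dropping majority rows and duplicating minority rows.

    Hypothetical helper for illustration only; the paper does not specify
    its resampling targets.
    """
    rng = random.Random(seed)

    # Group rows by class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    if target_per_class is None:
        # Assumption: aim for the mean class size so the total stays roughly constant.
        target_per_class = round(len(rows) / len(by_class))

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) >= target_per_class:
            # Undersample: draw without replacement.
            picked = rng.sample(members, target_per_class)
        else:
            # Oversample: keep every original row, then add random duplicates.
            extra = rng.choices(members, k=target_per_class - len(members))
            picked = members + extra
        out_rows.extend(picked)
        out_labels.extend([label] * len(picked))
    return out_rows, out_labels


# Toy data with a 700/300 class split (placeholder feature rows, not the real dataset).
rows = [[i] for i in range(1000)]
labels = ["good"] * 700 + ["bad"] * 300

balanced_rows, balanced_labels = random_over_under_sample(rows, labels)
print(Counter(balanced_labels))
```

In practice, resampling of this kind is usually applied to the training split only, so that duplicated rows do not leak into the evaluation data. The balanced training set would then be passed to a Random Forest classifier.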









DOI: https://doi.org/10.31311/ji.v5i2.4158


Published by LPPM UBSI
Jl. Kamal Raya No. 18, Cengkareng, Jakarta Barat