Optimization of Human Development Index in Indonesia Using Decision Tree C4.5, Support Vector Machine Algorithm, K-Nearest Neighbors, Naïve Bayes, and Extreme Gradient Boosting

Ilham Ramadhan; Budiman Budiman; Nur Alamsyah

doi:10.31294/inf.v12i1.21874

Abstract

The Human Development Index (HDI) measures human development achievement using quality-of-life indicators: Life Expectancy (LE), Mean Years of Schooling (MYS), Expected Years of Schooling (EYS), and Adjusted Per Capita Expenditure (AECE). HDI describes how people access development outcomes through income, health, and education. Local governments should set development programs according to district/city priorities derived from HDI categories, so that the programs reach their targets. This calls for a decision system that accurately determines the HDI category of each district/city in Indonesia using machine learning models: Decision Tree C4.5, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Naïve Bayes, and Extreme Gradient Boosting (XGBoost). The models are used to classify the 2022 HDI in Indonesia and to identify which model performs best. The research follows the CRISP-DM method and uses secondary data from the Central Statistics Agency (BPS) comprising 548 records. The results show accuracies of 0.86 for Decision Tree C4.5, 0.95 for KNN, 0.90 for Naïve Bayes, and 0.93 for XGBoost, while SVM gives the best result with an accuracy of 0.97. Based on the Chi-square, Pearson correlation, Spearman, and Kendall tests, the LE, MYS, and EYS variables (UHH, RLS, and HLS in Indonesian) significantly influence changes in HDI values across Indonesian regions.
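To make the classification-and-accuracy workflow in the abstract concrete, the following is a minimal sketch of one of the five named models, K-Nearest Neighbors, written with the Python standard library only. It is not the authors' pipeline: the records, the category labels "medium" and "high", the min-max scaling step, the value k=3, and all function names are assumptions chosen for illustration, and the numbers are invented rather than taken from BPS.

```python
from collections import Counter
from math import dist

# Hypothetical toy records (NOT BPS data). Each feature tuple is
# (life expectancy, mean years of schooling, expected years of schooling,
#  adjusted per capita expenditure), paired with an assumed HDI category.
TRAIN = [
    ((65.0, 6.5, 11.5, 7000.0), "medium"),
    ((66.0, 7.0, 12.0, 8000.0), "medium"),
    ((67.0, 7.2, 12.2, 8500.0), "medium"),
    ((71.0, 9.0, 13.5, 11500.0), "high"),
    ((72.0, 9.5, 13.8, 12000.0), "high"),
    ((73.0, 10.0, 14.0, 13000.0), "high"),
]
TEST = [
    ((65.5, 6.8, 11.8, 7500.0), "medium"),
    ((72.5, 9.8, 13.9, 12500.0), "high"),
]


def min_max_scaler(rows):
    """Return a function that rescales each feature to [0, 1] using the
    per-column minimum and maximum of `rows` (the training features)."""
    lo = [min(col) for col in zip(*rows)]
    hi = [max(col) for col in zip(*rows)]

    def scale(row):
        return tuple(
            (v - l) / (h - l) if h > l else 0.0
            for v, l, h in zip(row, lo, hi)
        )

    return scale


def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k training points
    closest to `query` in Euclidean distance."""
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Fit the scaler on training features only, then scale both sets with it.
scale = min_max_scaler([x for x, _ in TRAIN])
train_scaled = [(scale(x), y) for x, y in TRAIN]

predictions = [knn_predict(train_scaled, scale(x)) for x, _ in TEST]
score = accuracy([y for _, y in TEST], predictions)
print(predictions, score)
```

Scaling matters in this sketch because expenditure is numerically far larger than the schooling variables and would otherwise dominate the distance. Accuracy figures such as those reported in the abstract would come from applying the same kind of held-out evaluation to each model on the real 548 records.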

Keywords

Human Development Index, Machine Learning, SVM Algorithm


