Optimization of Human Development Index in Indonesia Using Decision Tree C4.5, Support Vector Machine Algorithm, K-Nearest Neighbors, Naïve Bayes, and Extreme Gradient Boosting
Abstract
The Human Development Index (HDI) measures human development achievement using quality-of-life indicators: Life Expectancy (LE), Mean Years of Schooling (MYS), Expected Years of Schooling (EYS), and Adjusted Per Capita Expenditure (AECE). HDI describes how people access development outcomes through income, health, and education. Development programs implemented by local governments must be prioritized according to each district/city's HDI category and must be well targeted. A decision system is therefore needed that can accurately determine the HDI category of each district/city in Indonesia using machine learning models such as Decision Tree C4.5, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Naïve Bayes, and Extreme Gradient Boosting (XGBoost). These models are used to classify the 2022 HDI in Indonesia and to identify the best-performing classifier. This research follows the CRISP-DM method and uses secondary data from the Central Statistics Agency (BPS) comprising 548 records. The results show that Decision Tree C4.5 achieves an accuracy of 0.86, Naïve Bayes 0.90, XGBoost 0.93, and KNN 0.95, while SVM gives the best result with an accuracy of 0.97. Based on the Chi-square, Pearson correlation, Spearman, and Kendall test results, the LE (UHH), MYS (RLS), and EYS (HLS) variables significantly influence changes in HDI values across Indonesian regions.
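The classification task described in the abstract can be illustrated with a minimal sketch, assuming several things the abstract does not state. The category cut-offs (below 60 low, 60 to below 70 medium, 70 to below 80 high, 80 and above very high) are assumed here to follow the usual BPS grouping; the indicator rows are made-up values for illustration, not BPS data; and a small pure-Python K-Nearest Neighbors stands in for one of the five models, since the authors' actual tooling and hyperparameters are not given. All function names are hypothetical.

```python
from collections import Counter
from math import dist


def hdi_category(hdi: float) -> str:
    """Map an HDI score to a category (cut-offs are an assumption)."""
    if hdi < 60:
        return "low"
    if hdi < 70:
        return "medium"
    if hdi < 80:
        return "high"
    return "very high"


def fit_scaler(rows):
    """Learn per-column minimum and range from the training rows only."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid dividing by zero
    return lows, spans


def scale(rows, lows, spans):
    """Min-max scale rows so no single indicator dominates the distance."""
    return [tuple((v - lo) / s for v, lo, s in zip(r, lows, spans)) for r in rows]


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training rows closest to `query`."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up rows: (LE, MYS, EYS, expenditure in thousand rupiah), HDI score.
train = [
    ((65.0, 6.5, 11.0, 7000), 58.0),
    ((66.0, 7.0, 11.5, 7500), 59.5),
    ((68.0, 7.5, 12.3, 9000), 66.0),
    ((69.0, 8.0, 12.6, 9800), 68.5),
    ((71.0, 8.8, 13.0, 11000), 72.0),
    ((72.0, 9.5, 13.4, 12500), 75.0),
    ((74.0, 11.0, 14.5, 16000), 81.0),
    ((75.0, 11.5, 15.0, 17500), 83.0),
]
test = [
    ((65.5, 6.8, 11.2, 7200), 58.8),
    ((71.5, 9.0, 13.2, 11800), 73.0),
]

# Labels are derived from the HDI score; features are the four indicators.
train_labels = [hdi_category(hdi) for _, hdi in train]
test_labels = [hdi_category(hdi) for _, hdi in test]

lows, spans = fit_scaler([x for x, _ in train])
train_x = scale([x for x, _ in train], lows, spans)
test_x = scale([x for x, _ in test], lows, spans)

predictions = [knn_predict(train_x, train_labels, q, k=3) for q in test_x]
print(predictions, accuracy(test_labels, predictions))
```

The scaler is fitted on the training rows alone and then applied to the held-out rows, so the accuracy figure is not inflated by information from the test set. The same split-scale-fit-score loop would apply to the other four models.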
DOI: https://doi.org/10.31294/inf.v12i1.21874
Copyright (c) 2025 Budiman Budiman, Ilham Ramadhan, Nur Alamsyah

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Published by LPPM Universitas Bina Sarana Informatika with support from Relawan Jurnal Indonesia
Jl. Kramat Raya No.98, Kwitang, Kec. Senen, Jakarta Pusat, DKI Jakarta 10450, Indonesia