Handling Overfitting in Neural Network-Based Hoax News Classification with Dropout and Regularization
Abstract
This study evaluates the effectiveness of several hoax detection techniques for Indonesia using text classification models trained on two dataset sizes, 250 and 650 samples. Hoaxes on social media have a significant impact on society, so accurate detection is essential. The study tests three machine learning algorithms (1D CNN, Bi-LSTM, and LSTM), each in three configurations: original (the baseline model), regularization, and dropout. The results show that regularization gave the 1D CNN its highest accuracy on the 250-sample dataset, while Bi-LSTM reached its highest accuracy on the same dataset in the original configuration. On the larger dataset (600 samples), the regularized 1D CNN remained stable, whereas dropout produced variable results. Analysis of confusion matrices and learning curves indicates overfitting in the models, particularly on the smaller dataset. These findings underline the importance of regularization techniques for reducing overfitting and improving model generalization in hoax detection. The study contributes to the development of more effective hoax detection systems in Indonesia.
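The two techniques named in the abstract, and the diagnostics used to spot overfitting, can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names are hypothetical, the label convention (1 = hoax, 0 = not hoax) is an assumption, and the paper's actual 1D CNN, Bi-LSTM, and LSTM models and their hyperparameters are not reproduced here.

```python
import random


def l2_penalty(weights, lam):
    """L2 regularization term added to the training loss: lam * sum(w^2)."""
    return lam * sum(w * w for w in weights)


def inverted_dropout(activations, rate, rng, training=True):
    """Zero each activation with probability `rate` during training and
    rescale the survivors by 1 / (1 - rate), so the expected value of each
    activation is unchanged. At inference time the input passes through."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels.
    Assumed convention for this sketch: 1 = hoax, 0 = not hoax."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def generalization_gap(train_acc, val_acc):
    """Per-epoch difference between training and validation accuracy.
    A gap that keeps widening over epochs is a common sign of overfitting."""
    return [t - v for t, v in zip(train_acc, val_acc)]
```

In this sketch, `l2_penalty` would be added to the classification loss during training, `inverted_dropout` would be applied to a hidden layer with a seeded `random.Random` instance as `rng`, and `generalization_gap` would be computed from the per-epoch accuracies that a learning curve plots.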
DOI: https://doi.org/10.31294/jtk.v10i2.23121
Copyright (c) 2024 Ridwan Ilyas, Fatan Kasyidi, Maulidina Norick Eriyadi
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
ISSN: 2442-2436 (print) and 2550-0120 (online)