Analysis of FastText with Support Vector Machine for Hate Speech Classification on Twitter Social Media

Nabila Nuraini; Asslia Johar Latipah; Naufal Azmi Verdikha

doi:10.31294/inf.v11i2.21107

Abstract

Hate speech refers to sentences or words intended to demean or insult individuals, groups, or communities on the basis of factors such as ethnicity, religion, race, or social class. This study applies Natural Language Processing (NLP) techniques to hate speech classification on Twitter, using FastText for feature extraction and a Support Vector Machine (SVM) for text classification. Performance was evaluated with the F1 score. The data were split using 10-fold cross-validation, and the experiment was run with four SVM kernels: RBF, Linear, Polynomial, and Sigmoid. With FastText parameters adopted from previous studies, the FastText-SVM combination performed well on hate speech classification with all four kernels. The Polynomial kernel achieved the best average F1 score of 0.813, followed by the Linear kernel with 0.809, the RBF kernel with 0.808, and the Sigmoid kernel with 0.805. The gap between the best and worst kernels is only 0.008, indicating that the choice of kernel made little difference to the F1 score.
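The evaluation setup named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: it only shows the usual textbook forms of the four kernel functions, the F1 score, and a 10-fold index split. The parameter names `gamma`, `coef0`, and `degree` and their default values are assumptions for illustration (the abstract does not report kernel hyperparameters), and the vectors stand in for FastText document embeddings.

```python
import math
import random


def linear_kernel(x, y):
    """Linear kernel: the dot product <x, y>."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, gamma=1.0, coef0=0.0, degree=3):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    return (gamma * linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <x, y> + coef0)."""
    return math.tanh(gamma * linear_kernel(x, y) + coef0)


def f1_score(y_true, y_pred, positive=1):
    """F1 score: harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    u, v = [1.0, 2.0], [3.0, 4.0]  # toy stand-ins for embedding vectors
    print("linear:", linear_kernel(u, v))
    print("polynomial (degree 2):", polynomial_kernel(u, v, degree=2))
    print("F1:", f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
    print("folds:", sum(1 for _ in k_fold_indices(25, k=10)))
```

In a full experiment, each fold's training indices would be used to fit an SVM with one of the kernels, the held-out indices would be scored with `f1_score`, and the ten fold scores would be averaged to give one figure per kernel, as in the averages reported above.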

Keywords

Hate Speech, FastText Feature Extraction, Support Vector Machine
