Optimizing Sentiment Analysis on the Linux Desktop Using N-Gram Features

Muhamad Taufiq Hidayat, Rudi Kurniawan, Tati Suprapti

Abstract


Linux, or GNU/Linux, is a widely used open-source operating system built on the Linux kernel. It is freely available to anyone and is known for its security and privacy advantages. As information technology advances, protecting privacy has become increasingly difficult because of the data extraction practices of major tech companies. This has encouraged some Mastodon users to switch to Linux, and many have shared their opinions on using Linux as their main operating system. This research analyzes the sentiment of Mastodon users toward Linux to determine whether it is predominantly positive, negative, or neutral. Data were collected with the Mastodon.py library and then labelled manually with the assistance of a linguistic expert, following a linguistic rule proposed in previous research. The text mining process comprises preprocessing, feature extraction with n-grams to find the best-performing configuration, and feature selection using TF-IDF. The Naïve Bayes algorithm is used for text classification. The entire analysis is carried out in AI Studio (RapidMiner). The results show that the highest-performing model is obtained with an n-gram value of 3, and the sentiment polarity of Mastodon users toward Linux is 42% positive, 28% negative, and 30% neutral. The model achieves an accuracy of 63%, a precision of 70%, a recall of 80%, and an F1-score of 74%, which suggests that n-gram features can improve the sentiment analysis process.
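
The pipeline described in the abstract (n-gram extraction, TF-IDF weighting, Naïve Bayes classification) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation, which was built in AI Studio (RapidMiner). The function names, the smoothing choices, and the six toy posts below are all assumptions made for illustration and are not taken from the study's data.

```python
import math
from collections import Counter, defaultdict

def ngrams(text, n_max=3):
    """Return all word n-grams of length 1..n_max from lowercased text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def fit_idf(docs):
    """Smoothed inverse document frequency for every n-gram seen in training."""
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + len(docs)) / (1 + c)) + 1 for t, c in df.items()}

def tfidf(doc, idf):
    """TF-IDF weights for one document; n-grams unseen in training are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}

def train_nb(vectors, labels, vocab_size, alpha=1.0):
    """Multinomial Naive Bayes over TF-IDF weights with additive smoothing."""
    weight = defaultdict(lambda: defaultdict(float))
    for vec, y in zip(vectors, labels):
        for t, w in vec.items():
            weight[y][t] += w
    model = {}
    for y, count in Counter(labels).items():
        total = sum(weight[y].values()) + alpha * vocab_size
        model[y] = (math.log(count / len(labels)), weight[y], total)
    return model

def predict(model, vec, alpha=1.0):
    """Return the class with the highest log-posterior score."""
    def score(y):
        prior, w, total = model[y]
        return prior + sum(v * math.log((w.get(t, 0.0) + alpha) / total)
                           for t, v in vec.items())
    return max(model, key=score)

# Hypothetical toy posts, invented for this sketch (not the study's dataset).
texts = [
    "linux is fast and respects my privacy",
    "i love how stable linux is",
    "linux broke my wifi driver again",
    "installing linux was a frustrating mess",
    "i installed linux on my laptop today",
    "linux kernel version was released today",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

docs = [ngrams(t, n_max=3) for t in texts]   # n = 3, as in the best model
idf = fit_idf(docs)
model = train_nb([tfidf(d, idf) for d in docs], labels, vocab_size=len(idf))

print(predict(model, tfidf(ngrams("linux respects my privacy"), idf)))
```

Setting `n_max` to 1, 2, or 3 corresponds to comparing unigram, bigram, and trigram feature sets, which is the kind of comparison the abstract reports when it identifies an n-gram value of 3 as the best-performing configuration.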

Keywords


n-Gram Feature, Sentiment Analysis, Linux Desktop

DOI: https://doi.org/10.31294/inf.v12i1.24773

Copyright (c) 2025 Muhamad Taufiq Hidayat, Rudi Kurniawan, Tati Suprapti

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Published by LPPM Universitas Bina Sarana Informatika with support from Relawan Jurnal Indonesia

Jl. Kramat Raya No.98, Kwitang, Kec. Senen, Jakarta Pusat, DKI Jakarta 10450, Indonesia