SemetonBug: A Machine Learning Model for Automatic Bug Detection in Python Code Based on Syntactic Analysis
Abstract
Detecting bugs in Python code is a crucial part of software development. This study develops an automated bug detection system that combines feature extraction from the Abstract Syntax Tree (AST) with a Random Forest classifier. The dataset consists of 100 manually classified bugged files and 100 non-bugged files. The model is trained on structural code features such as the number of functions, classes, variables, conditions, and exception-handling constructs. Evaluation results show an accuracy of 86.67%, with balanced precision and recall across both classes. Confusion matrix analysis reveals both false positives and false negatives, although in relatively low numbers. The accuracy curve suggests potential overfitting, as training accuracy is higher than testing accuracy. The study demonstrates that combining AST-based feature extraction with Random Forest can be an effective approach to automated bug detection, with room for improvement through model optimization and a larger dataset.
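The structural features named in the abstract can be illustrated with a minimal sketch built on Python's standard `ast` module. This is not the authors' implementation: the function name `extract_features`, the feature keys, the sample snippet, and the counting rules (for example, counting every assignment target as a "variable" and every `if` statement as a "condition") are assumptions made for illustration only.

```python
import ast


def extract_features(source: str) -> dict:
    """Count structural elements of Python source code via its AST.

    Hypothetical helper: the exact feature definitions used in the
    study are not given in the abstract, so these rules are assumptions.
    """
    tree = ast.parse(source)  # raises SyntaxError on unparsable input
    counts = {
        "functions": 0,
        "classes": 0,
        "variables": 0,
        "conditions": 0,
        "exception_handlers": 0,
    }
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            counts["functions"] += 1
        elif isinstance(node, ast.ClassDef):
            counts["classes"] += 1
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            # Assumption: each assignment target counts once per occurrence,
            # so a name assigned three times contributes three.
            counts["variables"] += 1
        elif isinstance(node, ast.If):
            counts["conditions"] += 1
        elif isinstance(node, ast.ExceptHandler):
            counts["exception_handlers"] += 1
    return counts


# Illustrative input only; not taken from the study's dataset.
SAMPLE = '''
class Account:
    def withdraw(self, amount):
        balance = 100
        if amount > balance:
            raise ValueError("insufficient funds")
        try:
            balance = balance - amount
        except TypeError:
            balance = 0
        return balance
'''

features = extract_features(SAMPLE)
print(features)
```

Each file's counts could then be arranged as one row of a feature matrix and passed to a classifier such as scikit-learn's `RandomForestClassifier`. Files that fail to parse would need separate handling, since `ast.parse` raises `SyntaxError` on them.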
DOI: https://doi.org/10.31294/inf.v12i2.25340
Copyright (c) 2025 Bahtiar Imran, Selamet Riadi, Emi Suryadi, M. Zulpahmi, Zaeniah Zaeniah, Erfan Wahyudi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Published by LPPM Universitas Bina Sarana Informatika with support from Relawan Jurnal Indonesia
Jl. Kramat Raya No.98, Kwitang, Kec. Senen, Jakarta Pusat, DKI Jakarta 10450, Indonesia
