SemetonBug: A Machine Learning Model for Automatic Bug Detection in Python Code Based on Syntactic Analysis

Bahtiar Imran, Selamet Riadi, Emi Suryadi, M. Zulpahmi, Zaeniah Zaeniah, Erfan Wahyudi

Abstract


Bug detection is a crucial part of Python software development. This study develops an automated bug detection system that combines feature extraction from the Abstract Syntax Tree (AST) with a Random Forest classifier. The dataset consists of 100 files manually classified as containing bugs and 100 files classified as bug-free. The model is trained on structural code features such as the number of functions, classes, variables, conditions, and exception-handling constructs. Evaluation yields an accuracy of 86.67%, with balanced precision and recall across both classes. The confusion matrix shows a small number of false positives and false negatives. The accuracy curve suggests possible overfitting, since training accuracy is higher than testing accuracy. These results indicate that AST-based feature extraction combined with Random Forest can be an effective approach to automated bug detection, and that model optimization and a larger dataset could improve it further.
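
The structural features named in the abstract can be illustrated with Python's standard `ast` module. The following is a minimal sketch, not the authors' implementation: the function name `extract_features`, the feature keys, and the choice of which AST node types count toward each feature are all assumptions made for illustration.

```python
import ast


def extract_features(source: str) -> dict:
    """Count structural elements of Python source code via its AST.

    Hypothetical helper: the node-to-feature mapping is an assumption,
    not the mapping used in the paper.
    """
    tree = ast.parse(source)
    counts = {
        "functions": 0,
        "classes": 0,
        "variables": 0,
        "conditions": 0,
        "exception_handling": 0,
    }
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            counts["functions"] += 1
        elif isinstance(node, ast.ClassDef):
            counts["classes"] += 1
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            # Counts every assignment target, not unique variable names.
            counts["variables"] += 1
        elif isinstance(node, ast.If):
            counts["conditions"] += 1
        elif isinstance(node, ast.Try):
            counts["exception_handling"] += 1
    return counts


sample = '''
def divide(a, b):
    try:
        result = a / b
    except ZeroDivisionError:
        result = None
    if result is None:
        print("division by zero")
    return result
'''

features = extract_features(sample)
print(features)
```

Each file would yield one such dictionary, and the resulting vectors of counts could then be passed to a classifier such as a Random Forest together with the manual bug labels.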


Keywords


Abstract Syntax Tree, Random Forest, Machine Learning






DOI: https://doi.org/10.31294/inf.v12i2.25340



Copyright (c) 2025 Bahtiar Imran, Selamet Riadi, Emi Suryadi, M. Zulpahmi, Zaeniah Zaeniah, Erfan Wahyudi

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Published by LPPM Universitas Bina Sarana Informatika with support from Relawan Jurnal Indonesia

Jl. Kramat Raya No.98, Kwitang, Kec. Senen, Jakarta Pusat, DKI Jakarta 10450, Indonesia