Classification of Obesity Levels Using Machine Learning Algorithms
DOI: https://doi.org/10.58190/imiens.2025.157
Keywords: Artificial Neural Networks, Classification, Ensemble Learning, Machine Learning, Obesity, Random Forest
Abstract
Obesity has become a critical global public health issue because of the serious comorbidities and economic burden it brings. This study aims to develop an effective machine learning model that accurately determines obesity levels from individuals' demographic characteristics and dietary habits, and to compare the performance of tree-based ensemble learning algorithms with an Artificial Neural Network (ANN) approach. To this end, classification was performed with the Random Forest, XGBoost, CatBoost, and ANN algorithms on the open-source “Obesity Dataset”, which was collected from 1,610 participants and contains 14 attributes. Model performance was tested with 5-fold cross-validation and evaluated in terms of accuracy, F-score, precision, and recall, all derived from the confusion matrix. The experimental results show that the tree-based ensemble models outperform the ANN on this dataset. Random Forest was the most successful model, with an accuracy of 94.34% and an F-score of 94.36%, followed by XGBoost with an accuracy of 92.80%. In contrast, the ANN reached an accuracy of only 82.98% and took approximately 93 times longer to train than Random Forest. Taken together, these results demonstrate that ensemble learning methods such as Random Forest are more efficient than ANN models in both prediction accuracy and computational cost when analysing tabular health data, and that the developed model can be used as a reliable tool in clinical decision support systems.
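The four evaluation metrics named in the abstract can all be read off a multi-class confusion matrix. The following is a minimal sketch of that calculation, not the authors' implementation: the function name `metrics_from_confusion`, the use of macro averaging across classes, and the 3-class toy matrix are assumptions made for illustration, and the numbers are not taken from the study.

```python
from typing import Dict, List


def metrics_from_confusion(cm: List[List[int]]) -> Dict[str, float]:
    """Compute accuracy and macro-averaged precision, recall and F-score.

    Assumption: rows of `cm` are true classes, columns are predicted classes.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))

    precisions, recalls, f_scores = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f_scores.append(f)

    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_score": sum(f_scores) / n,
    }


# Illustrative 3-class matrix (hypothetical values, not study data).
toy_cm = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
scores = metrics_from_confusion(toy_cm)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Under 5-fold cross-validation, one such matrix would typically be produced per fold (or the fold matrices summed) before the metrics are reported; which of the two the study used is not stated in the abstract.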
License
Copyright (c) 2025 Intelligent Methods In Engineering Sciences

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

