Cancer Detection in Breast Histopathological Images Using Extremely Randomized Trees

Mahendra Kanojia

doi:10.58190/imiens.2026.167

Authors

Mahendra Kanojia Department of Computer Science; Sheth L.U.J. and Sir M.V. College, India https://orcid.org/0000-0002-7628-8683


Keywords:

Breast Cancer Detection, Extremely Randomized Trees, Histopathological Images, Image Processing, Machine Learning, Recursive Feature Elimination

Abstract

This research aimed to develop an effective machine learning-based system for the automated detection of breast cancer in histopathological images, addressing the limitations of manual examination. The study used a dataset of 13,347 histopathological images drawn from three secondary sources and one primary source; combining multiple sources was intended to make the model more versatile. Images were first pre-processed with a median filter to reduce noise and converted to grayscale. Otsu's thresholding was then applied to enhance nucleus edges and suppress background noise. A recursive feature elimination algorithm reduced the initial 98 features to 48 key ones, covering the area and shape of the nucleus, color-based features, and image texture. An Extremely Randomized Trees classifier was then trained to label images as benign or malignant. The model achieved an accuracy of 98.95%. Sensitivity (recall) was 99.48%, indicating a low false negative rate, and specificity was 94.67%, the share of benign cases correctly identified. Precision was 98.97%, and the Kappa statistic was 97.62%, indicating strong agreement beyond chance. The ROC performance was 98.67%. These results highlight the potential of machine learning, specifically the Extremely Randomized Trees classifier, for automated and accurate breast cancer detection from histopathological images, and suggest that the model can enhance diagnostic accuracy and assist pathologists in clinical decision-making.
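Accuracy, sensitivity, specificity, precision, and the Kappa statistic named in the abstract can all be computed from a single binary confusion matrix. The following is a minimal sketch of those definitions, assuming malignant is treated as the positive class. The `ConfusionCounts` and `binary_metrics` names and the example counts are hypothetical, chosen only for illustration; they are not the study's code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (assumption: malignant = positive class)."""
    tp: int  # malignant images predicted malignant
    fn: int  # malignant images predicted benign
    tn: int  # benign images predicted benign
    fp: int  # benign images predicted malignant


def binary_metrics(c: ConfusionCounts) -> dict:
    """Return the standard binary classification metrics as fractions in [0, 1]."""
    n = c.tp + c.fn + c.tn + c.fp
    accuracy = (c.tp + c.tn) / n
    sensitivity = c.tp / (c.tp + c.fn)   # also called recall
    specificity = c.tn / (c.tn + c.fp)
    precision = c.tp / (c.tp + c.fp)

    # Cohen's kappa: observed agreement (accuracy) corrected for the
    # agreement expected by chance from the row and column totals.
    chance_pos = (c.tp + c.fn) * (c.tp + c.fp) / n**2
    chance_neg = (c.tn + c.fp) * (c.tn + c.fn) / n**2
    expected = chance_pos + chance_neg
    kappa = (accuracy - expected) / (1 - expected)

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "kappa": kappa,
    }


# Made-up counts for illustration only (not results from the study).
example = binary_metrics(ConfusionCounts(tp=95, fn=5, tn=90, fp=10))
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Because sensitivity and recall share the same formula, the two values always coincide, which is why the abstract reports 99.48% for both.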

