Digitization and Archiving of Company Invoices using Deep Learning and Text Recognition-Processing Techniques

Ensar GUNAYDIN; Bunyamin GENCTURK; Cuneyt ERGEN; Murat KÖKLÜ

doi:10.58190/imiens.2023.69

Authors

Ensar GUNAYDIN ALISAN Logistic Inc., Istanbul https://orcid.org/0000-0002-3341-1346
Bunyamin GENCTURK Technology Faculty, Selcuk University, Konya https://orcid.org/0009-0001-0944-2898
Cuneyt ERGEN ALISAN Logistic Inc., Istanbul https://orcid.org/0000-0002-4385-8600
Murat KÖKLÜ Selcuk.edu.tr https://orcid.org/0000-0002-2737-2360

Keywords:

Optical Character Recognition, Image Preprocessing, Document Digitization, Artificial Intelligence, Deep Learning

Abstract

Transferring official documents such as invoices, dispatch notes, and receipts into digital environments and establishing correct semantic relationships has become crucial. However, understanding and processing these documents is difficult and requires significant time and effort. In recent years, deep learning, image preprocessing, text detection, and optical character recognition (OCR) technologies have made this process easier. For text recognition and processing techniques to produce accurate results, though, documents must be clean and readable. In addition, manual digitization is time-consuming, tiring, error-prone, and costly, and these difficulties must be reduced. The aim of this study is to digitize and archive scanned invoices and similar official documents using current artificial intelligence technologies, thereby making the most effective use of time, cost, and human resources. The dataset used in the study consists of 10,000 ".jpg" image files and 10,000 ".xml" data files. The model trained with the ResNet-50 architecture detects text with accuracy of up to 97% on images randomly selected from the dataset. Whereas a person can process an average of 2,112 documents per month, the trained model is predicted to process 108,000 documents per month. With the developed method, businesses can quickly digitize and archive official documents such as invoices, dispatch notes, and receipts. Future studies could develop new methods that produce better results by using larger and more diverse datasets.
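
The throughput comparison in the abstract can be checked with a short calculation. The sketch below is only an illustration: the two monthly figures are taken from the abstract, while the function names and the 30-day, round-the-clock schedule used to derive a per-document time are assumptions that the source does not state.

```python
# Monthly throughput figures quoted in the abstract.
HUMAN_DOCS_PER_MONTH = 2_112
MODEL_DOCS_PER_MONTH = 108_000


def speedup(model_docs: int, human_docs: int) -> float:
    """Return how many times more documents the model handles per month."""
    return model_docs / human_docs


def seconds_per_document(docs_per_month: int, days: int = 30,
                         hours_per_day: int = 24) -> float:
    """Average seconds per document under an ASSUMED working schedule.

    The default of 30 days at 24 hours per day is a hypothetical schedule,
    not one reported in the study.
    """
    return days * hours_per_day * 3600 / docs_per_month


if __name__ == "__main__":
    ratio = speedup(MODEL_DOCS_PER_MONTH, HUMAN_DOCS_PER_MONTH)
    print(f"speedup: {ratio:.1f}x")
    print(f"model: {seconds_per_document(MODEL_DOCS_PER_MONTH):.0f} s per document")
```

On the quoted figures the model handles roughly 51 times as many documents per month as one person. Under the assumed continuous schedule this corresponds to about 24 seconds per document, but that per-document time changes with whatever schedule is actually used.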
