Implementation of YOLOv12 and PaddleOCR for Indonesian Bank Statement Table Extraction

Authors

  • Samuel Miracle Kristanto, Universitas Ciputra Surabaya
  • Evan Tanuwijaya, Universitas Ciputra Surabaya

DOI:

10.33395/sinkron.v10i1.15383

Keywords:

Bank statements, Financial data extraction, PaddleOCR, Table detection, YOLOv12

Abstract

The increasing reliance on digital financial documents has highlighted the need for automated methods to extract structured information from bank statements. Traditional optical character recognition (OCR) systems often fail to capture complex tabular structures, leading to incomplete or error-prone transaction records. To address this challenge, this research proposes a two-stage detection and recognition pipeline that combines YOLOv12 for table and structural element detection with PaddleOCR for text extraction, followed by automated Excel conversion. The objective of this study is to improve accuracy in localizing tables, detecting rows and columns, and generating structured financial data that can be directly utilized for downstream applications. The methods involve training a lightweight YOLOv12-n model in two stages: Stage 1 focuses on detecting entire table regions, while Stage 2 focuses on identifying row and column structures within the detected tables. The AdamW optimizer was used with conservative augmentation strategies to preserve the geometric integrity of document layouts. Results show that Stage 1 achieved precision of 0.998, recall of 1.0, and mAP50-95 of 0.989, while Stage 2 achieved precision of 0.992, recall of 0.964, and mAP50-95 of 0.899, demonstrating strong localization and structural recognition. The conclusions confirm that the proposed two-stage pipeline is effective for financial document processing, with potential applications in digital banking, auditing, and automated record management. Future research may focus on expanding datasets and addressing domain-specific variability.
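The abstract describes turning Stage 2 detections (row and column boxes) plus PaddleOCR output into a structured table. The paper does not publish its code, so the sketch below is only an illustrative assumption of how that assembly step is commonly done: intersect row boxes with column boxes to form a cell grid, then assign each OCR token to the cell containing its centre. All function names and data shapes here are hypothetical, not the authors' implementation.

```python
# Hypothetical post-detection step: build a table grid from detected
# row/column boxes and place OCR text into cells. Boxes are
# (x1, y1, x2, y2) tuples in image coordinates.

def cells_from_structure(row_boxes, col_boxes):
    """Intersect row boxes with column boxes into a 2-D grid of cell boxes.

    Rows are sorted top-to-bottom and columns left-to-right so the
    grid follows reading order.
    """
    rows = sorted(row_boxes, key=lambda b: b[1])
    cols = sorted(col_boxes, key=lambda b: b[0])
    return [[(c[0], r[1], c[2], r[3]) for c in cols] for r in rows]

def assign_text(grid, ocr_results):
    """Place each OCR token into the cell whose box contains its centre.

    ocr_results is a list of (text, box) pairs, as one might derive
    from PaddleOCR output after flattening its polygon coordinates.
    """
    table = [["" for _ in row] for row in grid]
    for text, (x1, y1, x2, y2) in ocr_results:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        for i, row in enumerate(grid):
            for j, (a, b, c, d) in enumerate(row):
                if a <= cx <= c and b <= cy <= d:
                    table[i][j] = (table[i][j] + " " + text).strip()
    return table

# Toy example: a 2-row by 2-column table structure.
grid = cells_from_structure(
    row_boxes=[(0, 0, 100, 10), (0, 10, 100, 20)],
    col_boxes=[(0, 0, 50, 20), (50, 0, 100, 20)],
)
table = assign_text(grid, [("Date", (5, 2, 20, 8)),
                           ("100.00", (60, 12, 90, 18))])
# table is now [["Date", ""], ["", "100.00"]]; each inner list is one
# row, ready to be written out as an Excel or CSV row.
```

The centre-point assignment is a deliberately simple heuristic; a production pipeline would also need to handle tokens spanning cell borders and merged cells.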




How to Cite

Kristanto, S. M., & Tanuwijaya, E. (2026). Implementation of YOLOv12 and PaddleOCR for Indonesian Bank Statement Table Extraction. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 10(1), 573-585. https://doi.org/10.33395/sinkron.v10i1.15383