Application of Two-Stream Late Fusion on EfficientNetV2 based on Transfer Learning to classify AI-generated paintings

Authors

  • Muhammad Kevin Rinaldi, Informatics Study Program, Faculty of Engineering, University of Bengkulu, Bengkulu, Indonesia
  • Ernawati, Informatics Study Program, Faculty of Engineering, University of Bengkulu, Bengkulu, Indonesia
  • Desi Andreswari, Informatics Study Program, Faculty of Engineering, University of Bengkulu, Bengkulu, Indonesia
  • Julia Purnama Sari, Informatics Study Program, Faculty of Engineering, University of Bengkulu, Bengkulu, Indonesia

DOI:

10.33395/sinkron.v10i2.15814

Keywords:

AI-Generated Art, EfficientNetV2, Grad-CAM, Late Fusion, Spatial Rich Models (SRM), Two-Stream Network

Abstract

The rapid advancement of generative artificial intelligence (AI) has made synthetic digital paintings increasingly difficult to distinguish from human-made artworks, raising concerns regarding authenticity, copyright protection, and digital forensics. The main objective of this research is to develop a reliable and interpretable framework for distinguishing AI-generated paintings from human-created artworks by integrating visual and noise-based features. To address the limitations of conventional single-stream CNN models, this study proposes a Two-Stream Network with a Late Fusion strategy, combining a visual stream based on EfficientNetV2-S and a noise stream based on Xception with Spatial Rich Models (SRM). The proposed architecture processes semantic visual features and residual noise characteristics independently, followed by weighted decision-level fusion with a ratio of 0.7:0.3. Experiments were conducted on the public AI-Artwork dataset from Kaggle, consisting of 15,000 images split into 64% training, 16% validation, and 20% testing. Model performance was evaluated using accuracy, precision, recall, F1-score, and ROC-AUC, ensuring a comprehensive assessment beyond accuracy alone. The results demonstrate that the proposed method achieves 98% accuracy, 98% precision, and a 99% F1-score, with stronger discriminative capability than single-stream baselines. Model interpretability was analyzed using Grad-CAM to examine the contribution of each stream. Despite promising results, this study is limited by evaluation on a single dataset and by static fusion weights, which may affect generalization to unseen generative models. Future work includes cross-dataset evaluation, adaptive fusion strategies, and exploration of lightweight architectures. Practically, this approach has potential applications in digital art authentication, forensic analysis, and content moderation systems, and it can support emerging policies for regulating AI-generated content and protecting copyright.
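To make the described architecture concrete, the sketch below illustrates the two-stream late-fusion idea in TensorFlow/Keras. Only the backbone choices (EfficientNetV2-S and Xception), the SRM residual filtering on the noise stream, and the 0.7:0.3 decision-level weighting come from the abstract; the input resolution, the particular SRM kernel, the dropout rate, and all other hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of the two-stream late-fusion design described in the abstract.
# Backbones, SRM filtering, and the 0.7:0.3 fusion weights follow the paper;
# everything else (input size, kernel choice, dropout, heads) is assumed.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetV2S, Xception

IMG_SIZE = (224, 224)  # assumed input resolution


def build_visual_stream():
    """Semantic (RGB) stream: EfficientNetV2-S pretrained on ImageNet."""
    base = EfficientNetV2S(include_top=False, weights="imagenet",
                           input_shape=IMG_SIZE + (3,), pooling="avg")
    x = layers.Dropout(0.3)(base.output)
    out = layers.Dense(1, activation="sigmoid", name="p_visual")(x)
    return models.Model(base.input, out)


def srm_layer():
    """Fixed high-pass residual filter (one common SRM kernel, applied per channel)."""
    k = np.array([[-1,  2,  -2,  2, -1],
                  [ 2, -6,   8, -6,  2],
                  [-2,  8, -12,  8, -2],
                  [ 2, -6,   8, -6,  2],
                  [-1,  2,  -2,  2, -1]], dtype=np.float32) / 12.0
    kernel = np.zeros((5, 5, 3, 3), dtype=np.float32)
    for c in range(3):
        kernel[:, :, c, c] = k
    conv = layers.Conv2D(3, 5, padding="same", use_bias=False, trainable=False)
    conv.build((None,) + IMG_SIZE + (3,))
    conv.set_weights([kernel])
    return conv


def build_noise_stream():
    """Noise stream: SRM residuals fed into an Xception backbone."""
    inp = layers.Input(IMG_SIZE + (3,))
    residual = srm_layer()(inp)
    base = Xception(include_top=False, weights="imagenet",
                    input_shape=IMG_SIZE + (3,), pooling="avg")
    x = layers.Dropout(0.3)(base(residual))
    out = layers.Dense(1, activation="sigmoid", name="p_noise")(x)
    return models.Model(inp, out)


# Decision-level (late) fusion with the 0.7:0.3 weighting reported in the abstract.
def fuse_predictions(p_visual, p_noise, w_visual=0.7, w_noise=0.3):
    return w_visual * p_visual + w_noise * p_noise
```

Because the streams stay separate until their output probabilities are combined, each backbone can be fine-tuned independently via transfer learning, which is the practical appeal of decision-level fusion over feature-level fusion in this setting.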

How to Cite

Rinaldi, M. K., Ernawati, E., Andreswari, D., & Sari, J. P. (2026). Application of Two-Stream Late Fusion on EfficientNetV2 based on Transfer Learning to classify AI-generated paintings. Sinkron: Jurnal Dan Penelitian Teknik Informatika, 10(2), 976-990. https://doi.org/10.33395/sinkron.v10i2.15814