Application of Two-Stream Late Fusion on EfficientNetV2 based on Transfer Learning to classify AI-generated paintings
DOI: 10.33395/sinkron.v10i2.15814

Keywords: AI-Generated Art, EfficientNetV2, Grad-CAM, Late Fusion, Spatial Rich Models (SRM), Two-Stream Network

Abstract
The rapid advancement of generative artificial intelligence (AI) has made synthetic digital paintings increasingly difficult to distinguish from human-made artworks, raising concerns about authenticity, copyright protection, and digital forensics. The main objective of this research is to develop a reliable and interpretable framework for distinguishing AI-generated paintings from human-created artworks by integrating visual and noise-based features. To address the limitations of conventional single-stream CNN models, this study proposes a Two-Stream Network with a Late Fusion strategy, combining a visual stream based on EfficientNetV2-S and a noise stream based on Xception with Spatial Rich Models (SRM). The proposed architecture processes semantic visual features and residual noise characteristics independently, followed by weighted decision-level fusion with a 0.7:0.3 ratio. Experiments were conducted on the public AI-Artwork dataset from Kaggle, consisting of 15,000 images split into 64% training, 16% validation, and 20% testing. Model performance was evaluated using accuracy, precision, recall, F1-score, and ROC-AUC, ensuring a comprehensive assessment beyond accuracy alone. The results show that the proposed method achieves 98% accuracy, 98% precision, and a 99% F1-score, with higher discriminative capability than single-stream baselines. Model interpretability was analyzed using Grad-CAM to examine the contribution of each stream. Despite these promising results, the study is limited by evaluation on a single dataset and by static fusion weights, which may affect generalization to unseen generative models. Future work includes cross-dataset evaluation, adaptive fusion strategies, and exploration of lightweight architectures.
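The weighted decision-level fusion described in the abstract can be sketched as follows; this is a minimal illustration, not the authors' implementation, assuming each stream outputs a probability that an image is AI-generated and that the 0.7 weight is applied to the visual stream.

```python
# Hedged sketch of weighted late fusion at the reported 0.7:0.3 ratio.
W_VISUAL, W_NOISE = 0.7, 0.3  # ratio stated in the paper (assignment to streams assumed)

def late_fusion(p_visual: float, p_noise: float, threshold: float = 0.5) -> str:
    """Combine the two stream probabilities and threshold the fused score."""
    fused = W_VISUAL * p_visual + W_NOISE * p_noise
    return "ai-generated" if fused >= threshold else "human-made"

# Example: visual stream confident (0.9), noise stream uncertain (0.4);
# fused score = 0.7 * 0.9 + 0.3 * 0.4 = 0.75.
print(late_fusion(0.9, 0.4))  # prints "ai-generated"
```

Because fusion happens at the decision level, each stream can be trained and tuned independently before their outputs are combined.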
Practically, this approach has potential applications in digital art authentication, forensic analysis, and content moderation systems, as well as supporting emerging policies for AI-generated content regulation and copyright protection.
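The noise stream's SRM preprocessing can be illustrated with a single high-pass residual filter; the full Spatial Rich Models of Fridrich and Kodovský use a bank of such filters, so this is only a representative sketch (the 3x3 "square" kernel), not the paper's exact pipeline.

```python
import numpy as np

# One representative SRM high-pass kernel; kernel entries sum to zero,
# so smooth image content is suppressed and noise residuals remain.
SRM_KERNEL = np.array([[-1,  2, -1],
                       [ 2, -4,  2],
                       [-1,  2, -1]], dtype=np.float32) / 4.0

def srm_residual(gray: np.ndarray) -> np.ndarray:
    """Convolve a grayscale image with the SRM kernel (edge-padded, naive loop)."""
    h, w = gray.shape
    padded = np.pad(gray.astype(np.float32), 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float32)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * SRM_KERNEL)
    return out

# A perfectly flat image yields a zero residual everywhere: the filter
# removes low-frequency content and keeps only high-frequency noise.
flat = np.full((8, 8), 100.0)
print(np.abs(srm_residual(flat)).max())  # prints 0.0
```

In a two-stream setup such as the one described, residual maps like this would be fed to the noise-stream backbone (Xception here) instead of the raw RGB image.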
References
Adventino Gulo, S., Amelia Pertiwi, A., Putri Syaifullah Nasution, S., & Syahputra, H. (2025). Deteksi Deepfake Dalam Citra Menggunakan Convolutional Neural Network (CNN). JATI (Jurnal Mahasiswa Teknik Informatika), 9(5), 8655–8660. https://doi.org/10.36040/jati.v9i5.14896
Akbar, M. H. (2025). Forensik Citra Digital Berbasis XceptionNet dengan Kerangka Kerja DFRWS untuk Deteksi Deepfake. xx(xx), 221–228.
Alzubaidi, L., Zhang, J., Humaidi, A. J., Al-Dujaili, A., Duan, Y., Al-Shamma, O., Santamaría, J., Fadhel, M. A., Al-Amidie, M., & Farhan, L. (2021). Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. In Journal of Big Data (Vol. 8, Issue 1). Springer International Publishing. https://doi.org/10.1186/s40537-021-00444-8
Anggraini, D., Handayaningrum, W., Rahayu, E. W., Suryandoko, W., & Sabri, I. (2024). Kolaborasi seniman dan kecerdasan buatan (AI) dalam membangkitkan gelombang kreativitas di era revolusi seni digital. Imaji: Jurnal Seni Dan Pendidikan Seni, 22(2), 111–119. https://doi.org/10.21831/imaji.v22i2.69734
Aris, S., Aeini, B., & Nosrati, S. (2023). A Digital Aesthetics? Artificial Intelligence and the Future of the Art. Journal of Cyberspace Studies, 7(2), 219–236. https://doi.org/10.22059/JCSS.2023.366256.1097
Atrey, P. K., Hossain, M. A., El Saddik, A., & Kankanhalli, M. S. (2010). Multimodal fusion for multimedia analysis: A survey. In Multimedia Systems (Vol. 16, Issue 6). https://doi.org/10.1007/s00530-010-0182-0
Bianco, T., Castellano, G., Scaringi, R., & Vessio, G. (2023). Identifying AI-Generated Art with Deep Learning.
Castellano, G., Grazia Miccoli, M., Scaringi, R., Vessio, G., & Zaza, G. (2024). Using LLMs to explain AI-generated art classification via Grad-CAM heatmaps. CEUR Workshop Proceedings, 3839, 65–74.
Cetinic, E., Lipic, T., & Grgic, S. (2018). Fine-tuning Convolutional Neural Networks for fine art classification. Expert Systems with Applications, 114, 107–118. https://doi.org/10.1016/j.eswa.2018.07.026
Fridrich, J., & Kodovský, J. (2012). Rich models for steganalysis of digital images. IEEE Transactions on Information Forensics and Security, 7(3), 868–882. https://doi.org/10.1109/TIFS.2012.2190402
He, K. (2015). Deep Residual Learning for Image Recognition.
Kuncheva, L. I. (2014). Combining pattern classifiers: Methods and algorithms (2nd ed.). Wiley.
Li, M., & Stamp, M. (2025). Detecting AI-generated Artwork. arXiv preprint arXiv:2504.07078. https://arxiv.org/abs/2504.07078
Loshchilov, I., & Hutter, F. (2017). SGDR: Stochastic gradient descent with warm restarts. Proceedings of the International Conference on Learning Representations (ICLR).
Mahara, A., & Rishe, N. (2025). Methods and Trends in Detecting AI-Generated Images: A Comprehensive Review. 3552, 1–35.
Morariu, V. I., & Davis, L. S. (n.d.). Two-Stream Neural Networks for Tampered Face Detection.
Nasir, A., & Tariq, Z. A. (2024). Hybrid Deep Learning EfficientNetV2 and Vision Transformer (EffNetV2-ViT) Model for Breast Cancer Histopathological Image Classification. IEEE Access, 12, 184119–184131. https://doi.org/10.1109/ACCESS.2024.3503413
Simonyan, K., & Zisserman, A. (2014). Two-Stream Convolutional Networks for Action Recognition in Videos. arXiv preprint arXiv:1406.2199.
Subkhi, M. B., Setiawan, A. B., & Candra, M. Y. A. (2023). Klasifikasi Gambar: Membedakan Lukisan Buatan Manusia dan AI dengan CNN. Paradigma: Jurnal Filsafat, Sains, Teknologi, Dan Sosial Budaya, 29(4), 149–155.
Tan, M., & Le, Q. V. (2021). EfficientNetV2: Smaller Models and Faster Training.
Vivekananda, G. N., Mahesh, T. R., Gupta, M., Thakur, A., & Sayal, A. (2025). Refining digital security with EfficientNetV2-B2 deepfake detection techniques. Egyptian Informatics Journal, 30(February), 100699. https://doi.org/10.1016/j.eij.2025.100699
Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? Advances in Neural Information Processing Systems, 27.
Zhang, Y., Pang, Z., Huang, S., Wang, C., & Zhou, X. (2025). Unmasking AI-created visual content: a review of generated images and deepfake detection technologies. Journal of King Saud University - Computer and Information Sciences, 37(6). https://doi.org/10.1007/s44443-025-00154-8
License
Copyright (c) 2026 Muhammad Kevin Rinaldi, Ernawati, Desi Andreswari, Julia Purnama Sari

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.