Efficient CNN-Based Classification of SARS-CoV-2 Spike Gene Sequences Using Alignment-Free Encoding

Authors

  • Rengga Anggarah, Informatics Study Program, Faculty of Engineering, University of Bengkulu, Bengkulu, Indonesia
  • Ernawati, Informatics Study Program, Faculty of Engineering, University of Bengkulu, Bengkulu, Indonesia
  • Widhia KZ Oktoeberza, Informatics Study Program, Faculty of Engineering, University of Bengkulu, Bengkulu, Indonesia

DOI:

10.33395/sinkron.v10i1.15691

Keywords:

Convolutional Neural Network, Deep Learning, Genome, Variant Classification, SARS-CoV-2

Abstract

The COVID-19 pandemic caused by SARS-CoV-2 continues to challenge global health systems through the emergence of variants whose genetic characteristics affect transmissibility and vaccine effectiveness. Conventional identification methods such as Whole-Genome Sequencing (WGS) are highly accurate but costly and time-consuming. Most current classification studies still rely on complex hybrid architectures such as CNN-LSTM, or on image-based sequence representations, both of which increase the computational load. This study develops an efficient, lightweight, pure Convolutional Neural Network (CNN) model based on alignment-free encoding to classify the Alpha, Beta, Delta, Gamma, and Omicron Variants of Concern (VOCs) of SARS-CoV-2, using only the Spike gene sequence. The dataset consists of 5,000 Spike gene sequences, represented with integer encoding and standardized to a fixed length with zero-padding. The proposed lightweight CNN architecture consists of four 1D convolution layers with approximately 1.6 million parameters in total. The test results show that the model achieves an overall accuracy of 98.93%. Precision, recall, and F1-score each average 0.99, and ROC curve analysis yields AUC values above 0.99 for every variant. These results indicate that the approach is both efficient and effective, offering a fast, scalable, and resource-efficient solution to support real-time genomic surveillance and future pandemic mitigation.
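As an illustration of the preprocessing described in the abstract, the following is a minimal sketch that integer-encodes nucleotide sequences and zero-pads them to a common length, without aligning them to a reference. It is not the authors' implementation. The mapping (A=1, C=2, G=3, T=4, any other symbol such as N=5, padding=0), the post-padding and truncation policy, and the names `integer_encode`, `zero_pad`, and `encode_batch` are hypothetical. In practice `max_len` would likely be set from the longest Spike sequence in the dataset rather than the toy value used here.

```python
from typing import Dict, List, Sequence

# Hypothetical nucleotide-to-integer mapping; 0 is reserved for padding.
# Any non-ACGT symbol (e.g. the ambiguity code N) shares one "unknown" index.
BASE_TO_INT: Dict[str, int] = {"A": 1, "C": 2, "G": 3, "T": 4}
UNKNOWN = 5
PAD = 0


def integer_encode(seq: str) -> List[int]:
    """Map each nucleotide to an integer independently of any reference (alignment-free)."""
    return [BASE_TO_INT.get(base, UNKNOWN) for base in seq.upper()]


def zero_pad(encoded: Sequence[int], max_len: int) -> List[int]:
    """Post-pad with PAD up to max_len; truncate longer inputs (assumed policy)."""
    clipped = list(encoded[:max_len])
    return clipped + [PAD] * (max_len - len(clipped))


def encode_batch(seqs: Sequence[str], max_len: int) -> List[List[int]]:
    """Turn variable-length sequences into a fixed-size integer matrix for a 1D CNN."""
    return [zero_pad(integer_encode(s), max_len) for s in seqs]


if __name__ == "__main__":
    # Toy inputs: mixed case and an ambiguous base (n) are handled by the mapping.
    batch = encode_batch(["ATGTTTGTT", "atgnttg"], max_len=10)
    for row in batch:
        print(row)
    # [1, 4, 3, 4, 4, 4, 3, 4, 4, 0]
    # [1, 4, 3, 5, 4, 4, 3, 0, 0, 0]
```

Reserving 0 for padding keeps padded positions distinguishable from real bases, so a downstream embedding or masking layer can treat them separately. Because no reference alignment is needed, each sequence is encoded on its own, which is what keeps this step cheap.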

How to Cite

Anggarah, R., Ernawati, E., & Oktoeberza, W. K. (2026). Efficient CNN-Based Classification of SARS-CoV-2 Spike Gene Sequences Using Alignment-Free Encoding. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 10(1), 361-372. https://doi.org/10.33395/sinkron.v10i1.15691