Implementation of Semantic Search in an Academic Repository Using Sentence-BERT and FAISS
DOI: 10.33395/sinkron.v10i2.15940
Keywords: Academic Repository; Semantic Search; Sentence-BERT; FAISS; Information Retrieval
Abstract
Academic repositories serve as centralized platforms for storing and managing scientific documents, including research papers, reports, and administrative records. Traditional keyword-based search systems, however, often struggle to deliver relevant results: they fail to capture the contextual meaning of user queries, which leads to mismatches when the query terms differ from those used in the documents. To overcome this limitation, this study introduces a semantic search approach for academic repositories that combines Sentence-BERT as the text embedding model with FAISS as the vector similarity search engine. In the proposed system, documents stored in a MySQL database are first preprocessed to remove HTML tags and then converted into semantic vector representations using Sentence-BERT. These vectors are indexed with FAISS, enabling fast, efficient similarity search in place of conventional keyword matching. The system architecture uses FastAPI as the backend service for indexing, searching, and evaluation, while CodeIgniter 4 serves as the frontend framework through which administrators and end users manage documents. Evaluation was carried out using three test sets, each containing ten queries. Performance was measured using Recall@K, normalized Discounted Cumulative Gain (nDCG), Mean Reciprocal Rank (MRR), Mean Average Precision (MAP), and search latency. Experimental results show that the system achieved an average Recall@K of 0.61, a MAP of 0.39, and a No-Hit rate of 0.033, meaning that nearly all queries successfully retrieved results. Although the nDCG value declined in the third test set, the system consistently returned relevant documents.
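The ranking metrics named in the abstract can be illustrated with a short sketch. The code below is not the authors' evaluation code; it assumes binary relevance judgments (a document is either relevant or not), and the function names, the sample document IDs, and the reading of "No-Hit rate" as the share of queries with no relevant document in the top K are all assumptions made for illustration.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision values taken at the rank of each relevant document."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """nDCG with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def no_hit_rate(runs: Sequence[Sequence[str]],
                judgments: Sequence[Set[str]], k: int) -> float:
    """Assumed definition: share of queries with no relevant document in top-k."""
    if not runs:
        return 0.0
    misses = sum(
        1 for ranked, relevant in zip(runs, judgments)
        if not any(doc_id in relevant for doc_id in ranked[:k])
    )
    return misses / len(runs)


if __name__ == "__main__":
    # Hypothetical ranking returned for one query and its relevant documents.
    ranked_ids = ["d3", "d1", "d7", "d2"]
    relevant_ids = {"d1", "d2"}
    print("Recall@3:", recall_at_k(ranked_ids, relevant_ids, 3))
    print("RR:", reciprocal_rank(ranked_ids, relevant_ids))
    print("AP:", average_precision(ranked_ids, relevant_ids))
    print("nDCG@4:", ndcg_at_k(ranked_ids, relevant_ids, 4))
```

MRR and MAP are then the means of `reciprocal_rank` and `average_precision` over all queries in a test set. Under the assumed definition, a No-Hit rate of 0.033 over 30 queries would correspond to about one query with no relevant result.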
License
Copyright (c) 2026 Ihsan Lubis, Husni Lubis, Inaya Nur Wahidah

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.






















