Paraphrase Generation For Reading Comprehension

Authors

  • Faishal Januarahman, School of Computing, Telkom University, Indonesia
  • Ade Romadhony, School of Computing, Telkom University, Indonesia

DOI:

10.33395/sinkron.v8i4.12873

Keywords:

BLEU, Human Evaluation, Paraphrase Generation, ROUGE, Reading Comprehension, Thesaurus

Abstract

Reading comprehension is an assessment that tests readers' understanding of a concept from a given text. The test is conducted by asking questions about the content of the text. The purpose of this research is to create new variations of existing questions; one way to achieve this is to paraphrase the questions, which is a paraphrase generation task. Asking the same question in different wording can help confirm that readers have fully grasped the concept in a text. This study employs a traditional, thesaurus-based approach, in which words are substituted with synonyms taken from the Indonesian Thesaurus dictionary. The data consists of a list of Indonesian-language reading comprehension assessment questions ranging from elementary to high school level. To measure the quality of the generated paraphrased questions, two evaluations are conducted: automatic evaluation, with scores ranging from 0 to 1, and human evaluation, with scores ranging from 1 to 4. The automatic evaluation uses the BLEU-4 metric, which yields a score of 0.044, and the ROUGE-L metric, which yields an F1-score of 0.421. In the human evaluation, the relevancy score is 2.533 and the fluency score is 3.186. The results of both evaluations indicate that the generated paraphrased questions exhibit diverse new word choices but tend to have slightly different meanings compared to the reference questions.
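
The thesaurus-based substitution and the ROUGE-L F1 score described in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy dictionary `TOY_THESAURUS`, the helper names, the whitespace tokenization, and the rule of taking the first listed synonym are hypothetical and are not taken from the study, whose actual lookup and selection procedure is not described in the abstract. The example also scores the paraphrase against the original question, whereas the study scores against reference questions.

```python
from typing import Dict, List

# Hypothetical toy thesaurus (word -> list of synonyms). The study uses the
# Indonesian Thesaurus dictionary; this tiny mapping only stands in for it.
TOY_THESAURUS: Dict[str, List[str]] = {
    "gagasan": ["ide"],
    "utama": ["pokok"],
}


def paraphrase(question: str, thesaurus: Dict[str, List[str]]) -> str:
    """Replace each token that has a thesaurus entry with its first synonym."""
    output = []
    for token in question.split():
        synonyms = thesaurus.get(token.lower(), [])
        # Assumption: always pick the first synonym; keep the token otherwise.
        output.append(synonyms[0] if synonyms else token)
    return " ".join(output)


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


original = "apa gagasan utama paragraf tersebut"
generated = paraphrase(original, TOY_THESAURUS)
print(generated)
print(round(rouge_l_f1(original, generated), 3))
```

In this toy case two of the five words are replaced, so the longest common subsequence has three tokens and both precision and recall are 3/5. The sketch shows why heavy substitution lowers overlap-based scores even when the paraphrase remains readable.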

How to Cite

Januarahman, F., & Romadhony, A. (2023). Paraphrase Generation For Reading Comprehension. Sinkron : Jurnal Dan Penelitian Teknik Informatika, 8(4), 2018-2026. https://doi.org/10.33395/sinkron.v8i4.12873