Comparison of Sentiment Analysis Methods on Topic Haram of Music In Youtube


  • Rahmat Saudi Al Fathir As Magister Of Informatics Engineering, Universitas Amikom, Indonesia
  • Ema Utami Magister Of Informatics Engineering, Universitas Amikom, Indonesia
  • Anggit Dwi Hartono Magister Of Informatics Engineering, Universitas Amikom, Indonesia




Sentiment Analysis, Classification, Imbalanced Data, Machine Learning, Natural Language Processing, Word Embedding


Sentiment analysis of YouTube lecture videos that discuss whether music is haram is an interesting way to gauge public opinion on the topic. This study aims to identify the factors that affect model accuracy in sentiment analysis, particularly for lecture video content on YouTube. The data consist of comments on three lecture videos discussing the haram status of music, each labelled as either positive or negative. Two datasets are used: primary data, comprising 2099 comments that have not been normalized, and secondary data, comprising 1001 comments that have been normalized. The experiments show that data validity, data labelling, the amount of data, and preprocessing are essential to building a good sentiment classification model. In the tests, imbalance-handling techniques such as SMOTE, the word embeddings Word2Vec and FastText, and the SVM and KNN classification algorithms did not reach their best accuracy when applied to the primary data. With the secondary data, however, imbalance handling such as oversampling, combined with the SVM and KNN classification algorithms, gave better model accuracy. With SVM, the highest accuracy was 58.35% on the primary data and 72.23% on the secondary data. With KNN, the highest accuracy was 53.54% on the primary data and 72.81% on the secondary data. These results indicate that the data must be valid and relevant to the case under study, and that labelling must be done carefully. Most importantly, preprocessing must be done correctly: if it is done improperly, imbalance techniques such as oversampling have little effect on accuracy, whereas with proper preprocessing accuracy increases.
The choice of word embedding also affects accuracy. Many experiments are needed to select a classification algorithm that suits the data at hand, because the right algorithm yields the best model accuracy.
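The oversampling and KNN steps mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the 2-D toy vectors stand in for Word2Vec or FastText comment embeddings, plain random oversampling is used in place of SMOTE (which synthesizes new points instead of duplicating existing ones), and the names `random_oversample`, `knn_predict`, and `accuracy` are hypothetical helpers introduced only for illustration.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class.
    Assumption: simple random oversampling, not SMOTE."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(X_train, y_train), key=lambda p: math.dist(p[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(X_train, y_train, X_test, y_test, k=3):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(X_train, y_train, x, k) == t
               for x, t in zip(X_test, y_test))
    return hits / len(y_test)


# Toy, hand-made data (hypothetical): 6 "negative" vs 2 "positive" comments,
# each represented by a 2-D vector in place of a real embedding.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.2, 0.4), (0.4, 0.2),
     (1.0, 1.0), (0.9, 1.1)]
y = ["negative"] * 6 + ["positive"] * 2

X_bal, y_bal = random_oversample(X, y, seed=0)
print(Counter(y_bal))                                # class sizes after balancing
print(knn_predict(X_bal, y_bal, (0.95, 1.0), k=3))   # query near the positive cluster
```

Because random oversampling only repeats existing points, it cannot repair mislabelled or poorly normalized comments, which is consistent with the abstract's observation that balancing helps little when preprocessing is inadequate.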

