Comparison Classification for Indonesian Twitter Hate Speech and Abusive Detection
DOI: https://doi.org/10.33050/11sqgk94
Keywords: Hate speech, abusive language, multi-label classification, Indonesian Twitter, Random Forest.
Abstract
Hate speech and abusive language on social media, particularly on Twitter in Indonesia, have become a serious problem that can threaten the social and psychological stability of users. This study analyzes and detects such harmful content using a multi-label classification approach, which better captures the complexity of real-world language. Data are collected through the Twitter API and then pass through an intensive preprocessing stage that includes data cleaning and text normalization with a slang dictionary. We apply machine learning algorithms such as Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest Decision Tree (RFDT). To handle the multi-label characteristics of the data, the Binary Relevance (BR), Label Power-set (LP), and Classifier Chains (CC) transformation techniques are used. The results show that the RFDT algorithm with the LP transformation performs best, with an accuracy of 81.2%. This finding indicates that text normalization and the choice of an appropriate label transformation technique are crucial to detection accuracy. The results of this study are expected to provide a foundation for the development of smarter automated content moderation systems for Indonesian-language social media.
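
As a rough illustration of the pipeline described in the abstract, the following sketch shows slang-dictionary normalization and how the three transformation techniques reshape a multi-label target. It is not the authors' implementation. The slang entries, the label names, and all function names are hypothetical, and the classifiers themselves (SVM, NB, RFDT) are left out so that only the data transformations are shown.

```python
import re

# Hypothetical slang dictionary; the dictionary used in the study is not given here.
SLANG = {"gk": "tidak", "yg": "yang", "bgt": "banget"}

# Hypothetical label names for a two-label setting.
LABELS = ("hate_speech", "abusive")


def normalize(tweet, slang=SLANG):
    """Lowercase, strip links/mentions/non-letters, then expand slang tokens."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # drop URLs and user mentions
    text = re.sub(r"[^a-z\s]", " ", text)           # keep ASCII letters only
    return " ".join(slang.get(tok, tok) for tok in text.split())


def binary_relevance(Y):
    """BR: one independent binary target vector per label."""
    return {name: [row[j] for row in Y] for j, name in enumerate(LABELS)}


def label_powerset(Y):
    """LP: map each distinct label combination to one multi-class id."""
    classes = {}
    y = [classes.setdefault(tuple(row), len(classes)) for row in Y]
    return y, classes


def classifier_chain_inputs(X, Y):
    """CC (training view): features for label j are x plus the true labels 0..j-1."""
    n_labels = len(Y[0])
    return [[list(x) + list(row[:j]) for x, row in zip(X, Y)]
            for j in range(n_labels)]


if __name__ == "__main__":
    print(normalize("@user Gk suka yg ini!! http://t.co/x"))
    Y = [[0, 0], [1, 1], [0, 1], [1, 1]]  # toy label matrix, not real data
    print(binary_relevance(Y))
    print(label_powerset(Y))
```

Under BR, each label gets its own binary classifier and label co-occurrence is ignored. Under LP, every observed label combination becomes a single class, so one multi-class model (for example a random forest) can learn correlations between labels, at the cost of more classes. Under CC, each classifier in the chain also sees the labels that precede it.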
