Comparison Classification for Indonesian Twitter Hate Speech and Abusive Detection

Ibnu Mas’ud; Tubagus Toifur; Aolia Ikhwanudin; Muhamad Yusuf; Agianto Syamhalim; Anas Nasrulloh

doi:10.33050/11sqgk94

Authors

Ibnu Mas’ud, UIN Sultan Maulana Hasanuddin Banten
Tubagus Toifur, Institut Teknologi Tangerang Selatan
Aolia Ikhwanudin, Institut Teknologi Tangerang Selatan
Muhamad Yusuf, Institut Teknologi Tangerang Selatan
Agianto Syamhalim, Institut Teknologi Tangerang Selatan
Anas Nasrulloh, Institut Teknologi Tangerang Selatan


Keywords:

Hate speech, abusive language, multi-label classification, Indonesian Twitter, Random Forest.

Abstract

Hate speech and abusive language on social media, particularly on Indonesian Twitter, have become a serious problem that can threaten the social and psychological well-being of users. This study analyzes and detects such harmful content using a multi-label classification approach, which better reflects the complexity of real-world language, where a single tweet may carry more than one label. Data were collected through the Twitter API and then put through an intensive preprocessing stage that included data cleaning and text normalization using a slang dictionary. We apply the machine learning algorithms Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest Decision Tree (RFDT). To handle the multi-label setting, we use the Binary Relevance (BR), Label Power-set (LP), and Classifier Chains (CC) transformation techniques. The results show that RFDT with the LP transformation performs best, with an accuracy of 81.2%. This finding indicates that text normalization and the choice of an appropriate label transformation technique play a crucial role in detection accuracy. The results are expected to provide a foundation for developing smarter automated content moderation systems for Indonesian-language social media.
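The preprocessing and label transformation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the slang entries, the label names `hate_speech` and `abusive`, and the helper functions `normalize`, `binary_relevance`, and `label_powerset` are all assumptions made for illustration, and the classifiers themselves (SVM, NB, RFDT) are left out.

```python
import re

# Hypothetical slang dictionary; the dictionary used in the study is not shown here.
SLANG = {"gk": "tidak", "yg": "yang", "bgt": "banget"}

# Assumed label names for the two targets.
LABELS = ("hate_speech", "abusive")


def normalize(tweet: str) -> str:
    """Lowercase, drop URLs, mentions and non-letters, then map slang tokens."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)  # remove URLs and @mentions
    text = re.sub(r"[^a-z\s]", " ", text)           # keep letters and whitespace only
    return " ".join(SLANG.get(tok, tok) for tok in text.split())


def binary_relevance(rows):
    """Binary Relevance: one independent binary target vector per label."""
    return {name: [row[i] for row in rows] for i, name in enumerate(LABELS)}


def label_powerset(rows):
    """Label Power-set: each distinct label combination becomes one class id."""
    classes = {}
    ids = [classes.setdefault(tuple(row), len(classes)) for row in rows]
    return ids, classes


if __name__ == "__main__":
    print(normalize("@user Gk suka yg ini!! http://t.co/x"))
    rows = [(1, 1), (0, 1), (0, 0), (1, 1)]  # toy label matrix, not study data
    print(binary_relevance(rows))
    print(label_powerset(rows))
```

Under BR, each target vector would be given to a separate binary classifier. Under LP, a single multi-class classifier is trained on the combined class ids, which lets it model labels that tend to occur together. Classifier Chains, not sketched here, would instead feed the prediction for an earlier label as an extra feature to the classifier for the next label.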
