In recent years, online users, particularly on social networks, have witnessed an upsurge in racism, sexism, and other aggressive and cyberbullying content, often manifested as offensive, abusive, or hateful speech and harassment. Such content can cause severe physical and psychological stress in children and adolescents, in extreme cases leading to suicide, and it negatively affects social policies. There is therefore a pressing need to identify and regulate harassing content posted on the Internet in a smart, automated, and accurate manner. To this end, we design and develop a hierarchical framework that arranges machine learning algorithms in order of increasing computational complexity and adaptively switches among them to detect hateful and abusive content efficiently. We combine simple machine learning models, such as Naive Bayes and Logistic Regression classifiers, with customized calibration and Expectation-Maximization (EM) algorithms, and compare them with much stronger deep learning techniques. On a relatively small Twitter dataset, our proposed hierarchical framework significantly improves the automated detection of abusive content compared with its deep learning counterpart, the Bidirectional Encoder Representations from Transformers (BERT) model, whose training typically requires a much larger volume of labeled documents. © 2022 IEEE.
- Bidirectional Encoder Representations from Transformers (BERT),
- calibration,
- Cyberbully,
- Expectation-Maximization (EM),
- Naive Bayes,
- racism,
- Automation,
- Classifiers,
- Computational efficiency,
- Deep learning,
- Learning algorithms,
- Learning systems,
- Machinery,
- Maximum principle,
- Signal encoding,
- Social networking (online)
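
The abstract gives no implementation details; as a rough illustration of the cascading idea it describes, the following is a minimal Python sketch of a two-tier version, in which a cheap, probability-calibrated Naive Bayes classifier handles inputs it is confident about and defers the rest to a stronger model. The confidence threshold, the toy data, and the use of logistic regression as a stand-in for the heavier tier (the paper's BERT model) are all illustrative assumptions, and the paper's EM-based component is omitted.

```python
# Minimal sketch of an adaptive two-tier cascade, assuming a confidence-
# threshold switching rule. Tier 1 is a calibrated Naive Bayes classifier;
# tier 2 is logistic regression, standing in for a heavier model such as
# BERT. The threshold and toy data are illustrative, not from the paper.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

# Tiny toy corpus (1 = abusive, 0 = benign); the paper trains on a
# relatively small Twitter dataset instead.
texts = [
    "you are a wonderful person", "great game last night",
    "thanks for the helpful answer", "lovely weather today",
    "congrats on the new job", "what a beautiful photo",
    "you are a worthless idiot", "nobody wants you here",
    "go away, you disgusting loser", "shut up, you pathetic fool",
    "everyone hates you, freak", "you stupid piece of trash",
]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

vec = TfidfVectorizer()
X = vec.fit_transform(texts)

# Tier 1: Naive Bayes with Platt-style (sigmoid) probability calibration,
# so its confidence scores are meaningful enough to switch on.
tier1 = CalibratedClassifierCV(MultinomialNB(), method="sigmoid", cv=3)
tier1.fit(X, labels)

# Tier 2: the stronger, more expensive fallback model.
tier2 = LogisticRegression(max_iter=1000).fit(X, labels)

CONFIDENCE_THRESHOLD = 0.8  # assumed value; the paper's rule may differ


def classify(text: str) -> tuple[int, str]:
    """Return (label, tier); escalate only when tier 1 is unsure."""
    x = vec.transform([text])
    proba = tier1.predict_proba(x)[0]
    if proba.max() >= CONFIDENCE_THRESHOLD:
        return int(proba.argmax()), "tier1-naive-bayes"
    return int(tier2.predict(x)[0]), "tier2-strong-model"


print(classify("you are a pathetic loser"))
```

Calibrating tier 1's probabilities is what makes a fixed threshold a sensible switching rule, since raw Naive Bayes scores tend to be overconfident; the paper's customized calibration presumably serves a similar purpose, though its exact switching criterion is not stated in the abstract.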