
Six Things I Hate About You (in Italian) and Six Classification Strategies to More and More Effectively Find Them

While they facilitate communication and ease information sharing, Online Social Networks are increasingly used to launch harmful campaigns against specific groups and individuals. While providers struggle to keep pace by manually removing hate content published on their platforms, recent research efforts rely on automatic text classification techniques, whose performance is usually measured on annotated corpora. In this work, we propose six distinct machine learning classification strategies: three based on conventional machine learning approaches and three based on neural networks. The latter can process texts almost from scratch, avoiding the need for i) NLP tools specialised for a specific language, ii) a time-consuming feature engineering phase, and iii) the high computational cost that usually comes from processing a huge number of features. Thus, the main goal of the paper is to investigate whether neural networks can achieve performance at least comparable to that of NLP-based classifiers. The performance of the six configurations is evaluated on an annotated dataset consisting of 4,000 Italian tweets and 4,000 Italian Facebook comments. By comparing the classification results, we show that relying on deep learning techniques for hate speech detection is more than encouraging. In particular, a deep learning model based on an ensemble approach obtains an F1 score of 0.786 on the Twitter data and 0.775 on the Facebook data, the best results among the tested configurations.
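For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an F1 score is computed for a binary hate / non-hate labelling. The function name `f1_score`, the label encoding (1 for hate, 0 for non-hate), and the toy labels are illustrative assumptions, not taken from the paper; the abstract also does not state which averaging scheme (for example, per-class or macro-averaged) was used, so this shows only the plain single-class definition.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels (1 = hate, 0 = non-hate), for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)  # 3 TP, 1 FP, 1 FN
print(round(score, 3))
```

On these toy labels precision and recall are both 3/4, so the F1 score is 0.75.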

Proceedings of the Third Italian Conference on Cyber Security, Pisa, Italy, 2019

IIT authors:

Type: Contribution in conference proceedings
Discipline area: Computer Science & Engineering

File: ItaSec2019.pdf

Activity: Algorithmics for web technologies