ABSTRACT
Deep neural networks have proven effective at classifying human interactions into emotions, especially when they encode multiple input modalities. In this work, we assess the robustness of a transformer-based multimodal audio-text classifier for emotion recognition by perturbing its input at inference time with attacks that we designed specifically to corrupt information considered important for emotion recognition. To measure the impact of these attacks, we compare the classifier's accuracy on the perturbed input with its accuracy on the original, unperturbed input. Our results show that the multimodal classifier is more resilient to perturbation attacks than the equivalent unimodal classifiers, suggesting that the two modalities are encoded in a way that lets the classifier benefit from one modality even when the other is slightly damaged.
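The robustness measure described above reduces to comparing two accuracies computed against the same gold labels. The following is a minimal Python sketch, assuming predictions and gold labels are available as parallel lists; the function names `accuracy` and `accuracy_drop` and the emotion labels are hypothetical and are not taken from the study.

```python
from typing import Hashable, Sequence


def accuracy(preds: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Fraction of predictions that match the gold labels."""
    if not labels or len(preds) != len(labels):
        raise ValueError("preds and labels must be non-empty and equally long")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def accuracy_drop(
    clean_preds: Sequence[Hashable],
    perturbed_preds: Sequence[Hashable],
    labels: Sequence[Hashable],
) -> float:
    """Accuracy on unperturbed input minus accuracy on perturbed input.

    A smaller drop means the classifier is less affected by the attack.
    """
    return accuracy(clean_preds, labels) - accuracy(perturbed_preds, labels)


if __name__ == "__main__":
    # Hypothetical toy data, not results from the study.
    gold = ["neutral", "happy", "sad", "happy"]
    clean = ["neutral", "happy", "sad", "happy"]          # all four correct
    perturbed = ["neutral", "happy", "neutral", "happy"]  # one error after the attack
    print(accuracy_drop(clean, perturbed, gold))  # 0.25
```

Computing this drop for the multimodal model and for each unimodal model under the same attack is one way to express the resilience comparison reported above.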
ABSTRACT
Childhood sexual abuse (CSA) is a worldwide phenomenon that has negative long-term consequences for victims and their families and inflicts a considerable economic toll on society. One of the main difficulties in treating CSA is victims' reluctance to disclose their abuse, together with professionals' failure to detect it when there is no forensic evidence (Bottoms et al., 2014; McElvaney, 2013). Estimated disclosure rates for child sexual abuse based on retrospective adult reports range from 23 % to 45 % (e.g., Bottoms et al., 2014). This study reports the four stages in the development of a Convolutional Neural Network (CNN) system designed to detect abuse in self-figure drawings: (1) a preliminary study to build a Gender CNN; (2) an expert-level performance evaluation; (3) validation of the CSA CNN; and (4) testing of the CSA CNN model. The findings indicate that the Gender CNN achieved 88 % detection accuracy and outperformed the CSA CNN by 19 percentage points. The CSA CNN achieved 72 % accuracy on the test set, with 80 % precision and 79 % recall for the abuse class. However, human experts outperformed the CSA CNN by 16 percentage points, probably owing to the complexity of the task. These preliminary results suggest that CNNs, with further development, can contribute to the detection of child sexual abuse.
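Precision and recall for the abuse class are computed only from that class's true positives, false positives, and false negatives, whereas accuracy counts correct predictions over both classes. The following is a minimal Python sketch, assuming binary string labels; the function name `precision_recall` and the label names `"abuse"` and `"no_abuse"` are hypothetical, and the toy data below are not from the study.

```python
from typing import Hashable, Sequence, Tuple


def precision_recall(
    preds: Sequence[Hashable],
    labels: Sequence[Hashable],
    positive: Hashable = "abuse",
) -> Tuple[float, float]:
    """Precision and recall for a single (positive) class."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must be equally long")
    pairs = list(zip(preds, labels))
    tp = sum(p == positive and y == positive for p, y in pairs)  # correctly flagged
    fp = sum(p == positive and y != positive for p, y in pairs)  # wrongly flagged
    fn = sum(p != positive and y == positive for p, y in pairs)  # missed cases
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy labels: 4 true positives, 1 false negative, 1 false positive.
    gold = ["abuse"] * 5 + ["no_abuse"] * 5
    pred = ["abuse"] * 4 + ["no_abuse"] + ["abuse"] + ["no_abuse"] * 4
    print(precision_recall(pred, gold))  # (0.8, 0.8)
```

Returning 0.0 when a denominator is zero is a design choice for this sketch; other conventions (for example, raising an error or returning NaN) are equally common.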