Results 1 - 20 of 46
1.
J Am Acad Child Adolesc Psychiatry; 62(9): 1010-1020, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37182586

ABSTRACT

OBJECTIVE: Suicide is a leading cause of death among adolescents. However, there are no clinical tools to detect proximal risk for suicide. METHOD: Participants included 13- to 18-year-old adolescents (N = 103) reporting a current depressive, anxiety, and/or substance use disorder who owned a smartphone; 62% reported current suicidal ideation, with 25% indicating a past-year attempt. At baseline, participants were administered clinical interviews to assess lifetime disorders and suicidal thoughts and behaviors (STBs). Self-reports assessing symptoms and suicide risk factors also were obtained. In addition, the Effortless Assessment of Risk States (EARS) app was installed on adolescent smartphones to acquire daily mood and weekly suicidal ideation severity during the 6-month follow-up period. Adolescents completed STB and psychiatric service use interviews at the 1-, 3-, and 6-month follow-up assessments. RESULTS: K-means clustering based on aggregates of weekly suicidal ideation scores resulted in a 3-group solution reflecting high-risk (n = 26), medium-risk (n = 47), and low-risk (n = 30) groups. Of the high-risk group, 58% reported suicidal events (ie, suicide attempts, psychiatric hospitalizations, emergency department visits, ideation severity requiring an intervention) during the 6-month follow-up period. For participants in the high-risk and medium-risk groups (n = 73), mood disturbances in the preceding 7 days predicted clinically significant ideation, with a 1-SD decrease in mood doubling participants' likelihood of reporting clinically significant ideation on a given week. CONCLUSION: Intensive longitudinal assessment through use of personal smartphones offers a feasible method to assess variability in adolescents' emotional experiences and suicide risk. Translating these tools into clinical practice may help to reduce the needless loss of life among adolescents.
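Below is a minimal sketch of the clustering step described above: grouping participants into low-, medium-, and high-risk clusters from their weekly suicidal ideation scores. The aggregation (one mean score per participant), column shapes, and data are hypothetical stand-ins, not the study's actual pipeline.

```python
# Sketch: cluster participants into risk groups from weekly suicidal-ideation
# scores, assuming one aggregate score per participant (illustrative only).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical aggregate of weekly ideation severity for 103 participants.
weekly_ideation_mean = rng.uniform(0, 10, size=(103, 1))

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(weekly_ideation_mean)

# Order clusters by centroid so labels map onto low/medium/high risk.
order = np.argsort(kmeans.cluster_centers_.ravel())
risk_group = np.array(["low", "medium", "high"])[np.argsort(order)][labels]
print(dict(zip(*np.unique(risk_group, return_counts=True))))
```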


Subjects
Suicidal Ideation; Suicide, Attempted; Humans; Adolescent; Suicide, Attempted/prevention & control; Suicide, Attempted/psychology; Mood Disorders; Anxiety Disorders; Risk Factors
2.
J Affect Disord; 333: 543-552, 2023 Jul 15.
Article in English | MEDLINE | ID: mdl-37121279

ABSTRACT

BACKGROUND: Expert consensus guidelines recommend Cognitive Behavioral Therapy (CBT) and Interpersonal Psychotherapy (IPT), interventions that were historically delivered face-to-face, as first-line treatments for Major Depressive Disorder (MDD). Despite the ubiquity of telehealth following the COVID-19 pandemic, little is known about differential outcomes with CBT versus IPT delivered in-person (IP) or via telehealth (TH) or whether working alliance is affected. METHODS: Adults meeting DSM-5 criteria for MDD were randomly assigned to either 8 sessions of IPT or CBT (group). Mid-trial, COVID-19 forced a change of therapy delivery from IP to TH (study phase). We compared changes in Hamilton Rating Scale for Depression (HRSD-17) and Working Alliance Inventory (WAI) scores for individuals by group and phase: CBT-IP (n = 24), CBT-TH (n = 11), IPT-IP (n = 25) and IPT-TH (n = 17). RESULTS: HRSD-17 scores declined significantly from pre to post treatment (pre: M = 17.7, SD = 4.4 vs. post: M = 11.7, SD = 5.9; p < .001; d = 1.45) without significant group or phase effects. WAI scores did not differ by group or phase. Number of completed therapy sessions was greater for TH (M = 7.8, SD = 1.2) relative to IP (M = 7.2, SD = 1.6) (Mann-Whitney U = 387.50, z = -2.24, p = .025). LIMITATIONS: Participants were not randomly assigned to IP versus TH. Sample size is small. CONCLUSIONS: This study provides preliminary evidence supporting the efficacy of both brief IPT and CBT, delivered by either TH or IP, for depression. It showed that working alliance is preserved in TH, and delivery via TH may improve therapy adherence. Prospective, randomized controlled trials are needed to definitively test efficacy of brief IPT and CBT delivered via TH versus IP.
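As an illustration of the adherence comparison reported above, the following sketch runs a Mann-Whitney U test on completed-session counts for telehealth versus in-person delivery. The session counts are simulated; only the group sizes mirror the study.

```python
# Sketch: compare completed-session counts between telehealth (TH) and
# in-person (IP) delivery with a Mann-Whitney U test (hypothetical data).
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)
sessions_th = rng.integers(5, 9, size=28)  # TH arm (n = 11 + 17 in the study)
sessions_ip = rng.integers(4, 9, size=49)  # IP arm (n = 24 + 25 in the study)

u_stat, p_value = mannwhitneyu(sessions_th, sessions_ip, alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.3f}")
```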


Subjects
COVID-19; Cognitive Behavioral Therapy; Depressive Disorder, Major; Interpersonal Psychotherapy; Telemedicine; Adult; Humans; Depression/therapy; Depressive Disorder, Major/therapy; Pandemics; Prospective Studies; Psychotherapy; Treatment Outcome
3.
Behav Res Methods; 55(5): 2333-2352, 2023 Aug.
Article in English | MEDLINE | ID: mdl-35877024

ABSTRACT

Eye tracking and other behavioral measurements collected from patient-participants in their hospital rooms afford a unique opportunity to study natural behavior for basic and clinical translational research. We describe an immersive social and behavioral paradigm implemented in patients undergoing evaluation for surgical treatment of epilepsy, with electrodes implanted in the brain to determine the source of their seizures. Our studies entail collecting eye tracking with other behavioral and psychophysiological measurements from patient-participants during unscripted behavior, including social interactions with clinical staff, friends, and family in the hospital room. This approach affords a unique opportunity to study the neurobiology of natural social behavior, though it requires carefully addressing distinct logistical, technical, and ethical challenges. Collecting neurophysiological data synchronized to behavioral and psychophysiological measures helps us to study the relationship between behavior and physiology. Combining across these rich data sources while participants eat, read, converse with friends and family, etc., enables clinical-translational research aimed at understanding the participants' disorders and clinician-patient interactions, as well as basic research into natural, real-world behavior. We discuss data acquisition, quality control, annotation, and analysis pipelines that are required for our studies. We also discuss the clinical, logistical, and ethical and privacy considerations critical to working in the hospital setting.


Subjects
Brain; Social Behavior; Humans; Privacy
4.
Front Psychol; 13: 774547, 2022.
Article in English | MEDLINE | ID: mdl-36329758

ABSTRACT

Participants in a conversation must carefully monitor the turn-management (speaking and listening) willingness of other conversational partners and adjust their turn-changing behaviors accordingly to keep the conversation smooth. Many studies have focused on developing actual turn-changing (i.e., next speaker or end-of-turn) models that can predict whether turn-keeping or turn-changing will occur. Participants' verbal and non-verbal behaviors have been used as input features for predictive models. To the best of our knowledge, these studies only model the relationship between participant behavior and turn-changing; there is no model that takes into account participants' willingness to acquire a turn (turn-management willingness). In this paper, we address the challenge of building such models to predict the willingness of both speakers and listeners. First, we find that a mismatch exists between willingness and actual turn-changing. Second, we propose predictive models based on trimodal inputs, including acoustic, linguistic, and visual cues distilled from conversations. Additionally, we study the impact of modeling willingness on the task of turn-changing prediction. To do so, we introduce a dyadic conversation corpus with annotated scores of speaker/listener turn-management willingness. Our results show that using all three modalities (i.e., acoustic, linguistic, and visual cues) of the speaker and listener is critically important for predicting turn-management willingness. Furthermore, explicitly adding willingness as a prediction task improves the performance of turn-changing prediction. Finally, turn-management willingness prediction becomes more accurate when it is predicted jointly with turn-changing using multi-task learning techniques.
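A hedged sketch of the multi-task idea described above: a shared trimodal encoder with separate heads for turn-management willingness and turn-changing, trained with a joint loss. The feature dimensions, architecture, and loss weights are illustrative assumptions, not the paper's model.

```python
# Sketch: shared trimodal encoder with two heads, trained multi-task on
# willingness (regression) and turn-changing (binary classification).
import torch
import torch.nn as nn
import torch.nn.functional as F


class WillingnessTurnModel(nn.Module):
    def __init__(self, d_acoustic=88, d_linguistic=300, d_visual=35, d_hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(d_acoustic + d_linguistic + d_visual, d_hidden),
            nn.ReLU(),
        )
        self.willingness_head = nn.Linear(d_hidden, 1)  # willingness score
        self.turn_head = nn.Linear(d_hidden, 1)         # turn-keeping vs. turn-changing

    def forward(self, acoustic, linguistic, visual):
        h = self.encoder(torch.cat([acoustic, linguistic, visual], dim=-1))
        return self.willingness_head(h), self.turn_head(h)


model = WillingnessTurnModel()
acoustic, linguistic, visual = torch.randn(4, 88), torch.randn(4, 300), torch.randn(4, 35)
willingness_pred, turn_logit = model(acoustic, linguistic, visual)

# Joint loss: weighted sum of the two objectives (weights are illustrative).
willingness_target = torch.rand(4, 1)              # hypothetical annotations
turn_target = torch.randint(0, 2, (4, 1)).float()  # hypothetical labels
loss = F.mse_loss(willingness_pred, willingness_target) \
     + 0.5 * F.binary_cross_entropy_with_logits(turn_logit, turn_target)
loss.backward()
```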

5.
Proc ACM Int Conf Multimodal Interact; 2022: 487-494, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36913231

ABSTRACT

The relationship between a therapist and their client is one of the most critical determinants of successful therapy. The working alliance is a multifaceted concept capturing the collaborative aspect of the therapist-client relationship; a strong working alliance has been extensively linked to many positive therapeutic outcomes. Although therapy sessions are decidedly multimodal interactions, the language modality is of particular interest given its recognized relationship to similar dyadic concepts such as rapport, cooperation, and affiliation. Specifically, in this work we study language entrainment, which measures how much the therapist and client adapt toward each other's use of language over time. Despite the growing body of work in this area, however, relatively few studies examine causal relationships between human behavior and these relationship metrics: does an individual's perception of their partner affect how they speak, or does how they speak affect their perception? We explore these questions in this work through the use of structural equation modeling (SEM) techniques, which allow for both multilevel and temporal modeling of the relationship between the quality of the therapist-client working alliance and the participants' language entrainment. In our first experiment, we demonstrate that these techniques perform well in comparison to other common machine learning models, with the added benefits of interpretability and causal analysis. In our second analysis, we interpret the learned models to examine the relationship between working alliance and language entrainment and address our exploratory research questions. The results reveal that a therapist's language entrainment can have a significant impact on the client's perception of the working alliance, and that the client's language entrainment is a strong indicator of their perception of the working alliance. We discuss the implications of these results and consider several directions for future work in multimodality.
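The following sketch shows how a simple cross-lagged structural equation model could relate working alliance and language entrainment across two time points, using the semopy package. The two-wave structure, variable names, and simulated data are assumptions for illustration; the study's SEMs are multilevel and temporal and therefore considerably richer.

```python
# Sketch: cross-lagged SEM relating working alliance and language entrainment
# across two sessions (hypothetical two-wave data; not the study's full model).
import numpy as np
import pandas as pd
from semopy import Model

rng = np.random.default_rng(6)
n = 200
df = pd.DataFrame({
    "alliance_t1": rng.normal(size=n),
    "alliance_t2": rng.normal(size=n),
    "entrainment_t1": rng.normal(size=n),
    "entrainment_t2": rng.normal(size=n),
})

description = """
alliance_t2 ~ alliance_t1 + entrainment_t1
entrainment_t2 ~ entrainment_t1 + alliance_t1
"""

model = Model(description)
model.fit(df)
print(model.inspect())  # path estimates, standard errors, p-values
```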

6.
Schizophr Res; 245: 97-115, 2022 Jul.
Article in English | MEDLINE | ID: mdl-34456131

ABSTRACT

OBJECTIVES: This study aimed to (1) determine the feasibility of collecting behavioral data from participants hospitalized with acute psychosis and (2) begin to evaluate the clinical information that can be computationally derived from such data. METHODS: Behavioral data was collected across 99 sessions from 38 participants recruited from an inpatient psychiatric unit. Each session started with a semi-structured interview modeled on a typical "clinical rounds" encounter and included administration of the Positive and Negative Syndrome Scale (PANSS). ANALYSIS: We quantified aspects of participants' verbal behavior during the interview using lexical, coherence, and disfluency features. We then used two complementary approaches to explore our second objective. The first approach used predictive models to estimate participants' PANSS scores from their language features. Our second approach used inferential models to quantify the relationships between individual language features and symptom measures. RESULTS: Our predictive models showed promise but lacked sufficient data to achieve clinically useful accuracy. Our inferential models identified statistically significant relationships between numerous language features and symptom domains. CONCLUSION: Our interview recording procedures were well-tolerated and produced adequate data for transcription and analysis. The results of our inferential modeling suggest that automatic measurements of expressive language contain signals highly relevant to the assessment of psychosis. These findings establish the potential of measuring language during a clinical interview in a naturalistic setting and generate specific hypotheses that can be tested in future studies. This, in turn, will lead to more accurate modeling and better understanding of the relationships between expressive language and psychosis.
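A minimal sketch of the predictive-modeling approach described above: estimating a PANSS symptom score from interview language features with cross-validated ridge regression. The feature names and data are hypothetical, and the feature set is far smaller than the study's.

```python
# Sketch: predict a PANSS symptom score from interview language features with
# cross-validated ridge regression (feature names and data are hypothetical).
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
features = pd.DataFrame({
    "type_token_ratio": rng.uniform(0.3, 0.9, 99),   # lexical diversity
    "coherence_mean": rng.uniform(0.4, 1.0, 99),     # semantic coherence
    "filled_pause_rate": rng.uniform(0.0, 0.2, 99),  # disfluency
})
panss_positive = rng.uniform(7, 35, 99)              # hypothetical target

model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
scores = cross_val_score(model, features, panss_positive, cv=5, scoring="r2")
print(f"cross-validated R^2: {scores.mean():.2f}")
```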


Subjects
Mania; Psychotic Disorders; Humans; Language; Psychotic Disorders/psychology
7.
Affect Sci; 2: 32-47, 2021 Mar.
Article in English | MEDLINE | ID: mdl-34337430

ABSTRACT

The common view of emotional expressions is that certain configurations of facial-muscle movements reliably reveal certain categories of emotion. The principal exemplar of this view is the Duchenne smile, a configuration of facial-muscle movements (i.e., smiling with eye constriction) that has been argued to reliably reveal genuine positive emotion. In this paper, we formalized a list of hypotheses that have been proposed regarding the Duchenne smile, briefly reviewed the literature weighing on these hypotheses, identified limitations and unanswered questions, and conducted two empirical studies to begin addressing these limitations and answering these questions. Both studies analyzed a database of 751 smiles observed while 136 participants completed experimental tasks designed to elicit amusement, embarrassment, fear, and physical pain. Study 1 focused on participants' self-reported positive emotion and Study 2 focused on how third-party observers would perceive videos of these smiles. Most of the hypotheses that have been proposed about the Duchenne smile were either contradicted by or only weakly supported by our data. Eye constriction did provide some information about experienced positive emotion, but this information was lacking in specificity, already provided by other smile characteristics, and highly dependent on context. Eye constriction provided more information about perceived positive emotion, including some unique information over other smile characteristics, but context was also important here as well. Overall, our results suggest that accurately inferring positive emotion from a smile requires more sophisticated methods than simply looking for the presence/absence (or even the intensity) of eye constriction.

8.
Article in English | MEDLINE | ID: mdl-35937037

ABSTRACT

Early client dropout is one of the most significant challenges facing psychotherapy: recent studies suggest that at least one in five clients will leave treatment prematurely. Clients may terminate therapy for various reasons, but one of the most common causes is the lack of a strong working alliance. The concept of working alliance captures the collaborative relationship between a client and their therapist when working toward the progress and recovery of the client seeking treatment. Unfortunately, clients are often unwilling to directly express dissatisfaction with their care until they have already decided to terminate therapy. Therapists, for their part, may miss subtle signs of client discontent during treatment until it is too late. In this work, we demonstrate that nonverbal behavior analysis may aid in bridging this gap. The present study focuses primarily on the head gestures of both the client and therapist, contextualized within conversational turn-taking actions between the pair during psychotherapy sessions. We identify multiple behavior patterns suggestive of an individual's perspective on the working alliance; interestingly, these patterns also differ between the client and the therapist. These patterns inform the development of predictive models for self-reported ratings of working alliance, which demonstrate significant predictive power for both client and therapist ratings. Future applications of such models may enable preemptive intervention to strengthen a weak working alliance, whether by explicitly attempting to repair the existing alliance or by establishing a more suitable client-therapist pairing, so that clients encounter fewer barriers to receiving the treatment they need.

9.
Proc ACM Int Conf Multimodal Interact; 2021: 728-734, 2021 Oct.
Article in English | MEDLINE | ID: mdl-35128550

ABSTRACT

This paper studies the hypothesis that not all modalities are always needed to predict affective states. We explore this hypothesis in the context of recognizing three affective states that have been linked to a future onset of depression: positive, aggressive, and dysphoric. In particular, we investigate three important modalities for face-to-face conversations: vision, language, and acoustics. We first perform a human study to better understand which subset of modalities people find informative when recognizing these three affective states. As a second contribution, we explore how these human annotations can guide automatic affect recognition systems to be more interpretable without degrading their predictive performance. Our studies show that humans can reliably annotate modality informativeness. Further, we observe that guided models significantly improve interpretability, i.e., they attend to modalities similarly to how humans rate the modality informativeness, while at the same time showing a slight increase in predictive performance.

10.
Adv Neural Inf Process Syst; 2021(DB1): 1-20, 2021 Dec.
Article in English | MEDLINE | ID: mdl-38774625

ABSTRACT

Learning multimodal representations involves integrating information from multiple heterogeneous sources of data. It is a challenging yet crucial area with numerous real-world applications in multimedia, affective computing, robotics, finance, human-computer interaction, and healthcare. Unfortunately, multimodal research has seen limited resources to study (1) generalization across domains and modalities, (2) complexity during training and inference, and (3) robustness to noisy and missing modalities. In order to accelerate progress towards understudied modalities and tasks while ensuring real-world robustness, we release MultiBench, a systematic and unified large-scale benchmark for multimodal learning spanning 15 datasets, 10 modalities, 20 prediction tasks, and 6 research areas. MultiBench provides an automated end-to-end machine learning pipeline that simplifies and standardizes data loading, experimental setup, and model evaluation. To enable holistic evaluation, MultiBench offers a comprehensive methodology to assess (1) generalization, (2) time and space complexity, and (3) modality robustness. MultiBench introduces impactful challenges for future research, including scalability to large-scale multimodal datasets and robustness to realistic imperfections. To accompany this benchmark, we also provide a standardized implementation of 20 core approaches in multimodal learning spanning innovations in fusion paradigms, optimization objectives, and training approaches. Simply applying methods proposed in different research areas can improve the state-of-the-art performance on 9 of the 15 datasets. MultiBench therefore marks a milestone in unifying disjoint efforts in multimodal machine learning research and paves the way towards a better understanding of the capabilities and limitations of multimodal models, all while ensuring ease of use, accessibility, and reproducibility. MultiBench, our standardized implementations, and leaderboards are publicly available, will be regularly updated, and welcome input from the community.

11.
Proc ACM Int Conf Multimodal Interact; 2020: 548-557, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33969360

ABSTRACT

Emotional expressiveness captures the extent to which a person tends to outwardly display their emotions through behavior. Due to the close relationship between emotional expressiveness and behavioral health, as well as the crucial role it plays in social interaction, the ability to automatically predict emotional expressiveness stands to spur advances in science, medicine, and industry. In this paper, we explore three related research questions. First, how well can emotional expressiveness be predicted from visual, linguistic, and multimodal behavioral signals? Second, how important is each behavioral modality to the prediction of emotional expressiveness? Third, which behavioral signals are reliably related to emotional expressiveness? To answer these questions, we add highly reliable transcripts and human ratings of perceived emotional expressiveness to an existing video database and use these data to train, validate, and test predictive models. Our best model shows promising predictive performance on this dataset (RMSE = 0.65, R² = 0.45, r = 0.74). Multimodal models tend to perform best overall, and models trained on the linguistic modality tend to outperform models trained on the visual modality. Finally, examination of our interpretable models' coefficients reveals a number of visual and linguistic behavioral signals, such as facial action unit intensity, overall word count, and use of words related to social processes, that reliably predict emotional expressiveness.
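For reference, the following sketch computes the three reported evaluation metrics (RMSE, R², Pearson r) for a regression model's predictions against human expressiveness ratings, using simulated values.

```python
# Sketch: compute RMSE, R^2, and Pearson r for regression predictions against
# human expressiveness ratings (simulated data, illustrative only).
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(3)
ratings = rng.uniform(1, 7, 200)                 # hypothetical human ratings
predictions = ratings + rng.normal(0, 0.8, 200)  # hypothetical model output

rmse = mean_squared_error(ratings, predictions) ** 0.5
r2 = r2_score(ratings, predictions)
r, _ = pearsonr(ratings, predictions)
print(f"RMSE = {rmse:.2f}, R^2 = {r2:.2f}, r = {r:.2f}")
```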

12.
Proc Conf Empir Methods Nat Lang Process; 2020: 1801-1812, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33969362

ABSTRACT

Modeling multimodal language is a core research area in natural language processing. While languages such as English have relatively large multimodal language resources, other widely spoken languages across the globe have few or no large-scale datasets in this area. This disproportionately affects native speakers of languages other than English. As a step towards building more equitable and inclusive multimodal systems, we introduce the first large-scale multimodal language dataset for Spanish, Portuguese, German, and French. The proposed dataset, called CMU-MOSEAS (CMU Multimodal Opinion Sentiment, Emotions and Attributes), is the largest of its kind with 40,000 total labelled sentences. It covers a diverse set of topics and speakers, and carries supervision for 20 labels including sentiment (and subjectivity), emotions, and attributes. Our evaluations on a state-of-the-art multimodal model demonstrate that CMU-MOSEAS enables further research for multilingual studies in multimodal language.

13.
Proc Conf Empir Methods Nat Lang Process; 2020: 1823-1833, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33969363

ABSTRACT

Human language can be expressed through multiple sources of information known as modalities, including tone of voice, facial gestures, and spoken language. Recent multimodal learning models achieve strong performance on human-centric tasks such as sentiment analysis and emotion recognition, but they are often black boxes with very limited interpretability. In this paper, we propose Multimodal Routing, which dynamically adjusts weights between input modalities and output representations differently for each input sample. Multimodal Routing can identify the relative importance of both individual modalities and cross-modality features. Moreover, the weight assignment produced by routing allows us to interpret modality-prediction relationships not only globally (i.e., general trends over the whole dataset) but also locally for each single input sample, while keeping performance competitive with state-of-the-art methods.
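A simplified sketch of the core idea: computing per-sample weights over modality representations so that each input decides how much each modality contributes to the fused representation. This is an illustrative reduction, not the paper's full routing scheme.

```python
# Sketch: per-sample weighting over modality representations, so each input
# dynamically decides how much each modality contributes (illustrative only).
import torch
import torch.nn as nn


class SimpleModalityRouter(nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.scorer = nn.Linear(d, 1)  # scores each modality representation

    def forward(self, modality_reps):  # shape: (batch, n_modalities, d)
        weights = torch.softmax(self.scorer(modality_reps), dim=1)  # per sample
        fused = (weights * modality_reps).sum(dim=1)
        return fused, weights.squeeze(-1)  # weights are interpretable per input


reps = torch.randn(8, 3, 64)  # e.g., text, audio, vision representations
fused, weights = SimpleModalityRouter()(reps)
print(fused.shape, weights[0])  # (8, 64) and, e.g., tensor([0.4, 0.3, 0.3])
```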

14.
Article in English | MEDLINE | ID: mdl-33679265

ABSTRACT

Knowing how much to trust a prediction is important for many critical applications. We describe two simple approaches to estimate uncertainty in regression prediction tasks and compare their performance and complexity against popular approaches. We operationalize uncertainty in regression as the absolute error between a model's prediction and the ground truth. Our two proposed approaches use a secondary model to predict the uncertainty of a primary predictive model. Our first approach leverages the assumption that similar observations are likely to have similar uncertainty and predicts uncertainty with a non-parametric method. Our second approach trains a secondary model to directly predict the uncertainty of the primary predictive model. Both approaches outperform other established uncertainty estimation approaches on the MNIST, DISFA, and BP4D+ datasets. Furthermore, we observe that approaches that directly predict the uncertainty generally perform better than approaches that indirectly estimate uncertainty.
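A minimal sketch of the second approach described above: a secondary model trained to predict the primary model's absolute error. The data, the choice of models, and fitting the secondary model on training-set errors are illustrative simplifications.

```python
# Sketch: estimate per-sample uncertainty by training a secondary model to
# predict the primary model's absolute error (hypothetical tabular data).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 8))
y = X[:, 0] * 2 + rng.normal(scale=0.5, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

primary = GradientBoostingRegressor().fit(X_train, y_train)
abs_error = np.abs(primary.predict(X_train) - y_train)  # uncertainty targets

secondary = GradientBoostingRegressor().fit(X_train, abs_error)
predicted_uncertainty = secondary.predict(X_test)
print(predicted_uncertainty[:5])
```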

15.
Proc Conf Assoc Comput Linguist Meet; 2020: 2359-2369, 2020 Jul.
Article in English | MEDLINE | ID: mdl-33782629

ABSTRACT

Recent Transformer-based contextual word representations, including BERT and XLNet, have shown state-of-the-art performance in multiple disciplines within NLP. Fine-tuning the trained contextual models on task-specific datasets has been the key to achieving superior performance downstream. While fine-tuning these pre-trained models is straightforward for lexical applications (applications with only the language modality), it is not trivial for multimodal language (a growing area in NLP focused on modeling face-to-face communication): pre-trained models do not have the necessary components to accept the two additional modalities of vision and acoustics. In this paper, we propose an attachment to BERT and XLNet called the Multimodal Adaptation Gate (MAG). MAG allows BERT and XLNet to accept multimodal nonverbal data during fine-tuning. It does so by generating a shift to the internal representations of BERT and XLNet, conditioned on the visual and acoustic modalities. In our experiments, we study the commonly used CMU-MOSI and CMU-MOSEI datasets for multimodal sentiment analysis. Fine-tuning MAG-BERT and MAG-XLNet significantly boosts sentiment analysis performance over previous baselines as well as over language-only fine-tuning of BERT and XLNet. On the CMU-MOSI dataset, MAG-XLNet achieves human-level multimodal sentiment analysis performance for the first time in the NLP community.
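A hedged PyTorch sketch of the gating idea: each token's language representation is shifted by a displacement conditioned on aligned visual and acoustic features. The dimensions, gating form, and scaling are illustrative assumptions rather than the paper's exact MAG module.

```python
# Sketch of a multimodal adaptation gate: shift each token's language embedding
# by a displacement conditioned on aligned visual and acoustic features.
import torch
import torch.nn as nn


class AdaptationGate(nn.Module):
    def __init__(self, d_text=768, d_visual=47, d_acoustic=74):
        super().__init__()
        self.gate_v = nn.Linear(d_text + d_visual, d_text)
        self.gate_a = nn.Linear(d_text + d_acoustic, d_text)
        self.proj_v = nn.Linear(d_visual, d_text)
        self.proj_a = nn.Linear(d_acoustic, d_text)
        self.scale = nn.Parameter(torch.tensor(0.5))  # illustrative scaling

    def forward(self, text, visual, acoustic):
        gv = torch.sigmoid(self.gate_v(torch.cat([text, visual], dim=-1)))
        ga = torch.sigmoid(self.gate_a(torch.cat([text, acoustic], dim=-1)))
        shift = gv * self.proj_v(visual) + ga * self.proj_a(acoustic)
        return text + self.scale * shift  # shifted word representations


text = torch.randn(2, 20, 768)     # (batch, tokens, BERT hidden size)
visual = torch.randn(2, 20, 47)    # aligned facial features per token
acoustic = torch.randn(2, 20, 74)  # aligned acoustic features per token
print(AdaptationGate()(text, visual, acoustic).shape)  # torch.Size([2, 20, 768])
```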

16.
Article in English | MEDLINE | ID: mdl-33782675

ABSTRACT

Recent progress in artificial intelligence has led to the development of automatic recognition of behavioral markers, such as facial and vocal expressions. These automatic tools have enormous potential to support mental health assessment, clinical decision making, and treatment planning. In this paper, we investigate nonverbal behavioral markers of depression severity assessed during semi-structured medical interviews of adolescent patients. The main goal of our research is two-fold: studying a unique population of adolescents at high risk of mental disorders and differentiating mild depression from moderate or severe depression. We explore computationally inferred facial and vocal behavioral responses elicited by three segments of the semi-structured medical interviews: Distress Assessment Questions, Ubiquitous Questions, and Concept Questions. Our experimental methodology reflects best practices for analyzing small, unbalanced datasets of unique patients. Our results show a very interesting trend, with strongly discriminative behavioral markers from both the acoustic and visual modalities. These promising results are likely due to the unique classification task (mild depression vs. moderate and severe depression) and the three types of probing questions.

17.
Proc Conf Assoc Comput Linguist Meet; 2019: 6558-6569, 2019 Jul.
Article in English | MEDLINE | ID: mdl-32362720

ABSTRACT

Human language is often multimodal, comprising a mixture of natural language, facial gestures, and acoustic behaviors. However, two major challenges exist in modeling such multimodal human language time-series data: 1) inherent data non-alignment due to variable sampling rates for the sequences from each modality; and 2) long-range dependencies between elements across modalities. In this paper, we introduce the Multimodal Transformer (MulT) to generically address both issues in an end-to-end manner without explicitly aligning the data. At the heart of our model is directional pairwise cross-modal attention, which attends to interactions between multimodal sequences across distinct time steps and latently adapts streams from one modality to another. Comprehensive experiments on both aligned and non-aligned multimodal time-series show that our model outperforms state-of-the-art methods by a large margin. In addition, empirical analysis suggests that correlated cross-modal signals can be captured by the proposed cross-modal attention mechanism in MulT.
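A minimal PyTorch sketch of one directional cross-modal attention step (language queries attending to an unaligned acoustic stream), the core operation described above. Shapes and hyperparameters are illustrative, not the full MulT architecture.

```python
# Sketch: one directional cross-modal attention step in which language queries
# attend to acoustic keys/values; no explicit alignment of the sequences is needed.
import torch
import torch.nn as nn

d_model, n_heads = 40, 5
attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_heads, batch_first=True)

language = torch.randn(8, 50, d_model)   # (batch, language time steps, dim)
acoustic = torch.randn(8, 375, d_model)  # unaligned, longer acoustic sequence

adapted_language, attn_weights = attn(query=language, key=acoustic, value=acoustic)
print(adapted_language.shape, attn_weights.shape)  # (8, 50, 40) and (8, 50, 375)
```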

18.
Article in English | MEDLINE | ID: mdl-32363090

ABSTRACT

The Duchenne smile hypothesis is that smiles that include eye constriction (AU6) are the product of genuine positive emotion, whereas smiles that do not are either falsified or related to negative emotion. This hypothesis has become very influential and is often used in scientific and applied settings to justify the inference that a smile is either true or false. However, empirical support for this hypothesis has been equivocal and some researchers have proposed that, rather than being a reliable indicator of positive emotion, AU6 may just be an artifact produced by intense smiles. Initial support for this proposal has been found when comparing smiles related to genuine and feigned positive emotion; however, it has not yet been examined when comparing smiles related to genuine positive and negative emotion. The current study addressed this gap in the literature by examining spontaneous smiles from 136 participants during the elicitation of amusement, embarrassment, fear, and pain (from the BP4D+ dataset). Bayesian multilevel regression models were used to quantify the associations between AU6 and self-reported amusement while controlling for smile intensity. Models were estimated to infer amusement from AU6 and to explain the intensity of AU6 using amusement. In both cases, controlling for smile intensity substantially reduced the hypothesized association, whereas the effect of smile intensity itself was quite large and reliable. These results provide further evidence that the Duchenne smile is likely an artifact of smile intensity rather than a reliable and unique indicator of genuine positive emotion.
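As an illustration of the control analysis described above, the sketch below regresses self-reported amusement on AU6 while controlling for smile intensity, with participant-level random intercepts. A frequentist mixed model stands in for the Bayesian multilevel models, and the column names and data are hypothetical.

```python
# Sketch: regress self-reported amusement on eye constriction (AU6) while
# controlling for smile intensity, with random intercepts per participant.
# A frequentist mixed model stands in for the paper's Bayesian models.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 751
data = pd.DataFrame({
    "participant": rng.integers(0, 136, n),
    "smile_intensity": rng.uniform(0, 5, n),  # e.g., AU12 intensity
    "au6": rng.uniform(0, 5, n),              # eye constriction intensity
    "amusement": rng.uniform(0, 10, n),       # self-reported amusement
})

model = smf.mixedlm("amusement ~ au6 + smile_intensity",
                    data, groups=data["participant"]).fit()
print(model.summary())
```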

19.
Proc AAAI Conf Artif Intell; 33(1): 7216-7223, 2019 Jul.
Article in English | MEDLINE | ID: mdl-32219010

ABSTRACT

Humans convey their intentions through the usage of both verbal and nonverbal behaviors during face-to-face communication. Speaker intentions often vary dynamically depending on different nonverbal contexts, such as vocal patterns and facial expressions. As a result, when modeling human language, it is essential to not only consider the literal meaning of the words but also the nonverbal contexts in which these words appear. To better model human language, we first model expressive nonverbal representations by analyzing the fine-grained visual and acoustic patterns that occur during word segments. In addition, we seek to capture the dynamic nature of nonverbal intents by shifting word representations based on the accompanying nonverbal behaviors. To this end, we propose the Recurrent Attended Variation Embedding Network (RAVEN) that models the fine-grained structure of nonverbal subword sequences and dynamically shifts word representations based on nonverbal cues. Our proposed model achieves competitive performance on two publicly available datasets for multimodal sentiment analysis and emotion recognition. We also visualize the shifted word representations in different nonverbal contexts and summarize common patterns regarding multimodal variations of word representations.

20.
IEEE Trans Pattern Anal Mach Intell; 41(2): 423-443, 2019 Feb.
Article in English | MEDLINE | ID: mdl-29994351

ABSTRACT

Our experience of the world is multimodal - we see objects, hear sounds, feel texture, smell odors, and taste flavors. Modality refers to the way in which something happens or is experienced and a research problem is characterized as multimodal when it includes multiple such modalities. In order for Artificial Intelligence to make progress in understanding the world around us, it needs to be able to interpret such multimodal signals together. Multimodal machine learning aims to build models that can process and relate information from multiple modalities. It is a vibrant multi-disciplinary field of increasing importance and with extraordinary potential. Instead of focusing on specific multimodal applications, this paper surveys the recent advances in multimodal machine learning itself and presents them in a common taxonomy. We go beyond the typical early and late fusion categorization and identify broader challenges that are faced by multimodal machine learning, namely: representation, translation, alignment, fusion, and co-learning. This new taxonomy will enable researchers to better understand the state of the field and identify directions for future research.
