ABSTRACT
Profile similarity measures are used to quantify the similarity of two sets of ratings on multiple variables. Yet it remains unclear how different measures overlap or differ and precisely what information they convey, and therefore which measures are best applied under which circumstances. With this study, we aim to clarify how existing measures interrelate and to provide recommendations for their use by comparing a wide range of profile similarity measures. We took four steps. First, we reviewed 88 similarity measures by applying them to multiple cross-sectional and intensive longitudinal data sets on emotional experience, and retained 43 useful profile similarity measures after eliminating duplicates, complements, and measures unsuitable for the intended purpose. Second, we clustered these 43 measures into similarly behaving groups and found three general clusters: one cluster of difference measures, one cluster of product measures that could be split into four more nuanced groups, and one miscellaneous cluster that could be split into two more nuanced groups. Third, we interpreted what unifies these groups and their subgroups and what information they convey, based on theory and formulas. Last, based on our findings, we discuss recommendations with respect to the choice of measure, propose avoiding the Pearson correlation, and suggest centering profile items when stereotypical patterns threaten to confound the computation of similarity.
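The distinction between difference measures, product measures, and the effect of centering can be illustrated with a minimal sketch. The profiles and the normative (stereotypical) pattern below are invented for illustration and are not taken from the study's data; the point is only that a shared stereotypical pattern can inflate a product measure such as the Pearson correlation, and that item-wise centering removes it.

```python
import math

def euclidean_distance(p, q):
    """Difference-based measure: sensitive to both level and shape differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def pearson_r(p, q):
    """Product-based measure after within-profile centering: shape only."""
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    num = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mq) ** 2 for b in q))
    return num / den

def center_on_norm(profile, norm):
    """Remove a stereotypical (normative) pattern item-wise before comparing."""
    return [a - n for a, n in zip(profile, norm)]

norm = [4, 3, 2, 1]                     # hypothetical stereotypical item pattern
p, q = [5, 4, 2, 1], [4, 3, 3, 1]      # two hypothetical rating profiles

r_raw = pearson_r(p, q)                 # high: inflated by the shared stereotype
r_centered = pearson_r(center_on_norm(p, norm),
                       center_on_norm(q, norm))  # much lower once the norm is removed
```

With these numbers, the raw correlation is about 0.87, while the norm-centered correlation is negative, illustrating how a stereotypical pattern can masquerade as profile similarity.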
ABSTRACT
BACKGROUND: Emotions and mood are important for overall well-being. Therefore, the search for continuous, effortless emotion prediction methods is an important field of study. Mobile sensing provides a promising tool and can capture one of the most telling signs of emotion: language. OBJECTIVE: The aim of this study is to examine the separate and combined predictive value of mobile-sensed language data sources for detecting both momentary emotional experience and global individual differences in emotional traits and depression. METHODS: In a 2-week experience sampling method study, we collected self-reported emotion ratings and voice recordings 10 times a day, continuous keyboard activity, and trait depression severity. We correlated state and trait emotions and depression with the language features, distinguishing between speech content (spoken words), speech form (voice acoustics), writing content (written words), and writing form (typing dynamics). We also investigated how well these features predicted state and trait emotions, using cross-validation to select features and a hold-out set for validation. RESULTS: Overall, the reported emotions and mobile-sensed language were weakly correlated. The strongest correlations, ranging up to 0.25, were found between speech content and state emotions and between speech form and state emotions. Speech content provided the best predictions for state emotions. None of the trait emotion-language correlations remained significant after correction. Among the emotions studied, valence and happiness displayed the strongest correlations and the highest predictive performance. CONCLUSIONS: Although using mobile-sensed language as an emotion marker shows some promise, correlations and predictive R2 values are low.
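The evaluation scheme described in METHODS, selecting features via cross-validation on training data and reporting predictive R2 on a hold-out set, can be sketched as follows. The data, the correlation-based selection rule, and the threshold are invented stand-ins, not the study's actual mobile-sensing features or models.

```python
import numpy as np

# Synthetic stand-ins: 20 candidate language features, one emotion outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 0.5 * X[:, 0] - 0.4 * X[:, 1] + rng.normal(size=200)

train, test = np.arange(140), np.arange(140, 200)   # hold-out split

def cv_select(X, y, k=5, thresh=0.15):
    """Keep features whose mean |correlation| with y across CV folds exceeds thresh."""
    folds = np.array_split(np.arange(len(y)), k)
    keep = []
    for j in range(X.shape[1]):
        rs = []
        for fold in folds:
            mask = np.ones(len(y), dtype=bool)
            mask[fold] = False                       # correlate on the k-1 training folds
            rs.append(np.corrcoef(X[mask, j], y[mask])[0, 1])
        if np.mean(np.abs(rs)) > thresh:
            keep.append(j)
    return keep

feats = cv_select(X[train], y[train])                # selection uses training data only

# Fit OLS on the selected features; evaluate predictive R^2 on the hold-out set.
Xtr = np.column_stack([np.ones(len(train)), X[train][:, feats]])
beta, *_ = np.linalg.lstsq(Xtr, y[train], rcond=None)
Xte = np.column_stack([np.ones(len(test)), X[test][:, feats]])
resid = y[test] - Xte @ beta
r2 = 1 - resid @ resid / np.sum((y[test] - y[test].mean()) ** 2)
```

Keeping both the feature selection and the model fit inside the training portion is what makes the hold-out R2 an honest estimate of out-of-sample predictive performance, the quantity the CONCLUSIONS describe as low.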