Results 1 - 20 of 88
1.
Proc Natl Acad Sci U S A ; 121(14): e2319837121, 2024 Apr 02.
Article in English | MEDLINE | ID: mdl-38530887

ABSTRACT

Depression has robust natural language correlates and can increasingly be measured in language using predictive models. However, despite evidence that language use varies as a function of individual demographic features (e.g., age, gender), previous work has not systematically examined whether and how depression's association with language varies by race. We examine how race moderates the relationship between language features (i.e., first-person pronouns and negative emotions) from social media posts and self-reported depression, in a matched sample of Black and White English speakers in the United States. Our findings reveal moderating effects of race: While depression severity predicts I-usage in White individuals, it does not in Black individuals. White individuals use more belongingness and self-deprecation-related negative emotions. Machine learning models trained on similar amounts of data to predict depression severity performed poorly when tested on Black individuals, even when they were trained exclusively using the language of Black individuals. In contrast, analogous models tested on White individuals performed relatively well. Our study reveals surprising race-based differences in the expression of depression in natural language and highlights the need to understand these effects better, especially before language-based models for detecting psychological phenomena are integrated into clinical practice.


Subjects
Depression , Social Media , Humans , United States , Depression/psychology , Emotions , Language
2.
Proc Natl Acad Sci U S A ; 118(39), 2021 09 28.
Article in English | MEDLINE | ID: mdl-34544875

ABSTRACT

On May 25, 2020, George Floyd, an unarmed Black American male, was killed by a White police officer. Footage of the murder was widely shared. We examined the psychological impact of Floyd's death using two population surveys that collected data before and after his death; one from Gallup (117,568 responses from n = 47,355) and one from the US Census (409,652 responses from n = 319,471). According to the Gallup data, in the week following Floyd's death, anger and sadness increased to unprecedented levels in the US population. During this period, more than a third of the US population reported these emotions. These increases were more pronounced for Black Americans, nearly half of whom reported these emotions. According to the US Census Household Pulse data, in the week following Floyd's death, depression and anxiety severity increased among Black Americans at significantly higher rates than that of White Americans. Our estimates suggest that this increase corresponds to an additional 900,000 Black Americans who would have screened positive for depression, associated with a burden of roughly 2.7 million to 6.3 million mentally unhealthy days.


Subjects
Anxiety/epidemiology , Depression/epidemiology , Emotions/physiology , Homicide/psychology , Mental Health/ethnology , Police/statistics & numerical data , Racism/psychology , Adolescent , Adult , Black or African American/psychology , Anger/physiology , Anxiety/psychology , Depression/psychology , Female , Humans , Male , Middle Aged , United States/epidemiology , White People/psychology , Young Adult
3.
Proc Natl Acad Sci U S A ; 117(19): 10165-10171, 2020 05 12.
Article in English | MEDLINE | ID: mdl-32341156

ABSTRACT

Researchers and policy makers worldwide are interested in measuring the subjective well-being of populations. When users post on social media, they leave behind digital traces that reflect their thoughts and feelings. Aggregation of such digital traces may make it possible to monitor well-being at large scale. However, social media-based methods need to be robust to regional effects if they are to produce reliable estimates. Using a sample of 1.53 billion geotagged English tweets, we provide a systematic evaluation of word-level and data-driven methods for text analysis for generating well-being estimates for 1,208 US counties. We compared Twitter-based county-level estimates with well-being measurements provided by the Gallup-Sharecare Well-Being Index survey through 1.73 million phone surveys. We find that word-level methods (e.g., Linguistic Inquiry and Word Count [LIWC] 2015 and Language Assessment by Mechanical Turk [LabMT]) yielded inconsistent county-level well-being measurements due to regional, cultural, and socioeconomic differences in language use. However, removing as few as three of the most frequent words led to notable improvements in well-being prediction. Data-driven methods provided robust estimates, approximating the Gallup data at up to r = 0.64. We show that the findings generalized to county socioeconomic and health outcomes and were robust when poststratifying the samples to be more representative of the general US population. Regional well-being estimation from social media data seems to be robust when supervised data-driven methods are used.

4.
Proc Natl Acad Sci U S A ; 117(9): 4571-4577, 2020 03 03.
Article in English | MEDLINE | ID: mdl-32071251

ABSTRACT

Machine learning is proving invaluable across disciplines. However, its success is often limited by the quality and quantity of available data, while its adoption is limited by the level of trust afforded by given models. Human vs. machine performance is commonly compared empirically to decide whether a certain task should be performed by a computer or an expert. In reality, the optimal learning strategy may involve combining the complementary strengths of humans and machines. Here, we present expert-augmented machine learning (EAML), an automated method that guides the extraction of expert knowledge and its integration into machine-learned models. We used a large dataset of intensive-care patient data to derive 126 decision rules that predict hospital mortality. Using an online platform, we asked 15 clinicians to assess the relative risk of the subpopulation defined by each rule compared to the total sample. We compared the clinician-assessed risk to the empirical risk and found that, while clinicians agreed with the data in most cases, there were notable exceptions where they overestimated or underestimated the true risk. Studying the rules with greatest disagreement, we identified problems with the training data, including one miscoded variable and one hidden confounder. Filtering the rules based on the extent of disagreement between clinician-assessed risk and empirical risk, we improved performance on out-of-sample data and were able to train with less data. EAML provides a platform for automated creation of problem-specific priors, which help build robust and dependable machine-learning models in critical applications.


Subjects
Expert Systems , Machine Learning/standards , Medical Informatics/methods , Data Management/methods , Database Management Systems , Medical Informatics/standards
5.
Alcohol Clin Exp Res ; 46(5): 836-847, 2022 05.
Article in English | MEDLINE | ID: mdl-35575955

ABSTRACT

BACKGROUND: Assessing risk for excessive alcohol use is important for applications ranging from recruitment into research studies to targeted public health messaging. Social media language provides an ecologically embedded source of information for assessing individuals who may be at risk for harmful drinking. METHODS: Using data collected on 3664 respondents from the general population, we examine how accurately language used on social media classifies individuals as at-risk for alcohol problems based on Alcohol Use Disorder Identification Test-Consumption score benchmarks. RESULTS: We find that social media language is moderately accurate (area under the curve = 0.75) at identifying individuals at risk for alcohol problems (i.e., hazardous drinking/alcohol use disorders) when used with models based on contextual word embeddings. High-risk alcohol use was predicted by individuals' usage of words related to alcohol, partying, informal expressions, swearing, and anger. Low-risk alcohol use was predicted by individuals' usage of social, affiliative, and faith-based words. CONCLUSIONS: The use of social media data to study drinking behavior in the general public is promising and could eventually support primary and secondary prevention efforts among Americans whose at-risk drinking may have otherwise gone "under the radar."


Subjects
Alcohol-Related Disorders , Alcoholism , Social Media , Alcohol Drinking/epidemiology , Alcohol-Related Disorders/epidemiology , Alcoholism/diagnosis , Alcoholism/epidemiology , Humans , Language
6.
Depress Anxiety ; 39(12): 794-804, 2022 12.
Article in English | MEDLINE | ID: mdl-36281621

ABSTRACT

OBJECTIVE: Language patterns may elucidate mechanisms of mental health conditions. To inform underlying theory and risk models, we evaluated prospective associations between in vivo text messaging language and differential symptoms of depression, generalized anxiety, and social anxiety. METHODS: Over 16 weeks, we collected outgoing text messages from 335 adults. Using Linguistic Inquiry and Word Count (LIWC), NRC Emotion Lexicon, and previously established depression and stress dictionaries, we evaluated the degree to which language features predict symptoms of depression, generalized anxiety, or social anxiety the following week using hierarchical linear models. To isolate the specificity of language effects, we also controlled for the effects of the two other symptom types. RESULTS: We found significant relationships of language features, including personal pronouns, negative emotion, cognitive and biological processes, and informal language, with common mental health conditions, including depression, generalized anxiety, and social anxiety (ps < .05). There was substantial overlap between language features and the three mental health outcomes. However, after controlling for other symptoms in the models, depressive symptoms were uniquely negatively associated with language about anticipation, trust, social processes, and affiliation (βs: -.10 to -.09, ps < .05), whereas generalized anxiety symptoms were positively linked with these same language features (βs: .12-.13, ps < .001). Social anxiety symptoms were uniquely associated with anger, sexual language, and swearing (βs: .12-.13, ps < .05). CONCLUSION: Language that confers both common (e.g., personal pronouns and negative emotion) and specific (e.g., affiliation, anticipation, trust, and anger) risk for affective disorders is perceptible in prior week text messages, holding promise for understanding cognitive-behavioral mechanisms and tailoring digital interventions.


Subjects
Text Messaging , Adult , Humans , Depression/epidemiology , Depression/psychology , Anxiety/epidemiology , Anxiety/psychology , Linguistics , Attitude
7.
J Biomed Inform ; 125: 103971, 2022 01.
Article in English | MEDLINE | ID: mdl-34920127

ABSTRACT

OBJECTIVE: Quantify tradeoffs in performance, reproducibility, and resource demands across several strategies for developing clinically relevant word embeddings. MATERIALS AND METHODS: We trained separate embeddings on all full-text manuscripts in the PubMed Central (PMC) Open Access subset, case reports therein, the English Wikipedia corpus, the Medical Information Mart for Intensive Care (MIMIC) III dataset, and all notes in the University of Pennsylvania Health System (UPHS) electronic health record. We tested embeddings in six clinically relevant tasks including mortality prediction and de-identification, and assessed performance using the scaled Brier score (SBS) and the proportion of notes successfully de-identified, respectively. RESULTS: Embeddings from UPHS notes best predicted mortality (SBS 0.30, 95% CI 0.15 to 0.45) while Wikipedia embeddings performed worst (SBS 0.12, 95% CI -0.05 to 0.28). Wikipedia embeddings most consistently (78% of notes) and the full PMC corpus embeddings least consistently (48%) de-identified notes. Across all six tasks, the full PMC corpus demonstrated the most consistent performance, and the Wikipedia corpus the least. Corpus size ranged from 49 million tokens (PMC case reports) to 10 billion (UPHS). DISCUSSION: Embeddings trained on published case reports performed at least as well as embeddings trained on other corpora in most tasks, and clinical corpora consistently outperformed non-clinical corpora. No single corpus produced a strictly dominant set of embeddings across all tasks and so the optimal training corpus depends on intended use. CONCLUSION: Embeddings trained on published case reports performed comparably on most clinical tasks to embeddings trained on larger corpora. Open access corpora allow training of clinically relevant, effective, and reproducible embeddings.


Subjects
Electronic Health Records , Publications , Humans , Natural Language Processing , PubMed , Reproducibility of Results
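
Editorial note on record 7 (ID mdl-34920127): the abstract reports mortality prediction as a scaled Brier score (SBS) but does not define it. The sketch below assumes the commonly used definition, SBS = 1 - BS / BS_ref, where BS_ref is the Brier score of a model that always predicts the observed event rate. The function names are illustrative and are not taken from the paper.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and equally long")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def scaled_brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Assumed definition: 1 - BS / BS_ref.

    BS_ref is the Brier score obtained by predicting the observed event
    rate for every case, which equals rate * (1 - rate).
    """
    rate = sum(y_true) / len(y_true)
    reference = rate * (1.0 - rate)
    if reference == 0.0:
        raise ValueError("SBS is undefined when all outcomes are identical")
    return 1.0 - brier_score(y_true, y_prob) / reference


if __name__ == "__main__":
    outcomes = [1, 0, 0, 0]
    # Predicting the base rate everywhere gives an SBS of (about) zero;
    # perfect predictions give an SBS of one.
    print(scaled_brier_score(outcomes, [0.25, 0.25, 0.25, 0.25]))
    print(scaled_brier_score(outcomes, [1.0, 0.0, 0.0, 0.0]))
```

Under this definition, a score of 0 means no improvement over predicting the base rate, 1 means perfect prediction, and negative values mean the model is worse than the base rate. That is how a confidence interval such as -0.05 to 0.28 can extend below zero.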
8.
J Exp Child Psychol ; 221: 105450, 2022 09.
Article in English | MEDLINE | ID: mdl-35596980

ABSTRACT

In a recent longitudinal study of U.S. adolescents, grit predicted rank-order increases in growth mindset and, to a lesser degree, growth mindset predicted rank-order increases in grit. The current investigation replicated and extended these findings in a younger non-Western, educated, industrialized, rich, and democratic (non-WEIRD) population. Two large samples totaling more than 5000 elementary school children in China completed self-report questionnaires assessing grit and growth mindset five times over 2 years. As in Park et al. (2020, Journal of Experimental Child Psychology, 198, 104889), we found reciprocal relations between grit and growth mindset. Grit systematically predicted rank-order increases in growth mindset at each subsequent 6-month interval. Growth mindset also predicted small rank-order increases in grit over the same period. These findings suggest that, over time, behavior may exert as much an influence on beliefs as the reverse, a dynamic possibly observable as early as in elementary school and not just in WEIRD cultures.


Subjects
Schools , Adolescent , Child , China , Humans , Longitudinal Studies
9.
J Pers ; 90(3): 405-425, 2022 06.
Article in English | MEDLINE | ID: mdl-34536229

ABSTRACT

OBJECTIVE: We explore the personality of counties as assessed through linguistic patterns on social media. Such studies were previously limited by the cost and feasibility of large-scale surveys; however, language-based computational models applied to large social media datasets now allow for large-scale personality assessment. METHOD: We applied a language-based assessment of the five factor model of personality to 6,064,267 U.S. Twitter users. We aggregated the Twitter-based personality scores to 2,041 counties and compared to political, economic, social, and health outcomes measured through surveys and by government agencies. RESULTS: There was significant personality variation across counties. Openness to experience was higher on the coasts, conscientiousness was uniformly spread, extraversion was higher in southern states, agreeableness was higher in western states, and emotional stability was highest in the south. Across 13 outcomes, language-based personality estimates replicated patterns that have been observed in individual-level and geographic studies. This includes higher Republican vote share in less agreeable counties and increased life satisfaction in more conscientious counties. CONCLUSIONS: Results suggest that regions vary in their personality and that these differences can be studied through computational linguistic analysis of social media. Furthermore, these methods may be used to explore other psychological constructs across geographies.


Subjects
Social Media , Extraversion, Psychological , Humans , Language , Personality , Personality Assessment
10.
Proc Natl Acad Sci U S A ; 116(40): 19887-19893, 2019 10 01.
Article in English | MEDLINE | ID: mdl-31527280

ABSTRACT

The expansion of machine learning to high-stakes application domains such as medicine, finance, and criminal justice, where making informed decisions requires clear understanding of the model, has increased the interest in interpretable machine learning. The widely used Classification and Regression Trees (CART) have played a major role in health sciences, due to their simple and intuitive explanation of predictions. Ensemble methods like gradient boosting can improve the accuracy of decision trees, but at the expense of the interpretability of the generated model. Additive models, such as those produced by gradient boosting, and full interaction models, such as CART, have been investigated largely in isolation. We show that these models exist along a spectrum, revealing previously unseen connections between these approaches. This paper introduces a rigorous formalization for the additive tree, an empirically validated learning technique for creating a single decision tree, and shows that this method can produce models equivalent to CART or gradient boosted stumps at the extremes by varying a single parameter. Although the additive tree is designed primarily to provide both the model interpretability and predictive performance needed for high-stakes applications like medicine, it also can produce decision trees represented by hybrid models between CART and boosted stumps that can outperform either of these approaches.


Subjects
Algorithms , Decision Trees , Machine Learning , Databases, Factual , Models, Statistical , Programming Languages
11.
Am J Drug Alcohol Abuse ; 48(5): 573-585, 2022 09 03.
Article in English | MEDLINE | ID: mdl-35853250

ABSTRACT

Background: Early indicators of who will remain in - or leave - treatment for substance use disorder (SUD) can drive targeted interventions to support long-term recovery. Objectives: To conduct a comprehensive study of linguistic markers of SUD treatment outcomes, the current study integrated features produced by machine learning models known to have social-psychology relevance. Methods: We extracted and analyzed linguistic features from participants' Facebook posts (N = 206, 39.32% female; 55,415 postings) over the two years before they entered a SUD treatment program. Exploratory features produced by both Linguistic Inquiry and Word Count (LIWC) and Latent Dirichlet Allocation (LDA) topic modeling and the features from theoretical domains of religiosity, affect, and temporal orientation via established AI-based linguistic models were utilized. Results: Patients who stayed in the SUD treatment for over 90 days used more words associated with religion, positive emotions, family, affiliations, and the present, and used more first-person singular pronouns (Cohen's d values: [-0.39, -0.57]). Patients who discontinued their treatment before 90 days discussed more diverse topics, focused on the past, and used more articles (Cohen's d values: [0.44, 0.57]). All ps < .05 with Benjamini-Hochberg False Discovery Rate correction. Conclusions: We confirmed the literature on protective and risk social-psychological factors linking to SUD treatment in language analysis, showing that Facebook language before treatment entry could be used to identify the markers of SUD treatment outcomes. This reflects the importance of taking these linguistic features and markers into consideration when designing and recommending SUD treatment plans.


Subjects
Social Media , Substance-Related Disorders , Female , Humans , Language , Linguistics , Male , Substance-Related Disorders/therapy
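
Editorial note on record 11 (ID mdl-35853250): the abstract states that all p-values were corrected with the Benjamini-Hochberg False Discovery Rate procedure. The sketch below shows the standard step-up form of that procedure for readers unfamiliar with it. It is a generic illustration with made-up p-values, not the authors' analysis code, and the function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return, per hypothesis, whether it is rejected at FDR level alpha.

    Step-up rule: sort the m p-values, find the largest rank k (1-based)
    such that p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank whose p-value falls under its rank-specific threshold.
    max_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            max_rank = rank

    # Everything ranked at or below max_rank is rejected, even if its own
    # p-value missed its own threshold (this is the "step-up" property).
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        rejected[index] = rank <= max_rank
    return rejected


if __name__ == "__main__":
    # Illustrative p-values only; they do not come from the study.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

The procedure controls the expected share of false discoveries among the rejected hypotheses, which is less conservative than controlling the chance of any false discovery when many language features are tested at once.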
12.
Ann Surg ; 273(5): 900-908, 2021 05 01.
Article in English | MEDLINE | ID: mdl-33074901

ABSTRACT

OBJECTIVE: The aim of this study was to systematically assess the application and potential benefits of natural language processing (NLP) in surgical outcomes research. SUMMARY BACKGROUND DATA: Widespread implementation of electronic health records (EHRs) has generated a massive patient data source. Traditional methods of data capture, such as billing codes and/or manual review of free-text narratives in EHRs, are highly labor-intensive, costly, subjective, and potentially prone to bias. METHODS: A literature search of PubMed, MEDLINE, Web of Science, and Embase identified all articles published starting in 2000 that used NLP models to assess perioperative surgical outcomes. Evaluation metrics of NLP systems were assessed by means of pooled analysis and meta-analysis. Qualitative synthesis was carried out to assess the results and risk of bias on outcomes. RESULTS: The present study included 29 articles, with over half (n = 15) published after 2018. The most common outcome identified using NLP was postoperative complications (n = 14). Compared to traditional non-NLP models, NLP models identified postoperative complications with higher sensitivity [0.92 (0.87-0.95) vs 0.58 (0.33-0.79), P < 0.001]. The specificities were comparable at 0.99 (0.96-1.00) and 0.98 (0.95-0.99), respectively. Using summary of likelihood ratio matrices, traditional non-NLP models have clinical utility for confirming documentation of outcomes/diagnoses, whereas NLP models may be reliably utilized for both confirming and ruling out documentation of outcomes/diagnoses. CONCLUSIONS: NLP usage to extract a range of surgical outcomes, particularly postoperative complications, is accelerating across disciplines and areas of clinical outcomes research. NLP and traditional non-NLP approaches demonstrate similar performance measures, but NLP is superior in ruling out documentation of surgical outcomes.


Subjects
Algorithms , Electronic Health Records/statistics & numerical data , Narration , Natural Language Processing , Surgical Procedures, Operative , Humans
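
Editorial note on record 12 (ID mdl-33074901): the abstract contrasts "confirming" with "ruling out" on the basis of likelihood ratios. The sketch below computes positive and negative likelihood ratios from the pooled sensitivity and specificity point estimates quoted in the abstract. It is a back-of-the-envelope illustration that ignores the confidence intervals, not a reproduction of the paper's likelihood ratio matrices. The function name is hypothetical, and the thresholds mentioned in the comments (LR+ above 10, LR- below 0.1) are a widely quoted rule of thumb, assumed here rather than taken from the paper.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) for a binary test.

    LR+ = sensitivity / (1 - specificity)   -> how much a positive result raises the odds
    LR- = (1 - sensitivity) / specificity   -> how much a negative result lowers the odds
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be in [0, 1]")
    if not 0.0 < specificity < 1.0:
        raise ValueError("specificity must be strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


if __name__ == "__main__":
    # Point estimates quoted in the abstract for postoperative complications.
    nlp = likelihood_ratios(sensitivity=0.92, specificity=0.99)
    non_nlp = likelihood_ratios(sensitivity=0.58, specificity=0.98)
    # Rule of thumb (assumption): LR+ > 10 supports confirming,
    # LR- < 0.1 supports ruling out.
    print("NLP     LR+ = %.1f, LR- = %.2f" % nlp)
    print("non-NLP LR+ = %.1f, LR- = %.2f" % non_nlp)
```

With these inputs both approaches have a large LR+, but only the NLP estimate gives an LR- below 0.1, which is consistent with the abstract's statement that NLP is the stronger option for ruling out documentation of an outcome.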
13.
Crit Care Med ; 49(8): 1312-1321, 2021 08 01.
Article in English | MEDLINE | ID: mdl-33711001

ABSTRACT

OBJECTIVES: The National Early Warning Score, Modified Early Warning Score, and quick Sepsis-related Organ Failure Assessment can predict clinical deterioration. These scores exhibit only moderate performance and are often evaluated using aggregated measures over time. A simulated prospective validation strategy that assesses multiple predictions per patient-day would provide the best pragmatic evaluation. We developed a deep recurrent neural network deterioration model and conducted a simulated prospective evaluation. DESIGN: Retrospective cohort study. SETTING: Four hospitals in Pennsylvania. PATIENTS: Inpatient adults discharged between July 1, 2017, and June 30, 2019. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: We trained a deep recurrent neural network and logistic regression model using data from electronic health records to predict hourly the 24-hour composite outcome of transfer to ICU or death. We analyzed 146,446 hospitalizations with 16.75 million patient-hours. The hourly event rate was 1.6% (12,842 transfers or deaths, corresponding to 260,295 patient-hours within the predictive horizon). On a hold-out dataset, the deep recurrent neural network achieved an area under the precision-recall curve of 0.042 (95% CI, 0.04-0.043), comparable with logistic regression model (0.043; 95% CI 0.041 to 0.045), and outperformed National Early Warning Score (0.034; 95% CI, 0.032-0.035), Modified Early Warning Score (0.028; 95% CI, 0.027-0.03), and quick Sepsis-related Organ Failure Assessment (0.021; 95% CI, 0.021-0.022). For a fixed sensitivity of 50%, the deep recurrent neural network achieved a positive predictive value of 3.4% (95% CI, 3.4-3.5) and outperformed logistic regression model (3.1%; 95% CI 3.1-3.2), National Early Warning Score (2.0%; 95% CI, 2.0-2.0), Modified Early Warning Score (1.5%; 95% CI, 1.5-1.5), and quick Sepsis-related Organ Failure Assessment (1.5%; 95% CI, 1.5-1.5).
CONCLUSIONS: Commonly used early warning scores for clinical decompensation, along with a logistic regression model and a deep recurrent neural network model, show very poor performance characteristics when assessed using a simulated prospective validation. None of these models may be suitable for real-time deployment.


Subjects
Clinical Deterioration , Critical Care/standards , Deep Learning/standards , Organ Dysfunction Scores , Sepsis/therapy , Adult , Humans , Male , Middle Aged , Pennsylvania , Retrospective Studies , Risk Assessment
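
Editorial note on record 13 (ID mdl-33711001): the low positive predictive values in the abstract follow largely from the low hourly event rate (1.6%). The sketch below applies Bayes' rule to show how PPV depends on sensitivity, specificity, and prevalence. The specificity values used in the demonstration are hypothetical, chosen only for illustration. The abstract does not report specificity, and the function name is not from the paper.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV via Bayes' rule.

    PPV = (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    true_positive_mass = sensitivity * prevalence
    false_positive_mass = (1.0 - specificity) * (1.0 - prevalence)
    if true_positive_mass + false_positive_mass == 0.0:
        raise ValueError("PPV is undefined when no case can test positive")
    return true_positive_mass / (true_positive_mass + false_positive_mass)


if __name__ == "__main__":
    # Sensitivity and prevalence come from the abstract; the specificity
    # values are hypothetical and serve only to show the trend.
    for hypothetical_specificity in (0.80, 0.90, 0.95, 0.99):
        ppv = positive_predictive_value(0.50, hypothetical_specificity, 0.016)
        print(f"specificity={hypothetical_specificity:.2f} -> PPV={ppv:.3f}")
```

At an event rate this low, even a high specificity leaves most alerts as false positives, which helps explain why the abstract questions real-time deployment.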
14.
Proc Natl Acad Sci U S A ; 115(44): 11203-11208, 2018 10 30.
Article in English | MEDLINE | ID: mdl-30322910

ABSTRACT

Depression, the most prevalent mental illness, is underdiagnosed and undertreated, highlighting the need to extend the scope of current screening methods. Here, we use language from Facebook posts of consenting individuals to predict depression recorded in electronic medical records. We accessed the history of Facebook statuses posted by 683 patients visiting a large urban academic emergency department, 114 of whom had a diagnosis of depression in their medical records. Using only the language preceding their first documentation of a diagnosis of depression, we could identify depressed patients with fair accuracy [area under the curve (AUC) = 0.69], approximately matching the accuracy of screening surveys benchmarked against medical records. Restricting Facebook data to only the 6 months immediately preceding the first documented diagnosis of depression yielded a higher prediction accuracy (AUC = 0.72) for those users who had sufficient Facebook data. Significant prediction of future depression status was possible as far as 3 months before its first documentation. We found that language predictors of depression include emotional (sadness), interpersonal (loneliness, hostility), and cognitive (preoccupation with the self, rumination) processes. Unobtrusive depression assessment through social media of consenting individuals may become feasible as a scalable complement to existing screening and monitoring procedures.


Subjects
Depression/psychology , Depressive Disorder/psychology , Electronic Health Records/statistics & numerical data , Social Media/statistics & numerical data , Adult , Female , Humans , Language , Male , Surveys and Questionnaires
15.
J Med Internet Res ; 23(9): e22844, 2021 09 03.
Article in English | MEDLINE | ID: mdl-34477562

ABSTRACT

BACKGROUND: The assessment of behaviors related to mental health typically relies on self-report data. Networked sensors embedded in smartphones can measure some behaviors objectively and continuously, with no ongoing effort. OBJECTIVE: This study aims to evaluate whether changes in phone sensor-derived behavioral features were associated with subsequent changes in mental health symptoms. METHODS: This longitudinal cohort study examined continuously collected phone sensor data and symptom severity data, collected every 3 weeks, over 16 weeks. The participants were recruited through national research registries. Primary outcomes included depression (8-item Patient Health Questionnaire), generalized anxiety (Generalized Anxiety Disorder 7-item scale), and social anxiety (Social Phobia Inventory) severity. Participants were adults who owned Android smartphones. Participants clustered into 4 groups: multiple comorbidities, depression and generalized anxiety, depression and social anxiety, and minimal symptoms. RESULTS: A total of 282 participants were aged 19-69 years (mean 38.9, SD 11.9 years), and the majority were female (223/282, 79.1%) and White participants (226/282, 80.1%). Among the multiple comorbidities group, depression changes were preceded by changes in GPS features (Time: r=-0.23, P=.02; Locations: r=-0.36, P<.001), exercise duration (r=0.39; P=.03) and use of active apps (r=-0.31; P<.001). Among the depression and anxiety groups, changes in depression were preceded by changes in GPS features for Locations (r=-0.20; P=.03) and Transitions (r=-0.21; P=.03). Depression changes were not related to subsequent sensor-derived features. The minimal symptoms group showed no significant relationships. There were no associations between sensor-based features and anxiety and minimal associations between sensor-based features and social anxiety. 
CONCLUSIONS: Changes in sensor-derived behavioral features are associated with subsequent depression changes, but not vice versa, suggesting a directional relationship in which changes in sensed behaviors are associated with subsequent changes in symptoms.


Subjects
Depression , Smartphone , Adult , Anxiety/diagnosis , Anxiety/epidemiology , Anxiety Disorders , Depression/diagnosis , Depression/epidemiology , Female , Humans , Longitudinal Studies , Male
16.
J Pers ; 88(2): 287-306, 2020 04.
Article in English | MEDLINE | ID: mdl-31107975

ABSTRACT

OBJECTIVE: Social media is increasingly being used to study psychological constructs. This study is the first to use Twitter language to investigate the 24 Values in Action Inventory of Character Strengths, which have been shown to predict important life domains such as well-being. METHOD: We use both a top-down closed-vocabulary (Linguistic Inquiry and Word Count) and a data-driven open-vocabulary (Differential Language Analysis) approach to analyze 3,937,768 tweets from 4,423 participants (64.3% female), who answered a 240-item survey on character strengths. RESULTS: We present the language profiles of (a) a global positivity factor accounting for 36% of the variances in the strengths, and (b) each of the 24 individual strengths, for which we find largely face-valid language associations. Machine learning models trained on language data to predict character strengths reach out-of-sample prediction accuracies comparable to previous work on personality (median r = 0.28, ranging from 0.13 to 0.51). CONCLUSIONS: The findings suggest that Twitter can be used to characterize and predict character strengths. This technique could be used to measure the character strengths of large populations unobtrusively and cost-effectively.


Subjects
Character , Morals , Personality Assessment , Psycholinguistics , Social Media , Social Values , Adolescent , Adult , Aged , Big Data , Female , Humans , Machine Learning , Male , Middle Aged , Psycholinguistics/methods , Young Adult
17.
J Biomed Inform ; 89: 114-121, 2019 01.
Article in English | MEDLINE | ID: mdl-30557683

ABSTRACT

Sentiment analysis may offer insights into patient outcomes through the subjective expressions made by clinicians in the text of encounter notes. We analyzed the predictive, concurrent, convergent, and content validity of six sentiment methods in a sample of 793,725 multidisciplinary clinical notes among 41,283 hospitalizations associated with an intensive care unit stay. None of these approaches improved early prediction of in-hospital mortality using logistic regression models, but did improve both discrimination and calibration when using random forests. Additionally, positive sentiment measured by the CoreNLP (OR 0.04, 95% CI 0.002-0.55), Pattern (OR 0.09, 95% CI 0.04-0.17), sentimentr (OR 0.37, 95% CI 0.25-0.63), and Opinion (OR 0.25, 95% CI 0.07-0.89) methods were inversely associated with death on the concurrent day after adjustment for demographic characteristics and illness severity. Median daily lexical coverage ranged from 5.4% to 20.1%. While sentiment between all methods was positively correlated, their agreement was weak. Sentiment analysis holds promise for clinical applications but will require a novel domain-specific method applicable to clinical text.


Subjects
Critical Illness , Medical Records , Attitude , Hospital Mortality , Humans , Intensive Care Units , Language
19.
Crit Care Med ; 46(7): 1125-1132, 2018 07.
Article in English | MEDLINE | ID: mdl-29629986

ABSTRACT

OBJECTIVES: Early prediction of undesired outcomes among newly hospitalized patients could improve patient triage and prompt conversations about patients' goals of care. We evaluated the performance of logistic regression, gradient boosting machine, random forest, and elastic net regression models, with and without unstructured clinical text data, to predict a binary composite outcome of in-hospital death or ICU length of stay greater than or equal to 7 days using data from the first 48 hours of hospitalization. DESIGN: Retrospective cohort study with split sampling for model training and testing. SETTING: A single urban academic hospital. PATIENTS: All hospitalized patients who required ICU care at the Beth Israel Deaconess Medical Center in Boston, MA, from 2001 to 2012. INTERVENTIONS: None. MEASUREMENTS AND MAIN RESULTS: Among 25,947 eligible hospital admissions, we observed 5,504 (21.2%) in which patients died or had ICU length of stay greater than or equal to 7 days. The gradient boosting machine model had the highest discrimination without (area under the receiver operating characteristic curve, 0.83; 95% CI, 0.81-0.84) and with (area under the receiver operating characteristic curve, 0.89; 95% CI, 0.88-0.90) text-derived variables. Both gradient boosting machines and random forests outperformed logistic regression without text data (p < 0.001), whereas all models outperformed logistic regression with text data (p < 0.02). The inclusion of text data increased the discrimination of all four model types (p < 0.001). Among those models using text data, the increasing presence of the terms "intubated" and "poor prognosis" was positively associated with mortality and ICU length of stay, whereas the term "extubated" was inversely associated with them. CONCLUSIONS: Variables extracted from unstructured clinical text from the first 48 hours of hospital admission using natural language processing techniques significantly improved the abilities of logistic regression and other machine learning models to predict which patients died or had long ICU stays. Learning health systems may adapt such models using open-source approaches to capture local variation in care patterns.


Subjects
Decision Support Techniques , Hospital Mortality , Intensive Care Units , Length of Stay/statistics & numerical data , Natural Language Processing , Aged , Female , Humans , Intensive Care Units/statistics & numerical data , Machine Learning , Male , Middle Aged , Patient Care Planning/statistics & numerical data , Retrospective Studies
20.
Nucleic Acids Res ; 44(D1): D216-22, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26553799

ABSTRACT

Small non-coding RNAs (sncRNAs) are highly abundant RNAs, typically <100 nucleotides long, that act as key regulators of diverse cellular processes. Although thousands of sncRNA genes are known to exist in the human genome, no single database provides searchable, unified annotation, and expression information for full sncRNA transcripts and mature RNA products derived from these larger RNAs. Here, we present the Database of small human noncoding RNAs (DASHR). DASHR contains the most comprehensive information to date on human sncRNA genes and mature sncRNA products. DASHR provides a simple user interface for researchers to view sequence and secondary structure, compare expression levels, and examine evidence of specific processing across all sncRNA genes and mature sncRNA products in various human tissues. DASHR annotation and expression data cover all major classes of sncRNAs including microRNAs (miRNAs), Piwi-interacting (piRNAs), small nuclear, nucleolar, cytoplasmic (sn-, sno-, scRNAs, respectively), transfer (tRNAs), and ribosomal RNAs (rRNAs). Currently, DASHR (v1.0) integrates 187 smRNA high-throughput sequencing (smRNA-seq) datasets with over 2.5 billion reads and annotation data from multiple public sources. DASHR contains annotations for ~48,000 human sncRNA genes and mature sncRNA products, 82% of which are expressed in one or more of the curated tissues. DASHR is available at http://lisanwanglab.org/DASHR.


Subjects
Nucleic Acid Databases , Small Untranslated RNA/metabolism , Humans , Molecular Sequence Annotation , Post-Transcriptional RNA Processing , Small Untranslated RNA/chemistry , Small Untranslated RNA/genetics