Results 1 - 20 of 32
1.
Stat Med ; 43(8): 1564-1576, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38332307

ABSTRACT

Point process data have become increasingly popular these days. For example, many of the data captured in electronic health records (EHR) are in the format of point process data. It is of great interest to study the association between a point process predictor and a scalar response using generalized functional linear regression models. Various generalized functional linear regression models have been developed under different settings in the past decades. However, existing methods can only deal with functional or longitudinal predictors, not point process predictors. In this article, we propose a novel generalized functional linear regression model for a point process predictor. Our proposed model is based on the joint modeling framework, where we adopt a log-Gaussian Cox process model for the point process predictor and a generalized linear regression model for the outcome. We also develop a new algorithm for fast model estimation based on the Gaussian variational approximation method. We conduct extensive simulation studies to evaluate the performance of our proposed method and compare it to competing methods. The performance of our proposed method is further demonstrated on an EHR dataset of patients admitted into the intensive care units of the Beth Israel Deaconess Medical Center between 2001 and 2008.


Subjects
Algorithms, Humans, Linear Models, Computer Simulation, Proportional Hazards Models
2.
Stat Med ; 42(3): 316-330, 2023 02 10.
Article in English | MEDLINE | ID: mdl-36443903

ABSTRACT

The shared random effects joint model is one of the most widely used approaches to study the associations between longitudinal biomarkers and a survival outcome and make dynamic risk predictions using the longitudinally measured biomarkers. Various types of joint models have been developed under different settings in the past decades. One major limitation of joint models is that they could be computationally expensive for complex models where the number of the shared random effects is large. Moreover, the inferential accuracy of joint models could also be diminished for complex models due to approximation errors. However, complex models are frequently needed in practice, for example, when the longitudinal biomarkers have nonlinear trajectories over time or the number of longitudinal biomarkers of interest is large. In this article, we propose a novel Gaussian variational approximate inference approach for fitting joint models, which significantly improves computational efficiency while maintaining inferential accuracy. We conduct extensive simulation studies to evaluate the performance of our proposed method and compare it to existing methods. The performance of our proposed method is further demonstrated on a dataset of patients with primary biliary cirrhosis.


Subjects
Statistical Models, Humans, Computer Simulation, Biomarkers, Longitudinal Studies
3.
J Lipid Res ; 61(3): 445-454, 2020 03.
Article in English | MEDLINE | ID: mdl-31892526

ABSTRACT

Whether HDL is associated with dementia risk is unclear. In addition to apoA1, other apolipoproteins are found in HDL, creating subspecies of HDL that may have distinct metabolic properties. We measured apoA1, apoC3, and apoJ levels in plasma and apoA1 levels in HDL that contains or lacks apoE, apoJ, or apoC3 using a modified sandwich ELISA in a case-cohort study nested within the Ginkgo Evaluation of Memory Study. We included 995 randomly selected participants and 521 participants who developed dementia during a mean of 5.1 years of follow-up. The level of total apoA1 was not significantly related to dementia risk, regardless of the coexistence of apoC3, apoJ, or apoE. Higher levels of total plasma apoC3 were associated with better cognitive function at baseline (difference in Modified Mini-Mental State Examination scores tertile 3 vs. tertile 1: 0.60; 95% CI: 0.23, 0.98) and a lower dementia risk (adjusted hazard ratio tertile 3 vs. tertile 1: 0.73; 95% CI: 0.55, 0.96). Plasma concentrations of apoA1 in HDL and its apolipoprotein-defined subspecies were not associated with cognitive function at baseline or with the risk of dementia during follow-up. Similar studies in other populations are required to better understand the association between apoC3 and Alzheimer's disease pathology.


Subjects
Apolipoproteins/blood, Dementia/blood, Dementia/diagnosis, HDL Lipoproteins/blood, Aged, Aged 80 and over, Cognition, Double-Blind Method, Female, Follow-Up Studies, Humans, Male, Risk Factors
4.
Stat Appl Genet Mol Biol ; 18(2)2019 02 13.
Article in English | MEDLINE | ID: mdl-30759070

ABSTRACT

Longitudinal genomics data and survival outcome are common in biomedical studies, where the genomics data are often of high dimension. It is of great interest to select informative longitudinal biomarkers (e.g. genes) related to the survival outcome. In this paper, we develop a computationally efficient tool, LCox, for selecting informative biomarkers related to the survival outcome using the longitudinal genomics data. LCox is powerful in detecting different forms of dependence between the longitudinal biomarkers and the survival outcome. We show that LCox has improved performance compared to existing methods through extensive simulation studies. In addition, by applying LCox to a dataset of patients with idiopathic pulmonary fibrosis, we are able to identify biologically meaningful genes while all other methods fail to make any discovery. An R package to perform LCox is freely available at https://CRAN.R-project.org/package=LCox.


Subjects
Computational Biology/statistics & numerical data, Genomics/statistics & numerical data, Software, Survival Analysis, Algorithms, Gene Expression Profiling/statistics & numerical data, Humans
5.
Biometrics ; 75(1): 69-77, 2019 03.
Article in English | MEDLINE | ID: mdl-30178494

ABSTRACT

Although many modeling approaches have been developed to jointly analyze longitudinal biomarkers and a time-to-event outcome, most of these methods can only handle one or a few biomarkers. In this article, we propose a novel joint latent class model to deal with high dimensional longitudinal biomarkers. Our model has three components: a class membership model, a survival submodel, and a longitudinal submodel. In our model, we assume that covariates can potentially affect biomarkers and class membership. We adopt a penalized likelihood approach to infer which covariates have random effects and/or fixed effects on biomarkers, and which covariates are informative for the latent classes. Through extensive simulation studies, we show that our proposed method has improved performance in prediction and assigning subjects to the correct classes over other joint modeling methods and that bootstrap can be used to do inference for our model. We then apply our method to a dataset of patients with idiopathic pulmonary fibrosis, for whom gene expression profiles were measured longitudinally. We are able to identify four interesting latent classes with one class being at much higher risk of death compared to the other classes. We also find that each of the latent classes has unique trajectories in some genes, yielding novel biological insights.


Subjects
Latent Class Analysis, Likelihood Functions, Longitudinal Studies, Biomarkers/analysis, Computer Simulation, Gene Expression Profiling, Humans, Idiopathic Pulmonary Fibrosis/drug therapy, Idiopathic Pulmonary Fibrosis/genetics, Survival Analysis, Time Factors, Treatment Outcome
6.
Stat Appl Genet Mol Biol ; 17(1)2018 02 05.
Article in English | MEDLINE | ID: mdl-29397393

ABSTRACT

Longitudinal gene expression profiles of subjects are collected in some clinical studies to monitor disease progression and understand disease etiology. The identification of gene sets that have coordinated changes with relevant clinical outcomes over time from these data could provide significant insights into the molecular basis of disease progression and lead to better treatments. In this article, we propose a Distance-Correlation based Gene Set Analysis (dcGSA) method for longitudinal gene expression data. dcGSA is a non-parametric approach, statistically robust, and can capture both linear and nonlinear relationships between gene sets and clinical outcomes. In addition, dcGSA is able to identify related gene sets in cases where the effects of gene sets on clinical outcomes differ across subjects due to subject heterogeneity, remove the confounding effects of some unobserved time-invariant covariates, and allow the assessment of associations between gene sets and multiple related outcomes simultaneously. Through extensive simulation studies, we demonstrate that dcGSA is more powerful in detecting relevant genes than other commonly used gene set analysis methods. When dcGSA is applied to a real dataset on systemic lupus erythematosus, we are able to identify more disease-related gene sets than other methods.
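
The statistic that gives dcGSA its name can be illustrated in a few lines. What follows is a minimal sketch of the sample distance correlation between two scalar variables, assuming the standard double-centering definition. It is not the dcGSA implementation, which operates on longitudinal gene sets, and the function names are hypothetical.

```python
import math


def _centered_distances(values):
    """Pairwise absolute distances, double-centered (row, column, grand mean)."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvar_y = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0  # one of the variables is constant
    return math.sqrt(max(dcov2, 0.0) / denom)


# Toy data (illustrative only): y depends on x, but not linearly.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_quadratic = [v * v for v in x]
y_linear = [2.0 * v + 1.0 for v in x]
```

For these toy values the Pearson correlation between `x` and `y_quadratic` is zero, yet `distance_correlation(x, y_quadratic)` is clearly positive, which is the kind of nonlinear dependence the abstract refers to.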


Subjects
Gene Expression Profiling/statistics & numerical data, Longitudinal Studies, Systemic Lupus Erythematosus/genetics, Statistical Data Interpretation, Humans
7.
Stat Appl Genet Mol Biol ; 16(2): 145-158, 2017 04 25.
Article in English | MEDLINE | ID: mdl-28343169

ABSTRACT

Disease subtype identification (clustering) is an important problem in biomedical research. Gene expression profiles are commonly utilized to infer disease subtypes, which often lead to biologically meaningful insights into disease. Despite many successes, existing clustering methods may not perform well when genes are highly correlated and many uninformative genes are included for clustering due to the high dimensionality. In this article, we introduce a novel subtype identification method in the Bayesian setting based on gene expression profiles. This method, called BCSub, adopts an innovative semiparametric Bayesian factor analysis model to reduce the dimension of the data to a few factor scores for clustering. Specifically, the factor scores are assumed to follow the Dirichlet process mixture model in order to induce clustering. Through extensive simulation studies, we show that BCSub has improved performance over commonly used clustering methods. When applied to two gene expression datasets, our model is able to identify subtypes that are clinically more relevant than those identified from the existing methods.


Subjects
Gene Expression Profiling/methods, Genetic Models, Algorithms, Bayes Theorem, Cluster Analysis, Computer Simulation, Factor Analysis, Humans, Oligonucleotide Array Sequence Analysis
8.
Stat Med ; 36(22): 3495-3506, 2017 Sep 30.
Article in English | MEDLINE | ID: mdl-28620908

ABSTRACT

Subgroup identification (clustering) is an important problem in biomedical research. Gene expression profiles are commonly utilized to define subgroups. Longitudinal gene expression profiles might provide more information on disease progression than is captured by baseline profiles alone. Therefore, subgroup identification could be more accurate and effective with the aid of longitudinal gene expression data. However, existing statistical methods are unable to fully utilize these data for patient clustering. In this article, we introduce a novel clustering method in the Bayesian setting based on longitudinal gene expression profiles. This method, called BClustLonG, adopts a linear mixed-effects framework to model the trajectory of genes over time, while clustering is jointly conducted based on the regression coefficients obtained from all genes. In order to account for the correlations among genes and alleviate the high dimensionality challenges, we adopt a factor analysis model for the regression coefficients. The Dirichlet process prior distribution is utilized for the means of the regression coefficients to induce clustering. Through extensive simulation studies, we show that BClustLonG has improved performance over other clustering methods. When applied to a dataset of severely injured (burn or trauma) patients, our model is able to identify interesting subgroups. Copyright © 2017 John Wiley & Sons, Ltd.


Subjects
Bayes Theorem, Cluster Analysis, Factor Analysis, Gene Expression Profiling/methods, Genetic Models, Regression Analysis, Burns, Computer Simulation, Gene Expression, Humans, Markov Chains, Monte Carlo Method, Nonparametric Statistics
9.
BMC Bioinformatics ; 16: 48, 2015 Feb 18.
Article in English | MEDLINE | ID: mdl-25886892

ABSTRACT

BACKGROUND: Although Linear Discriminant Analysis (LDA) is commonly used for classification, it may not be directly applied in genomics studies due to the large p, small n problem in these studies. Different versions of sparse LDA have been proposed to address this significant challenge. One implicit assumption of various LDA-based methods is that the covariance matrices are the same across different classes. However, rewiring of genetic networks (therefore different covariance matrices) across different diseases has been observed in many genomics studies, which suggests that LDA and its variations may be suboptimal for disease classifications. However, it is not clear whether considering differing genetic networks across diseases can improve classification in genomics studies. RESULTS: We propose a sparse version of Quadratic Discriminant Analysis (SQDA) to explicitly consider the differences of the genetic networks across diseases. Both simulation and real data analysis are performed to compare the performance of SQDA with six commonly used classification methods. CONCLUSIONS: SQDA provides more accurate classification results than other methods for both simulated and real data. Our method should prove useful for classification in genomics studies and other research settings, where covariances differ among classes.
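
The contrast between LDA and quadratic discriminant analysis can be made concrete with a toy case. The sketch below is a plain, one-dimensional quadratic discriminant rule with equal class priors. It is not the sparse SQDA method of the paper, and the data, labels, and function names are invented for illustration.

```python
import math
from statistics import mean, variance


def gaussian_log_density(x, mu, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)


def qda_predict(x, classes):
    """Pick the class whose own Gaussian (own mean, own variance) fits x best.

    `classes` maps a label to its training values; equal priors are assumed.
    """
    best_label, best_score = None, -math.inf
    for label, samples in classes.items():
        score = gaussian_log_density(x, mean(samples), variance(samples))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data: both classes are centered at 0 and differ
# only in spread, so a rule with one shared variance cannot separate them.
TOY_CLASSES = {
    "a": [-0.5, -0.2, 0.0, 0.2, 0.5],
    "b": [-4.0, -2.0, 0.0, 2.0, 4.0],
}
```

With these values `qda_predict(0.1, TOY_CLASSES)` selects the tight class and `qda_predict(3.0, TOY_CLASSES)` selects the wide one. An LDA-style rule that pools the variance would assign identical scores to both classes here, because their means coincide.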


Subjects
Algorithms, Discriminant Analysis, Neoplasms/classification, Neoplasms/genetics, Oligonucleotide Array Sequence Analysis/methods, Case-Control Studies, Computer Simulation, Gene Expression Profiling, Humans, Automated Pattern Recognition
10.
Article in English | MEDLINE | ID: mdl-38969926

ABSTRACT

BACKGROUND: Arsenic, cadmium, and lead are toxic elements that widely contaminate our environment. These toxicants are associated with acute and chronic health problems, and evidence suggests that minority communities, including Hispanic/Latino Americans, are disproportionately exposed. Few studies have assessed culturally specific predictors of exposure to understand the potential drivers of racial/ethnic exposure disparities. OBJECTIVE: We sought to evaluate acculturation measures as predictors of metal/metalloid (hereafter "metal") concentrations among Mexican American adults to illuminate potential exposure sources that may be targeted for interventions. METHODS: As part of a longitudinal cohort, 510 adults, aged 35 to 69 years, underwent baseline interview, physical examination, and urine sample collection. Self-reported acculturation was assessed across various domains using the Short Acculturation Scale for Hispanics (SASH). Multivariable linear regression was used to assess associations between acculturation and urinary concentrations of arsenic, cadmium, and lead. Ordinal logistic regression was utilized to assess associations between acculturation and a metal mixture score. Lastly, best subset selection was used to build a prediction model for each toxic metal with a combination of the acculturation predictors. RESULTS: After adjustment, immigration factors were positively associated with arsenic and lead concentrations. For lead alone, English language and American media and food preferences were associated with lower levels. Immigration and parental heritage from Mexico were positively associated with the metal mixture, while preferences for English language, media, and food were negatively associated. CONCLUSION: Acculturation-related predictors of exposure provide information about potential sources of toxic metals, including international travel, foods, and consumer products. 
The findings in this research study provide information to empower future efforts to identify and address specific acculturation-associated toxicant exposures in order to promote health equity through clinical guidance, patient education, and public policy.

11.
Womens Health (Lond) ; 19: 17455057231184325, 2023.
Article in English | MEDLINE | ID: mdl-37431843

ABSTRACT

BACKGROUND: Adverse childhood experiences during key developmental periods have been shown to impact long-term health outcomes. Adverse childhood experiences may include psychological, physical, or sexual abuse; neglect; or socioeconomic factors. Adverse childhood experiences are linked with an increase in poor health behavior such as smoking and alcohol consumption, and may also influence epigenetic changes, inflammatory response, metabolic changes, and allostatic load. OBJECTIVE: We sought to explore associations between adverse childhood experiences and allostatic load in adult female participants in the UK Biobank. DESIGN: The UK Biobank is a multisite cohort study established to capture lifestyle, environment, exposure, health history, and genotype data on individuals in the United Kingdom. METHODS: Adverse childhood experiences were assessed from the Childhood Trauma Screener, which measures abuse and neglect across five items. Biological measures at enrollment were used to construct allostatic load, including measures of metabolic, inflammatory, and cardiovascular function. Females with a cancer diagnosis prior to enrollment were removed as it may influence allostatic load. Poisson regression models were used to assess the association between adverse childhood experiences and allostatic load, accounting for a priori confounders. RESULTS: A total of 33,466 females with complete data were analyzed, with a median age at enrollment of 54 (range = 40-70) years. Among the study sample, the mean allostatic load ranged from 1.85 in those who reported no adverse childhood experiences to 2.45 in those with all adverse childhood experiences reported. In multivariable analysis, there was a 4% increase in average allostatic load among females for every additional adverse childhood experience reported (incidence rate ratio = 1.04, 95% confidence interval = 1.03-1.05). Similar results were observed when assessing individual adverse childhood experience components. 
CONCLUSION: This analysis supports a growing body of evidence suggesting that increased exposure to early life abuse or neglect is associated with increased allostatic load in females.
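
The link between the reported incidence rate ratio and the "4% increase" follows from the log link of Poisson regression. The sketch below shows the arithmetic, assuming the usual log-linear interpretation of the coefficient; the function names are hypothetical and nothing here reproduces the study's model.

```python
import math


def irr_to_percent_change(irr, units=1):
    """Percent change in the expected count for `units` additional exposures.

    Under a log link the effect is multiplicative, so it compounds as irr**units.
    """
    if irr <= 0:
        raise ValueError("an incidence rate ratio must be positive")
    return (irr**units - 1.0) * 100.0


def irr_to_log_coefficient(irr):
    """Regression coefficient on the log scale that corresponds to the IRR."""
    return math.log(irr)
```

For the reported ratio of 1.04, `irr_to_percent_change(1.04)` returns 4 percent per additional experience. Passing `units=5` compounds the ratio across all five screener items, which is a model-based extrapolation and not a figure from the abstract.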


Subjects
Adverse Childhood Experiences, Allostasis, Adult, Humans, Child, Female, Middle Aged, Aged, Biological Specimen Banks, Cohort Studies, United Kingdom
12.
Psychiatry Res ; 323: 115175, 2023 05.
Article in English | MEDLINE | ID: mdl-37003169

ABSTRACT

Growing evidence has shown that applying machine learning models to large clinical data sources may exceed clinician performance in suicide risk stratification. However, many existing prediction models either suffer from "temporal bias" (a bias that stems from using case-control sampling) or require training on all available patient visit data. Here, we adopt a "landmark model" framework that aligns with clinical practice for prediction of suicide-related behaviors (SRBs) using a large electronic health record database. Using the landmark approach, we developed models for SRB prediction (regularized Cox regression and random survival forest) that establish a time-point (e.g., clinical visit) from which predictions are made over user-specified prediction windows using historical information up to that point. We applied this approach to cohorts from three clinical settings: general outpatient, psychiatric emergency department, and psychiatric inpatients, for varying prediction windows and lengths of historical data. Models achieved high discriminative performance (area under the Receiver Operating Characteristic curve 0.74-0.93 for the Cox model) across different prediction windows and settings, even with relatively short periods of historical data. In short, we developed accurate, dynamic SRB risk prediction models with the landmark approach that reduce bias and enhance the reliability and portability of suicide risk prediction models.


Subjects
Hospital Emergency Service, Attempted Suicide, Humans, Attempted Suicide/psychology, Reproducibility of Results, ROC Curve
13.
Neuroimage ; 61(4): 987-99, 2012 Jul 16.
Article in English | MEDLINE | ID: mdl-22440644

ABSTRACT

There has been increasing interest in how the human brain responds to natural stimulus such as video watching in the neuroimaging field. Along this direction, this paper presents our effort in inferring consistent and reproducible functional interaction patterns under natural stimulus of video watching among known functional brain regions identified by task-based fMRI. Then, we applied and compared four statistical approaches, including Bayesian network modeling with searching algorithms: greedy equivalence search (GES), Peter and Clark (PC) analysis, independent multiple greedy equivalence search (IMaGES), and the commonly used Granger causality analysis (GCA), to infer consistent and reproducible functional interaction patterns among these brain regions. It is interesting that a number of reliable and consistent functional interaction patterns were identified by the GES, PC and IMaGES algorithms in different participating subjects when they watched multiple video shots of the same semantic category. These interaction patterns are meaningful given current neuroscience knowledge and are reasonably reproducible across different brains and video shots. In particular, these consistent functional interaction patterns are supported by structural connections derived from diffusion tensor imaging (DTI) data, suggesting the structural underpinnings of consistent functional interactions. Our work demonstrates that specific consistent patterns of functional interactions among relevant brain regions might reflect the brain's fundamental mechanisms of online processing and comprehension of video messages.


Subjects
Algorithms, Brain Mapping/methods, Brain/physiology, Computer-Assisted Image Interpretation/methods, Neurological Models, Neural Pathways/physiology, Humans, Magnetic Resonance Imaging, Photic Stimulation
14.
Psychiatry Int (Basel) ; 3(1): 52-64, 2022 Mar.
Article in English | MEDLINE | ID: mdl-36381676

ABSTRACT

Neuroticism and premenstrual conditions share pleiotropic loci and are strongly associated. It is presently not known which DSM-5 symptoms of premenstrual syndrome/premenstrual mood disorder are associated with neuroticism. We enrolled 45 study participants to provide prospective daily ratings of affective ("depression", "anxiety", "anger", "mood swings") and psychological ("low interest", "feeling overwhelmed", and "difficulty concentrating") symptoms across two to three menstrual cycles (128 total cycles). Generalized additive modeling (gam function in R) was implemented to model the relationships between neuroticism and the premenstrual increase in symptomatology. Significance level was adjusted using the False Discovery Rate method and models were adjusted for current age and age of menarche. Results of the association analysis revealed that "low interest" (p ≤ 0.05) and "difficulty concentrating" (p ≤ 0.001) were significantly associated with neuroticism. None of the remaining symptoms reached statistical significance. The late luteal phase of the menstrual cycle is characterized by complex symptomatology, reflecting a physiological milieu of numerous biological processes. By identifying co-expression between neuroticism and specific premenstrual symptomatology, the present study improves our understanding of the premenstrual conditions and provides a platform for individualized treatment developments.
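
The abstract does not say which False Discovery Rate procedure was used. As a sketch only, the code below assumes the common Benjamini-Hochberg step-up adjustment; the function name and the p-values in the test are made up and are not results from the study.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

A symptom would then be called significant when its adjusted value falls below the chosen threshold, for example 0.05.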

15.
Brain Sci ; 12(7)2022 Jun 22.
Article in English | MEDLINE | ID: mdl-35884622

ABSTRACT

OBJECTIVE: Sleep and eating behaviors are disturbed during the premenstrual phase of the menstrual cycle in a significant number of reproductive-age women. Despite their impact on the development and control of chronic health conditions, these behaviors are poorly understood. In the present study, we sought to identify affective and psychological factors which associate with premenstrual changes in sleeping and eating behaviors and assess how they impact functionality. METHODS: Fifty-seven women provided daily ratings of premenstrual symptomatology and functionality across two-three menstrual cycles (156 cycles total). For each participant and symptom, we subtracted the mean day +5 to +10 ("post-menstruum") ratings from mean day -6 to -1 ("pre-menstruum") ratings and divided this value by participant- and symptom-specific variance. We completed the statistical analysis using multivariate linear regression. RESULTS: Low interest was associated with a premenstrual increase in insomnia (p ≤ 0.05) and appetite/eating (p ≤ 0.05). Furthermore, insomnia was associated with occupational (p ≤ 0.001), recreational (p ≤ 0.001), and relational (p ≤ 0.01) impairment. CONCLUSIONS: Results of the present analysis highlight the importance of apathy (i.e., low interest) on the expression of behavioral symptomatology, as well as premenstrual insomnia on impairment. These findings can inform treatment approaches, thereby improving care for patients suffering from premenstrual symptomatology linked to chronic disease conditions.
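
The per-participant, per-symptom score described in METHODS can be sketched as follows. The abstract does not state which ratings enter the variance, so this example assumes the sample variance over all of that participant's ratings for the symptom; the function name and the ratings in the usage note are hypothetical.

```python
from statistics import mean, variance


def premenstrual_change_score(pre_ratings, post_ratings, all_ratings):
    """(mean pre-menstruum - mean post-menstruum) / participant-level variance.

    pre_ratings:  ratings from cycle days -6 to -1
    post_ratings: ratings from cycle days +5 to +10
    all_ratings:  ratings used for the variance (an assumption, see lead-in)
    """
    spread = variance(all_ratings)
    if spread == 0:
        raise ValueError("variance is zero; the score is undefined")
    return (mean(pre_ratings) - mean(post_ratings)) / spread
```

With invented ratings `pre = [4, 5, 4, 5, 4, 5]` and `post = [1, 2, 1, 2, 1, 2]`, and the variance taken over both windows combined, the mean difference is 3 and the score is 1.1.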

16.
Front Psychiatry ; 13: 784316, 2022.
Article in English | MEDLINE | ID: mdl-35573360

ABSTRACT

Visceral adiposity is a significant marker of all-cause mortality. Reproductive age women are at a considerable risk for developing visceral adiposity; however, the associated factors are poorly understood. The proposed study evaluated whether food craving experienced during the premenstrual period is associated with waist circumference. Forty-six women (mean BMI = 24.36) prospectively provided daily ratings of food craving across two-three menstrual cycles (122 cycles total). Their premenstrual rating of food craving was contrasted against food craving in the follicular phase to derive a corrected summary score of the premenstrual food craving increase. Study groups were divided into normal (n = 26) and obese (n = 20) based on the 80 cm waist circumference cutoff signifying an increase in risk. Waist circumference category was significantly associated with premenstrual food cravings [F (1,44) = 5.12, p = 0.028]. Post hoc comparisons using the Tukey HSD test (95% family-wise confidence level) showed that the mean score for the food craving effect size was 0.35 higher for the abdominally obese vs. normal study groups (95% CI: 0.039 to 0.67). The result was statistically significant even following inclusion of BMI in the model, pointing to a particularly dangerous process of central fat accumulation. The present study establishes an association between temporal vulnerability to an increased food-related behavior and a marker of metabolic abnormality risk (i.e., waist circumference), thereby forming a basis for integrating the premenstruum as a viable intervention target for this at-risk sex and age group.

17.
Metabolites ; 12(10)2022 Oct 04.
Article in English | MEDLINE | ID: mdl-36295844

ABSTRACT

The regulation of DHEA-sulfate by steroid sulfotransferase (SULT) and steryl-sulfatase (STS) enzymes is a vital process for the downstream formation of many steroid hormones. DHEA-sulfate is the most abundant steroid hormone in the human body; thus, DHEA-sulfate and its hydrolyzed form, DHEA, continue to be evaluated in numerous studies, given their importance to human health. Yet, a basic question of relevance to the reproductive-age female population, whether the two steroid hormones vary across the menstrual cycle, has not been addressed. We applied a validated, multi-step protocol, involving realignment and imputation of study data to early follicular, mid-late follicular, periovulatory, and early, mid-, and late luteal subphases of the menstrual cycle, and analyzed DHEA-sulfate and DHEA serum concentrations using ultraperformance liquid chromatography tandem mass spectrometry. DHEA-sulfate levels started to decrease in the early luteal, significantly dropped in the mid-luteal, and returned to basal levels by the late luteal subphase. DHEA, however, did not vary across the menstrual cycle. The present study deep-mapped trajectories of DHEA and DHEA-sulfate across the entire menstrual cycle, demonstrating a significant decrease in DHEA-sulfate in the mid-luteal subphase. These findings are relevant to the active area of research examining associations between DHEA-sulfate levels and various disease states.

18.
Int J Med Inform ; 162: 104753, 2022 Apr 01.
Article in English | MEDLINE | ID: mdl-35405530

ABSTRACT

OBJECTIVE: The use of electronic health records (EHR) systems has grown over the past decade, and with it, the need to extract information from unstructured clinical narratives. Clinical notes, however, frequently contain acronyms with several potential senses (meanings) and traditional natural language processing (NLP) techniques cannot differentiate between these senses. In this study we introduce a semi-supervised method for binary acronym disambiguation, the task of classifying a target sense for acronyms in the clinical EHR notes. METHODS: We developed a semi-supervised ensemble machine learning (CASEml) algorithm to automatically identify when an acronym means a target sense by leveraging semantic embeddings, visit-level text and billing information. The algorithm was validated using note data from the Veterans Affairs hospital system to classify the meaning of three acronyms: RA, MS, and MI. We compared the performance of CASEml against another standard semi-supervised method and a baseline metric selecting the most frequent acronym sense. Along with evaluating the performance of these methods for specific instances of acronyms, we evaluated the impact of acronym disambiguation on NLP-driven phenotyping of rheumatoid arthritis. RESULTS: CASEml achieved accuracies of 0.947, 0.911, and 0.706 for RA, MS, and MI, respectively, higher than a standard baseline metric and (on average) higher than a state-of-the-art semi-supervised method. As well, we demonstrated that applying CASEml to medical notes improves the AUC of a phenotype algorithm for rheumatoid arthritis. CONCLUSION: CASEml is a novel method that accurately disambiguates acronyms in clinical notes and has advantages over commonly used supervised and semi-supervised machine learning approaches. In addition, CASEml improves the performance of NLP tasks that rely on ambiguous acronyms, such as phenotyping.

19.
Ann Epidemiol ; 56: 47-54.e5, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33181262

ABSTRACT

PURPOSE: To describe coronavirus disease 2019 (COVID-19) mortality in Chicago during the spring of 2020 and identify at the census-tract level neighborhood characteristics that were associated with higher COVID-19 mortality rates. METHODS: Using Poisson regression and regularized linear regression (elastic net), we evaluated the association between neighborhood characteristics and COVID-19 mortality rates in Chicago through July 22 (2514 deaths across 795 populated census tracts). RESULTS: Black residents (31% of the population) accounted for 42% of COVID-19 deaths. Deaths among Hispanic/Latino residents occurred at a younger age (63 years, compared with 71 for white residents). Regarding residential setting, 52% of deaths among white residents occurred inside nursing homes, compared with 35% of deaths among black residents and 17% among Hispanic/Latino residents. Higher COVID-19 mortality was seen in neighborhoods with heightened barriers to social distancing and low health insurance coverage. Neighborhoods with a higher percentage of white and Asian residents had lower COVID-19 mortality. The associations differed by race, suggesting that neighborhood context may be most tightly linked to COVID-19 mortality among white residents. CONCLUSIONS: We describe communities that may benefit from supportive services and identify traits of communities that may benefit from targeted campaigns for prevention and testing to prevent future deaths from COVID-19.


Subjects
COVID-19/mortality, Residence Characteristics, Aged, Aged 80 and over, Chicago/epidemiology, Female, Humans, Male, Middle Aged
20.
EBioMedicine ; 69: 103439, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34157486

ABSTRACT

BACKGROUND: COVID-19 has been associated with Interstitial Lung Disease features. The immune transcriptomic overlap between Idiopathic Pulmonary Fibrosis (IPF) and COVID-19 has not been investigated. METHODS: We analyzed blood transcript levels of 50 genes known to predict IPF mortality in three COVID-19 and two IPF cohorts. The Scoring Algorithm of Molecular Subphenotypes (SAMS) was applied to distinguish high versus low-risk profiles in all cohorts. SAMS cutoffs derived from the COVID-19 Discovery cohort were used to predict intensive care unit (ICU) status, need for mechanical ventilation, and in-hospital mortality in the COVID-19 Validation cohort. A COVID-19 Single-cell RNA-sequencing cohort was used to identify the cellular sources of the 50-gene risk profiles. The same COVID-19 SAMS cutoffs were used to predict mortality in the IPF cohorts. FINDINGS: 50-gene risk profiles discriminated severe from mild COVID-19 in the Discovery cohort (P = 0·015) and predicted ICU admission, need for mechanical ventilation, and in-hospital mortality (AUC: 0·77, 0·75, and 0·74, respectively, P < 0·001) in the COVID-19 Validation cohort. In COVID-19, 50-gene expressing cells with a high-risk profile included monocytes, dendritic cells, and neutrophils, while low-risk profile-expressing cells included CD4+, CD8+ T lymphocytes, IgG producing plasmablasts, B cells, NK, and gamma/delta T cells. The same COVID-19 SAMS cutoffs were also predictive of mortality in the University of Chicago (HR:5·26, 95%CI:1·81-15·27, P = 0·0013) and Imperial College of London (HR:4·31, 95%CI:1·81-10·23, P = 0·0016) IPF cohorts. INTERPRETATION: 50-gene risk profiles in peripheral blood predict COVID-19 and IPF outcomes. The cellular sources of these gene expression changes suggest common innate and adaptive immune responses in both diseases. 
FUNDING: This work was supported in part by National Institute for Health Research Clinician Scientist Fellowship NIHR: CS-2013-13-017 (TMM); Action for Pulmonary Fibrosis Mike Bray fellowship (PLM); The National Heart, Lung, and Blood Institute (NHLBI) through award K01-HL-130704 (AJ); The University of South Florida (USF) Academic Support Fund and the USF Foundation, Ubben Fibrosis Fund (JHM).


Subjects
COVID-19/genetics, Transcriptome, Adult, Aged, Biomarkers/blood, COVID-19/blood, COVID-19/mortality, Female, Hospital Mortality, Humans, Male, Middle Aged, Survival Analysis