Results 1 - 20 of 31
1.
Stat Med ; 43(8): 1564-1576, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38332307

ABSTRACT

Point process data have become increasingly popular these days. For example, many of the data captured in electronic health records (EHR) are in the format of point process data. It is of great interest to study the association between a point process predictor and a scalar response using generalized functional linear regression models. Various generalized functional linear regression models have been developed under different settings in the past decades. However, existing methods can only deal with functional or longitudinal predictors, not point process predictors. In this article, we propose a novel generalized functional linear regression model for a point process predictor. Our proposed model is based on the joint modeling framework, where we adopt a log-Gaussian Cox process model for the point process predictor and a generalized linear regression model for the outcome. We also develop a new algorithm for fast model estimation based on the Gaussian variational approximation method. We conduct extensive simulation studies to evaluate the performance of our proposed method and compare it to competing methods. The performance of our proposed method is further demonstrated on an EHR dataset of patients admitted into the intensive care units of the Beth Israel Deaconess Medical Center between 2001 and 2008.


Subject(s)
Algorithms, Humans, Linear Models, Computer Simulation, Proportional Hazards Models
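
One common way to write a joint model of the kind described in this abstract is sketched below. The notation ($N_i$, $\lambda_i$, $X_i$, $\beta$, $g$) is assumed for illustration and is not taken from the article; the authors' exact specification may differ.

```latex
\begin{align*}
N_i \mid \lambda_i &\sim \text{Poisson process with intensity } \lambda_i(t), \\
\lambda_i(t) &= \exp\{X_i(t)\}, \qquad X_i \sim \mathcal{GP}(\mu, K), \\
g\bigl(\mathbb{E}[Y_i \mid X_i]\bigr) &= \alpha + \int_{\mathcal{T}} \beta(t)\, X_i(t)\, dt .
\end{align*}
```

The first two lines are the standard definition of a log-Gaussian Cox process for subject $i$'s event times; the third is a generalized functional linear regression in which the latent log-intensity $X_i$ plays the role of the functional predictor and $g$ is the link function for the scalar outcome $Y_i$.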
2.
Womens Health (Lond) ; 19: 17455057231184325, 2023.
Article in English | MEDLINE | ID: mdl-37431843

ABSTRACT

BACKGROUND: Adverse childhood experiences during key developmental periods have been shown to impact long-term health outcomes. Adverse childhood experiences may include psychological, physical, or sexual abuse; neglect; or socioeconomic factors. Adverse childhood experiences are linked with an increase in poor health behavior such as smoking and alcohol consumption, and may also influence epigenetic changes, inflammatory response, metabolic changes, and allostatic load. OBJECTIVE: We sought to explore associations between adverse childhood experiences and allostatic load in adult female participants in the UK Biobank. DESIGN: The UK Biobank is a multisite cohort study established to capture lifestyle, environment, exposure, health history, and genotype data on individuals in the United Kingdom. METHODS: Adverse childhood experiences were assessed from the Childhood Trauma Screener, which measures abuse and neglect across five items. Biological measures at enrollment were used to construct allostatic load, including measures of metabolic, inflammatory, and cardiovascular function. Females with a cancer diagnosis prior to enrollment were removed as it may influence allostatic load. Poisson regression models were used to assess the association between adverse childhood experiences and allostatic load, accounting for a priori confounders. RESULTS: A total of 33,466 females with complete data were analyzed, with a median age at enrollment of 54 (range = 40-70) years. Among the study sample, the mean allostatic load ranged from 1.85 in those who reported no adverse childhood experiences to 2.45 in those with all adverse childhood experiences reported. In multivariable analysis, there was a 4% increase in average allostatic load among females for every additional adverse childhood experience reported (incidence rate ratio = 1.04, 95% confidence interval = 1.03-1.05). Similar results were observed when assessing individual adverse childhood experience components. 
CONCLUSION: This analysis supports a growing body of evidence suggesting that increased exposure to early life abuse or neglect is associated with increased allostatic load in females.


Subject(s)
Adverse Childhood Experiences, Allostasis, Adult, Humans, Child, Female, Middle Aged, Aged, Biological Specimen Banks, Cohort Studies, United Kingdom
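
The "4% increase" reported in the abstract above follows directly from the incidence rate ratio of a Poisson regression. The short sketch below shows the arithmetic; the function names are hypothetical, and the extrapolation to five items simply assumes the fitted log-linear relationship holds across the whole range, which the abstract does not state.

```python
import math


def percent_change_from_irr(irr: float) -> float:
    """Percent change in the expected count per one-unit increase in the predictor."""
    return (irr - 1.0) * 100.0


def irr_for_k_units(irr: float, k: int) -> float:
    """Rate ratios multiply on the original scale, so k extra units give irr ** k."""
    return irr ** k


irr = 1.04                     # incidence rate ratio reported in the abstract
log_coefficient = math.log(irr)  # the regression coefficient on the log scale

print(round(percent_change_from_irr(irr), 1))  # 4.0
print(round(irr_for_k_units(irr, 5), 3))       # 1.217
```

Under that assumption, five additional reported experiences would correspond to roughly a 22% higher expected allostatic load, because the effect compounds multiplicatively rather than adding 4 percentage points each time.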
3.
Psychiatry Res ; 323: 115175, 2023 05.
Article in English | MEDLINE | ID: mdl-37003169

ABSTRACT

Growing evidence has shown that applying machine learning models to large clinical data sources may exceed clinician performance in suicide risk stratification. However, many existing prediction models either suffer from "temporal bias" (a bias that stems from using case-control sampling) or require training on all available patient visit data. Here, we adopt a "landmark model" framework that aligns with clinical practice for prediction of suicide-related behaviors (SRBs) using a large electronic health record database. Using the landmark approach, we developed models for SRB prediction (regularized Cox regression and random survival forest) that establish a time-point (e.g., clinical visit) from which predictions are made over user-specified prediction windows using historical information up to that point. We applied this approach to cohorts from three clinical settings: general outpatient, psychiatric emergency department, and psychiatric inpatients, for varying prediction windows and lengths of historical data. Models achieved high discriminative performance (area under the Receiver Operating Characteristic curve 0.74-0.93 for the Cox model) across different prediction windows and settings, even with relatively short periods of historical data. In short, we developed accurate, dynamic SRB risk prediction models with the landmark approach that reduce bias and enhance the reliability and portability of suicide risk prediction models.


Subject(s)
Hospital Emergency Service, Suicide Attempt, Humans, Suicide Attempt/psychology, Reproducibility of Results, ROC Curve
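
The landmark idea in the abstract above (fix a time point, use only history up to it, predict over a chosen window after it) can be sketched as a data-preparation step. This is a minimal illustration, not the authors' pipeline: the `Patient` class, the single visit-count feature, and the day-based time scale are all hypothetical, and censoring within the prediction window is ignored for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Patient:
    visit_days: List[int]      # days (since first contact) on which visits occurred
    event_day: Optional[int]   # day of the first outcome event, or None if none observed


def landmark_record(patient: Patient, landmark_day: int,
                    history_days: int, window_days: int) -> Optional[Dict[str, object]]:
    """Build one row: features from (landmark - history, landmark],
    label from (landmark, landmark + window]."""
    # Patients whose event happened on or before the landmark are no longer at risk.
    if patient.event_day is not None and patient.event_day <= landmark_day:
        return None
    history_start = landmark_day - history_days
    # Only information available at the landmark is used as a feature.
    n_recent_visits = sum(history_start < d <= landmark_day for d in patient.visit_days)
    event_in_window = (patient.event_day is not None
                       and patient.event_day <= landmark_day + window_days)
    return {"n_recent_visits": n_recent_visits, "event_in_window": event_in_window}


patient = Patient(visit_days=[10, 40, 95, 100], event_day=130)
print(landmark_record(patient, landmark_day=100, history_days=90, window_days=90))
print(landmark_record(patient, landmark_day=100, history_days=90, window_days=14))
```

Changing `window_days` or `history_days` changes the label and the features without touching any data after the landmark, which is the property that keeps the training setup aligned with how a prediction would be made at a real clinical visit.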
4.
Stat Med ; 42(3): 316-330, 2023 02 10.
Article in English | MEDLINE | ID: mdl-36443903

ABSTRACT

The shared random effects joint model is one of the most widely used approaches to study the associations between longitudinal biomarkers and a survival outcome and make dynamic risk predictions using the longitudinally measured biomarkers. Various types of joint models have been developed under different settings in the past decades. One major limitation of joint models is that they could be computationally expensive for complex models where the number of the shared random effects is large. Moreover, the inferential accuracy of joint models could also be diminished for complex models due to approximation errors. However, complex models are frequently needed in practice, for example, when the longitudinal biomarkers have nonlinear trajectories over time or the number of longitudinal biomarkers of interest is large. In this article, we propose a novel Gaussian variational approximate inference approach for fitting joint models, which significantly improves computational efficiency while maintaining inferential accuracy. We conduct extensive simulation studies to evaluate the performance of our proposed method and compare it to existing methods. The performance of our proposed method is further demonstrated on a dataset of patients with primary biliary cirrhosis.


Subject(s)
Statistical Models, Humans, Computer Simulation, Biomarkers, Longitudinal Studies
5.
Psychiatry Int (Basel) ; 3(1): 52-64, 2022 Mar.
Article in English | MEDLINE | ID: mdl-36381676

ABSTRACT

Neuroticism and premenstrual conditions share pleiotropic loci and are strongly associated. It is presently not known which DSM-5 symptoms of premenstrual syndrome/premenstrual mood disorder are associated with neuroticism. We enrolled 45 study participants to provide prospective daily ratings of affective ("depression", "anxiety", "anger", "mood swings") and psychological ("low interest", "feeling overwhelmed", and "difficulty concentrating") symptoms across two-three menstrual cycles (128 total cycles). Generalized additive modeling (gam function in R) was implemented to model the relationships between neuroticism and the premenstrual increase in symptomatology. Significance level was adjusted using the False Discovery Rate method and models were adjusted for current age and age of menarche. Results of the association analysis revealed that "low interest" (p ≤ 0.05) and "difficulty concentrating" (p ≤ 0.001) were significantly associated with neuroticism. None of the remaining symptoms reached statistical significance. The late luteal phase of the menstrual cycle is characterized by complex symptomatology, reflecting a physiological milieu of numerous biological processes. By identifying co-expression between neuroticism and specific premenstrual symptomatology, the present study improves our understanding of the premenstrual conditions and provides a platform for individualized treatment developments.

6.
Metabolites ; 12(10)2022 Oct 04.
Article in English | MEDLINE | ID: mdl-36295844

ABSTRACT

The regulation of DHEA-sulfate by steroid sulfotransferase (SULT) and steryl-sulfatase (STS) enzymes is a vital process for the downstream formation of many steroid hormones. DHEA-sulfate is the most abundant steroid hormone in the human body; thus, DHEA-sulfate and its hydrolyzed form, DHEA, continue to be evaluated in numerous studies, given their importance to human health. Yet a basic question of relevance to the reproductive-age female population, namely whether the two steroid hormones vary across the menstrual cycle, has not been addressed. We applied a validated, multi-step protocol, involving realignment and imputation of study data to early follicular, mid-late follicular, periovulatory, and early, mid-, and late luteal subphases of the menstrual cycle, and analyzed DHEA-sulfate and DHEA serum concentrations using ultraperformance liquid chromatography tandem mass spectrometry. DHEA-sulfate levels started to decrease in the early luteal, significantly dropped in the mid-luteal, and returned to basal levels by the late luteal subphase. DHEA, however, did not vary across the menstrual cycle. The present study deep-mapped trajectories of DHEA and DHEA-sulfate across the entire menstrual cycle, demonstrating a significant decrease in DHEA-sulfate in the mid-luteal subphase. These findings are relevant to the active area of research examining associations between DHEA-sulfate levels and various disease states.

7.
Brain Sci ; 12(7)2022 Jun 22.
Article in English | MEDLINE | ID: mdl-35884622

ABSTRACT

OBJECTIVE: Sleep and eating behaviors are disturbed during the premenstrual phase of the menstrual cycle in a significant number of reproductive-age women. Despite their impact on the development and control of chronic health conditions, these behaviors are poorly understood. In the present study, we sought to identify affective and psychological factors which associate with premenstrual changes in sleeping and eating behaviors and assess how they impact functionality. METHODS: Fifty-seven women provided daily ratings of premenstrual symptomatology and functionality across two-three menstrual cycles (156 cycles total). For each participant and symptom, we subtracted the mean day +5 to +10 ("post-menstruum") ratings from mean day -6 to -1 ("pre-menstruum") ratings and divided this value by participant- and symptom-specific variance. We completed the statistical analysis using multivariate linear regression. RESULTS: Low interest was associated with a premenstrual increase in insomnia (p ≤ 0.05) and appetite/eating (p ≤ 0.05). Furthermore, insomnia was associated with occupational (p ≤ 0.001), recreational (p ≤ 0.001), and relational (p ≤ 0.01) impairment. CONCLUSIONS: Results of the present analysis highlight the importance of apathy (i.e., low interest) on the expression of behavioral symptomatology, as well as premenstrual insomnia on impairment. These findings can inform treatment approaches, thereby improving care for patients suffering from premenstrual symptomatology linked to chronic disease conditions.
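
The per-participant, per-symptom score described in the METHODS above can be sketched as follows. The abstract does not say over which ratings the "participant- and symptom-specific variance" is computed, so this sketch assumes the sample variance of all ratings supplied for that participant and symptom; the function name and the cycle-day keys (negative days before menses onset, positive days after) are likewise assumptions.

```python
from statistics import mean, variance
from typing import Dict


def premenstrual_score(ratings_by_cycle_day: Dict[int, float]) -> float:
    """(mean of days -6..-1) minus (mean of days +5..+10), divided by the variance.

    Assumption: the divisor is the sample variance of all supplied ratings.
    """
    pre = [r for day, r in ratings_by_cycle_day.items() if -6 <= day <= -1]
    post = [r for day, r in ratings_by_cycle_day.items() if 5 <= day <= 10]
    spread = variance(list(ratings_by_cycle_day.values()))
    if spread == 0:
        raise ValueError("ratings have zero variance; the score is undefined")
    return (mean(pre) - mean(post)) / spread


# Toy data: a rating of 4 on each pre-menstruum day and 2 on each post-menstruum day.
ratings = {day: 4.0 for day in range(-6, 0)}
ratings.update({day: 2.0 for day in range(5, 11)})
print(round(premenstrual_score(ratings), 3))  # 1.833
```

Dividing by a participant-specific measure of spread puts people who use the rating scale very differently on a comparable footing, so a positive score indicates a premenstrual increase relative to that person's own variability.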

8.
Front Psychiatry ; 13: 784316, 2022.
Article in English | MEDLINE | ID: mdl-35573360

ABSTRACT

Visceral adiposity is a significant marker of all-cause mortality. Reproductive age women are at a considerable risk for developing visceral adiposity; however, the associated factors are poorly understood. The proposed study evaluated whether food craving experienced during the premenstrual period is associated with waist circumference. Forty-six women (mean BMI = 24.36) prospectively provided daily ratings of food craving across two-three menstrual cycles (122 cycles total). Their premenstrual rating of food craving was contrasted against food craving in the follicular phase to derive a corrected summary score of the premenstrual food craving increase. Study groups were divided into normal (n = 26) and obese (n = 20) based on the 80 cm waist circumference cutoff signifying an increase in risk. Waist circumference category was significantly associated with premenstrual food cravings [F (1,44) = 5.12, p = 0.028]. Post hoc comparisons using the Tukey HSD test (95% family-wise confidence level) showed that the mean score for the food craving effect size was 0.35 higher for the abdominally obese vs. normal study groups (95% CI: 0.039 to 0.67). The result was statistically significant even following inclusion of BMI in the model, pointing to a particularly dangerous process of central fat accumulation. The present study establishes an association between temporal vulnerability to an increased food-related behavior and a marker of metabolic abnormality risk (i.e., waist circumference), thereby forming a basis for integrating the premenstruum as a viable intervention target for this at-risk sex and age group.

9.
Int J Med Inform ; 162: 104753, 2022 Apr 01.
Article in English | MEDLINE | ID: mdl-35405530

ABSTRACT

OBJECTIVE: The use of electronic health records (EHR) systems has grown over the past decade, and with it, the need to extract information from unstructured clinical narratives. Clinical notes, however, frequently contain acronyms with several potential senses (meanings) and traditional natural language processing (NLP) techniques cannot differentiate between these senses. In this study we introduce a semi-supervised method for binary acronym disambiguation, the task of classifying a target sense for acronyms in the clinical EHR notes. METHODS: We developed a semi-supervised ensemble machine learning (CASEml) algorithm to automatically identify when an acronym means a target sense by leveraging semantic embeddings, visit-level text and billing information. The algorithm was validated using note data from the Veterans Affairs hospital system to classify the meaning of three acronyms: RA, MS, and MI. We compared the performance of CASEml against another standard semi-supervised method and a baseline metric selecting the most frequent acronym sense. Along with evaluating the performance of these methods for specific instances of acronyms, we evaluated the impact of acronym disambiguation on NLP-driven phenotyping of rheumatoid arthritis. RESULTS: CASEml achieved accuracies of 0.947, 0.911, and 0.706 for RA, MS, and MI, respectively, higher than a standard baseline metric and (on average) higher than a state-of-the-art semi-supervised method. As well, we demonstrated that applying CASEml to medical notes improves the AUC of a phenotype algorithm for rheumatoid arthritis. CONCLUSION: CASEml is a novel method that accurately disambiguates acronyms in clinical notes and has advantages over commonly used supervised and semi-supervised machine learning approaches. In addition, CASEml improves the performance of NLP tasks that rely on ambiguous acronyms, such as phenotyping.
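
The "baseline metric selecting the most frequent acronym sense" mentioned above is simple enough to sketch. The code below is only an illustration of that baseline, not of CASEml itself; the function name and the toy sense labels are hypothetical and do not come from the study data.

```python
from collections import Counter
from typing import List, Tuple


def most_frequent_sense_baseline(train_senses: List[str],
                                 test_senses: List[str]) -> Tuple[str, float]:
    """Predict the most common training sense for every test instance and report accuracy."""
    majority_sense, _ = Counter(train_senses).most_common(1)[0]
    correct = sum(sense == majority_sense for sense in test_senses)
    return majority_sense, correct / len(test_senses)


# Toy labels for the acronym "RA" (hypothetical, for illustration only).
train = ["rheumatoid arthritis"] * 7 + ["right atrium"] * 3
test = ["rheumatoid arthritis", "right atrium",
        "rheumatoid arthritis", "rheumatoid arthritis"]

sense, accuracy = most_frequent_sense_baseline(train, test)
print(sense, accuracy)  # rheumatoid arthritis 0.75
```

The accuracy of this baseline equals the prevalence of the majority sense in the test set, which is why it is a useful floor: any disambiguation method worth using has to beat it.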

10.
NPJ Digit Med ; 4(1): 151, 2021 Oct 27.
Article in English | MEDLINE | ID: mdl-34707226

ABSTRACT

The increasing availability of electronic health record (EHR) systems has created enormous potential for translational research. However, it is difficult to know all the relevant codes related to a phenotype due to the large number of codes available. Traditional data mining approaches often require the use of patient-level data, which hinders the ability to share data across institutions. In this project, we demonstrate that multi-center large-scale code embeddings can be used to efficiently identify relevant features related to a disease of interest. We constructed large-scale code embeddings for a wide range of codified concepts from EHRs from two large medical centers. We developed knowledge extraction via sparse embedding regression (KESER) for feature selection and integrative network analysis. We evaluated the quality of the code embeddings and assessed the performance of KESER in feature selection for eight diseases. Besides, we developed an integrated clinical knowledge map combining embedding data from both institutions. The features selected by KESER were comprehensive compared to lists of codified data generated by domain experts. Features identified via KESER resulted in comparable performance to those built upon features selected manually or with patient-level data. The knowledge map created using an integrative analysis identified disease-disease and disease-drug pairs more accurately compared to those identified using single institution data. Analysis of code embeddings via KESER can effectively reveal clinical knowledge and infer relatedness among codified concepts. KESER bypasses the need for patient-level data in individual analyses providing a significant advance in enabling multi-center studies using EHR data.
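
To make the idea of inferring relatedness from code embeddings concrete, the sketch below ranks codes by cosine similarity to a target code. This is a deliberately simpler stand-in, not KESER's sparse embedding regression, and the code names and two-dimensional vectors are hypothetical toy values rather than real embeddings.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_related(target: str, embeddings: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank every other code by similarity to the target code, most similar first."""
    scores = [(code, cosine(embeddings[target], vec))
              for code, vec in embeddings.items() if code != target]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; real code embeddings have many more dimensions.
embeddings = {
    "code_A": [1.0, 0.0],
    "code_B": [0.8, 0.6],
    "code_C": [0.0, 1.0],
}
for code, score in rank_related("code_A", embeddings):
    print(code, round(score, 2))
```

Because only the embedding vectors are needed, a ranking like this can be computed and shared without exchanging patient-level records, which is the practical point the abstract makes about multi-center use.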

11.
EBioMedicine ; 69: 103439, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34157486

ABSTRACT

BACKGROUND: COVID-19 has been associated with Interstitial Lung Disease features. The immune transcriptomic overlap between Idiopathic Pulmonary Fibrosis (IPF) and COVID-19 has not been investigated. METHODS: We analyzed blood transcript levels of 50 genes known to predict IPF mortality in three COVID-19 and two IPF cohorts. The Scoring Algorithm of Molecular Subphenotypes (SAMS) was applied to distinguish high versus low-risk profiles in all cohorts. SAMS cutoffs derived from the COVID-19 Discovery cohort were used to predict intensive care unit (ICU) status, need for mechanical ventilation, and in-hospital mortality in the COVID-19 Validation cohort. A COVID-19 Single-cell RNA-sequencing cohort was used to identify the cellular sources of the 50-gene risk profiles. The same COVID-19 SAMS cutoffs were used to predict mortality in the IPF cohorts. FINDINGS: 50-gene risk profiles discriminated severe from mild COVID-19 in the Discovery cohort (P = 0·015) and predicted ICU admission, need for mechanical ventilation, and in-hospital mortality (AUC: 0·77, 0·75, and 0·74, respectively, P < 0·001) in the COVID-19 Validation cohort. In COVID-19, 50-gene expressing cells with a high-risk profile included monocytes, dendritic cells, and neutrophils, while low-risk profile-expressing cells included CD4+, CD8+ T lymphocytes, IgG producing plasmablasts, B cells, NK, and gamma/delta T cells. The same COVID-19 SAMS cutoffs were also predictive of mortality in the University of Chicago (HR:5·26, 95%CI:1·81-15·27, P = 0·0013) and Imperial College of London (HR:4·31, 95%CI:1·81-10·23, P = 0·0016) IPF cohorts. INTERPRETATION: 50-gene risk profiles in peripheral blood predict COVID-19 and IPF outcomes. The cellular sources of these gene expression changes suggest common innate and adaptive immune responses in both diseases.
FUNDING: This work was supported in part by National Institute for Health Research Clinician Scientist Fellowship NIHR: CS-2013-13-017 (TMM); Action for Pulmonary Fibrosis Mike Bray fellowship (PLM); The National Heart, Lung, and Blood Institute (NHLBI) through award K01-HL-130704 (AJ); The University of South Florida (USF) Academic Support Fund and the USF Foundation, Ubben Fibrosis Fund (JHM).


Subject(s)
COVID-19/genetics, Transcriptome, Adult, Aged, Biomarkers/blood, COVID-19/blood, COVID-19/mortality, Female, Hospital Mortality, Humans, Male, Middle Aged, Survival Analysis
12.
Ann Epidemiol ; 56: 47-54.e5, 2021 Apr.
Article in English | MEDLINE | ID: mdl-33181262

ABSTRACT

PURPOSE: To describe coronavirus disease 2019 (COVID-19) mortality in Chicago during the spring of 2020 and identify at the census-tract level neighborhood characteristics that were associated with higher COVID-19 mortality rates. METHODS: Using Poisson regression and regularized linear regression (elastic net), we evaluated the association between neighborhood characteristics and COVID-19 mortality rates in Chicago through July 22 (2514 deaths across 795 populated census tracts). RESULTS: Black residents (31% of the population) accounted for 42% of COVID-19 deaths. Deaths among Hispanic/Latino residents occurred at a younger age (63 years, compared with 71 for white residents). Regarding residential setting, 52% of deaths among white residents occurred inside nursing homes, compared with 35% of deaths among black residents and 17% among Hispanic/Latino residents. Higher COVID-19 mortality was seen in neighborhoods with heightened barriers to social distancing and low health insurance coverage. Neighborhoods with a higher percentage of white and Asian residents had lower COVID-19 mortality. The associations differed by race, suggesting that neighborhood context may be most tightly linked to COVID-19 mortality among white residents. CONCLUSIONS: We describe communities that may benefit from supportive services and identify traits of communities that may benefit from targeted campaigns for prevention and testing to prevent future deaths from COVID-19.


Subject(s)
COVID-19/mortality, Residence Characteristics, Aged, Aged 80 and Over, Chicago/epidemiology, Female, Humans, Male, Middle Aged
13.
JAMA Netw Open ; 3(7): e209250, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32648923

ABSTRACT

Importance: The ε4 allele of the apolipoprotein E (APOE) gene and lower apolipoprotein E (apoE) protein levels in plasma are risk factors for Alzheimer disease, but the underlying biological mechanisms are not fully understood. Half of plasma apoE circulates on high-density lipoproteins (HDLs). Higher apoE levels in plasma HDL were previously found to be associated with lower coronary heart disease risk, but the coexistence of another apolipoprotein, apoC3, modified this lower risk. Objective: To investigate associations between the presence of apoE in different lipoproteins with cognitive function, particularly the risk of dementia. Design, Setting, and Participants: This prospective case-cohort study embedded in the Ginkgo Evaluation of Memory Study (2000-2008) analyzed data from 1351 community-dwelling participants 74 years and older. Of this group, 995 participants were free of dementia at baseline (recruited from September 2000 to June 2002) and 521 participants were diagnosed with incident dementia during follow-up until 2008. Data analysis was performed from January 2018 to December 2019. Exposures: Enzyme-linked immunosorbent assay-measured concentration of apoE in whole plasma, HDL-depleted plasma (non-HDL), HDL, and HDL subspecies that contain or lack apoC3 or apoJ. Main Outcomes and Measures: Adjusted hazard ratios for risk of dementia and Alzheimer disease during follow-up and adjusted differences (β coefficients) in Alzheimer Disease Assessment-Cognitive Subscale (ADAS-cog) and Modified Mini-Mental State Examination scores at baseline. Results: Among 1351 participants, the median (interquartile range) age was 78 (76-81) years; 639 (47.3%) were women. The median (interquartile range) follow-up time was 5.9 (3.7-6.5) years.
Higher whole plasma apoE levels and higher apoE levels in HDL were associated with better cognitive function assessed by ADAS-cog (whole plasma, β coefficient, -0.15; 95% CI, -0.24 to -0.06; HDL, β coefficient, -0.20; 95% CI, -0.30 to -0.10) but were unassociated with dementia or Alzheimer disease risk. When separated by apoC3, a higher apoE level in HDL that lacks apoC3 was associated with better cognitive function (ADAS-cog per SD: β coefficient, -0.17; 95% CI, -0.27 to -0.07; Modified Mini-Mental State Examination score per SD: β coefficient, 0.25; 95% CI, 0.07 to 0.42) and lower risk of dementia (hazard ratio per SD, 0.86; 95% CI, 0.76 to 0.99). In contrast, apoE levels in HDL that contains apoC3 were unassociated with any of these outcomes. Conclusions and Relevance: In a prospective cohort of older adults with rigorous follow-up of dementia, the apoE level in HDL that lacked apoC3 was associated with better cognitive function and lower dementia risk. This finding suggests that the cardioprotective associations of this novel lipoprotein extend to dementia.


Subject(s)
Apolipoprotein C-III/blood, Apolipoproteins E/blood, Dementia, Aged, Cognition/physiology, Cohort Studies, Correlation of Data, Dementia/diagnosis, Dementia/epidemiology, Dementia/metabolism, Female, Follow-Up Studies, Humans, Independent Living/statistics & numerical data, Male, Neuropsychological Tests/statistics & numerical data, Prospective Studies, Protective Factors, Risk Factors, United States/epidemiology
14.
J Am Med Inform Assoc ; 27(8): 1235-1243, 2020 08 01.
Article in English | MEDLINE | ID: mdl-32548637

ABSTRACT

OBJECTIVE: A major bottleneck hindering utilization of electronic health record data for translational research is the lack of precise phenotype labels. Chart review as well as rule-based and supervised phenotyping approaches require laborious expert input, hampering applicability to studies that require many phenotypes to be defined and labeled de novo. Though International Classification of Diseases codes are often used as surrogates for true labels in this setting, these sometimes suffer from poor specificity. We propose a fully automated topic modeling algorithm to simultaneously annotate multiple phenotypes. MATERIALS AND METHODS: Surrogate-guided ensemble latent Dirichlet allocation (sureLDA) is a label-free multidimensional phenotyping method. It first uses the PheNorm algorithm to initialize probabilities based on 2 surrogate features for each target phenotype, and then leverages these probabilities to constrain the LDA topic model to generate phenotype-specific topics. Finally, it combines phenotype-feature counts with surrogates via clustering ensemble to yield final phenotype probabilities. RESULTS: sureLDA achieves reliably high accuracy and precision across a range of simulated and real-world phenotypes. Its performance is robust to phenotype prevalence and relative informativeness of surrogate vs nonsurrogate features. It also exhibits powerful feature selection properties. DISCUSSION: sureLDA combines attractive properties of PheNorm and LDA to achieve high accuracy and precision robust to diverse phenotype characteristics. It offers particular improvement for phenotypes insufficiently captured by a few surrogate features. Moreover, sureLDA's feature selection ability enables it to handle high feature dimensions and produce interpretable computational phenotypes. CONCLUSIONS: sureLDA is well suited toward large-scale electronic health record phenotyping for highly multiphenotype applications such as phenome-wide association studies.


Subject(s)
Algorithms, Electronic Health Records, Natural Language Processing, Electronic Health Records/classification, Humans, Precision Medicine, ROC Curve, Translational Biomedical Research
15.
J Lipid Res ; 61(3): 445-454, 2020 03.
Article in English | MEDLINE | ID: mdl-31892526

ABSTRACT

Whether HDL is associated with dementia risk is unclear. In addition to apoA1, other apolipoproteins are found in HDL, creating subspecies of HDL that may have distinct metabolic properties. We measured apoA1, apoC3, and apoJ levels in plasma and apoA1 levels in HDL that contains or lacks apoE, apoJ, or apoC3 using a modified sandwich ELISA in a case-cohort study nested within the Ginkgo Evaluation of Memory Study. We included 995 randomly selected participants and 521 participants who developed dementia during a mean of 5.1 years of follow-up. The level of total apoA1 was not significantly related to dementia risk, regardless of the coexistence of apoC3, apoJ, or apoE. Higher levels of total plasma apoC3 were associated with better cognitive function at baseline (difference in Modified Mini-Mental State Examination scores tertile 3 vs. tertile 1: 0.60; 95% CI: 0.23, 0.98) and a lower dementia risk (adjusted hazard ratio tertile 3 vs. tertile 1: 0.73; 95% CI: 0.55, 0.96). Plasma concentrations of apoA1 in HDL and its apolipoprotein-defined subspecies were not associated with cognitive function at baseline or with the risk of dementia during follow-up. Similar studies in other populations are required to better understand the association between apoC3 and Alzheimer's disease pathology.


Subject(s)
Apolipoproteins/blood, Dementia/blood, Dementia/diagnosis, HDL Lipoproteins/blood, Aged, Aged 80 and Over, Cognition, Double-Blind Method, Female, Follow-Up Studies, Humans, Male, Risk Factors
16.
Nat Protoc ; 14(12): 3426-3444, 2019 12.
Article in English | MEDLINE | ID: mdl-31748751

ABSTRACT

Phenotypes are the foundation for clinical and genetic studies of disease risk and outcomes. The growth of biobanks linked to electronic medical record (EMR) data has both facilitated and increased the demand for efficient, accurate, and robust approaches for phenotyping millions of patients. Challenges to phenotyping with EMR data include variation in the accuracy of codes, as well as the high level of manual input required to identify features for the algorithm and to obtain gold standard labels. To address these challenges, we developed PheCAP, a high-throughput semi-supervised phenotyping pipeline. PheCAP begins with data from the EMR, including structured data and information extracted from the narrative notes using natural language processing (NLP). The standardized steps integrate automated procedures, which reduce the level of manual input, and machine learning approaches for algorithm training. PheCAP itself can be executed in 1-2 d if all data are available; however, the timing is largely dependent on the chart review stage, which typically requires at least 2 weeks. The final products of PheCAP include a phenotype algorithm, the probability of the phenotype for all patients, and a phenotype classification (yes or no).


Subject(s)
Data Analysis, Electronic Health Records/statistics & numerical data, High-Throughput Screening Assays/methods, Algorithms, Statistical Data Interpretation, Humans, Machine Learning, Natural Language Processing, Phenotype
17.
J Am Med Inform Assoc ; 26(11): 1255-1262, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31613361

ABSTRACT

OBJECTIVE: Electronic health records linked with biorepositories are a powerful platform for translational studies. A major bottleneck exists in the ability to phenotype patients accurately and efficiently. The objective of this study was to develop an automated high-throughput phenotyping method integrating International Classification of Diseases (ICD) codes and narrative data extracted using natural language processing (NLP). MATERIALS AND METHODS: We developed a mapping method for automatically identifying relevant ICD and NLP concepts for a specific phenotype leveraging the Unified Medical Language System. Along with health care utilization, aggregated ICD and NLP counts were jointly analyzed by fitting an ensemble of latent mixture models. The multimodal automated phenotyping (MAP) algorithm yields a predicted probability of phenotype for each patient and a threshold for classifying participants with phenotype yes/no. The algorithm was validated using labeled data for 16 phenotypes from a biorepository and further tested in an independent cohort phenome-wide association studies (PheWAS) for 2 single nucleotide polymorphisms with known associations. RESULTS: The MAP algorithm achieved higher or similar AUC and F-scores compared to the ICD code across all 16 phenotypes. The features assembled via the automated approach had comparable accuracy to those assembled via manual curation (AUC_MAP 0.943, AUC_manual 0.941). The PheWAS results suggest that the MAP approach detected previously validated associations with higher power when compared to the standard PheWAS method based on ICD codes. CONCLUSION: The MAP approach increased the accuracy of phenotype definition while maintaining scalability, thereby facilitating use in studies requiring large-scale phenotyping, such as PheWAS.


Subject(s)
Algorithms , Electronic Health Records , International Classification of Diseases , Natural Language Processing , Phenotype , Polymorphism, Single Nucleotide , Area Under Curve , Humans , Unified Medical Language System
18.
Lancet Respir Med ; 7(6): 497-508, 2019 Jun.
Article in English | MEDLINE | ID: mdl-30935881

ABSTRACT

BACKGROUND: There is an urgent need for biomarkers to better stratify patients with idiopathic pulmonary fibrosis by risk for lung transplantation allocation who have the same clinical presentation. We aimed to investigate whether a specific immune cell type from patients with idiopathic pulmonary fibrosis could identify those at higher risk of poor outcomes. We then sought to validate our findings using cytometry and electronic health records. METHODS: We first did a discovery analysis with transcriptome data from the Gene Expression Omnibus at the National Center for Biotechnology Information for 120 peripheral blood mononuclear cell (PBMC) samples of patients with idiopathic pulmonary fibrosis. We estimated percentages of 13 immune cell types using statistical deconvolution, and investigated the association of these cell types with transplant-free survival. We validated these results using PBMC samples from patients with idiopathic pulmonary fibrosis in two independent cohorts (COMET and Yale). COMET profiled monocyte counts in 45 patients with idiopathic pulmonary fibrosis from March 12, 2010, to March 10, 2011, using flow cytometry; we tested if increased monocyte count was associated with the primary outcome of disease progression. In the Yale cohort, 15 patients with idiopathic pulmonary fibrosis (with five healthy controls) were classed as high risk or low risk from April 28, 2014, to Aug 20, 2015, using a 52-gene signature, and we assessed whether monocyte percentage (measured by cytometry by time of flight) was higher in high-risk patients. 
We then examined complete blood count values in the electronic health records (EHR) of 45 068 patients with idiopathic pulmonary fibrosis, systemic sclerosis, hypertrophic cardiomyopathy, or myelofibrosis from Stanford (Jan 01, 2008, to Dec 31, 2015), Northwestern (Feb 15, 2001, to July 31, 2017), Vanderbilt (Jan 01, 2008, to Dec 31, 2016), and Optum Clinformatics DataMart (Jan 01, 2004, to Dec 31, 2016) cohorts, and examined whether absolute monocyte counts of 0·95 K/µL or greater were associated with all-cause mortality in these patients. FINDINGS: In the discovery analysis, estimated CD14+ classical monocyte percentages above the mean were associated with shorter transplant-free survival times (hazard ratio [HR] 1·82, 95% CI 1·05-3·14), whereas higher percentages of T cells and B cells were not (0·97, 0·59-1·66; and 0·78, 0·45-1·34, respectively). In two validation cohorts (COMET trial and the Yale cohort), patients with higher monocyte counts were at higher risk for poor outcomes (COMET Wilcoxon p=0·025; Yale Wilcoxon p=0·049). Monocyte counts of 0·95 K/µL or greater were associated with mortality after adjusting for forced vital capacity (HR 2·47, 95% CI 1·48-4·15; p=0·0063), and the gender, age, and physiology index (HR 2·06, 95% CI 1·22-3·47; p=0·0068) across the COMET, Stanford, and Northwestern datasets. Analysis of medical records of 7459 patients with idiopathic pulmonary fibrosis showed that patients with monocyte counts of 0·95 K/µL or greater were at increased risk of mortality with lung transplantation as a censoring event, after adjusting for age at diagnosis and sex (Stanford HR=2·30, 95% CI 0·94-5·63; Vanderbilt 1·52, 1·21-1·89; Optum 1·74, 1·33-2·27). Likewise, higher absolute monocyte count was associated with shortened survival in patients with hypertrophic cardiomyopathy across all three cohorts, and in patients with systemic sclerosis or myelofibrosis in two of the three cohorts. 
INTERPRETATION: Monocyte count could be incorporated into the clinical assessment of patients with idiopathic pulmonary fibrosis and other fibrotic disorders. Further investigation into the mechanistic role of monocytes in fibrosis might lead to insights that assist the development of new therapies. FUNDING: Bill & Melinda Gates Foundation, US National Institute of Allergy and Infectious Diseases, and US National Library of Medicine.


Subject(s)
Idiopathic Pulmonary Fibrosis/blood , Leukocyte Count/statistics & numerical data , Leukocytes, Mononuclear , Risk Assessment/methods , Adult , Biomarkers/blood , Female , Humans , Idiopathic Pulmonary Fibrosis/diagnosis , Idiopathic Pulmonary Fibrosis/surgery , Lung Transplantation , Male , Middle Aged , Patient Selection , Predictive Value of Tests , Proportional Hazards Models , Retrospective Studies
19.
Stat Appl Genet Mol Biol ; 18(2), 2019 Feb 13.
Article in English | MEDLINE | ID: mdl-30759070

ABSTRACT

Longitudinal genomics data and survival outcomes are common in biomedical studies, where the genomics data are often of high dimension. It is of great interest to select informative longitudinal biomarkers (e.g. genes) related to the survival outcome. In this paper, we develop a computationally efficient tool, LCox, for selecting informative biomarkers related to the survival outcome using the longitudinal genomics data. LCox is powerful in detecting different forms of dependence between the longitudinal biomarkers and the survival outcome. We show that LCox has improved performance compared to existing methods through extensive simulation studies. In addition, by applying LCox to a dataset of patients with idiopathic pulmonary fibrosis, we are able to identify biologically meaningful genes while all other methods fail to make any discovery. An R package to perform LCox is freely available at https://CRAN.R-project.org/package=LCox.


Subject(s)
Computational Biology/statistics & numerical data , Genomics/statistics & numerical data , Software , Survival Analysis , Algorithms , Gene Expression Profiling/statistics & numerical data , Humans
20.
Biometrics ; 75(1): 69-77, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30178494

ABSTRACT

Although many modeling approaches have been developed to jointly analyze longitudinal biomarkers and a time-to-event outcome, most of these methods can only handle one or a few biomarkers. In this article, we propose a novel joint latent class model to deal with high-dimensional longitudinal biomarkers. Our model has three components: a class membership model, a survival submodel, and a longitudinal submodel. In our model, we assume that covariates can potentially affect biomarkers and class membership. We adopt a penalized likelihood approach to infer which covariates have random effects and/or fixed effects on biomarkers, and which covariates are informative for the latent classes. Through extensive simulation studies, we show that our proposed method has improved performance in prediction and in assigning subjects to the correct classes over other joint modeling methods, and that the bootstrap can be used for inference in our model. We then apply our method to a dataset of patients with idiopathic pulmonary fibrosis, for whom gene expression profiles were measured longitudinally. We are able to identify four interesting latent classes, with one class being at much higher risk of death compared to the other classes. We also find that each of the latent classes has unique trajectories in some genes, yielding novel biological insights.


Subject(s)
Latent Class Analysis , Likelihood Functions , Longitudinal Studies , Biomarkers/analysis , Computer Simulation , Gene Expression Profiling , Humans , Idiopathic Pulmonary Fibrosis/drug therapy , Idiopathic Pulmonary Fibrosis/genetics , Survival Analysis , Time Factors , Treatment Outcome