1.
PLoS Genet ; 19(12): e1010907, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38113267

ABSTRACT

OBJECTIVE: To overcome the limitations associated with the collection and curation of COVID-19 outcome data in biobanks, this study proposes the use of polygenic risk scores (PRS) as reliable proxies of COVID-19 severity across three large biobanks: the Michigan Genomics Initiative (MGI), UK Biobank (UKB), and NIH All of Us. The goal is to identify associations between pre-existing conditions and COVID-19 severity. METHODS: Drawing on a sample of more than 500,000 individuals from the three biobanks, we conducted a phenome-wide association study (PheWAS) to identify associations between a PRS for COVID-19 severity, derived from a genome-wide association study on COVID-19 hospitalization, and clinical pre-existing, pre-pandemic phenotypes. We performed cohort-specific PRS PheWAS and a subsequent fixed-effects meta-analysis. RESULTS: The current study uncovered 23 pre-existing conditions significantly associated with the COVID-19 severity PRS in cohort-specific analyses, of which 21 were observed in the UKB cohort and two in the MGI cohort. The meta-analysis yielded 27 significant phenotypes predominantly related to obesity, metabolic disorders, and cardiovascular conditions. After adjusting for body mass index, several clinical phenotypes, such as hypercholesterolemia and gastrointestinal disorders, remained associated with an increased risk of hospitalization following COVID-19 infection. CONCLUSION: By employing PRS as a proxy for COVID-19 severity, we corroborated known risk factors and identified novel associations between pre-existing clinical phenotypes and COVID-19 severity. Our study highlights the potential value of using PRS when actual outcome data may be limited or inadequate for robust analyses.


Subject(s)
COVID-19 , Population Health , Humans , Genome-Wide Association Study , Genetic Risk Score , COVID-19/genetics , Biological Specimen Banks , Preexisting Condition Coverage , Risk Factors , Genetic Predisposition to Disease
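The abstract above mentions a fixed-effects meta-analysis of cohort-specific PRS PheWAS results. As a rough illustration only, the sketch below pools per-cohort estimates with inverse-variance weights, which is the usual fixed-effects approach; the function name and the input numbers are hypothetical and do not come from the paper.

```python
import math

def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects pooling of per-cohort estimates."""
    weights = [1.0 / se ** 2 for se in ses]          # weight = 1 / variance
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled_beta, pooled_se

# Hypothetical log-odds ratios and standard errors for one phenotype in three cohorts
betas = [0.10, 0.14, 0.08]
ses = [0.05, 0.02, 0.04]
pooled_beta, pooled_se = fixed_effects_meta(betas, ses)
z_score = pooled_beta / pooled_se
print(f"pooled beta={pooled_beta:.3f}, se={pooled_se:.3f}, z={z_score:.2f}")
```

Cohorts with smaller standard errors dominate the pooled estimate, which is why a very large cohort can drive most of the meta-analytic signal.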
2.
PLoS Genet ; 17(9): e1009670, 2021 09.
Article in English | MEDLINE | ID: mdl-34529658

ABSTRACT

Polygenic risk scores (PRS) can provide useful information for personalized risk stratification and disease risk assessment, especially when combined with non-genetic risk factors. However, their construction depends on the availability of summary statistics from genome-wide association studies (GWAS) independent from the target sample. For best compatibility, it was reported that GWAS and the target sample should match in terms of ancestries. Yet, GWAS, especially in the field of cancer, often lack diversity and are predominated by European ancestry. This bias is a limiting factor in PRS research. By using electronic health records and genetic data from the UK Biobank, we contrast the utility of breast and prostate cancer PRS derived from external European-ancestry-based GWAS across African, East Asian, European, and South Asian ancestry groups. We highlight differences in the PRS distributions of these groups that are amplified when PRS methods condense hundreds of thousands of variants into a single score. While European-GWAS-derived PRS were not directly transferable across ancestries on an absolute scale, we establish their predictive potential when considering them separately within each group. For example, the top 10% of the breast cancer PRS distributions within each ancestry group each revealed significant enrichments of breast cancer cases compared to the bottom 90% (odds ratio of 2.81 [95% CI: 2.69, 2.93] in European, 2.88 [1.85, 4.48] in African, 2.60 [1.25, 5.40] in East Asian, and 2.33 [1.55, 3.51] in South Asian individuals). Our findings highlight a compromise solution for PRS research to compensate for the lack of diversity in well-powered European GWAS efforts while recruitment of diverse participants in the field catches up.


Subject(s)
Breast Neoplasms/genetics , Genetic Predisposition to Disease , Multifactorial Inheritance , Female , Genome-Wide Association Study , Humans
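The abstract above reports odds ratios with 95% confidence intervals for the top 10% versus the bottom 90% of the PRS distribution. The sketch below shows how an unadjusted odds ratio and a Wald interval could be computed from a 2x2 table. It is an assumption that a simple 2x2 calculation is adequate here; the published estimates may come from adjusted regression models, and the counts used are hypothetical.

```python
import math

def odds_ratio_ci(cases_top, controls_top, cases_rest, controls_rest, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval on the log scale."""
    odds_ratio = (cases_top * controls_rest) / (controls_top * cases_rest)
    se_log = math.sqrt(
        1 / cases_top + 1 / controls_top + 1 / cases_rest + 1 / controls_rest
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts: top 10% of the PRS versus the remaining 90%
estimate, lower, upper = odds_ratio_ci(20, 80, 10, 90)
print(f"OR={estimate:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

Small groups produce wide intervals, which is consistent with the wider intervals the abstract reports for the smaller ancestry groups.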
3.
Am J Hum Genet ; 107(5): 815-836, 2020 11 05.
Article in English | MEDLINE | ID: mdl-32991828

ABSTRACT

To facilitate scientific collaboration on polygenic risk scores (PRSs) research, we created an extensive PRS online repository for 35 common cancer traits integrating freely available genome-wide association studies (GWASs) summary statistics from three sources: published GWASs, the NHGRI-EBI GWAS Catalog, and UK Biobank-based GWASs. Our framework condenses these summary statistics into PRSs using various approaches such as linkage disequilibrium pruning/p value thresholding (fixed or data-adaptively optimized thresholds) and penalized, genome-wide effect size weighting. We evaluated the PRSs in two biobanks: the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort at Michigan Medicine, and the population-based UK Biobank (UKB). For each PRS construct, we provide measures on predictive performance and discrimination. Besides PRS evaluation, the Cancer-PRSweb platform features construct downloads and phenome-wide PRS association study results (PRS-PheWAS) for predictive PRSs. We expect this integrated platform to accelerate PRS-related cancer research.


Subject(s)
Biological Specimen Banks/statistics & numerical data , Genetic Predisposition to Disease , Genome, Human , Genomics/methods , Multifactorial Inheritance , Neoplasms/genetics , Adult , Aged , Female , Genome-Wide Association Study , Humans , Internet , Linkage Disequilibrium , Male , Middle Aged , Neoplasms/classification , Neoplasms/diagnosis , Neoplasms/epidemiology , Phenotype , Quantitative Trait, Heritable , Risk Factors , United Kingdom/epidemiology , United States/epidemiology
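The abstract above describes condensing GWAS summary statistics into a PRS using p value thresholding. A minimal sketch of that idea follows, assuming linkage disequilibrium pruning has already been applied upstream; the variant identifiers, weights, and thresholds are hypothetical and this is not the Cancer-PRSweb implementation.

```python
def polygenic_risk_score(dosages, weights, p_values, p_threshold=5e-8):
    """Sum of effect-size-weighted allele dosages over variants passing the threshold.

    dosages:  variant id -> risk-allele dosage for one person (0 to 2)
    weights:  variant id -> GWAS effect size (for example, a log odds ratio)
    p_values: variant id -> GWAS p value
    """
    return sum(
        weights[variant] * dosages.get(variant, 0.0)
        for variant in weights
        if p_values[variant] <= p_threshold
    )

# Hypothetical summary statistics and one person's genotype dosages
weights = {"variant_a": 0.2, "variant_b": -0.1, "variant_c": 0.5}
p_values = {"variant_a": 1e-9, "variant_b": 1e-3, "variant_c": 1e-10}
dosages = {"variant_a": 2, "variant_b": 1, "variant_c": 1}

strict = polygenic_risk_score(dosages, weights, p_values)         # genome-wide significant only
lenient = polygenic_risk_score(dosages, weights, p_values, 1.0)   # all variants
print(strict, lenient)
```

A data-adaptive variant of this approach would evaluate several values of `p_threshold` and keep the one that predicts best in a tuning sample.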
4.
PLoS Genet ; 15(6): e1008202, 2019 06.
Article in English | MEDLINE | ID: mdl-31194742

ABSTRACT

Polygenic risk scores (PRS) are designed to serve as single summary measures that are easy to construct, condensing information from a large number of genetic variants associated with a disease. They have been used for stratification and prediction of disease risk. The primary focus of this paper is to demonstrate how we can combine PRS and electronic health records data to better understand the shared and unique genetic architecture and etiology of disease subtypes that may be both related and heterogeneous. PRS construction strategies often depend on the purpose of the study, the available data/summary estimates, and the underlying genetic architecture of a disease. We consider several choices for constructing a PRS using data obtained from various publicly-available sources including the UK Biobank and evaluate their abilities to predict not just the primary phenotype but also secondary phenotypes derived from electronic health records (EHR). This study was conducted using data from 30,702 unrelated, genotyped patients of recent European descent from the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort within Michigan Medicine. We examine the three most common skin cancer subtypes in the USA: basal cell carcinoma, cutaneous squamous cell carcinoma, and melanoma. Using these PRS for various skin cancer subtypes, we conduct a phenome-wide association study (PheWAS) within the MGI data to evaluate PRS associations with secondary traits. PheWAS results are then replicated using population-based UK Biobank data and compared across various PRS construction methods. We develop an accompanying visual catalog called PRSweb that provides detailed PheWAS results and allows users to directly compare different PRS construction methods.


Subject(s)
Genetic Predisposition to Disease , Genomics , Multifactorial Inheritance/genetics , Skin Neoplasms/genetics , Biological Specimen Banks , Electronic Health Records , Genome-Wide Association Study , Genotype , Humans , Michigan/epidemiology , Phenotype , Polymorphism, Single Nucleotide/genetics , Risk Factors , Skin Neoplasms/pathology , United Kingdom/epidemiology
5.
BMC Infect Dis ; 21(1): 533, 2021 Jun 07.
Article in English | MEDLINE | ID: mdl-34098885

ABSTRACT

BACKGROUND: Many popular disease transmission models have helped nations respond to the COVID-19 pandemic by informing decisions about pandemic planning, resource allocation, implementation of social distancing measures, lockdowns, and other non-pharmaceutical interventions. We study how five epidemiological models forecast and assess the course of the pandemic in India: a baseline curve-fitting model, an extended SIR (eSIR) model, two extended SEIR (SAPHIRE and SEIR-fansy) models, and a semi-mechanistic Bayesian hierarchical model (ICM). METHODS: Using COVID-19 case-recovery-death count data reported in India from March 15 to October 15 to train the models, we generate predictions from each of the five models from October 16 to December 31. To compare prediction accuracy with respect to reported cumulative and active case counts and reported cumulative death counts, we compute the symmetric mean absolute prediction error (SMAPE) for each of the five models. For reported cumulative cases and deaths, we compute Pearson's and Lin's correlation coefficients to investigate how well the projected and observed reported counts agree. We also present underreporting factors when available, and comment on uncertainty of projections from each model. RESULTS: For active case counts, SMAPE values are 35.14% (SEIR-fansy) and 37.96% (eSIR). For cumulative case counts, SMAPE values are 6.89% (baseline), 6.59% (eSIR), 2.25% (SAPHIRE) and 2.29% (SEIR-fansy). For cumulative death counts, the SMAPE values are 4.74% (SEIR-fansy), 8.94% (eSIR) and 0.77% (ICM). Three models (SAPHIRE, SEIR-fansy and ICM) return total (sum of reported and unreported) cumulative case counts as well. We compute underreporting factors as of October 31 and note that for cumulative cases, the SEIR-fansy model yields an underreporting factor of 7.25 and ICM model yields 4.54 for the same quantity. 
For total (sum of reported and unreported) cumulative deaths, the SEIR-fansy model reports an underreporting factor of 2.97. On October 31, we observe 8.18 million cumulative reported cases, while the projections (in millions) are 8.71 (95% credible interval: 8.63-8.80) from the baseline model, 8.35 (7.19-9.60) from eSIR, 8.17 (7.90-8.52) from SAPHIRE, and 8.51 (8.18-8.85) from SEIR-fansy. Cumulative case projections from the eSIR model have the highest uncertainty in terms of width of 95% credible intervals, followed by those from SAPHIRE, the baseline model and finally SEIR-fansy. CONCLUSIONS: In this comparative paper, we describe five different models used to study the transmission dynamics of the SARS-CoV-2 virus in India. While simulation studies are the only gold standard way to compare the accuracy of the models, here we were uniquely poised to compare the projected case-counts against observed data on a test period. The largest variability across models is observed in predicting the "total" number of infections including reported and unreported cases (on which we have no validation data). The degree of under-reporting has been a major concern in India and is characterized in this report. Overall, the SEIR-fansy model appeared to be a good choice, with a publicly available R package and the desired flexibility and accuracy.


Subject(s)
COVID-19/epidemiology , COVID-19/transmission , Pandemics , Bayes Theorem , Communicable Disease Control/methods , Computer Simulation , Forecasting , Humans , India/epidemiology , Models, Statistical
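The abstract above compares models by their symmetric mean absolute prediction error (SMAPE). The abstract does not spell out the formula, so the sketch below uses one common definition (absolute error divided by the mean of the absolute observed and predicted values, averaged over days and expressed as a percentage); the counts are hypothetical.

```python
def smape(observed, predicted):
    """Symmetric mean absolute prediction error, as a percentage."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    total = 0.0
    for obs, pred in zip(observed, predicted):
        denominator = (abs(obs) + abs(pred)) / 2
        if denominator > 0:  # a day with both values equal to zero contributes no error
            total += abs(pred - obs) / denominator
    return 100.0 * total / len(observed)

# Hypothetical daily cumulative case counts (in millions) over a short test period
observed = [8.0, 8.1, 8.2]
predicted = [8.2, 8.4, 8.5]
print(f"SMAPE = {smape(observed, predicted):.2f}%")
```

Because the denominator averages observed and predicted values, the metric stays bounded (0% to 200%) and treats over- and under-prediction more evenly than a plain percentage error.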
6.
J Biomed Inform ; 113: 103652, 2021 01.
Article in English | MEDLINE | ID: mdl-33279681

ABSTRACT

BACKGROUND: Traditional methods for disease risk prediction and assessment, such as diagnostic tests using serum, urine, blood, saliva or imaging biomarkers, have been important for identifying high-risk individuals for many diseases, leading to early detection and improved survival. For pancreatic cancer, traditional methods for screening have been largely unsuccessful in identifying high-risk individuals in advance of disease progression leading to high mortality and poor survival. Electronic health records (EHR) linked to genetic profiles provide an opportunity to integrate multiple sources of patient information for risk prediction and stratification. We leverage a constellation of temporally associated diagnoses available in the EHR to construct a summary risk score, called a phenotype risk score (PheRS), for identifying individuals at high-risk for having pancreatic cancer. The proposed PheRS approach incorporates the time with respect to disease onset into the prediction framework. We combine and contrast the PheRS with more well-known measures of inherited susceptibility, namely, the polygenic risk scores (PRS) for prediction of pancreatic cancer. METHODOLOGY: We first calculated pairwise, unadjusted associations between pancreatic cancer diagnosis and all possible other diagnoses across the medical phenome. We call these pairwise associations co-occurrences. After accounting for cross-phenotype correlations, the multivariable association estimates from a subset of relatively independent diagnoses were used to create a weighted sum PheRS. We constructed time-restricted risk scores using data from 38,359 participants in the Michigan Genomics Initiative (MGI) based on the diagnoses contained in the EHR at 0, 1, 2, and 5 years prior to the target pancreatic cancer diagnosis. The PheRS was assessed for predictability in the UK Biobank (UKB). 
We tested the relative contribution of PheRS when added to a model containing a summary measure of inherited genetic susceptibility (PRS) plus other covariates like age, sex, smoking status, drinking status, and body mass index (BMI). RESULTS: Our exploration of co-occurrence patterns identified expected associations while also revealing unexpected relationships that may warrant closer attention. Solely using the pancreatic cancer PheRS at 5 years before the target diagnoses yielded an AUC of 0.60 (95% CI = [0.58, 0.62]) in UKB. A larger predictive model including PheRS, PRS, and the covariates at the 5-year threshold achieved an AUC of 0.74 (95% CI = [0.72, 0.76]) in UKB. We note that PheRS does contribute independently in the joint model. Finally, scores at the top percentiles of the PheRS distribution demonstrated promise in terms of risk stratification. Scores in the top 2% were 10.20 (95% CI = [9.34, 12.99]) times more likely to identify cases than those in the bottom 98% in UKB at the 5-year threshold prior to pancreatic cancer diagnosis. CONCLUSIONS: We developed a framework for creating a time-restricted PheRS from EHR data for pancreatic cancer using the rich information content of a medical phenome. In addition to identifying hypothesis-generating associations for future research, this PheRS demonstrates a potentially important contribution in identifying high-risk individuals, even after adjusting for PRS for pancreatic cancer and other traditional epidemiologic covariates. The methods are generalizable to other phenotypic traits.


Subject(s)
Electronic Health Records , Pancreatic Neoplasms , Biological Specimen Banks , Genome-Wide Association Study , Humans , Michigan , Pancreatic Neoplasms/genetics , Phenotype , Risk Factors
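The abstract above describes the PheRS as a weighted sum over diagnoses and evaluates it by AUC. The sketch below illustrates both pieces under simplifying assumptions: the diagnosis codes and weights are hypothetical, and the AUC is computed as the probability that a randomly chosen case scores higher than a randomly chosen control (ties count one half).

```python
def phenotype_risk_score(diagnoses, weights):
    """Weighted sum over the diagnoses present in a person's record."""
    return sum(weight for code, weight in weights.items() if code in diagnoses)

def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise case-control comparisons."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical association weights for three diagnosis codes
weights = {"dx_a": 0.8, "dx_b": 0.5, "dx_c": 0.3}
cases = [phenotype_risk_score(d, weights) for d in ({"dx_a", "dx_c"}, {"dx_a", "dx_b"})]
controls = [phenotype_risk_score(d, weights) for d in (set(), {"dx_c"}, {"dx_b"})]
print(f"AUC = {auc(cases, controls):.2f}")
```

A time-restricted score, as in the abstract, would build `diagnoses` only from codes recorded at least a chosen number of years before the target diagnosis.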
7.
Stat Med ; 39(11): 1675-1694, 2020 05 20.
Article in English | MEDLINE | ID: mdl-32101638

ABSTRACT

The statistical practice of modeling interaction with two linear main effects and a product term is ubiquitous in the statistical and epidemiological literature. Most data modelers are aware that the misspecification of main effects can potentially cause severe type I error inflation in tests for interactions, leading to spurious detection of interactions. However, modeling practice has not changed. In this article, we focus on the specific situation where the main effects in the model are misspecified as linear terms and characterize its impact on common tests for statistical interaction. We then propose some simple alternatives that fix the issue of potential type I error inflation in testing interaction due to main effect misspecification. We show that when using the sandwich variance estimator for a linear regression model with a quantitative outcome and two independent factors, both the Wald and score tests asymptotically maintain the correct type I error rate. However, if the independence assumption does not hold or the outcome is binary, using the sandwich estimator does not fix the problem. We further demonstrate that flexibly modeling the main effect under a generalized additive model can largely reduce or often remove bias in the estimates and maintain the correct type I error rate for both quantitative and binary outcomes regardless of the independence assumption. We show, under the independence assumption and for a continuous outcome, overfitting and flexibly modeling the main effects does not lead to power loss asymptotically relative to a correctly specified main effect model. Our simulation study further demonstrates the empirical fact that using flexible models for the main effects does not result in a significant loss of power for testing interaction in general. Our results provide an improved understanding of the strengths and limitations for tests of interaction in the presence of main effect misspecification. 
Using data from a large biobank study "The Michigan Genomics Initiative", we present two examples of interaction analysis in support of our results.


Subject(s)
Data Interpretation, Statistical , Bias , Computer Simulation , Humans , Linear Models , Michigan
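The two modeling strategies contrasted in the abstract above can be written out for a quantitative outcome as follows. The notation is an assumption introduced here for illustration (the abstract gives no symbols): $X_1$ and $X_2$ are the two factors, $f_1$ and $f_2$ are smooth functions fitted under a generalized additive model, and the interaction test concerns $\beta_3$ in both cases.

```latex
\begin{align}
\text{Linear main effects:}\quad
  & Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon \\
\text{Flexible main effects:}\quad
  & Y = \beta_0 + f_1(X_1) + f_2(X_2) + \beta_3 X_1 X_2 + \varepsilon \\
\text{Null hypothesis of no interaction:}\quad
  & H_0 : \beta_3 = 0
\end{align}
```

If the true main effects are nonlinear, the first model can push the unexplained curvature into the product term, which is the source of the type I error inflation the abstract describes.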
8.
Stat Med ; 39(6): 773-800, 2020 03 15.
Article in English | MEDLINE | ID: mdl-31859414

ABSTRACT

Biobanks linked to electronic health records provide rich resources for health-related research. With improvements in administrative and informatics infrastructure, the availability and utility of data from biobanks have dramatically increased. In this paper, we first aim to characterize the current landscape of available biobanks and to describe specific biobanks, including their place of origin, size, and data types. The development and accessibility of large-scale biorepositories provide the opportunity to accelerate agnostic searches, expedite discoveries, and conduct hypothesis-generating studies of disease-treatment, disease-exposure, and disease-gene associations. Rather than designing and implementing a single study focused on a few targeted hypotheses, researchers can potentially use biobanks' existing resources to answer an expanded selection of exploratory questions as quickly as they can analyze them. However, there are many obvious and subtle challenges with the design and analysis of biobank-based studies. Our second aim is to discuss statistical issues related to biobank research such as study design, sampling strategy, phenotype identification, and missing data. We focus our discussion on biobanks that are linked to electronic health records. Some of the analytic issues are illustrated using data from the Michigan Genomics Initiative and UK Biobank, two biobanks with two different recruitment mechanisms. We summarize the current body of literature for addressing these challenges and discuss some standing open problems. This work complements and extends recent reviews about biobank-based research and serves as a resource catalog with analytical and practical guidance for statisticians, epidemiologists, and other medical researchers pursuing research using biobanks.


Subject(s)
Biological Specimen Banks , Electronic Health Records , Genomics , Michigan , Research Design
9.
Int J Cancer ; 144(9): 2192-2205, 2019 05 01.
Article in English | MEDLINE | ID: mdl-30499236

ABSTRACT

As a follow-up to genome-wide association analysis of common variants associated with ovarian carcinoma (cancer), our study considers seven well-known ovarian cancer risk factors and their interactions with 28 genome-wide significant common genetic variants. The interaction analyses were based on data from 9971 ovarian cancer cases and 15,566 controls from 17 case-control studies. Likelihood ratio and Wald tests for multiplicative interaction and for relative excess risk due to additive interaction were used. The top multiplicative interaction was noted between oral contraceptive pill (OCP) use (ever vs. never) and rs13255292 (p value = 3.48 × 10^-4). Among women with the TT genotype for this variant, the odds ratio for OCP use was 0.53 (95% CI = 0.46-0.60) compared to 0.71 (95% CI = 0.66-0.77) for women with the CC genotype. When stratified by duration of OCP use, women with 1-5 years of OCP use exhibited differential protective benefit across genotypes. However, no interaction on either the multiplicative or additive scale was found to be statistically significant after multiple testing correction. The results suggest that OCP use may offer increased benefit for women who are carriers of the T allele in rs13255292. On the other hand, for women carrying the C allele in this variant, longer (5+ years) use of OCP may reduce the impact of carrying the risk allele of this SNP. Replication of this finding is needed. The study presents a comprehensive analytic framework for conducting gene-environment analysis in ovarian cancer.


Subject(s)
Environmental Exposure/adverse effects , Gene-Environment Interaction , Genetic Predisposition to Disease/genetics , Ovarian Neoplasms/etiology , Ovarian Neoplasms/genetics , Case-Control Studies , Contraceptives, Oral, Hormonal , Environment , Female , Genome-Wide Association Study/methods , Genotype , Humans , Polymorphism, Single Nucleotide/genetics , Risk
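The abstract above tests interaction on both the multiplicative scale and the additive scale (relative excess risk due to interaction, RERI). The sketch below shows the standard point estimates of the two quantities from three odds ratios that share a common reference group (unexposed non-carriers). The input values are hypothetical, and the likelihood ratio and Wald tests used in the study are not reproduced here.

```python
def reri(or_exposure_only, or_genotype_only, or_both):
    """Relative excess risk due to interaction; 0 means no additive interaction."""
    return or_both - or_exposure_only - or_genotype_only + 1.0

def multiplicative_interaction(or_exposure_only, or_genotype_only, or_both):
    """Ratio of the joint odds ratio to the product of the single-factor odds ratios;
    1 means no multiplicative interaction."""
    return or_both / (or_exposure_only * or_genotype_only)

# Hypothetical odds ratios relative to unexposed non-carriers
or_exposure_only, or_genotype_only, or_both = 1.5, 2.0, 4.0
print(reri(or_exposure_only, or_genotype_only, or_both))
print(multiplicative_interaction(or_exposure_only, or_genotype_only, or_both))
```

The two scales can disagree: a set of odds ratios may show no interaction on one scale and a clear departure on the other, which is why studies often report both.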
10.
Cancer Causes Control ; 30(12): 1377-1388, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31606852

ABSTRACT

PURPOSE: Liver cancer incidence continues to increase while incidence of most other cancers is decreasing. We analyze recent and long-term trends of US liver cancer incidence by race/ethnicity and sex to best understand where to focus preventive efforts. METHODS: Liver cancer incidence rates from 1992 to 2016 were obtained from the Surveillance, Epidemiology, and End Results registry. Delay-adjusted age-standardized incidence trends by race/ethnicity and sex were analyzed using joinpoint regression. Age-specific incidence was analyzed using age-period-cohort models. Hepatitis C seroprevalence by cohort was calculated using National Health and Nutrition Examination Survey data. RESULTS: Liver cancer incidence has peaked in males and Asian or Pacific Islanders. Hispanic males, a high-incidence population, are experiencing a decrease in incidence, although not yet statistically significant. In contrast, incidence continues to increase in females, although at lower rates than in the 1990s, and American Indian/Alaska Natives (AI/ANs). Liver cancer incidence continues to be higher in males. Non-Hispanic Whites have the lowest incidence among racial/ethnic groups. Trends largely reflect differences in incidence by birth-cohort, which increased considerably, particularly in males, for those born around the 1950s, and continues to increase in females and AI/ANs. The patterns in males are likely driven by cohort variations in Hepatitis C infection. CONCLUSIONS: Liver cancer incidence appears to have peaked among males. However, important differences in liver cancer trends by race/ethnicity and sex remain, highlighting the need for monitoring trends across different groups. Preventive interventions should focus on existing liver cancer disparities, targeting AI/ANs, females, and high-incidence groups.


Subject(s)
Ethnicity/statistics & numerical data , Liver Neoplasms/epidemiology , Racial Groups/statistics & numerical data , Adult , Aged , Aged, 80 and over , Female , Humans , Incidence , Male , Middle Aged , Nutrition Surveys , Registries/statistics & numerical data , Seroepidemiologic Studies , United States/epidemiology , Young Adult
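The abstract above analyzes age-standardized incidence rates. As a hedged illustration, the sketch below applies direct standardization, in which age-specific rates are averaged using the age distribution of a standard population. The age groups, counts, and standard population are hypothetical, and the delay adjustment and joinpoint regression mentioned in the abstract are not shown.

```python
def age_standardized_rate(cases, person_years, standard_population, per=100_000):
    """Directly age-standardized rate: age-specific rates weighted by a standard population."""
    total_standard = sum(standard_population)
    rate = sum(
        (group_cases / group_py) * (group_standard / total_standard)
        for group_cases, group_py, group_standard
        in zip(cases, person_years, standard_population)
    )
    return rate * per

# Hypothetical data for three age groups
cases = [5, 40, 120]
person_years = [200_000, 150_000, 100_000]
standard_population = [400, 350, 250]  # relative sizes of the age groups in the standard
print(f"{age_standardized_rate(cases, person_years, standard_population):.1f} per 100,000")
```

Standardizing to a common age distribution keeps differences in age structure between groups or calendar periods from being mistaken for differences in incidence.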
12.
J Am Med Inform Assoc ; 31(7): 1479-1492, 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38742457

ABSTRACT

OBJECTIVES: To develop recommendations regarding the use of weights to reduce selection bias for commonly performed analyses using electronic health record (EHR)-linked biobank data. MATERIALS AND METHODS: We mapped diagnosis (ICD code) data to standardized phecodes from 3 EHR-linked biobanks with varying recruitment strategies: All of Us (AOU; n = 244 071), Michigan Genomics Initiative (MGI; n = 81 243), and UK Biobank (UKB; n = 401 167). Using 2019 National Health Interview Survey data, we constructed selection weights for AOU and MGI to be more representative of the US adult population. We used weights previously developed for UKB to represent the UKB-eligible population. We conducted 4 common analyses comparing unweighted and weighted results. RESULTS: For AOU and MGI, estimated phecode prevalences decreased after weighting (weighted-unweighted median phecode prevalence ratio [MPR]: 0.82 and 0.61), while UKB estimates increased (MPR: 1.06). Weighting minimally impacted latent phenome dimensionality estimation. Comparing weighted versus unweighted phenome-wide association study for colorectal cancer, the strongest associations remained unaltered, with considerable overlap in significant hits. Weighting affected the estimated log-odds ratio for sex and colorectal cancer to align more closely with national registry-based estimates. DISCUSSION: Weighting had a limited impact on dimensionality estimation and large-scale hypothesis testing but impacted prevalence and association estimation. When interested in estimating effect size, specific signals from untargeted association analyses should be followed up by weighted analysis. CONCLUSION: EHR-linked biobanks should report recruitment and selection mechanisms and provide selection weights with defined target populations. Researchers should consider their intended estimands, specify source and target populations, and weight EHR-linked biobank analyses accordingly.


Subject(s)
Biological Specimen Banks , Electronic Health Records , Humans , Selection Bias , Female , Male , Adult , Middle Aged , Medical Record Linkage , United States , Aged , United Kingdom , Michigan
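The abstract above summarizes the effect of selection weights with a weighted-to-unweighted median phecode prevalence ratio (MPR). The sketch below shows how such a summary could be computed, assuming each participant has a selection weight and a 0/1 indicator per phecode. The data and phecode labels are hypothetical, and how the weights themselves are estimated is outside this example.

```python
from statistics import median

def prevalence(indicators, weights=None):
    """Prevalence of a phecode; with weights, each person counts in proportion to their weight."""
    if weights is None:
        weights = [1.0] * len(indicators)
    return sum(w for has_code, w in zip(indicators, weights) if has_code) / sum(weights)

def median_prevalence_ratio(phecode_indicators, weights):
    """Median over phecodes of weighted prevalence divided by unweighted prevalence."""
    ratios = []
    for indicators in phecode_indicators.values():
        unweighted = prevalence(indicators)
        if unweighted > 0:  # skip phecodes nobody in the sample has
            ratios.append(prevalence(indicators, weights) / unweighted)
    return median(ratios)

# Hypothetical cohort of four people with selection weights and two phecodes
weights = [1.0, 3.0, 1.0, 3.0]
phecode_indicators = {"phecode_a": [1, 0, 1, 0], "phecode_b": [0, 1, 0, 1]}
print(median_prevalence_ratio(phecode_indicators, weights))
```

An MPR below 1, as reported for AOU and MGI, indicates that the sample over-represents people with recorded diagnoses relative to the target population.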
13.
medRxiv ; 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38405832

ABSTRACT

Objective: To explore the role of selection bias adjustment by weighting electronic health record (EHR)-linked biobank data for commonly performed analyses. Materials and methods: We mapped diagnosis (ICD code) data to standardized phecodes from three EHR-linked biobanks with varying recruitment strategies: All of Us (AOU; n=244,071), Michigan Genomics Initiative (MGI; n=81,243), and UK Biobank (UKB; n=401,167). Using 2019 National Health Interview Survey data, we constructed selection weights for AOU and MGI to be more representative of the US adult population. We used weights previously developed for UKB to represent the UKB-eligible population. We conducted four common descriptive and analytic tasks comparing unweighted and weighted results. Results: For AOU and MGI, estimated phecode prevalences decreased after weighting (weighted-unweighted median phecode prevalence ratio [MPR]: 0.82 and 0.61), while UKB's estimates increased (MPR: 1.06). Weighting minimally impacted latent phenome dimensionality estimation. Comparing weighted versus unweighted PheWAS for colorectal cancer, the strongest associations remained unaltered and there was large overlap in significant hits. Weighting affected the estimated log-odds ratio for sex and colorectal cancer to align more closely with national registry-based estimates. Discussion: Weighting had limited impact on dimensionality estimation and large-scale hypothesis testing but impacted prevalence and association estimation more. Results from untargeted association analyses should be followed by weighted analysis when effect size estimation is of interest for specific signals. Conclusion: EHR-linked biobanks should report recruitment and selection mechanisms and provide selection weights with defined target populations. Researchers should consider their intended estimands, specify source and target populations, and weight EHR-linked biobank analyses accordingly.

14.
medRxiv ; 2023 Jun 28.
Article in English | MEDLINE | ID: mdl-37425863

ABSTRACT

Background: Observational vaccine effectiveness (VE) studies based on real-world data are a crucial supplement to initial randomized clinical trials of Coronavirus Disease 2019 (COVID-19) vaccines. However, there exists substantial heterogeneity in study designs and statistical methods for estimating VE. The impact of such heterogeneity on VE estimates is not clear. Methods: We conducted a two-step literature review of booster VE: a literature search for first or second monovalent boosters on January 1, 2023, and a rapid search for bivalent boosters on March 28, 2023. For each study identified, study design, methods, and VE estimates for infection, hospitalization, and/or death were extracted and summarized via forest plots. We then applied methods identified in the literature to a single dataset from Michigan Medicine (MM), providing a comparison of the impact of different statistical methodologies on the same dataset. Results: We identified 53 studies estimating VE of the first booster, 16 for the second booster. Of these studies, 2 were case-control, 17 were test-negative, and 50 were cohort studies. Together, they included nearly 130 million people worldwide. VE for all outcomes was very high (around 90%) in earlier studies (i.e., in 2021), but became attenuated and more heterogeneous over time (around 40%-50% for infection, 60%-90% for hospitalization, and 50%-90% for death). VE compared to the previous dose was lower for the second booster (10-30% for infection, 30-60% against hospitalization, and 50-90% against death). We also identified 11 bivalent booster studies including over 20 million people. 
Early studies of the bivalent booster showed increased effectiveness compared to the monovalent booster (VE around 50-80% for hospitalization and death). Our primary analysis with MM data using a cohort design included 186,495 individuals overall (including 153,811 boosted and 32,684 with only a primary series vaccination), and a secondary test-negative design included 65,992 individuals tested for SARS-CoV-2. When different statistical designs and methods were applied to MM data, VE estimates for hospitalization and death were robust to analytic choices, with test-negative designs leading to narrower confidence intervals. Adjusting either for the propensity of getting boosted or directly adjusting for covariates reduced the heterogeneity across VE estimates for the infection outcome. Conclusion: While the advantage of the second monovalent booster is not obvious from the literature review, the first monovalent booster and the bivalent booster appear to offer strong protection against severe COVID-19. Based on both the literature review and data analysis, VE analyses with a severe disease outcome (hospitalization, ICU admission, or death) appear to be more robust to design and analytic choices than an infection endpoint. Test-negative designs can extend to severe disease outcomes and may offer advantages in statistical efficiency when used properly.
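The abstract above reports vaccine effectiveness (VE) from cohort and test-negative designs. In a test-negative design, VE is commonly estimated as one minus the odds ratio of vaccination among test-positive versus test-negative individuals. The sketch below shows that unadjusted calculation with hypothetical counts; the analyses described in the abstract also adjust for covariates or propensity scores, which is not shown.

```python
def ve_test_negative(vaccinated_positive, unvaccinated_positive,
                     vaccinated_negative, unvaccinated_negative):
    """Unadjusted vaccine effectiveness (%) from a test-negative 2x2 table."""
    odds_ratio = (vaccinated_positive * unvaccinated_negative) / (
        unvaccinated_positive * vaccinated_negative
    )
    return (1.0 - odds_ratio) * 100.0

# Hypothetical counts of tested individuals by vaccination status and test result
print(f"VE = {ve_test_negative(50, 100, 200, 100):.0f}%")
```

In a cohort design the same `1 - ratio` form applies, with a risk ratio or hazard ratio taking the place of the odds ratio.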

15.
Sci Adv ; 9(51): eadj3747, 2023 Dec 22.
Article in English | MEDLINE | ID: mdl-38117882

ABSTRACT

We investigated the design and analysis of observational booster vaccine effectiveness (VE) studies by performing a scoping review of booster VE literature with a focus on study design and analytic choices. We then applied 20 different approaches, including those found in the literature, to a single dataset from Michigan Medicine. We identified 80 studies in our review, including over 150 million observations in total. We found that while protection against infection is variable and dependent on several factors including the study population and time period, both monovalent boosters and particularly the bivalent booster offer strong protection against severe COVID-19. In addition, VE analyses with a severe disease outcome (hospitalization, intensive care unit admission, or death) appear to be more robust to design and analytic choices than an infection endpoint. In terms of design choices, we found that test-negative designs and their variants may offer advantages in statistical efficiency compared to cohort designs.


Subject(s)
COVID-19, Humans, COVID-19/epidemiology, COVID-19/prevention & control, Hospitalization, Intensive Care Units, Michigan/epidemiology, Observational Studies as Topic
16.
Cancer Epidemiol Biomarkers Prev ; 32(6): 748-759, 2023 06 01.
Article in English | MEDLINE | ID: mdl-36626383

ABSTRACT

BACKGROUND: Studies have shown an increased risk of severe SARS-CoV-2-related (COVID-19) disease outcome and mortality for patients with cancer, but it is not well understood whether associations vary by cancer site, cancer treatment, and vaccination status. METHODS: Using electronic health record data from an academic medical center, we identified a retrospective cohort of 260,757 individuals tested for or diagnosed with COVID-19 from March 10, 2020, to August 1, 2022. Of these, 52,019 tested positive for COVID-19 of whom 13,752 had a cancer diagnosis. We conducted Firth-corrected logistic regression to assess the association between cancer status, site, treatment, vaccination, and four COVID-19 outcomes: hospitalization, intensive care unit admission, mortality, and a composite "severe COVID" outcome. RESULTS: Cancer diagnosis was significantly associated with higher rates of severe COVID, hospitalization, and mortality. These associations were driven by patients whose most recent initial cancer diagnosis was within the past 3 years. Chemotherapy receipt, colorectal cancer, hematologic malignancies, kidney cancer, and lung cancer were significantly associated with higher rates of worse COVID-19 outcomes. Vaccinations were significantly associated with lower rates of worse COVID-19 outcomes regardless of cancer status. CONCLUSIONS: Patients with colorectal cancer, hematologic malignancies, kidney cancer, or lung cancer or who receive chemotherapy for treatment should be cautious because of their increased risk of worse COVID-19 outcomes, even after vaccination. IMPACT: Additional COVID-19 precautions are warranted for people with certain cancer types and treatments. Significant benefit from vaccination is noted for both cancer and cancer-free patients.


Subject(s)
COVID-19, Colorectal Neoplasms, Hematologic Neoplasms, Kidney Neoplasms, Lung Neoplasms, Humans, COVID-19/epidemiology, SARS-CoV-2, Retrospective Studies, Hospitalization, Vaccination
17.
J Clin Med ; 12(23)2023 Nov 25.
Article in English | MEDLINE | ID: mdl-38068365

ABSTRACT

BACKGROUND: Post-Acute Sequelae of COVID-19 (PASC) have emerged as a global public health and healthcare challenge. This study aimed to uncover predictive factors for PASC from multi-modal data to develop a predictive model for PASC diagnoses. METHODS: We analyzed electronic health records from 92,301 COVID-19 patients, covering medical phenotypes, medications, and lab results. We used a Super Learner-based prediction approach to identify predictive factors. We integrated the model outputs into individual and composite risk scores and evaluated their predictive performance. RESULTS: Our analysis identified several factors predictive of diagnoses of PASC, including being overweight/obese and the use of HMG CoA reductase inhibitors prior to COVID-19 infection, and respiratory system symptoms during COVID-19 infection. We developed a composite risk score with a moderate discriminatory ability for PASC (covariate-adjusted AUC (95% confidence interval): 0.66 (0.63, 0.69)) by combining the risk scores based on phenotype and medication records. The combined risk score could identify 10% of individuals with a 2.2-fold increased risk for PASC. CONCLUSIONS: We identified several factors predictive of diagnoses of PASC and integrated the information into a composite risk score for PASC prediction, which could contribute to the identification of individuals at higher risk for PASC and inform preventive efforts.
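
To make the reported discriminatory ability concrete, the sketch below computes a plain (unadjusted) AUC as the probability that a randomly chosen case receives a higher risk score than a randomly chosen non-case, counting ties as one half. It is not the covariate-adjusted AUC reported in the abstract; the function name and the toy scores are hypothetical.

```python
def auc(scores, labels):
    """Unadjusted AUC via pairwise comparison of case and non-case scores.

    labels use 1 for cases and 0 for non-cases; both groups must be non-empty.
    """
    case_scores = [s for s, y in zip(scores, labels) if y == 1]
    control_scores = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0   # case ranked above non-case
            elif c == k:
                wins += 0.5   # ties count as half
    return wins / (len(case_scores) * len(control_scores))

# Toy risk scores, for illustration only.
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
print(auc(toy_scores, toy_labels))  # 3 of 4 case/non-case pairs are ordered correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is why a value of 0.66 is described as moderate.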

18.
Epidemiol Health ; 45: e2023074, 2023.
Article in English | MEDLINE | ID: mdl-37591787

ABSTRACT

The Epidemiologic Questionnaire (EPI-Q) was established to collect broad, uniform, self-reported health data to supplement electronic health record (EHR) and genotype information from participants in the University of Michigan (UM) Precision Health cohorts. Recruitment of EPI-Q participants, who were already enrolled in 1 of 3 ongoing UM Precision Health cohorts (the Michigan Genomics Initiative, Mental Health Biobank, and Metabolism, Endocrinology, and Diabetes cohorts), began in March 2020. Of 54,043 retrospective invitations, 5,577 individuals enrolled, representing a 10.3% response rate. Of these, 3,502 (63.7%) were female, and the average age was 56.1 years (standard deviation, 15.4). The baseline survey comprises 11 modules on topics including personal and family health history, lifestyle, and cancer screening and history. Additionally, 11 optional modules cover topics including financial toxicity, occupational exposure, and life meaning. The questions are based on standardized and validated instruments used in other cohorts, and we share resources to expedite development of similar surveys. Data are collected via the MyDataHelps platform, which enables current and future participants to share non-Michigan Medicine EHR data. Recruitment is ongoing. Cohort data are available to those with institutional review board approval; for details, contact the Data Office for Clinical and Translational Research (DataOffice@umich.edu).


Subject(s)
Electronic Health Records, Mobile Applications, Humans, Female, Middle Aged, Male, Retrospective Studies, Genotype, Surveys and Questionnaires, Health Surveys
19.
PLoS One ; 17(7): e0269017, 2022.
Article in English | MEDLINE | ID: mdl-35877617

ABSTRACT

Since the beginning of the Coronavirus Disease 2019 (COVID-19) pandemic, a focus of research has been to identify risk factors associated with COVID-19-related outcomes, such as testing and diagnosis, and use them to build prediction models. Existing studies have used data from digital surveys or electronic health records (EHRs), but very few have linked the two sources to build joint predictive models. In this study, we used survey data on 7,054 patients from the Michigan Genomics Initiative biorepository to evaluate how well self-reported data could be integrated with electronic records for the purpose of modeling COVID-19-related outcomes. We observed that among survey respondents, self-reported COVID-19 diagnosis captured a larger number of cases than the corresponding EHRs, suggesting that self-reported outcomes may be better than EHRs for distinguishing COVID-19 cases from controls. In the modeling context, we compared the utility of survey- and EHR-derived predictor variables in models of survey-reported COVID-19 testing and diagnosis. We found that survey-derived predictors produced uniformly stronger models than EHR-derived predictors (likely due to their specificity, temporal proximity, and breadth) and that combining predictors from both sources offered no consistent improvement compared to using survey-based predictors alone. Our results suggest that, even though general EHRs are useful in predictive models of COVID-19 outcomes, they may not be essential in those models when rich survey data are already available. The two data sources together may offer better prediction for COVID severity, but we did not have enough severe cases among the survey respondents to assess that hypothesis in our study.


Subject(s)
COVID-19, Electronic Health Records, COVID-19/diagnosis, COVID-19/epidemiology, COVID-19 Testing, Humans, Self Report, Surveys and Questionnaires
20.
Sci Adv ; 8(24): eabp8621, 2022 Jun 17.
Article in English | MEDLINE | ID: mdl-35714183

ABSTRACT

India experienced a massive surge in SARS-CoV-2 infections and deaths during April to June 2021 despite having controlled the epidemic relatively well during 2020. Using counterfactual predictions from epidemiological disease transmission models, we produce evidence in support of how strengthening public health interventions early would have helped control transmission in the country and significantly reduced mortality during the second wave, even without harsh lockdowns. We argue that enhanced surveillance at district, state, and national levels and constant assessment of risk associated with increased transmission are critical for future pandemic responsiveness. Building on our retrospective analysis, we provide a tiered data-driven framework for timely escalation of future interventions as a tool for policy-makers.
