Results 1 - 20 of 35
1.
Am J Hum Genet ; 107(5): 815-836, 2020 11 05.
Article in English | MEDLINE | ID: mdl-32991828

ABSTRACT

To facilitate scientific collaboration on polygenic risk score (PRS) research, we created an extensive online PRS repository for 35 common cancer traits, integrating freely available genome-wide association study (GWAS) summary statistics from three sources: published GWASs, the NHGRI-EBI GWAS Catalog, and UK Biobank-based GWASs. Our framework condenses these summary statistics into PRSs using various approaches, such as linkage disequilibrium pruning/p value thresholding (with fixed or data-adaptively optimized thresholds) and penalized, genome-wide effect size weighting. We evaluated the PRSs in two biobanks: the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort at Michigan Medicine, and the population-based UK Biobank (UKB). For each PRS construct, we provide measures of predictive performance and discrimination. Besides PRS evaluation, the Cancer-PRSweb platform features construct downloads and phenome-wide PRS association study (PRS-PheWAS) results for predictive PRSs. We expect this integrated platform to accelerate PRS-related cancer research.


Subject(s)
Biological Specimen Banks/statistics & numerical data , Genetic Predisposition to Disease , Genome, Human , Genomics/methods , Multifactorial Inheritance , Neoplasms/genetics , Adult , Aged , Female , Genome-Wide Association Study , Humans , Internet , Linkage Disequilibrium , Male , Middle Aged , Neoplasms/classification , Neoplasms/diagnosis , Neoplasms/epidemiology , Phenotype , Quantitative Trait, Heritable , Risk Factors , United Kingdom/epidemiology , United States/epidemiology
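The pruning/p value thresholding construct mentioned in the abstract can be illustrated as greedy LD clumping followed by a weighted sum of effect-allele dosages. This is a minimal sketch, assuming pairwise r² values are supplied as a dictionary keyed by variant pairs; the names `Variant`, `clump`, and `prs_ct` are hypothetical and do not describe the Cancer-PRSweb code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    rsid: str
    beta: float     # per-allele effect size (e.g., log odds ratio) from GWAS summary statistics
    p_value: float


def clump(variants, r2, r2_max=0.1):
    """Greedy LD clumping: walk variants from most to least significant and keep one
    only if its r^2 with every already-kept variant is at most r2_max.
    r2 maps frozenset({rsid_a, rsid_b}) -> r^2; missing pairs are treated as r^2 = 0."""
    kept = []
    for v in sorted(variants, key=lambda v: v.p_value):
        if all(r2.get(frozenset((v.rsid, k.rsid)), 0.0) <= r2_max for k in kept):
            kept.append(v)
    return kept


def prs_ct(variants, dosages, p_threshold, r2, r2_max=0.1):
    """Clumping + thresholding PRS: sum of beta * dosage over clumped variants with
    p < p_threshold. dosages maps rsid -> effect-allele dosage (0-2); variants that
    were not genotyped are skipped."""
    score = 0.0
    for v in clump(variants, r2, r2_max):
        if v.p_value < p_threshold and v.rsid in dosages:
            score += v.beta * dosages[v.rsid]
    return score
```

Sweeping `p_threshold` over a grid and picking the value that best predicts the trait in a tuning sample is one way to read "data-adaptively optimized thresholds"; the fixed-threshold variant simply uses, for example, 5e-8.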
2.
PLoS Comput Biol ; 18(6): e1010115, 2022 06.
Article in English | MEDLINE | ID: mdl-35658007

ABSTRACT

Infectious disease forecasting is of great interest to the public health community and policymakers, since forecasts can provide insight into disease dynamics in the near future and inform interventions. Due to delays in case reporting, however, forecasting models may often underestimate the current and future disease burden. In this paper, we propose a general framework for addressing reporting delay in disease forecasting efforts with the goal of improving forecasts. We propose strategies for leveraging either historical data on case reporting or external internet-based data to estimate the amount of reporting error. We then describe several approaches for adapting general forecasting pipelines to account for under- or over-reporting of cases. We apply these methods to address reporting delay in data on dengue fever cases in Puerto Rico from 1990 to 2009 and in reports of influenza-like illness (ILI) in the United States between 2010 and 2019. Through a simulation study, we compare method performance and evaluate robustness to assumption violations. Our results show that forecasting accuracy and prediction coverage almost always increase when correction methods are implemented to address reporting delay. Some of these methods require knowledge about the reporting error or high-quality external data, which may not always be available. Alternatives we provide include excluding recently reported data and performing sensitivity analysis. This work provides intuition and guidance for handling delay in disease case reporting and may serve as a useful resource to inform practical infectious disease forecasting efforts.


Subject(s)
Communicable Diseases , Influenza, Human , Communicable Diseases/epidemiology , Computer Simulation , Forecasting , Humans , Influenza, Human/epidemiology , Models, Statistical , Public Health , United States
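One simple way to use historical reporting data, in the spirit of the abstract, is a multiplicative nowcast: estimate the pooled fraction of the eventual count that is typically known d weeks after a reference week, then divide the currently known recent counts by that fraction. This sketch assumes a hypothetical data layout and function names, and it is a generic correction rather than necessarily one of the paper's methods.

```python
def reporting_fractions(history, max_delay):
    """history: list of (known_by_delay, final) pairs for past reference weeks, where
    known_by_delay[d] is the cumulative count known d weeks later and final is the
    eventually complete count. Returns f[d], the pooled fraction known at delay d."""
    total_final = sum(final for _, final in history)
    return [sum(known[d] for known, _ in history) / total_final
            for d in range(max_delay + 1)]


def nowcast(recent_counts, fractions):
    """recent_counts[k] is the count currently known for the week k weeks ago
    (k = 0 is the latest week). Weeks older than the longest modeled delay are
    treated as complete and left unchanged."""
    return [c / fractions[k] if k < len(fractions) else c
            for k, c in enumerate(recent_counts)]
```

The corrected counts can then replace the raw recent counts as inputs to whatever forecasting model is in use; dropping the most recent weeks instead corresponds to the "excluding recently reported data" alternative.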
3.
Biometrics ; 78(1): 214-226, 2022 03.
Article in English | MEDLINE | ID: mdl-33179768

ABSTRACT

Health research using electronic health records (EHR) has gained popularity, but misclassification of EHR-derived disease status and lack of representativeness of the study sample can result in substantial bias in effect estimates and can affect power and type I error. In this paper, we develop new strategies for handling disease status misclassification and selection bias in EHR-based association studies. We first focus on each type of bias separately. For misclassification, we propose three novel likelihood-based bias correction strategies. A distinguishing feature of the EHR setting is that misclassification may be related to patient-varying factors, and the proposed methods leverage data in the EHR to estimate misclassification rates without gold standard labels. For addressing selection bias, we describe how calibration and inverse probability weighting methods from the survey sampling literature can be extended and applied to the EHR setting. Addressing misclassification and selection biases simultaneously is a more challenging problem than dealing with each on its own, and we propose several new strategies. For all methods proposed, we derive valid standard error estimators and provide software for implementation. We provide a new suite of statistical estimation and inference strategies for addressing misclassification and selection bias simultaneously that is tailored to problems arising in EHR data analysis. We apply these methods to data from the Michigan Genomics Initiative, a longitudinal EHR-linked biorepository.


Subject(s)
Electronic Health Records , Bias , Humans , Likelihood Functions , Michigan , Selection Bias
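For intuition only, two textbook corrections related to this work can be sketched for the simplest estimand, a disease prevalence: inverse probability weighting (Hajek form) for selection, assuming the selection probabilities are known, and the Rogan-Gladen correction for nondifferential misclassification, assuming known sensitivity and specificity. These are not the paper's likelihood-based estimators, which estimate misclassification rates from EHR data without gold standard labels; the function names are hypothetical.

```python
def ipw_prevalence(observed_status, selection_probs):
    """Hajek-type inverse probability weighted mean of 0/1 disease indicators,
    weighting each sampled person by 1 / P(selected into the EHR sample)."""
    weights = [1.0 / p for p in selection_probs]
    return sum(w * y for w, y in zip(weights, observed_status)) / sum(weights)


def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Correct an apparent prevalence for nondifferential misclassification,
    assuming sensitivity and specificity are known; the result is clipped to [0, 1]."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    corrected = (apparent_prevalence + specificity - 1.0) / denom
    return min(1.0, max(0.0, corrected))
```

Applying `rogan_gladen` to the output of `ipw_prevalence` addresses both biases at once only under the strong assumption that sensitivity and specificity do not vary with the factors driving selection; the abstract notes that in EHR data misclassification may depend on patient-varying factors, which is what the proposed methods are designed to handle.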
4.
Stat Med ; 41(28): 5501-5516, 2022 12 10.
Article in English | MEDLINE | ID: mdl-36131394

ABSTRACT

Electronic health records (EHR) are not designed for population-based research, but they provide easy and quick access to longitudinal health information for a large number of individuals. Many statistical methods have been proposed to account for selection bias, missing data, phenotyping errors, or other problems that arise in EHR data analysis. However, addressing multiple sources of bias simultaneously is challenging. We developed a methodological framework (R package, SAMBA) for jointly handling both selection bias and phenotype misclassification in the EHR setting that leverages external data sources. These methods assume factors related to selection and misclassification are fully observed, but these factors may be poorly understood and partially observed in practice. As a follow-up to the methodological work, we demonstrate how to apply these methods for two real-world case studies, and we evaluate their performance. In both examples, we use individual patient-level data collected through the University of Michigan Health System and various external population-based data sources. In case study (a), we explore the impact of these methods on estimated associations between gender and cancer diagnosis. In case study (b), we compare corrected associations between previously identified genetic loci and age-related macular degeneration with gold standard external summary estimates. These case studies illustrate how to utilize diverse auxiliary information to achieve less biased inference in EHR-based research.


Subject(s)
Electronic Health Records , Information Storage and Retrieval , Selection Bias , Bias , Phenotype
5.
Stat Med ; 41(13): 2317-2337, 2022 06 15.
Article in English | MEDLINE | ID: mdl-35224743

ABSTRACT

False negative rates of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) diagnostic tests, together with selection bias due to prioritized testing, can result in inaccurate modeling of COVID-19 transmission dynamics based on reported "case" counts. We propose an extension of the widely used Susceptible-Exposed-Infected-Removed (SEIR) model that accounts for misclassification error and selection bias, and we derive an analytic expression for the basic reproduction number R0 as a function of the false negative rates of the diagnostic tests and the selection probabilities for getting tested. Analyzing data from the first two waves of the pandemic in India, we show that correcting for misclassification and selection leads to more accurate prediction in a test sample. We provide estimates of undetected infections and deaths between April 1, 2020 and August 31, 2021. At the end of the first wave in India, as of February 1, 2021, the estimated under-reporting factor was 11.1 (95% CI: 10.7, 11.5) for cases and 3.58 (95% CI: 3.5, 3.66) for deaths; as of July 1, 2021, these factors had changed to 19.2 (95% CI: 17.9, 19.9) and 4.55 (95% CI: 4.32, 4.68), respectively. Equivalently, 9.0% (95% CI: 8.7%, 9.3%) and 5.2% (95% CI: 5.0%, 5.6%) of total estimated infections were reported on these two dates, while 27.9% (95% CI: 27.3%, 28.6%) and 22% (95% CI: 21.4%, 23.1%) of estimated total deaths were reported. Extensive simulation studies demonstrate the effect of misclassification and selection on estimation of R0 and prediction of future infections. An R package, SEIRfansy, has been developed for broader dissemination.


Subject(s)
COVID-19 , Basic Reproduction Number , COVID-19/diagnosis , COVID-19/epidemiology , Humans , India/epidemiology , Pandemics , SARS-CoV-2
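As background for the extension described above, the standard SEIR compartments can be integrated with a simple forward Euler scheme. This sketch covers only the base model (no misclassification, selection, or death compartments, which SEIRfansy adds) and uses illustrative parameter names; for this base model without births or deaths, R0 equals beta / gamma. The hypothetical helper `reported_share` just inverts an under-reporting factor, which is how the abstract's factors of 11.1 and 19.2 correspond to roughly 9.0% and 5.2% of infections being reported.

```python
def simulate_seir(n, e0, i0, beta, sigma, gamma, days, steps_per_day=10):
    """Deterministic SEIR model integrated with forward Euler steps.
    beta: transmission rate, sigma: 1 / latent period, gamma: 1 / infectious period.
    Returns one (day, S, E, I, R) tuple per day, starting at day 0."""
    dt = 1.0 / steps_per_day
    s, e, i, r = float(n - e0 - i0), float(e0), float(i0), 0.0
    trajectory = [(0, s, e, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt
            new_infectious = sigma * e * dt
            new_removed = gamma * i * dt
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        trajectory.append((day, s, e, i, r))
    return trajectory


def basic_reproduction_number(beta, gamma):
    """R0 for the base SEIR model without births or deaths."""
    return beta / gamma


def reported_share(under_reporting_factor):
    """Fraction of true infections (or deaths) that were reported."""
    return 1.0 / under_reporting_factor
```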
6.
PLoS Genet ; 15(6): e1008202, 2019 06.
Article in English | MEDLINE | ID: mdl-31194742

ABSTRACT

Polygenic risk scores (PRS) are designed to serve as single summary measures that are easy to construct, condensing information from a large number of genetic variants associated with a disease. They have been used for stratification and prediction of disease risk. The primary focus of this paper is to demonstrate how we can combine PRS and electronic health record data to better understand the shared and unique genetic architecture and etiology of disease subtypes that may be both related and heterogeneous. PRS construction strategies often depend on the purpose of the study, the available data/summary estimates, and the underlying genetic architecture of a disease. We consider several choices for constructing a PRS using data obtained from various publicly available sources, including the UK Biobank, and evaluate their abilities to predict not just the primary phenotype but also secondary phenotypes derived from electronic health records (EHR). This study was conducted using data from 30,702 unrelated, genotyped patients of recent European descent from the Michigan Genomics Initiative (MGI), a longitudinal biorepository effort within Michigan Medicine. We examine the three most common skin cancer subtypes in the USA: basal cell carcinoma, cutaneous squamous cell carcinoma, and melanoma. Using these PRS for various skin cancer subtypes, we conduct a phenome-wide association study (PheWAS) within the MGI data to evaluate PRS associations with secondary traits. PheWAS results are then replicated using population-based UK Biobank data and compared across various PRS construction methods. We develop an accompanying visual catalog called PRSweb that provides detailed PheWAS results and allows users to directly compare different PRS construction methods.


Subject(s)
Genetic Predisposition to Disease , Genomics , Multifactorial Inheritance/genetics , Skin Neoplasms/genetics , Biological Specimen Banks , Electronic Health Records , Genome-Wide Association Study , Genotype , Humans , Michigan/epidemiology , Phenotype , Polymorphism, Single Nucleotide/genetics , Risk Factors , Skin Neoplasms/pathology , United Kingdom/epidemiology
7.
Biometrics ; 77(4): 1342-1354, 2021 12.
Article in English | MEDLINE | ID: mdl-32920819

ABSTRACT

Multiple imputation by chained equations (MICE) has emerged as a popular approach for handling missing data. A central challenge for applying MICE is determining how to incorporate outcome information into covariate imputation models, particularly for complicated outcomes. Often, we have a particular analysis model in mind, and we would like to ensure congeniality between the imputation and analysis models. We propose a novel strategy for directly incorporating the analysis model into the handling of missing data. In our proposed approach, multiple imputations of missing covariates are obtained without using outcome information. We then utilize the strategy of imputation stacking, where multiple imputations are stacked on top of each other to create a large data set. The analysis model is then incorporated through weights. Instead of applying Rubin's combining rules, we obtain parameter estimates by fitting a weighted version of the analysis model on the stacked data set. We propose a novel estimator for obtaining standard errors for this stacked and weighted analysis. Our estimator is based on the observed data information principle in Louis' work and can be applied for analyzing stacked multiple imputations more generally. Our approach for analyzing stacked multiple imputations is the first method that can be easily applied (using R package StackImpute) for a wide variety of standard analysis models and missing data settings.


Subject(s)
Models, Statistical , Research Design
8.
Stat Med ; 40(27): 6118-6132, 2021 11 30.
Article in English | MEDLINE | ID: mdl-34459011

ABSTRACT

Not-at-random missingness presents a challenge in addressing missing data in many health research applications. In this article, we propose a new approach to account for not-at-random missingness after multiple imputation through weighted analysis of stacked multiple imputations. The weights are easily calculated as a function of the imputed data and assumptions about the not-at-random missingness. We demonstrate through simulation that the proposed method has excellent performance when the missingness model is correctly specified. In practice, the missingness mechanism will not be known. We show how we can use our approach in a sensitivity analysis framework to evaluate the robustness of model inference to different assumptions about the missingness mechanism, and we provide R package StackImpute to facilitate implementation as part of routine sensitivity analyses. We apply the proposed method to account for not-at-random missingness in human papillomavirus test results in a study of survival for patients diagnosed with oropharyngeal cancer.


Subject(s)
Models, Statistical , Research Design , Computer Simulation , Data Interpretation, Statistical , Humans
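The stacking-and-weighting idea from the two StackImpute abstracts above can be illustrated for a single partially observed variable. In this sketch each missing value has several imputations stacked as separate rows; under MAR each row gets weight 1/M, and a not-at-random assumption is encoded here through exponential tilting with a hypothetical sensitivity parameter `delta`, so larger imputed values are up-weighted when delta > 0. The tilting form and the function names are illustrative assumptions, not the StackImpute API, and a real analysis would fit the weighted analysis model (with the paper's variance estimator) rather than compute a weighted mean.

```python
import math


def stack_with_tilting(data, delta=0.0):
    """data: one entry per subject, either an observed float or a list of M imputed
    values for a missing one. Returns stacked rows (subject, value, weight).
    Observed rows get weight 1; a missing subject's imputations get weights
    proportional to exp(delta * value), normalized to sum to 1 within the subject.
    delta = 0 reproduces ordinary MAR stacking with weight 1 / M."""
    rows = []
    for subject, entry in enumerate(data):
        if isinstance(entry, list):
            raw = [math.exp(delta * x) for x in entry]
            total = sum(raw)
            rows.extend((subject, x, r / total) for x, r in zip(entry, raw))
        else:
            rows.append((subject, entry, 1.0))
    return rows


def weighted_mean(rows):
    """Weighted mean of the stacked values."""
    return sum(x * w for _, x, w in rows) / sum(w for _, _, w in rows)
```

Repeating the weighted fit over a grid of delta values (for example -1, -0.5, 0, 0.5, 1) gives the kind of sensitivity analysis the abstract describes: if conclusions hold across plausible values, inference is robust to the assumed missingness mechanism.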
9.
J Biomed Inform ; 113: 103652, 2021 01.
Article in English | MEDLINE | ID: mdl-33279681

ABSTRACT

BACKGROUND: Traditional methods for disease risk prediction and assessment, such as diagnostic tests using serum, urine, blood, saliva or imaging biomarkers, have been important for identifying high-risk individuals for many diseases, leading to early detection and improved survival. For pancreatic cancer, traditional screening methods have been largely unsuccessful in identifying high-risk individuals in advance of disease progression, leading to high mortality and poor survival. Electronic health records (EHR) linked to genetic profiles provide an opportunity to integrate multiple sources of patient information for risk prediction and stratification. We leverage a constellation of temporally associated diagnoses available in the EHR to construct a summary risk score, called a phenotype risk score (PheRS), for identifying individuals at high risk of having pancreatic cancer. The proposed PheRS approach incorporates the time with respect to disease onset into the prediction framework. We combine and contrast the PheRS with better-known measures of inherited susceptibility, namely, the polygenic risk scores (PRS) for prediction of pancreatic cancer. METHODOLOGY: We first calculated pairwise, unadjusted associations between pancreatic cancer diagnosis and all possible other diagnoses across the medical phenome. We call these pairwise associations co-occurrences. After accounting for cross-phenotype correlations, the multivariable association estimates from a subset of relatively independent diagnoses were used to create a weighted sum PheRS. We constructed time-restricted risk scores using data from 38,359 participants in the Michigan Genomics Initiative (MGI) based on the diagnoses contained in the EHR at 0, 1, 2, and 5 years prior to the target pancreatic cancer diagnosis. The PheRS was assessed for predictability in the UK Biobank (UKB).
We tested the relative contribution of PheRS when added to a model containing a summary measure of inherited genetic susceptibility (PRS) plus other covariates such as age, sex, smoking status, drinking status, and body mass index (BMI). RESULTS: Our exploration of co-occurrence patterns identified expected associations while also revealing unexpected relationships that may warrant closer attention. Using the pancreatic cancer PheRS alone at 5 years before the target diagnosis yielded an AUC of 0.60 (95% CI = [0.58, 0.62]) in UKB. A larger predictive model including PheRS, PRS, and the covariates at the 5-year threshold achieved an AUC of 0.74 (95% CI = [0.72, 0.76]) in UKB. We note that PheRS does contribute independently in the joint model. Finally, scores at the top percentiles of the PheRS distribution demonstrated promise in terms of risk stratification. Scores in the top 2% were 10.20 (95% CI = [9.34, 12.99]) times more likely to identify cases than those in the bottom 98% in UKB at the 5-year threshold prior to pancreatic cancer diagnosis. CONCLUSIONS: We developed a framework for creating a time-restricted PheRS from EHR data for pancreatic cancer using the rich information content of a medical phenome. In addition to identifying hypothesis-generating associations for future research, this PheRS demonstrates a potentially important contribution in identifying high-risk individuals, even after adjusting for PRS for pancreatic cancer and other traditional epidemiologic covariates. The methods are generalizable to other phenotypic traits.


Subject(s)
Electronic Health Records , Pancreatic Neoplasms , Biological Specimen Banks , Genome-Wide Association Study , Humans , Michigan , Pancreatic Neoplasms/genetics , Phenotype , Risk Factors
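The time-restricted weighted-sum PheRS and the top-percentile comparison can be sketched as follows. Assumptions, all illustrative: diagnosis times and the index time (target diagnosis, or end of follow-up for controls) are in years, weights are supplied per phecode (for example, multivariable log odds ratios), and the enrichment ratio is defined here as the case proportion in the top fraction divided by that in the remainder. The function names are hypothetical.

```python
def phers(diagnoses, index_time, weights, lag_years):
    """Time-restricted phenotype risk score: sum the weights of diagnoses first
    recorded at least lag_years before index_time.
    diagnoses maps phecode -> time of first diagnosis (years)."""
    return sum(weights[code] for code, t in diagnoses.items()
               if code in weights and index_time - t >= lag_years)


def top_fraction_enrichment(scores, is_case, top=0.02):
    """Ratio of the case proportion among the highest-scoring `top` fraction of
    people to the case proportion among everyone else (ties broken arbitrarily).
    Raises ZeroDivisionError if the remainder contains no cases."""
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    n_top = max(1, round(top * len(scores)))
    top_idx, rest_idx = order[:n_top], order[n_top:]
    top_rate = sum(is_case[k] for k in top_idx) / len(top_idx)
    rest_rate = sum(is_case[k] for k in rest_idx) / len(rest_idx)
    return top_rate / rest_rate
```

Setting `lag_years` to 0, 1, 2, or 5 mirrors the time-restricted scores in the abstract: only diagnoses available that long before the target diagnosis contribute.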
10.
Biostatistics ; 20(3): 416-432, 2019 07 01.
Article in English | MEDLINE | ID: mdl-29584820

ABSTRACT

Multistate cure models are multistate models in which transitions into one or more of the states cannot occur for a fraction of the population. In the study of cancer, multistate cure models can be used to identify factors related to the rate of cancer recurrence, the rate of death before and after recurrence, and the probability of being cured by initial treatment. However, the previous method for fitting multistate cure models requires substantial custom programming, making these valuable models less accessible to analysts. In this article, we present an Expectation-Maximization (EM) algorithm for fitting the multistate cure model using maximum likelihood. The proposed algorithm makes use of a weighted likelihood representation, allowing it to be easily implemented with standard software, and can incorporate either parametric or non-parametric baseline hazards for the state transition rates. A common complicating feature in cancer studies is that the follow-up times for recurrence and death may differ. Additionally, covariates may be missing. We propose a Monte Carlo EM (MCEM) algorithm for fitting the multistate cure model in the presence of covariate missingness and/or unequal follow-up of the two outcomes, we describe a novel approach for obtaining standard errors, and we provide software. Simulations demonstrate good algorithmic performance as long as the modeling assumptions are sufficiently restrictive. We apply the proposed algorithm to a study of recurrence and death in patients with head and neck cancer.


Subject(s)
Algorithms , Biostatistics/methods , Models, Biological , Models, Statistical , Head and Neck Neoplasms/mortality , Head and Neck Neoplasms/pathology , Head and Neck Neoplasms/therapy , Humans , Likelihood Functions , Monte Carlo Method
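To show the weighted-likelihood EM idea in its simplest form, the sketch below fits a two-component mixture cure model with no covariates, a single event type, and an exponential hazard for uncured subjects. It is a deliberately reduced illustration under these assumptions, not the paper's multistate or MCEM algorithm: the E-step computes each subject's posterior probability of being uncured, and the M-step updates the cure fraction and a weighted exponential rate.

```python
import math


def fit_exponential_cure_em(times, events, max_iter=1000, tol=1e-9):
    """EM for the survival model S(t) = cure + (1 - cure) * exp(-rate * t).
    times: follow-up times (> 0); events: 1 if the event was observed, 0 if censored.
    Returns (cure_fraction, rate)."""
    cure = 0.5
    rate = sum(events) / sum(times)  # crude starting value
    for _ in range(max_iter):
        # E-step: posterior probability of being uncured. Subjects with an
        # observed event must be uncured; censored subjects may be either.
        w = []
        for t, d in zip(times, events):
            if d:
                w.append(1.0)
            else:
                uncured_survival = (1.0 - cure) * math.exp(-rate * t)
                w.append(uncured_survival / (cure + uncured_survival))
        # M-step: cure fraction is 1 - mean weight; the rate is a weighted
        # exponential MLE (observed events divided by weighted person-time).
        new_cure = 1.0 - sum(w) / len(w)
        new_rate = sum(events) / sum(wi * t for wi, t in zip(w, times))
        done = abs(new_cure - cure) < tol and abs(new_rate - rate) < tol
        cure, rate = new_cure, new_rate
        if done:
            break
    return cure, rate
```

Because the M-step only needs a weighted fit of the uncured-subject model, the exponential rate update could be swapped for any software that accepts case weights, which is the practical appeal of the weighted likelihood representation.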
11.
Stat Med ; 39(14): 1965-1979, 2020 06 30.
Article in English | MEDLINE | ID: mdl-32198773

ABSTRACT

Large-scale association analyses based on observational health care databases such as electronic health records have been a topic of increasing interest in the scientific community. However, challenges due to nonprobability sampling and phenotype misclassification associated with the use of these data sources are often ignored in standard analyses. The extent of the bias introduced by ignoring these factors is not well characterized. In this paper, we develop an analytic framework for characterizing the bias expected in disease-gene association studies based on electronic health records when disease status misclassification and the sampling mechanism are ignored. Through a sensitivity analysis approach, this framework can be used to obtain plausible values for parameters of interest given summary results from standard analysis. We develop an online tool for performing this sensitivity analysis. Simulations demonstrate promising properties of the proposed method. We apply our approach to study bias in disease-gene association studies using electronic health record data from the Michigan Genomics Initiative, a longitudinal biorepository effort within the University of Michigan health system.


Subject(s)
Electronic Health Records , Genome-Wide Association Study , Bias , Michigan , Phenotype , Polymorphism, Single Nucleotide
12.
Stat Med ; 39(6): 773-800, 2020 03 15.
Article in English | MEDLINE | ID: mdl-31859414

ABSTRACT

Biobanks linked to electronic health records provide rich resources for health-related research. With improvements in administrative and informatics infrastructure, the availability and utility of data from biobanks have dramatically increased. In this paper, we first aim to characterize the current landscape of available biobanks and to describe specific biobanks, including their place of origin, size, and data types. The development and accessibility of large-scale biorepositories provide the opportunity to accelerate agnostic searches, expedite discoveries, and conduct hypothesis-generating studies of disease-treatment, disease-exposure, and disease-gene associations. Rather than designing and implementing a single study focused on a few targeted hypotheses, researchers can potentially use biobanks' existing resources to answer an expanded selection of exploratory questions as quickly as they can analyze them. However, there are many obvious and subtle challenges with the design and analysis of biobank-based studies. Our second aim is to discuss statistical issues related to biobank research such as study design, sampling strategy, phenotype identification, and missing data. We focus our discussion on biobanks that are linked to electronic health records. Some of the analytic issues are illustrated using data from the Michigan Genomics Initiative and UK Biobank, two biobanks with two different recruitment mechanisms. We summarize the current body of literature for addressing these challenges and discuss some standing open problems. This work complements and extends recent reviews about biobank-based research and serves as a resource catalog with analytical and practical guidance for statisticians, epidemiologists, and other medical researchers pursuing research using biobanks.


Subject(s)
Biological Specimen Banks , Electronic Health Records , Genomics , Michigan , Research Design
13.
Cancer ; 125(1): 68-78, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30291798

ABSTRACT

BACKGROUND: Accurate, individualized prognostication in patients with oropharyngeal squamous cell carcinoma (OPSCC) is vital for patient counseling and treatment decision making. With the emergence of human papillomavirus (HPV) as an important biomarker in OPSCC, calculators incorporating this variable have been developed. However, it is critical to characterize their accuracy prior to implementation. METHODS: Four OPSCC calculators were identified that integrate HPV into their estimation of 5-year overall survival. Treatment outcomes for 856 patients with OPSCC who were evaluated at a single institution from 2003 through 2016 were analyzed. Predicted survival probabilities were generated for each patient using each calculator. Calculator performance was assessed and compared using Kaplan-Meier plots, receiver operating characteristic curves, concordance statistics, and calibration plots. RESULTS: Correlation between pairs of calculators varied, with coefficients ranging from 0.63 to 0.90. Only 3 of 6 pairs of calculators yielded predictions within 10% of each other for at least 50% of patients. Kaplan-Meier curves of calculator-defined risk groups demonstrated reasonable stratification. Areas under the receiver operating characteristic curve ranged from 0.74 to 0.80, and concordance statistics ranged from 0.71 to 0.78. Each calculator demonstrated superior discriminatory ability compared with clinical staging according to the seventh and eighth editions of the American Joint Committee on Cancer staging manual. Among models, the Denmark calculator was found to be best calibrated to observed outcomes. CONCLUSIONS: Existing calculators exhibited reasonable estimation of survival in patients with OPSCC, but there was considerable variability in predictions for individual patients, which limits the clinical usefulness of these calculators. 
Given the increasing role of personalized treatment in patients with OPSCC, further work is needed to improve accuracy and precision, possibly through the identification and incorporation of additional biomarkers.


Subject(s)
Carcinoma, Squamous Cell/therapy , Carcinoma, Squamous Cell/virology , Oropharyngeal Neoplasms/therapy , Papillomavirus Infections/therapy , Aged , Area Under Curve , Carcinoma, Squamous Cell/mortality , Female , Humans , Male , Middle Aged , Oropharyngeal Neoplasms/mortality , Oropharyngeal Neoplasms/virology , Papillomavirus Infections/mortality , Precision Medicine , Prognosis , Prospective Studies , Survival Analysis , Treatment Outcome
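The concordance statistic used to compare the calculators can be computed directly. This is a plain O(n²) sketch of Harrell's C-index with a hypothetical function name, assuming a higher `risk` value means a worse prognosis; for a calculator that outputs a predicted 5-year survival probability, one could pass `1 - predicted_survival` as the risk. Pairs with tied times are skipped, and tied risks count as one half.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index.
    A pair (i, j) is comparable when subject i had the event and t_i < t_j;
    it is concordant when i, who failed first, also has the higher risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking, which puts the reported range of 0.71 to 0.78 in context.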
14.
Cancer ; 124(4): 706-716, 2018 02 15.
Article in English | MEDLINE | ID: mdl-29112231

ABSTRACT

BACKGROUND: Accurate prognostication is essential to the optimal management of laryngeal cancer. Predictive models have been developed to calculate the risk of oncologic outcomes, but extensive external validation of accuracy and reliability is necessary before implementing them in clinical practice. METHOD: Four published prognostic calculators that predict 5-year overall survival for patients with laryngeal cancer were evaluated using patient information from a prospective epidemiology study cohort (n = 246; median follow-up, 60 months) with previously untreated, stage I through IVb laryngeal squamous cell carcinoma. RESULTS: Different calculators yielded substantially different predictions for individual patients. The observed 5-year overall survival was significantly higher than the averaged predicted 5-year overall survival of the 4 calculators (71.9% [95% confidence interval (CI), 65%-78%] vs 47.7%). Statistical analyses demonstrated the calculators' limited capacity to discriminate outcomes for risk-stratified patients. The area under the receiver operating characteristic curve ranged from 0.68 to 0.72. C-index values were similar for each of the 4 models (range, 0.66-0.68). There was a lower-than-expected hazard of death for patients who received induction (bioselective) chemotherapy (hazard ratio, 0.46; 95% CI, 0.24-0.88; P = .024) or primary surgical intervention (hazard ratio, 0.43; 95% CI, 0.21-0.90; P = .024) compared with those who received concurrent chemoradiation. CONCLUSIONS: Suboptimal reliability and accuracy limit the integration of existing individualized prediction tools into routine clinical decision making. The calculators predicted significantly worse than observed survival among patients who received induction chemotherapy and primary surgery, suggesting a need for updated consideration of modern treatment modalities.
Further development of individualized prognostic calculators may improve risk prediction, treatment planning, and counseling for patients with laryngeal cancer. Cancer 2018;124:706-16. © 2017 American Cancer Society.


Subject(s)
Laryngeal Neoplasms/surgery , Laryngeal Neoplasms/therapy , Outcome Assessment, Health Care/methods , Risk Assessment/methods , Aged , Chemoradiotherapy/methods , Female , Follow-Up Studies , Humans , Induction Chemotherapy/methods , Kaplan-Meier Estimate , Male , Middle Aged , Outcome Assessment, Health Care/statistics & numerical data , Prognosis , Prospective Studies , Risk Assessment/statistics & numerical data , Risk Factors
15.
Stat Med ; 35(26): 4701-4717, 2016 11 20.
Article in English | MEDLINE | ID: mdl-27439726

ABSTRACT

We explore several approaches for imputing partially observed covariates when the outcome of interest is a censored event time and when there is an underlying subset of the population that will never experience the event of interest. We call these subjects 'cured', and we consider the case where the data are modeled using a Cox proportional hazards (CPH) mixture cure model. We study covariate imputation approaches using fully conditional specification. We derive the exact conditional distribution and suggest a sampling scheme for imputing partially observed covariates in the CPH cure model setting. We also propose several approximations to the exact distribution that are simpler and more convenient to use for imputation. A simulation study demonstrates that the proposed imputation approaches outperform existing imputation approaches for survival data without a cure fraction in terms of bias in estimating CPH cure model parameters. We apply our multiple imputation techniques to a study of patients with head and neck cancer. Copyright © 2016 John Wiley & Sons, Ltd.


Subject(s)
Data Interpretation, Statistical , Proportional Hazards Models , Bias , Head and Neck Neoplasms/therapy , Humans
16.
Support Care Cancer ; 24(11): 4669-78, 2016 11.
Article in English | MEDLINE | ID: mdl-27378380

ABSTRACT

PURPOSE: The objectives of this study are to describe racial/ethnic differences and clinical/treatment correlates of worry about recurrence and to examine modifiable factors in the health care experience that may reduce worry among breast cancer survivors, partners, and pairs. METHODS: Women with non-metastatic breast cancer identified by the Detroit and Los Angeles SEER registries between 6/05 and 2/07 were surveyed at 9 months and 4 years. Latina and Black women were oversampled. Partners were surveyed at time 2. Worry about recurrence was regressed on sociodemographics, clinical/treatment, and modifiable factors (e.g., emotional support received from providers) among survivors, partners, and pairs. RESULTS: The final sample included 510 pairs. Partners reported more worry about recurrence than survivors. Compared to Whites, Latinas(os) were more likely to report worry and Blacks were less likely to report worry (all p < 0.05). Partners of survivors who received chemotherapy reported more worry (OR = 2.47 [1.45, 4.22]). Among modifiable factors, survivors and pairs who received more emotional support from providers were less likely to report worry than those who did not receive such support (OR = 0.56 [0.32, 0.97] and OR = 0.45 [0.23, 0.85], respectively). CONCLUSIONS: Early identification of survivors and partners who report considerable worry about recurrence can lead to targeted, culturally sensitive interventions to avoid poorer outcomes. Interventions focused on health care providers offering information on risk and emotional support to survivors and partners are warranted.


Subject(s)
Breast Neoplasms/psychology , Neoplasm Recurrence, Local/psychology , Survivors/psychology , Adult , Aged , Anxiety/psychology , Breast Neoplasms/ethnology , Breast Neoplasms/mortality , Ethnicity , Female , Humans , Middle Aged , Risk , Surveys and Questionnaires , Young Adult
17.
Ann Appl Stat ; 18(3): 1858-1878, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39149424

ABSTRACT

Electronic health records (EHRs) are increasingly recognized as a cost-effective resource for patient recruitment in clinical research. However, how to optimally select a cohort from millions of individuals to answer a scientific question of interest remains unclear. Consider a study to estimate the mean or mean difference of an expensive outcome. Inexpensive auxiliary covariates predictive of the outcome may often be available in patients' health records, presenting an opportunity to recruit patients selectively, which may improve efficiency in downstream analyses. In this paper we propose a two-phase sampling design that leverages available information on auxiliary covariates in EHR data. A key challenge in using EHR data for multiphase sampling is the potential selection bias, because EHR data are not necessarily representative of the target population. Extending existing literature on two-phase sampling design, we derive an optimal two-phase sampling method that improves efficiency over random sampling while accounting for the potential selection bias in EHR data. We demonstrate the efficiency gain from our sampling design via simulation studies and an application evaluating the prevalence of hypertension among U.S. adults leveraging data from the Michigan Genomics Initiative, a longitudinal biorepository in Michigan Medicine.

18.
Cancer Epidemiol Biomarkers Prev ; 32(6): 748-759, 2023 06 01.
Article in English | MEDLINE | ID: mdl-36626383

ABSTRACT

BACKGROUND: Studies have shown an increased risk of severe SARS-CoV-2-related (COVID-19) disease outcome and mortality for patients with cancer, but it is not well understood whether associations vary by cancer site, cancer treatment, and vaccination status. METHODS: Using electronic health record data from an academic medical center, we identified a retrospective cohort of 260,757 individuals tested for or diagnosed with COVID-19 from March 10, 2020, to August 1, 2022. Of these, 52,019 tested positive for COVID-19, of whom 13,752 had a cancer diagnosis. We conducted Firth-corrected logistic regression to assess the association between cancer status, site, treatment, vaccination, and four COVID-19 outcomes: hospitalization, intensive care unit admission, mortality, and a composite "severe COVID" outcome. RESULTS: Cancer diagnosis was significantly associated with higher rates of severe COVID, hospitalization, and mortality. These associations were driven by patients whose most recent initial cancer diagnosis was within the past 3 years. Chemotherapy receipt, colorectal cancer, hematologic malignancies, kidney cancer, and lung cancer were significantly associated with higher rates of worse COVID-19 outcomes. Vaccination was significantly associated with lower rates of worse COVID-19 outcomes regardless of cancer status. CONCLUSIONS: Patients with colorectal cancer, hematologic malignancies, kidney cancer, or lung cancer, or who receive chemotherapy, should be cautious because of their increased risk of worse COVID-19 outcomes, even after vaccination. IMPACT: Additional COVID-19 precautions are warranted for people with certain cancer types and treatments. Significant benefit from vaccination is noted for both cancer and cancer-free patients.


Subject(s)
COVID-19 , Colorectal Neoplasms , Hematologic Neoplasms , Kidney Neoplasms , Lung Neoplasms , Humans , COVID-19/epidemiology , SARS-CoV-2 , Retrospective Studies , Hospitalization , Vaccination
19.
PLoS One ; 18(1): e0279894, 2023.
Article in English | MEDLINE | ID: mdl-36603015

ABSTRACT

The COVID-19 pandemic has highlighted a need for better understanding of countries' vulnerability and resilience to not only pandemics but also disasters, climate change, and other systemic shocks. A comprehensive characterization of vulnerability can inform efforts to improve infrastructure and guide disaster response in the future. In this paper, we propose a data-driven framework for studying countries' vulnerability and resilience to incident disasters across multiple dimensions of society. To illustrate this methodology, we leverage the rich data landscape surrounding the COVID-19 pandemic to characterize observed resilience for several countries (USA, Brazil, India, Sweden, New Zealand, and Israel) as measured by pandemic impacts across a variety of social, economic, and political domains. We also assess how observed responses and outcomes (i.e., resilience) of the COVID-19 pandemic are associated with pre-pandemic characteristics or vulnerabilities, including (1) prior risk for adverse pandemic outcomes due to population density and age and (2) the systems in place prior to the pandemic that may impact the ability to respond to the crisis, including health infrastructure and economic capacity. Our work demonstrates the importance of viewing vulnerability and resilience in a multi-dimensional way, where a country's resources and outcomes related to vulnerability and resilience can differ dramatically across economic, political, and social domains. This work also highlights key gaps in our current understanding about vulnerability and resilience and a need for data-driven, context-specific assessments of disaster vulnerability in the future.


Subject(s)
COVID-19 , Disasters , Humans , COVID-19/epidemiology , Pandemics , Brazil/epidemiology , India
20.
Cancer Inform ; 22: 11769351231183847, 2023.
Article in English | MEDLINE | ID: mdl-37426052

ABSTRACT

Background: In recent years, interest in prognostic calculators for predicting patient health outcomes has grown with the popularity of personalized medicine. These calculators, which can inform treatment decisions, employ many different methods, each of which has advantages and disadvantages. Methods: We present a comparison of a multistate model (MSM) and a random survival forest (RSF) through a case study of prognostic predictions for patients with oropharyngeal squamous cell carcinoma. The MSM is highly structured and takes into account some aspects of the clinical context and knowledge about oropharyngeal cancer, while the RSF can be thought of as a black-box non-parametric approach. Key to this comparison are the high rate of missing values within these data and the different approaches used by the MSM and RSF to handle missingness. Results: We compare the accuracy (discrimination and calibration) of survival probabilities predicted by both approaches and use simulation studies to better understand how predictive accuracy is influenced by the approach to (1) handling missing data and (2) modeling structural/disease progression information present in the data. We conclude that both approaches have similar predictive accuracy, with a slight advantage going to the MSM. Conclusions: Although the MSM shows slightly better predictive ability than the RSF, consideration of other differences is key when selecting the best approach for addressing a specific research question. These differences include the methods' ability to incorporate domain knowledge and to handle missing data, as well as their interpretability and ease of implementation. Ultimately, selecting the statistical method that has the most potential to aid in clinical decisions requires thoughtful consideration of the specific goals.
