1.
EMBO J ; 41(22): e108040, 2022 11 17.
Article in English | MEDLINE | ID: mdl-36215697

ABSTRACT

The ribonuclease DIS3 is one of the most frequently mutated genes in the hematological cancer multiple myeloma, yet the basis of its tumor suppressor function in this disease remains unclear. Herein, exploiting the TCGA dataset, we found that DIS3 plays a prominent role in the DNA damage response. DIS3 inactivation causes genomic instability by increasing mutational load, and a pervasive accumulation of DNA:RNA hybrids that induces genomic DNA double-strand breaks (DSBs). DNA:RNA hybrid accumulation also prevents binding of the homologous recombination (HR) machinery to double-strand breaks, hampering DSB repair. DIS3-inactivated cells become sensitive to PARP inhibitors, suggestive of a defect in homologous recombination repair. Accordingly, multiple myeloma patient cells mutated for DIS3 harbor an increased mutational burden and a pervasive overexpression of pro-inflammatory interferon, correlating with the accumulation of DNA:RNA hybrids. We propose DIS3 loss in myeloma to be a driving force for tumorigenesis via DNA:RNA hybrid-dependent enhanced genome instability and increased mutational rate. At the same time, DIS3 loss represents a liability that might be therapeutically exploited in patients whose cancer cells harbor DIS3 mutations.


Subject(s)
Multiple Myeloma , Humans , Multiple Myeloma/genetics , Multiple Myeloma/pathology , Ribonucleases/metabolism , Recombinational DNA Repair , Homologous Recombination , Genomic Instability , DNA Repair , DNA/metabolism , RNA , Exosome Multienzyme Ribonuclease Complex/metabolism
2.
Genet Epidemiol ; 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38654400

ABSTRACT

Multigene panel testing now allows efficient testing of many cancer susceptibility genes leading to a larger number of mutation carriers being identified. They need to be counseled about their cancer risk conferred by the specific gene mutation. An important cancer susceptibility gene is PALB2. Multiple studies reported risk estimates for breast cancer (BC) conferred by pathogenic variants in PALB2. Due to the diverse modalities of reported risk estimates (age-specific risk, odds ratio, relative risk, and standardized incidence ratio) and effect sizes, a meta-analysis combining these estimates is necessary to accurately counsel patients with this mutation. However, this is not trivial due to heterogeneity of studies in terms of study design and risk measure. We utilized a recently proposed Bayesian random-effects meta-analysis method that can synthesize estimates from such heterogeneous studies. We applied this method to combine estimates from 12 studies on BC risk for carriers of pathogenic PALB2 mutations. The estimated overall (meta-analysis-based) risk of BC is 12.80% (6.11%-22.59%) by age 50 and 48.47% (36.05%-61.74%) by age 80. Pathogenic mutations in PALB2 make women more susceptible to BC. Our risk estimates can help clinically manage patients carrying pathogenic variants in PALB2.

3.
Blood ; 141(14): 1724-1736, 2023 04 06.
Article in English | MEDLINE | ID: mdl-36603186

ABSTRACT

High-dose melphalan (HDM) improves progression-free survival in multiple myeloma (MM), yet melphalan is a DNA-damaging alkylating agent; therefore, we assessed its mutational effect on surviving myeloma cells by analyzing paired MM samples collected at diagnosis and relapse in the IFM 2009 study. We performed deep whole-genome sequencing on samples from 68 patients, 43 of whom were treated with RVD (lenalidomide, bortezomib, and dexamethasone) and 25 with RVD + HDM. Although the number of mutations was similar at diagnosis in both groups (7137 vs 7230; P = .67), the HDM group had significantly more mutations at relapse (9242 vs 13 383, P = .005). No change in the frequency of copy number alterations or structural variants was observed. The newly acquired mutations were typically associated with DNA damage and double-stranded breaks and were predominantly on the transcribed strand. A machine learning model, using this unique pattern, predicted patients who would receive HDM with high sensitivity, specificity, and positive prediction value. Clonal evolution analysis showed that all patients treated with HDM had clonal selection, whereas a static progression was observed with RVD. A significantly higher percentage of mutations were subclonal in the HDM cohort. Intriguingly, patients treated with HDM who achieved complete remission (CR) had significantly more mutations at relapse yet had similar survival rates as those treated with RVD who achieved CR. This similarity could have been due to HDM relapse samples having significantly more neoantigens. Overall, our study identifies increased genomic changes associated with HDM and provides rationale to further understand clonal complexity.


Subject(s)
Multiple Myeloma , Humans , Multiple Myeloma/drug therapy , Multiple Myeloma/genetics , Multiple Myeloma/diagnosis , Melphalan/therapeutic use , Neoplasm Recurrence, Local/drug therapy , Neoplasm Recurrence, Local/genetics , Bortezomib/therapeutic use , Lenalidomide/therapeutic use , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Chronic Disease , Transplantation, Autologous , Dexamethasone/therapeutic use
4.
J Natl Compr Canc Netw ; 22(3): 158-166, 2024 04.
Article in English | MEDLINE | ID: mdl-38626807

ABSTRACT

BACKGROUND: Pancreatic adenocarcinoma (PC) is a highly lethal malignancy with a survival rate of only 12%. Surveillance is recommended for high-risk individuals (HRIs), but it is not widely adopted. To address this unmet clinical need and drive early diagnosis research, we established the Pancreatic Cancer Early Detection (PRECEDE) Consortium. METHODS: PRECEDE is a multi-institutional international collaboration that has undertaken an observational prospective cohort study. Individuals (aged 18-90 years) are enrolled into 1 of 7 cohorts based on family history and pathogenic germline variant (PGV) status. From April 1, 2020, to November 21, 2022, a total of 3,402 participants were enrolled in 1 of 7 study cohorts, with 1,759 (51.7%) meeting criteria for the highest-risk cohort (Cohort 1). Cohort 1 HRIs underwent germline testing and pancreas imaging by MRI/MR-cholangiopancreatography or endoscopic ultrasound. RESULTS: A total of 1,400 participants in Cohort 1 (79.6%) had completed baseline imaging and were subclassified into 3 groups based on familial PC (FPC; n=670), a PGV and FPC (PGV+/FPC+; n=115), and a PGV with a pedigree that does not meet FPC criteria (PGV+/FPC-; n=615). One HRI was diagnosed with stage IIB PC on study entry, and 35.1% of HRIs harbored pancreatic cysts. Increasing age (odds ratio, 1.05; P<.001) and FPC group assignment (odds ratio, 1.57; P<.001; relative to PGV+/FPC-) were independent predictors of harboring a pancreatic cyst. CONCLUSIONS: PRECEDE provides infrastructure support to increase access to clinical surveillance for HRIs worldwide, while aiming to drive early PC detection advancements through longitudinal standardized clinical data, imaging, and biospecimen captures. Increased cyst prevalence in HRIs with FPC suggests that FPC may infer distinct biological processes. To enable the development of PC surveillance approaches better tailored to risk category, we recommend adoption of subclassification of HRIs into FPC, PGV+/FPC+, and PGV+/FPC- risk groups by surveillance protocols.


Subject(s)
Adenocarcinoma , Pancreatic Neoplasms , Humans , Pancreatic Neoplasms/diagnostic imaging , Pancreatic Neoplasms/epidemiology , Early Detection of Cancer/methods , Prospective Studies , Genetic Predisposition to Disease , Magnetic Resonance Imaging
5.
Biometrics ; 80(2)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38819308

ABSTRACT

Multi-gene panel testing allows many cancer susceptibility genes to be tested quickly at a lower cost making such testing accessible to a broader population. Thus, more patients carrying pathogenic germline mutations in various cancer-susceptibility genes are being identified. This creates a great opportunity, as well as an urgent need, to counsel these patients about appropriate risk-reducing management strategies. Counseling hinges on accurate estimates of age-specific risks of developing various cancers associated with mutations in a specific gene, ie, penetrance estimation. We propose a meta-analysis approach based on a Bayesian hierarchical random-effects model to obtain penetrance estimates by integrating studies reporting different types of risk measures (eg, penetrance, relative risk, odds ratio) while accounting for the associated uncertainties. After estimating posterior distributions of the parameters via a Markov chain Monte Carlo algorithm, we estimate penetrance and credible intervals. We investigate the proposed method and compare with an existing approach via simulations based on studies reporting risks for two moderate-risk breast cancer susceptibility genes, ATM and PALB2. Our proposed method is far superior in terms of coverage probability of credible intervals and mean square error of estimates. Finally, we apply our method to estimate the penetrance of breast cancer among carriers of pathogenic mutations in the ATM gene.


Subject(s)
Bayes Theorem , Genetic Predisposition to Disease , Penetrance , Humans , Genetic Predisposition to Disease/genetics , Ataxia Telangiectasia Mutated Proteins/genetics , Breast Neoplasms/genetics , Female , Fanconi Anemia Complementation Group N Protein/genetics , Computer Simulation , Markov Chains , Neoplasms/genetics , Neoplasms/epidemiology , Tumor Suppressor Proteins/genetics , Risk Assessment/methods , Risk Assessment/statistics & numerical data , Monte Carlo Method , Meta-Analysis as Topic , Germ-Line Mutation , Models, Statistical
6.
Biometrics ; 80(2)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38819314

ABSTRACT

The five discussions of our paper provide several modeling alternatives, extensions, and generalizations that can potentially guide future research in meta-analysis. In this rejoinder, we briefly summarize and comment on some of those points.


Subject(s)
Meta-Analysis as Topic , Neoplasms , Penetrance , Humans , Neoplasms/epidemiology , Models, Statistical , Risk Assessment/statistics & numerical data , Genetic Predisposition to Disease
7.
Stat Med ; 43(9): 1774-1789, 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38396313

ABSTRACT

It is increasingly common to encounter prediction tasks in the biomedical sciences for which multiple datasets are available for model training. Common approaches such as pooling datasets before model fitting can produce poor out-of-study prediction performance when datasets are heterogeneous. Theoretical and applied work has shown multistudy ensembling to be a viable alternative that leverages the variability across datasets in a manner that promotes model generalizability. Multistudy ensembling uses a two-stage stacking strategy which fits study-specific models and estimates ensemble weights separately. This approach ignores, however, the ensemble properties at the model-fitting stage, potentially resulting in performance losses. Motivated by challenges in the estimation of COVID-attributable mortality, we propose optimal ensemble construction, an approach to multistudy stacking whereby we jointly estimate ensemble weights and parameters associated with study-specific models. We prove that limiting cases of our approach yield existing methods such as multistudy stacking and pooling datasets before model fitting. We propose an efficient block coordinate descent algorithm to optimize the loss function. We use our method to perform multicountry COVID-19 baseline mortality prediction. We show that when little data is available for a country before the onset of the pandemic, leveraging data from other countries can substantially improve prediction accuracy. We further compare and characterize the method's performance in data-driven simulations and other numerical experiments. Our method remains competitive with or outperforms multistudy stacking and other earlier methods in the COVID-19 data application and in a range of simulation settings.


Subject(s)
Algorithms , COVID-19 , Humans , Computer Simulation
8.
Genet Epidemiol ; 46(7): 395-414, 2022 10.
Article in English | MEDLINE | ID: mdl-35583099

ABSTRACT

Risk evaluation to identify individuals who are at greater risk of cancer as a result of heritable pathogenic variants is a valuable component of individualized clinical management. Using principles of Mendelian genetics, Bayesian probability theory, and variant-specific knowledge, Mendelian models derive the probability of carrying a pathogenic variant and developing cancer in the future, based on family history. Existing Mendelian models are widely employed, but are generally limited to specific genes and syndromes. However, the upsurge of multigene panel germline testing has spurred the discovery of many new gene-cancer associations that are not presently accounted for in these models. We have developed PanelPRO, a flexible, efficient Mendelian risk prediction framework that can incorporate an arbitrary number of genes and cancers, overcoming the computational challenges that arise because of the increased model complexity. We implement an 11-gene, 11-cancer model, the largest Mendelian model created thus far, based on this framework. Using simulations and a clinical cohort with germline panel testing data, we evaluate model performance, validate the reverse-compatibility of our approach with existing Mendelian models, and illustrate its usage. Our implementation is freely available for research use in the PanelPRO R package.


Subject(s)
Genetic Predisposition to Disease , Neoplasms , Bayes Theorem , Cohort Studies , Humans , Models, Genetic , Neoplasms/genetics
9.
Genet Med ; 25(7): 100837, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37057674

ABSTRACT

PURPOSE: The aim of this study was to describe the clinical impact of commercial laboratories issuing conflicting classifications of genetic variants. METHODS: Results from 2000 patients undergoing a multigene hereditary cancer panel by a single laboratory were analyzed. Clinically significant discrepancies between the laboratory-provided test reports and other major commercial laboratories were identified, including differences between pathogenic/likely pathogenic and variant of uncertain significance (VUS) classifications, via review of ClinVar archives. For patients carrying a VUS, clinical documentation was assessed for evidence of provider awareness of the conflict. RESULTS: Fifty of 975 (5.1%) patients with non-negative results carried a variant with a clinically significant conflict, 19 with a pathogenic/likely pathogenic variant reported in APC or MUTYH, and 31 with a VUS reported in CDKN2A, CHEK2, MLH1, MSH2, MUTYH, RAD51C, or TP53. Only 10 of 28 (36%) patients with a VUS with a clinically significant conflict had a documented discussion by a provider about the conflict. Discrepant counseling strategies were used for different patients with the same variant. Among patients with a CDKN2A variant or a monoallelic MUTYH variant, providers were significantly more likely to make recommendations based on the laboratory-reported classification. CONCLUSION: Our findings highlight the frequency of variant interpretation discrepancies and importance of clinician awareness. Guidance is needed on managing patients with discrepant variants to support accurate risk assessment.


Subject(s)
Genetic Variation , Neoplasms , Humans , Neoplasms/genetics , Laboratories , Genetic Testing/methods , Genetic Predisposition to Disease
10.
Blood ; 138(20): 1980-1985, 2021 11 18.
Article in English | MEDLINE | ID: mdl-34792571

ABSTRACT

Immunoglobulin M (IgM) multiple myeloma (MM) is a rare disease subgroup. Its differentiation from other IgM-producing gammopathies such as Waldenström macroglobulinemia (WM) has not been well characterized but is essential for proper risk assessment and treatment. In this study, we investigated genomic and transcriptomic characteristics of IgM-MM samples using whole-genome and transcriptome sequencing to identify differentiating characteristics from non-IgM-MM and WM. Our results suggest that IgM-MM shares most of its defining structural variants and gene-expression profiling with MM, but has some key characteristics, including t(11;14) translocation, chromosome 6 and 13 deletion as well as distinct molecular and transcription-factor signatures. Furthermore, IgM-MM translocations were predominantly characterized by VHDHJH recombination-induced breakpoints, as opposed to the usual class-switching region breakpoints; coupled with its lack of class switching, these data favor a pre-germinal center origin. Finally, we found elevated expression of clinically relevant targets, including CD20 and Bruton tyrosine kinase, as well as high BCL2/BCL2L1 ratio in IgM-MM, providing potential for targeted therapeutics.


Subject(s)
Immunoglobulin M/genetics , Multiple Myeloma/genetics , Transcriptome , Waldenstrom Macroglobulinemia/genetics , DNA Copy Number Variations , Germinal Center/metabolism , Humans , Multiple Myeloma/diagnosis , Mutation , Translocation, Genetic , Waldenstrom Macroglobulinemia/diagnosis
11.
Mol Cell ; 57(4): 636-647, 2015 Feb 19.
Article in English | MEDLINE | ID: mdl-25699710

ABSTRACT

The mechanisms contributing to transcription-associated genomic instability are both complex and incompletely understood. Although R-loops are normal transcriptional intermediates, they are also associated with genomic instability. Here, we show that BRCA1 is recruited to R-loops that form normally over a subset of transcription termination regions. There it mediates the recruitment of a specific, physiological binding partner, senataxin (SETX). Disruption of this complex led to R-loop-driven DNA damage at those loci as reflected by adjacent γ-H2AX accumulation and ssDNA breaks within the untranscribed strand of relevant R-loop structures. Genome-wide analysis revealed widespread BRCA1 binding enrichment at R-loop-rich termination regions (TRs) of actively transcribed genes. Strikingly, within some of these genes in BRCA1 null breast tumors, there are specific insertion/deletion mutations located close to R-loop-mediated BRCA1 binding sites within TRs. Thus, BRCA1/SETX complexes support a DNA repair mechanism that addresses R-loop-based DNA damage at transcriptional pause sites.


Subject(s)
BRCA1 Protein/physiology , DNA Repair , Models, Genetic , RNA Helicases/physiology , BRCA1 Protein/genetics , BRCA1 Protein/metabolism , DNA Damage , DNA Helicases , HeLa Cells , Humans , Multifunctional Enzymes , RNA Helicases/genetics , RNA Helicases/metabolism , Transcription Termination, Genetic , Transcription, Genetic
12.
Genet Epidemiol ; 45(2): 154-170, 2021 03.
Article in English | MEDLINE | ID: mdl-33000511

ABSTRACT

Estimating the prevalence of rare germline genetic mutations in the general population is of interest as it can inform genetic counseling and risk management. Most studies that estimate the prevalence of mutations are performed in high-risk populations, and each study is designed with differing inclusion criteria, resulting in ascertained populations. Quantifying the effects of ascertainment is necessary to estimate the prevalence in the general population. This quantification is difficult as the inclusion criteria are often based on disease status and/or family history. Combining estimates from multiple studies through a meta-analysis is challenging due to the variety of study designs and ascertainment mechanisms as well as the complexity of quantifying the effect of these mechanisms. We provide guidelines on how to quantify the ascertainment mechanism for a wide range of settings and propose a general approach for conducting a meta-analysis in these complex settings by incorporating study-specific ascertainment mechanisms into a joint likelihood function. We implement the proposed likelihood-based approach using both frequentist and Bayesian methodologies. We evaluate these approaches in simulations and show that the methods are robust and produce unbiased estimates of the prevalence. An advantage of the Bayesian approach is that it can easily incorporate uncertainty in ascertainment probability values. We apply our methods to estimate the prevalence of PALB2 mutations in the United States by combining data from multiple studies and obtain a prevalence estimate of around 0.02%.


Subject(s)
Models, Genetic , Bayes Theorem , Humans , Likelihood Functions , Mutation , Prevalence
13.
Genet Epidemiol ; 45(2): 209-221, 2021 03.
Article in English | MEDLINE | ID: mdl-33030277

ABSTRACT

Germline mutations in many genes have been shown to increase the risk of developing cancer. This risk can vary across families who carry mutations in the same gene due to differences in the specific variants, gene-gene interactions, other susceptibility mutations, environmental factors, and behavioral factors. We develop an analytic tool to explore this heterogeneity using family history data. We propose to evaluate the ratio between the number of observed cancer cases in a family and the number of expected cases under a model where risk is assumed to be the same across families. We perform this analysis for both carriers and noncarriers in each family, using carrier probabilities when carrier statuses are unknown, and visualize the results. We first illustrate the approach in simulated data and then apply it to data on colorectal cancer risk in families carrying mutations in Lynch syndrome genes from Creighton University's Hereditary Cancer Center. We show that colorectal cancer risk in carriers can vary widely across families, and that this variation is not matched by a corresponding variation in the noncarriers from the same families. This suggests that the sources of variation in these families are to be found predominantly in variants harbored in the mutated MMR genes considered, or in variants interacting with them.


Subject(s)
Colorectal Neoplasms, Hereditary Nonpolyposis , Genetic Predisposition to Disease , Colorectal Neoplasms, Hereditary Nonpolyposis/genetics , Humans , Models, Genetic , Mutation
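
The observed-to-expected ratio described in the abstract above (entry 13) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name, the per-member expected probabilities, and the way carrier probabilities enter as weights are all assumptions made for illustration only.

```python
from typing import Optional, Sequence


def observed_expected_ratio(
    affected: Sequence[int],
    expected_prob: Sequence[float],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Observed/expected cancer cases for one family (illustrative sketch).

    affected      -- 1 if the relative developed cancer, else 0
    expected_prob -- each relative's cancer probability under a model in
                     which risk is the same across families (hypothetical input)
    weights       -- assumed weighting, e.g. each relative's carrier
                     probability when carrier status is unknown; defaults to 1
    """
    if weights is None:
        weights = [1.0] * len(affected)
    if not (len(affected) == len(expected_prob) == len(weights)):
        raise ValueError("all inputs must have the same length")

    observed = sum(w * a for w, a in zip(weights, affected))
    expected = sum(w * p for w, p in zip(weights, expected_prob))
    if expected <= 0:
        raise ValueError("expected number of cases must be positive")
    return observed / expected


# Hypothetical four-member family: two observed cases, 1.25 expected.
family_affected = [1, 0, 1, 0]
family_expected = [0.25, 0.25, 0.5, 0.25]
ratio = observed_expected_ratio(family_affected, family_expected)

# Same family, weighting each member by an assumed carrier probability.
carrier_prob = [1.0, 0.5, 1.0, 0.0]
carrier_ratio = observed_expected_ratio(family_affected, family_expected, carrier_prob)
```

A ratio well above 1 would flag a family with more cases than the common-risk model predicts; comparing the carrier-weighted and noncarrier-weighted ratios is the kind of contrast the abstract describes.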
14.
Am J Epidemiol ; 191(7): 1307-1322, 2022 06 27.
Article in English | MEDLINE | ID: mdl-35292800

ABSTRACT

In the Men's Lifestyle Validation Study (2011-2013), we examined the validity and relative validity of a physical activity questionnaire (PAQ), a Web-based 24-hour recall (Activities Completed Over Time in 24 Hours (ACT24)), and an accelerometer by multiple comparison methods. Over the course of 1 year, 609 men completed 2 PAQs, two 7-day accelerometer measurements, at least 1 doubly labeled water (DLW) physical activity level (PAL) measurement (n = 100 with repeat measurements), and 4 ACT24s; they also measured their resting pulse rate. A subset (n = 197) underwent dual-energy x-ray absorptiometry (n = 99 with repeated measurements). The method of triads was used to estimate correlations with true activity using DLW PAL, accelerometry, and the PAQ or ACT24 as alternative comparison measures. Estimated correlations of the PAQ with true activity were 0.60 (95% confidence interval (95% CI): 0.52, 0.68) for total activity, 0.69 (95% CI: 0.61, 0.79) for moderate-to-vigorous physical activity (MVPA), and 0.76 (95% CI: 0.62, 0.93) for vigorous activity. Corresponding correlations for total activity were 0.53 (95% CI: 0.45, 0.63) for the average of 4 ACT24s and 0.68 (95% CI: 0.61, 0.75) for accelerometry. Total activity and MVPA measured by PAQ, ACT24, and accelerometry were all significantly correlated with body fat percentage and resting pulse rate, which are physiological indicators of physical activity. Using a combination of comparison methods, we found the PAQ and accelerometry to have moderate validity for assessing physical activity, especially MVPA, in epidemiologic studies.


Subject(s)
Accelerometry , Exercise , Epidemiologic Studies , Exercise/physiology , Humans , Life Style , Male , Reproducibility of Results , Surveys and Questionnaires
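
For readers unfamiliar with the method of triads mentioned in the abstract above (entry 14), the following Python sketch shows the standard calculation: given the three pairwise correlations among three measures with independent errors, the correlation of each measure with the unobserved truth is the square root of the product of its two pairwise correlations divided by the third. The variable names and the example correlations are hypothetical and are not values from the study.

```python
import math
from typing import Tuple


def triad_validity(r_qa: float, r_qb: float, r_ab: float) -> Tuple[float, float, float]:
    """Method-of-triads validity coefficients (illustrative sketch).

    r_qa, r_qb, r_ab are the pairwise correlations among three measures
    Q (e.g. a questionnaire), A (e.g. accelerometry) and B (e.g. a biomarker).
    Returns the estimated correlation of Q, A and B with true activity.
    Assumes the measurement errors of the three methods are uncorrelated.
    """
    if min(r_qa, r_qb, r_ab) <= 0:
        raise ValueError("all pairwise correlations must be positive")
    rho_q = math.sqrt(r_qa * r_qb / r_ab)
    rho_a = math.sqrt(r_qa * r_ab / r_qb)
    rho_b = math.sqrt(r_qb * r_ab / r_qa)
    return rho_q, rho_a, rho_b


# Hypothetical pairwise correlations, chosen only for illustration.
rho_q, rho_a, rho_b = triad_validity(r_qa=0.4, r_qb=0.3, r_ab=0.5)
```

Because the estimates come from ratios of sample correlations, they can exceed 1 in small samples, which is one reason studies typically report bootstrap confidence intervals alongside them.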
15.
Bioinformatics ; 37(11): 1521-1527, 2021 07 12.
Article in English | MEDLINE | ID: mdl-33245114

ABSTRACT

MOTIVATION: Genomic data are often produced in batches due to practical restrictions, which may lead to unwanted variation in data caused by discrepancies across batches. Such 'batch effects' often have a negative impact on downstream biological analysis and need careful consideration. In practice, batch effects are usually addressed by specifically designed software, which merge the data from different batches, then estimate batch effects and remove them from the data. Here, we focus on classification and prediction problems, and propose a different strategy based on ensemble learning. We first develop prediction models within each batch, then integrate them through ensemble weighting methods. RESULTS: We provide a systematic comparison between these two strategies using studies targeting diverse populations infected with tuberculosis. In one study, we simulated increasing levels of heterogeneity across random subsets of the study, which we treat as simulated batches. We then use the two methods to develop a genomic classifier for the binary indicator of disease status. We evaluate the accuracy of prediction in another independent study targeting a different population cohort. We observed that in independent validation, while merging followed by batch adjustment provides better discrimination at low levels of heterogeneity, our ensemble learning strategy achieves more robust performance, especially at high severity of batch effects. These observations provide practical guidelines for handling batch effects in the development and evaluation of genomic classifiers. AVAILABILITY AND IMPLEMENTATION: The data underlying this article are available in the article and in its online supplementary material. Processed data is available in the GitHub repository with implementation code, at https://github.com/zhangyuqing/bea_ensemble. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome , Genomics , Humans , Machine Learning
16.
Genet Med ; 24(10): 2155-2166, 2022 10.
Article in English | MEDLINE | ID: mdl-35997715

ABSTRACT

PURPOSE: Models used to predict the probability of an individual having a pathogenic homozygous or heterozygous variant in a mismatch repair gene, such as MMRpro, are widely used. Recently, MMRpro was updated with new colorectal cancer penetrance estimates. The purpose of this study was to evaluate the predictive performance of MMRpro and other models for individuals with a family history of colorectal cancer. METHODS: We performed a validation study of 4 models, Leiden, MMRpredict, PREMM5, and MMRpro, using 784 members of clinic-based families from the United States. Predicted probabilities were compared with germline testing results and evaluated for discrimination, calibration, and predictive accuracy. We analyzed several strategies to combine models and improve predictive performance. RESULTS: MMRpro with additional tumor information (MMRpro+) and PREMM5 outperformed the other models in discrimination and predictive accuracy. MMRpro+ was the best calibrated with an observed to expected ratio of 0.98 (95% CI = 0.89-1.08). The combination models showed improvement over PREMM5 and performed similarly to MMRpro+. CONCLUSION: MMRpro+ and PREMM5 performed well in predicting the probability of having a pathogenic homozygous or heterozygous variant in a mismatch repair gene. They serve as useful clinical decision tools for identifying individuals who would benefit greatly from screening and prevention strategies.


Subject(s)
Colorectal Neoplasms, Hereditary Nonpolyposis , DNA Mismatch Repair , Colorectal Neoplasms, Hereditary Nonpolyposis/diagnosis , Colorectal Neoplasms, Hereditary Nonpolyposis/genetics , DNA Mismatch Repair/genetics , Germ-Line Mutation/genetics , Heterozygote , Humans , Mismatch Repair Endonuclease PMS2/genetics , MutL Protein Homolog 1/genetics
17.
Genet Epidemiol ; 44(6): 564-578, 2020 09.
Article in English | MEDLINE | ID: mdl-32506746

ABSTRACT

There are numerous statistical models used to identify individuals at high risk of cancer due to inherited mutations. Mendelian models predict future risk of cancer by using family history with estimated cancer penetrances (age- and sex-specific risk of cancer given the genotype of the mutations) and mutation prevalences. However, there is often residual risk heterogeneity across families even after accounting for the mutations in the model, due to environmental or unobserved genetic risk factors. We aim to improve Mendelian risk prediction by incorporating a frailty model that contains a family-specific frailty vector, impacting the cancer hazard function, to account for this heterogeneity. We use a discrete uniform population frailty distribution and implement a marginalized approach that averages each family's risk predictions over the family's frailty distribution. We apply the proposed approach to improve breast cancer prediction in BRCAPRO, a Mendelian model that accounts for inherited mutations in the BRCA1 and BRCA2 genes to predict breast and ovarian cancer. We evaluate the proposed model's performance in simulations and real data from the Cancer Genetics Network and show improvements in model calibration and discrimination. We also discuss alternative approaches for incorporating frailties and their strengths and limitations.


Subject(s)
Genetic Predisposition to Disease , Models, Genetic , Breast Neoplasms/genetics , Computer Simulation , Female , Genes, BRCA1 , Genes, BRCA2 , Humans , Male , Models, Statistical , Mutation/genetics , Risk Factors
18.
Br J Cancer ; 125(12): 1712-1717, 2021 12.
Article in English | MEDLINE | ID: mdl-34703010

ABSTRACT

INTRODUCTION: Identifying families with an underlying inherited cancer predisposition is a major goal of cancer prevention efforts. Mendelian risk models have been developed to better predict the risk associated with a pathogenic variant of developing breast/ovarian cancer (with BRCAPRO) and the risk of developing pancreatic cancer (PANCPRO). Given that pathogenic variants involving BRCA2 and BRCA1 predispose to all three of these cancers, we developed a joint risk model to capture shared susceptibility. METHODS: We expanded the existing framework for PANCPRO and BRCAPRO to jointly model risk of pancreatic, breast, and ovarian cancer and validated this new model, BRCAPANCPRO, on three data sets each reflecting the common target populations. RESULTS: BRCAPANCPRO outperformed the prior BRCAPRO and PANCPRO models and yielded good discrimination for differentiating BRCA1 and BRCA2 carriers from non-carriers (AUCs 0.79, 95% CI: 0.73-0.84 and 0.70, 95% CI: 0.60-0.80) in families seen in high-risk clinics and pancreatic cancer family registries, respectively. In addition, BRCAPANCPRO was reasonably well calibrated for predicting future risk of pancreatic cancer (observed-to-expected (O/E) ratio = 0.81 [0.69, 0.94]). DISCUSSION: The BRCAPANCPRO model provides improved risk assessment over our previous risk models, particularly for pedigrees with a co-occurrence of pancreatic cancer and breast and/or ovarian cancer.


Subject(s)
Breast Neoplasms/diagnosis , Ovarian Neoplasms/diagnosis , Pancreatic Neoplasms/diagnosis , Female , Humans , Male , Medical History Taking , Risk Assessment
19.
Biostatistics ; 21(2): 253-268, 2020 04 01.
Article in English | MEDLINE | ID: mdl-30202918

ABSTRACT

Cross-study validation (CSV) of prediction models is an alternative to traditional cross-validation (CV) in domains where multiple comparable datasets are available. Although many studies have noted potential sources of heterogeneity in genomic studies, to our knowledge none have systematically investigated their intertwined impacts on prediction accuracy across studies. We employ a hybrid parametric/non-parametric bootstrap method to realistically simulate publicly available compendia of microarray, RNA-seq, and whole metagenome shotgun microbiome studies of health outcomes. Three types of heterogeneity between studies are manipulated and studied: (i) imbalances in the prevalence of clinical and pathological covariates, (ii) differences in gene covariance that could be caused by batch, platform, or tumor purity effects, and (iii) differences in the "true" model that associates gene expression and clinical factors to outcome. We assess model accuracy, while altering these factors. Lower accuracy is seen in CSV than in CV. Surprisingly, heterogeneity in known clinical covariates and differences in gene covariance structure have very limited contributions to the loss of accuracy when validating in new studies. However, forcing identical generative models greatly reduces the within/across study difference. These results, observed consistently for multiple disease outcomes and omics platforms, suggest that the most easily identifiable sources of study heterogeneity are not necessarily the primary ones that undermine the ability to accurately replicate the accuracy of omics prediction models in new studies. Unidentified heterogeneity, such as could arise from unmeasured confounding, may be more important.


Subject(s)
Biostatistics/methods , Genetic Research , Genomics/methods , Models, Biological , Models, Statistical , Genomics/standards , Humans , Metagenome/genetics , Microarray Analysis/methods , Microarray Analysis/standards , Microbiota/genetics , Sequence Analysis, RNA/methods
20.
Stat Med ; 40(3): 593-606, 2021 02 10.
Article in English | MEDLINE | ID: mdl-33120437

ABSTRACT

Commercialized multigene panel testing brings unprecedented opportunities to understand germline genetic contributions to hereditary cancers. Most genetic testing companies classify the pathogenicity of variants as pathogenic, benign, or variants of unknown significance (VUSs). The unknown pathogenicity of VUSs poses serious challenges to clinical decision-making. This study aims to assess the frequency of VUSs that are likely pathogenic in disease-susceptibility genes. Using estimates of probands' probability of having a pathogenic mutation (ie, the carrier score) based on a family history probabilistic risk prediction model, we assume the carrier score distribution for probands with VUSs is a mixture of the carrier score distribution for probands with positive results and the carrier score distribution for probands with negative results. Under this mixture model, we propose a likelihood-based approach to assess the frequency of pathogenicity among probands with VUSs, while accounting for the existence of possible pathogenic mutations on genes not tested. We conducted simulations to assess the performance of the approach and show that under various settings, the approach performs well with very little bias in the estimated proportion of VUSs that are likely pathogenic. We also estimate the positive predictive value across the entire range of carrier scores. We apply our approach to the USC-Stanford Hereditary Cancer Panel Testing cohort, and estimate the proportion of probands that have VUSs in BRCA1/2 that are likely pathogenic to be 10.12% [95%CI: 0%, 43.04%]. This approach will enable clinicians to target high-risk patients who have VUSs, allowing for early prevention interventions.


Subject(s)
Breast Neoplasms , Genetic Predisposition to Disease , Breast Neoplasms/genetics , Female , Genetic Testing , Humans , Likelihood Functions , Mutation , Virulence
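
The two-component mixture idea in the abstract above (entry 20) can be sketched in Python as a maximum-likelihood estimate of a single mixing proportion when both component densities are treated as known. This is a simplified illustration, not the authors' method: it uses a plain EM iteration, the two densities on (0, 1) are hypothetical stand-ins for the carrier-score distributions of positive and negative probands, and it ignores the adjustment for pathogenic mutations in untested genes that the abstract mentions.

```python
from typing import Callable, Sequence


def estimate_pathogenic_fraction(
    scores: Sequence[float],
    f_pos: Callable[[float], float],
    f_neg: Callable[[float], float],
    start: float = 0.5,
    tol: float = 1e-12,
    max_iter: int = 1000,
) -> float:
    """EM estimate of p in the mixture p * f_pos + (1 - p) * f_neg.

    scores -- carrier scores of probands with a VUS (hypothetical data)
    f_pos  -- assumed known score density for positive-result probands
    f_neg  -- assumed known score density for negative-result probands
    """
    if not scores:
        raise ValueError("need at least one carrier score")
    p = start
    for _ in range(max_iter):
        # E-step: posterior probability that each VUS proband is pathogenic.
        resp = [
            p * f_pos(x) / (p * f_pos(x) + (1.0 - p) * f_neg(x))
            for x in scores
        ]
        # M-step: the updated mixing proportion is the mean responsibility.
        p_new = sum(resp) / len(resp)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p


# Hypothetical densities on (0, 1): positives skew high, negatives skew low.
def dens_pos(x: float) -> float:
    return 2.0 * x


def dens_neg(x: float) -> float:
    return 2.0 * (1.0 - x)


vus_scores = [0.2, 0.8]  # made-up carrier scores, for illustration only
p_hat = estimate_pathogenic_fraction(vus_scores, dens_pos, dens_neg, start=0.3)
```

In practice the component densities would themselves be estimated from probands with positive and negative results, and the interval around the estimate can be wide, as the reported 95% CI of 0% to 43.04% suggests.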