Results 1-10 of 10
1.
Biometrics ; 80(2)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38682463

ABSTRACT

Inferring the cancer-type specificities of ultra-rare, genome-wide somatic mutations is an open problem. Traditional statistical methods cannot handle such data due to their ultra-high dimensionality and extreme data sparsity. To harness information in rare mutations, we have recently proposed a formal multilevel multilogistic "hidden genome" model. Through its hierarchical layers, the model condenses information in ultra-rare mutations via meta-features embodying mutation contexts to characterize cancer types. Consistent, scalable point estimation of the model can incorporate tens of millions of variants across thousands of tumors and permit impressive prediction and attribution. However, principled statistical inference is infeasible due to the volume, correlation, and noninterpretability of mutation contexts. In this paper, we propose a novel framework that leverages topic models from computational linguistics to achieve dimension reduction of mutation contexts, producing interpretable, decorrelated meta-feature topics. We propose an efficient MCMC algorithm for implementation that permits rigorous full Bayesian inference at a scale that is orders of magnitude beyond the capability of existing out-of-the-box inferential high-dimensional multi-class regression methods and software. Applying our model to the Pan Cancer Analysis of Whole Genomes dataset reveals interesting biological insights, including somatic mutational topics associated with UV exposure in skin cancer, aging in colorectal cancer, and a strong influence of epigenome organization in liver cancer. Under cross-validation, our model demonstrates highly competitive predictive performance against the black-box methods of random forests and deep learning.


Subject(s)
Algorithms , Bayes Theorem , Mutation , Neoplasms , Humans , Neoplasms/genetics , Models, Statistical , Skin Neoplasms/genetics
2.
Genet Epidemiol ; 2024 Apr 30.
Article in English | MEDLINE | ID: mdl-38686586

ABSTRACT

Numerous studies over the past generation have identified germline variants that increase specific cancer risks. Simultaneously, a revolution in sequencing technology has permitted high-throughput annotation of the somatic genomes that characterize individual tumors. However, examining the relationship between germline variants and somatic alteration patterns is greatly complicated by the large number of variants in a typical tumor, the rarity of most individual variants, and the heterogeneity of tumor somatic fingerprints. In this article, we propose a statistical methodology that frames the investigation of germline-somatic relationships in an interpretable manner. The method uses meta-features embodying the biological contexts of individual somatic alterations to implicitly group rare mutations. Our team has previously used this technique in a multilevel regression model to diagnose the tumor site of origin with high accuracy. Herein, we further leverage topic models from computational linguistics to achieve interpretable lower-dimensional embeddings of the meta-features. We demonstrate how the method can identify distinctive somatic profiles linked to specific germline variants or environmental risk factors. We illustrate the method using The Cancer Genome Atlas whole-exome sequencing data to characterize somatic tumor fingerprints in breast cancer patients with germline BRCA1/2 mutations and in head and neck cancer patients exposed to human papillomavirus.

3.
Cancer Res Commun ; 3(3): 483-488, 2023 03.
Article in English | MEDLINE | ID: mdl-36969913

ABSTRACT

Many studies have shown that the distributions of the genomic, nucleotide, and epigenetic contexts of somatic variants in tumors are informative of cancer etiology. Recently, a new direction of research has focused on extracting signals from the contexts of germline variants and evidence has emerged that patterns defined by these factors are associated with oncogenic pathways, histologic subtypes, and prognosis. It remains an open question whether aggregating germline variants using meta-features capturing their genomic, nucleotide, and epigenetic contexts can improve cancer risk prediction. This aggregation approach can potentially increase statistical power for detecting signals from rare variants, which have been hypothesized to be a major source of the missing heritability of cancer. Using germline whole-exome sequencing data from the UK Biobank, we developed risk models for 10 cancer types using known risk variants (cancer-associated SNPs and pathogenic variants in known cancer predisposition genes) as well as models that additionally include the meta-features. The meta-features did not improve the prediction accuracy of models based on known risk variants. It is possible that expanding the approach to whole-genome sequencing can lead to gains in prediction accuracy. Significance: There is evidence that cancer is partly caused by rare genetic variants that have not yet been identified. We investigate this issue using novel statistical methods and data from the UK Biobank.


Subject(s)
Genetic Predisposition to Disease , Neoplasms , Humans , Exome Sequencing , Genetic Predisposition to Disease/genetics , Neoplasms/genetics , Germ-Line Mutation/genetics , Genomics
4.
Cancers (Basel) ; 15(4)2023 Feb 08.
Article in English | MEDLINE | ID: mdl-36831433

ABSTRACT

Accurate risk stratification is key to reducing cancer morbidity through targeted screening and preventative interventions. Multiple breast cancer risk prediction models are used in clinical practice, and often provide a range of different predictions for the same patient. Integrating information from different models may improve the accuracy of predictions, which would be valuable for both clinicians and patients. BRCAPRO is a widely used model that predicts breast cancer risk based on detailed family history information. A major limitation of this model is that it does not consider non-genetic risk factors. To address this limitation, we expand BRCAPRO by combining it with another popular existing model, BCRAT (i.e., Gail), which uses a largely complementary set of risk factors, most of them non-genetic. We consider two approaches for combining BRCAPRO and BCRAT: (1) modifying the penetrance (age-specific probability of developing cancer given genotype) functions in BRCAPRO using relative hazard estimates from BCRAT, and (2) training an ensemble model that takes BRCAPRO and BCRAT predictions as input. Using both simulated data and data from Newton-Wellesley Hospital and the Cancer Genetics Network, we show that the combination models are able to achieve performance gains over both BRCAPRO and BCRAT. In the Cancer Genetics Network cohort, we show that the proposed BRCAPRO + BCRAT penetrance modification model performs comparably to IBIS, an existing model that combines detailed family history with non-genetic risk factors.
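The first combination approach in this abstract adjusts BRCAPRO's age-specific penetrance using a relative hazard from BCRAT. One standard way to apply a relative hazard to a cumulative risk curve is sketched below in Python, assuming proportional hazards, assuming penetrance is expressed as cumulative risk by age, and ignoring competing risks. The function name and baseline values are hypothetical, and this is not necessarily the exact procedure used in the paper; since BRCAPRO penetrance is genotype-specific, such an adjustment would be applied to each genotype's curve.

```python
def adjust_penetrance(baseline_cum_risk: list[float], relative_hazard: float) -> list[float]:
    """Apply a relative hazard to a cumulative risk (penetrance) curve.

    Assumes proportional hazards: survival scales as S(t) = S0(t) ** RH,
    so cumulative risk becomes F(t) = 1 - (1 - F0(t)) ** RH.
    Competing risks (e.g., death from other causes) are ignored here.
    """
    if relative_hazard <= 0:
        raise ValueError("relative_hazard must be positive")
    return [1 - (1 - f) ** relative_hazard for f in baseline_cum_risk]


# Hypothetical cumulative risks by ages 40, 50, 60, 70 for one genotype (not BRCAPRO values).
baseline = [0.01, 0.03, 0.06, 0.10]
adjusted = adjust_penetrance(baseline, relative_hazard=1.5)
print([round(f, 4) for f in adjusted])
```

A relative hazard above 1 raises every point on the curve, a value below 1 lowers it, and a value of exactly 1 leaves the baseline unchanged.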

5.
Ir J Med Sci ; 191(2): 641-650, 2022 Apr.
Article in English | MEDLINE | ID: mdl-33733397

ABSTRACT

BACKGROUND: This study aimed to determine how many female patients who underwent breast imaging meet the eligibility criteria for genetic testing for familial pancreatic cancer (FPC). METHODS: A total of 42,904 patients seen at Newton-Wellesley Hospital between 2007 and 2009 were retrospectively reviewed and assessed against six risk categories. The first four categories were based on pancreatic cancer-associated syndromes: (1) hereditary breast and ovarian cancer (HBOC), (2) Lynch syndrome (LS), (3) familial atypical multiple mole melanoma (FAMMM), and (4) family history of FPC (FH-FPC). Categories (5) and (6) were based on risk scores from the Mendelian risk prediction tools PancPRO and MelaPRO, respectively. RESULTS: In total, 4445 of 42,904 patients were found to be in at least one of the six risk categories. About 5.7% of patients were classified as being at high risk for HBOC, 2.3% for LS, 0.1% for FAMMM, 0.1% for FH-FPC, 2.7% based on PancPRO, and 0.2% based on MelaPRO. CONCLUSION: About 10.4% of the female patients were classified as being at high risk for FPC. This finding emphasizes the importance of applying these criteria to the general population to ensure that high-risk individuals are identified early.
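The overall proportion reported here can be checked directly from the counts. The short Python check below uses only figures quoted in the abstract; it also shows that the per-category percentages sum to more than the overall proportion, which is expected because a patient can fall into more than one category.

```python
# Counts quoted in the abstract (Newton-Wellesley Hospital, 2007-2009).
total_patients = 42_904
in_any_category = 4_445  # patients in at least one of the six risk categories

# Per-category percentages as reported. Categories can overlap,
# so these need not add up to the overall proportion.
reported_pct = {
    "HBOC": 5.7,
    "LS": 2.3,
    "FAMMM": 0.1,
    "FH-FPC": 0.1,
    "PancPRO": 2.7,
    "MelaPRO": 0.2,
}

overall_pct = 100 * in_any_category / total_patients
sum_of_categories = sum(reported_pct.values())

print(f"overall: {overall_pct:.1f}%")                             # overall: 10.4%
print(f"sum of per-category values: {sum_of_categories:.1f}%")    # sum of per-category values: 11.1%
```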


Subject(s)
Colorectal Neoplasms, Hereditary Nonpolyposis , Pancreatic Neoplasms , Colorectal Neoplasms, Hereditary Nonpolyposis/diagnosis , Colorectal Neoplasms, Hereditary Nonpolyposis/genetics , Female , Genetic Predisposition to Disease , Genetic Testing , Humans , Pancreatic Neoplasms/diagnosis , Pancreatic Neoplasms/genetics , Retrospective Studies
6.
Hum Hered ; 86(1-4): 34-44, 2021.
Article in English | MEDLINE | ID: mdl-34718237

ABSTRACT

BACKGROUND: Many cancer types show considerable heritability, and extensive research has been done to identify germline susceptibility variants. Linkage studies have discovered many rare high-risk variants, and genome-wide association studies (GWAS) have discovered many common low-risk variants. However, it is believed that a considerable proportion of the heritability of cancer remains unexplained by known susceptibility variants. The "rare variant hypothesis" proposes that much of the missing heritability lies in rare variants that cannot reliably be detected by linkage analysis or GWAS. Until recently, high sequencing costs had precluded extensive surveys of rare variants, but technological advances have now made it possible to analyze rare variants on a much greater scale. OBJECTIVES: In this study, we investigated associations between rare variants and 14 cancer types. METHODS: We ran association tests using whole-exome sequencing data from The Cancer Genome Atlas (TCGA) and validated the findings using data from the Pan-Cancer Analysis of Whole Genomes Consortium (PCAWG). RESULTS: We identified four significant associations in TCGA, only one of which was replicated in PCAWG (BRCA1 and ovarian cancer). CONCLUSIONS: Our results provide little evidence in favor of the rare variant hypothesis. Much larger sample sizes may be needed to detect undiscovered rare cancer variants.


Subject(s)
Exome , Ovarian Neoplasms , Exome/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Germ Cells , Humans , Exome Sequencing
7.
Nat Commun ; 12(1): 3051, 2021 05 24.
Article in English | MEDLINE | ID: mdl-34031376

ABSTRACT

The vast majority of somatic mutations in a typical cancer are either extremely rare or have never been previously recorded in available databases that track somatic mutations. These constitute a hidden genome that contrasts with the relatively small number of frequently occurring mutations, whose properties have been studied in depth. Here we demonstrate that this hidden genome contains much more accurate information than common mutations for identifying the site of origin of primary cancers in settings where it is unknown. We accomplish this using a projection-based statistical method that achieves highly effective signal condensation by leveraging DNA sequence and epigenetic contexts through a set of meta-features that embody the mutation contexts of rare variants throughout the genome.


Subject(s)
Mutation , Neoplasms/genetics , DNA Repair , Databases, Genetic , Humans , Logistic Models , Machine Learning
8.
Cancers (Basel) ; 14(1)2021 Dec 23.
Article in English | MEDLINE | ID: mdl-35008209

ABSTRACT

(1) Background: The purpose of this study is to compare the performance of four breast cancer risk prediction models by race, molecular subtype, family history of breast cancer, age, and BMI. (2) Methods: Using a cohort of women aged 40-84 without a prior history of breast cancer who underwent screening mammography from 2006 to 2015, we generated breast cancer risk estimates using the Breast Cancer Risk Assessment Tool (BCRAT), BRCAPRO, Breast Cancer Surveillance Consortium (BCSC), and combined BRCAPRO+BCRAT models. Model calibration and discrimination were compared using observed-to-expected ratios (O/E) and the area under the receiver operating characteristic curve (AUC) among patients with at least five years of follow-up. (3) Results: We observed comparable discrimination and calibration across models. There was no significant difference in model performance between Black and White women. Model discrimination was poorer for HER2+ and triple-negative subtypes compared with ER/PR+HER2-. The BRCAPRO+BCRAT model displayed improved calibration and discrimination compared to BRCAPRO among women with a family history of breast cancer. Across models, discriminatory accuracy was greater among obese than non-obese women. When defining high risk as a 5-year risk of 1.67% or greater, models demonstrated discordance in 2.9% to 19.7% of patients. (4) Conclusions: Our results can inform the implementation of risk assessment and risk-based screening among women undergoing screening mammography.
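The discordance figure depends on dichotomizing each model's 5-year risk at 1.67%. A minimal Python sketch of one way to compute it follows, assuming discordance is measured pairwise as the fraction of patients whom two models place on opposite sides of the threshold; the function names are illustrative and the patient values are hypothetical toy numbers, not study data.

```python
from itertools import combinations

HIGH_RISK_THRESHOLD = 0.0167  # 5-year risk of 1.67% or greater counts as high risk


def is_high_risk(five_year_risk: float) -> bool:
    return five_year_risk >= HIGH_RISK_THRESHOLD


def pairwise_discordance(risks: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Fraction of patients classified differently by each pair of models.

    `risks` maps a model name to per-patient 5-year risk estimates,
    listed in the same patient order for every model.
    """
    result = {}
    for a, b in combinations(sorted(risks), 2):
        if len(risks[a]) != len(risks[b]) or not risks[a]:
            raise ValueError("models must score the same, non-empty set of patients")
        disagree = sum(is_high_risk(x) != is_high_risk(y) for x, y in zip(risks[a], risks[b]))
        result[(a, b)] = disagree / len(risks[a])
    return result


# Hypothetical toy estimates for four patients (not study data).
toy = {
    "BCRAT": [0.010, 0.020, 0.017, 0.030],
    "BRCAPRO": [0.012, 0.015, 0.018, 0.031],
    "BCSC": [0.020, 0.021, 0.009, 0.025],
}
print(pairwise_discordance(toy))
# {('BCRAT', 'BCSC'): 0.5, ('BCRAT', 'BRCAPRO'): 0.25, ('BCSC', 'BRCAPRO'): 0.75}
```

Reporting the minimum and maximum over all model pairs would give a range of the same form as the one in the abstract.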

9.
J Natl Cancer Inst ; 112(5): 489-497, 2020 05 01.
Article in English | MEDLINE | ID: mdl-31556450

ABSTRACT

BACKGROUND: Several breast cancer risk-assessment models exist. Few studies have evaluated the predictive accuracy of multiple models in large screening populations. METHODS: We evaluated the performance of the BRCAPRO, Gail, Claus, Breast Cancer Surveillance Consortium (BCSC), and Tyrer-Cuzick models in predicting risk of breast cancer over 6 years among 35 921 women aged 40-84 years who underwent mammography screening at Newton-Wellesley Hospital from 2007 to 2009. We assessed model discrimination using the area under the receiver operating characteristic curve (AUC) and assessed calibration by comparing the ratio of observed-to-expected (O/E) cases. We calculated the square root of the Brier score and positive and negative predictive values of each model. RESULTS: Our results confirmed the good calibration and comparable moderate discrimination of the BRCAPRO, Gail, Tyrer-Cuzick, and BCSC models. The Gail model had a slightly better O/E ratio and AUC (O/E = 0.98, 95% confidence interval [CI] = 0.91 to 1.06, AUC = 0.64, 95% CI = 0.61 to 0.65) compared with BRCAPRO (O/E = 0.94, 95% CI = 0.88 to 1.02, AUC = 0.61, 95% CI = 0.59 to 0.63) and Tyrer-Cuzick (version 8, O/E = 0.84, 95% CI = 0.79 to 0.91, AUC = 0.62, 95% CI = 0.60 to 0.64) in the full study population, and the BCSC model had the highest AUC among women with available breast density information (O/E = 0.97, 95% CI = 0.89 to 1.05, AUC = 0.64, 95% CI = 0.62 to 0.66). All models had poorer predictive accuracy for human epidermal growth factor receptor 2 positive and triple-negative breast cancers than hormone receptor positive human epidermal growth factor receptor 2 negative breast cancers. CONCLUSIONS: In a large cohort of patients undergoing mammography screening, existing risk prediction models had similar, moderate predictive accuracy and good calibration overall.
Models that incorporate additional genetic and nongenetic risk factors and estimate risk of tumor subtypes may further improve breast cancer risk prediction.
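The two headline metrics in this abstract can be sketched in a few lines of Python. The O/E ratio divides the number of observed cases by the sum of predicted risks, and the AUC is the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case (ties count one half). The data below are hypothetical toy values picked to keep the arithmetic readable (they are not realistic 6-year risks), and confidence intervals are omitted.

```python
def observed_to_expected(outcomes: list[int], predicted_risks: list[float]) -> float:
    """O/E ratio: observed cases divided by the expected count (sum of predicted risks)."""
    return sum(outcomes) / sum(predicted_risks)


def auc(outcomes: list[int], scores: list[float]) -> float:
    """AUC as P(score of a random case > score of a random non-case); ties count 1/2."""
    cases = [s for y, s in zip(outcomes, scores) if y == 1]
    controls = [s for y, s in zip(outcomes, scores) if y == 0]
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: 1 = breast cancer diagnosed during follow-up.
y = [0, 0, 1, 0, 1, 0]
risk = [0.2, 0.5, 0.4, 0.1, 0.6, 0.3]
print(observed_to_expected(y, risk))  # about 0.952: 2 observed vs 2.1 expected
print(auc(y, risk))                   # 0.875
```

The pairwise AUC loop is quadratic in cohort size; for tens of thousands of women, a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` gives the same value much faster.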


Subject(s)
Breast Neoplasms/diagnostic imaging , Breast Neoplasms/epidemiology , Risk Assessment/methods , Adult , Aged , Aged, 80 and over , Cohort Studies , Female , Humans , Mammography , Massachusetts/epidemiology , Middle Aged , Models, Statistical , Registries
10.
Radiology ; 292(1): 51-59, 2019 07.
Article in English | MEDLINE | ID: mdl-31063080

ABSTRACT

Background Screening breast MRI is recommended for women with a BRCA mutation or a history of chest radiation, but guidelines are equivocal for MRI screening of women with a personal history of breast cancer or a high-risk lesion. Purpose To evaluate screening breast MRI performance across women with different elevated breast cancer risk indications. Materials and Methods All screening breast MRI examinations performed between 2011 and 2014 underwent retrospective medical record review. Indications for screening were as follows: BRCA mutation carrier or history of chest radiation (BRCA/RT group), family history of breast cancer (FH group), personal history of breast cancer (PH group), and history of high-risk lesion (HRL group). Screening performance metrics were calculated and compared among indications by using logistic regression adjusted for age, available prior MRI, mammographic density, examination year, and multiple risk factors. Results There were 5170 screening examinations in 2637 women (mean age, 52 years; range, 23-86 years); 67 breast cancers were detected. The cancer detection rate (CDR) was highest in the BRCA/RT group (26 per 1000 examinations; 95% confidence interval [CI]: 16, 43 per 1000 examinations), intermediate for those in the PH and HRL groups (12 per 1000 examinations [95% CI: 9, 17 per 1000 examinations] and 15 per 1000 examinations [95% CI: 7, 32 per 1000 examinations], respectively), and lowest for those in the FH group (8 per 1000 examinations; 95% CI: 4, 14 per 1000 examinations). No difference in CDR was evident for the PH or HRL group compared with the BRCA/RT group (P = .14 and .18, respectively). The CDR was lower for the FH group compared with the BRCA/RT group (P = .02). No difference was evident in positive predictive value for biopsies performed (PPV3) for the BRCA/RT group (41%; 95% CI: 26%, 56%) compared with the PH (41%; 95% CI: 31%, 52%; P = .63) or HRL (36%, 95% CI: 17%, 60%; P = .37) groups.
PPV3 was lower for the FH group (14%; 95% CI: 8%, 25%; P = .048). Conclusion Screening breast MRI should be considered for women with a personal history of breast cancer or a high-risk lesion. Worse screening MRI performance in patients with a family history of breast cancer suggests that better risk assessment strategies may benefit these women. © RSNA, 2019.
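The cancer detection rate is the number of screen-detected cancers per 1000 examinations. The Python sketch below applies it to the overall counts reported in the abstract (67 cancers in 5170 examinations) and adds a Wilson score interval as one standard binomial confidence interval. That interval is an assumption, not the authors' method: it treats every examination as independent even though many women contributed more than one examination, so it should not be read as an interval reported by the study.

```python
import math


def rate_per_1000(events: int, exams: int) -> float:
    """Detection rate expressed per 1000 examinations."""
    return 1000 * events / exams


def wilson_interval(events: int, exams: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion, returned per 1000 examinations."""
    p = events / exams
    denom = 1 + z**2 / exams
    centre = (p + z**2 / (2 * exams)) / denom
    half = z * math.sqrt(p * (1 - p) / exams + z**2 / (4 * exams**2)) / denom
    return 1000 * (centre - half), 1000 * (centre + half)


# Overall counts from the abstract: 67 cancers in 5170 screening examinations.
print(f"{rate_per_1000(67, 5170):.1f} per 1000")  # 13.0 per 1000
low, high = wilson_interval(67, 5170)
print(f"95% CI: {low:.1f}, {high:.1f} per 1000")
```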


Subject(s)
Breast Neoplasms/diagnostic imaging , Magnetic Resonance Imaging/methods , Adult , Aged , Aged, 80 and over , Breast/diagnostic imaging , Female , Humans , Middle Aged , Reproducibility of Results , Risk Factors , Young Adult