Am J Hum Genet; 107(3): 432-444, 2020 Sep 03.
Article in English | MEDLINE | ID: mdl-32758450


Accurate colorectal cancer (CRC) risk prediction models are critical for identifying individuals at low and high risk of developing CRC, as they can then be offered targeted screening and interventions to address their risks of developing disease (if they are in a high-risk group) and avoid unnecessary screening and interventions (if they are in a low-risk group). As it is likely that thousands of genetic variants contribute to CRC risk, it is clinically important to investigate whether these genetic variants can be used jointly for CRC risk prediction. In this paper, we derived and compared different approaches to generating predictive polygenic risk scores (PRS) from genome-wide association studies (GWASs) including 55,105 CRC-affected case subjects and 65,079 control subjects of European ancestry. We built the PRS in three ways, using (1) 140 previously identified and validated CRC loci; (2) SNP selection based on linkage disequilibrium (LD) clumping followed by machine-learning approaches; and (3) LDpred, a Bayesian approach for genome-wide risk prediction. We tested the PRS in an independent cohort of 101,987 individuals with 1,699 CRC-affected case subjects. The discriminatory accuracy, calculated by the age- and sex-adjusted area under the receiver operating characteristic curve (AUC), was highest for the LDpred-derived PRS (AUC = 0.654) including nearly 1.2 M genetic variants (the proportion of causal genetic variants for CRC assumed to be 0.003), whereas the PRS of the 140 known variants identified from GWASs had the lowest AUC (AUC = 0.629). Based on the LDpred-derived PRS, we are able to identify 30% of individuals without a family history as having risk for CRC similar to those with a family history of CRC, whereas the PRS based on known GWAS variants identified only the top 10% as having a similar relative risk. About 90% of these individuals have no family history and would have been considered average risk under current screening guidelines, but might benefit from earlier screening. The developed PRS offers a way to deliver risk-stratified CRC screening and other targeted interventions.
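
As an illustration of the quantities named in this abstract, the following is a minimal sketch of a polygenic risk score computed as a weighted sum of risk-allele dosages, together with a plain AUC. The function names, toy dosages, and weights are hypothetical; the sketch does not implement LDpred or LD clumping, and the AUC here is unadjusted, whereas the abstract reports an age- and sex-adjusted AUC.

```python
import numpy as np

def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per variant)."""
    return np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)

def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    cases, controls = scores[labels], scores[~labels]
    diff = cases[:, None] - controls[None, :]  # every case-control pair
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

# Toy data (hypothetical): 4 individuals, 3 variants, per-variant weights
# standing in for log odds ratios taken from a GWAS.
dosages = np.array([[2, 1, 0],
                    [0, 0, 1],
                    [1, 2, 2],
                    [0, 1, 0]])
weights = np.array([0.10, 0.05, 0.20])
labels = np.array([1, 0, 1, 0])  # 1 = case, 0 = control

prs = polygenic_risk_score(dosages, weights)
print("PRS:", prs)
print("AUC:", auc(prs, labels))
```

In this toy example both cases score higher than both controls, so the unadjusted AUC is 1.0; real PRS distributions for cases and controls overlap heavily, which is why the reported values sit near 0.65.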

Colorectal Neoplasms/epidemiology , Genetic Predisposition to Disease , Genome, Human/genetics , Risk Assessment , Aged , Asian Continental Ancestry Group/genetics , Bayes Theorem , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , Female , Genome-Wide Association Study , Humans , Male , Middle Aged , Multifactorial Inheritance/genetics , Polymorphism, Single Nucleotide/genetics , Risk Factors
Gastroenterology; 158(5): 1274-1286.e12, 2020 Apr.
Article in English | MEDLINE | ID: mdl-31866242


BACKGROUND & AIMS: Early-onset colorectal cancer (CRC, in persons younger than 50 years old) is increasing in incidence; yet, in the absence of a family history of CRC, this population lacks harmonized recommendations for prevention. We aimed to determine whether a polygenic risk score (PRS) developed from 95 CRC-associated common genetic risk variants was associated with risk for early-onset CRC. METHODS: We studied risk for CRC associated with a weighted PRS in 12,197 participants younger than 50 years old vs 95,865 participants 50 years or older. PRS was calculated based on single nucleotide polymorphisms associated with CRC in a large-scale genome-wide association study as of January 2019. Participants were pooled from 3 large consortia that provided clinical and genotyping data (the Colon Cancer Family Registry, the Colorectal Transdisciplinary Study, and the Genetics and Epidemiology of Colorectal Cancer Consortium) and were all of genetically defined European descent. Findings were replicated in an independent cohort of 72,573 participants. RESULTS: Overall associations with CRC per standard deviation of PRS were significant for early-onset cancer, and were stronger compared with late-onset cancer (P for interaction = .01); when we compared the highest PRS quartile with the lowest, risk increased 3.7-fold for early-onset CRC (95% CI 3.28-4.24) vs 2.9-fold for late-onset CRC (95% CI 2.80-3.04). This association was strongest for participants without a first-degree family history of CRC (P for interaction = 5.61 × 10⁻⁵). When we compared the highest with the lowest quartiles in this group, risk increased 4.3-fold for early-onset CRC (95% CI 3.61-5.01) vs 2.9-fold for late-onset CRC (95% CI 2.70-3.00). Sensitivity analyses were consistent with these findings. CONCLUSIONS: In an analysis of associations with CRC per standard deviation of PRS, we found the cumulative burden of CRC-associated common genetic variants to associate with early-onset cancer, and to be more strongly associated with early-onset than late-onset cancer, particularly in the absence of CRC family history. Analyses of PRS, along with environmental and lifestyle risk factors, might identify younger individuals who would benefit from preventive measures.
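
The highest-versus-lowest quartile comparison reported above can be illustrated with a crude odds ratio from a 2×2 table. This is only a sketch under stated assumptions: the function name and toy scores are hypothetical, quartile cut points are assumed to be taken from the control distribution (the abstract does not say how they were defined), and no covariate adjustment is made, so this is not the authors' analysis.

```python
import numpy as np

def quartile_odds_ratio(prs, is_case):
    """Crude odds ratio of case status, top vs bottom PRS quartile.

    Assumption: quartile cut points come from the controls only.
    """
    prs = np.asarray(prs, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    q1, q3 = np.quantile(prs[~is_case], [0.25, 0.75])
    low, high = prs <= q1, prs > q3
    a = np.sum(high & is_case)    # cases in the top quartile
    b = np.sum(high & ~is_case)   # controls in the top quartile
    c = np.sum(low & is_case)     # cases in the bottom quartile
    d = np.sum(low & ~is_case)    # controls in the bottom quartile
    return float((a * d) / (b * c))

# Toy scores (hypothetical): 8 controls followed by 6 cases.
prs = np.array([1, 2, 3, 4, 5, 6, 7, 8, 1.5, 7.5, 8.5, 9, 6.5, 5])
is_case = np.array([0] * 8 + [1] * 6)

print("Odds ratio, top vs bottom quartile:", quartile_odds_ratio(prs, is_case))
```

With these toy values the cut points are 2.75 and 6.25, giving 4 cases and 2 controls in the top quartile against 1 case and 2 controls in the bottom quartile, for an odds ratio of 4.0. A zero cell would make the ratio undefined; a real analysis would use a regression model with confidence intervals instead.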

Colorectal Neoplasms/genetics , Genetic Predisposition to Disease , Age of Onset , Case-Control Studies , Cohort Studies , DNA Mutational Analysis , Datasets as Topic , Female , Genome-Wide Association Study , Genotyping Techniques , Humans , Life Style , Male , Medical History Taking , Middle Aged , Mutation Rate , Polymorphism, Single Nucleotide , Risk Factors , Whole Genome Sequencing
BMC Bioinformatics; 15: 137, 2014 May 10.
Article in English | MEDLINE | ID: mdl-24886083


BACKGROUND: DNA microarrays are a potentially powerful technology for improving diagnostic classification, treatment selection, and prognostic assessment. The use of this technology to predict cancer outcome has a history of almost a decade. Disease class predictors can be designed for known disease cases and provide diagnostic confirmation or clarify abnormal cases. The main inputs to these class predictors are high-dimensional data with many variables and few observations. Dimensionality reduction of this feature set significantly speeds up the prediction task. Feature selection and feature transformation methods are well-known preprocessing steps in the field of bioinformatics. Several prediction tools are available based on these techniques. RESULTS: Studies show that a well-tuned kernel PCA (KPCA) is an efficient preprocessing step for dimensionality reduction, but the available bandwidth selection method for KPCA is computationally expensive. In this paper, we propose a new data-driven bandwidth selection criterion for KPCA, which is related to least squares cross-validation for kernel density estimation. We propose a new prediction model with a well-tuned KPCA and Least Squares Support Vector Machine (LS-SVM). We estimate the accuracy of the newly proposed model based on 9 case studies. Then, we compare its performance (in terms of test set Area Under the ROC Curve (AUC) and computational time) with other well-known techniques such as whole data set + LS-SVM, PCA + LS-SVM, t-test + LS-SVM, Prediction Analysis of Microarrays (PAM) and Least Absolute Shrinkage and Selection Operator (Lasso). Finally, we assess the performance of the proposed strategy against an existing KPCA parameter tuning algorithm by means of two additional case studies. CONCLUSION: We propose, evaluate, and compare several mathematical/statistical techniques, which apply feature transformation/selection for subsequent classification, and consider their application in medical diagnostics. Both feature selection and feature transformation perform well on classification tasks. Due to the dynamic selection property of feature selection, it is hard to define significant features for the classifier, which predicts classes of future samples. Moreover, the proposed strategy enjoys a distinctive advantage with its relatively lower time complexity.
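
To make the preprocessing step concrete, here is a minimal sketch of kernel PCA with a Gaussian (RBF) kernel. The function names and toy data are hypothetical, and the bandwidth is fixed by hand: the data-driven bandwidth selection criterion proposed in the paper is not reproduced here.

```python
import numpy as np

def rbf_kernel(X, bandwidth):
    """Gaussian kernel matrix exp(-||x_i - x_j||^2 / (2 * bandwidth^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * bandwidth ** 2))

def kernel_pca(X, n_components, bandwidth):
    """Project the training samples onto the leading kernel principal components."""
    n = X.shape[0]
    K = rbf_kernel(X, bandwidth)
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    Kc = J @ K @ J                            # kernel centered in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)     # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]
    lam, vec = eigvals[top], eigvecs[:, top]
    # The score of sample i on component k is sqrt(lambda_k) * v_k[i].
    return vec * np.sqrt(np.maximum(lam, 0.0))

# Toy "microarray" (hypothetical): 20 samples, 50 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 50))
Z = kernel_pca(X, n_components=3, bandwidth=5.0)
print(Z.shape)
```

The reduced matrix `Z` has one row per sample and would then be passed to a classifier such as LS-SVM; the quality of the projection depends strongly on the bandwidth, which is the tuning problem the abstract addresses.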

Oligonucleotide Array Sequence Analysis , Principal Component Analysis , Algorithms , Artificial Intelligence , Humans , Least-Squares Analysis , Neoplasms/classification , Support Vector Machine
BMC Bioinformatics; 15: 411, 2014 Dec 31.
Article in English | MEDLINE | ID: mdl-25551433


BACKGROUND: Clinical data, such as patient history, laboratory analysis, and ultrasound parameters, which are the basis of day-to-day clinical decision support, are often used to guide the clinical management of cancer in the presence of microarray data. Several data fusion techniques are available to integrate genomics or proteomics data, but only a few studies have created a single prediction model using both gene expression and clinical data. These studies often remain inconclusive regarding an obtained improvement in prediction performance. To improve clinical management, these data should be fully exploited. This requires efficient algorithms to integrate these data sets and design a final classifier. LS-SVM classifiers and generalized eigenvalue/singular value decompositions are successfully used in many bioinformatics applications for prediction tasks. Drawing on the benefits of these two techniques, we propose a machine learning approach, a weighted LS-SVM classifier, to integrate two data sources: microarray and clinical parameters. RESULTS: We compared and evaluated the proposed methods on five breast cancer case studies. Compared to an LS-SVM classifier on individual data sets, generalized eigenvalue decomposition (GEVD), and kernel GEVD, the proposed weighted LS-SVM classifier offers good prediction performance, in terms of test area under the ROC curve (AUC), on all breast cancer case studies. CONCLUSIONS: Thus, a clinical classifier weighted with a microarray data set results in significantly improved diagnosis, prognosis, and prediction of response to therapy. The proposed model has been shown to be a promising mathematical framework in both data fusion and non-linear classification problems.
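
One simple way to combine two data sources in an LS-SVM is to train on a weighted sum of one kernel per source. The sketch below follows that idea using the standard LS-SVM dual linear system; it is an assumption made for illustration and not necessarily the weighting scheme of the paper. The mixing weight `w`, the regularization constant `gamma`, the function names, and the toy data are all hypothetical.

```python
import numpy as np

def linear_kernel(A, B):
    """Plain inner-product kernel between the rows of A and the rows of B."""
    return A @ B.T

def train_lssvm(K, y, gamma):
    """Solve the LS-SVM classifier system [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1]."""
    n = len(y)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = np.outer(y, y) * K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]  # alpha, b

def predict_lssvm(K_new_train, y_train, alpha, b):
    """Sign of sum_i alpha_i * y_i * K(x, x_i) + b for each new sample x."""
    return np.sign(K_new_train @ (alpha * y_train) + b)

# Toy data (hypothetical): 6 patients, 2 clinical and 30 expression features.
rng = np.random.default_rng(1)
y = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
clinical = rng.normal(size=(6, 2)) + y[:, None]
microarray = rng.normal(size=(6, 30)) + 0.5 * y[:, None]

w = 0.5  # assumed mixing weight between the two data sources
K = w * linear_kernel(clinical, clinical) + (1 - w) * linear_kernel(microarray, microarray)
alpha, b = train_lssvm(K, y, gamma=10.0)
print(predict_lssvm(K, y, alpha, b))
```

In practice `w` and `gamma` would be chosen by cross-validation, and performance would be judged on held-out data (the abstract reports test AUC), not on the training samples printed here.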

Breast Neoplasms/genetics , Support Vector Machine , Algorithms , Area Under Curve , Artificial Intelligence , Databases, Genetic , Humans , Oligonucleotide Array Sequence Analysis , Prognosis , Software
Article in English | MEDLINE | ID: mdl-26356338


We propose a method, maximum likelihood estimation of generalized eigenvalue decomposition (MLGEVD), that employs a well-known technique relying on the generalization of singular value decomposition (SVD). The main aim of the work is to show the tight equivalence between MLGEVD and generalized ridge regression. This relationship reveals an important mathematical property of GEVD in which the second argument acts as prior information in the model. Thus, we show that MLGEVD allows the incorporation of external knowledge about the quantities of interest into the estimation problem. We illustrate the importance of prior knowledge in clinical decision making and in identifying differentially expressed genes with case studies for which microarray data sets with corresponding clinical/literature information are available. On all three of these case studies, MLGEVD outperformed GEVD on prediction in terms of test area under the ROC curve (test AUC). MLGEVD results in significantly improved diagnosis, prognosis, and prediction of therapy response.
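
For readers unfamiliar with generalized ridge regression, the sketch below shows its textbook closed-form estimator, in which a penalty matrix Q plays the role of prior information. It does not implement MLGEVD or the equivalence derived in the paper; the function name, the penalty matrix, and the toy data are hypothetical.

```python
import numpy as np

def generalized_ridge(X, y, Q, lam):
    """Minimize ||y - X beta||^2 + lam * beta^T Q beta.

    Closed form: beta = (X^T X + lam * Q)^{-1} X^T y.
    """
    return np.linalg.solve(X.T @ X + lam * Q, X.T @ y)

# Toy regression problem (hypothetical): 30 samples, 5 features.
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))
beta_true = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
y = X @ beta_true + 0.1 * rng.normal(size=30)

# Prior knowledge encoded in Q: a large diagonal entry shrinks that coefficient harder.
Q = np.diag([1.0, 1.0, 100.0, 1.0, 1.0])
print(generalized_ridge(X, y, Q, lam=1.0))
```

Setting Q to the identity matrix recovers ordinary ridge regression, and setting `lam` to zero recovers least squares, so Q is where external knowledge about the coefficients enters the estimate.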

Algorithms , Computational Biology/methods , Likelihood Functions , Gene Expression Profiling , Humans , Neoplasms/classification , Neoplasms/genetics , Neoplasms/metabolism