Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 4 de 4
Filtrar
Mais filtros

Base de dados
Tipo de documento
Intervalo de ano de publicação
1.
Nucleic Acids Res ; 50(12): e70, 2022 07 08.
Artigo em Inglês | MEDLINE | ID: mdl-35412634

RESUMO

Discovering rare cancer driver genes is difficult because their mutational frequency is too low for statistical detection by computational methods. EPIMUTESTR is an integrative nearest-neighbor machine learning algorithm that identifies such marginal genes by modeling the fitness of their mutations with the phylogenetic Evolutionary Action (EA) score. Over cohorts of sequenced patients from The Cancer Genome Atlas representing 33 tumor types, EPIMUTESTR detected 214 previously inferred cancer driver genes and 137 new candidates never identified computationally before of which seven genes are supported in the COSMIC Cancer Gene Census. EPIMUTESTR achieved better robustness and specificity than existing methods in a number of benchmark methods and datasets.


Assuntos
Aprendizado de Máquina , Neoplasias , Humanos , Mutação , Neoplasias/genética , Neoplasias/patologia , Oncogenes , Filogenia
2.
Bioinformatics ; 36(10): 3093-3098, 2020 05 01.
Artigo em Inglês | MEDLINE | ID: mdl-31985777

RESUMO

SUMMARY: Feature selection can improve the accuracy of machine-learning models, but appropriate steps must be taken to avoid overfitting. Nested cross-validation (nCV) is a common approach that chooses the classification model and features to represent a given outer fold based on features that give the maximum inner-fold accuracy. Differential privacy is a related technique to avoid overfitting that uses a privacy-preserving noise mechanism to identify features that are stable between training and holdout sets.We develop consensus nested cross-validation (cnCV) that combines the idea of feature stability from differential privacy with nCV. Feature selection is applied in each inner fold and the consensus of top features across folds is used as a measure of feature stability or reliability instead of classification accuracy, which is used in standard nCV. We use simulated data with main effects, correlation and interactions to compare the classification accuracy and feature selection performance of the new cnCV with standard nCV, Elastic Net optimized by cross-validation, differential privacy and private evaporative cooling (pEC). We also compare these methods using real RNA-seq data from a study of major depressive disorder.The cnCV method has similar training and validation accuracy to nCV, but cnCV has much shorter run times because it does not construct classifiers in the inner folds. The cnCV method chooses a more parsimonious set of features with fewer false positives than nCV. The cnCV method has similar accuracy to pEC and cnCV selects stable features between folds without the need to specify a privacy threshold. We show that cnCV is an effective and efficient approach for combining feature selection with classification. AVAILABILITY AND IMPLEMENTATION: Code available at https://github.com/insilico/cncv. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Transtorno Depressivo Maior , Consenso , Humanos , Aprendizado de Máquina , Reprodutibilidade dos Testes , Projetos de Pesquisa
3.
Bioinformatics ; 35(13): 2329-2331, 2019 07 01.
Artigo em Inglês | MEDLINE | ID: mdl-30481259

RESUMO

MOTIVATION: An important challenge in gene expression analysis is to improve hub gene selection to enrich for biological relevance or improve classification accuracy for a given phenotype. In order to incorporate phenotypic context into co-expression, we recently developed an epistasis-expression network centrality method that blends the importance of gene-gene interactions (epistasis) and main effects of genes. Further blending of prior knowledge from functional interactions has the potential to enrich for relevant genes and stabilize classification. RESULTS: We develop two new expression-epistasis centrality methods that incorporate interaction prior knowledge. The first extends our SNPrank (EpistasisRank) method by incorporating a gene-wise prior knowledge vector. This prior knowledge vector informs the centrality algorithm of the inclination of a gene to be involved in interactions by incorporating functional interaction information from the Integrative Multi-species Prediction database. The second method extends Katz centrality to expression-epistasis networks (EpistasisKatz), extends the Katz bias to be a gene-wise vector of main effects and extends the Katz attenuation constant prefactor to be a prior-knowledge vector for interactions. Using independent microarray studies of major depressive disorder, we find that including prior knowledge in network centrality feature selection stabilizes the training classification and reduces over-fitting. AVAILABILITY AND IMPLEMENTATION: Methods and examples provided at https://github.com/insilico/Rinbix and https://github.com/insilico/PriorKnowledgeEpistasisRank. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Software , Algoritmos , Transtorno Depressivo Maior , Epistasia Genética , Redes Reguladoras de Genes , Humanos , Fenótipo
4.
Microorganisms ; 7(3)2019 Mar 14.
Artigo em Inglês | MEDLINE | ID: mdl-30875727

RESUMO

Vaccination is an effective prevention of influenza infection. However, certain individuals develop a lower antibody response after vaccination, which may lead to susceptibility to subsequent infection. An important challenge in human health is to find baseline gene signatures to help identify individuals who are at higher risk for infection despite influenza vaccination. We developed a multi-level machine learning strategy to build a predictive model of vaccine response using pre-vaccination antibody titers and network interactions between pre-vaccination gene expression levels. The first-level baseline-antibody model explains a significant amount of variation in post-vaccination response, especially for subjects with large pre-existing antibody titers. In the second level, we clustered individuals based on pre-vaccination antibody titers to focus gene-based modeling on individuals with lower baseline HAI where additional response variation may be predicted by baseline gene expression levels. In the third level, we used a gene-association interaction network (GAIN) feature selection algorithm to find the best pairs of genes that interact to influence antibody response within each baseline titer cluster. We used ratios of the top interacting genes as predictors to stabilize machine learning model generalizability. We trained and tested the multi-level approach on data with young and older individuals immunized against influenza vaccine in multiple cohorts. Our results indicate that the GAIN feature selection approach improves model generalizability and identifies genes enriched for immunologically relevant pathways, including B Cell Receptor signaling and antigen processing. Using a multi-level approach, starting with a baseline HAI model and stratifying on baseline HAI, allows for more targeted gene-based modeling. We provide an interactive tool that may be extended to other vaccine studies.

SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA