Results 1 - 20 of 24
1.
Microbiol Spectr ; 12(7): e0410823, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38832899

ABSTRACT

The rapid spread of antimicrobial resistance (AMR) is a threat to global health, and the nature of co-occurring antimicrobial resistance genes (ARGs) may cause collateral AMR effects once antimicrobial agents are used. Therefore, it is essential to identify which pairs of ARGs co-occur. Given the wealth of next-generation sequencing data available in public repositories, we have investigated the correlation between ARG abundances in a collection of 214,095 metagenomic data sets. Using more than 6.76 × 10^8 read fragments aligned to acquired ARGs to infer pairwise correlation coefficients, we found that more ARGs correlated with each other in human and animal sampling origins than in soil and water environments. Furthermore, we argued that the correlations could serve as risk profiles of co-occurring resistance to critically important antimicrobials (CIAs). Using these profiles, we found evidence of several ARGs conferring resistance to CIAs being co-abundant, such as tetracycline ARGs correlating with most other forms of resistance. In conclusion, this study highlights the important ARG players indirectly involved in shaping the resistomes of various environments that can serve as monitoring targets in AMR surveillance programs. IMPORTANCE: Understanding the collateral effects happening in a resistome can reveal previously unknown links between antimicrobial resistance genes (ARGs). Through the analysis of pairwise ARG abundances in 214K metagenomic samples, we observed that the co-abundance is highly dependent on the environmental context and argue that these correlations can be used to show the risk of co-selection occurring in different settings.


Subjects
Anti-Bacterial Agents , Bacteria , Bacterial Drug Resistance , Metagenomics , Humans , Anti-Bacterial Agents/pharmacology , Bacteria/genetics , Bacteria/drug effects , Bacteria/classification , Bacterial Drug Resistance/genetics , Animals , Bacterial Genes/genetics , Soil Microbiology , High-Throughput Nucleotide Sequencing , Metagenome/genetics
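
The pairwise correlation step described in the abstract of record 1 can be sketched as follows. The abstract does not say which correlation coefficient was used, so Pearson's r is an assumption here, and the gene labels (`arg_a`, `arg_b`, `arg_c`) and abundance values are invented toy data, not results from the study.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical abundances of three ARGs across four samples (toy values).
abundances = {
    "arg_a": [1.0, 2.0, 3.0, 4.0],
    "arg_b": [2.0, 4.0, 6.0, 8.0],
    "arg_c": [4.0, 3.0, 2.0, 1.0],
}

# One coefficient per unordered pair of genes.
pairwise = {
    (g1, g2): pearson(abundances[g1], abundances[g2])
    for g1, g2 in combinations(abundances, 2)
}
```

With p genes this produces p(p-1)/2 coefficients, which is why the number of co-abundant pairs can be compared across sampling origins.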
2.
medRxiv ; 2024 May 08.
Article in English | MEDLINE | ID: mdl-38766190

ABSTRACT

INTRODUCTION: Traditional brain imaging genetics studies have primarily focused on how genetic factors influence the volume of specific brain regions, often neglecting the overall complexity of brain architecture and its genetic underpinnings. METHODS: This study analyzed data from participants across the Alzheimer's disease (AD) continuum from the ALFA and ADNI studies. We exploited compositional data analysis to examine relative brain volumetric variations that (i) differentiate cognitively unimpaired (CU) individuals, defined as amyloid-negative (A-) based on CSF profiling, from those at different AD stages, and (ii) are associated with increased genetic susceptibility to AD, assessed using polygenic risk scores. RESULTS: Distinct brain signatures differentiated CU A- individuals from amyloid-positive MCI and AD. Moreover, disease stage-specific signatures were associated with higher genetic risk of AD. DISCUSSION: The findings underscore the complex interplay between genetics and disease stages in shaping brain structure, which could inform targeted preventive strategies and interventions in preclinical AD.

3.
NAR Genom Bioinform ; 6(2): lqae038, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38666212

ABSTRACT

The growing interest in studying the relationship between the human microbiome and our health has also extended to time-to-event studies where researchers explore the connection between the microbiome and the occurrence of a specific event of interest. The analysis of microbiome data obtained through high-throughput sequencing techniques requires the use of specialized Compositional Data Analysis (CoDA) methods designed to accommodate its compositional nature. There is a limited availability of statistical tools for microbiome analysis that incorporate CoDA, and this is even more pronounced in the context of survival analysis. To fill this methodological gap, we present coda4microbiome for survival studies, a new methodology for the identification of microbial signatures in time-to-event studies. The algorithm implements an elastic-net penalized Cox regression model adapted to compositional covariates. We illustrate the coda4microbiome algorithm for survival studies with a case study about the time to develop type 1 diabetes for non-obese diabetic mice. Our algorithm identified a bacterial signature composed of 21 genera associated with diabetes development. coda4microbiome for survival studies is integrated in the R package coda4microbiome as an extension of the existing functions for cross-sectional and longitudinal studies.
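
A common way to make a regression respect compositional covariates is a log-contrast: a linear combination of log-abundances whose coefficients sum to zero. The abstract of record 3 does not state exactly how the package imposes this, so the sketch below is only an assumption-laden illustration in Python (not the R package's API); the function name `log_contrast` and all numbers are hypothetical. It shows that such a linear predictor gives the same value for raw counts and for proportions.

```python
import math

def log_contrast(composition, coefficients):
    """Linear predictor sum(b_i * log(x_i)) with the zero-sum constraint on b."""
    if not math.isclose(sum(coefficients), 0.0, abs_tol=1e-12):
        raise ValueError("coefficients must sum to zero")
    return sum(b * math.log(x) for b, x in zip(coefficients, composition))

counts = [20.0, 5.0, 75.0]                        # toy counts for three taxa
proportions = [c / sum(counts) for c in counts]   # same sample, rescaled
beta = [0.5, -1.5, 1.0]                           # hypothetical coefficients, sum = 0

score_from_counts = log_contrast(counts, beta)
score_from_proportions = log_contrast(proportions, beta)
```

Because the coefficients sum to zero, the total read count cancels out of the predictor, so a hazard model built on it depends only on relative abundances.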

4.
Front Microbiol ; 14: 1250806, 2023.
Article in English | MEDLINE | ID: mdl-38075858

ABSTRACT

The human microbiome has become an area of intense research due to its potential impact on human health. However, the analysis and interpretation of this data have proven to be challenging due to its complexity and high dimensionality. Machine learning (ML) algorithms can process vast amounts of data to uncover informative patterns and relationships within the data, even with limited prior knowledge. Therefore, there has been a rapid growth in the development of software specifically designed for the analysis and interpretation of microbiome data using ML techniques. These software tools incorporate a wide range of ML algorithms for clustering, classification, regression, or feature selection, to identify microbial patterns and relationships within the data and generate predictive models. This rapid development, with a constant need for new developments and integration of new features, requires efforts to compile, catalog, and classify these tools to create infrastructures and services with easy, transparent, and trustworthy standards. Here we review the state of the art for ML tools applied in human microbiome studies, performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on ML-based software and framework resources currently available for the analysis of microbiome data in humans. The aim is to support microbiologists and biomedical scientists to go deeper into specialized resources that integrate ML techniques and facilitate future benchmarking to create standards for the analysis of microbiome data. The software resources are organized based on the type of analysis they were developed for and the ML techniques they implement. A description of each software tool with examples of usage is provided, including comments about pitfalls and gaps in the usage of ML-based software for microbiome data that need to be considered by developers and users. This review represents an extensive compilation to date, offering valuable insights and guidance for researchers interested in leveraging ML approaches for microbiome analysis.

5.
Front Oncol ; 13: 1155244, 2023.
Article in English | MEDLINE | ID: mdl-37588099

ABSTRACT

Background and objective: Neoadjuvant chemotherapy (NAC) followed by cystectomy is the standard of care in muscle-invasive bladder cancer (MIBC). Pathological response has been associated with longer survival, but no currently available clinicopathological variables can identify patients likely to respond, highlighting the need for predictive biomarkers. We sought to identify a predictive signature of response to NAC integrating clinical score, taxonomic subtype, and gene expression. Material and methods: From 1994 to 2014, pre-treatment tumor samples were collected from MIBC patients (stage T2-4N0/+M0) at two Spanish hospitals. A clinical score was determined based on stage, hydronephrosis and histology. Taxonomic subtypes (BASQ, luminal, and mixed) were identified by immunohistochemistry. A custom set of 41 genes involved in DNA damage repair and immune response was analyzed in 84 patients with the NanoString nCounter platform. Genes related to pathological response were identified by LASSO penalized logistic regression. NAC consisted of cisplatin/methotrexate/vinblastine until 2000, after which most patients received cisplatin/gemcitabine. The capacity of the integrated signature to predict pathological response was assessed with AUC. Overall survival (OS) and disease-specific survival (DSS) were analyzed with the Kaplan-Meier method. Results: LASSO selected eight genes to be included in the signature (RAD51, IFNγ, CHEK1, CXCL9, c-MET, KRT14, HERC2, FOXA1). The highest predictive accuracy was observed with the inclusion in the model of only three genes (RAD51, IFNγ, CHEK1). The integrated clinical-taxonomic-gene expression signature including these three genes had a higher predictive ability (AUC=0.71) than only clinical score plus taxonomic subtype (AUC=0.58) or clinical score alone (AUC=0.56). This integrated signature was also significantly associated with OS (p=0.02) and DSS (p=0.02). Conclusions: We have identified a predictive signature for response to NAC in MIBC patients that integrates the expression of three genes with clinicopathological characteristics and taxonomic subtypes. Prospective studies to validate these results are ongoing.

6.
BMC Bioinformatics ; 24(1): 82, 2023 Mar 06.
Article in English | MEDLINE | ID: mdl-36879227

ABSTRACT

BACKGROUND: One of the main challenges of microbiome analysis is its compositional nature, which, if ignored, can lead to spurious results. Addressing the compositional structure of microbiome data is particularly critical in longitudinal studies where abundances measured at different times can correspond to different sub-compositions. RESULTS: We developed coda4microbiome, a new R package for analyzing microbiome data within the Compositional Data Analysis (CoDA) framework in both cross-sectional and longitudinal studies. The aim of coda4microbiome is prediction; more specifically, the method is designed to identify a model (microbial signature) containing the minimum number of features with the maximum predictive power. The algorithm relies on the analysis of log-ratios between pairs of components, and variable selection is addressed through penalized regression on the "all-pairs log-ratio model", the model containing all possible pairwise log-ratios. For longitudinal data, the algorithm infers dynamic microbial signatures by performing penalized regression over the summary of the log-ratio trajectories (the area under these trajectories). In both cross-sectional and longitudinal studies, the inferred microbial signature is expressed as the (weighted) balance between two groups of taxa, those that contribute positively to the microbial signature and those that contribute negatively. The package provides several graphical representations that facilitate the interpretation of the analysis and the identified microbial signatures. We illustrate the new method with data from a Crohn's disease study (cross-sectional data) and on the developing microbiome of infants (longitudinal data). CONCLUSIONS: coda4microbiome is a new algorithm for identification of microbial signatures in both cross-sectional and longitudinal studies. The algorithm is implemented as an R package that is available at CRAN ( https://cran.r-project.org/web/packages/coda4microbiome/ ) and is accompanied by a vignette with a detailed description of the functions. The website of the project contains several tutorials: https://malucalle.github.io/coda4microbiome/.


Subjects
Algorithms , Microbiota , Infant , Humans , Cross-Sectional Studies , Data Analysis , Longitudinal Studies
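
The "all-pairs log-ratio model" named in the abstract of record 6 starts from every pairwise log-ratio between components. The sketch below builds those features for one sample in plain Python; it is not the coda4microbiome R API, and the function name and the pseudocount used to avoid log(0) are assumptions made for illustration (the package may handle zeros differently).

```python
import math
from itertools import combinations

def all_pairs_log_ratios(counts, pseudocount=1.0):
    """Return {(i, j): log(x_i / x_j)} for every pair of components i < j.

    `pseudocount` is an illustrative assumption for handling zero counts.
    """
    shifted = [c + pseudocount for c in counts]
    return {
        (i, j): math.log(shifted[i] / shifted[j])
        for i, j in combinations(range(len(shifted)), 2)
    }

sample = [10, 40, 0, 150]            # toy counts for four taxa
ratios = all_pairs_log_ratios(sample)
```

With k taxa there are k(k-1)/2 such features, which is why a penalized regression is then needed to select a small subset of them.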
7.
NAR Genom Bioinform ; 2(2): lqaa029, 2020 Jun.
Article in English | MEDLINE | ID: mdl-33575585

ABSTRACT

Though variable selection is one of the most relevant tasks in microbiome analysis, e.g. for the identification of microbial signatures, many studies still rely on methods that ignore the compositional nature of microbiome data. The applicability of compositional data analysis methods has been hampered by the availability of software and the difficulty in interpreting their results. This work is focused on three methods for variable selection that acknowledge the compositional structure of microbiome data: selbal, a forward selection approach for the identification of compositional balances, and clr-lasso and coda-lasso, two penalized regression models for compositional data analysis. This study highlights the link between these methods and brings out some limitations of the centered log-ratio transformation for variable selection. In particular, the fact that it is not subcompositionally consistent makes the microbial signatures obtained from clr-lasso not readily transferable. Coda-lasso is computationally efficient and suitable when the focus is the identification of the most associated microbial taxa. Selbal stands out when the goal is to obtain a parsimonious model with optimal prediction performance, but it is computationally greedy. We provide a reproducible vignette for the application of these methods that will enable researchers to fully leverage their potential in microbiome studies.
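
The remark in the abstract of record 7 that the centered log-ratio (clr) transformation is not subcompositionally consistent can be checked with a small numeric sketch. The composition below is toy data chosen for illustration: dropping one taxon changes the clr value of the remaining taxa, while the log-ratio between any two retained taxa stays the same.

```python
import math

def clr(composition):
    """Centered log-ratio: log of each part minus the mean log of all parts."""
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

full = [1.0, 2.0, 4.0, 8.0]   # toy composition with four taxa
sub = full[:3]                # same sample with the last taxon removed

clr_full = clr(full)
clr_sub = clr(sub)

# The clr value of the first taxon depends on which taxa are present...
clr_changed = not math.isclose(clr_full[0], clr_sub[0])
# ...whereas the log-ratio between the first two taxa is unaffected.
ratio_unchanged = math.isclose(
    clr_full[0] - clr_full[1], clr_sub[0] - clr_sub[1]
)
```

This is one way to see why a signature expressed in clr coefficients may not transfer to a data set that measures a different set of taxa.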

8.
Genomics Inform ; 17(1): e6, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30929407

ABSTRACT

Understanding the role of the microbiome in human health and how it can be modulated is becoming increasingly relevant for preventive medicine and for the medical management of chronic diseases. The development of high-throughput sequencing technologies has boosted microbiome research by enabling the study of microbial genomes and a more precise quantification of microbiome abundances and function. Microbiome data analysis is challenging because it involves high-dimensional, structured, multivariate, sparse data and because of its compositional nature. In this review we outline some of the procedures that are most commonly used for microbiome analysis and that are implemented in R packages. We place particular emphasis on the compositional structure of microbiome data. We describe the principles of compositional data analysis and distinguish between standard methods and those that fit into compositional data analysis.

9.
Genes (Basel) ; 10(3), 2019 Mar 20.
Article in English | MEDLINE | ID: mdl-30897838

ABSTRACT

Omics data integration is already a reality. However, few omics-based algorithms show enough predictive ability to be implemented into clinics or public health domains. Clinical/epidemiological data tend to explain most of the variation of health-related traits, and their joint modeling with omics data is crucial to increase the algorithm's predictive ability. Only a small number of published studies performed a "real" integration of omics and non-omics (OnO) data, mainly to predict cancer outcomes. Challenges in OnO data integration concern the nature and heterogeneity of non-omics data, the possibility of integrating large-scale non-omics data with high-throughput omics data, the relationship between OnO data (i.e., ascertainment bias), the presence of interactions, the fairness of the models, and the presence of subphenotypes. These challenges demand the development and application of new analysis strategies to integrate OnO data. In this contribution we discuss different attempts at OnO data integration in clinical and epidemiological studies. Most of the reviewed papers considered only one type of omics data set, mainly RNA expression data. All selected papers incorporated non-omics data in a low-dimensionality fashion. The integrative strategies used in the identified papers adopted three modeling methods: independent, conditional, and joint modeling. This review presents, discusses, and proposes integrative analytical strategies towards OnO data integration.


Subjects
Computational Biology/methods , Quantitative Trait Loci , Algorithms , Genetic Predisposition to Disease , Genomics , Humans , Genetic Models , Prognosis , RNA Sequence Analysis
10.
Cancer Cell ; 30(1): 27-42, 2016 Jul 11.
Article in English | MEDLINE | ID: mdl-27321955

ABSTRACT

Non-muscle-invasive bladder cancer (NMIBC) is a heterogeneous disease with widely different outcomes. We performed a comprehensive transcriptional analysis of 460 early-stage urothelial carcinomas and showed that NMIBC can be subgrouped into three major classes with basal- and luminal-like characteristics and different clinical outcomes. Large differences in biological processes such as the cell cycle, epithelial-mesenchymal transition, and differentiation were observed. Analysis of transcript variants revealed frequent mutations in genes encoding proteins involved in chromatin organization and cytoskeletal functions. Furthermore, mutations in well-known cancer driver genes (e.g., TP53 and ERBB2) were primarily found in high-risk tumors, together with APOBEC-related mutational signatures. The identification of subclasses in NMIBC may offer better prognostication and treatment selection based on subclass assignment.


Subjects
Tumor Biomarkers/genetics , Gene Expression Profiling/methods , Mutation , RNA Sequence Analysis/methods , Urinary Bladder Neoplasms/genetics , Urinary Bladder Neoplasms/pathology , APOBEC Deaminases/genetics , Cluster Analysis , Female , Neoplastic Gene Expression Regulation , Genetic Predisposition to Disease , Humans , Male , Neoplasm Staging , Long Noncoding RNA/genetics , Survival Analysis
11.
IEEE/ACM Trans Comput Biol Bioinform ; 13(6): 1100-1106, 2016 Nov.
Article in English | MEDLINE | ID: mdl-28055892

ABSTRACT

The goal of Genome-wide Association Studies (GWAS) is the identification of genetic variants, usually single nucleotide polymorphisms (SNPs), that are associated with disease risk. However, SNPs detected so far with GWAS for most common diseases only explain a small proportion of their total heritability. Gene set analysis (GSA) has been proposed as an alternative to single-SNP analysis with the aim of improving the power of genetic association studies. Nevertheless, most GSA methods rely on expensive computational procedures that make their implementation in GWAS infeasible. We propose a new GSA method, referred to as globalEVT, which uses extreme value theory to derive gene-level p-values. GlobalEVT dramatically reduces the computational requirements compared to other GSA approaches. In addition, this new approach improves power by allowing different inheritance models for each genetic variant, as illustrated in the simulation study performed, and allows for correlation between the SNPs. Real data analysis of an attention-deficit/hyperactivity disorder (ADHD) study illustrates the importance of using GSA approaches for exploring new susceptibility genes. Specifically, the globalEVT method is able to detect genes related to cyclophilin A-like domain proteins, which are known to play an important role in the mechanisms of ADHD development.


Subjects
Algorithms , Genome-Wide Association Study/methods , Attention Deficit Disorder with Hyperactivity/genetics , Child , Computer Simulation , Female , Humans , Male , Statistical Models , Single Nucleotide Polymorphism/genetics
12.
Biom J ; 56(5): 901-11, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25082012

ABSTRACT

Gene set analysis (GSA) aims to assess the overall association of a set of genetic variants with a phenotype and has the potential to detect subtle effects of variants in a gene or a pathway that might be missed when assessed individually. We present a new implementation of the Adaptive Rank Truncated Product method (ARTP) for analyzing the association of a set of Single Nucleotide Polymorphisms (SNPs) in a gene or pathway. The new implementation, referred to as globalARTP, improves the original one by allowing the different SNPs in the set to have different modes of inheritance. We perform a simulation study for exploring the power of the proposed methodology in a set of scenarios with different numbers of causal SNPs with different effect sizes. Moreover, we show the advantage of using the gene set approach in the context of an Alzheimer's disease case-control study where we explore the endocytosis pathway. The new method is implemented in the R function globalARTP of the globalGSA package available at http://cran.r-project.org.


Subjects
Genome-Wide Association Study/methods , Genetic Models , Alzheimer Disease/genetics , Case-Control Studies , Computer Simulation , Endocytosis , Genetic Variation , Genotype , Humans , Phenotype , Single Nucleotide Polymorphism/genetics
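
The building block of a rank truncated product method, as named in the abstract of record 12, is the product of the K smallest p-values in a SNP set. The sketch below computes only that statistic on the negative log scale for a few truncation points; the adaptive choice of K and the permutation-based calibration are deliberately omitted, and nothing here reproduces the globalGSA R package. The p-values are invented toy numbers.

```python
import math

def rank_truncated_product(p_values, k):
    """Negative log of the product of the k smallest p-values.

    Larger values indicate stronger joint evidence from the top k SNPs.
    """
    smallest = sorted(p_values)[:k]
    return -sum(math.log(p) for p in smallest)

snp_p_values = [0.20, 0.01, 0.50, 0.04, 0.90]   # toy single-SNP p-values
statistics = {k: rank_truncated_product(snp_p_values, k) for k in (1, 2, 3)}
```

In an adaptive procedure, each of these statistics would be converted to a p-value by permutation and the smallest one would be taken over the candidate truncation points, with a further permutation layer to correct for that selection.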
13.
PLoS One ; 9(5): e89952, 2014.
Article in English | MEDLINE | ID: mdl-24818791

ABSTRACT

INTRODUCTION: Germline variants in TP63 have been consistently associated with several tumors, including bladder cancer, indicating the importance of the TP53 pathway in cancer genetic susceptibility. However, variants in other related genes, including TP53 rs1042522 (Arg72Pro), still present controversial results. We carried out an in-depth assessment of associations between common germline variants in the TP53 pathway and bladder cancer risk. MATERIAL AND METHODS: We investigated 184 tagSNPs from 18 genes in 1,058 cases and 1,138 controls from the Spanish Bladder Cancer/EPICURO Study. Cases were newly diagnosed bladder cancer patients during 1998-2001. Hospital controls were age-, gender-, and area-matched to cases. SNPs were genotyped in blood DNA using Illumina Golden Gate and TaqMan assays. Cases were subphenotyped according to stage/grade and tumor p53 expression. We applied classical tests to assess individual SNP associations and the Least Absolute Shrinkage and Selection Operator (LASSO)-penalized logistic regression analysis to assess multiple SNPs simultaneously. RESULTS: Based on classical analyses, SNPs in BAK1 (1), IGF1R (5), P53AIP1 (1), PMAIP1 (2), SERPINB5 (3), TP63 (3), and TP73 (1) showed significant associations at p-value≤0.05. However, no evidence of association, either with overall risk or with specific disease subtypes, was observed after correction for multiple testing (p-value≥0.8). LASSO selected the SNP rs6567355 in SERPINB5 with 83% reproducibility. This SNP provided an OR = 1.21, 95%CI 1.05-1.38, p-value = 0.006, and a corrected p-value = 0.5 when controlling for over-estimation. DISCUSSION: We found no strong evidence that common variants in the TP53 pathway are associated with bladder cancer susceptibility. Our study suggests that it is unlikely that TP53 Arg72Pro is implicated in urothelial carcinoma of the bladder (UCB) in white Europeans. SERPINB5 and TP63 variation deserve further exploration in extended studies.


Subjects
Tumor Suppressor Protein p53/genetics , Urinary Bladder Neoplasms/genetics , Adult , Aged , Aged 80 and over , Female , Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Genotype , Humans , Male , Middle Aged , Single Nucleotide Polymorphism/genetics , Young Adult
14.
Genet Epidemiol ; 38(5): 467-76, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24796258

ABSTRACT

To build a predictive model for urothelial carcinoma of the bladder (UCB) risk combining both genomic and nongenomic data, 1,127 cases and 1,090 controls from the Spanish Bladder Cancer/EPICURO study were genotyped using the HumanHap 1M SNP array. After quality control filters, genotypes from 475,290 variants were available. Nongenomic information comprised age, gender, region, and smoking status. Three Bayesian threshold models were implemented including: (1) only genomic information, (2) only nongenomic data, and (3) both sources of information. The three models were applied to the whole population, to only nonsmokers, to male smokers, and to extreme phenotypes to potentiate the UCB genetic component. The area under the ROC curve was used to evaluate the predictive ability of each model in a 10-fold cross-validation scenario. Smoking status showed the highest predictive ability of UCB risk (AUCtest = 0.62). On the other hand, the AUC of all genetic variants was poorer (0.53). When the extreme phenotype approach was applied, the predictive ability of the genomic model improved by 15%. This study represents a first attempt to build a predictive model for UCB risk combining both genomic and nongenomic data and applying state-of-the-art statistical approaches. However, the lack of genetic relatedness among individuals, the complexity of UCB etiology, as well as a relatively small statistical power, may explain the low predictive ability for UCB risk. The study confirms the difficulty of predicting complex diseases using genetic data, and suggests the limited translational potential of findings from this type of data into public health interventions.


Subjects
Genetic Predisposition to Disease/genetics , Human Genome/genetics , Urinary Bladder Neoplasms/genetics , Adult , Aged , Aged 80 and over , Bayes Theorem , Case-Control Studies , Female , Genotype , Humans , Male , Middle Aged , Genetic Models , Phenotype , Single Nucleotide Polymorphism/genetics , ROC Curve , Risk Factors , Smoking/adverse effects
15.
PLoS One ; 8(12): e83745, 2013.
Article in English | MEDLINE | ID: mdl-24391818

ABSTRACT

The relationship between inflammation and cancer is well established in several tumor types, including bladder cancer. We performed an association study between 886 inflammatory-gene variants and bladder cancer risk in 1,047 cases and 988 controls from the Spanish Bladder Cancer (SBC)/EPICURO Study. A preliminary exploration with the widely used univariate logistic regression approach did not identify any significant SNP after correcting for multiple testing. We further applied two more comprehensive methods to capture the complexity of bladder cancer genetic susceptibility: Bayesian Threshold LASSO (BTL), a regularized regression method, and AUC-Random Forest (AUC-RF), a machine-learning algorithm. Both approaches explore the joint effect of markers. BTL analysis identified a signature of 37 SNPs in 34 genes showing an association with bladder cancer. AUC-RF detected an optimal predictive subset of 56 SNPs. Thirteen SNPs were identified by both methods in the total population. Using resources from the Texas Bladder Cancer study we were able to replicate 30% of the SNPs assessed. The associations between inflammatory SNPs and bladder cancer were reexamined among non-smokers to eliminate the effect of tobacco, one of the strongest and most prevalent environmental risk factors for this tumor. A 9-SNP signature was detected by BTL. Here we report, for the first time, a set of SNPs in inflammatory genes jointly associated with bladder cancer risk. These results highlight the importance of the complex structure of genetic susceptibility associated with cancer risk.


Subjects
Bayes Theorem , Tumor Biomarkers/genetics , Inflammation Mediators/analysis , Inflammation/genetics , Single Nucleotide Polymorphism/genetics , Smoking/genetics , Urinary Bladder Neoplasms/etiology , Adult , Aged , Aged 80 and over , Algorithms , Artificial Intelligence , Case-Control Studies , Female , Follow-Up Studies , Genetic Predisposition to Disease , Humans , Male , Middle Aged , Prognosis , Risk Factors , Smoking/adverse effects , Texas , Young Adult
16.
Stat Med ; 31(3): 287-300, 2012 Feb 10.
Article in English | MEDLINE | ID: mdl-22161505

ABSTRACT

We propose a multistate modeling approach to describe the observed evolution of patients diagnosed with non-muscle-invasive bladder cancer. On the basis of data from the Spanish Bladder Cancer/EPICURO study, we adjust a multistate model taking into account the disease-related events of interest (recurrence, progression, and disease-related deaths) as well as competing deaths due to other causes. We then develop a dynamic predictive process for bladder cancer progression, which allows the risk of a patient to be updated whenever new information on his or her evolution is available. By using specific measures of prospective accuracy in the presence of competing risks, the proposed dynamic model has been shown to improve prediction accuracy and provides a more personalized management of bladder cancer patients.


Subjects
Carcinoma/mortality , Disease Progression , Biological Models , Survival Analysis , Urinary Bladder Neoplasms/mortality , Aged , Female , Humans , Male , Middle Aged , Risk , Spain/epidemiology
17.
Hum Hered ; 72(2): 121-32, 2011.
Article in English | MEDLINE | ID: mdl-21996641

ABSTRACT

OBJECTIVE: Genomic profiling, the use of genetic variants at multiple loci simultaneously for the prediction of disease risk, requires the selection of a set of genetic variants that best predicts disease status. The goal of this work was to provide a new selection algorithm for genomic profiling. METHODS: We propose a new algorithm for genomic profiling based on optimizing the area under the receiver operating characteristic curve (AUC) of the random forest (RF). The proposed strategy implements a backward elimination process based on the initial ranking of variables. RESULTS AND CONCLUSIONS: We demonstrate the advantage of using the AUC instead of the classification error as a measure of predictive accuracy of RF. In particular, we show that the use of the classification error is especially inappropriate when dealing with unbalanced data sets. The new procedure for variable selection and prediction, namely AUC-RF, is illustrated with data from a bladder cancer study and also with simulated data. The algorithm is publicly available as an R package, named AUCRF, at http://cran.r-project.org/.


Subjects
Algorithms , Area Under Curve , Computational Biology , Gene Expression Profiling/methods , Software , Humans , Internet , Single Nucleotide Polymorphism , ROC Curve , Reproducibility of Results , Risk Factors , Urinary Bladder Neoplasms/genetics
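
The point made in the abstract of record 17, that classification error can mislead on unbalanced data while the AUC does not, can be seen in a few lines. The sketch below is plain Python with toy labels and scores (one case, nine controls); it is not the AUCRF R package, and the helper names and the 0.5 threshold are assumptions for illustration.

```python
def auc(scores, labels):
    """Probability that a random case outscores a random control.

    Ties count one half; this is the Mann-Whitney form of the AUC.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))

def error_rate(scores, labels, threshold=0.5):
    """Fraction of subjects misclassified when predicting 1 for score >= threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p != y for p, y in zip(predictions, labels)) / len(labels)

labels = [1] + [0] * 9            # unbalanced toy data: 1 case, 9 controls
constant_scores = [0.1] * 10      # uninformative model: same score for everyone

uninformative_error = error_rate(constant_scores, labels)
uninformative_auc = auc(constant_scores, labels)
```

Here the uninformative model misclassifies only one subject in ten, which looks good, yet its AUC is 0.5, the value expected from random ranking.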
18.
Brief Bioinform ; 12(1): 86-9, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20360022

ABSTRACT

The goal of this article (letter to the editor) is to emphasize the value of exploring ranking stability when using the importance measures, mean decrease accuracy (MDA) and mean decrease Gini (MDG), provided by Random Forest. We illustrate with a real and a simulated example that ranks based on the MDA are unstable to small perturbations of the dataset and ranks based on the MDG provide more robust results.


Subjects
Artificial Intelligence , Algorithms , Computer Simulation , Factual Databases , Gene Expression Profiling
19.
Ann Hum Genet ; 75(1): 78-89, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21158747

ABSTRACT

Analyzing the combined effects of genes and/or environmental factors on the development of complex diseases is a great challenge from both the statistical and computational perspective, even using a relatively small number of genetic and nongenetic exposures. Several data-mining methods have been proposed for interaction analysis; among them, the Multifactor Dimensionality Reduction (MDR) method has proven its utility in a variety of theoretical and practical settings. Model-Based Multifactor Dimensionality Reduction (MB-MDR), a relatively new MDR-based technique that is able to unify the best of both nonparametric and parametric worlds, was developed to address some of the remaining concerns that go along with an MDR analysis. These include the restriction to univariate, dichotomous traits, the absence of flexible ways to adjust for lower-order effects and important confounders, and the difficulty in highlighting epistatic effects when too many multilocus genotype cells are pooled into two new genotype groups. We investigate the empirical power of MB-MDR to detect gene-gene interactions in the absence of any noise and in the presence of genotyping error, missing data, phenocopy, and genetic heterogeneity. Power is generally higher for MB-MDR than for MDR, in particular in the presence of genetic heterogeneity, phenocopy, or low minor allele frequencies.


Subjects
Disease/genetics , Genetic Epistasis , Genetic Models , Multifactor Dimensionality Reduction , Case-Control Studies , Computer Simulation
20.
Bioinformatics ; 26(17): 2198-9, 2010 Sep 01.
Article in English | MEDLINE | ID: mdl-20595460

ABSTRACT

SUMMARY: We describe mbmdr, an R package for implementing the model-based multifactor dimensionality reduction (MB-MDR) method. MB-MDR has been proposed by Calle et al. as a dimension reduction method for exploring gene-gene interactions in case-control association studies. It is an extension of the popular multifactor dimensionality reduction (MDR) method of Ritchie et al., allowing a more flexible definition of risk cells. In MB-MDR, risk categories are defined using a regression model, which allows adjustment for covariates and main effects. In addition to the classical low-risk and high-risk categories, MB-MDR considers a third category of indeterminate or non-informative cells. An important improvement of the current mbmdr algorithm over the original MB-MDR formulation in Calle et al., and over the classical MDR approach, is the extension of the methodology to different outcome types. While MB-MDR was initially proposed for binary traits in the context of case-control studies, the mbmdr package provides options to analyze both binary and quantitative traits for unrelated individuals. AVAILABILITY: http://cran.r-project.org/.


Subjects
Algorithms , Genetic Epistasis , Genetic Models , Software , Case-Control Studies , Gene Expression , Genotype , Single Nucleotide Polymorphism , Quantitative Trait Loci , Regression Analysis