Search | BVS CLAP/SMR-OPAS/OMS

R.ROSETTA: an interpretable machine learning framework.

Garbulowski, Mateusz; Diamanti, Klev; Smolinska, Karolina; Baltzer, Nicholas; Stoll, Patricia; Bornelöv, Susanne; Øhrn, Aleksander; Feuk, Lars; Komorowski, Jan.

BMC Bioinformatics ; 22(1): 110, 2021 Mar 06.

Article in English | MEDLINE | ID: mdl-33676405

ABSTRACT

BACKGROUND: Machine learning involves strategies and algorithms that may assist bioinformatics analyses in terms of data mining and knowledge discovery. In several applications, particularly in the life sciences, it is often more important to understand how a prediction was obtained than to know what prediction was made. To this end, so-called interpretable machine learning has recently been advocated. In this study, we implemented an interpretable machine learning package based on rough set theory. An important aim of our work was to provide statistical properties of the models and their components.

RESULTS: We present the R.ROSETTA package, which is an R wrapper of the ROSETTA framework. The original ROSETTA functions have been improved and adapted to the R programming environment. The package allows for building and analyzing non-linear interpretable machine learning models. R.ROSETTA gathers combinatorial statistics via rule-based modelling for accessible and transparent results, well-suited for adoption within the greater scientific community. The package also provides statistics and visualization tools that facilitate minimization of analysis bias and noise. The R.ROSETTA package is freely available at https://github.com/komorowskilab/R.ROSETTA. To illustrate the usage of the package, we applied it to a transcriptome dataset from an autism case-control study. Our tool provided hypotheses for potential co-predictive mechanisms among features that discerned phenotype classes. These co-predictors represented neurodevelopmental and autism-related genes.

CONCLUSIONS: R.ROSETTA provides new insights for interpretable machine learning analyses and knowledge-based systems. We demonstrated that our package facilitated detection of dependencies for autism-related genes. Although the sample application of R.ROSETTA illustrates transcriptome data analysis, the package can be used to analyze any data organized in decision tables.

Subjects

Algorithms; Machine Learning; Case-Control Studies; Computational Biology; Data Mining

Machine Learning-Based Analysis of Glioma Grades Reveals Co-Enrichment.

Garbulowski, Mateusz; Smolinska, Karolina; Çabuk, Ugur; Yones, Sara A; Celli, Ludovica; Yaz, Esma Nur; Barrenäs, Fredrik; Diamanti, Klev; Wadelius, Claes; Komorowski, Jan.

Cancers (Basel) ; 14(4), 2022 Feb 17.

Article in English | MEDLINE | ID: mdl-35205761

ABSTRACT

Gliomas develop and grow in the brain and central nervous system. Examining glioma grading processes is valuable for addressing therapeutic challenges. One of the most extensive repositories storing transcriptomics data for gliomas is The Cancer Genome Atlas (TCGA). However, such big cohorts should be processed with caution and evaluated thoroughly, as they can contain batch and other effects. Furthermore, biological mechanisms of cancer involve interactions among biomarkers. Thus, we applied an interpretable machine learning approach to discover such relationships. This type of transparent learning not only provides good predictability but also reveals co-predictive mechanisms among features. In this study, we corrected the strong and confounded batch effect in the TCGA glioma data. We further used the corrected datasets to perform a comprehensive machine learning analysis of single-sample gene set enrichment scores using collections from the Molecular Signature Database. Furthermore, using rule-based classifiers, we displayed networks of co-enrichment related to glioma grades. Moreover, we validated our results using external glioma cohorts. We believe that utilizing corrected glioma cohorts from TCGA may improve the application and validation of future studies. Finally, the co-enrichment and survival analysis provided detailed explanations for glioma progression and should consequently support targeted treatment.

Functional annotation of noncoding mutations in cancer.

Umer, Husen M; Smolinska, Karolina; Komorowski, Jan; Wadelius, Claes.

Life Sci Alliance ; 4(9), 2021 09.

Article in English | MEDLINE | ID: mdl-34282050

ABSTRACT

In a cancer genome, the noncoding sequence contains the vast majority of somatic mutations. While very few are expected to be cancer drivers, those affecting regulatory elements have the potential to have downstream effects on gene regulation that may contribute to cancer progression. To prioritize regulatory mutations, we screened somatic mutations in the Pan-Cancer Analysis of Whole Genomes cohort of 2,515 cancer genomes on individual bases to assess their potential regulatory roles in their respective cancer types. We found a highly significant enrichment of regulatory mutations associated with the deamination signature overlapping a CpG site in the CCAAT/Enhancer Binding Protein β recognition sites in many cancer types. Overall, 5,749 mutated regulatory elements were identified in 1,844 tumor samples from 39 cohorts containing 11,962 candidate regulatory mutations. Our analysis indicated 20 or more regulatory mutations in 5.5% of the samples, and an overall average of six per tumor. Several recurrent elements were identified, and major cancer-related pathways were significantly enriched for genes near the mutated regulatory elements. Our results provide a detailed view of the role of regulatory elements in cancer genomes.

Subjects

Computational Biology; Genomics; Molecular Sequence Annotation; Mutation; Neoplasms/genetics; Untranslated Regions; Binding Sites; Biomarkers, Tumor; Computational Biology/methods; Disease Susceptibility; Gene Expression Regulation, Neoplastic; Genetic Predisposition to Disease; Genomics/methods; Humans; Mutation Rate; Neoplasms/metabolism; Nucleotide Motifs; Protein Binding; Regulatory Sequences, Nucleic Acid; Signal Transduction; Transcription Factors/metabolism

Interpretable Machine Learning Reveals Dissimilarities Between Subtypes of Autism Spectrum Disorder.

Garbulowski, Mateusz; Smolinska, Karolina; Diamanti, Klev; Pan, Gang; Maqbool, Khurram; Feuk, Lars; Komorowski, Jan.

Front Genet ; 12: 618277, 2021.

Article in English | MEDLINE | ID: mdl-33719335

ABSTRACT

Autism spectrum disorder (ASD) is a heterogeneous neuropsychiatric disorder with a complex genetic background. Analysis of altered molecular processes in ASD patients requires linear and nonlinear methods that provide interpretable solutions. Interpretable machine learning provides legible models that help explain biological mechanisms and support the analysis of clinical subgroups. In this work, we investigated several case-control studies of gene expression measurements of ASD individuals. We constructed a rule-based learning model from three independent datasets, which we further visualized as a nonlinear gene-gene co-predictive network. To find dissimilarities between ASD subtypes, we scrutinized the topological structure of the network and estimated a centrality distance. Our analysis revealed that autism is the most severe subtype of ASD, while pervasive developmental disorder-not otherwise specified and Asperger syndrome are closely related and milder ASD subtypes. Furthermore, we analyzed the most important ASD-related features, which were described in terms of gene co-predictors. Among others, we found a strong co-predictive mechanism between EMC4 and TMEM30A, which may suggest co-regulation between these genes. The present study demonstrates the potential of applying interpretable machine learning in bioinformatics analyses. Although the proposed methodology was designed for transcriptomics data, it can be applied to other omics disciplines.

Genomic characterization of relapsed acute myeloid leukemia reveals novel putative therapeutic targets.

Stratmann, Svea; Yones, Sara A; Mayrhofer, Markus; Norgren, Nina; Skaftason, Aron; Sun, Jitong; Smolinska, Karolina; Komorowski, Jan; Herlin, Morten Krogh; Sundström, Christer; Eriksson, Anna; Höglund, Martin; Palle, Josefine; Abrahamsson, Jonas; Jahnukainen, Kirsi; Munthe-Kaas, Monica Cheng; Zeller, Bernward; Tamm, Katja Pokrovskaja; Cavelier, Lucia; Holmfeldt, Linda.

Blood Adv ; 5(3): 900-912, 2021 02 09.

Article in English | MEDLINE | ID: mdl-33560403

ABSTRACT

Relapse is the leading cause of death of adult and pediatric patients with acute myeloid leukemia (AML). Numerous studies have helped to elucidate the complex mutational landscape at diagnosis of AML, leading to improved risk stratification and new therapeutic options. However, multi-whole-genome studies of adult and pediatric AML at relapse are necessary for further advances. To this end, we performed whole-genome and whole-exome sequencing analyses of longitudinal diagnosis, relapse, and/or primary resistant specimens from 48 adult and 25 pediatric patients with AML. We identified mutations recurrently gained at relapse in ARID1A and CSF1R, both of which represent potentially actionable therapeutic alternatives. Further, we report specific differences in the mutational spectrum between adult vs pediatric relapsed AML, with MGA and H3F3A p.Lys28Met mutations recurrently found at relapse in adults, whereas internal tandem duplications in UBTF were identified solely in children. Finally, our study revealed recurrent mutations in IKZF1, KANSL1, and NIPBL at relapse. All of the mentioned genes have either never been reported at diagnosis in de novo AML or have been reported at low frequency, suggesting important roles for these alterations predominantly in disease progression and/or resistance to therapy. Our findings shed further light on the complexity of relapsed AML and identified previously unappreciated alterations that may lead to improved outcomes through personalized medicine.

Subjects

Leukemia, Myeloid, Acute; Cell Cycle Proteins; Child; Genomics; Humans; Leukemia, Myeloid, Acute/drug therapy; Leukemia, Myeloid, Acute/genetics; Mutation; Precision Medicine; Recurrence

EMQIT: a machine learning approach for energy based PWM matrix quality improvement.

Smolinska, Karolina; Pacholczyk, Marcin.

Biol Direct ; 12(1): 17, 2017 08 01.

Article in English | MEDLINE | ID: mdl-28764727

ABSTRACT

BACKGROUND: Transcription factor (TF) binding affinities to DNA play a key role in gene regulation. Learning the specificity of the mechanisms by which TFs bind to DNA is important both to experimentalists and to theoreticians. With the development of high-throughput methods such as ChIP-seq, the need to provide unbiased models of binding events has become apparent. We present EMQIT, a modification of the approach introduced by Alamanova et al. and later implemented as the 3DTF server. We observed that tuning of the Boltzmann factor weights, used for conversion of calculated energies to nucleotide probabilities, has a significant impact on the quality of the associated PWM matrix.

RESULTS: Consequently, we proposed to use receiver operating characteristic (ROC) curves and 10-fold cross-validation to learn the best weights using experimentally verified data from the TRANSFAC database. We applied our method to data available for various TFs. We verified the efficiency of detecting TF binding sites (TFBSs) by the 3DTF matrices improved with our technique using experimental data from the TRANSFAC database. The comparison showed a significant similarity and comparable performance between the improved and the experimental matrices (TRANSFAC). Improved 3DTF matrices achieved significantly higher AUC values than the original 3DTF matrices (at least by 0.1) and, at the same time, detected notably more experimentally verified TFBSs.

CONCLUSIONS: The resulting improved PWM matrices for the analyzed factors show similarity to TRANSFAC matrices. The matrices had comparable predictive capabilities. Moreover, the improved PWMs achieve better results than matrices downloaded from the 3DTF server. The presented approach is general and applicable to any energy-based matrices. EMQIT is available online at http://biosolvers.polsl.pl:3838/emqit.

REVIEWERS: This article was reviewed by Oliviero Carugo, Marek Kimmel and István Simon.

Subjects

Machine Learning; Position-Specific Scoring Matrices; Transcription Factors/chemistry; Binding Sites; Computational Biology/methods; Datasets as Topic; Gene Expression Regulation; Models, Genetic; Models, Molecular; ROC Curve; Software; Transcription Factors/metabolism
