Search | BVS Regional Portal

Machine Learning-Based Analysis of Glioma Grades Reveals Co-Enrichment.

Garbulowski, Mateusz; Smolinska, Karolina; Çabuk, Ugur; Yones, Sara A; Celli, Ludovica; Yaz, Esma Nur; Barrenäs, Fredrik; Diamanti, Klev; Wadelius, Claes; Komorowski, Jan.

Cancers (Basel) ; 14(4), 2022 Feb 17.

Article in English | MEDLINE | ID: mdl-35205761

ABSTRACT

Gliomas develop and grow in the brain and central nervous system. Examining glioma grading processes is valuable for addressing therapeutic challenges. One of the most extensive repositories of transcriptomics data for gliomas is The Cancer Genome Atlas (TCGA). However, such large cohorts should be processed with caution and evaluated thoroughly, as they can contain batch and other effects. Furthermore, the biological mechanisms of cancer involve interactions among biomarkers. Thus, we applied an interpretable machine learning approach to discover such relationships. This type of transparent learning not only provides good predictive performance but also reveals co-predictive mechanisms among features. In this study, we corrected the strong and confounded batch effect in the TCGA glioma data. We further used the corrected datasets to perform a comprehensive machine learning analysis applied to single-sample gene set enrichment scores using collections from the Molecular Signatures Database. Furthermore, using rule-based classifiers, we displayed networks of co-enrichment related to glioma grades. Moreover, we validated our results using external glioma cohorts. We believe that utilizing the corrected glioma cohorts from TCGA may improve the application and validation of future studies. Finally, the co-enrichment and survival analyses provided detailed explanations of glioma progression and, consequently, should support targeted treatment.

Transcriptomic analysis reveals proinflammatory signatures associated with acute myeloid leukemia progression.

Stratmann, Svea; Yones, Sara A; Garbulowski, Mateusz; Sun, Jitong; Skaftason, Aron; Mayrhofer, Markus; Norgren, Nina; Herlin, Morten Krogh; Sundström, Christer; Eriksson, Anna; Höglund, Martin; Palle, Josefine; Abrahamsson, Jonas; Jahnukainen, Kirsi; Munthe-Kaas, Monica Cheng; Zeller, Bernward; Tamm, Katja Pokrovskaja; Cavelier, Lucia; Komorowski, Jan; Holmfeldt, Linda.

Blood Adv ; 6(1): 152-164, 2022 01 11.

Article in English | MEDLINE | ID: mdl-34619772

ABSTRACT

Numerous studies have been performed over the last decade to exploit the complexity of genomic and transcriptomic lesions driving the initiation of acute myeloid leukemia (AML). These studies have helped improve risk classification and treatment options. Detailed molecular characterization of longitudinal AML samples is sparse, however; meanwhile, relapse and therapy resistance represent the main challenges in AML care. To this end, we performed transcriptome-wide RNA sequencing of longitudinal diagnosis, relapse, and/or primary resistant samples from 47 adult and 23 pediatric AML patients with known mutational background. Gene expression analysis revealed the association of short event-free survival with overexpression of GLI2 and IL1R1, as well as downregulation of ST18. Moreover, CR1 downregulation and DPEP1 upregulation were associated with AML relapse both in adults and children. Finally, machine learning-based and network-based analysis identified overexpressed CD6 and downregulated INSR as highly copredictive genes depicting important relapse-associated characteristics among adult patients with AML. Our findings highlight the importance of a tumor-promoting inflammatory environment in leukemia progression, as indicated by several of the herein identified differentially expressed genes. Together, this knowledge provides the foundation for novel personalized drug targets and has the potential to maximize the benefit of current treatments to improve cure rates in AML.

Subjects

Acute Myeloid Leukemia , Transcriptome , Adult , Child , Gene Expression Profiling , Genomics , Humans , Acute Myeloid Leukemia/drug therapy , Mutation

Interpretable Machine Learning Reveals Dissimilarities Between Subtypes of Autism Spectrum Disorder.

Garbulowski, Mateusz; Smolinska, Karolina; Diamanti, Klev; Pan, Gang; Maqbool, Khurram; Feuk, Lars; Komorowski, Jan.

Front Genet ; 12: 618277, 2021.

Article in English | MEDLINE | ID: mdl-33719335

ABSTRACT

Autism spectrum disorder (ASD) is a heterogeneous neuropsychiatric disorder with a complex genetic background. Analysis of altered molecular processes in ASD patients requires linear and nonlinear methods that provide interpretable solutions. Interpretable machine learning provides legible models that help explain biological mechanisms and support the analysis of clinical subgroups. In this work, we investigated several case-control studies of gene expression measurements in ASD individuals. We constructed a rule-based learning model from three independent datasets, which we further visualized as a nonlinear gene-gene co-predictive network. To find dissimilarities between ASD subtypes, we scrutinized the topological structure of the network and estimated a centrality distance. Our analysis revealed that autism is the most severe subtype of ASD, while pervasive developmental disorder-not otherwise specified and Asperger syndrome are closely related, milder ASD subtypes. Furthermore, we analyzed the most important ASD-related features, which were described in terms of gene co-predictors. Among others, we found a strong co-predictive mechanism between EMC4 and TMEM30A, which may suggest co-regulation of these genes. The present study demonstrates the potential of applying interpretable machine learning in bioinformatics analyses. Although the proposed methodology was designed for transcriptomics data, it can be applied to other omics disciplines.
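As an illustration only, a rule-based model can be turned into a gene-gene co-predictive network by linking features that appear together in the same rule. The sketch below is a simplified assumption, not the authors' pipeline: the rule format, the discretized levels, the placeholder gene GENE_X and the helper copredictor_edges are all hypothetical, and edges are weighted by plain co-occurrence counts rather than by any rule-quality statistic.

```python
from collections import Counter
from itertools import combinations

# Hypothetical rules: (conditions, predicted class). Each condition maps a
# feature to a discretized level. Values and GENE_X are illustrative only.
RULES = [
    ({"EMC4": "up", "TMEM30A": "down"}, "autism"),
    ({"EMC4": "up", "TMEM30A": "down", "GENE_X": "up"}, "autism"),
    ({"GENE_X": "down"}, "control"),
]


def copredictor_edges(rules, decision=None):
    """Count how often each pair of features co-occurs in one rule.

    If `decision` is given, only rules predicting that class are used.
    Returns a Counter keyed by alphabetically ordered feature pairs.
    """
    edges = Counter()
    for conditions, outcome in rules:
        if decision is not None and outcome != decision:
            continue
        # Every pair of features in the same rule becomes (or reinforces) an edge.
        for a, b in combinations(sorted(conditions), 2):
            edges[(a, b)] += 1
    return edges


edges = copredictor_edges(RULES, decision="autism")
for (a, b), weight in edges.most_common():
    print(a, b, weight)
```

Single-condition rules contribute no edges, so the network only shows features that act jointly in at least one rule.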

R.ROSETTA: an interpretable machine learning framework.

Garbulowski, Mateusz; Diamanti, Klev; Smolinska, Karolina; Baltzer, Nicholas; Stoll, Patricia; Bornelöv, Susanne; Øhrn, Aleksander; Feuk, Lars; Komorowski, Jan.

BMC Bioinformatics ; 22(1): 110, 2021 Mar 06.

Article in English | MEDLINE | ID: mdl-33676405

ABSTRACT

BACKGROUND: Machine learning involves strategies and algorithms that may assist bioinformatics analyses in terms of data mining and knowledge discovery. In several applications, e.g., in the life sciences, it is often more important to understand how a prediction was obtained than to know what prediction was made. To this end, so-called interpretable machine learning has recently been advocated. In this study, we implemented an interpretable machine learning package based on rough set theory. An important aim of our work was to provide statistical properties of the models and their components. RESULTS: We present the R.ROSETTA package, which is an R wrapper of the ROSETTA framework. The original ROSETTA functions have been improved and adapted to the R programming environment. The package allows for building and analyzing non-linear interpretable machine learning models. R.ROSETTA gathers combinatorial statistics via rule-based modelling for accessible and transparent results, well suited for adoption within the broader scientific community. The package also provides statistics and visualization tools that help minimize analysis bias and noise. The R.ROSETTA package is freely available at https://github.com/komorowskilab/R.ROSETTA . To illustrate the usage of the package, we applied it to a transcriptome dataset from an autism case-control study. Our tool provided hypotheses for potential co-predictive mechanisms among features that discerned phenotype classes. These co-predictors represented neurodevelopmental and autism-related genes. CONCLUSIONS: R.ROSETTA provides new insights for interpretable machine learning analyses and knowledge-based systems. We demonstrated that our package facilitated the detection of dependencies among autism-related genes. Although the sample application of R.ROSETTA illustrates transcriptome data analysis, the package can be used to analyze any data organized in decision tables.
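Rough set theory, on which ROSETTA is based, describes a decision class through two sets. The lower approximation holds objects that certainly belong, because every object indiscernible from them on the chosen attributes has the same class. The upper approximation holds objects that possibly belong. The following minimal Python sketch computes both for a toy decision table. It does not use or reproduce the R.ROSETTA API, and the table values and function names are hypothetical.

```python
from collections import defaultdict


def indiscernibility_classes(table, attributes):
    """Group object indices whose values agree on every given condition attribute."""
    classes = defaultdict(set)
    for i, row in enumerate(table):
        key = tuple(row[a] for a in attributes)
        classes[key].add(i)
    return list(classes.values())


def approximations(table, attributes, decision, target):
    """Return the (lower, upper) rough set approximations of the class `target`."""
    concept = {i for i, row in enumerate(table) if row[decision] == target}
    lower, upper = set(), set()
    for block in indiscernibility_classes(table, attributes):
        if block <= concept:  # block lies entirely inside the class
            lower |= block
        if block & concept:  # block overlaps the class
            upper |= block
    return lower, upper


# Toy decision table (hypothetical values): two discretized genes and a class label.
TABLE = [
    {"g1": "high", "g2": "low", "class": "case"},
    {"g1": "high", "g2": "low", "class": "control"},
    {"g1": "low", "g2": "low", "class": "control"},
    {"g1": "high", "g2": "high", "class": "case"},
]

lower, upper = approximations(TABLE, ["g1", "g2"], "class", "case")
print("lower:", lower, "upper:", upper, "boundary:", upper - lower)
```

Here objects 0 and 1 are indiscernible on g1 and g2 but carry different labels. They therefore fall into the boundary region (upper minus lower), which is the kind of inconsistency that rough set rule induction has to handle.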

Subjects

Algorithms , Machine Learning , Case-Control Studies , Computational Biology , Data Mining

Coalescence computations for large samples drawn from populations of time-varying sizes.

Polanski, Andrzej; Szczesna, Agnieszka; Garbulowski, Mateusz; Kimmel, Marek.

PLoS One ; 12(2): e0170701, 2017.

Article in English | MEDLINE | ID: mdl-28170404

ABSTRACT

We present new results concerning probability distributions of times in the coalescence tree and expected allele frequencies for the coalescent with large sample sizes. The results are based on computational methodologies that combine coalescence time-scale changes with techniques of integral transformations and use analytical formulae for infinite products. We show applications of the proposed methodologies for computing probability distributions of times in the coalescence tree and their limits, for evaluating the accuracy of approximate expressions for times in the coalescence tree and expected allele frequencies, and for analyzing a large human mitochondrial DNA dataset.
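For orientation, the constant-population-size case has closed forms. Under the standard Kingman coalescent, while k lineages remain, the waiting time T_k is exponential with rate k(k-1)/2 in coalescent time units. This gives E[T_k] = 2/(k(k-1)), an expected time to the most recent common ancestor of 2(1 - 1/n), and an expected total branch length of 2(1 + 1/2 + ... + 1/(n-1)). The sketch below only evaluates this textbook baseline with exact fractions. It assumes a constant population size and does not implement the time-varying-size computations described in the abstract.

```python
from fractions import Fraction


def expected_interval_times(n):
    """E[T_k] for k = n, ..., 2 under the constant-size Kingman coalescent.

    Time is measured in coalescent units; with k lineages the waiting time
    is exponential with rate k(k-1)/2, so its mean is 2/(k(k-1)).
    """
    return {k: Fraction(2, k * (k - 1)) for k in range(n, 1, -1)}


def expected_tmrca(n):
    """Expected time to the most recent common ancestor: sum of E[T_k]."""
    return sum(expected_interval_times(n).values())


def expected_total_branch_length(n):
    """Expected total tree length: k lineages each persist for T_k."""
    return sum(k * t for k, t in expected_interval_times(n).items())


for n in (2, 10, 100):
    print(n, expected_tmrca(n), float(expected_total_branch_length(n)))
```

expected_tmrca(10) evaluates to 9/5. The final interval alone, E[T_2] = 1, accounts for more than half of that total. The telescoping sum also shows why the TMRCA stays below 2 no matter how large the sample is.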

Subjects

Genetic Models , Statistical Models , Algorithms , Mitochondrial DNA , Nucleic Acid Databases , Gene Frequency , Population Genetics , Humans , Likelihood Functions , Probability , Reproducibility of Results

RareVariantVis: new tool for visualization of causative variants in rare monogenic disorders using whole genome sequencing data.

Stokowy, Tomasz; Garbulowski, Mateusz; Fiskerstrand, Torunn; Holdhus, Rita; Labun, Kornel; Sztromwasser, Pawel; Gilissen, Christian; Hoischen, Alexander; Houge, Gunnar; Petersen, Kjell; Jonassen, Inge; Steen, Vidar M.

Bioinformatics ; 32(19): 3018-20, 2016 10 01.

Article in English | MEDLINE | ID: mdl-27288501

ABSTRACT

MOTIVATION: The search for causative genetic variants in rare diseases of presumed monogenic inheritance has been boosted by the implementation of whole exome (WES) and whole genome (WGS) sequencing. In many cases, WGS seems to be superior to WES, but the analysis and visualization of the vast amounts of data are demanding. RESULTS: To address this challenge, we have developed a new tool, RareVariantVis, for the analysis of genome sequence data (including non-coding regions) for both germline and somatic variants. It visualizes variants along their respective chromosomes, providing information about exact chromosomal position, zygosity and frequency, with point-and-click information regarding dbSNP IDs, gene association and variant inheritance. Rare variants as well as de novo variants can be flagged in different colors. We show the performance of the RareVariantVis tool on the Genome in a Bottle WGS data set. AVAILABILITY AND IMPLEMENTATION: https://www.bioconductor.org/packages/3.3/bioc/html/RareVariantVis.html CONTACT: tomasz.stokowy@k2.uib.no SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Exome , Human Genome , Rare Diseases/genetics , DNA Sequence Analysis/methods , Genetic Variation , Humans
