Search | VHL Regional Portal

GPUTreeShap: massively parallel exact calculation of SHAP scores for tree ensembles.

Mitchell, Rory; Frank, Eibe; Holmes, Geoffrey.

PeerJ Comput Sci ; 8: e880, 2022.

Article in English | MEDLINE | ID: mdl-35494875

ABSTRACT

SHapley Additive exPlanation (SHAP) values (Lundberg & Lee, 2017) provide a game theoretic interpretation of the predictions of machine learning models based on Shapley values (Shapley, 1953). While exact calculation of SHAP values is computationally intractable in general, a recursive polynomial-time algorithm called TreeShap (Lundberg et al., 2020) is available for decision tree models. However, despite its polynomial time complexity, TreeShap can become a significant bottleneck in practical machine learning pipelines when applied to large decision tree ensembles. Unfortunately, the complicated TreeShap algorithm is difficult to map to hardware accelerators such as GPUs. In this work, we present GPUTreeShap, a reformulated TreeShap algorithm suitable for massively parallel computation on graphics processing units. Our approach first preprocesses each decision tree to isolate variable sized sub-problems from the original recursive algorithm, then solves a bin packing problem, and finally maps sub-problems to single-instruction, multiple-thread (SIMT) tasks for parallel execution with specialised hardware instructions. With a single NVIDIA Tesla V100-32 GPU, we achieve speedups of up to 19× for SHAP values, and speedups of up to 340× for SHAP interaction values, over a state-of-the-art multi-core CPU implementation executed on two 20-core Xeon E5-2698 v4 2.2 GHz CPUs. We also experiment with multi-GPU computing using eight V100 GPUs, demonstrating throughput of 1.2 M rows per second; equivalent CPU-based performance is estimated to require 6850 CPU cores.
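The bin packing step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which packing heuristic is used, so the first-fit decreasing heuristic below is an assumption chosen for illustration, as are the function names and the choice of a 32-thread warp as the bin capacity. Each sub-problem is represented only by its size.

```python
from typing import List

# Assumption: one bin corresponds to one 32-thread NVIDIA warp.
WARP_SIZE = 32


def first_fit_decreasing(sizes: List[int], capacity: int = WARP_SIZE) -> List[List[int]]:
    """Pack variable-sized sub-problems into fixed-capacity bins.

    Illustrative heuristic only; not taken from the GPUTreeShap paper.
    """
    bins: List[List[int]] = []   # sub-problem sizes assigned to each bin
    free: List[int] = []         # remaining capacity of each bin
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"sub-problem of size {size} exceeds capacity {capacity}")
        for i, space in enumerate(free):
            if size <= space:    # first bin with enough room wins
                bins[i].append(size)
                free[i] -= size
                break
        else:                    # no existing bin fits: open a new one
            bins.append([size])
            free.append(capacity - size)
    return bins


def utilisation(bins: List[List[int]], capacity: int = WARP_SIZE) -> float:
    """Fraction of available slots that are occupied across all bins."""
    if not bins:
        return 0.0
    return sum(sum(b) for b in bins) / (len(bins) * capacity)


if __name__ == "__main__":
    packed = first_fit_decreasing([20, 12, 16, 16, 8, 4])
    print(packed, round(utilisation(packed), 3))
```

Higher utilisation means fewer idle threads per bin, which is the motivation for packing sub-problems before mapping them to SIMT tasks.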

Deep learning in diabetic foot ulcers detection: A comprehensive evaluation.

Yap, Moi Hoon; Hachiuma, Ryo; Alavi, Azadeh; Brüngel, Raphael; Cassidy, Bill; Goyal, Manu; Zhu, Hongtao; Rückert, Johannes; Olshansky, Moshe; Huang, Xiao; Saito, Hideo; Hassanpour, Saeed; Friedrich, Christoph M; Ascher, David B; Song, Anping; Kajita, Hiroki; Gillespie, David; Reeves, Neil D; Pappachan, Joseph M; O'Shea, Claire; Frank, Eibe.

Comput Biol Med ; 135: 104596, 2021 Aug.

Article in English | MEDLINE | ID: mdl-34247133

ABSTRACT

There has been a substantial amount of research involving computer methods and technology for the detection and recognition of diabetic foot ulcers (DFUs), but there is a lack of systematic comparisons of state-of-the-art deep learning object detection frameworks applied to this problem. DFUC2020 provided participants with a comprehensive dataset consisting of 2,000 images for training and 2,000 images for testing. This paper summarizes the results of DFUC2020 by comparing the deep learning-based algorithms proposed by the winning teams: Faster R-CNN, three variants of Faster R-CNN and an ensemble method; YOLOv3; YOLOv5; EfficientDet; and a new Cascade Attention Network. For each deep learning method, we provide a detailed description of model architecture, parameter settings for training and additional stages including pre-processing, data augmentation and post-processing. We provide a comprehensive evaluation for each method. All the methods required a data augmentation stage to increase the number of images available for training and a post-processing stage to remove false positives. The best performance was obtained from Deformable Convolution, a variant of Faster R-CNN, with a mean average precision (mAP) of 0.6940 and an F1-Score of 0.7434. Finally, we demonstrate that the ensemble method based on different deep learning methods can enhance the F1-Score but not the mAP.
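For readers unfamiliar with the F1-Score reported above, the following minimal sketch shows how it is derived from detection counts. The function name and the example counts are hypothetical and are not taken from DFUC2020; mAP is not computed here because it additionally requires ranked confidence scores.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(precision_recall_f1(tp=80, fp=20, fn=20))
```

Because removing false positives raises precision without changing recall, a post-processing stage of the kind described in the abstract can raise the F1-Score.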

Subjects

Deep Learning; Diabetes Mellitus; Diabetic Foot; Algorithms; Diabetic Foot/diagnosis; Humans; Research Design

The DFUC 2020 Dataset: Analysis Towards Diabetic Foot Ulcer Detection.

Cassidy, Bill; Reeves, Neil D; Pappachan, Joseph M; Gillespie, David; O'Shea, Claire; Rajbhandari, Satyan; Maiya, Arun G; Frank, Eibe; Boulton, Andrew Jm; Armstrong, David G; Najafi, Bijan; Wu, Justina; Kochhar, Rupinder Singh; Yap, Moi Hoon.

touchREV Endocrinol ; 17(1): 5-11, 2021 Apr.

Article in English | MEDLINE | ID: mdl-35118441

ABSTRACT

Every 20 seconds a limb is amputated somewhere in the world due to diabetes. This is a global health problem that requires a global solution. The International Conference on Medical Image Computing and Computer Assisted Intervention challenge, which concerns the automated detection of diabetic foot ulcers (DFUs) using machine learning techniques, will accelerate the development of innovative healthcare technology to address this unmet medical need. In an effort to improve patient care and reduce the strain on healthcare systems, recent research has focused on the creation of cloud-based detection algorithms. These can be consumed as a service by a mobile app that patients (or a carer, partner or family member) could use themselves at home to monitor their condition and to detect the appearance of a DFU. Collaborative work between Manchester Metropolitan University, Lancashire Teaching Hospitals and the Manchester University NHS Foundation Trust has created a repository of 4,000 DFU images for the purpose of supporting research toward more advanced methods of DFU detection. This paper presents a dataset description and analysis, assessment methods, benchmark algorithms and initial evaluation results. It facilitates the challenge by providing useful insights into state-of-the-art and ongoing research.

Introducing Machine Learning Concepts with WEKA.

Smith, Tony C; Frank, Eibe.

Methods Mol Biol ; 1418: 353-78, 2016.

Article in English | MEDLINE | ID: mdl-27008023

ABSTRACT

This chapter presents an introduction to data mining with machine learning. It gives an overview of various types of machine learning, along with some examples. It explains how to download, install, and run the WEKA data mining toolkit on a simple data set, then proceeds to explain how one might approach a bioinformatics problem. Finally, it includes a brief summary of machine learning algorithms for other types of data mining problems, and provides suggestions about where to find additional information.

Subjects

Computational Biology/methods; Data Mining/methods; Machine Learning; Software; Algorithms; Databases, Genetic

DNA methylation-associated colonic mucosal immune and defense responses in treatment-naïve pediatric ulcerative colitis.

Harris, R Alan; Nagy-Szakal, Dorottya; Mir, Sabina A V; Frank, Eibe; Szigeti, Reka; Kaplan, Jess L; Bronsky, Jiri; Opekun, Antone; Ferry, George D; Winter, Harland; Kellermayer, Richard.

Epigenetics ; 9(8): 1131-7, 2014 Aug.

Article in English | MEDLINE | ID: mdl-24937444

ABSTRACT

Inflammatory bowel diseases (IBD) are emerging globally, indicating that environmental factors may be important in their pathogenesis. Colonic mucosal epigenetic changes, such as DNA methylation, can occur in response to the environment and have been implicated in IBD pathology. However, mucosal DNA methylation has not been examined in treatment-naïve patients. We studied DNA methylation in untreated, left-sided colonic biopsy specimens using the Infinium HumanMethylation450 BeadChip array. We analyzed 22 control (C) patients, 15 untreated Crohn's disease (CD) patients, and 9 untreated ulcerative colitis (UC) patients from two cohorts. Samples obtained at the time of clinical remission from two of the treatment-naïve UC patients were also included in the analysis. UC-specific gene expression was interrogated in a subset of adjacent samples (5 C and 5 UC) using the Affymetrix GeneChip PrimeView Human Gene Expression Arrays. Only treatment-naïve UC separated from control. One hundred and twenty genes with significant expression change in UC (>2-fold, P < 0.05) were associated with differentially methylated regions (DMRs). Epigenetically associated gene expression changes (including gene expression changes in the IFITM1, ITGB2, S100A9, SLPI, SAA1, and STAT3 genes) were linked to colonic mucosal immune and defense responses. These findings underscore the relationship between epigenetic changes and inflammation in pediatric treatment-naïve UC and may have potential etiologic, diagnostic, and therapeutic relevance for IBD.

Subjects

Colitis, Ulcerative/genetics; Colon/immunology; Crohn Disease/genetics; DNA Methylation/immunology; Intestinal Mucosa/immunology; Adolescent; Case-Control Studies; Child; Child, Preschool; Colitis, Ulcerative/immunology; Crohn Disease/immunology; Epigenesis, Genetic; Female; Gene Expression; Humans; Immunity, Mucosal; Male; Young Adult

A study of hierarchical and flat classification of proteins.

Zimek, Arthur; Buchwald, Fabian; Frank, Eibe; Kramer, Stefan.

IEEE/ACM Trans Comput Biol Bioinform ; 7(3): 563-71, 2010.

Article in English | MEDLINE | ID: mdl-20671325

ABSTRACT

Automatic classification of proteins using machine learning is an important problem that has received significant attention in the literature. One feature of this problem is that expert-defined hierarchies of protein classes exist and can potentially be exploited to improve classification performance. In this article, we investigate empirically whether this is the case for two such hierarchies. We compare multiclass classification techniques that exploit the information in those class hierarchies and those that do not, using logistic regression, decision trees, bagged decision trees, and support vector machines as the underlying base learners. In particular, we compare hierarchical and flat variants of ensembles of nested dichotomies. The latter have been shown to deliver strong classification performance in multiclass settings. We present experimental results for synthetic, fold recognition, enzyme classification, and remote homology detection data. Our results show that exploiting the class hierarchy improves performance on the synthetic data but not in the case of the protein classification problems. Based on this, we recommend that strong flat multiclass methods be used as a baseline to establish the benefit of exploiting class hierarchies in this area.

Subjects

Algorithms; Artificial Intelligence; Proteins/chemistry; Proteins/classification; Computing Methodologies; Molecular Sequence Data; Pattern Recognition, Automated; Protein Folding

Gene selection from microarray data for cancer classification--a machine learning approach.

Wang, Yu; Tetko, Igor V; Hall, Mark A; Frank, Eibe; Facius, Axel; Mayer, Klaus F X; Mewes, Hans W.

Comput Biol Chem ; 29(1): 37-46, 2005 Feb.

Article in English | MEDLINE | ID: mdl-15680584

ABSTRACT

A DNA microarray can track the expression levels of thousands of genes simultaneously. Previous research has demonstrated that this technology can be useful in the classification of cancers. Cancer microarray data normally contain a small number of samples, each with a large number of gene expression levels as features. Selecting relevant genes involved in different types of cancer remains a challenge. In order to extract useful gene information from cancer microarray data and reduce dimensionality, feature selection algorithms were systematically investigated in this study. Using a correlation-based feature selector combined with machine learning algorithms such as decision trees, naïve Bayes and support vector machines, we show that classification performance at least as good as published results can be obtained on acute leukemia and diffuse large B-cell lymphoma microarray data sets. We also demonstrate that a combined use of different classification and feature selection approaches makes it possible to select relevant genes with high confidence. This is also the first paper that discusses both computational and biological evidence for the involvement of zyxin in leukaemogenesis.
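The abstract names only "a correlation-based feature selector" and gives no formula. As a sketch under that caveat, the code below computes the subset merit heuristic commonly associated with correlation-based feature selection (CFS): a subset scores well when its features correlate strongly with the class and weakly with each other. The function name and inputs are hypothetical, and the correlations are assumed to be precomputed.

```python
import math
from typing import Sequence


def cfs_merit(feature_class_corr: Sequence[float],
              feature_feature_corr: Sequence[float]) -> float:
    """Merit of a feature subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    feature_class_corr: one correlation per feature with the class (length k).
    feature_feature_corr: one correlation per unordered feature pair
        (length k * (k - 1) / 2).
    Assumed formulation for illustration; not quoted from the paper.
    """
    k = len(feature_class_corr)
    if k == 0:
        return 0.0
    if len(feature_feature_corr) != k * (k - 1) // 2:
        raise ValueError("expected one correlation per unordered feature pair")
    # Mean absolute feature-class correlation.
    r_cf = sum(abs(r) for r in feature_class_corr) / k
    # Mean absolute feature-feature correlation (0 for a single feature).
    pairs = len(feature_feature_corr)
    r_ff = sum(abs(r) for r in feature_feature_corr) / pairs if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


if __name__ == "__main__":
    # Two equally predictive genes: uncorrelated vs. fully redundant.
    print(cfs_merit([0.6, 0.6], [0.0]), cfs_merit([0.6, 0.6], [1.0]))
```

In this toy comparison the redundant pair scores no better than a single gene with the same class correlation, which is why such a criterion helps reduce dimensionality in data with thousands of genes.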

Subjects

Artificial Intelligence; Leukemia, Myeloid, Acute/classification; Oligonucleotide Array Sequence Analysis/methods; Precursor Cell Lymphoblastic Leukemia-Lymphoma/classification; Algorithms; Cytoskeletal Proteins; Gene Expression Profiling; Glycoproteins/genetics; Humans; Leukemia, Myeloid, Acute/genetics; Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics; Zyxin

Data mining in bioinformatics using Weka.

Frank, Eibe; Hall, Mark; Trigg, Len; Holmes, Geoffrey; Witten, Ian H.

Bioinformatics ; 20(15): 2479-81, 2004 Oct 12.

Article in English | MEDLINE | ID: mdl-15073010

ABSTRACT

The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering and feature selection, which are common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and data pre-processing methods complemented by graphical user interfaces for data exploration and the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it. AVAILABILITY: http://www.cs.waikato.ac.nz/ml/weka.

Subjects

Algorithms; Artificial Intelligence; Computational Biology/methods; Database Management Systems; Databases, Factual; Information Storage and Retrieval/methods; User-Computer Interface; Natural Language Processing; Software
