1.
BMC Nephrol ; 24(1): 133, 2023 05 09.
Article in English | MEDLINE | ID: mdl-37161365

ABSTRACT

BACKGROUND: Acute Kidney Injury (AKI) is frequently seen in hospitalized and critically ill patients. Studies have shown that AKI is a risk factor for the development of acute kidney disease (AKD), chronic kidney disease (CKD), and mortality. METHODS: A systematic review was performed of validated risk prediction models for poor renal outcomes after AKI. Medline, EMBASE, Cochrane, and Web of Science were searched for articles that developed or validated a prediction model. Studies that report prediction models for recovery after AKI were also included. This review was registered with PROSPERO (CRD42022303197). RESULTS: We screened 25,812 potentially relevant abstracts. Among the 149 articles remaining after the first selection, eight met the inclusion criteria. All of the included studies developed more than one prediction model with different variables. The models included between 3 and 28 independent variables, and c-statistics ranged from 0.55 to 1. CONCLUSION: Few validated risk prediction models targeting the development of renal insufficiency after AKI have been developed, most of which are based on simple statistical or machine learning models. While some of these models have been externally validated, none of them is available in a form that can be used or evaluated in a clinical setting.


Subjects
Acute Kidney Injury , Chronic Renal Insufficiency , Humans , Acute Kidney Injury/diagnosis , Kidney , Machine Learning , Risk Factors
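The c-statistic range quoted in the abstract above (0.55 to 1) is easier to interpret next to a concrete computation. The following is a minimal sketch and is not taken from any of the reviewed studies: for a binary outcome, the c-statistic is the fraction of (event, non-event) pairs in which the event case receives the higher predicted risk, with ties counted as one half. The function name `c_statistic` and the toy data are illustrative assumptions.

```python
def c_statistic(y_true, y_score):
    """Concordance (c-statistic) for a binary outcome.

    y_true  : iterable of 0/1 outcome labels (1 = event occurred)
    y_score : iterable of predicted risks, same length as y_true
    """
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0   # event case ranked above non-event case
            elif e == n:
                concordant += 0.5   # tie counts as half a concordant pair
    return concordant / (len(events) * len(non_events))


# Toy example (hypothetical data): 3 of the 4 event/non-event pairs are concordant.
print(c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Under this definition 0.5 corresponds to chance-level ranking and 1.0 to perfect discrimination, so the lower end of the reported range is only slightly better than chance.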
2.
Behav Res Methods ; 55(4): 2109-2124, 2023 Jun.
Article in English | MEDLINE | ID: mdl-35819719

ABSTRACT

To obtain more accurate and robust feedback information from the students' assessment outcomes and to communicate it to students and optimize teaching and learning strategies, educational researchers and practitioners must critically reflect on whether the existing methods of data analytics are capable of retrieving the information provided in the database. This study compared and contrasted the prediction performance of an item response theory method, particularly the use of an explanatory item response model (EIRM), and six supervised machine learning (ML) methods for predicting students' item responses in educational assessments, considering student- and item-related background information. Each of seven prediction methods was evaluated through cross-validation approaches under three prediction scenarios: (a) unrealized responses of new students to existing items, (b) unrealized responses of existing students to new items, and (c) missing responses of existing students to existing items. The results of a simulation study and two real-life assessment data examples showed that employing student- and item-related background information in addition to the item response data substantially increases the prediction accuracy for new students or items. We also found that the EIRM is as competitive as the best performing ML methods in predicting the student performance outcomes for the educational assessment datasets.


Subjects
Educational Measurement , Students , Humans , Computer Simulation , Educational Status , Machine Learning
3.
BMC Bioinformatics ; 21(1): 49, 2020 Feb 07.
Article in English | MEDLINE | ID: mdl-32033537

ABSTRACT

BACKGROUND: Computational prediction of drug-target interactions (DTI) is vital for drug discovery. The experimental identification of interactions between drugs and target proteins is very onerous. Modern technologies have mitigated the problem, leveraging the development of new drugs. However, drug development remains extremely expensive and time-consuming. Therefore, in silico DTI predictions based on machine learning can alleviate the burdensome task of drug development. Many machine learning approaches have been proposed over the years for DTI prediction. Nevertheless, prediction accuracy and efficiency are persisting problems that still need to be tackled. Here, we propose a new learning method which addresses DTI prediction as a multi-output prediction task by learning ensembles of multi-output bi-clustering trees (eBICT) on reconstructed networks. In our setting, the nodes of a DTI network (drugs and proteins) are represented by features (background information). The interactions between the nodes of a DTI network are modeled as an interaction matrix and compose the output space in our problem. The proposed approach integrates background information from both drug and target protein spaces into the same global network framework. RESULTS: We performed an empirical evaluation, comparing the proposed approach to state-of-the-art DTI prediction methods, and demonstrated the effectiveness of the proposed approach in different prediction settings. For evaluation purposes, we used several benchmark datasets that represent drug-protein networks. We show that output space reconstruction can boost the predictive performance of tree-ensemble learning methods, yielding more accurate DTI predictions. CONCLUSIONS: We proposed a new DTI prediction method where bi-clustering trees are built on reconstructed networks. Building tree-ensemble learning models with output space reconstruction leads to superior prediction results, while preserving the advantages of tree-ensembles, such as scalability, interpretability and inductive setting.


Subjects
Drug Discovery/methods , Machine Learning , Proteins/drug effects , Cluster Analysis , Computer Simulation , Drug Development
4.
BMC Bioinformatics ; 20(1): 525, 2019 Oct 28.
Article in English | MEDLINE | ID: mdl-31660848

ABSTRACT

BACKGROUND: Network inference is crucial for biomedicine and systems biology. Biological entities and their associations are often modeled as interaction networks. Examples include drug-protein interaction or gene regulatory networks. Studying and elucidating such networks can lead to the comprehension of complex biological processes. However, usually we have only partial knowledge of those networks, and the experimental identification of all the existing associations between biological entities is very time-consuming and particularly expensive. Many computational approaches have been proposed over the years for network inference; nonetheless, efficiency and accuracy remain open problems. Here, we propose bi-clustering tree ensembles as a new machine learning method for network inference, extending the traditional tree-ensemble models to the global network setting. The proposed approach addresses the network inference problem as a multi-label classification task. More specifically, the nodes of a network (e.g., drugs or proteins in a drug-protein interaction network) are modelled as samples described by features (e.g., chemical structure similarities or protein sequence similarities). The labels in our setting represent the presence or absence of links connecting the nodes of the interaction network (e.g., drug-protein interactions in a drug-protein interaction network). RESULTS: We extended traditional tree-ensemble methods, such as extremely randomized trees (ERT) and random forests (RF), to ensembles of bi-clustering trees, integrating background information from both node sets of a heterogeneous network into the same learning framework. We performed an empirical evaluation, comparing the proposed approach to currently used tree-ensemble based approaches as well as other approaches from the literature. We demonstrated the effectiveness of our approach in different interaction prediction (network inference) settings. For evaluation purposes, we used several benchmark datasets that represent drug-protein and gene regulatory networks. We also applied our proposed method to two versions of a chemical-protein association network extracted from the STITCH database, demonstrating the potential of our model in predicting non-reported interactions. CONCLUSIONS: Bi-clustering trees outperform existing tree-based strategies as well as machine learning methods based on other algorithms. Since our approach is based on tree-ensembles, it inherits the advantages of tree-ensemble learning, such as handling of missing values, scalability and interpretability.


Subjects
Cluster Analysis , Algorithms , Factual Databases , Gene Regulatory Networks , Machine Learning , Protein Interaction Maps , Proteins/metabolism
5.
BMC Bioinformatics ; 20(1): 485, 2019 Sep 23.
Article in English | MEDLINE | ID: mdl-31547800

ABSTRACT

BACKGROUND: A massive amount of proteomic data is generated on a daily basis; nonetheless, annotating all sequences is costly and often infeasible. As a countermeasure, machine learning methods have been used to automatically annotate new protein functions. More specifically, many studies have investigated hierarchical multi-label classification (HMC) methods to predict annotations, using the Functional Catalogue (FunCat) or Gene Ontology (GO) label hierarchies. Most of these studies employed benchmark datasets created more than a decade ago, and thus train their models on outdated information. In this work, we provide an updated version of these datasets. By querying recent versions of FunCat and GO yeast annotations, we provide 24 new datasets in total. We compare four HMC methods, providing baseline results for the new datasets. Furthermore, we also evaluate whether the predictive models are able to discover new or wrong annotations, by training them on the old data and evaluating their results against the most recent information. RESULTS: The results demonstrated that the method based on predictive clustering trees, Clus-Ensemble, proposed in 2008, achieved superior results compared to more recent methods on the standard evaluation task. For the discovery of new knowledge, Clus-Ensemble performed better when discovering new annotations in the FunCat taxonomy, whereas hierarchical multi-label classification with genetic algorithm (HMC-GA), a method based on genetic algorithms, was overall superior when detecting annotations that were removed. In the GO datasets, Clus-Ensemble once again had the upper hand when discovering new annotations, whereas HMC-GA performed better for detecting removed annotations. However, in this evaluation, there were less significant differences among the methods. CONCLUSIONS: The experiments have shown that protein function prediction is a very challenging task which should be further investigated. We believe that the baseline results associated with the updated datasets provided in this work should be considered as guidelines for future studies; nonetheless, the old versions of the datasets should not be disregarded, since other tasks in machine learning could benefit from them.


Subjects
Machine Learning , Molecular Sequence Annotation/methods , Proteomics/methods , Cluster Analysis , Eukaryota/metabolism , Gene Ontology , Humans
6.
PLoS Comput Biol ; 14(4): e1006097, 2018 04.
Article in English | MEDLINE | ID: mdl-29684010

ABSTRACT

Transposable elements (TEs) are repetitive nucleotide sequences that make up a large portion of eukaryotic genomes. They can move and duplicate within a genome, increasing genome size and contributing to genetic diversity within and across species. Accurate identification and classification of TEs present in a genome is an important step towards understanding their effects on genes and their role in genome evolution. We introduce TE-Learner, a framework based on machine learning that automatically identifies TEs in a given genome and assigns a classification to them. We present an implementation of our framework for LTR retrotransposons, a particular type of TE characterized by long terminal repeats (LTRs) at their boundaries. We evaluate the predictive performance of our framework on the well-annotated genomes of Drosophila melanogaster and Arabidopsis thaliana, and we compare our results for three LTR retrotransposon superfamilies with the results of three widely used methods for TE identification or classification: RepeatMasker, Censor and LtrDigest. In contrast to these methods, TE-Learner is the first to incorporate machine learning techniques, outperforming these methods in terms of predictive performance while being able to learn models and make predictions efficiently. Moreover, we show that our method was able to identify TEs that none of the above methods could find, and we investigated TE-Learner's predictions which did not correspond to an official annotation. It turns out that many of these predictions are in fact strongly homologous to a known TE.


Subjects
Machine Learning , Retroelements , Terminal Repeat Sequences , Animals , Arabidopsis/genetics , Arabidopsis Proteins/genetics , Computational Biology , Conserved Sequence , Plant DNA/genetics , Decision Trees , Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Molecular Evolution , Insect Genome , Plant Genome , Software
7.
J Biomed Inform ; 85: 40-48, 2018 09.
Article in English | MEDLINE | ID: mdl-30012356

ABSTRACT

The volume of biomedical data available to the machine learning community grows very rapidly. A reasonable question is how informative these data really are, or how discriminant the features describing the data instances are. Several biomedical datasets suffer from a lack of variance in the instance representation or, even worse, contain instances with identical features and different class labels. Indisputably, this directly affects the performance of machine learning algorithms, as well as the ability to interpret their results. In this article, we emphasize the aforementioned problem and propose a target-informed feature induction method based on tree ensemble learning. The method brings more variance into the data representation, thereby potentially increasing the predictive performance of a learner applied to the induced features. The contribution of this article is twofold. Firstly, a problem affecting the quality of biomedical data is highlighted, and secondly, a method to handle that problem is proposed. The efficiency of the presented approach is validated on multi-target prediction tasks. The obtained results indicate that the proposed approach is able to boost the discrimination between the data instances and increase the predictive performance.


Subjects
Cluster Analysis , Data Mining/methods , Decision Trees , Machine Learning , Algorithms , Computational Biology , Factual Databases/statistics & numerical data , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Regulatory Networks , Humans , Metabolic Networks and Pathways , Protein Interaction Maps , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
8.
Bioinformatics ; 31(11): 1836-8, 2015 Jun 01.
Article in English | MEDLINE | ID: mdl-25638811

ABSTRACT

Profile hidden Markov models (profile HMMs) are known to efficiently predict whether an amino acid (AA) sequence belongs to a specific protein family. Profile HMMs can also be used to search for protein domains in genome sequences. In this case, HMMs are typically learned from AA sequences and then used to search on the six-frame translation of nucleotide (NT) sequences. However, this approach demands additional processing of the original data and search results. Here, we propose an alternative and more direct method which converts an AA alignment into an NT one, after which an NT-based HMM is trained to be applied directly on a genome.


Subjects
Genomics/methods , Sequence Alignment/methods , Protein Sequence Analysis/methods , Animals , Bacteria/enzymology , Bacteria/genetics , Fungal Proteins/chemistry , Fungal Proteins/genetics , Markov Chains , Phosphoric Monoester Hydrolases/chemistry , Phosphoric Monoester Hydrolases/genetics , Tertiary Protein Structure , Ribonuclease H/chemistry
9.
Cytometry A ; 89(1): 22-9, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26243673

ABSTRACT

Advances in flow cytometry bioinformatics have resulted in a wide variety of clustering, classification and visualization techniques. To objectively evaluate the performance of such methods, common benchmarks such as the FlowCAP initiative have proven to be of great value. In this work, we report on a novel method, FloReMi, which was developed to tackle the most recent FlowCAP IV challenge. This challenge was formulated as a survival modeling problem, where participants were expected to design a model to predict the time until progression to AIDS for HIV patients. It is known that variability in progression rate cannot be fully predicted by simple CD4(+) T cell counts. However, it is hypothesized that the immunopathogenesis established early in HIV already indicates the course of future disease. Adequately estimating the progression rate of HIV patients is crucial in their treatment. Using an automated pipeline to preprocess the data, and subsequently identify and select informative cell subsets, a survival regression method based on random survival forests was built, which obtained the best results of all submitted approaches to the FlowCAP IV challenge.


Subjects
Acquired Immunodeficiency Syndrome/pathology , Benchmarking , Computational Biology/methods , Disease Progression , Flow Cytometry/methods , Acquired Immunodeficiency Syndrome/diagnosis , Acquired Immunodeficiency Syndrome/mortality , Algorithms , Cluster Analysis , Statistical Data Interpretation , HIV Seropositivity , Humans , Machine Learning , Regression Analysis , Staining and Labeling , T-Lymphocytes/cytology
10.
Cytometry A ; 89(1): 16-21, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26447924

ABSTRACT

The Flow Cytometry: Critical Assessment of Population Identification Methods (FlowCAP) challenges were established to compare the performance of computational methods for identifying cell populations in multidimensional flow cytometry data. Here we report the results of FlowCAP-IV where algorithms from seven different research groups predicted the time to progression to AIDS among a cohort of 384 HIV+ subjects, using antigen-stimulated peripheral blood mononuclear cell (PBMC) samples analyzed with a 14-color staining panel. Two approaches (FlowReMi.1 and flowDensity-flowType-RchyOptimyx) provided statistically significant predictive value in the blinded test set. Manual validation of submitted results indicated that unbiased analysis of single cell phenotypes could reveal unexpected cell types that correlated with outcomes of interest in high dimensional flow cytometry datasets.


Subjects
Acquired Immunodeficiency Syndrome/pathology , Benchmarking , Computational Biology/methods , Disease Progression , Flow Cytometry/methods , T-Lymphocytes/cytology , Acquired Immunodeficiency Syndrome/diagnosis , Algorithms , Statistical Data Interpretation , HIV Seropositivity , Humans , Staining and Labeling
11.
Artif Intell Med ; 150: 102817, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38553157

ABSTRACT

Intubation for mechanical ventilation (MV) is one of the most common high-risk procedures performed in Intensive Care Units (ICUs). Early prediction of intubation may have a positive impact by providing timely alerts to clinicians and consequently avoiding high-risk late intubations. In this work, we propose a new machine learning method to predict the time to intubation during the first five days of ICU admission, based on the concept of cure survival models. Our approach combines classification and survival analysis, to effectively accommodate the fraction of patients not at risk of intubation, and provide a better estimate of time to intubation, for patients at risk. We tested our approach and compared it to other predictive models on a dataset collected from a secondary care hospital (AZ Groeninge, Kortrijk, Belgium) from 2015 to 2021, consisting of 3425 ICU stays. Furthermore, we utilised SHAP for feature importance analysis, extracting key insights into the relative significance of variables such as vital signs, blood gases, and patient characteristics in predicting intubation in ICU settings. The results corroborate that our approach improves the prediction of time to intubation in critically ill patients, by using routinely collected data within the first hours of admission in the ICU. Early warning of the need for intubation may be used to help clinicians predict the risk of intubation and rank patients according to their expected time to intubation.


Subjects
Critical Care , Hospitalization , Humans , Intensive Care Units , Intubation , Machine Learning , Critical Illness , Retrospective Studies
12.
Nat Commun ; 15(1): 5013, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38866782

ABSTRACT

Multiple sclerosis (MS) is characterized by heterogeneity in disease course and prediction of long-term outcome remains a major challenge. Here, we investigate five myeloid markers - CHIT1, CHI3L1, sTREM2, GPNMB and CCL18 - in the cerebrospinal fluid (CSF) at diagnostic lumbar puncture in a longitudinal cohort of 192 MS patients. Through mixed-effects and machine learning models, we show that CHIT1 is a robust predictor for faster disability progression. Integrative analysis of 11 CSF and 26 central nervous system (CNS) parenchyma single-cell/nucleus RNA sequencing samples reveals CHIT1 to be predominantly expressed by microglia located in active MS lesions and enriched for lipid metabolism pathways. Furthermore, we find CHIT1 expression to accompany the transition from a homeostatic towards a more activated, MS-associated cell state in microglia. Neuropathological evaluation in post-mortem tissue from 12 MS patients confirms CHIT1 production by lipid-laden phagocytes in actively demyelinating lesions, already in early disease stages. Altogether, we provide a rationale for CHIT1 as an early biomarker for faster disability progression in MS.


Subjects
Biomarkers , Disease Progression , Microglia , Multiple Sclerosis , Humans , Microglia/metabolism , Microglia/pathology , Multiple Sclerosis/cerebrospinal fluid , Multiple Sclerosis/genetics , Multiple Sclerosis/metabolism , Multiple Sclerosis/diagnosis , Multiple Sclerosis/pathology , Biomarkers/cerebrospinal fluid , Biomarkers/metabolism , Female , Male , Adult , Middle Aged , Hexosaminidases/metabolism , Hexosaminidases/genetics , Hexosaminidases/cerebrospinal fluid , Longitudinal Studies , Chitinase-3-Like Protein 1/cerebrospinal fluid , Chitinase-3-Like Protein 1/metabolism , Chitinase-3-Like Protein 1/genetics
13.
Comput Methods Programs Biomed ; 250: 108166, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38614026

ABSTRACT

BACKGROUND AND OBJECTIVE: Critically ill children may suffer from impaired neurocognitive functions years after ICU (intensive care unit) discharge. To assess neurocognitive functions, these children are subjected to a fixed sequence of tests. Undergoing all tests is, however, arduous for former pediatric ICU patients, resulting in interrupted evaluations where several neurocognitive deficiencies remain undetected. As a solution, we propose using machine learning to predict the optimal order of tests for each child, reducing the number of tests required to identify the most severe neurocognitive deficiencies. METHODS: We have compared the current clinical approach against several machine learning methods, mainly multi-target regression and label ranking methods. We have also proposed a new method that builds several multi-target predictive models and combines the outputs into a ranking that prioritizes the worse neurocognitive outcomes. We used data available at discharge from children who participated in the PEPaNIC-RCT trial (ClinicalTrials.gov-NCT01536275), as well as data from a 2-year follow-up study. The institutional review boards at each participating site have also approved this follow-up study (ML8052; NL49708.078; Pro00038098). RESULTS: Our proposed method outperformed both the other machine learning methods and the current clinical practice. Specifically, our method reaches approximately 80% precision when considering the top-4 outcomes, in comparison to 65% and 78% obtained by the current clinical practice and the state-of-the-art method in label ranking, respectively. CONCLUSIONS: Our experiments demonstrated that machine learning can be competitive with or even superior to the current testing order employed in clinical practice, suggesting that our model can be used to substantially reduce the number of tests necessary for each child. Moreover, the results indicate that possible long-term adverse outcomes are already predictable as early as at ICU discharge. Thus, our work can be seen as the first step to allow more personalized follow-up after ICU discharge, leading to preventive rather than curative care.


Subjects
Pediatric Intensive Care Units , Machine Learning , Humans , Child , Male , Female , Preschool Child , Critical Illness , Follow-Up Studies , Patient Discharge
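The "precision when considering top-4 outcomes" figure in the abstract above can be illustrated with a small computation. The sketch below assumes one common definition, which the abstract does not spell out: the share of the k highest-ranked predicted items that also appear among the k truly worst outcomes. The function name `precision_at_k` and the test labels are hypothetical.

```python
def precision_at_k(predicted_ranking, true_ranking, k=4):
    """Fraction of the top-k predicted items that are in the true top-k.

    Both rankings are lists ordered from most to least severe outcome.
    """
    if k <= 0 or k > len(predicted_ranking) or k > len(true_ranking):
        raise ValueError("k must be between 1 and the ranking length")
    true_top = set(true_ranking[:k])
    hits = sum(1 for item in predicted_ranking[:k] if item in true_top)
    return hits / k


# Hypothetical test names, ordered from worst to best outcome.
predicted = ["memory", "attention", "iq", "motor", "language"]
actual = ["attention", "memory", "language", "iq", "motor"]
print(precision_at_k(predicted, actual, k=4))  # 3 of the 4 predicted items are in the true top 4
```

Averaging this value over all children in a test set would give a single summary number comparable in spirit to the percentages quoted in the abstract.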
14.
Comput Biol Med ; 152: 106423, 2023 01.
Article in English | MEDLINE | ID: mdl-36529023

ABSTRACT

With the development of new sequencing technologies, the availability of genomic data has grown exponentially. Over the past decade, numerous studies have used genomic data to identify associations between genes and biological functions. While these studies have shown success in annotating genes with functions, they often assume that genes are completely annotated and fail to take into account that datasets are sparse and noisy. This work proposes a method to detect missing annotations in the context of hierarchical multi-label classification. More precisely, our method exploits the relations of functions, represented as a hierarchy, by computing probabilities based on the paths of functions in the hierarchy. By performing several experiments on a variety of rice (Oryza sativa Japonica), we show that the proposed method accurately detects missing annotations and yields superior results when compared to state-of-the-art methods from the literature.


Subjects
Genomics , Gene Ontology , Molecular Sequence Annotation , Probability
15.
Sci Rep ; 13(1): 9864, 2023 06 18.
Article in English | MEDLINE | ID: mdl-37331979

ABSTRACT

Acute Kidney Injury (AKI) is a sudden episode of kidney failure that is frequently seen in critically ill patients. AKI has been linked to chronic kidney disease (CKD) and mortality. We developed machine learning-based prediction models to predict outcomes following AKI stage 3 events in the intensive care unit. We conducted a prospective observational study that used the medical records of ICU patients diagnosed with AKI stage 3. A random forest (RF) algorithm was used to develop two models that predict which patients will progress to CKD three and six months after experiencing AKI stage 3. To predict mortality, two survival prediction models were developed using random survival forests and survival XGBoost. We evaluated the CKD prediction models using AUC-ROC and AUPR curves and compared them with baseline logistic regression models. The mortality prediction models were evaluated with an external test set, and the C-indices were compared to a baseline Cox proportional hazards (CoxPH) model. We included 101 critically ill patients who experienced AKI stage 3. To increase the training set for the mortality prediction task, an unlabeled dataset was added. The RF (AUPR: 0.895 and 0.848) and XGBoost (c-index: 0.8248) models performed better than the baseline models in predicting CKD and mortality, respectively. Machine learning-based models can assist clinicians in making clinical decisions regarding critically ill patients with severe AKI who are likely to develop CKD following discharge. Additionally, we have shown better performance when unlabeled data are incorporated into the survival analysis task.


Subjects
Acute Kidney Injury , Chronic Renal Insufficiency , Humans , Critical Illness , Prospective Studies , Acute Kidney Injury/diagnosis , Machine Learning
16.
IEEE Trans Neural Netw Learn Syst ; 34(10): 6755-6767, 2023 10.
Article in English | MEDLINE | ID: mdl-36269923

ABSTRACT

Data production has grown at an increasing rate in recent years, to the point that traditional or batch machine-learning (ML) algorithms cannot cope with the sheer volume of generated data. Stream or online ML presents itself as a viable solution to deal with the dynamic nature of streaming data. Besides coping with the inherent challenges of streaming data, online ML solutions must be accurate, fast, and bear a reduced memory footprint. We propose a new decision tree-based ensemble algorithm for online ML regression named online extra trees (OXT). Our proposal takes inspiration from the batch learning extra trees (XT) algorithm, a popular and faster alternative to random forest (RF). While speed and memory costs might not be a central concern in most batch applications, they become crucial in data stream learning. Our proposal combines subbagging (sampling without replacement), random tree split points, and model trees to deliver competitive prediction errors and reduced computational costs. Through an extensive experimental evaluation comprising 22 real-world and synthetic datasets, we compare OXT against the state-of-the-art adaptive RF (ARF) and other incremental regressors. OXT is generally more accurate than its competitors while running significantly faster than ARF and using significantly less memory.


Subjects
Algorithms , Computer Neural Networks , Machine Learning , Random Forest
17.
Bioinformatics ; 27(9): 1231-8, 2011 May 01.
Article in English | MEDLINE | ID: mdl-21372086

ABSTRACT

MOTIVATION: Identification of conserved motifs in biological sequences is crucial to unveil common shared functions. Many tools exist for motif identification, including some that allow degenerate positions with multiple possible nucleotides or amino acids. Most efficient methods available today search for conserved motifs in a set of sequences, but do not check their specificity with regard to a set of negative sequences. RESULTS: We present a tool to identify degenerate motifs, based on a given classification of amino acids according to their physico-chemical properties. It returns the top K motifs that are most frequent in a positive set of sequences involved in a biological process of interest, and absent from a negative set. Thus, our method discovers discriminative motifs in biological sequences that may be used to identify new sequences involved in the same process. We used this tool to identify candidate effector proteins secreted into plant tissues by the root knot nematode Meloidogyne incognita. Our tool identified a series of motifs specifically present in a positive set of known effectors while totally absent from a negative set of evolutionarily conserved housekeeping proteins. Scanning the proteome of M. incognita, we detected 2579 proteins that contain these specific motifs and can be considered as new putative effectors. AVAILABILITY AND IMPLEMENTATION: The motif discovery tool and the proteins used in the experiments are available at http://dtai.cs.kuleuven.be/ml/systems/merci.


Subjects
Algorithms , Amino Acid Motifs , Computational Biology/methods , Conserved Sequence , Helminth Proteins/chemistry , Amino Acid Sequence , Animals , Discriminant Analysis , Helminth Proteins/classification , Molecular Sequence Data , Proteome/analysis , Sequence Alignment , Tylenchoidea/chemistry
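The selection criterion described in the abstract above (top K motifs that are frequent in a positive set and absent from a negative set, with positions generalized to physico-chemical classes) can be sketched in a few lines. This is a deliberately simplified illustration and not the published tool's algorithm: it only considers contiguous fixed-length motifs in which every position is a class, and the four-class grouping in `AA_CLASSES` is an assumption made for the example.

```python
from collections import Counter

# Assumed, simplified grouping of the 20 amino acids (illustrative only).
AA_CLASSES = {"h": "AVLIMFWC", "p": "STNQYGP", "+": "KRH", "-": "DE"}
AA_TO_CLASS = {aa: cls for cls, members in AA_CLASSES.items() for aa in members}


def class_kmers(seq, k):
    """Set of length-k class patterns occurring in one sequence."""
    encoded = "".join(AA_TO_CLASS.get(aa, "x") for aa in seq.upper())
    return {encoded[i:i + k] for i in range(len(encoded) - k + 1)}


def top_discriminative_motifs(positives, negatives, k=3, top_k=5):
    """Top motifs by number of positive sequences containing them,
    restricted to motifs that occur in no negative sequence."""
    negative_motifs = set()
    for seq in negatives:
        negative_motifs |= class_kmers(seq, k)

    counts = Counter()
    for seq in positives:
        # Each motif is counted at most once per positive sequence.
        counts.update(class_kmers(seq, k) - negative_motifs)

    # Sort by descending frequency, then alphabetically for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


# Hypothetical toy sequences.
print(top_discriminative_motifs(["MKKLA", "AKRLV", "MDKLA"], ["ADKAA"]))
```

Excluding every pattern seen in a negative sequence is the strictest form of the "absent from a negative set" requirement; a practical tool would typically also explore mixed motifs combining concrete residues and classes.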
18.
Comput Biol Med ; 141: 105001, 2022 02.
Article in English | MEDLINE | ID: mdl-34782112

ABSTRACT

Many clinical studies follow patients over time and record the time until the occurrence of an event of interest (e.g., recovery, death, …). When patients drop out of the study or when their event did not happen before the study ended, the collected dataset is said to contain censored observations. Given the rise of personalized medicine, clinicians are often interested in accurate risk prediction models that predict, for unseen patients, a survival profile, including the expected time until the event. Survival analysis methods are used to detect associations or compare subpopulations of patients in this context. In this article, we propose to cast the time-to-event prediction task as a multi-target regression task, with censored observations modeled as partially labeled examples. We then apply semi-supervised learning to the resulting data representation. More specifically, we use semi-supervised predictive clustering trees and ensembles thereof. Empirical results over eleven real-life datasets demonstrate superior or equivalent predictive performance of the proposed approach as compared to three competitor methods. Moreover, smaller models are obtained compared to random survival forests, another tree ensemble method. Finally, we illustrate the informative feature selection mechanism of our method, by interpreting the splits induced by a single tree model when predicting survival for amyotrophic lateral sclerosis patients.
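
One way to picture "censored observations modeled as partially labeled examples" is to turn each patient into a vector of targets, one per time horizon, and leave a target unlabeled when censoring hides it. The encoding below is an assumption for illustration (the abstract does not specify the exact target representation), and the function name and horizons are hypothetical.

```python
import math


def encode_targets(time, event_observed, horizons):
    """Encode one patient as 'event has occurred by horizon t' targets.

    time           -- follow-up time (event time, or censoring time)
    event_observed -- True if the event was observed, False if censored
    horizons       -- increasing list of time points, one target per point

    Returns 0.0 / 1.0 where the label is known and NaN where it is unknown.
    """
    targets = []
    for t in horizons:
        if time > t:
            # Patient was still event-free at t; known for censored patients too.
            targets.append(0.0)
        elif event_observed:
            # The event happened at or before t.
            targets.append(1.0)
        else:
            # Censored at or before t: the status at t is unknown,
            # so this target is left unlabeled.
            targets.append(math.nan)
    return targets


if __name__ == "__main__":
    horizons = [2, 4, 6, 8]
    print(encode_targets(5, True, horizons))   # event observed at time 5
    print(encode_targets(5, False, horizons))  # censored at time 5
```

Under this encoding, a censored patient still contributes the labels known up to the censoring time, which is the information a semi-supervised multi-target learner could exploit instead of discarding the record.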


Subjects
Supervised Machine Learning , Cluster Analysis , Humans , Multivariate Analysis , Survival Analysis
19.
J Clin Med ; 11(24)2022 Dec 07.
Article in English | MEDLINE | ID: mdl-36555881

ABSTRACT

Background: Acute kidney injury (AKI) in critically ill patients is associated with a significant increase in mortality as well as long-term renal dysfunction and chronic kidney disease (CKD). Serum creatinine (SCr), the most widely used biomarker to evaluate kidney function, does not always accurately predict the glomerular filtration rate (GFR), since it is affected by some non-GFR determinants such as muscle mass and recent meat ingestion. Researchers and clinicians have gained interest in cystatin C (CysC), another biomarker of kidney function. The study objective was to compare GFR estimation using SCr and CysC in detecting CKD over a 1-year follow-up after an AKI stage-3 event in the ICU, as well as to analyze the association between eGFR (using SCr and CysC) and mortality after the AKI event. Method: This prospective observational study used the medical records of ICU patients diagnosed with AKI stage 3. SCr and CysC were measured twice during the ICU stay and four times following diagnosis of AKI. The eGFR was calculated using the EKFC equation for SCr and FAS equation for CysC in order to check the prevalence of CKD (defined as eGFR < 60 mL/min/1.73 m²). Results: The study enrolled 101 patients, 36.6% of whom were female, with a median age of 74 years (30−92), and a median length of stay of 14.5 days in intensive care. A significant difference was observed in the estimation of GFR when comparing formulas based on SCr and CysC, resulting in large differences in the prediction of CKD. Three months after the AKI event, eGFRCysC < 25 mL/min/1.73 m² was a predictive factor of mortality later on; however, this was not the case for eGFRSCr. Conclusion: The incidence of CKD was highly discrepant with eGFRCysC versus eGFRSCr during the follow-up period. CysC detects more CKD events compared to SCr in the follow-up phase and eGFRCysC is a predictor for mortality in follow-up but not eGFRSCr. Determining the proper marker to estimate GFR in the post-ICU period in AKI stage-3 populations needs further study to improve risk stratification.

20.
J Heart Lung Transplant ; 41(7): 928-936, 2022 07.
Article in English | MEDLINE | ID: mdl-35568604

ABSTRACT

BACKGROUND: Outcome prediction following heart transplant is critical to explaining risks and benefits to patients and decision-making when considering potential organ offers. Given the large number of potential variables to be considered, this task may be most efficiently performed using machine learning (ML). We trained and tested ML and statistical algorithms to predict outcomes following cardiac transplant using the United Network for Organ Sharing (UNOS) database. METHODS: We included 59,590 adult and 8,349 pediatric patients enrolled in the UNOS database between January 1994 and December 2016 who underwent cardiac transplantation. We evaluated 3 classification and 3 survival methods. Algorithms were evaluated using shuffled 10-fold cross-validation (CV) and rolling CV. Predictive performance for 1-year and 90-day all-cause mortality was characterized using the area under the receiver-operating characteristic curve (AUC) with 95% confidence interval. RESULTS: In total, 8,394 (12.4%) patients died within 1 year of transplant. For predicting 1-year survival, using the shuffled 10-fold CV, Random Forest achieved the highest AUC (0.893; 0.889-0.897) followed by XGBoost and logistic regression. In the rolling CV, prediction performance was more modest and comparable among the models, with XGBoost and logistic regression achieving the highest AUC, 0.657 (0.647-0.667) and 0.641 (0.631-0.651), respectively. There was a trend toward higher prediction performance in pediatric patients. CONCLUSIONS: Our study suggests that ML and statistical models can be used to predict mortality post-transplant, but based on the results from rolling CV, the overall prediction performance will be limited by temporal shifts in patient and donor selection.
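
The gap between the two evaluation schemes is easier to see when the splits are written out. The sketch below contrasts a shuffled k-fold split, where records from all years mix in training and test sets, with a rolling split that trains only on years before the test year. The abstract does not give the study's exact rolling protocol, so the "train on all earlier years, test on one year" rule and the function names are assumptions.

```python
import random


def shuffled_kfold(n_samples, k, seed=0):
    """Yield (train, test) index lists after shuffling: time order is ignored."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def rolling_splits(years, first_test_year):
    """Yield (train, test) index lists where training data strictly
    precede the test year (assumed rolling scheme for this sketch)."""
    test_years = sorted({y for y in years if y >= first_test_year})
    for test_year in test_years:
        train = [i for i, y in enumerate(years) if y < test_year]
        test = [i for i, y in enumerate(years) if y == test_year]
        if train:  # skip a test year that has no earlier data to train on
            yield train, test


if __name__ == "__main__":
    years = [1994, 1994, 1995, 1995, 1996, 1996]  # hypothetical transplant years
    for train, test in rolling_splits(years, first_test_year=1995):
        print("train:", train, "test:", test)
```

Because the shuffled scheme lets a model see records from the same era as its test cases, it can look optimistic when practice changes over time; the rolling scheme is closer to prospective use, which is consistent with the lower AUC values reported for rolling CV.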


Subjects
Heart Transplantation , Machine Learning , Adult , Algorithms , Child , Databases, Factual , Humans , ROC Curve