Search | BVS (Virtual Health Library) Infectious and Parasitic Diseases

1.

Biclustering data analysis: a comprehensive survey.

Castanho, Eduardo N; Aidos, Helena; Madeira, Sara C.

Brief Bioinform ; 25(4), 2024 May 23.

Article in English | MEDLINE | ID: mdl-39007596

ABSTRACT

Biclustering, the simultaneous clustering of rows and columns of a data matrix, has proved its effectiveness in bioinformatics due to its capacity to produce local instead of global models, evolving from a key technique used in gene expression data analysis into one of the most used approaches for pattern discovery and identification of biological modules, used in both descriptive and predictive learning tasks. This survey presents a comprehensive overview of biclustering. It proposes an updated taxonomy for its fundamental components (bicluster, biclustering solution, biclustering algorithms, and evaluation measures) and applications. We unify scattered concepts in the literature with new definitions to accommodate the diversity of data types (such as tabular, network, and time series data) and the specificities of biological and biomedical data domains. We further propose a pipeline for biclustering data analysis and discuss practical aspects of incorporating biclustering in real-world applications. We highlight prominent application domains, particularly in bioinformatics, and identify typical biclusters to illustrate the analysis output. Moreover, we discuss important aspects to consider when choosing, applying, and evaluating a biclustering algorithm. We also relate biclustering with other data mining tasks (clustering, pattern mining, classification, triclustering, N-way clustering, and graph mining). Thus, it provides theoretical and practical guidance on biclustering data analysis, demonstrating its potential to uncover actionable insights from complex datasets.

Subjects

Algorithms , Computational Biology , Cluster Analysis , Computational Biology/methods , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Humans
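
The abstract of entry 1 defines a bicluster as a subset of rows and a subset of columns that behave coherently together, and lists evaluation measures among the components it surveys. As a minimal sketch of one widely used coherence measure, the mean squared residue, the following pure-Python function scores a candidate bicluster. The function name, the toy matrix, and the choice of this particular measure are illustrative assumptions and are not taken from the survey itself.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by `rows` and `cols`.

    A value of 0 means every cell equals row effect + column effect,
    i.e. the submatrix follows a perfect additive (shifting) pattern.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Hypothetical toy data: rows 0-2 and columns 0-2 follow an additive pattern.
data = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [5.0, 6.0, 7.0, 4.0],
    [7.0, 1.0, 8.0, 2.0],
]
coherent = mean_squared_residue(data, rows=[0, 1, 2], cols=[0, 1, 2])
whole = mean_squared_residue(data, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
```

Here the local submatrix scores essentially zero while the full matrix does not, which illustrates the "local instead of global models" point made in the abstract.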

2.

G-bic: generating synthetic benchmarks for biclustering.

Castanho, Eduardo N; Lobo, João P; Henriques, Rui; Madeira, Sara C.

BMC Bioinformatics ; 24(1): 457, 2023 Dec 06.

Article in English | MEDLINE | ID: mdl-38053078

ABSTRACT

BACKGROUND: Biclustering is increasingly used in biomedical data analysis, recommendation tasks, and text mining domains, with hundreds of biclustering algorithms proposed. Assessing the performance of these algorithms requires more than real datasets, as these do not offer a solid ground truth. Synthetic data overcome this limitation by providing reference solutions to be compared with the found patterns. However, generating synthetic datasets is challenging since the generated data must ensure reproducibility, pattern representativity, and real data resemblance. RESULTS: We propose G-Bic, a dataset generator conceived to produce synthetic benchmarks for the normative assessment of biclustering algorithms. Beyond expanding on aspects of pattern coherence, data quality, and positioning properties, it further handles specificities related to mixed-type datasets and time-series data. G-Bic has the flexibility to replicate real data regularities from diverse domains. We provide the default configurations to generate reproducible benchmarks to evaluate and compare diverse aspects of biclustering algorithms. Additionally, we discuss empirical strategies to simulate the properties of real data. CONCLUSION: G-Bic is a parametrizable generator for biclustering analysis, offering a solid means to robustly assess biclustering solutions according to internal and external metrics.

Subjects

Benchmarking , Gene Expression Profiling , Reproducibility of Results , Cluster Analysis , Algorithms
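
The abstract of entry 2 describes synthetic benchmarks in which planted patterns serve as the reference solution and a fixed configuration makes the data reproducible. The sketch below illustrates that general idea only: it plants one constant bicluster into a uniform random background and returns the planted rows and columns as ground truth. It is not G-Bic's interface; the function name, parameters, and background distribution are hypothetical choices made for illustration.

```python
import random


def plant_constant_bicluster(n_rows, n_cols, bic_rows, bic_cols, value, seed):
    """Return (matrix, ground_truth) with one constant bicluster planted.

    The seeded generator makes the benchmark reproducible; the ground truth
    records which rows and columns carry the planted pattern.
    """
    rng = random.Random(seed)
    # Background values drawn uniformly from [0, 1) (an illustrative choice).
    matrix = [[rng.uniform(0.0, 1.0) for _ in range(n_cols)]
              for _ in range(n_rows)]
    rows = sorted(rng.sample(range(n_rows), bic_rows))
    cols = sorted(rng.sample(range(n_cols), bic_cols))
    for i in rows:
        for j in cols:
            matrix[i][j] = value
    return matrix, {"rows": rows, "cols": cols}


matrix, truth = plant_constant_bicluster(20, 10, 4, 3, value=5.0, seed=42)
# Re-running with the same seed yields the same dataset and ground truth.
matrix_again, truth_again = plant_constant_bicluster(20, 10, 4, 3, value=5.0, seed=42)
```

An algorithm's output can then be compared against `truth` with an external metric, which is the kind of assessment that real datasets without a known solution do not allow.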

3.

Biclustering fMRI time series: a comparative study.

Castanho, Eduardo N; Aidos, Helena; Madeira, Sara C.

BMC Bioinformatics ; 23(1): 192, 2022 May 23.

Article in English | MEDLINE | ID: mdl-35606701

ABSTRACT

BACKGROUND: The effectiveness of biclustering, the simultaneous clustering of rows and columns in a data matrix, was shown in gene expression data analysis. Several researchers recognize its potential in other research areas. Nevertheless, the last two decades have witnessed the development of a significant number of biclustering algorithms targeting gene expression data analysis and a lack of consistent studies exploring the capacities of biclustering outside this traditional application domain. RESULTS: This work evaluates the potential use of biclustering in fMRI time series data, targeting the Region × Time dimensions by comparing seven state-of-the-art biclustering and three traditional clustering algorithms on artificial and real data. It further proposes a methodology for biclustering evaluation beyond gene expression data analysis. The results, which cover the use of different search strategies on both artificial and real fMRI time series, showed the superiority of exhaustive biclustering approaches, which obtained the most homogeneous biclusters. However, their high computational costs are a challenge, and further work is needed for the efficient use of biclustering in fMRI data analysis. CONCLUSIONS: This work pinpoints avenues for the use of biclustering in spatio-temporal data analysis, in particular in neuroscience applications. The proposed evaluation methodology showed evidence of the effectiveness of biclustering in finding local patterns in fMRI time series data. Further work is needed regarding scalability to promote the application in real scenarios.

Subjects

Gene Expression Profiling , Magnetic Resonance Imaging , Algorithms , Cluster Analysis , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Time Factors

4.

Dynamic Bayesian networks for stratification of disease progression in amyotrophic lateral sclerosis.

Gromicho, Marta; Leão, Tiago; Oliveira Santos, Miguel; Pinto, Susana; Carvalho, Alexandra M; Madeira, Sara C; De Carvalho, Mamede.

Eur J Neurol ; 29(8): 2201-2210, 2022 Aug.

Article in English | MEDLINE | ID: mdl-35426195

ABSTRACT

BACKGROUND AND PURPOSE: Progression rate is quite variable in amyotrophic lateral sclerosis (ALS); thus, tools for profiling disease progression are essential for timely interventions. The objective was to apply dynamic Bayesian networks (DBNs) to establish the influence of clinical and demographic variables on disease progression rate. METHODS: In all, 664 ALS patients from our database were included and stratified into slow (SP), average (AP) and fast (FP) progressors, according to the rate of decay of the Amyotrophic Lateral Sclerosis Functional Rating Scale Revised (ALSFRS-R). The sdtDBN framework was used, a machine learning model that learns optimal DBNs with both static (gender, age at onset, onset region, body mass index, disease duration at entry, familial history, revised El Escorial criteria and C9orf72) and dynamic (ALSFRS-R scores and sub-scores, forced vital capacity, maximum inspiratory pressure, maximum expiratory pressure and phrenic amplitude) variables. RESULTS: Disease duration and body mass index at diagnosis are the foremost influences amongst static variables. Disease duration is the variable that best discriminates the three groups. Maximum expiratory pressure is the respiratory test with prevalent influence on all groups. ALSFRS score has a higher influence on FP, but lower on AP and SP. The bulbar sub-score has considerable influence on FP but limited influence on SP. Limb function has a more decisive influence on AP and SP. The respiratory sub-score has little influence in all groups. ALSFRS-R questions 1 (speech) and 9 (climbing stairs) are the most influential in FP and SP, respectively. CONCLUSIONS: The sdtDBN analysis identified five variables, easily obtained during clinical evaluation, which are the most influential for each progression group. This insightful information may help to improve prognosis and care.

Subjects

Amyotrophic Lateral Sclerosis , Amyotrophic Lateral Sclerosis/diagnosis , Bayes Theorem , Disease Progression , Humans , Vital Capacity

5.

Learning prognostic models using a mixture of biclustering and triclustering: Predicting the need for non-invasive ventilation in Amyotrophic Lateral Sclerosis.

Soares, Diogo F; Henriques, Rui; Gromicho, Marta; de Carvalho, Mamede; Madeira, Sara C.

J Biomed Inform ; 134: 104172, 2022 Oct.

Article in English | MEDLINE | ID: mdl-36055638

ABSTRACT

Longitudinal cohort studies of disease progression generally combine temporal features produced under periodic assessments (clinical follow-up) with static features associated with single-time assessments, genetic, psychophysiological, and demographic profiles. Subspace clustering, including biclustering and triclustering stances, enables the discovery of local and discriminative patterns from such multidimensional cohort data. These patterns, highly interpretable, are relevant to identifying groups of patients with similar traits or progression patterns. Despite their potential, their use for improving predictive tasks in clinical domains remains unexplored. In this work, we propose to learn predictive models from static and temporal data using discriminative patterns, obtained via biclustering and triclustering, as features within a state-of-the-art classifier, thus enhancing model interpretation. triCluster is extended to find time-contiguous triclusters in temporal data (temporal patterns), and a biclustering algorithm is used to discover coherent patterns in static data. The transformed data space, composed of bicluster and tricluster features, captures local and cross-variable associations with discriminative power, yielding unique statistical properties of interest. As a case study, we applied our methodology to follow-up data from Portuguese patients with Amyotrophic Lateral Sclerosis (ALS) to predict the need for non-invasive ventilation (NIV) since the last appointment. The results showed that, in general, our methodology outperformed baseline results using the original features. Furthermore, the bicluster/tricluster-based patterns used by the classifier can be used by clinicians to understand the models by highlighting relevant prognostic patterns.

Subjects

Amyotrophic Lateral Sclerosis , Noninvasive Ventilation , Amyotrophic Lateral Sclerosis/diagnosis , Amyotrophic Lateral Sclerosis/therapy , Cluster Analysis , Humans , Longitudinal Studies , Prognosis

6.

G-Tric: generating three-way synthetic datasets with triclustering solutions.

Lobo, João; Henriques, Rui; Madeira, Sara C.

BMC Bioinformatics ; 22(1): 16, 2021 Jan 07.

Article in English | MEDLINE | ID: mdl-33413095

ABSTRACT

BACKGROUND: Three-way data started to gain popularity due to their increasing capacity to describe inherently multivariate and temporal events, such as biological responses, social interactions along time, urban dynamics, or complex geophysical phenomena. Triclustering, subspace clustering of three-way data, enables the discovery of patterns corresponding to data subspaces (triclusters) with values correlated across the three dimensions (observations × features × contexts). With an increasing number of algorithms being proposed, effectively comparing them with state-of-the-art algorithms is paramount. These comparisons are usually performed using real data, without a known ground truth, thus limiting the assessments. In this context, we propose a synthetic data generator, G-Tric, allowing the creation of synthetic datasets with configurable properties and the possibility to plant triclusters. The generator is prepared to create datasets resembling real three-way data from biomedical and social data domains, with the additional advantage of further providing the ground truth (triclustering solution) as output. RESULTS: G-Tric can replicate real-world datasets and create new ones that match researchers' needs across several properties, including data type (numeric or symbolic), dimensions, and background distribution. Users can tune the patterns and structure that characterize the planted triclusters (subspaces) and how they interact (overlapping). Data quality can also be controlled by defining the amount of missing values, noise, or errors. Furthermore, a benchmark of datasets resembling real data is made available, together with the corresponding triclustering solutions (planted triclusters) and generating parameters. CONCLUSIONS: Triclustering evaluation using G-Tric provides the possibility to combine both intrinsic and extrinsic metrics to compare solutions that produce more reliable analyses. A set of predefined datasets, mimicking widely used three-way data and exploring crucial properties, was generated and made available, highlighting G-Tric's potential to advance the triclustering state of the art by easing the process of evaluating the quality of new triclustering approaches.

Subjects

Algorithms , Cluster Analysis , Databases, Factual , Humans , Software , Temperature , Yeasts

7.

Learning dynamic Bayesian networks from time-dependent and time-independent data: Unraveling disease progression in Amyotrophic Lateral Sclerosis.

Leão, Tiago; Madeira, Sara C; Gromicho, Marta; de Carvalho, Mamede; Carvalho, Alexandra M.

J Biomed Inform ; 117: 103730, 2021 May.

Article in English | MEDLINE | ID: mdl-33737206

ABSTRACT

Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease causing patients to quickly lose motor neurons. The disease is characterized by a fast functional impairment and ventilatory decline, leading most patients to die from respiratory failure. To estimate when patients should get ventilatory support, it is helpful to adequately profile the disease progression. For this purpose, we use dynamic Bayesian networks (DBNs), a machine learning model, that graphically represents the conditional dependencies among variables. However, the standard DBN framework only includes dynamic (time-dependent) variables, while most ALS datasets have dynamic and static (time-independent) observations. Therefore, we propose the sdtDBN framework, which learns optimal DBNs with static and dynamic variables. Besides learning DBNs from data, with polynomial-time complexity in the number of variables, the proposed framework enables the user to insert prior knowledge and to make inference in the learned DBNs. We use sdtDBNs to study the progression of 1214 patients from a Portuguese ALS dataset. First, we predict the values of every functional indicator in the patients' consultations, achieving results competitive with state-of-the-art studies. Then, we determine the influence of each variable in patients' decline before and after getting ventilatory support. This insightful information can lead clinicians to pay particular attention to specific variables when evaluating the patients, thus improving prognosis. The case study with ALS shows that sdtDBNs are a promising predictive and descriptive tool, which can also be applied to assess the progression of other diseases, given time-dependent and time-independent clinical observations.

Subjects

Amyotrophic Lateral Sclerosis , Neurodegenerative Diseases , Algorithms , Bayes Theorem , Disease Progression , Humans

8.

Targeting the uncertainty of predictions at patient-level using an ensemble of classifiers coupled with calibration methods, Venn-ABERS, and Conformal Predictors: A case study in AD.

Pereira, Telma; Cardoso, Sandra; Guerreiro, Manuela; Mendonça, Alexandre; Madeira, Sara C.

J Biomed Inform ; 101: 103350, 2020 Jan.

Article in English | MEDLINE | ID: mdl-31816401

ABSTRACT

Despite being able to make accurate predictions, most existing prognostic models lack a proper indication about the uncertainty of each prediction, that is, the risk of prediction error for individual patients. This hampers their translation to primary care settings through decision support systems. To address this problem, we studied different methods for transforming classifiers into probabilistic/confidence-based predictors (here called uncertainty methods), where predictions are complemented with probability estimates/confidence regions reflecting their uncertainty (uncertainty estimates). We tested several uncertainty methods: two well-known calibration methods (Platt Scaling and Isotonic Regression), Conformal Predictors, and Venn-ABERS predictors. We evaluated whether these methods produce valid predictions, where uncertainty estimates reflect the ground truth probabilities. Furthermore, we assessed the proportion of valid predictions made at high-certainty thresholds (predictions with uncertainty measures above a given threshold) since this impacts their usefulness in clinical decisions. Finally, we proposed an ensemble-based approach where predictions from multiple pairs of (classifier, uncertainty method) are combined to predict whether a given MCI patient will convert to AD. This ensemble should putatively provide predictions for a larger number of patients while releasing users from deciding which pair of (classifier, uncertainty method) is more appropriate for data under study. The analysis was performed with a Portuguese cohort (CCC) of around 400 patients and validated in the publicly available ADNI cohort. Despite our focus on MCI to AD prognosis, the proposed approach can be applied to other diseases and prognostic problems.

Subjects

Alzheimer Disease , Calibration , Humans , Probability , Prognosis , Uncertainty
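
The abstract of entry 8 describes complementing each prediction with a confidence region and keeping only predictions that pass a certainty threshold. As a minimal sketch of how an inductive conformal predictor can produce such a region, the code below computes a p-value for each candidate label from held-out calibration nonconformity scores. The scores, label names, and the use of a single shared calibration set are illustrative assumptions; the study's own classifiers, nonconformity measures, and thresholds are not reproduced here.

```python
def conformal_p_value(calibration_scores, test_score):
    """Fraction of scores (calibration plus the test one) at least as
    nonconforming as the test score. Higher score = stranger example."""
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def prediction_region(calibration_scores, test_scores_by_label, significance):
    """Labels whose p-value exceeds the significance level."""
    return {
        label
        for label, score in test_scores_by_label.items()
        if conformal_p_value(calibration_scores, score) > significance
    }


# Hypothetical nonconformity scores from a held-out calibration set.
calibration = [0.1, 0.2, 0.3, 0.4, 0.9]
# Hypothetical scores of one new patient under each candidate label.
patient = {"converter": 0.15, "stable": 0.95}

p_converter = conformal_p_value(calibration, patient["converter"])
p_stable = conformal_p_value(calibration, patient["stable"])
region = prediction_region(calibration, patient, significance=0.2)
```

A region containing a single label corresponds to a confident prediction for that patient, while a region containing both labels (or none) flags a prediction that a clinician should treat as uncertain.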

9.

Correction: G-bic: generating synthetic benchmarks for biclustering.

Castanho, Eduardo N; Lobo, João P; Henriques, Rui; Madeira, Sara C.

BMC Bioinformatics ; 25(1): 16, 2024 Jan 11.

Article in English | MEDLINE | ID: mdl-38212689

10.

Neuropsychological predictors of conversion from mild cognitive impairment to Alzheimer's disease: a feature selection ensemble combining stability and predictability.

Pereira, Telma; Ferreira, Francisco L; Cardoso, Sandra; Silva, Dina; de Mendonça, Alexandre; Guerreiro, Manuela; Madeira, Sara C.

BMC Med Inform Decis Mak ; 18(1): 137, 2018 Dec 19.

Article in English | MEDLINE | ID: mdl-30567554

ABSTRACT

BACKGROUND: Predicting progression from Mild Cognitive Impairment (MCI) to Alzheimer's Disease (AD) is a major open issue in AD-related research. Neuropsychological assessment has proven to be useful in identifying MCI patients who are likely to convert to dementia. However, the large battery of neuropsychological tests (NPTs) performed in clinical practice and the limited number of training examples are a challenge to machine learning when learning prognostic models. In this context, it is paramount to pursue approaches that effectively seek reduced sets of relevant features. Subsets of NPTs from which prognostic models can be learnt should not only be good predictors, but also stable, promoting generalizable and explainable models. METHODS: We propose a feature selection (FS) ensemble combining stability and predictability to choose the most relevant NPTs for prognostic prediction in AD. First, we combine the outcome of multiple (filter and embedded) FS methods. Then, we use a wrapper-based approach optimizing both stability and predictability to compute the number of selected features. We use two large prospective studies (ADNI and the Portuguese Cognitive Complaints Cohort, CCC) to evaluate the approach and assess the predictive value of a large number of NPTs. RESULTS: The best subsets of features include approximately 30 and 20 (from the original 79 and 40) features, for ADNI and CCC data, respectively, yielding stability above 0.89 and 0.95, and AUC above 0.87 and 0.82. Most NPTs learnt using the proposed feature selection ensemble have been identified in the literature as strong predictors of conversion from MCI to AD. CONCLUSIONS: The FS ensemble approach was able to 1) identify subsets of stable and relevant predictors from a consensus of multiple FS methods using baseline NPTs and 2) learn reliable prognostic models of conversion from MCI to AD using these subsets of features. The machine learning models learnt from these features outperformed the models trained without FS and achieved competitive results when compared to commonly used FS algorithms. Furthermore, the selected features are derived from a consensus of methods, thus being more robust, while releasing users from choosing the most appropriate FS method to be used in their classification task.

Subjects

Alzheimer Disease/diagnosis , Alzheimer Disease/etiology , Cognitive Dysfunction/complications , Cognitive Dysfunction/psychology , Aged , Algorithms , Cognitive Dysfunction/diagnosis , Disease Progression , Female , Humans , Machine Learning , Male , Neuropsychological Tests , Prognosis , Prospective Studies

11.

BicPAMS: software for biological data analysis with pattern-based biclustering.

Henriques, Rui; Ferreira, Francisco L; Madeira, Sara C.

BMC Bioinformatics ; 18(1): 82, 2017 Feb 02.

Article in English | MEDLINE | ID: mdl-28153040

ABSTRACT

BACKGROUND: Biclustering has been widely applied for the unsupervised analysis of biological data, being recognised today as a key technique to discover putative modules in both expression data (subsets of genes correlated in subsets of conditions) and network data (groups of coherently interconnected biological entities). However, given its computational complexity, only recent breakthroughs on pattern-based biclustering enabled efficient searches without the restrictions that state-of-the-art biclustering algorithms place on the structure and homogeneity of biclusters. As a result, pattern-based biclustering provides the unprecedented opportunity to discover non-trivial yet meaningful biological modules with putative functions, whose coherency and tolerance to noise can be tuned and made problem-specific. METHODS: To enable the effective use of pattern-based biclustering by the scientific community, we developed BicPAMS (Biclustering based on PAttern Mining Software), a software tool that: 1) makes available state-of-the-art pattern-based biclustering algorithms (BicPAM (Henriques and Madeira, Alg Mol Biol 9:27, 2014), BicNET (Henriques and Madeira, Alg Mol Biol 11:23, 2016), BicSPAM (Henriques and Madeira, BMC Bioinforma 15:130, 2014), BiC2PAM (Henriques and Madeira, Alg Mol Biol 11:1-30, 2016), BiP (Henriques and Madeira, IEEE/ACM Trans Comput Biol Bioinforma, 2015), DeBi (Serin and Vingron, AMB 6:1-12, 2011) and BiModule (Okada et al., IPSJ Trans Bioinf 48(SIG5):39-48, 2007)); 2) consistently integrates their dispersed contributions; 3) further explores additional accuracy and efficiency gains; and 4) makes available graphical and application programming interfaces. RESULTS: Results on both synthetic and real data confirm the relevance of BicPAMS for biological data analysis, highlighting its essential role for the discovery of putative modules with non-trivial yet biologically significant functions from expression and network data. CONCLUSIONS: BicPAMS is the first biclustering tool offering the possibility to: 1) parametrically customize the structure, coherency and quality of biclusters; 2) analyze large-scale biological networks; and 3) tackle the restrictive assumptions placed by state-of-the-art biclustering algorithms. These contributions are shown to be key for an adequate, complete and user-assisted unsupervised analysis of biological data. SOFTWARE: BicPAMS and its tutorial are available at http://www.bicpams.com.

Subjects

Gene Expression , Software , Algorithms , Cell Line, Tumor , Cluster Analysis , Gene Regulatory Networks , Humans , Protein Interaction Mapping

12.

Predicting progression of mild cognitive impairment to dementia using neuropsychological data: a supervised learning approach using time windows.

Pereira, Telma; Lemos, Luís; Cardoso, Sandra; Silva, Dina; Rodrigues, Ana; Santana, Isabel; de Mendonça, Alexandre; Guerreiro, Manuela; Madeira, Sara C.

BMC Med Inform Decis Mak ; 17(1): 110, 2017 Jul 19.

Article in English | MEDLINE | ID: mdl-28724366

ABSTRACT

BACKGROUND: Predicting progression from a stage of Mild Cognitive Impairment (MCI) to dementia is a major pursuit in current research. It is broadly accepted that cognition declines along a continuum between MCI and dementia. As such, cohorts of MCI patients are usually heterogeneous, containing patients at different stages of the neurodegenerative process. This hampers the prognostic task. Nevertheless, when learning prognostic models, most studies use the entire cohort of MCI patients regardless of their disease stages. In this paper, we propose a Time Windows approach to predict conversion to dementia, learning with patients stratified using time windows, thus fine-tuning the prognosis regarding the time to conversion. METHODS: In the proposed Time Windows approach, we grouped patients based on the clinical information of whether they converted (converter MCI) or remained MCI (stable MCI) within a specific time window. We tested time windows of 2, 3, 4 and 5 years. We developed a prognostic model for each time window using clinical and neuropsychological data and compared this approach with the one commonly used in the literature, where all patients are used to learn the models, named the First Last approach. This makes it possible to move from the traditional question "Will an MCI patient convert to dementia sometime in the future" to the question "Will an MCI patient convert to dementia in a specific time window". RESULTS: The proposed Time Windows approach outperformed the First Last approach. The results showed that we can predict conversion to dementia as early as 5 years before the event with an AUC of 0.88 in the cross-validation set and 0.76 in an independent validation set. CONCLUSIONS: Prognostic models using time windows have higher performance when predicting progression from MCI to dementia, when compared to the prognostic approach commonly used in the literature. Furthermore, the proposed Time Windows approach is more relevant from a clinical point of view, predicting conversion within a temporal interval rather than sometime in the future and allowing clinicians to timely adjust treatments and clinical appointments.

Subjects

Cognitive Dysfunction/diagnosis , Dementia/diagnosis , Disease Progression , Models, Theoretical , Supervised Machine Learning , Humans , Neuropsychological Tests , Prognosis , Time Factors
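
The abstract of entry 12 groups patients by whether they converted or remained MCI within a given time window and learns one model per window (2, 3, 4 and 5 years). The labelling step can be sketched as below. The field names, the toy patients, and the rule that a patient without conversion needs at least one full window of follow-up to count as stable are assumptions made for illustration; the paper's exact inclusion criteria are not stated in the abstract.

```python
def label_for_window(years_to_conversion, years_of_follow_up, window_years):
    """Label one patient for one time window.

    Returns "converter MCI", "stable MCI", or None when the follow-up is too
    short to decide (assumed here to mean exclusion from that window).
    """
    if years_to_conversion is not None and years_to_conversion <= window_years:
        return "converter MCI"
    if years_of_follow_up >= window_years:
        return "stable MCI"
    return None


# Hypothetical patients: (years to conversion or None, years of follow-up).
patients = {
    "A": (1.5, 1.5),
    "B": (None, 6.0),
    "C": (4.2, 4.2),
    "D": (None, 2.5),
}

# One labelled training set per window, as in the Time Windows approach.
datasets = {
    window: {
        pid: label
        for pid, (conv, follow) in patients.items()
        if (label := label_for_window(conv, follow, window)) is not None
    }
    for window in (2, 3, 4, 5)
}
```

Patient C illustrates why the windows differ: the same person is a stable case for the 2-, 3- and 4-year models and a converter for the 5-year model.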

13.

Prognostic models based on patient snapshots and time windows: Predicting disease progression to assisted ventilation in Amyotrophic Lateral Sclerosis.

Carreiro, André V; Amaral, Pedro M T; Pinto, Susana; Tomás, Pedro; de Carvalho, Mamede; Madeira, Sara C.

J Biomed Inform ; 58: 133-144, 2015 Dec.

Article in English | MEDLINE | ID: mdl-26455265

ABSTRACT

Amyotrophic Lateral Sclerosis (ALS) is a devastating disease and the most common neurodegenerative disorder of young adults. ALS patients present with rapidly progressive motor weakness. This usually leads to death in a few years by respiratory failure. The correct prediction of respiratory insufficiency is thus key for patient management. In this context, we propose an innovative approach for prognostic prediction based on patient snapshots and time windows. We first cluster temporally-related tests to obtain snapshots of the patient's condition at a given time (patient snapshots). Then we use the snapshots to predict the probability that an ALS patient will require assisted ventilation after k days from the time of clinical evaluation (time window). This probability is based on the patient's current condition, evaluated using clinical features, including functional impairment assessments and a complete set of respiratory tests. The prognostic models include three temporal windows, allowing short-, medium- and long-term prognosis regarding progression to assisted ventilation. Experimental results show an area under the receiver operating characteristics curve (AUC) in the test set of approximately 79% for time windows of 90, 180 and 365 days. Creating patient snapshots using hierarchical clustering with constraints outperforms the state of the art, and the proposed prognostic model becomes the first non population-based approach for prognostic prediction in ALS. The results are promising and should enhance the current clinical practice, largely supported by non-standardized tests and clinicians' experience.

Subjects

Amyotrophic Lateral Sclerosis/physiopathology , Models, Theoretical , Respiration, Artificial , Disease Progression , Humans , Prognosis

14.

BicSPAM: flexible biclustering using sequential patterns.

Henriques, Rui; Madeira, Sara C.

BMC Bioinformatics ; 15: 130, 2014 May 06.

Article in English | MEDLINE | ID: mdl-24885271

ABSTRACT

BACKGROUND: Biclustering is a critical task for biomedical applications. Order-preserving biclusters, submatrices where the values of rows induce the same linear ordering across columns, capture local regularities with constant, shifting, scaling and sequential assumptions. Additionally, biclustering approaches relying on pattern mining output deliver exhaustive solutions with an arbitrary number and positioning of biclusters. However, existing order-preserving approaches suffer from robustness, scalability and/or flexibility issues. Additionally, they are not able to discover biclusters with symmetries and parameterizable levels of noise. RESULTS: We propose new biclustering algorithms to perform flexible, exhaustive and noise-tolerant biclustering based on sequential patterns (BicSPAM). Strategies are proposed to allow for symmetries and to seize efficiency gains from item-indexable properties and/or from partitioning methods with conservative distance guarantees. Results show BicSPAM's ability to capture symmetries, handle planted noise, and scale in terms of memory and time. BicSPAM also achieves the best match-scores for the recovery of hidden biclusters in synthetic datasets with varying noise distributions and levels of missing values. Finally, results on gene expression data lead to complete solutions, delivering new biclusters corresponding to putative modules with heightened biological relevance. CONCLUSIONS: BicSPAM provides an exhaustive way to discover flexible structures of order-preserving biclusters. To the best of our knowledge, BicSPAM is the first attempt to deal with order-preserving biclusters that allow for symmetries and that are robust to varying levels of noise.

Subjects

Cluster Analysis , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Algorithms , Data Mining , Gene Expression , Humans
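
The abstract of entry 14 defines an order-preserving bicluster as a submatrix whose rows all induce the same linear ordering across the selected columns. That definition can be checked directly, as in the sketch below. This is only a verifier for a given candidate, not the BicSPAM search algorithm; the function names and toy matrix are hypothetical, and ties between values are assumed not to occur (they are broken by column position here).

```python
def column_ordering(row_values):
    """Column positions sorted by increasing value (ties broken by position)."""
    return tuple(sorted(range(len(row_values)), key=lambda j: row_values[j]))


def is_order_preserving(matrix, rows, cols):
    """True when every selected row ranks the selected columns identically."""
    orderings = {
        column_ordering([matrix[i][j] for j in cols])
        for i in rows
    }
    return len(orderings) == 1


# Hypothetical toy data: rows 0-2 differ in scale but share the ordering
# column 0 < column 2 < column 1; row 3 does not.
data = [
    [1.0, 5.0, 3.0],
    [10.0, 40.0, 20.0],
    [0.2, 0.9, 0.5],
    [3.0, 1.0, 2.0],
]
shared = is_order_preserving(data, rows=[0, 1, 2], cols=[0, 1, 2])
broken = is_order_preserving(data, rows=[0, 3], cols=[0, 1, 2])
```

Because only the ordering matters, rows related by shifting or scaling factors pass the same check, which is why the abstract says order-preserving biclusters subsume constant, shifting and scaling assumptions.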

15.

Temporal stratification of amyotrophic lateral sclerosis patients using disease progression patterns.

M Amaral, Daniela; Soares, Diogo F; Gromicho, Marta; de Carvalho, Mamede; Madeira, Sara C; Tomás, Pedro; Aidos, Helena.

Nat Commun ; 15(1): 5717, 2024 Jul 08.

Article in English | MEDLINE | ID: mdl-38977678

ABSTRACT

Identifying groups of patients with similar disease progression patterns is key to understanding disease heterogeneity, guiding clinical decisions and improving patient care. In this paper, we propose a data-driven temporal stratification approach, ClusTric, combining triclustering and hierarchical clustering. The proposed approach enables the discovery of complex disease progression patterns not found by univariate temporal analyses. As a case study, we use Amyotrophic Lateral Sclerosis (ALS), a neurodegenerative disease with a non-linear and heterogeneous disease progression. In this context, we applied ClusTric to stratify a hospital-based population (Lisbon ALS Clinic dataset) and validated it in a clinical trial population. The results unravelled four clinically relevant disease progression groups: slow progressors, moderate bulbar and spinal progressors, and fast progressors. We compared ClusTric with a state-of-the-art method, showing its effectiveness in capturing the heterogeneity of ALS disease progression in a lower number of clinically relevant progression groups.

Subjects

Amyotrophic Lateral Sclerosis , Disease Progression , Amyotrophic Lateral Sclerosis/pathology , Amyotrophic Lateral Sclerosis/physiopathology , Humans , Male , Cluster Analysis , Female , Middle Aged , Aged

16.

PINTA: a web server for network-based gene prioritization from expression data.

Nitsch, Daniela; Tranchevent, Léon-Charles; Gonçalves, Joana P; Vogt, Josef Korbinian; Madeira, Sara C; Moreau, Yves.

Nucleic Acids Res ; 39(Web Server issue): W334-8, 2011 Jul.

Article in English | MEDLINE | ID: mdl-21602267

ABSTRACT

PINTA (available at http://www.esat.kuleuven.be/pinta/; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes based on the differential expression of their neighborhood in a genome-wide protein-protein interaction network. Our strategy is meant for biological and medical researchers aiming at identifying novel disease genes using disease specific expression data. PINTA supports both candidate gene prioritization (starting from a user defined set of candidate genes) as well as genome-wide gene prioritization and is available for five species (human, mouse, rat, worm and yeast). As input data, PINTA only requires disease specific expression data, whereas various platforms (e.g. Affymetrix) are supported. As a result, PINTA computes a gene ranking and presents the results as a table that can easily be browsed and downloaded by the user.
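The idea of ranking a gene by the differential expression of its network neighborhood can be sketched in a few lines. This is not PINTA's actual scoring method, which the abstract does not specify; it is a minimal illustration that assumes a score equal to the mean absolute differential expression of a gene's direct neighbors. All names (`neighborhood_scores`, `rank_candidates`) and the toy network are hypothetical.

```python
def neighborhood_scores(network, diff_expr):
    """Score each gene by the mean |differential expression| of its direct neighbors.

    network:   dict mapping a gene to a list of interacting genes (assumed input shape).
    diff_expr: dict mapping a gene to a differential expression value, e.g. a log fold change.
    """
    scores = {}
    for gene, neighbors in network.items():
        # Neighbors without expression data are skipped rather than counted as zero.
        values = [abs(diff_expr[n]) for n in neighbors if n in diff_expr]
        scores[gene] = sum(values) / len(values) if values else 0.0
    return scores

def rank_candidates(network, diff_expr, candidates):
    """Return the candidate genes sorted from highest to lowest neighborhood score."""
    scores = neighborhood_scores(network, diff_expr)
    return sorted(candidates, key=lambda g: scores.get(g, 0.0), reverse=True)

# Hypothetical toy interaction network and expression values.
network = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C"]}
diff_expr = {"A": 0.1, "B": 2.0, "C": -1.0, "D": 0.2}

print(rank_candidates(network, diff_expr, ["B", "C", "D", "A"]))
```

In this toy case gene A ranks first even though its own expression barely changes, because its neighbors B and C are strongly differentially expressed; that is the intuition behind neighborhood-based prioritization.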

Subjects

Disease/genetics , Gene Expression Profiling , Protein Interaction Mapping , Software , Animals , Genes , Humans , Internet , Mice , Rats

17.

Triclustering-based classification of longitudinal data for prognostic prediction: targeting relevant clinical endpoints in amyotrophic lateral sclerosis.

Soares, Diogo F; Henriques, Rui; Gromicho, Marta; de Carvalho, Mamede; Madeira, Sara C.

Sci Rep ; 13(1): 6182, 2023 Apr 15.

Article in English | MEDLINE | ID: mdl-37061549

ABSTRACT

This work proposes a new class of explainable prognostic models for longitudinal data classification using triclusters. A new temporally constrained triclustering algorithm, termed TCtriCluster, is proposed to comprehensively find informative temporal patterns common to a subset of patients in a subset of features (triclusters), and use them as discriminative features within a state-of-the-art classifier with guarantees of interpretability. The proposed approach further enhances prediction with the potentialities of model explainability by revealing clinically relevant disease progression patterns underlying prognostics, describing features used for classification. The proposed methodology is used in the Amyotrophic Lateral Sclerosis (ALS) Portuguese cohort (N = 1321), providing the first comprehensive assessment of the prognostic limits of five notable clinical endpoints: need for non-invasive ventilation (NIV); need for an auxiliary communication device; need for percutaneous endoscopic gastrostomy (PEG); need for a caregiver; and need for a wheelchair. Triclustering-based predictors outperform state-of-the-art alternatives, being able to predict the need for auxiliary communication device (within 180 days) and the need for PEG (within 90 days) with an AUC above 90%. The approach was validated in clinical practice, supporting healthcare professionals in understanding the link between the highly heterogeneous patterns of ALS disease progression and the prognosis.

Subjects

Amyotrophic Lateral Sclerosis , Noninvasive Ventilation , Humans , Amyotrophic Lateral Sclerosis/diagnosis , Amyotrophic Lateral Sclerosis/therapy , Prognosis , Disease Progression , Respiration, Artificial , Gastrostomy

18.

Artificial intelligence and statistical methods for stratification and prediction of progression in amyotrophic lateral sclerosis: A systematic review.

Tavazzi, Erica; Longato, Enrico; Vettoretti, Martina; Aidos, Helena; Trescato, Isotta; Roversi, Chiara; Martins, Andreia S; Castanho, Eduardo N; Branco, Ruben; Soares, Diogo F; Guazzo, Alessandro; Birolo, Giovanni; Pala, Daniele; Bosoni, Pietro; Chiò, Adriano; Manera, Umberto; de Carvalho, Mamede; Miranda, Bruno; Gromicho, Marta; Alves, Inês; Bellazzi, Riccardo; Dagliati, Arianna; Fariselli, Piero; Madeira, Sara C; Di Camillo, Barbara.

Artif Intell Med ; 142: 102588, 2023 Aug.

Article in English | MEDLINE | ID: mdl-37316101

ABSTRACT

BACKGROUND: Amyotrophic Lateral Sclerosis (ALS) is a fatal neurodegenerative disorder characterised by the progressive loss of motor neurons in the brain and spinal cord. The fact that ALS's disease course is highly heterogeneous, and its determinants not fully known, combined with ALS's relatively low prevalence, renders the successful application of artificial intelligence (AI) techniques particularly arduous. OBJECTIVE: This systematic review aims at identifying areas of agreement and unanswered questions regarding two notable applications of AI in ALS, namely the automatic, data-driven stratification of patients according to their phenotype, and the prediction of ALS progression. Differently from previous works, this review is focused on the methodological landscape of AI in ALS. METHODS: We conducted a systematic search of the Scopus and PubMed databases, looking for studies on data-driven stratification methods based on unsupervised techniques resulting in (A) automatic group discovery or (B) a transformation of the feature space allowing patient subgroups to be identified; and for studies on internally or externally validated methods for the prediction of ALS progression. We described the selected studies according to the following characteristics, when applicable: variables used, methodology, splitting criteria and number of groups, prediction outcomes, validation schemes, and metrics. RESULTS: Of the starting 1604 unique reports (2837 combined hits between Scopus and PubMed), 239 were selected for thorough screening, leading to the inclusion of 15 studies on patient stratification, 28 on prediction of ALS progression, and 6 on both stratification and prediction. In terms of variables used, most stratification and prediction studies included demographics and features derived from the ALSFRS or ALSFRS-R scores, which were also the main prediction targets. 
The most represented stratification methods were K-means, and hierarchical and expectation-maximisation clustering; while random forests, logistic regression, the Cox proportional hazard model, and various flavours of deep learning were the most widely used prediction methods. Predictive model validation was, albeit unexpectedly, quite rarely performed in absolute terms (leading to the exclusion of 78 eligible studies), with the overwhelming majority of included studies resorting to internal validation only. CONCLUSION: This systematic review highlighted a general agreement in terms of input variable selection for both stratification and prediction of ALS progression, and in terms of prediction targets. A striking lack of validated models emerged, as well as a general difficulty in reproducing many published studies, mainly due to the absence of the corresponding parameter lists. While deep learning seems promising for prediction applications, its superiority with respect to traditional methods has not been established; there is, instead, ample room for its application in the subfield of patient stratification. Finally, an open question remains on the role of new environmental and behavioural variables collected via novel, real-time sensors.

Subjects

Amyotrophic Lateral Sclerosis , Humans , Amyotrophic Lateral Sclerosis/diagnosis , Artificial Intelligence , Brain , Cluster Analysis , Databases, Factual

19.

Erratum to: BicPAMS: software for biological data analysis with pattern-based biclustering.

Henriques, Rui; Ferreira, Francisco L; Madeira, Sara C.

BMC Bioinformatics ; 18(1): 162, 2017 Mar 09.

Article in English | MEDLINE | ID: mdl-28279148

20.

TFRank: network-based prioritization of regulatory associations underlying transcriptional responses.

Gonçalves, Joana P; Francisco, Alexandre P; Mira, Nuno P; Teixeira, Miguel C; Sá-Correia, Isabel; Oliveira, Arlindo L; Madeira, Sara C.

Bioinformatics ; 27(22): 3149-57, 2011 Nov 15.

Article in English | MEDLINE | ID: mdl-21965816

ABSTRACT

MOTIVATION: Uncovering mechanisms underlying gene expression control is crucial to understand complex cellular responses. Studies in gene regulation often aim to identify regulatory players involved in a biological process of interest, either transcription factors coregulating a set of target genes or genes eventually controlled by a set of regulators. These are frequently prioritized with respect to a context-specific relevance score. Current approaches rely on relevance measures accounting exclusively for direct transcription factor-target interactions, namely overrepresentation of binding sites or target ratios. Gene regulation has, however, intricate behavior with overlapping, indirect effects that should not be neglected. In addition, the rapid accumulation of regulatory data already enables the prediction of large-scale networks suitable for higher level exploration by methods based on graph theory. A paradigm shift is thus emerging, where isolated and constrained analyses will likely be replaced by whole-network, systemic-aware strategies. RESULTS: We present TFRank, a graph-based framework to prioritize regulatory players involved in transcriptional responses within the regulatory network of an organism, whereby every regulatory path containing genes of interest is explored and incorporated into the analysis. TFRank selected important regulators of yeast adaptation to stress induced by quinine and acetic acid, which were missed by a direct effect approach. Notably, they reportedly confer resistance toward the chemicals. In a preliminary study in humans, TFRank unveiled regulators involved in breast tumor growth and metastasis when applied to genes whose expression signatures correlated with short interval to metastasis.

Subjects

Gene Expression Regulation , Gene Regulatory Networks , Transcription Factors/metabolism , Transcription, Genetic , Acetic Acid/pharmacology , Binding Sites , Humans , Neoplasm Metastasis , Quinine/pharmacology , Saccharomyces cerevisiae/drug effects , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Transcription, Genetic/drug effects
