1.
Artif Intell Med ; 147: 102745, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38184352

ABSTRACT

Human accuracy in diagnosing psychiatric disorders is still low. Even though digitizing health care produces more and more data, the successful adoption of AI-based digital decision support systems (DDSS) is rare. One reason is that AI algorithms are often not evaluated on large, real-world data. This research shows the potential of using deep learning on the medical claims data of 812,853 people between 2018 and 2022, with 26,973,943 ICD-10-coded diseases, to predict depression (F32 and F33 ICD-10 codes). The dataset represents almost the entire adult population of Estonia. To show the critical importance of the underlying temporal properties of the data for the detection of depression, we evaluate the performance of non-sequential models (LR, FNN), sequential models (LSTM, CNN-LSTM) and sequential models with a decay factor (GRU-Δt, GRU-decay). Furthermore, since explainability is necessary in the medical domain, we combine a self-attention model with GRU-decay and evaluate its performance. We named this combination Att-GRU-decay. After extensive empirical experimentation, our model (Att-GRU-decay), with an AUC score of 0.990, an AUPRC score of 0.974, a specificity of 0.999 and a sensitivity of 0.944, proved to be the most accurate. The results of our novel Att-GRU-decay model outperform the current state of the art, demonstrating the potential usefulness of deep learning algorithms for DDSS development. We further expand on this by describing a possible application scenario of the proposed algorithm for depression screening in a general practitioner (GP) setting, not only to decrease healthcare costs, but also to improve the quality of care and ultimately decrease people's suffering.


Subjects
Deep Learning, Mental Disorders, Adult, Humans, Depression/diagnosis, Algorithms
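
The abstract of record 1 does not state the decay formula behind GRU-decay. As a rough illustration of what a time-decay factor does, the sketch below uses an exponential form familiar from GRU-D-style models. The function names and the values of `w` and `b` are hypothetical (in a trained model they would be learned), and this should not be read as the authors' implementation.

```python
import math

def decay_factor(delta_t, w=0.1, b=0.0):
    """Assumed decay: gamma = exp(-max(0, w * delta_t + b)), a value in (0, 1]."""
    return math.exp(-max(0.0, w * delta_t + b))

def decay_hidden_state(hidden, delta_t, w=0.1, b=0.0):
    """Shrink every component of a hidden state by the decay factor for this gap."""
    gamma = decay_factor(delta_t, w, b)
    return [gamma * h for h in hidden]

# Hypothetical hidden state and gaps (in arbitrary time units) between two claims.
state = [1.0, -2.0]
recent = decay_hidden_state(state, delta_t=1.0)    # short gap: state mostly kept
distant = decay_hidden_state(state, delta_t=24.0)  # long gap: state mostly forgotten
```

The longer the gap between two coded events, the less the earlier hidden state contributes, which is one way a sequential model can take irregular visit intervals into account.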
2.
IEEE/ACM Trans Comput Biol Bioinform ; 18(3): 1035-1048, 2021.
Article in English | MEDLINE | ID: mdl-32776880

ABSTRACT

Breast cancer (BC) is the most common invasive cancer in women and causes considerable mortality. Because BC is classified as a hormone-dependent cancer, its coincidence with pregnancy raises questions for which there are still no convincing answers. To deal with this issue, two new frameworks are proposed in this paper: CoRaM and Dist-CoRaM. The former is the first unified framework dedicated to the extraction of a generic basis of Correlated-Rare Association rules from gene expression data. The proposed approach has been successfully applied to a breast cancer Gene Expression Matrix (GSE1379) with very promising results. The latter, the Dist-CoRaM approach, is a big-data processing approach based on the Apache Spark framework that mines correlations from microarray data on pregnancy-associated breast cancer (PABC) assays. It is successfully applied to the GSE31192 gene expression matrix (GEM). The correlated patterns of gene sets shed light on the fact that PABC exhibits heightened aggressiveness compared with cancers in non-PABC women. Our findings suggest that higher levels of estrogen and progesterone strongly favor increased tumor aggressiveness and cancer proliferation.


Subjects
Breast Neoplasms/genetics, Neoplastic Pregnancy Complications/genetics, Transcriptome/genetics, Algorithms, Computational Biology, Data Mining, Female, Humans, Machine Learning, Pregnancy
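
The abstract of record 2 does not name the correlation measure used for correlated-rare patterns. The sketch below assumes the bond measure (conjunctive support divided by disjunctive support) and treats a pattern as rare when its support is below a ceiling. The thresholds, item names, and function names are hypothetical, and this is an illustration of the idea rather than the CoRaM algorithm itself.

```python
def conjunctive_support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def disjunctive_support(itemset, transactions):
    """Number of transactions that contain at least one item of the itemset."""
    return sum(1 for t in transactions if itemset & t)

def bond(itemset, transactions):
    """Assumed correlation measure: conjunctive / disjunctive support."""
    disj = disjunctive_support(itemset, transactions)
    return conjunctive_support(itemset, transactions) / disj if disj else 0.0

def is_correlated_rare(itemset, transactions, max_support, min_bond):
    """Rare: support strictly below max_support. Correlated: bond at or above min_bond."""
    rare = conjunctive_support(itemset, transactions) < max_support
    return rare and bond(itemset, transactions) >= min_bond

# Hypothetical toy data: each set lists the genes flagged in one sample.
samples = [{"g1", "g2"}, {"g1", "g2"}, {"g3"}, {"g3"}, {"g3", "g4"}]
pattern_ok = is_correlated_rare({"g1", "g2"}, samples, max_support=3, min_bond=0.8)
pattern_weak = is_correlated_rare({"g3", "g4"}, samples, max_support=3, min_bond=0.8)
```

Here `{"g1", "g2"}` is infrequent but its items always appear together, so it qualifies; `{"g3", "g4"}` is also infrequent but its items mostly occur apart, so it does not.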
3.
IEEE/ACM Trans Comput Biol Bioinform ; 15(6): 2060-2066, 2018.
Article in English | MEDLINE | ID: mdl-29994444

ABSTRACT

With the rapid progress in using networks to model biological systems, more and more protein-protein interaction (PPI) data are being produced. Analyzing protein-protein interaction networks aims to find regions of topological and functional (dis)similarity between molecular networks of different species. The study of PPI networks has the potential to teach us much about life processes and diseases at the molecular level. However, few methods have been developed for multiple PPI network alignment, so new network alignment methods are needed. In this paper, we propose a novel algorithm for the global alignment of multiple protein-protein interaction networks, called MAPPIN. It relies on information available for the proteins in the networks, such as sequence, function, and network topology. Our algorithm is designed to exploit current multi-core CPU architectures and has been extensively tested on real data (eight species). Our experimental results show that MAPPIN significantly outperforms NetCoffee in terms of coverage. Nevertheless, MAPPIN is handicapped by the time required to load the gene annotation file. An extensive comparison with the pioneering PPI methods also shows that MAPPIN is often efficient in terms of coverage, mean entropy, or mean normalized.


Subjects
Computational Biology/methods, Protein Interaction Mapping/methods, Algorithms, Animals, High-Throughput Screening Assays, Humans, Protein Interaction Maps, Proteins/chemistry, Sequence Alignment
4.
PLoS One ; 9(6): e95275, 2014.
Article in English | MEDLINE | ID: mdl-24901648

ABSTRACT

Identification of protein domains is a key step for understanding protein function. Hidden Markov Models (HMMs) have proved to be a powerful tool for this task. The Pfam database notably provides a large collection of HMMs which are widely used for the annotation of proteins in sequenced organisms. This is done via sequence/HMM comparisons. However, this approach may lack sensitivity when searching for domains in divergent species. Recently, methods for HMM/HMM comparisons have been proposed and proved to be more sensitive than sequence/HMM approaches in certain cases. However, these approaches are usually not used for protein domain discovery at a genome scale, and the benefit that could be expected from using them for this problem has not been investigated. Using proteins of P. falciparum and L. major as examples, we investigate the extent to which HMM/HMM comparisons can identify new domain occurrences not already identified by sequence/HMM approaches. We show that although HMM/HMM comparisons are much more sensitive than sequence/HMM comparisons, they are not sufficiently accurate to be used as a standalone complement of sequence/HMM approaches at the genome scale. Hence, we propose to use domain co-occurrence (the general tendency of domains to appear preferentially alongside certain favorite domains in proteins) to improve the accuracy of the approach. We show that the combination of HMM/HMM comparisons and co-occurrence domain detection boosts protein annotations. At an estimated False Discovery Rate of 5%, it revealed 901 and 1098 new domains in Plasmodium and Leishmania proteins, respectively. Manual inspection of part of these predictions shows that they contain several domain families that were missing in the two organisms. All new domain occurrences have been integrated into the EuPathDomains database, along with the GO annotations that can be deduced.


Subjects
Computational Biology, Markov Chains, Protein Interaction Domains and Motifs, Proteins/chemistry, Computational Biology/methods, Molecular Sequence Annotation, Reproducibility of Results, Sensitivity and Specificity
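
The co-occurrence idea in the abstract of record 4 can be illustrated with a very simple filter: a weak candidate domain is kept only if it forms a known co-occurring pair with a domain already annotated on the same protein. This is a minimal sketch under that assumption; the function name, the domain identifiers, and the pair set are hypothetical, and the published method estimates statistical confidence (a False Discovery Rate) rather than applying a plain lookup.

```python
def filter_by_cooccurrence(known_domains, candidate_domains, cooccurring_pairs):
    """Keep candidates that pair with at least one already-annotated domain."""
    return [
        cand
        for cand in candidate_domains
        if any(frozenset((cand, known)) in cooccurring_pairs for known in known_domains)
    ]

# Hypothetical identifiers, for illustration only.
pairs = {frozenset(("DomA", "DomB")), frozenset(("DomC", "DomD"))}
kept = filter_by_cooccurrence(
    known_domains={"DomA"},             # found by sequence/HMM comparison
    candidate_domains=["DomB", "DomD"], # weak hits from HMM/HMM comparison
    cooccurring_pairs=pairs,
)
```

In this toy case only `DomB` survives, because it is the only candidate known to appear alongside the already-annotated `DomA`.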
5.
Infect Genet Evol ; 9(3): 328-36, 2009 May.
Article in English | MEDLINE | ID: mdl-18992849

ABSTRACT

The production of increasingly reliable and accessible gene expression data has stimulated the development of computational tools to interpret such data and to organize them efficiently. Clustering techniques are widely recognized as useful exploratory tools for gene expression data analysis. Genes that show similar expression patterns over a wide range of experimental conditions can be clustered together. This relies on the hypothesis that genes that belong to the same cluster are coregulated and involved in related functions. Nevertheless, clustering algorithms still show limits, particularly in estimating the number of clusters and in interpreting hierarchical dendrograms, which may significantly influence the outputs of the analysis process. We propose here a multi-level clustering algorithm based on self-organizing maps (SOM), named Multi-SOM. Through the use of clustering validity indices, Multi-SOM overcomes the problem of estimating the number of clusters. To test the validity of the proposed clustering algorithm, we first tested it on supervised training data sets. Results were evaluated by computing the number of misclassified samples. We then used Multi-SOM to analyze macrophage gene expression data generated in vitro from the blood of the same individual, infected with 5 different pathogens. This analysis led to the identification of sets of tightly coregulated genes across different pathogens. Gene Ontology tools were then used to estimate the biological significance of the clustering, which showed that the obtained clusters are coherent and biologically significant.


Subjects
Cluster Analysis, Gene Expression Profiling/methods, Macrophages/physiology, Computer Neural Networks, Algorithms, Animals, Breast Neoplasms/diagnosis, Diabetes Mellitus/diagnosis, Female, Gene Expression Regulation, Humans, Multigene Family, Oligonucleotide Array Sequence Analysis, Automated Pattern Recognition, Protozoan Infections/genetics, Pulmonary Tuberculosis/genetics
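
For readers unfamiliar with the self-organizing map underlying record 5, the sketch below shows a single training step on a one-dimensional map: find the best-matching unit, then pull it and its neighbours toward the sample. It is a generic SOM step with hypothetical parameter values and names; the multi-level scheme and the clustering validity indices of Multi-SOM are not reproduced here.

```python
import math

def best_matching_unit(weights, sample):
    """Index of the map node whose weight vector is closest to the sample."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], sample))

def som_update(weights, sample, learning_rate=0.5, radius=1.0):
    """One training step: move each node toward the sample, scaled by a
    Gaussian neighbourhood centred on the best-matching unit."""
    bmu = best_matching_unit(weights, sample)
    updated = []
    for i, node in enumerate(weights):
        influence = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
        updated.append(
            [w + learning_rate * influence * (x - w) for w, x in zip(node, sample)]
        )
    return updated

# Hypothetical 3-node map and one two-condition expression profile.
nodes = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
profile = [0.9, 1.1]
new_nodes = som_update(nodes, profile)
```

After training over many samples, genes whose profiles map to the same node (or to neighbouring nodes) are treated as candidates for the same cluster.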