1.
Article in English | MEDLINE | ID: mdl-38719204

ABSTRACT

OBJECTIVE: Natural language processing (NLP) algorithms are increasingly being applied to obtain unsupervised representations of electronic health record (EHR) data, but their comparative performance at predicting clinical endpoints remains unclear. Our objective was to compare the performance of unsupervised representations of sequences of disease codes generated by bag-of-words versus sequence-based NLP algorithms at predicting clinically relevant outcomes.

MATERIALS AND METHODS: This cohort study used primary care EHRs from 6 286 233 people with Multiple Long-Term Conditions in England. For each patient, an unsupervised vector representation of their time-ordered sequence of diseases was generated using 2 input strategies (212 disease categories versus 9462 diagnostic codes) and different NLP algorithms (Latent Dirichlet Allocation, doc2vec, and 2 transformer models designed for EHRs). We also developed a transformer architecture, named EHR-BERT, incorporating sociodemographic information. We compared the performance of each of these representations (without fine-tuning) as inputs to a logistic classifier predicting 1-year mortality, healthcare use, and new disease diagnosis.

RESULTS: Patient representations generated by sequence-based algorithms performed consistently better than bag-of-words methods at predicting clinical endpoints, with EHR-BERT achieving the highest performance across all tasks, although the absolute improvement was small. Representations generated using disease categories performed similarly to those using diagnostic codes as inputs, suggesting the models handle smaller and larger vocabularies equally well for predicting these outcomes.

DISCUSSION AND CONCLUSION: Patient representations produced by sequence-based NLP algorithms from sequences of disease codes demonstrate improved predictive content for patient outcomes compared with representations generated by co-occurrence-based algorithms. This suggests transformer models may be useful for generating multi-purpose representations, even without fine-tuning.
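The core contrast in the abstract, bag-of-words versus sequence-based representations, can be illustrated with a toy example (the disease codes and functions below are hypothetical, not the authors' pipeline): two patients who receive the same diagnoses in a different order are indistinguishable to a bag-of-words encoding, while even a crude order-aware encoding separates them.

```python
from collections import Counter

# Hypothetical time-ordered disease-code sequences for two patients.
patient_a = ["E11", "I10", "N18"]  # diabetes -> hypertension -> CKD
patient_b = ["N18", "I10", "E11"]  # same codes, reverse order

def bag_of_words(seq, vocab):
    """Unordered count vector over a fixed code vocabulary."""
    counts = Counter(seq)
    return [counts[code] for code in vocab]

def bigram_features(seq, vocab):
    """A minimal order-aware encoding: counts of adjacent code pairs."""
    counts = Counter(zip(seq, seq[1:]))
    return [counts[(a, b)] for a in vocab for b in vocab]

vocab = sorted(set(patient_a) | set(patient_b))

# Bag-of-words cannot tell the two disease trajectories apart...
assert bag_of_words(patient_a, vocab) == bag_of_words(patient_b, vocab)

# ...but even simple bigram counts preserve enough order to distinguish them.
assert bigram_features(patient_a, vocab) != bigram_features(patient_b, vocab)
```

The transformer and doc2vec representations studied in the paper are far richer than bigram counts, but they exploit the same ordering information that bag-of-words methods discard.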

2.
Annu Int Conf IEEE Eng Med Biol Soc; 2019: 1355-1358, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31946144

ABSTRACT

Elucidating protein subcellular localization is an essential topic in proteomics research due to its importance in the process of drug discovery. Unfortunately, experimentally uncovering protein subcellular targets is an arduous process that may not result in a successful localization. In contrast, computational methods can rapidly predict protein subcellular targets and are an efficient alternative to experimental methods for unannotated proteins. In this work, we introduce a new method to predict protein subcellular localization which increases the predictive power of generative probabilistic models while preserving their explanatory benefit. Our method exploits Markov models to produce a feature vector that records micro-similarities between the underlying probability distributions of a given sequence and their counterparts in reference models. Compared to ordinary Markov chain inference, we show that our method improves overall accuracy by 10% under 10-fold cross-validation on a dataset consisting of 10 subcellular locations. The source code is publicly available at https://github.com/aametwally/MC_MicroSimilarities.
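The micro-similarity idea can be sketched as follows (a simplified first-order version; the function names and toy sequences are illustrative, not from the authors' code): fit a Markov transition model per subcellular location on training sequences, then represent a query sequence by its average transition log-likelihood under each reference model, yielding a feature vector a downstream classifier can consume.

```python
import math
from collections import defaultdict

def fit_markov(sequences, alpha=1.0):
    """Estimate first-order transition probabilities with add-alpha smoothing."""
    counts = defaultdict(lambda: defaultdict(float))
    symbols = set()
    for seq in sequences:
        symbols.update(seq)
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1.0
    symbols = sorted(symbols)
    return {
        a: {b: (counts[a][b] + alpha) /
               (sum(counts[a].values()) + alpha * len(symbols))
            for b in symbols}
        for a in symbols
    }

def avg_log_likelihood(seq, model, floor=1e-6):
    """Mean transition log-probability of seq under one reference model."""
    logps = [math.log(model.get(a, {}).get(b, floor))
             for a, b in zip(seq, seq[1:])]
    return sum(logps) / max(len(logps), 1)

def feature_vector(seq, models):
    """Micro-similarity features: one likelihood score per location model."""
    return [avg_log_likelihood(seq, models[loc]) for loc in sorted(models)]

# Toy training data: short amino-acid-like sequences per location (illustrative).
train = {
    "cytoplasm": ["MKKLL", "MKLLK", "KKLLM"],
    "nucleus":   ["RRRSR", "RSRRR", "SRRRS"],
}
models = {loc: fit_markov(seqs) for loc, seqs in train.items()}

# A query resembling the cytoplasm sequences scores higher under that model.
fv = feature_vector("MKKLLK", models)
```

Instead of assigning the query to the single best-scoring model, as plain Markov chain inference would, the full vector of per-model scores is kept as features, which is what lets a discriminative classifier exploit the micro-similarities.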


Subjects
Markov Chains, Computational Biology, Protein Databases, Protein Transport, Proteins, Sequence Analysis, Protein, Software, Subcellular Fractions