Results 1-6 of 6
1.
J Med Internet Res ; 20(5): e164, 2018 05 04.
Article in English | MEDLINE | ID: mdl-29728351

ABSTRACT

BACKGROUND: Researchers are developing methods to automatically extract clinically relevant and useful patient characteristics from raw healthcare datasets. These characteristics, often capturing essential properties of patients with common medical conditions, are called computational phenotypes. Because they are generated by automated or semiautomated, data-driven methods, such potential phenotypes need to be validated as clinically meaningful (or not) before they are acceptable for use in decision making.

OBJECTIVE: The objective of this study was to present the Phenotype Instance Verification and Evaluation Tool (PIVET), a framework that uses co-occurrence analysis on an online corpus of publicly available medical journal articles to build clinical relevance evidence sets for user-supplied phenotypes. PIVET adopts a conceptual framework similar to that of PheKnow-Cloud, the pioneering prototype tool developed for the phenotype validation task. PIVET completely refactors each part of the PheKnow-Cloud pipeline to deliver vast improvements in speed without sacrificing the quality of the insights PheKnow-Cloud achieved.

METHODS: PIVET leverages indexing in NoSQL databases to generate evidence sets efficiently. Specifically, PIVET uses a succinct representation of the phenotypes that corresponds to the index on the corpus database, together with an optimized co-occurrence algorithm inspired by the Aho-Corasick algorithm. We compare PIVET's phenotype representation with PheKnow-Cloud's by using PheKnow-Cloud's experimental setup. Within PIVET's framework, we also introduce a statistical model, trained on phenotypes verified by domain experts, that automatically classifies phenotypes as clinically relevant or not. Additionally, we show how the classification model can be used to examine user-supplied phenotypes in an online, rather than batch, manner.

RESULTS: PIVET maintains the discriminative power of PheKnow-Cloud in identifying clinically relevant phenotypes on the same corpus with which PheKnow-Cloud was originally developed, but PIVET's analysis is an order of magnitude faster than PheKnow-Cloud's. Not only is PIVET much faster, it can also be scaled to a larger corpus and still retain its speed. We evaluated multiple classification models on top of the PIVET framework and found ridge regression to perform best, achieving an average F1 score of 0.91 when predicting clinically relevant phenotypes.

CONCLUSIONS: Our study shows that PIVET improves on the most notable existing computational tool for phenotype validation in terms of speed and automation and is comparable in terms of accuracy.
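
The co-occurrence step can be illustrated with a short Python sketch. It uses the textbook Aho-Corasick algorithm (not PIVET's optimized variant, whose details the abstract does not give) to find every phenotype term in an article in a single pass, then counts how many articles mention each pair of terms. PIVET itself works against an indexed NoSQL corpus rather than raw text; the function names, the case-insensitive substring matching, and the sample sentences are assumptions for illustration only.

```python
from collections import deque
from itertools import combinations


def build_automaton(terms):
    """Build an Aho-Corasick automaton (goto, fail, output) over terms."""
    goto = [{}]       # goto[state][char] -> child state; state 0 is the root
    fail = [0]        # failure link of each state
    output = [set()]  # terms recognized when the scan reaches each state
    for term in terms:
        state = 0
        for ch in term:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(term)
    # Breadth-first traversal, so every failure link points to a shallower
    # state whose output set is already complete.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            queue.append(child)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            output[child] |= output[fail[child]]
    return goto, fail, output


def find_terms(text, automaton):
    """Return every term occurring in text, found in one left-to-right pass."""
    goto, fail, output = automaton
    state, found = 0, set()
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        found |= output[state]
    return found


def co_occurrence_counts(terms, articles):
    """Map each pair of terms to the number of articles mentioning both.

    Matching is case-insensitive substring matching (no word boundaries).
    """
    terms = sorted({t.lower() for t in terms})
    automaton = build_automaton(terms)
    counts = {pair: 0 for pair in combinations(terms, 2)}
    for article in articles:
        present = sorted(find_terms(article.lower(), automaton))
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical article snippets, for illustration only.
    abstracts = [
        "Metformin is first-line therapy for type 2 diabetes.",
        "Hypertension is common in patients with type 2 diabetes.",
        "Lisinopril lowers blood pressure in hypertension.",
    ]
    print(co_occurrence_counts(["type 2 diabetes", "metformin", "hypertension"], abstracts))
    # {('hypertension', 'metformin'): 0, ('hypertension', 'type 2 diabetes'): 1, ('metformin', 'type 2 diabetes'): 1}
```

Each article is scanned once no matter how many terms the phenotype contains, which is why Aho-Corasick-style matching suits this task better than searching for each term separately.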


Subjects
Information Storage and Retrieval/methods , Internet/instrumentation , MEDLARS/standards , Algorithms , Humans , Phenotype
2.
Proc Int World Wide Web Conf ; 2019: 659-669, 2019 May.
Article in English | MEDLINE | ID: mdl-31198910

ABSTRACT

In the past few decades, there has been rapid growth in the quantity and variety of healthcare data. These large datasets are usually high dimensional (e.g., patients, their diagnoses, and the medications used to treat those diagnoses) and cannot be adequately represented as matrices. Thus, many existing algorithms cannot analyze them. To accommodate such high-dimensional data, tensor factorization, which can be viewed as a higher-order extension of methods like PCA, has attracted much attention and emerged as a promising solution. However, tensor factorization is computationally expensive, and existing methods developed to factor large tensors are not flexible enough for real-world situations. To address this scaling problem more efficiently, we introduce SGranite, a distributed, scalable, and sparse tensor factorization method fit through stochastic gradient descent. SGranite offers three contributions: (1) Scalability: it employs a block partitioning and parallel processing design and thus scales to large tensors; (2) Accuracy: we show that our method can achieve results faster without sacrificing the quality of the tensor decomposition; and (3) Flexible Constraints: we show that our approach can encompass various kinds of constraints, including l2 norm, l1 norm, and logistic regularization. We demonstrate SGranite's capabilities in two real-world use cases. In the first, we use Google searches for flu-like symptoms to characterize and predict influenza patterns. In the second, we use SGranite to extract clinically interesting sets of patient characteristics (i.e., phenotypes) from electronic health records. Through these case studies, we show that SGranite has the potential to be used to rapidly characterize, predict, and manage large multimodal datasets, thereby promising a novel, data-driven solution that can benefit very large segments of the population.
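
A minimal Python sketch of fitting a CP tensor factorization by stochastic gradient descent with an l2 penalty, assuming the sparse tensor is given as a list of (i, j, k, value) entries and that only those listed entries are fitted. It leaves out SGranite's block partitioning, parallel processing, and l1/logistic constraints, and it is not the authors' code; the function names, learning rate, penalty weight, and toy data are illustrative assumptions.

```python
import random


def cp_sgd(entries, shape, rank, epochs=300, lr=0.02, reg=0.01, seed=0):
    """Fit x_ijk ~ sum_r A[i][r] * B[j][r] * C[k][r] by SGD over the
    listed (i, j, k, value) entries of a sparse 3-way tensor."""
    rng = random.Random(seed)
    A, B, C = ([[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n)]
               for n in shape)
    data = list(entries)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the observed entries in a random order each epoch
        for i, j, k, x in data:
            a, b, c = A[i], B[j], C[k]
            err = sum(a[r] * b[r] * c[r] for r in range(rank)) - x
            for r in range(rank):
                # Gradients of 0.5*err**2 + 0.5*reg*(a[r]**2 + b[r]**2 + c[r]**2),
                # all evaluated before any of the three values is changed.
                ga = err * b[r] * c[r] + reg * a[r]
                gb = err * a[r] * c[r] + reg * b[r]
                gc = err * a[r] * b[r] + reg * c[r]
                a[r] -= lr * ga
                b[r] -= lr * gb
                c[r] -= lr * gc
    return A, B, C


def rmse(entries, A, B, C):
    """Root-mean-square error of the CP model on the given entries."""
    rank = len(A[0])
    sq = [(sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank)) - x) ** 2
          for i, j, k, x in entries]
    return (sum(sq) / len(sq)) ** 0.5


if __name__ == "__main__":
    # Toy tensor built from known rank-1 factors (values chosen for illustration).
    a, b, c = [1.0, 2.0], [1.0, 0.5, 1.5], [1.0, 2.0]
    entries = [(i, j, k, a[i] * b[j] * c[k])
               for i in range(2) for j in range(3) for k in range(2)]
    start = rmse(entries, *cp_sgd(entries, (2, 3, 2), rank=1, epochs=0))
    end = rmse(entries, *cp_sgd(entries, (2, 3, 2), rank=1))
    print(f"RMSE before: {start:.3f}, after: {end:.3f}")
```

Each SGD step touches only one row of each factor matrix. One common way to parallelize such a loop, used by block-partitioned SGD methods in general, is to process blocks of entries that share no index in any mode at the same time, since their updates write to disjoint factor rows.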

3.
Proc SIAM Int Conf Data Min ; 2019: 711-719, 2019 May.
Article in English | MEDLINE | ID: mdl-31198618

ABSTRACT

Tensor factorization is a methodology that is applied in a variety of fields, ranging from climate modeling to medical informatics. A tensor is an n-way array that captures the relationship between n objects. These multiway arrays can be factored to study the underlying bases present in the data. Two challenges arising in tensor factorization are 1) the resulting factors can be noisy and highly overlapping with one another and 2) they may not map to insights within a domain. However, incorporating supervision to increase the number of insightful factors can be costly in terms of the time and domain expertise necessary for gathering labels or domain-specific constraints. To meet these challenges, we introduce CANDECOMP/PARAFAC (CP) tensor factorization with Cannot-Link Intermode Constraints (CP-CLIC), a framework that achieves succinct, diverse, interpretable factors. This is accomplished by gradually learning constraints that are verified with auxiliary information during the decomposition process. We demonstrate CP-CLIC's potential to extract sparse, diverse, and interpretable factors through experiments on simulated data and a real-world application in medical informatics.
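
For reference, the standard CANDECOMP/PARAFAC (CP) model that CP-CLIC builds on writes a three-way tensor $\mathcal{X} \in \mathbb{R}^{I \times J \times K}$ as a weighted sum of $R$ rank-one tensors. Here $\circ$ is the vector outer product, and $\mathbf{a}_r$, $\mathbf{b}_r$, $\mathbf{c}_r$ are the $r$-th columns of the factor matrices $A$, $B$, $C$; each rank-one term is one of the factors the abstract refers to. The cannot-link intermode constraints are not shown, because the abstract does not state their form.

```latex
\mathcal{X} \approx \sum_{r=1}^{R} \lambda_r \, \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r ,
\qquad
x_{ijk} \approx \sum_{r=1}^{R} \lambda_r \, a_{ir} \, b_{jr} \, c_{kr}
```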

4.
AMIA Annu Symp Proc ; 2018: 564-573, 2018.
Article in English | MEDLINE | ID: mdl-30815097

ABSTRACT

A computational phenotype is a set of clinically relevant and interesting characteristics that describe patients with a given condition. Various machine learning methods have been proposed to derive phenotypes in an automatic, high-throughput manner. Among these methods, computational phenotyping through tensor factorization has been shown to produce clinically interesting phenotypes. However, few of these methods incorporate auxiliary patient information into the phenotype derivation process. In this work, we introduce Phenotyping through Semi-Supervised Tensor Factorization (PSST), a method that leverages knowledge of disease status for subsets of patients to generate computational phenotypes from tensors constructed from patients' electronic health records. We demonstrate the potential of PSST to uncover predictive and clinically interesting computational phenotypes through case studies focusing on type 2 diabetes and resistant hypertension. PSST yields more discriminative phenotypes than the unsupervised methods and more meaningful phenotypes than a supervised method.
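
One generic way to let known disease status for a subset of patients shape a CP factorization is to add a supervised loss on the patient-mode factor rows, as in the objective below. This is an illustrative assumption, not PSST's published formulation. $A_{:r}$ denotes the $r$-th column of the patient factor matrix $A$ (likewise for $B$, $C$), $A_{i:}$ is patient $i$'s row of phenotype loadings, $\mathcal{L}$ is the set of patients with known status $y_i \in \{-1, +1\}$, $\mathbf{w}$ is a weight vector over the $R$ phenotypes, and $\mu \ge 0$ trades reconstruction against discrimination. Patients with unknown status influence the factors only through the reconstruction term, which is what makes such an objective semi-supervised.

```latex
\min_{A,\,B,\,C,\,\mathbf{w}} \;
\Big\lVert \mathcal{X} - \sum_{r=1}^{R} A_{:r} \circ B_{:r} \circ C_{:r} \Big\rVert_F^2
\;+\; \mu \sum_{i \in \mathcal{L}} \log\!\Big(1 + \exp\big(-y_i \, A_{i:} \mathbf{w}\big)\Big)
```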


Subjects
Algorithms , Computational Biology , Phenotype , Diabetes Mellitus, Type 2/diagnosis , Electronic Health Records , Humans , Hypertension/diagnosis , Machine Learning
5.
Annu Int Conf IEEE Eng Med Biol Soc ; 2017: 3644-3647, 2017 Jul.
Article in English | MEDLINE | ID: mdl-29060688

ABSTRACT

We propose gamAID, an exploratory, supervised nonnegative tensor factorization method that iteratively extracts phenotypes from tensors constructed from medical count data. Using data from diabetic patients who are later diagnosed with chronic kidney disorder (CKD), as well as from diabetic patients who do not receive a CKD diagnosis, we demonstrate the potential of gamAID to discover phenotypes that characterize patients at risk of developing a disease.
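
The abstract does not give gamAID's update rules. As a hedged illustration of nonnegative tensor factorization in general, the Python sketch below fits a nonnegative rank-R CP model to a small dense count tensor using Lee-Seung-style multiplicative updates for the squared reconstruction error. It does not reproduce gamAID's supervision or its iterative phenotype extraction; the function names, rank, iteration count, and toy counts are assumptions.

```python
import random


def nn_cp(X, rank, iters=300, eps=1e-9, seed=0):
    """Nonnegative rank-`rank` CP of a dense I x J x K tensor X (nested lists)
    via multiplicative updates for the squared reconstruction error."""
    I, J, K = len(X), len(X[0]), len(X[0][0])
    rng = random.Random(seed)
    A, B, C = ([[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
               for n in (I, J, K))

    def gram(M):
        # R x R matrix M^T M
        return [[sum(row[s] * row[t] for row in M) for t in range(rank)]
                for s in range(rank)]

    def update(F, G1, G2, numer):
        # F <- F * numer / (F @ (G1 * G2)); the elementwise ratio keeps F >= 0
        H = [[G1[s][t] * G2[s][t] for t in range(rank)] for s in range(rank)]
        for n, row in enumerate(F):
            denom = [sum(row[s] * H[s][r] for s in range(rank)) for r in range(rank)]
            for r in range(rank):
                row[r] *= numer[n][r] / (denom[r] + eps)

    for _ in range(iters):
        # Each numerator is the matricized tensor times the Khatri-Rao product
        # of the other two factor matrices.
        num = [[sum(X[i][j][k] * B[j][r] * C[k][r] for j in range(J) for k in range(K))
                for r in range(rank)] for i in range(I)]
        update(A, gram(B), gram(C), num)
        num = [[sum(X[i][j][k] * A[i][r] * C[k][r] for i in range(I) for k in range(K))
                for r in range(rank)] for j in range(J)]
        update(B, gram(A), gram(C), num)
        num = [[sum(X[i][j][k] * A[i][r] * B[j][r] for i in range(I) for j in range(J))
                for r in range(rank)] for k in range(K)]
        update(C, gram(A), gram(B), num)
    return A, B, C


def sq_error(X, A, B, C):
    """Sum of squared differences between X and its CP reconstruction."""
    rank = len(A[0])
    return sum((X[i][j][k] - sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank))) ** 2
               for i in range(len(X)) for j in range(len(X[0])) for k in range(len(X[0][0])))


if __name__ == "__main__":
    # Hypothetical patient x diagnosis x medication count tensor.
    X = [[[2, 0], [1, 0], [0, 3]],
         [[4, 0], [2, 0], [0, 1]],
         [[0, 1], [0, 0], [3, 0]]]
    print(sq_error(X, *nn_cp(X, rank=2, iters=0)), sq_error(X, *nn_cp(X, rank=2)))
```

Because every factor stays nonnegative, each rank-one component can be read as an additive group of patients, diagnoses, and medications, which is the usual motivation for nonnegativity when the data are counts.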


Subjects
Diabetes Mellitus , Algorithms , Humans , Renal Insufficiency, Chronic
6.
AMIA Jt Summits Transl Sci Proc ; 2017: 149-157, 2017.
Article in English | MEDLINE | ID: mdl-28815124

ABSTRACT

As the adoption of electronic health records (EHRs) has grown, so has the need to transform manual processes that extract and characterize medical data into automatic, high-throughput processes. Recently, researchers have tackled the problem of automatically extracting candidate phenotypes from EHR data. Because these phenotypes are usually generated using unsupervised or semi-supervised methods, the clinical relevance of the generated "candidate" phenotypes must be examined and validated. We present PheKnow-Cloud, a framework that uses co-occurrence analysis on PubMed, a publicly available online repository of journal articles, to build sets of evidence for user-supplied candidate phenotypes. PheKnow-Cloud works interactively to present the results of the candidate phenotype analysis. This tool seeks to help researchers and clinical professionals evaluate automatically generated phenotypes so that they can tune their processes and understand the candidate phenotypes.
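
This abstract does not describe PheKnow-Cloud's scoring function. As an illustration of how document co-occurrence counts can be turned into an evidence score, the sketch below computes lift and pointwise mutual information (PMI) for a pair of terms from four counts: the total number of articles, the number mentioning each term, and the number mentioning both. The choice of lift and PMI, the function name, and the example counts are all assumptions.

```python
import math


def cooccurrence_scores(n_total, n_a, n_b, n_ab):
    """Lift and PMI (base 2) of terms a and b from article counts.

    lift = P(a, b) / (P(a) * P(b)) = n_ab * n_total / (n_a * n_b)
    lift > 1 (PMI > 0) means the terms co-occur more often than chance.
    """
    if n_ab == 0:
        return 0.0, float("-inf")  # never co-occur: PMI is undefined (-inf)
    lift = (n_ab * n_total) / (n_a * n_b)
    return lift, math.log2(lift)


if __name__ == "__main__":
    # Hypothetical counts: 1000 articles, 100 mention a, 50 mention b, 20 mention both.
    print(cooccurrence_scores(1000, 100, 50, 20))  # (4.0, 2.0)
```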
