1.
J Biomed Inform ; 58 Suppl: S92-S102, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26241355

ABSTRACT

Automated phenotype identification plays a critical role in cohort selection and bioinformatics data mining. Natural Language Processing (NLP)-informed classification techniques can robustly identify phenotypes in unstructured medical notes. In this paper, we systematically assess the effect of naive, lexically normalized, and semantic feature spaces on classifier performance for obesity, atherosclerotic cardiovascular disease (CAD), hyperlipidemia, hypertension, and diabetes. We train support vector machines (SVMs) using individual feature spaces as well as combinations of these feature spaces on two small training corpora (730 and 790 documents) and a combined training corpus (1520 documents). We assess the importance of feature spaces and training data size on SVM model performance. We show that including semantically informed features does not yield a statistically significant improvement in performance for these models. Adding training data has weak effects of mixed statistical significance across disease classes, suggesting that larger corpora are not necessary to achieve relatively high performance with these models.


Subjects
Cardiovascular Diseases/diagnosis , Diabetes Mellitus/diagnosis , Diagnosis, Computer-Assisted/methods , Electronic Health Records/statistics & numerical data , Natural Language Processing , Obesity/diagnosis , Cardiovascular Diseases/epidemiology , Data Mining/methods , Decision Support Systems, Clinical/organization & administration , Humans , New York , Pattern Recognition, Automated/methods , Phenotype , Reproducibility of Results , Sensitivity and Specificity , Support Vector Machine
2.
J Biomed Inform ; 58 Suppl: S67-S77, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26210362

ABSTRACT

The second track of the 2014 i2b2/UTHealth natural language processing shared task focused on identifying medical risk factors related to Coronary Artery Disease (CAD) in the narratives of longitudinal medical records of diabetic patients. The risk factors included hypertension, hyperlipidemia, obesity, smoking status, and family history, as well as diabetes and CAD, and indicators that suggest the presence of those diseases. In addition to identifying the risk factors, this track of the 2014 i2b2/UTHealth shared task studied the presence and progression of the risk factors in longitudinal medical records. Twenty teams participated in this track, and submitted 49 system runs for evaluation. Six of the top 10 teams achieved F1 scores over 0.90, and all 10 scored over 0.87. The most successful system used a combination of additional annotations, external lexicons, hand-written rules and Support Vector Machines. The results of this track indicate that identification of risk factors and their progression over time is well within the reach of automated systems.


Subjects
Coronary Artery Disease/epidemiology , Data Mining/methods , Diabetes Complications/epidemiology , Electronic Health Records/organization & administration , Narration , Natural Language Processing , Aged , Boston/epidemiology , Cohort Studies , Comorbidity , Computer Security , Confidentiality , Coronary Artery Disease/diagnosis , Diabetes Complications/diagnosis , Female , Humans , Incidence , Longitudinal Studies , Male , Middle Aged , Pattern Recognition, Automated/methods , Risk Assessment/methods , Vocabulary, Controlled
3.
J Biomed Inform ; 58 Suppl: S11-S19, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26225918

ABSTRACT

The 2014 i2b2/UTHealth Natural Language Processing (NLP) shared task featured four tracks. The first of these was the de-identification track, which focused on identifying protected health information (PHI) in longitudinal clinical narratives. The longitudinal nature of clinical narratives calls particular attention to details that, while benign on their own in separate records, can lead to identification of patients when combined across longitudinal records. Accordingly, the 2014 de-identification track addressed a broader set of entities and PHI than covered by the Health Insurance Portability and Accountability Act, which was the focus of the de-identification shared task organized in 2006. Ten teams tackled the 2014 de-identification task and submitted 22 system outputs for evaluation. Each team was evaluated on its best-performing system output. Three of the 10 systems achieved F1 scores over .90, and seven of the top 10 scored over .75. The most successful systems combined conditional random fields and hand-written rules. Our findings indicate that automated systems can be very effective for this task, but that de-identification is not yet a solved problem.


Subjects
Computer Security , Data Mining/methods , Electronic Health Records/organization & administration , Narration , Natural Language Processing , Pattern Recognition, Automated/methods , Cohort Studies , Confidentiality , Longitudinal Studies , Vocabulary, Controlled
4.
IEEE Trans Vis Comput Graph ; 12(6): 1414-26, 2006.
Article in English | MEDLINE | ID: mdl-17073365

ABSTRACT

Despite extensive research, it is still difficult to produce effective interactive layouts for large graphs. Dense layout and occlusion make food webs, ontologies, and social networks difficult to understand and interact with. We propose a new interactive Visual Analytics component called TreePlus that is based on a tree-style layout. TreePlus reveals the missing graph structure with visualization and interaction while maintaining good readability. To support exploration of the local structure of the graph and gathering of information from the extensive reading of labels, we use a guiding metaphor of "Plant a seed and watch it grow." It allows users to start with a node and expand the graph as needed, which complements the classic overview techniques that can be effective at (but often limited to) revealing clusters. We describe our design goals, describe the interface, and report on a controlled user study with 28 participants comparing TreePlus with a traditional graph interface for six tasks. In general, the advantage of TreePlus over the traditional interface increased as the density of the displayed data increased. Participants also reported higher levels of confidence in their answers with TreePlus and most of them preferred TreePlus.


Subjects
Algorithms , Computer Graphics , Information Storage and Retrieval/methods , Models, Biological , Software , User-Computer Interface , Computer Simulation , Pattern Recognition, Automated