Results 1 - 20 of 26
1.
medRxiv ; 2023 Feb 14.
Article in English | MEDLINE | ID: mdl-36798376

ABSTRACT

The application of machine learning (ML) tools in electronic health records (EHRs) can help reduce the underdiagnosis of dementia, but models that are not designed to reflect minority populations may perpetuate that underdiagnosis. To address the underdiagnosis of dementia in both Black Americans (BAs) and white Americans (WAs), we sought to develop and validate ML models that assign race-specific risk scores. These scores were used to identify undiagnosed dementia in BA and WA Veterans in EHRs. More specifically, risk scores were generated separately for BAs (n=10K) and WAs (n=10K) in training samples of cases and controls by performing ML, equivalence mapping, topic modeling, and a support vector machine (SVM) in structured and unstructured EHR data. Scores were validated via blinded manual chart reviews (n=1.2K) of controls from a separate sample (n=20K). AUCs and negative and positive predictive values (NPVs and PPVs) were calculated to evaluate the models. There was a strong positive relationship between SVM-generated risk scores and undiagnosed dementia. BAs were more likely than WAs to have undiagnosed dementia per chart review, both overall (15.3% vs 9.5%) and among Veterans with >90th percentile cutoff scores (25.6% vs 15.3%). With chart reviews as the reference standard and varied cutoff scores, the BA model performed slightly better than the WA model (AUC=0.86 with NPV=0.98 and PPV=0.26 at >90th percentile cutoff vs AUC=0.77 with NPV=0.98 and PPV=0.15 at >90th). The AUCs, NPVs, and PPVs suggest that race-specific ML models can assist in the identification of undiagnosed dementia, particularly in BAs. Future studies should investigate implementing EHR-based risk scores in clinics that serve both BA and WA Veterans.
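
As an illustration of how PPV and NPV at a percentile cutoff could be computed from risk scores and chart-review labels, here is a minimal Python sketch. The function name `predictive_values` and the nearest-rank percentile rule are assumptions made for this example; the abstract does not describe how the cutoff was derived.

```python
import math


def predictive_values(scores, labels, percentile=90):
    """Return (ppv, npv) when scores above the percentile cutoff are flagged.

    scores: risk scores, one per patient.
    labels: 1 if chart review found dementia, else 0.
    The cutoff uses a nearest-rank percentile (an assumption for this sketch).
    """
    ranked = sorted(scores)
    # Nearest-rank percentile: smallest score with at least `percentile`%
    # of all scores at or below it.
    rank = max(1, math.ceil(len(ranked) * percentile / 100))
    cutoff = ranked[rank - 1]

    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        flagged = score > cutoff
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1

    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv
```

A low PPV alongside a high NPV, as reported above, is what one would expect when the condition is uncommon among the reviewed controls.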

2.
Int J Med Inform ; 139: 104122, 2020 07.
Article in English | MEDLINE | ID: mdl-32339929

ABSTRACT

BACKGROUND: In ambulatory care settings, physicians largely rely on clinical guidelines and guideline-based clinical decision support (CDS) systems to make decisions on hypertension treatment. However, current clinical evidence, which is the knowledge base of clinical guidelines, is insufficient to support definitive optimal treatment. OBJECTIVE: The goal of this study is to test the feasibility of using deep learning predictive models to identify optimal hypertension treatment pathways for individual patients, based on empirical data available from an electronic health record database. MATERIALS AND METHODS: This study used data on 245,499 unique patients who were initially diagnosed with essential hypertension and received anti-hypertensive treatment from January 1, 2001 to December 31, 2010 in ambulatory care settings. We used recurrent neural networks (RNN), including long short-term memory (LSTM) and bi-directional LSTM, to create risk-adapted models to predict the probability of reaching the BP control targets associated with different BP treatment regimens. The ratios for the training set, the validation set, and the test set were 6:2:2. The samples for each set were independently randomly drawn from individual years with corresponding proportions. RESULTS: The LSTM models achieved high accuracy when predicting individual probability of reaching BP goals on different treatments: for systolic BP (<140 mmHg), diastolic BP (<90 mmHg), and both systolic BP and diastolic BP (<140/90 mmHg), F1-scores were 0.928, 0.960, and 0.913, respectively. CONCLUSIONS: The results demonstrated the potential of using predictive models to select optimal hypertension treatment pathways. Along with clinical guidelines and guideline-based CDS systems, the LSTM models could be used as a powerful decision-support tool to form risk-adapted, personalized strategies for hypertension treatment plans, especially for difficult-to-treat patients.


Subjects
Antihypertensive Agents/therapeutic use , Blood Pressure/drug effects , Hypertension/diagnosis , Neural Networks, Computer , Patient Care Planning/standards , Practice Guidelines as Topic/standards , Blood Pressure Determination , Databases, Factual , Electronic Health Records , Feasibility Studies , Humans , Hypertension/drug therapy , Monitoring, Physiologic
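
One possible reading of the 6:2:2 split described in entry 2, where each set is drawn at random from individual years in matching proportions, is sketched below in Python. The record layout `(patient_id, year)`, the function name `split_by_year`, and the fixed seed are hypothetical; the abstract does not give implementation details.

```python
import random
from collections import defaultdict


def split_by_year(records, ratios=(0.6, 0.2, 0.2), seed=0):
    """Split (patient_id, year) records into train/validation/test lists.

    Patients are shuffled within each year and then divided according to
    `ratios`, so every year contributes to each set in the same proportion.
    """
    rng = random.Random(seed)
    by_year = defaultdict(list)
    for patient_id, year in records:
        by_year[year].append(patient_id)

    train, validation, test = [], [], []
    for year in sorted(by_year):
        ids = by_year[year]
        rng.shuffle(ids)
        n_train = round(len(ids) * ratios[0])
        n_val = round(len(ids) * ratios[1])
        train += ids[:n_train]
        validation += ids[n_train:n_train + n_val]
        # Whatever remains after rounding goes to the test set.
        test += ids[n_train + n_val:]
    return train, validation, test
```

Splitting within each year keeps the year distribution similar across the three sets, which matters if treatment patterns drifted over the study period.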
3.
BMC Med Inform Decis Mak ; 19(1): 128, 2019 07 09.
Article in English | MEDLINE | ID: mdl-31288818

ABSTRACT

BACKGROUND: Dementia is underdiagnosed in both the general population and among Veterans. This underdiagnosis decreases quality of life, reduces opportunities for interventions, and increases health-care costs. New approaches are therefore necessary to facilitate the timely detection of dementia. This study seeks to identify cases of undiagnosed dementia by developing and validating a weakly supervised machine-learning approach that incorporates the analysis of both structured and unstructured electronic health record (EHR) data. METHODS: A topic modeling approach that included latent Dirichlet allocation, stable topic extraction, and random sampling was applied to VHA EHRs. Topic features from unstructured data and features from structured data were compared between Veterans with (n = 1861) and without (n = 9305) ICD-9 dementia codes. A logistic regression model was used to develop dementia prediction scores, and manual reviews were conducted to validate the machine-learning results. RESULTS: A total of 853 features were identified (290 topics, 174 non-dementia ICD codes, 159 CPT codes, 59 medications, and 171 note types) for the development of logistic regression prediction scores. These scores were validated in a subset of Veterans without ICD-9 dementia codes (n = 120) by experts in dementia who performed manual record reviews and achieved a high level of inter-rater agreement. The manual reviews were used to develop a receiver operating characteristic (ROC) curve with different thresholds for case detection, including a threshold of 0.061, which produced an optimal sensitivity (0.825) and specificity (0.832). CONCLUSIONS: Dementia is underdiagnosed, and thus, ICD codes alone cannot serve as a gold standard for diagnosis. However, this study suggests that imperfect data (e.g., ICD codes in combination with other EHR features) can serve as a silver standard to develop a risk model, apply that model to patients without dementia codes, and then select a case-detection threshold. The study is one of the first to utilize both structured and unstructured EHRs to develop risk scores for the diagnosis of dementia.


Subjects
Delayed Diagnosis , Dementia/diagnosis , Electronic Health Records , International Classification of Diseases , Machine Learning , Aged , Aged, 80 and over , Female , Humans , Male , Veterans
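
The threshold selection step in entry 3 can be illustrated with a short Python sketch that sweeps candidate thresholds and reports sensitivity and specificity for each. Choosing the threshold by Youden's J (sensitivity + specificity - 1) is an assumption made here; the abstract says only that the chosen threshold was "optimal" and does not name the criterion. Function names are hypothetical.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    Requires at least one positive and one negative label.
    """
    best = None
    for threshold in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, threshold)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, threshold, sens, spec)
    return best[1], best[2], best[3]
```

Here `labels` would come from the manual record reviews rather than from ICD codes, in line with the abstract's point that codes alone are not a gold standard.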
4.
AMIA Annu Symp Proc ; 2019: 275-284, 2019.
Article in English | MEDLINE | ID: mdl-32308820

ABSTRACT

Greater transparency in salaries overall and in factors associated with differing salaries can help students and professionals plan their careers, discover biases and obstacles, and help advance professional disciplines broadly. In March 2018, we conducted the first salary survey of American Medical Informatics Association members. Our goal was to summarize salary information and provide a nuanced view pertaining to the diverse biomedical informatics community. To identify factors associated with higher salaries, we reviewed average salaries for different groups (physician status, academic status, and different leadership positions) by gender. We also fitted multiple linear regression models for all participants (N = 201) and for gender, physician-status, and academic-status subgroups. The mean (standard deviation) salary was $181,774 ($99,566). Men earned more than women on average, especially among professionals from academic settings. More years working in informatics and full-time employment were two factors that were consistently associated with higher salary.


Subjects
Medical Informatics/economics , Salaries and Fringe Benefits , Employment/economics , Faculty , Female , Humans , Male , Physicians/economics , Sex Factors , Societies, Medical , Students , Surveys and Questionnaires , United States
5.
AMIA Annu Symp Proc ; 2017: 585-594, 2017.
Article in English | MEDLINE | ID: mdl-29854123

ABSTRACT

Supplementing patient education content with pictographs can improve the comprehension and recall of information, especially for patients with low health literacy. Pictograph design and testing, however, are costly and time consuming. We created a Web-based game, Doodle Health, for crowdsourcing the drawing and validation of pictographs. The objective of this pilot study was to test the usability of the game and its appeal to healthcare consumers. The chief purpose of the game is to involve a diverse population in the co-design and evaluation of pictographs. We conducted a community-based focus group to inform the game design. Game designers, health sciences librarians, informatics researchers, clinicians, and community members participated in two Design Box meetings. The results of the meetings were used to create the Doodle Health crowdsourcing game. The game was presented and tested at two public fairs. Initial testing indicates crowdsourcing is a promising approach to pictograph development and testing for relevancy and comprehension. Over 596 drawings were collected and 1,758 guesses were performed to date with accuracies of 70-90%, which are satisfactorily high.


Subjects
Comprehension , Crowdsourcing , Patient Education as Topic/methods , Video Games , Focus Groups , Health Literacy , Humans , Pilot Projects
6.
EGEMS (Wash DC) ; 4(3): 1228, 2016.
Article in English | MEDLINE | ID: mdl-27683667

ABSTRACT

INTRODUCTION: Substantial amounts of clinically significant information are contained only within the narrative of the clinical notes in electronic medical records. The v3NLP Framework is a set of "best-of-breed" functionalities developed to transform this information into structured data for use in quality improvement, research, population health surveillance, and decision support. BACKGROUND: MetaMap, cTAKES and similar well-known natural language processing (NLP) tools do not have sufficient scalability out of the box. The v3NLP Framework evolved out of the necessity to scale up these tools and provide a framework to customize and tune techniques that fit a variety of tasks, including document classification, tuned concept extraction for specific conditions, patient classification, and information retrieval. INNOVATION: Beyond scalability, several v3NLP Framework-developed projects have been efficacy tested and benchmarked. While v3NLP Framework includes annotators, pipelines and applications, its functionalities enable developers to create novel annotators and to place annotators into pipelines and scaled applications. DISCUSSION: The v3NLP Framework has been successfully utilized in many projects including general concept extraction, risk factors for homelessness among veterans, and identification of mentions of the presence of an indwelling urinary catheter. Projects as diverse as predicting colonization with methicillin-resistant Staphylococcus aureus and extracting references to military sexual trauma are being built using v3NLP Framework components. CONCLUSION: The v3NLP Framework is a set of functionalities and components that provide Java developers with the ability to create novel annotators and to place those annotators into pipelines and applications to extract concepts from clinical text. There are scale-up and scale-out functionalities to process large numbers of records.

7.
J Am Med Inform Assoc ; 23(2): 269-75, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26269536

ABSTRACT

OBJECTIVE: ClinicalTrials.gov serves critical functions of disseminating trial information to the public and helping the trials recruit participants. This study assessed the readability of trial descriptions at ClinicalTrials.gov using multiple quantitative measures. MATERIALS AND METHODS: The analysis included all 165,988 trials registered at ClinicalTrials.gov as of April 30, 2014. To obtain benchmarks, the authors also analyzed 2 other medical corpora: (1) all 955 Health Topics articles from MedlinePlus and (2) a random sample of 100,000 clinician notes retrieved from an electronic health records system intended for conveying internal communication among medical professionals. The authors characterized each of the corpora using 4 surface metrics, and then applied 5 different scoring algorithms to assess their readability. The authors hypothesized that clinician notes would be most difficult to read, followed by trial descriptions and MedlinePlus Health Topics articles. RESULTS: Trial descriptions have the longest average sentence length (26.1 words) across all corpora; 65% of their words used are not covered by a basic medical English dictionary. In comparison, average sentence length of MedlinePlus Health Topics articles is 61% shorter, vocabulary size is 95% smaller, and dictionary coverage is 46% higher. All 5 scoring algorithms consistently rated ClinicalTrials.gov trial descriptions the most difficult corpus to read, even harder than clinician notes. On average, it requires 18 years of education to properly understand these trial descriptions according to the results generated by the readability assessment algorithms. DISCUSSION AND CONCLUSION: Trial descriptions at ClinicalTrials.gov are extremely difficult to read. Significant work is warranted to improve their readability in order to achieve ClinicalTrials.gov's goal of facilitating information dissemination and subject recruitment.


Subjects
Clinical Trials as Topic , Comprehension , Databases, Factual , Vocabulary , Algorithms , Analysis of Variance , Consumer Health Information , MedlinePlus , Terminology as Topic
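
Three of the quantities reported in entry 7 (average sentence length, vocabulary size, and dictionary coverage) are simple to compute, as the Python sketch below suggests. The tokenization rules and the function name `surface_metrics` are assumptions for illustration; the abstract does not specify how sentences and words were delimited, nor which dictionary or scoring algorithms were used.

```python
import re


def surface_metrics(text, dictionary):
    """Return (average sentence length, vocabulary size, dictionary coverage).

    Sentences are split on ., ! and ?; words are runs of letters or
    apostrophes, lowercased. Both rules are simplifications.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    average_sentence_length = len(words) / len(sentences)
    vocabulary_size = len(set(words))
    # Share of word tokens that appear in the reference dictionary.
    coverage = sum(1 for w in words if w in dictionary) / len(words)
    return average_sentence_length, vocabulary_size, coverage
```

Running the same function over each corpus would give directly comparable figures, which is the kind of benchmark comparison the abstract describes.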
8.
Comput Biol Med ; 53: 203-5, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25168254

ABSTRACT

BACKGROUND: Electronic medical records (EMR) provide an ideal opportunity for the detection, diagnosis, and management of systemic sclerosis (SSc) patients within the Veterans Health Administration (VHA). The objective of this project was to use informatics to identify potential SSc patients in the VHA that were on prednisone, in order to inform an outreach project to prevent scleroderma renal crisis (SRC). METHODS: The electronic medical data for this study came from Veterans Informatics and Computing Infrastructure (VINCI). For natural language processing (NLP) analysis, a set of retrieval criteria was developed for documents expected to have a high correlation to SSc. The two annotators reviewed the ratings to assemble a single adjudicated set of ratings, from which a support vector machine (SVM) based document classifier was trained. Any patient having at least one document positively classified for SSc was considered positive for SSc and the use of prednisone ≥10 mg in the clinical document was reviewed to determine whether it was an active medication on the prescription list. RESULTS: In the VHA, there were 4272 patients that have a diagnosis of SSc determined by the presence of an ICD-9 code. From these patients, 1118 patients (21%) had the use of prednisone ≥10 mg. Of these patients, 26 had a concurrent diagnosis of hypertension, thus these patients should not be on prednisone. By the use of natural language processing (NLP) an additional 16,522 patients were identified as possible SSc, highlighting that cases of SSc in the VHA may exist that are unidentified by ICD-9. A 10-fold cross validation of the classifier resulted in a precision (positive predictive value) of 0.814, recall (sensitivity) of 0.973, and f-measure of 0.873. CONCLUSIONS: Our study demonstrated that current clinical practice in the VHA includes the potentially dangerous use of prednisone for veterans with SSc. This present study also suggests there may be many undetected cases of SSc and NLP can successfully identify these patients.


Subjects
Electronic Health Records , Renal Insufficiency/prevention & control , Scleroderma, Systemic/complications , Scleroderma, Systemic/epidemiology , Anti-Inflammatory Agents/adverse effects , Anti-Inflammatory Agents/therapeutic use , Data Mining , Humans , Hypertension , Medical Informatics Applications , Natural Language Processing , Prednisone/adverse effects , Prednisone/therapeutic use , Renal Insufficiency/epidemiology , Risk Factors , Scleroderma, Systemic/drug therapy , Scleroderma, Systemic/physiopathology
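
The patient-level rule stated in entry 8 (a patient counts as positive for SSc if at least one of their documents is classified positive) amounts to a simple aggregation over document-level predictions. A minimal Python sketch follows; the input layout and the function name `positive_patients` are hypothetical.

```python
def positive_patients(document_predictions):
    """Return the set of patient IDs with at least one positive document.

    document_predictions: iterable of (patient_id, is_positive) pairs,
    one pair per classified document.
    """
    return {
        patient_id
        for patient_id, is_positive in document_predictions
        if is_positive
    }
```

Because a single positive document is enough, this rule favors recall at the patient level, so document-level false positives carry over to patients.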
9.
AMIA Annu Symp Proc ; 2014: 467-76, 2014.
Article in English | MEDLINE | ID: mdl-25954351

ABSTRACT

An opportunity exists for meaningful concept extraction and indexing from large corpora of clinical notes in the Veterans Affairs (VA) electronic medical record. Currently available tools such as MetaMap, cTAKES and HITEx do not scale up to address this big data need. Sophia, a rapid UMLS concept extraction annotator, was developed to fulfill a mandate and address extraction where high throughput is needed while preserving performance. We report on the development, testing and benchmarking of Sophia against MetaMap and cTAKES. Sophia demonstrated improved performance on recall as compared to cTAKES and MetaMap (0.71 vs 0.66 and 0.38). The overall f-score was similar to cTAKES and an improvement over MetaMap (0.53 vs 0.57 and 0.43). With regard to speed of processing records, we noted Sophia to be several fold faster than cTAKES and the scaled-out MetaMap service. Sophia offers a viable alternative for high-throughput information extraction tasks.


Subjects
Algorithms , Electronic Health Records , Information Storage and Retrieval/methods , Unified Medical Language System , Natural Language Processing , United States , United States Department of Veterans Affairs
10.
AMIA Annu Symp Proc ; 2012: 1050-9, 2012.
Article in English | MEDLINE | ID: mdl-23304381

ABSTRACT

We present a study that developed and tested three query expansion methods for the retrieval of clinical documents. Finding relevant documents in a large clinical data warehouse is a challenging task. To address this issue, first, we implemented a synonym expansion strategy that used a few selected vocabularies. Second, we trained a topic model on a large set of clinical documents, which was then used to identify related terms for query expansion. Third, we obtained related terms from a large predicate database derived from Medline abstracts for query expansion. The three expansion methods were tested on a set of clinical notes. All three methods successfully achieved higher average recalls and average F-measures when compared with the baseline method. The average precisions and precision at 10, however, decreased with all expansions. Amongst the three expansion methods, the topic model-based method performed the best in terms of recall and F-measure.


Subjects
Information Storage and Retrieval/methods , Medical Records , Natural Language Processing , Vocabulary, Controlled , Abstracting and Indexing , Clinical Medicine , MEDLINE , Unified Medical Language System
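
The first expansion method in entry 10 (synonym expansion) and one of its evaluation measures (precision at 10) can be sketched in a few lines of Python. The synonym table, function names, and example terms are hypothetical; the abstract does not say which vocabularies were used or how expanded terms were weighted.

```python
def expand_query(terms, synonyms):
    """Append known synonyms after each query term, dropping duplicates.

    terms: list of query terms.
    synonyms: dict mapping a term to a list of synonyms.
    """
    expanded = []
    for term in terms:
        for candidate in [term, *synonyms.get(term, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def precision_at_k(ranked_doc_ids, relevant_ids, k=10):
    """Fraction of the top-k retrieved documents that are relevant."""
    top = ranked_doc_ids[:k]
    return sum(1 for doc_id in top if doc_id in relevant_ids) / k
```

Adding terms tends to pull in more relevant documents along with more irrelevant ones, which is consistent with the reported pattern of higher recall and lower precision at 10.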
11.
J Med Internet Res ; 9(1): e4, 2007 Feb 28.
Article in English | MEDLINE | ID: mdl-17478413

ABSTRACT

BACKGROUND: The development of consumer health information applications such as health education websites has motivated the research on consumer health vocabulary (CHV). Term identification is a critical task in vocabulary development. Because of the heterogeneity and ambiguity of consumer expressions, term identification for CHV is more challenging than for professional health vocabularies. OBJECTIVE: For the development of a CHV, we explored several term identification methods, including collaborative human review and automated term recognition methods. METHODS: A set of criteria was established to ensure consistency in the collaborative review, which analyzed 1893 strings. Using the results from the human review, we tested two automated methods: the C-value formula and a logistic regression model. RESULTS: The study identified 753 consumer terms and found the logistic regression model to be highly effective for CHV term identification (area under the receiver operating characteristic curve = 95.5%). CONCLUSIONS: The collaborative human review and logistic regression methods were effective for identifying terms for CHV development.


Subjects
Health Education/methods , Vocabulary, Controlled , Automation/methods , Cooperative Behavior , Humans , Logistic Models , Models, Theoretical , ROC Curve
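
For readers unfamiliar with the C-value formula named in entry 11, the Python sketch below implements the form commonly cited in the term-recognition literature: log2 of the term length times its frequency, reduced by the mean frequency of the longer candidates that contain it. Whether the study used exactly this variant is not stated in the abstract, and the data layout and function name are hypothetical.

```python
import math


def c_value(term, frequency, candidates):
    """C-value of `term`, given as a tuple of words.

    frequency: dict mapping each candidate term (tuple of words) to its
        corpus frequency.
    candidates: iterable of all candidate terms; longer candidates that
        contain `term` as a contiguous sub-sequence are treated as nesting it.
    """
    def contains(longer, shorter):
        n = len(shorter)
        return len(longer) > n and any(
            longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
        )

    nesting = [c for c in candidates if contains(c, term)]
    weight = math.log2(len(term))
    if not nesting:
        return weight * frequency[term]
    # Discount occurrences that are explained by longer candidate terms.
    nested_mean = sum(frequency[c] for c in nesting) / len(nesting)
    return weight * (frequency[term] - nested_mean)
```

Note that in this form single-word terms always score zero because log2(1) = 0, which is one reason published variants adjust the length weight.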
12.
Hum Pathol ; 38(8): 1212-25, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17490722

ABSTRACT

This report presents an overview for pathologists of the development and potential applications of a novel Web-enabled system allowing indexing and retrieval of pathology specimens across multiple institutions. The system was developed through the National Cancer Institute's Shared Pathology Informatics Network program with the goal of creating a prototype system to find existing pathology specimens derived from routine surgical and autopsy procedures ("paraffin blocks") that may be relevant to cancer research. To reach this goal, a number of challenges needed to be met. A central aspect was the development of an informatics system that supported Web-based searching while retaining local control of data. Additional aspects included the development of an eXtensible Markup Language schema, representation of tissue specimen annotation, methods for deidentifying pathology reports, tools for autocoding critical data from these reports using the Unified Medical Language System, and hierarchies of confidentiality and consent that met or exceeded federal requirements. The prototype system supported Web-based querying of millions of pathology reports from 6 participating institutions across the country in a matter of seconds to minutes and the ability of bona fide researchers to identify and potentially to request specific paraffin blocks from the participating institutions. With the addition of associated clinical and outcome information, this system could vastly expand the pool of annotated tissues available for cancer research as well as other diseases.


Subjects
Medical Informatics/organization & administration , Pathology, Surgical/organization & administration , Specimen Handling/methods , Tissue Banks , Humans , United States
13.
J Biomed Inform ; 40(3): 325-31, 2007 Jun.
Article in English | MEDLINE | ID: mdl-16901761

ABSTRACT

BACKGROUND: Flow cytometry produces large multi-dimensional datasets of the physical and molecular characteristics of individual cells. The objective of this study was to simplify the cytometry datasets by arranging or clustering "objects" (cells) into a smaller number of relatively homogeneous groups (clusters) on the basis of interobject similarities and dissimilarities. RESULTS: The algorithm was designed to be driven by histogram features; that is, the relevant single parameter histogram features were used to guide multidimensional k-means clustering without an a priori estimate of cluster number. To test this approach, we simulated cell-derived datasets using protein-coated microspheres (artificial "cells"). The microspheres were constructed to provide 119 populations in 40 samples. The feature-guided (FG) approach accurately identified 100% of the predetermined cluster combinations. In contrast, an approach based on the partition index (PI) cluster validity measure accurately identified 83.2% of the clusters. Direct comparisons of the two methods indicated that the FG method was significantly more accurate than PI in identifying both the number of clusters and the number of objects within the clusters (p<.0001). CONCLUSION: We conclude that parameter feature analysis can be used to effectively guide k-means clustering of flow cytometry datasets.


Subjects
Cluster Analysis , Computational Biology/methods , Flow Cytometry/methods , Microspheres , Algorithms , Animals , Antibodies/chemistry , Artificial Intelligence , Fuzzy Logic , Humans , Models, Statistical , Neural Networks, Computer , Pattern Recognition, Automated , Programming Languages , Regression Analysis
14.
BMC Med Inform Decis Mak ; 6: 30, 2006 Jul 26.
Article in English | MEDLINE | ID: mdl-16872495

ABSTRACT

BACKGROUND: The text descriptions in electronic medical records are a rich source of information. We have developed a Health Information Text Extraction (HITEx) tool and used it to extract key findings for a research study on airways disease. METHODS: The principal diagnosis, co-morbidity and smoking status extracted by HITEx from a set of 150 discharge summaries were compared to an expert-generated gold standard. RESULTS: The accuracy of HITEx was 82% for principal diagnosis, 87% for co-morbidity, and 90% for smoking status extraction, when cases labeled "Insufficient Data" by the gold standard were excluded. CONCLUSION: We consider the results promising, given the complexity of the discharge summaries and the extraction tasks.


Subjects
Asthma/diagnosis , Medical Records Systems, Computerized/standards , Natural Language Processing , Patient Discharge , Asthma/complications , Comorbidity , Humans , International Classification of Diseases , Pulmonary Disease, Chronic Obstructive/complications , Pulmonary Disease, Chronic Obstructive/diagnosis , Sensitivity and Specificity , Smoking/epidemiology
15.
J Am Med Inform Assoc ; 13(1): 80-90, 2006.
Article in English | MEDLINE | ID: mdl-16221944

ABSTRACT

OBJECTIVE: Health information retrieval (HIR) on the Internet has become an important practice for millions of people, many of whom have problems forming effective queries. We have developed and evaluated a tool to assist people in health-related query formation. DESIGN: We developed the Health Information Query Assistant (HIQuA) system. The system suggests alternative/additional query terms related to the user's initial query that can be used as building blocks to construct a better, more specific query. The recommended terms are selected according to their semantic distance from the original query, which is calculated on the basis of concept co-occurrences in medical literature and log data as well as semantic relations in medical vocabularies. MEASUREMENTS: An evaluation of the HIQuA system was conducted and a total of 213 subjects participated in the study. The subjects were randomized into 2 groups. One group was given query recommendations and the other was not. Each subject performed HIR for both a predefined and a self-defined task. RESULTS: The study showed that providing HIQuA recommendations resulted in statistically significantly higher rates of successful queries (odds ratio = 1.66, 95% confidence interval = 1.16-2.38), although no statistically significant impact on user satisfaction or the users' ability to accomplish the predefined retrieval task was found. CONCLUSION: Providing semantic-distance-based query recommendations can help consumers with query formation during HIR.


Subjects
Information Storage and Retrieval/methods , Terminology as Topic , Adult , Consumer Behavior , Health , Humans , Internet , Patients , Semantics , User-Computer Interface , Vocabulary, Controlled
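
Entry 15 reports an odds ratio with a 95% confidence interval. As a reminder of how such figures can be obtained from a 2x2 table, the Python sketch below uses the standard log-odds (Wald) interval. The cell counts passed in are hypothetical, and the study may have estimated its odds ratio differently (for example, from a regression model); the abstract does not say.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a, b: successful / unsuccessful queries with recommendations.
    c, d: successful / unsuccessful queries without recommendations.
    z: normal quantile (1.96 for a 95% interval). All cells must be > 0.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR).
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * standard_error)
    upper = math.exp(math.log(odds_ratio) + z * standard_error)
    return odds_ratio, lower, upper
```

An interval that excludes 1, like the one reported in the abstract, corresponds to a statistically significant difference at the chosen level.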
16.
J Am Med Inform Assoc ; 13(1): 24-9, 2006.
Article in English | MEDLINE | ID: mdl-16221948

ABSTRACT

Laypersons ("consumers") often have difficulty finding, understanding, and acting on health information due to gaps in their domain knowledge. Ideally, consumer health vocabularies (CHVs) would reflect the different ways consumers express and think about health topics, helping to bridge this vocabulary gap. However, despite the recent research on mismatches between consumer and professional language (e.g., lexical, semantic, and explanatory), there have been few systematic efforts to develop and evaluate CHVs. This paper presents the point of view that CHV development is practical and necessary for extending research on informatics-based tools to facilitate consumer health information seeking, retrieval, and understanding. In support of the view, we briefly describe a distributed, bottom-up approach for (1) exploring the relationship between common consumer health expressions and professional concepts and (2) developing an open-access, preliminary (draft) "first-generation" CHV. While recognizing the limitations of the approach (e.g., not addressing psychosocial and cultural factors), we suggest that such exploratory research and development will yield insights into the nature of consumer health expressions and assist developers in creating tools and applications to support consumer health information seeking.


Subjects
Patients , Vocabulary, Controlled , Vocabulary , Health , Informatics , Terminology as Topic
17.
AMIA Annu Symp Proc ; : 931, 2006.
Article in English | MEDLINE | ID: mdl-17238550

ABSTRACT

Textual medical records contain a wealth of information that needs to be extracted and/or indexed in order to be analyzed and interpreted by automated tools. We have developed a collection of natural language processing (NLP) tools to extract various types of information from unstructured medical records. The generic NLP components, when assembled in pipelines and initialized with custom configuration parameters, become a powerful medical data mining instrument. We have successfully extracted such medical concepts as diagnoses, comorbidities, discharge medications, and smoking status from various types of medical records.


Subjects
Information Storage and Retrieval , Medical Records , Natural Language Processing , Abstracting and Indexing
18.
AMIA Annu Symp Proc ; : 1155, 2006.
Article in English | MEDLINE | ID: mdl-17238774

ABSTRACT

The Consumer Health Vocabulary Initiative (http://consumerhealthvocab.org/) is a multi-disciplinary effort to facilitate the research and development of consumer health vocabularies (CHVs). We are currently investigating different types of lexical forms used in lay expressions (i.e. words and phrases): consumer-friendly display forms, consumer forms that have different semantics in professional and lay contexts and consumer forms not covered by professional health terminologies. The next step will address lay and professional conceptual differences in second-generation CHVs.


Subjects
Medical Informatics Applications , Vocabulary , Delivery of Health Care , Humans
19.
AMIA Annu Symp Proc ; : 859-63, 2005.
Article in English | MEDLINE | ID: mdl-16779162

ABSTRACT

We have developed a systematic methodology using corpus-based text analysis followed by human review to assign "consumer-friendly display (CFD) names" to medical concepts from the National Library of Medicine (NLM) Unified Medical Language System (UMLS) Metathesaurus. Using NLM MedlinePlus queries as a corpus of consumer expressions and a collaborative Web-based tool to facilitate review, we analyzed 425 frequently occurring concepts. As a preliminary test of our method, we evaluated 34 analyzed concepts and their CFD names, using a questionnaire modeled on standard reading assessments. The initial results that consumers (n=10) are more likely to understand and recognize CFD names than alternate labels suggest that the approach is useful in the development of consumer health vocabularies for displaying understandable health information.


Subjects
Patients , Unified Medical Language System , Vocabulary , Community Participation , Humans
20.
J Med Internet Res ; 6(3): e27, 2004 Sep 03.
Article in English | MEDLINE | ID: mdl-15471753

ABSTRACT

BACKGROUND: The Internet is becoming an increasingly important resource for health-information seekers. However, consumers often do not use effective search strategies. Query reformulation is one potential intervention to improve the effectiveness of consumer searches. OBJECTIVE: We endeavored to answer the research question: "Does reformulating original consumer queries with preferred terminology from the Unified Medical Language System (UMLS) Metathesaurus lead to better search returns?" METHODS: Consumer-generated queries with known goals (n=16) that could be mapped to UMLS Metathesaurus terminology were used as test samples. Reformulated queries were generated by replacing user terms with Metathesaurus-preferred synonyms (n=18). Searches (n=36) were performed using both a consumer information site and a general search engine. Top 30 precision was used as a performance indicator to compare the performance of the original and reformulated queries. RESULTS: Forty-two percent of the searches utilizing reformulated queries yielded better search returns than their associated original queries, 19% yielded worse results, and the results for the remaining 39% did not change. We identified ambiguous lay terms, expansion of acronyms, and arcane professional terms as causes for changes in performance. CONCLUSIONS: We noted a trend towards increased precision when providing substitutions for lay terms, abbreviations, and acronyms. We have found qualitative evidence that reformulating queries with professional terminology may be a promising strategy to improve consumer health-information searches, although we caution that automated reformulation could in fact worsen search performance when the terminology is ill-fitted or arcane.


Subjects
Information Services/standards , Internet/standards , Unified Medical Language System/standards , Consumer Behavior , Humans , MedlinePlus/standards , Patient Education as Topic , Pilot Projects , User-Computer Interface