Results 1 - 20 of 96
1.
J Biomed Inform ; 140: 104324, 2023 04.
Article in English | MEDLINE | ID: mdl-36842490

ABSTRACT

BACKGROUND: Online health communities (OHCs) have emerged as prominent platforms for behavior modification, and the digitization of online peer interactions has afforded researchers with unique opportunities to model multilevel mechanisms that drive behavior change. Existing studies, however, have been limited by a lack of methods that allow the capture of conversational context and socio-behavioral dynamics at scale, as manifested in these digital platforms. OBJECTIVE: We develop, evaluate, and apply a novel methodological framework, Pragmatics to Reveal Intent in Social Media (PRISM), to facilitate granular characterization of peer interactions by combining multidimensional facets of human communication. METHODS: We developed and applied PRISM to analyze peer interactions (N = 2.23 million) in QuitNet, an OHC for tobacco cessation. First, we generated a labeled set of peer interactions (n = 2,005) through manual annotation along three dimensions: communication themes (CTs), behavior change techniques (BCTs), and speech acts (SAs). Second, we used deep learning models to apply our qualitative codes at scale. Third, we applied our validated model to perform a retrospective analysis. Finally, using social network analysis (SNA), we portrayed large-scale patterns and relationships among the aforementioned communication dimensions embedded in peer interactions in QuitNet. RESULTS: Qualitative analysis showed that the themes of social support and behavioral progress were common. The most used BCTs were feedback and monitoring and comparison of behavior, and users most commonly expressed their intentions using SAs-expressive and emotion. With additional in-domain pre-training, bidirectional encoder representations from Transformers (BERT) outperformed other deep learning models on the classification tasks. 
Content-specific SNA revealed that users' engagement or abstinence status is associated with the prevalence of various categories of BCTs and SAs, which also was evident from the visualization of network structures. CONCLUSIONS: Our study describes the interplay of multilevel characteristics of online communication and their association with individual health behaviors.


Subjects
Social Media , Humans , Retrospective Studies , Intention , Social Support , Communication
2.
BMC Med Inform Decis Mak ; 23(Suppl 1): 87, 2023 05 09.
Article in English | MEDLINE | ID: mdl-37161566

ABSTRACT

BACKGROUND: Biomedical ontologies are representations of biomedical knowledge that provide terms with precisely defined meanings. They play a vital role in facilitating biomedical research in a cross-disciplinary manner. Quality issues of biomedical ontologies will hinder their effective usage. One such quality issue is missing concepts. In this study, we introduce a logical definition-based approach to identify potential missing concepts in SNOMED CT. A unique contribution of our approach is that it is capable of obtaining both logical definitions and fully specified names for potential missing concepts. METHOD: The logical definitions of unrelated pairs of fully defined concepts in non-lattice subgraphs that indicate quality issues are intersected to generate the logical definitions of potential missing concepts. A text summarization model (called PEGASUS) is fine-tuned to predict the fully specified names of the potential missing concepts from their generated logical definitions. Furthermore, the identified potential missing concepts are validated using external resources including the Unified Medical Language System (UMLS), biomedical literature in PubMed, and a newer version of SNOMED CT. RESULTS: From the March 2021 US Edition of SNOMED CT, we obtained a total of 30,313 unique logical definitions for potential missing concepts through the intersecting process. We fine-tuned a PEGASUS summarization model with 289,169 training instances and tested it on 36,146 instances. The model achieved 72.83 of ROUGE-1, 51.06 of ROUGE-2, and 71.76 of ROUGE-L on the test dataset. The model correctly predicted 11,549 out of 36,146 fully specified names in the test dataset. Applying the fine-tuned model on the 30,313 unique logical definitions, 23,031 total potential missing concepts were identified. Out of these, a total of 2,312 (10.04%) were automatically validated by either of the three resources. 
CONCLUSIONS: The results showed that our logical definition-based approach for identification of potential missing concepts in SNOMED CT is encouraging. Nevertheless, there is still room for improving the performance of naming concepts based on logical definitions.
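The intersecting step and the reported validation rate can be illustrated with a minimal sketch. It assumes, purely for illustration, that a logical definition is a flat set of (attribute, value) pairs and that intersecting means plain set intersection; the abstract does not describe the actual representation, and a real SNOMED CT pipeline would also need to account for subsumption between attribute values. The concept contents below are hypothetical.

```python
# Toy sketch: a logical definition as a set of (attribute, value) pairs.
# This representation and the example concepts are assumptions for illustration.

def intersect_definitions(def_a, def_b):
    """Return the attribute-value pairs shared by both definitions."""
    return frozenset(def_a) & frozenset(def_b)


# Two hypothetical fully defined concepts that are unrelated to each other.
concept_a = {
    ("is_a", "fracture of bone"),
    ("finding_site", "bone structure of forearm"),
    ("associated_morphology", "open fracture"),
}
concept_b = {
    ("is_a", "fracture of bone"),
    ("finding_site", "bone structure of forearm"),
    ("associated_morphology", "closed fracture"),
}

# The shared pairs form the logical definition of a candidate missing concept.
candidate = intersect_definitions(concept_a, concept_b)

# Rates recomputed from the counts reported in the abstract.
validated_rate = round(100 * 2312 / 23031, 2)     # validated candidates
exact_name_rate = round(100 * 11549 / 36146, 2)   # exactly predicted names
print(sorted(candidate), validated_rate, exact_name_rate)
```

The first rate reproduces the 10.04% stated in the abstract; the second (about 31.95%) is not stated there and is derived only from the two counts given.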


Subjects
Biological Ontologies , Biomedical Research , Humans , Systematized Nomenclature of Medicine , Knowledge , Language
3.
BMC Bioinformatics ; 23(Suppl 6): 281, 2022 Jul 14.
Article in English | MEDLINE | ID: mdl-35836130

ABSTRACT

BACKGROUND: Model card reports aim to provide informative and transparent descriptions of machine learning models to stakeholders. This report document is of interest to the National Institutes of Health's Bridge2AI initiative to address the FAIR challenges with artificial intelligence-based machine learning models for biomedical research. We present our early undertaking in developing an ontology for capturing the conceptual-level information embedded in model card reports. RESULTS: Sourcing from existing ontologies and developing the core framework, we generated the Model Card Report Ontology. Our development efforts yielded an OWL2-based artifact that represents and formalizes model card report information. The current release of this ontology utilizes standard concepts and properties from OBO Foundry ontologies. Also, the software reasoner indicated no logical inconsistencies with the ontology. With sample model cards of machine learning models for bioinformatics research (HIV social networks and adverse outcome prediction for stent implantation), we showed the coverage and usefulness of our model in transforming static model card reports to a computable format for machine-based processing. CONCLUSIONS: The benefit of our work is that it utilizes the expansive and standard terminologies and scientific rigor promoted by biomedical ontologists, and that it provides an avenue to make model cards machine-readable using semantic web technology. Our future goal is to assess the veracity of our model and later expand the model to include additional concepts to address terminological gaps. We discuss tools and software that will utilize our ontology for potential application services.


Subjects
Biological Ontologies , Semantics , Artificial Intelligence , Computational Biology , Machine Learning , Software
4.
J Biomed Inform ; 116: 103726, 2021 04.
Article in English | MEDLINE | ID: mdl-33711541

ABSTRACT

The paradigm of representation learning through transfer learning has the potential to greatly enhance clinical natural language processing. In this work, we propose a multi-task pre-training and fine-tuning approach for learning generalized and transferable patient representations from medical language. The model is first pre-trained with different but related high-prevalence phenotypes and further fine-tuned on downstream target tasks. Our main contribution focuses on the impact this technique can have on low-prevalence phenotypes, a challenging task due to the dearth of data. We validate the representation from pre-training, and fine-tune the multi-task pre-trained models on low-prevalence phenotypes including 38 circulatory diseases, 23 respiratory diseases, and 17 genitourinary diseases. We find multi-task pre-training increases learning efficiency and achieves consistently high performance across the majority of phenotypes. Most important, the multi-task pre-training is almost always either the best-performing model or performs tolerably close to the best-performing model, a property we refer to as robust. All these results lead us to conclude that this multi-task transfer learning architecture is a robust approach for developing generalized and transferable patient language representations for numerous phenotypes.


Subjects
Language , Natural Language Processing , Humans
5.
J Biomed Inform ; 115: 103671, 2021 03.
Article in English | MEDLINE | ID: mdl-33387683

ABSTRACT

OBJECTIVES: Patient representation learning refers to learning a dense mathematical representation of a patient that encodes meaningful information from Electronic Health Records (EHRs). This is generally performed using advanced deep learning methods. This study presents a systematic review of this field and provides both qualitative and quantitative analyses from a methodological perspective. METHODS: We identified studies developing patient representations from EHRs with deep learning methods from MEDLINE, EMBASE, Scopus, the Association for Computing Machinery (ACM) Digital Library, and the Institute of Electrical and Electronics Engineers (IEEE) Xplore Digital Library. After screening 363 articles, 49 papers were included for a comprehensive data collection. RESULTS: Publications developing patient representations almost doubled each year from 2015 until 2019. We noticed a typical workflow starting with feeding raw data, applying deep learning models, and ending with clinical outcome predictions as evaluations of the learned representations. Specifically, learning representations from structured EHR data was dominant (37 out of 49 studies). Recurrent Neural Networks were widely applied as the deep learning architecture (Long short-term memory: 13 studies, Gated recurrent unit: 11 studies). Learning was mainly performed in a supervised manner (30 studies) optimized with cross-entropy loss. Disease prediction was the most common application and evaluation (31 studies). Benchmark datasets were mostly unavailable (28 studies) due to privacy concerns of EHR data, and code availability was assured in 20 studies. DISCUSSION & CONCLUSION: The existing predictive models mainly focus on the prediction of single diseases, rather than considering the complex mechanisms of patients from a holistic review. We show the importance and feasibility of learning comprehensive representations of patient EHR data through a systematic review. 
Advances in patient representation learning techniques will be essential for powering patient-level EHR analyses. Future work will still be devoted to leveraging the richness and potential of available EHR data. Reproducibility and transparency of reported results will hopefully improve. Knowledge distillation and advanced learning techniques will be exploited to assist the capability of learning patient representation further.


Subjects
Deep Learning , Electronic Health Records , Humans , Neural Networks, Computer , Prognosis , Reproducibility of Results
6.
J Biomed Inform ; 121: 103865, 2021 09.
Article in English | MEDLINE | ID: mdl-34245913

ABSTRACT

We present an overview of the TREC-COVID Challenge, an information retrieval (IR) shared task to evaluate search on scientific literature related to COVID-19. The goals of TREC-COVID include the construction of a pandemic search test collection and the evaluation of IR methods for COVID-19. The challenge was conducted over five rounds from April to July 2020, with participation from 92 unique teams and 556 individual submissions. A total of 50 topics (sets of related queries) were used in the evaluation, starting at 30 topics for Round 1 and adding 5 new topics per round to target emerging topics at that state of the still-emerging pandemic. This paper provides a comprehensive overview of the structure and results of TREC-COVID. Specifically, the paper provides details on the background, task structure, topic structure, corpus, participation, pooling, assessment, judgments, results, top-performing systems, lessons learned, and benchmark datasets.


Subjects
COVID-19 , Pandemics , Humans , Information Storage and Retrieval , SARS-CoV-2
7.
J Biomed Inform ; 104: 103399, 2020 04.
Article in English | MEDLINE | ID: mdl-32151769

ABSTRACT

OBJECTIVE: The centrality of data to biomedical research is difficult to overstate, and the same is true for the importance of the biomedical literature in disseminating empirical findings to scientific questions made on such data. But the connections between the literature and related datasets are often weak, hampering the ability of scientists to easily move between existing datasets and existing findings to derive new scientific hypotheses. This work aims to recommend relevant literature articles for datasets with the ultimate goal of increasing the productivity of researchers. Our approach to literature recommendation for datasets is a part of the dataset reusability platform developed at the University of Texas Health Science Center at Houston for datasets related to gene expression. This platform incorporates datasets from Gene Expression Omnibus (GEO). An average of 34 datasets were added to GEO daily in the last five years (i.e. 2014 to 2018), demonstrating the need for automatic methods to connect these datasets with relevant literature. The relevant literature for a given dataset may describe that dataset, provide a scientific finding based on that dataset, or even describe prior and related work to the dataset's topic that is of interest to users of the dataset. MATERIALS AND METHODS: We adopt an information retrieval paradigm for literature recommendation. In our experiments, distributional semantic features are created from the title and abstract of MEDLINE articles. Then, related articles are identified for datasets in GEO. We evaluate multiple distributional methods such as TF-IDF, BM25, Latent Semantic Analysis, Latent Dirichlet Allocation, word2vec, and doc2vec. Top similar papers are recommended for each dataset using cosine similarity between the dataset's vector representation and every paper's vector representation. We also propose several novel re-ranking and normalization methods over embeddings to improve the recommendations. 
RESULTS: The top-performing literature recommendation technique achieved a strict precision at 10 of 0.8333 and a partial precision at 10 of 0.9000 using BM25 based on a manual evaluation of 36 datasets. Evaluation on a larger, automatically-collected benchmark shows small but consistent gains by emphasizing the similarity of dataset and article titles. CONCLUSION: This work is the first step toward developing a literature recommendation tool by recommending relevant literature for datasets. This will hopefully lead to better data reuse experience.
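The retrieval setup described above (vectorize texts, rank articles by cosine similarity to the dataset, score with precision at k) can be sketched in a few lines. This is a minimal illustration using a hand-rolled TF-IDF weighting, not BM25 (the top performer in the abstract) and not the authors' implementation; the tokenizer, the smoothed IDF formula, the function names, and the toy texts are all assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) using whitespace tokenization."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF; one of several common variants, chosen here for simplicity.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(dataset_text, articles, k=10):
    """Return indices of the k articles most similar to the dataset description."""
    vectors = tfidf_vectors([dataset_text] + articles)
    query, candidates = vectors[0], vectors[1:]
    scored = [(cosine(query, vec), idx) for idx, vec in enumerate(candidates)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:k]]


def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k recommendations judged relevant."""
    return sum(1 for idx in ranked_ids[:k] if idx in relevant_ids) / k


# Toy texts, invented for this example.
articles = [
    "gene expression in breast cancer tumors",
    "telehealth ambulance transport outcomes",
    "breast cancer gene expression profiling microarray",
]
ranking = recommend("breast cancer gene expression microarray dataset", articles, k=2)
print(ranking, precision_at_k(ranking, {2}, k=2))
```

In this toy run the microarray article ranks first because it shares the most weighted terms with the dataset description, and the unrelated telehealth article scores zero.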


Subjects
Biomedical Research , Information Storage and Retrieval , Gene Expression , Humans , Publications , Semantics
8.
J Biomed Inform ; 108: 103473, 2020 08.
Article in English | MEDLINE | ID: mdl-32562898

ABSTRACT

Radiology reports contain a radiologist's interpretations of images, and these images frequently describe spatial relations. Important radiographic findings are mostly described in reference to an anatomical location through spatial prepositions. Such spatial relationships are also linked to various differential diagnoses and often described through uncertainty phrases. Structured representation of this clinically significant spatial information has the potential to be used in a variety of downstream clinical informatics applications. Our focus is to extract these spatial representations from the reports. For this, we first define a representation framework based on the Spatial Role Labeling (SpRL) scheme, which we refer to as Rad-SpRL. In Rad-SpRL, common radiological entities tied to spatial relations are encoded through four spatial roles: Trajector, Landmark, Diagnosis, and Hedge, all identified in relation to a spatial preposition (or Spatial Indicator). We annotated a total of 2,000 chest X-ray reports following Rad-SpRL. We then propose a deep learning-based natural language processing (NLP) method involving word and character-level encodings to first extract the Spatial Indicators followed by identifying the corresponding spatial roles. Specifically, we use a bidirectional long short-term memory (Bi-LSTM) conditional random field (CRF) neural network as the baseline model. Additionally, we incorporate contextualized word representations from pre-trained language models (BERT and XLNet) for extracting the spatial information. We evaluate both gold and predicted Spatial Indicators to extract the four types of spatial roles. The results are promising, with the highest average F1 measure for Spatial Indicator extraction being 91.29 (XLNet); the highest average overall F1 measure considering all the four spatial roles being 92.9 using gold Indicators (XLNet); and 85.6 using predicted Indicators (BERT pre-trained on MIMIC notes). 
The corpus is available in Mendeley at http://dx.doi.org/10.17632/yhb26hfz8n.1 and https://github.com/krobertslab/datasets/blob/master/Rad-SpRL.xml.


Subjects
Deep Learning , Radiology , Language , Natural Language Processing , X-Rays
9.
BMC Med Inform Decis Mak ; 20(Suppl 4): 259, 2020 12 14.
Article in English | MEDLINE | ID: mdl-33317519

ABSTRACT

BACKGROUND: Previously, we introduced our Patient Health Information Dialogue Ontology (PHIDO) that manages the dialogue and contextual information of the session between an agent and a health consumer. In this study, we take the next step and introduce the Conversational Ontology Operator (COO), the software engine harnessing PHIDO. We also developed a question-answering subsystem called Frankenstein Ontology Question-Answering for User-centric Systems (FOQUS) to support the dialogue interaction. METHODS: We tested both the dialogue engine and the question-answering system using application-based competency questions and questions furnished from our previous Wizard of OZ simulation trials. RESULTS: Our results revealed that the dialogue engine is able to perform the core tasks of communicating health information and conversational flow. Inter-rater agreement and accuracy scores among four reviewers indicated perceived, acceptable responses to the questions asked by participants from the simulation studies, yet the composition of the responses was deemed mediocre by our evaluators. CONCLUSIONS: Overall, we present some preliminary evidence of a functioning ontology-based system to manage dialogue and consumer questions. Future plans for this work will involve deploying this system in a speech-enabled agent to assess its usage with potential health consumer users.


Subjects
Communication , Vaccines , Humans , Patient-Centered Care , Software , Vaccination
10.
BMC Bioinformatics ; 20(Suppl 21): 706, 2019 Dec 23.
Article in English | MEDLINE | ID: mdl-31865902

ABSTRACT

BACKGROUND: In the United States and parts of the world, the human papillomavirus vaccine uptake is below the prescribed coverage rate for the population. Some research has noted that dialogue that communicates the risks and benefits, as well as patient concerns, can improve the uptake levels. In this paper, we introduce an application ontology for health information dialogue called Patient Health Information Dialogue Ontology for patient-level human papillomavirus vaccine counseling and potentially for any health-related counseling. RESULTS: The ontology's class level hierarchy is segmented into 4 basic levels - Discussion, Goal, Utterance, and Speech Task. The ontology also defines core low-level utterance interaction for communicating human papillomavirus health information. We discuss the design of the ontology and the execution of the utterance interaction. CONCLUSION: With an ontology that represents patient-centric dialogue to communicate health information, we have an application-driven model that formalizes the structure for the communication of health information, and a reusable scaffold that can be integrated for software agents. Our next step will be to develop the software engine that will utilize the ontology and automate the dialogue interaction of a software agent.


Subjects
Papillomavirus Vaccines , Counseling , Female , Hospitals , Humans , Papillomavirus Infections , Software
11.
J Biomed Inform ; 100: 103301, 2019 12.
Article in English | MEDLINE | ID: mdl-31589927

ABSTRACT

OBJECTIVE: There is a lot of information about cancer in Electronic Health Record (EHR) notes that can be useful for biomedical research provided natural language processing (NLP) methods are available to extract and structure this information. In this paper, we present a scoping review of existing clinical NLP literature for cancer. METHODS: We identified studies describing an NLP method to extract specific cancer-related information from EHR sources from PubMed, Google Scholar, ACL Anthology, and existing reviews. Two exclusion criteria were used in this study. We excluded articles where the extraction techniques used were too broad to be represented as frames (e.g., document classification) and also where very low-level extraction methods were used (e.g. simply identifying clinical concepts). 78 articles were included in the final review. We organized this information according to frame semantic principles to help identify common areas of overlap and potential gaps. RESULTS: Frames were created from the reviewed articles pertaining to cancer information such as cancer diagnosis, tumor description, cancer procedure, breast cancer diagnosis, prostate cancer diagnosis and pain in prostate cancer patients. These frames included both a definition as well as specific frame elements (i.e. extractable attributes). We found that cancer diagnosis was the most common frame among the reviewed papers (36 out of 78), with recent work focusing on extracting information related to treatment and breast cancer diagnosis. CONCLUSION: The list of common frames described in this paper identifies important cancer-related information extracted by existing NLP techniques and serves as a useful resource for future researchers requiring cancer information extracted from EHR notes. We also argue, due to the heavy duplication of cancer NLP systems, that a general purpose resource of annotated cancer frames and corresponding NLP tools would be valuable.


Subjects
Electronic Health Records , Natural Language Processing , Neoplasms , Semantics , Humans , Neoplasms/diagnosis , Neoplasms/therapy
12.
J Biomed Inform ; 92: 103138, 2019 04.
Article in English | MEDLINE | ID: mdl-30825539

ABSTRACT

Electronic health record (EHR) data provide promising opportunities to explore personalized treatment regimes and to make clinical predictions. Compared with regular clinical data, EHR data are known for their irregularity and complexity. In addition, analyzing EHR data involves privacy issues and sharing such data is often infeasible among multiple research sites due to regulatory and other hurdles. A recently published work uses contextual embedding models and successfully builds one predictive model for more than seventy common diagnoses. Despite the high predictive power, the model cannot be generalized to other institutions without sharing data. In this work, a novel method is proposed to learn from multiple databases and build predictive models based on Distributed Noise Contrastive Estimation (Distributed NCE). We use differential privacy to safeguard the intermediary information sharing. The numerical study with a real dataset demonstrates that the proposed method can not only build predictive models in a distributed manner with privacy protection, but also preserve model structure well and achieve comparable prediction accuracy. The proposed methods have been implemented as a stand-alone Python library and the implementation is available on GitHub (https://github.com/ziyili20/DistributedLearningPredictor) with installation instructions and use-cases.


Subjects
Computer Communication Networks , Electronic Health Records , Machine Learning , Diagnosis, Computer-Assisted , Humans
13.
Prehosp Emerg Care ; 23(5): 712-717, 2019.
Article in English | MEDLINE | ID: mdl-30626250

ABSTRACT

Introduction: Telehealth has been used nominally for trauma, neurological, and cardiovascular incidents in prehospital emergency medical services (EMS). Yet, much less is known about the use of telehealth for low-acuity primary care. We examine the development of one telehealth program and its impact on unnecessary ambulance transports. Objective: The objective of this study is to describe the development and impact of a large-scale telehealth program on ambulance transports. Methods: We describe the patient characteristics and results from a cohort of patients in Houston, Texas who received a prehospital telehealth consultation from an emergency medicine physician. Inclusion criteria were adults and pediatric patients with complaints considered to be non-urgent, primary care related. Data were analyzed for 36 months, from January 2015 through December 2017. Our primary dependent variable was the percentage of patients transported by ambulance. We used descriptive statistics to describe patient demographics, chi-square to examine differences between groups, and logistic regression to explore the effects with multivariate controls including age, gender, race, and chief complaint. Results: A total of 15,067 patients were enrolled (53% female; average age 44 years ± 19 years) over the three-year period. The 3 primary chief complaints were based on abdominal pains (13% of cases), nausea/vomiting/diarrhea (NVD) (9.4%), and back pain (9.3%). Ambulance transports represented 11.2% of all transports in the program, while alternative taxi transportation was used in 75.6%, and the remainder were self- or no-transports. Taxi transportation to an alternate, affiliated clinic (versus ED) was utilized in 5% of incidents. After multivariate controls, older age patients presenting with low-risk, non-acute chest pain, shortness of breath, and dizziness were much more likely to use ambulance transport. Race and gender were not significant predictors of ambulance transport. 
Conclusions: We found telehealth offers a technology strategy to address potentially unnecessary ambulance transports. Based on prior cost-effectiveness analyses, the reduction of unnecessary ambulance transports translates to an overall reduction in EMS agency costs. Telehealth programs offer a viable solution to support alternate destination and alternate transport programs.


Subjects
Ambulances/statistics & numerical data , Emergency Medical Services/statistics & numerical data , Primary Health Care , Telemedicine , Adult , Aged , Cost-Benefit Analysis , Facilities and Services Utilization , Female , Humans , Logistic Models , Male , Middle Aged , Young Adult
14.
J Biomed Inform ; 75S: S19-S27, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28602904

ABSTRACT

De-identification, or identifying and removing protected health information (PHI) from clinical data, is a critical step in making clinical data available for clinical applications and research. This paper presents a natural language processing system for automatic de-identification of psychiatric notes, which was designed to participate in the 2016 CEGS N-GRID shared task Track 1. The system has a hybrid structure that combines machine learning techniques and rule-based approaches. The rule-based components exploit the structure of the psychiatric notes as well as characteristic surface patterns of PHI mentions. The machine learning components utilize supervised learning with rich features. In addition, the system performance was boosted with integration of additional data to the training set through domain adaptation. The hybrid system showed an overall micro-averaged F-score of 90.74 on the test set, second-best among all the participants of the CEGS N-GRID task.
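A hybrid design of this kind can be pictured with a small sketch. The abstract does not specify the rules, the classifier, or how their outputs were combined, so everything below is an assumption: two hypothetical regular expressions stand in for the surface-pattern rules, a hand-written span stands in for the output of a supervised tagger, and the merge policy (keep rule spans, add non-overlapping model spans) is only one plausible choice.

```python
import re

# Hypothetical surface patterns for two PHI types; illustrative only.
PHI_PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def rule_based_spans(text):
    """Return sorted (start, end, label) spans matched by the regex rules."""
    spans = []
    for label, pattern in PHI_PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def merge_spans(rule_spans, model_spans):
    """Keep every rule span; add model spans that overlap no rule span."""
    merged = list(rule_spans)
    for span in model_spans:
        overlaps = any(span[0] < r[1] and r[0] < span[1] for r in rule_spans)
        if not overlaps:
            merged.append(span)
    return sorted(merged)


def redact(text, spans):
    """Replace each span with a bracketed PHI label."""
    pieces, cursor = [], 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])
        pieces.append(f"[{label}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


note = "Seen on 03/14/2016 by Dr. Smith, call 555-123-4567."
name_start = note.index("Smith")
# Stand-in for the output of a supervised tagger (assumed, not a real model).
model_spans = [(name_start, name_start + len("Smith"), "DOCTOR")]
redacted = redact(note, merge_spans(rule_based_spans(note), model_spans))
print(redacted)
```

Giving rule spans priority on overlap is a design choice made here for determinism; a real system might instead trust whichever component is more precise for each PHI type.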


Subjects
Automation , Data Anonymization , Mental Disorders/psychology , Natural Language Processing , Humans
15.
J Biomed Inform ; 75S: S129-S137, 2017 Nov.
Article in English | MEDLINE | ID: mdl-28624644

ABSTRACT

OBJECTIVE: Mental health is becoming an increasingly important topic in healthcare. Psychiatric symptoms, which consist of subjective descriptions of the patient's experience, as well as the nature and severity of mental disorders, are critical to support the phenotypic classification for personalized prevention, diagnosis, and intervention of mental disorders. However, few automated approaches have been proposed to extract psychiatric symptoms from clinical text, mainly due to (a) the lack of annotated corpora, which are time-consuming and costly to build, and (b) the inherent linguistic difficulties that symptoms present as they are not well-defined clinical concepts like diseases. The goal of this study is to investigate techniques for recognizing psychiatric symptoms in clinical text without labeled data. Instead, external knowledge in the form of publicly available "seed" lists of symptoms is leveraged using unsupervised distributional representations. MATERIALS AND METHODS: First, psychiatric symptoms are collected from three online repositories of healthcare knowledge for consumers (MedlinePlus, Mayo Clinic, and the American Psychiatric Association) for use as seed terms. Candidate symptoms in psychiatric notes are automatically extracted using phrasal syntax patterns. In particular, the 2016 CEGS N-GRID challenge data serves as the psychiatric note corpus. Second, three corpora (psychiatric notes, psychiatric forum data, and MIMIC II) are adopted to generate distributional representations with paragraph2vec. Finally, semantic similarity between the distributional representations of the seed symptoms and candidate symptoms is calculated to assess the relevance of a phrase. Experiments were performed on a set of psychiatric notes from the CEGS N-GRID 2016 Challenge. RESULTS & CONCLUSION: Our method demonstrates good performance at extracting symptoms from an unseen corpus, including symptoms with no word overlap with the provided seed terms. 
Semantic similarity based on the distributional representation outperformed baseline methods. Our experiment yielded two interesting results. First, distributional representations built from social media data outperformed those built from clinical data. Second, the distributional representation model built from sentences resulted in better representations of phrases than the model built from phrases alone.


Subjects
Mental Disorders/psychology , Algorithms , Humans , Semantics
16.
J Biomed Inform ; 58 Suppl: S111-S119, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26122527

ABSTRACT

This paper describes a supervised machine learning approach for identifying heart disease risk factors in clinical text, and assessing the impact of annotation granularity and quality on the system's ability to recognize these risk factors. We utilize a series of support vector machine models in conjunction with manually built lexicons to classify triggers specific to each risk factor. The features used for classification were quite simple, utilizing only lexical information and ignoring higher-level linguistic information such as syntax and semantics. Instead, we incorporated high-quality data to train the models by annotating additional information on top of a standard corpus. Despite the relative simplicity of the system, it achieves the highest scores (micro- and macro-F1, and micro- and macro-recall) out of the 20 participants in the 2014 i2b2/UTHealth Shared Task. This system obtains a micro- (macro-) precision of 0.8951 (0.8965), recall of 0.9625 (0.9611), and F1-measure of 0.9276 (0.9277). Additionally, we perform a series of experiments to assess the value of the annotated data we created. These experiments show how manually-labeled negative annotations can improve information extraction performance, demonstrating the importance of high-quality, fine-grained natural language annotations.
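As a quick consistency check on the reported scores, the F1-measure can be recomputed as the harmonic mean of precision and recall. The helper name below is hypothetical. Note also that a macro-averaged F1 is often defined as the mean of per-class F1 values rather than the harmonic mean of macro precision and macro recall; the abstract does not say which definition was used, so the macro line is only a sanity check.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall values as reported in the abstract.
micro_f1 = f1_score(0.8951, 0.9625)
macro_f1 = f1_score(0.8965, 0.9611)
print(round(micro_f1, 4), round(macro_f1, 4))
```

Both values round to the figures given in the abstract (0.9276 and 0.9277), so the reported precision, recall, and F1 numbers are mutually consistent under this definition.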


Subjects
Coronary Artery Disease/epidemiology , Data Mining/methods , Diabetes Complications/epidemiology , Electronic Health Records/organization & administration , Natural Language Processing , Supervised Machine Learning , Aged , Cohort Studies , Comorbidity , Computer Security , Confidentiality , Coronary Artery Disease/diagnosis , Diabetes Complications/diagnosis , Female , Humans , Incidence , Longitudinal Studies , Male , Maryland/epidemiology , Middle Aged , Narration , Automated Pattern Recognition/methods , Risk Assessment/methods , Controlled Vocabulary
17.
Ann Neurol ; 73(1): 120-8, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23225603

ABSTRACT

OBJECTIVE: To report a novel cell surface autoantigen of encephalitis that is a critical regulatory subunit of the Kv4.2 potassium channels. METHODS: Four patients with encephalitis of unclear etiology and antibodies with a similar pattern of neuropil brain immunostaining were selected for autoantigen characterization. Techniques included immunoprecipitation, mass spectrometry, cell-based experiments with Kv4.2 and several dipeptidyl-peptidase-like protein-6 (DPPX) plasmid constructs, and comparative brain immunostaining of wild-type and DPPX-null mice. RESULTS: Immunoprecipitation studies identified DPPX as the target autoantigen. A cell-based assay confirmed that all 4 patients, but not 210 controls, had DPPX antibodies. Symptoms included agitation, confusion, myoclonus, tremor, and seizures (1 case with prominent startle response). All patients had pleocytosis, and 3 had severe prodromal diarrhea of unknown etiology. Given that DPPX tunes up the Kv4.2 potassium channels (involved in somatodendritic signal integration and attenuation of dendritic back-propagation of action potentials), we determined the epitope distribution in DPPX, DPP10 (a protein homologous to DPPX), and Kv4.2. Patients' antibodies were found to be specific for DPPX, without reacting with DPP10 or Kv4.2. The unexplained diarrhea led to a demonstration of a robust expression of DPPX in the myenteric plexus, which strongly reacted with patients' antibodies. The course of neuropsychiatric symptoms was prolonged and often associated with relapses during decreasing immunotherapy. Long-term follow-up showed substantial improvement in 3 patients (1 was lost to follow-up). INTERPRETATION: Antibodies to DPPX are associated with a protracted encephalitis characterized by central nervous system hyperexcitability (agitation, myoclonus, tremor, seizures), pleocytosis, and frequent diarrhea at symptom onset. The disorder is potentially treatable with immunotherapy.


Subjects
Autoantibodies/biosynthesis , Dipeptidyl-Peptidases and Tripeptidyl-Peptidases/immunology , Encephalitis/immunology , Nerve Tissue Proteins/immunology , Potassium Channels/immunology , Shal Potassium Channels/metabolism , Aged , Animals , Antigen-Antibody Reactions/immunology , Autoantibodies/chemistry , Encephalitis/enzymology , Encephalitis/pathology , Female , HEK293 Cells , Humans , Male , Mice , Knockout Mice , Middle Aged , Shal Potassium Channels/chemistry , Shal Potassium Channels/immunology
18.
AMIA Jt Summits Transl Sci Proc ; 2024: 642-651, 2024.
Article in English | MEDLINE | ID: mdl-38827077

ABSTRACT

The results of clinical trials are a valuable source of evidence for researchers, policy makers, and healthcare professionals. However, online trial registries do not always contain links to the publications that report on their results, instead requiring a time-consuming manual search. Here, we explored the application of pre-trained transformer-based language models to automatically identify result-reporting publications of cancer clinical trials by computing dense vectors and performing semantic search. Models were fine-tuned on text data from trial registry fields and article metadata using a contrastive learning approach. The best performing model was PubMedBERT, which achieved a mean average precision of 0.592 and ranked 70.3% of a trial's publications in the top 5 results when tested on the holdout test trials. Our results suggest that semantic search using embeddings from transformer models may be an effective approach to the task of linking trials to their publications.
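The two evaluation figures quoted in this abstract (mean average precision and the share of a trial's publications ranked in the top 5) can be illustrated with a short sketch. The trial and publication identifiers are invented, the function names are hypothetical, and the exact evaluation protocol of the study is not given in the abstract, so this shows only the standard definitions.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked result list against a set of relevant IDs."""
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant) if relevant else 0.0


def share_in_top_k(ranked, relevant, k=5):
    """Fraction of the relevant IDs that appear among the first k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


# Invented search results: each trial maps to (ranked publication IDs, true publication IDs).
toy_results = {
    "trial_A": (["p3", "p1", "p7", "p2"], {"p1", "p2"}),
    "trial_B": (["p9", "p4"], {"p9"}),
}

ap_scores = {t: average_precision(r, rel) for t, (r, rel) in toy_results.items()}
mean_ap = sum(ap_scores.values()) / len(ap_scores)
top2_share_a = share_in_top_k(*toy_results["trial_A"], k=2)
print("AP per trial:", ap_scores)
print("MAP:", mean_ap)
```

In a dense-retrieval setting such as the one described, the ranked lists would come from sorting publications by embedding similarity to the trial record; here they are simply written out by hand.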

19.
Digit Health ; 10: 20552076241228430, 2024.
Article in English | MEDLINE | ID: mdl-38357587

ABSTRACT

Background: Risky health behaviors place an enormous toll on public health systems. While relapse prevention support is integrated with most behavior modification programs, the results are suboptimal. Recent advances in artificial intelligence (AI) applications provide us with unique opportunities to develop just-in-time adaptive behavior change solutions. Methods: In this study, we present an innovative framework, grounded in behavioral theory, and enhanced with social media sequencing and communications scenario builder to architect a conversational agent (CA) specialized in the prevention of relapses in the context of tobacco cessation. We modeled peer interaction data (n = 1000) using the taxonomy of behavior change techniques (BCTs) and speech act (SA) theory to uncover the socio-behavioral and linguistic context embedded within the online social discourse. Further, we uncovered the sequential patterns of BCTs and SAs from social conversations (n = 339,067). We utilized grounded theory-based techniques for extracting the scenarios that best describe individuals' needs and mapped them into the architecture of the virtual CA. Results: The frequently occurring sequential patterns for BCTs were comparison of behavior and feedback and monitoring; for SAs were directive and assertion. Five cravings-related scenarios describing users' needs as they deal with nicotine cravings were identified along with the kinds of behavior change constructs that are being elicited within those scenarios. Conclusions: AI-led virtual CAs focusing on behavior change need to employ data-driven and theory-linked approaches to address issues related to engagement, sustainability, and acceptance. The sequential patterns of theory and intent manifestations need to be considered when developing effective behavior change CAs.
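As a rough illustration of what finding frequent sequential patterns of labels can look like, the sketch below counts adjacent label pairs across conversations. This is a deliberately simple stand-in: the abstract does not name the sequence-mining method used, the conversations are invented, and the function name is hypothetical.

```python
from collections import Counter


def count_label_bigrams(conversations):
    """Count adjacent (previous label, next label) pairs across all conversations.

    Each conversation is a list of labels in message order, e.g. speech-act codes.
    """
    counts = Counter()
    for labels in conversations:
        counts.update(zip(labels, labels[1:]))
    return counts


# Invented speech-act sequences, one list per conversation.
toy_conversations = [
    ["directive", "assertion", "directive"],
    ["directive", "assertion"],
    ["assertion", "expressive"],
]
bigram_counts = count_label_bigrams(toy_conversations)
top_pattern = bigram_counts.most_common(1)
print(top_pattern)
```

The same function works unchanged for behavior change technique labels, since it only assumes an ordered list of codes per conversation.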

20.
J Allergy Clin Immunol Glob ; 3(2): 100224, 2024 May.
Article in English | MEDLINE | ID: mdl-38439946

ABSTRACT

Background: There are now approximately 450 discrete inborn errors of immunity (IEI) described; however, diagnostic rates remain suboptimal. Use of structured health record data has proven useful for patient detection but may be augmented by natural language processing (NLP). Here we present a machine learning model that can distinguish patients from controls significantly in advance of ultimate diagnosis date. Objective: We sought to create an NLP machine learning algorithm that could identify IEI patients early during the disease course and shorten the diagnostic odyssey. Methods: Our approach involved extracting a large corpus of IEI patient clinical-note text from a major referral center's electronic health record (EHR) system and a matched control corpus for comparison. We built text classifiers with simple machine learning methods and trained them on progressively longer time epochs before date of diagnosis. Results: The top performing NLP algorithm effectively distinguished cases from controls robustly 36 months before ultimate clinical diagnosis (area under precision recall curve > 0.95). Corpus analysis demonstrated that statistically enriched, IEI-relevant terms were evident 24+ months before diagnosis, validating that clinical notes can provide a signal for early prediction of IEI. Conclusion: Mining EHR notes with NLP holds promise for improving early IEI patient detection.
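One way to read the time-censoring idea in this abstract is to keep only the notes written at least a given number of months before the diagnosis date, then train and evaluate on that subset. The sketch below shows that filtering step only. It is an interpretation, not the study's pipeline: the note records and dates are invented, the function names are hypothetical, and the month arithmetic clamps the day to 28 to stay valid in every month.

```python
from datetime import date


def months_before(d, months):
    """Return the date `months` calendar months before d (day clamped to 28)."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, min(d.day, 28))


def notes_before_cutoff(notes, diagnosis_date, months):
    """Keep notes dated on or before the cutoff, i.e. `months` before diagnosis."""
    cutoff = months_before(diagnosis_date, months)
    return [note for note in notes if note["date"] <= cutoff]


# Invented notes for one hypothetical patient.
toy_notes = [
    {"date": date(2016, 1, 10), "text": "recurrent sinus infections"},
    {"date": date(2018, 3, 1), "text": "low immunoglobulin levels"},
    {"date": date(2020, 5, 1), "text": "referred to immunology"},
]
diagnosis = date(2020, 6, 15)

for horizon in (36, 24, 0):
    kept = notes_before_cutoff(toy_notes, diagnosis, horizon)
    print(horizon, "months before diagnosis:", len(kept), "notes")
```

A classifier trained on the 36-month subset sees only text written three or more years before diagnosis, which is what makes an early-detection claim testable.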
