Search | Biblioteca Virtual em Saúde

1.

KEBLM: Knowledge-Enhanced Biomedical Language Models.

Lai, Tuan Manh; Zhai, ChengXiang; Ji, Heng.

J Biomed Inform ; 143: 104392, 2023 07.

Article in English | MEDLINE | ID: mdl-37211194

ABSTRACT

Pretrained language models (PLMs) have demonstrated strong performance on many natural language processing (NLP) tasks. Despite their great success, these PLMs are typically pretrained only on unstructured free texts without leveraging existing structured knowledge bases that are readily available for many domains, especially scientific domains. As a result, these PLMs may not achieve satisfactory performance on knowledge-intensive tasks such as biomedical NLP. Comprehending a complex biomedical document without domain-specific knowledge is challenging, even for humans. Inspired by this observation, we propose a general framework for incorporating various types of domain knowledge from multiple sources into biomedical PLMs. We encode domain knowledge using lightweight adapter modules, bottleneck feed-forward networks that are inserted into different locations of a backbone PLM. For each knowledge source of interest, we pretrain an adapter module to capture the knowledge in a self-supervised way. We design a wide range of self-supervised objectives to accommodate diverse types of knowledge, ranging from entity relations to description sentences. Once a set of pretrained adapters is available, we employ fusion layers to combine the knowledge encoded within these adapters for downstream tasks. Each fusion layer is a parameterized mixer of the available trained adapters that can identify and activate the most useful adapters for a given input. Our method diverges from prior work by including a knowledge consolidation phase, during which we teach the fusion layers to effectively combine knowledge from both the original PLM and newly-acquired external knowledge using a large collection of unannotated texts. After the consolidation phase, the complete knowledge-enhanced model can be fine-tuned for any downstream task of interest to achieve optimal performance. 
Extensive experiments on many biomedical NLP datasets show that our proposed framework consistently improves the performance of the underlying PLMs on various downstream tasks such as natural language inference, question answering, and entity linking. These results demonstrate the benefits of using multiple sources of external knowledge to enhance PLMs and the effectiveness of the framework for incorporating knowledge into PLMs. While primarily focused on the biomedical domain in this work, our framework is highly adaptable and can be easily applied to other domains, such as the bioenergy sector.

Subjects

Language , Natural Language Processing , Humans , Knowledge Bases , Software
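
The adapter and fusion mechanism described in the KEBLM abstract can be illustrated with a minimal sketch. It assumes a bottleneck adapter of the common form (down-projection, ReLU, up-projection, residual connection) and replaces the trained, parameterized fusion layer with an untrained dot-product softmax mixer. All names, sizes, and random weights here are hypothetical and are not taken from the paper.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def adapter(hidden, w_down, w_up):
    """Bottleneck adapter: down-project, ReLU, up-project, add the residual."""
    bottleneck = [max(0.0, v) for v in matvec(w_down, hidden)]
    update = matvec(w_up, bottleneck)
    return [h + u for h, u in zip(hidden, update)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(hidden, adapter_outputs):
    """Mix adapter outputs; weights come from a dot product with the input.

    Assumption: a real fusion layer would use learned projections here.
    """
    scores = [sum(h * o for h, o in zip(hidden, out)) for out in adapter_outputs]
    weights = softmax(scores)
    mixed = [
        sum(w * out[i] for w, out in zip(weights, adapter_outputs))
        for i in range(len(hidden))
    ]
    return mixed, weights


def random_matrix(rows, cols, rng):
    """Small random weight matrix standing in for pretrained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
HIDDEN, BOTTLENECK = 8, 2  # hypothetical sizes
hidden_state = [rng.uniform(-1, 1) for _ in range(HIDDEN)]

# One adapter per knowledge source (three hypothetical sources).
adapters = [
    (random_matrix(BOTTLENECK, HIDDEN, rng), random_matrix(HIDDEN, BOTTLENECK, rng))
    for _ in range(3)
]
outputs = [adapter(hidden_state, down, up) for down, up in adapters]
fused, weights = fuse(hidden_state, outputs)
print(weights)
```

The mixing weights sum to one, so the fused vector is a convex combination of the adapter outputs. The sketch covers only the forward pass; adapter pretraining and the consolidation phase are not shown.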

2.

Exploring collaborative caption editing to augment video-based learning.

Bhavya, Bhavya; Chen, Si; Zhang, Zhilin; Li, Wenting; Zhai, Chengxiang; Angrave, Lawrence; Huang, Yun.

Educ Technol Res Dev ; 70(5): 1755-1779, 2022.

Article in English | MEDLINE | ID: mdl-35855355

ABSTRACT

Captions play a major role in making educational videos accessible to all and are known to benefit a wide range of learners. However, many educational videos either do not have captions or have inaccurate captions. Prior work has shown the benefits of using crowdsourcing to obtain accurate captions in a cost-efficient way, though there is a lack of understanding of how learners edit captions of educational videos either individually or collaboratively. In this work, we conducted a user study where 58 learners (in a course of 387 learners) participated in the editing of captions in 89 lecture videos that were generated by Automatic Speech Recognition (ASR) technologies. For each video, different learners conducted two rounds of editing. Based on editing logs, we created a taxonomy of errors in educational video captions (e.g., Discipline-Specific, General, Equations). From the interviews, we identified individual and collaborative error editing strategies. We then further demonstrated the feasibility of applying machine learning models to assist learners in editing. Our work provides practical implications for advancing video-based learning and for educational video caption editing.

3.

Big Data: Astronomical or Genomical?

Stephens, Zachary D; Lee, Skylar Y; Faghri, Faraz; Campbell, Roy H; Zhai, Chengxiang; Efron, Miles J; Iyer, Ravishankar; Schatz, Michael C; Sinha, Saurabh; Robinson, Gene E.

PLoS Biol ; 13(7): e1002195, 2015 Jul.

Article in English | MEDLINE | ID: mdl-26151137

ABSTRACT

Genomics is a Big Data science and is going to get much bigger, very soon, but it is not known whether the needs of genomics will exceed other Big Data domains. Projecting to the year 2025, we compared genomics with three other major generators of Big Data: astronomy, YouTube, and Twitter. Our estimates show that genomics is a "four-headed beast"--it is either on par with or the most demanding of the domains analyzed here in terms of data acquisition, storage, distribution, and analysis. We discuss aspects of new technologies that will need to be developed to rise up and meet the computational challenges that genomics poses for the near future. Now is the time for concerted, community-wide planning for the "genomical" challenges of the next decade.

Subjects

Genomics/trends , Astronomy/trends , Information Storage and Retrieval , Social Media/trends , Statistics as Topic

4.

An Online Risk Index for the Cross-Sectional Prediction of New HIV, Chlamydia, and Gonorrhea Diagnoses Across U.S. Counties and Across Years.

Chan, Man-Pui Sally; Lohmann, Sophie; Morales, Alex; Zhai, Chengxiang; Ungar, Lyle; Holtgrave, David R; Albarracín, Dolores.

AIDS Behav ; 22(7): 2322-2333, 2018 Jul.

Article in English | MEDLINE | ID: mdl-29427233

ABSTRACT

The present study evaluated the potential use of Twitter data for providing risk indices of STIs. We developed online risk indices (ORIs) based on tweets to predict new HIV, gonorrhea, and chlamydia diagnoses, across U.S. counties and across 5 years. We analyzed over one hundred million tweets from 2009 to 2013 using open-vocabulary techniques and estimated the ORIs for a particular year by entering tweets from the same year into multiple semantic models (one for each year). The ORIs were moderately to strongly associated with the actual rates (.35 < rs < .68 for 93% of models), both nationwide and when applied to single states (California, Florida, and New York). Later models were slightly better than older ones at predicting gonorrhea and chlamydia, but not at predicting HIV. The proposed technique using free social media data provides signals of community health at a high temporal and spatial resolution.

Subjects

Big Data , Chlamydia Infections/epidemiology , Gonorrhea/epidemiology , HIV Infections/epidemiology , Social Media , California/epidemiology , Chlamydia Infections/diagnosis , Cross-Sectional Studies , Florida/epidemiology , Gonorrhea/diagnosis , HIV , HIV Infections/diagnosis , Humans , New York/epidemiology , Public Health , Risk Assessment , Sexually Transmitted Diseases/epidemiology , United States/epidemiology

5.

DeepMeSH: deep semantic representation for improving large-scale MeSH indexing.

Peng, Shengwen; You, Ronghui; Wang, Hongning; Zhai, Chengxiang; Mamitsuka, Hiroshi; Zhu, Shanfeng.

Bioinformatics ; 32(12): i70-i79, 2016 06 15.

Article in English | MEDLINE | ID: mdl-27307646

ABSTRACT

MOTIVATION: Medical Subject Headings (MeSH) indexing, which is to assign a set of MeSH main headings to citations, is crucial for many important tasks in biomedical text mining and information retrieval. Large-scale MeSH indexing has two challenging aspects: the citation side and MeSH side. For the citation side, all existing methods, including Medical Text Indexer (MTI) by National Library of Medicine and the state-of-the-art method, MeSHLabeler, deal with text by bag-of-words, which cannot capture semantic and context-dependent information well. METHODS: We propose DeepMeSH that incorporates deep semantic information for large-scale MeSH indexing. It addresses the two challenges in both citation and MeSH sides. The citation side challenge is solved by a new deep semantic representation, D2V-TFIDF, which concatenates both sparse and dense semantic representations. The MeSH side challenge is solved by using the 'learning to rank' framework of MeSHLabeler, which integrates various types of evidence generated from the new semantic representation. RESULTS: DeepMeSH achieved a Micro F-measure of 0.6323, 2% higher than 0.6218 of MeSHLabeler and 12% higher than 0.5637 of MTI, for BioASQ3 challenge data with 6000 citations. AVAILABILITY AND IMPLEMENTATION: The software is available upon request. CONTACT: zhusf@fudan.edu.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Medical Subject Headings , Semantics , Software , Abstracting and Indexing , Data Mining , MEDLINE , National Library of Medicine (U.S.) , United States
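
The Micro F-measure figures quoted in the DeepMeSH abstract can be made concrete with a short sketch. It assumes the standard micro-averaged definition (true positives, predictions, and gold labels pooled over all citations). The toy label sets are hypothetical; only the three scores passed to `relative_gain` come from the abstract.

```python
def micro_f_measure(gold, predicted):
    """Micro-averaged F-measure; gold and predicted are parallel lists of label sets."""
    true_pos = sum(len(g & p) for g, p in zip(gold, predicted))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, as a fraction."""
    return (new - old) / old


# Hypothetical gold and predicted MeSH headings for two citations.
gold = [{"Humans", "Software"}, {"Semantics", "MEDLINE", "Data Mining"}]
predicted = [{"Humans", "Algorithms"}, {"Semantics", "MEDLINE"}]
score = micro_f_measure(gold, predicted)
print(round(score, 4))

# Scores reported in the abstract: DeepMeSH, MeSHLabeler, MTI.
print(round(100 * relative_gain(0.6323, 0.6218), 1))  # gain over MeSHLabeler, in percent
print(round(100 * relative_gain(0.6323, 0.5637), 1))  # gain over MTI, in percent
```

Under this definition the reported gains are relative: roughly 1.7% over MeSHLabeler (rounded to 2% in the abstract) and about 12% over MTI.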

6.

Exploiting ontology graph for predicting sparsely annotated gene function.

Wang, Sheng; Cho, Hyunghoon; Zhai, ChengXiang; Berger, Bonnie; Peng, Jian.

Bioinformatics ; 31(12): i357-64, 2015 Jun 15.

Article in English | MEDLINE | ID: mdl-26072504

ABSTRACT

MOTIVATION: Systematically predicting gene (or protein) function based on molecular interaction networks has become an important tool in refining and enhancing the existing annotation catalogs, such as the Gene Ontology (GO) database. However, functional labels with only a few (<10) annotated genes, which constitute about half of the GO terms in yeast, mouse and human, pose a unique challenge in that any prediction algorithm that independently considers each label faces a paucity of information and thus is prone to capture non-generalizable patterns in the data, resulting in poor predictive performance. There exist a variety of algorithms for function prediction, but none properly address this 'overfitting' issue of sparsely annotated functions, or do so in a manner scalable to tens of thousands of functions in the human catalog. RESULTS: We propose a novel function prediction algorithm, clusDCA, which transfers information between similar functional labels to alleviate the overfitting problem for sparsely annotated functions. Our method is scalable to datasets with a large number of annotations. In a cross-validation experiment in yeast, mouse and human, our method greatly outperformed previous state-of-the-art function prediction algorithms in predicting sparsely annotated functions, without sacrificing the performance on labels with sufficient information. Furthermore, we show that our method can accurately predict genes that will be assigned a functional label that has no known annotations, based only on the ontology graph structure and genes associated with other labels, which further suggests that our method effectively utilizes the similarity between gene functions. AVAILABILITY AND IMPLEMENTATION: https://github.com/wangshenguiuc/clusDCA.

Subjects

Algorithms , Computational Biology/methods , Gene Ontology , Molecular Sequence Annotation , Proteins/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Animals , Gene Regulatory Networks , Humans , Mice , Proteins/genetics , Saccharomyces cerevisiae Proteins/genetics , Vocabulary, Controlled

7.

MeSHLabeler: improving the accuracy of large-scale MeSH indexing by integrating diverse evidence.

Liu, Ke; Peng, Shengwen; Wu, Junqiu; Zhai, Chengxiang; Mamitsuka, Hiroshi; Zhu, Shanfeng.

Bioinformatics ; 31(12): i339-47, 2015 Jun 15.

Article in English | MEDLINE | ID: mdl-26072501

ABSTRACT

MOTIVATION: Medical Subject Headings (MeSHs) are used by National Library of Medicine (NLM) to index almost all citations in MEDLINE, which greatly facilitates the applications of biomedical information retrieval and text mining. To reduce the time and financial cost of manual annotation, NLM has developed a software package, Medical Text Indexer (MTI), for assisting MeSH annotation, which uses k-nearest neighbors (KNN), pattern matching and indexing rules. Other types of information, such as prediction by MeSH classifiers (trained separately), can also be used for automatic MeSH annotation. However, existing methods cannot effectively integrate multiple evidence for MeSH annotation. METHODS: We propose a novel framework, MeSHLabeler, to integrate multiple evidence for accurate MeSH annotation by using 'learning to rank'. Evidence includes numerous predictions from MeSH classifiers, KNN, pattern matching, MTI and the correlation between different MeSH terms, etc. Each MeSH classifier is trained independently, and thus prediction scores from different classifiers are incomparable. To address this issue, we have developed an effective score normalization procedure to improve the prediction accuracy. RESULTS: MeSHLabeler won the first place in Task 2A of 2014 BioASQ challenge, achieving the Micro F-measure of 0.6248 for 9,040 citations provided by the BioASQ challenge. Note that this accuracy is around 9.15% higher than 0.5724, obtained by MTI. AVAILABILITY AND IMPLEMENTATION: The software is available upon request.

Subjects

Abstracting and Indexing/methods , Medical Subject Headings , Software , Algorithms , Data Mining , MEDLINE , Reproducibility of Results
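
The MeSHLabeler abstract notes that scores from independently trained classifiers are not comparable until they are normalized. The sketch below illustrates that problem with a simple z-score normalization per classifier. This is a stand-in technique chosen for illustration, not the normalization procedure from the paper, and all scores and names are hypothetical.

```python
from statistics import mean, pstdev


def fit_normalizer(reference_scores):
    """Return a function that z-scores a raw score against reference scores."""
    mu = mean(reference_scores)
    sigma = pstdev(reference_scores) or 1.0  # avoid division by zero
    return lambda score: (score - mu) / sigma


# Hypothetical held-out scores from two independently trained classifiers.
# The "Humans" classifier scores almost everything high; "Data Mining" scores low.
reference = {
    "Humans": [0.90, 0.80, 0.85, 0.95],
    "Data Mining": [0.10, 0.05, 0.20, 0.15],
}
normalizers = {term: fit_normalizer(scores) for term, scores in reference.items()}

# Raw scores for one new citation.
raw = {"Humans": 0.88, "Data Mining": 0.30}
normalized = {term: normalizers[term](score) for term, score in raw.items()}

raw_ranking = sorted(raw, key=raw.get, reverse=True)
normalized_ranking = sorted(normalized, key=normalized.get, reverse=True)
print(raw_ranking, normalized_ranking)
```

On raw scores "Humans" ranks first, but 0.88 is ordinary for that classifier, while 0.30 is unusually high for "Data Mining". After normalization the order flips, which is the kind of comparability a learning-to-rank model needs from its input features.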

8.

BeeSpace Navigator: exploratory analysis of gene function using semantic indexing of biological literature.

Sen Sarma, Moushumi; Arcoleo, David; Khetani, Radhika S; Chee, Brant; Ling, Xu; He, Xin; Jiang, Jing; Mei, Qiaozhu; Zhai, ChengXiang; Schatz, Bruce.

Nucleic Acids Res ; 39(Web Server issue): W462-9, 2011 Jul.

Article in English | MEDLINE | ID: mdl-21558175

ABSTRACT

With the rapid decrease in cost of genome sequencing, the classification of gene function is becoming a primary problem. Such classification has been performed by human curators who read biological literature to extract evidence. BeeSpace Navigator is a prototype software for exploratory analysis of gene function using biological literature. The software supports an automatic analogue of the curator process to extract functions, with a simple interface intended for all biologists. Since extraction is done on selected collections that are semantically indexed into conceptual spaces, the curation can be task specific. Biological literature containing references to gene lists from expression experiments can be analyzed to extract concepts that are computational equivalents of a classification such as Gene Ontology, yielding discriminating concepts that differentiate gene mentions from other mentions. The functions of individual genes can be summarized from sentences in biological literature, to produce results resembling a model organism database entry that is automatically computed. Statistical frequency analysis based on literature phrase extraction generates offline semantic indexes to support these gene function services. The website with BeeSpace Navigator is free and open to all; there is no login requirement at www.beespace.illinois.edu for version 4. Materials from the 2010 BeeSpace Software Training Workshop are available at www.beespace.illinois.edu/bstwmaterials.php.

Subjects

Abstracting and Indexing/methods , Genes , Software , Animals , Internet , MEDLINE

9.

BSQA: integrated text mining using entity relation semantics extracted from biological literature of insects.

He, Xin; Li, Yanen; Khetani, Radhika; Sanders, Barry; Lu, Yue; Ling, Xu; Zhai, Chengxiang; Schatz, Bruce.

Nucleic Acids Res ; 38(Web Server issue): W175-81, 2010 Jul.

Article in English | MEDLINE | ID: mdl-20576702

ABSTRACT

Text mining is one promising way of extracting information automatically from the vast biological literature. To maximize its potential, the knowledge encoded in the text should be translated to some semantic representation such as entities and relations, which could be analyzed by machines. But large-scale practical systems for this purpose are rare. We present BeeSpace question/answering (BSQA) system that performs integrated text mining for insect biology, covering diverse aspects from molecular interactions of genes to insect behavior. BSQA recognizes a number of entities and relations in Medline documents about the model insect, Drosophila melanogaster. For any text query, BSQA exploits entity annotation of retrieved documents to identify important concepts in different categories. By utilizing the extracted relations, BSQA is also able to answer many biologically motivated questions, from simple ones such as, which anatomical part is a gene expressed in, to more complex ones involving multiple types of relations. BSQA is freely available at http://www.beespace.uiuc.edu/QuestionAnswer.

Subjects

Data Mining , Genes, Insect , Insects/genetics , Software , Animals , Behavior, Animal , Drosophila Proteins , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Drosophila melanogaster/physiology , Gene Expression Regulation , Homeodomain Proteins/genetics , Homeodomain Proteins/metabolism , Insects/metabolism , Internet , Systems Integration , Trans-Activators/genetics , Trans-Activators/metabolism

10.

Integer Linear Programming for Constrained Multi-Aspect Committee Review Assignment.

Karimzadehgan, Maryam; Zhai, Chengxiang.

Inf Process Manag ; 48(4): 725-740, 2012 Jul 01.

Article in English | MEDLINE | ID: mdl-22711970

ABSTRACT

Automatic review assignment can significantly improve the productivity of many people such as conference organizers, journal editors and grant administrators. A general setup of the review assignment problem involves assigning a set of reviewers on a committee to a set of documents to be reviewed under the constraint of review quota so that the reviewers assigned to a document can collectively cover multiple topic aspects of the document. No previous work has addressed such a setup of committee review assignments while also considering matching multiple aspects of topics and expertise. In this paper, we tackle the problem of committee review assignment with multi-aspect expertise matching by casting it as an integer linear programming problem. The proposed algorithm can naturally accommodate any probabilistic or deterministic method for modeling multiple aspects to automate committee review assignments. Evaluation using a multi-aspect review assignment test set constructed using ACM SIGIR publications shows that the proposed algorithm is effective and efficient for committee review assignments based on multi-aspect expertise matching.
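
The committee review assignment setup described above can be sketched on a toy instance. The objective used here (total number of paper aspects covered by the assigned reviewers) and the exhaustive search are assumptions made for illustration. The paper casts the problem as an integer linear program, and a real instance would need an ILP solver rather than brute force. All paper names, reviewer names, and aspects are hypothetical.

```python
from itertools import combinations, product


def best_assignment(papers, reviewers, per_paper, quota):
    """Exhaustively find the assignment that covers the most paper aspects.

    Constraints: each paper gets `per_paper` reviewers, and each reviewer
    handles at most `quota` papers.
    """
    names = sorted(reviewers)
    committees = list(combinations(names, per_paper))
    best_score, best = -1, None
    for choice in product(committees, repeat=len(papers)):
        # Enforce the review quota.
        load = {name: 0 for name in names}
        for committee in choice:
            for name in committee:
                load[name] += 1
        if max(load.values()) > quota:
            continue
        # Count the aspects of each paper covered by its committee.
        score = 0
        for aspects, committee in zip(papers.values(), choice):
            covered = set().union(*(reviewers[name] for name in committee))
            score += len(aspects & covered)
        if score > best_score:
            best_score, best = score, dict(zip(papers, choice))
    return best_score, best


papers = {
    "P1": {"retrieval", "evaluation"},
    "P2": {"topic models", "retrieval"},
}
reviewers = {
    "A": {"retrieval"},
    "B": {"evaluation"},
    "C": {"topic models"},
    "D": {"retrieval", "topic models"},
}
score, assignment = best_assignment(papers, reviewers, per_paper=2, quota=1)
print(score, assignment)
```

With a quota of one paper per reviewer, the search must split the four reviewers between the two papers, and it finds a split that covers all four aspects. In the ILP form, the assignment would be expressed with binary decision variables and the quota and committee size as linear constraints.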

11.

Estimating the influence of Twitter on pre-exposure prophylaxis use and HIV testing as a function of rates of men who have sex with men in the United States.

Chan, Man-Pui Sally; Morales, Alex; Zlotorzynska, Maria; Sullivan, Patrick; Sanchez, Travis; Zhai, Chengxiang; Albarracín, Dolores.

AIDS ; 35(Suppl 1): S101-S109, 2021 05 01.

Article in English | MEDLINE | ID: mdl-33867493

ABSTRACT

OBJECTIVES: Acceptance of pre-exposure prophylaxis (PrEP) and testing for HIV is likely to vary as a function of the norms and communications within a geographic area. This study examined associations involving county tweets, in-person communications, and HIV prevention and testing in regions with higher (vs. lower) estimated rates of men who have sex with men (MSM). DESIGN AND METHODS: Ecological analyses examined (a) tweets about HIV (i.e. tweet rates per 100,000 county population and topic probabilities in 1959 US counties); (b) individual-level survey data about HIV prevention and testing and communications about PrEP and HIV (N = 30,675 participants); and (c) estimated county-level MSM rates (per 1,000 adult men). RESULTS: In counties with higher rates of MSM, tweet rates were directly associated with PrEP use and HIV testing (rs = .06, BF10 > 10). Topics correlated with PrEP use (rs = -0.06 to 0.07, BF10 > 10) and HIV testing (rs = -0.05 to 0.05, BF10 > 10). Mediation analyses showed that hearing about and discussing PrEP mediated the relations between tweet rates and PrEP use (bi∗ = 0.01 to 0.05, BF10 > 100) and between topics and PrEP use (bi∗ = -0.04 to 0.05, BF10 > 10). Moreover, hearing about PrEP was associated with PrEP use, which was in turn associated with tweet rates (bi∗ = 0.01, BF10 > 100) and topics (bi∗ = -0.03 to 0.01, BF10 > 10). CONCLUSIONS: Rates of MSM appear to lead to HIV tweets in a region, in-person communications about PrEP, and, ultimately, actual PrEP use. Also, as more men hear about PrEP, they may use PrEP more and may tweet about HIV.

Subjects

Anti-HIV Agents , HIV Infections , Pre-Exposure Prophylaxis , Sexual and Gender Minorities , Social Media , Adult , Anti-HIV Agents/therapeutic use , HIV Infections/diagnosis , HIV Infections/drug therapy , HIV Infections/prevention & control , HIV Testing , Homosexuality, Male , Humans , Male , United States

12.

Identifying overrepresented concepts in gene lists from literature: a statistical approach based on Poisson mixture model.

He, Xin; Sarma, Moushumi Sen; Ling, Xu; Chee, Brant; Zhai, Chengxiang; Schatz, Bruce.

BMC Bioinformatics ; 11: 272, 2010 May 20.

Article in English | MEDLINE | ID: mdl-20487560

ABSTRACT

BACKGROUND: Large-scale genomic studies often identify large gene lists, for example, the genes sharing the same expression patterns. The interpretation of these gene lists is generally achieved by extracting concepts overrepresented in the gene lists. This analysis often depends on manual annotation of genes based on controlled vocabularies, in particular, Gene Ontology (GO). However, the annotation of genes is a labor-intensive process; and the vocabularies are generally incomplete, leaving some important biological domains inadequately covered. RESULTS: We propose a statistical method that uses the primary literature, i.e. free-text, as the source to perform overrepresentation analysis. The method is based on a statistical framework of mixture model and addresses the methodological flaws in several existing programs. We implemented this method within a literature mining system, BeeSpace, taking advantage of its analysis environment and added features that facilitate the interactive analysis of gene sets. Through experimentation with several datasets, we showed that our program can effectively summarize the important conceptual themes of large gene sets, even when traditional GO-based analysis does not yield informative results. CONCLUSIONS: We conclude that the current work will provide biologists with a tool that effectively complements the existing ones for overrepresentation analysis from genomic experiments. Our program, Genelist Analyzer, is freely available at: http://workerbee.igb.uiuc.edu:8080/BeeSpace/Search.jsp.

Subjects

Gene Expression Profiling/methods , Models, Statistical , Computational Biology , Genes

13.

Biosystems Design by Machine Learning.

Volk, Michael Jeffrey; Lourentzou, Ismini; Mishra, Shekhar; Vo, Lam Tung; Zhai, Chengxiang; Zhao, Huimin.

ACS Synth Biol ; 9(7): 1514-1533, 2020 07 17.

Article in English | MEDLINE | ID: mdl-32485108

ABSTRACT

Biosystems such as enzymes, pathways, and whole cells have been increasingly explored for biotechnological applications. However, the intricate connectivity and resulting complexity of biosystems poses a major hurdle in designing biosystems with desirable features. As -omics and other high throughput technologies have been rapidly developed, the promise of applying machine learning (ML) techniques in biosystems design has started to become a reality. ML models enable the identification of patterns within complicated biological data across multiple scales of analysis and can augment biosystems design applications by predicting new candidates for optimized performance. ML is being used at every stage of biosystems design to help find nonobvious engineering solutions with fewer design iterations. In this review, we first describe commonly used models and modeling paradigms within ML. We then discuss some applications of these models that have already shown success in biotechnological applications. Moreover, we discuss successful applications at all scales of biosystems design, including nucleic acids, genetic circuits, proteins, pathways, genomes, and bioprocesses. Finally, we discuss some limitations of these methods and potential solutions as well as prospects of the combination of ML and biosystems design.

Subjects

Biotechnology , Machine Learning , Proteins , Gene Editing , Gene Regulatory Networks , Linear Models , Metabolic Engineering , Proteins/chemistry , Proteins/metabolism

14.

Multi-label literature classification based on the Gene Ontology graph.

Jin, Bo; Muller, Brian; Zhai, Chengxiang; Lu, Xinghua.

BMC Bioinformatics ; 9: 525, 2008 Dec 08.

Article in English | MEDLINE | ID: mdl-19063730

ABSTRACT

BACKGROUND: The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification. RESULTS: In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators) that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community. 
CONCLUSION: Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate protein annotation based on the literature.

Subjects

Computational Biology/methods , Genomics/methods , Information Storage and Retrieval/methods , Software , Vocabulary, Controlled , Algorithms , Artificial Intelligence , Bayes Theorem , Internet , PubMed , Reproducibility of Results , Semantics

15.

VisAGE: Integrating external knowledge into electronic medical record visualization.

Huang, Edward W; Wang, Sheng; Zhai, ChengXiang.

Pac Symp Biocomput ; 23: 578-589, 2018.

Article in English | MEDLINE | ID: mdl-29218916

ABSTRACT

In this paper, we present VisAGE, a method that visualizes electronic medical records (EMRs) in a low-dimensional space. Effective visualization of new patients allows doctors to view similar, previously treated patients and to identify the new patients' disease subtypes, reducing the chance of misdiagnosis. However, EMRs are typically incomplete or fragmented, resulting in patients who are missing many available features being placed near unrelated patients in the visualized space. VisAGE integrates several external data sources to enrich EMR databases to solve this issue. We evaluated VisAGE on a dataset of Parkinson's disease patients. We qualitatively and quantitatively show that VisAGE can more effectively cluster patients, which allows doctors to better discover patient subtypes and thus improve patient care.

Subjects

Electronic Health Records/statistics & numerical data , Algorithms , Computational Biology/methods , Computer Graphics/statistics & numerical data , Databases, Factual/statistics & numerical data , Disease Progression , False Positive Reactions , Female , Humans , Information Storage and Retrieval/statistics & numerical data , Knowledge Bases , Male , Parkinson Disease/drug therapy , Parkinson Disease/etiology , Polymorphism, Single Nucleotide , Protein Interaction Maps

16.

Who is Saying What on Twitter: An Analysis of Messages with References to HIV and HIV Risk Behavior.

Lohmann, Sophie; Lourentzou, Ismini; Zhai, Chengxiang; Albarracín, Dolores.

Acta Investig Psicol ; 8(1): 95-100, 2018 Apr.

Article in English | MEDLINE | ID: mdl-31105910

ABSTRACT

This research aimed to determine the nature of social media discussions about HIV. With the goal of conducting a descriptive analysis, we collected almost 1,000 tweets posted February to September 2015. The sample of tweets included keywords related to HIV or behavioral risk factors (e.g., sex, drug use) and was coded for content (e.g., HIV), behavior change strategies, and message source. Seven percent of tweets concerned HIV/AIDS, which were often referred to as jokes or insults. The majority of tweets coded as behavior change attempts involved attitude change strategies. The majority of the tweets (80%) came from private users (vs. organizations). Different types of sources employed different types of behavior change strategies: For instance, private users, compared to experts or organizations, included more strategies to decrease detrimental attitudes (29% versus 6%, p < .001), and also more strategies to counter myths and misinformation (6% versus 1%, p = .008). In summary, tweets related to HIV/AIDS and associated risk factors frequently use the terms in jokes and insults, come largely from private users, and entail attitudinal and informational strategies. Online health campaigns with clear calls to action and corrections of misinformation may make important contributions to social media conversations about HIV/AIDS.

This research aimed to characterize discussions about HIV on social media. To conduct a descriptive analysis, we collected about one thousand tweets posted between February and September 2015. Tweets were selected if they included keywords related to HIV or to behavioral risk factors such as sex or drug use. Four coders classified the tweets by content (e.g., HIV as a disease, reference to a product or service), behavior change strategy (behavior change, call to action, or correction of myths), and message source (e.g., private users, experts, commercial companies). The majority of the tweets (80%) came from private rather than institutional users. Seven percent of the tweets referred strictly to HIV or other sexually transmitted infections, frequently using those terms as jokes or insults, such as writing that an unpleasant experience "gave me AIDS". The majority of behavior change attempts included strategies to reduce negative attitudes. Different types of sources employed different types of behavior change strategies. For example, private users (compared with experts, commercial organizations, and other organizations, such as newspapers and NGOs) posted more messages classified as strategies promoting negative attitudes (29% versus 6%, p < .001), and had more corrections of myths (6% versus 1%, p = .008). In summary, tweets that mention HIV or HIV risk factors very frequently use the terms in jokes and insults, come mostly from private users, and include attitude change strategies. Internet campaigns with clear calls to action and corrections of myths can make important contributions to conversations about HIV on social media.

17.

HIV messaging on Twitter: an analysis of current practice and data-driven recommendations.

Lohmann, Sophie; White, Benjamin X; Zuo, Zhen; Chan, Man-Pui Sally; Morales, Alex; Li, Bo; Zhai, Chengxiang; Albarracín, Dolores.

AIDS ; 32(18): 2799-2805, 2018 11 28.

Article in English | MEDLINE | ID: mdl-30289801

ABSTRACT

OBJECTIVES: Social media messages have been increasingly used in health campaigns about prevention, testing, and treatment of HIV. We identified factors leading to the retransmission of messages from expert social media accounts to create data-driven recommendations for online HIV messaging. DESIGN AND METHODS: We sampled 20 201 HIV-related tweets (posted between 2010 and 2017) from 37 HIV experts. Potential predictors of retransmission were identified based on prior literature and machine learning methods, and were subsequently analyzed using multilevel negative binomial models. RESULTS: Fear-related language, longer messages, and the inclusion of images (e.g. photos, GIFs, or videos) were the strongest predictors of retweet counts. These findings were similar for messages authored by HIV experts and for messages created by nonexperts (e.g. celebrities or politicians) but retransmitted by experts. CONCLUSIONS: Fear appeals affect how much HIV messages spread on Twitter, as do structural characteristics such as the length of the tweet and the inclusion of images. A set of five data-driven recommendations for increasing message spread is derived and discussed in the context of current Centers for Disease Control and Prevention social media guidelines.
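Count outcomes such as retweets are often overdispersed (variance larger than the mean), which is a common reason for preferring a negative binomial over a Poisson model. The following is a minimal sketch of the negative binomial probability mass function in the mean/dispersion parameterization. It is not the authors' multilevel model, and the names `mu`, `alpha`, and `nb_moments` as well as the parameter values are illustrative assumptions.

```python
import math


def neg_binomial_pmf(k: int, mu: float, alpha: float) -> float:
    """P(Y = k) for a negative binomial with mean mu and dispersion alpha.

    The variance is mu + alpha * mu**2, so alpha > 0 allows overdispersion.
    """
    r = 1.0 / alpha          # "size" parameter
    p = r / (r + mu)         # success probability
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(p) + k * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def nb_moments(mu: float, alpha: float, k_max: int = 500):
    """Numerically check total probability, mean, and variance up to k_max."""
    probs = [neg_binomial_pmf(k, mu, alpha) for k in range(k_max)]
    total = sum(probs)
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return total, mean, var


# Illustrative values only: mean of 3 retweets with dispersion 0.5.
total, mean, var = nb_moments(mu=3.0, alpha=0.5)
print(round(total, 6), round(mean, 6), round(var, 6))
```

With these assumed values the variance (mu + alpha * mu**2 = 7.5) exceeds the mean (3), which a Poisson model could not represent.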

Subjects

Behavior Therapy/methods , Disease Transmission, Infectious/prevention & control , HIV Infections/prevention & control , Health Education/methods , Social Media , HIV Infections/diagnosis , Humans

18.

Framing Electronic Medical Records as Polylingual Documents in Query Expansion.

Huang, Edward W; Wang, Sheng; Lee, Doris Jung-Lin; Zhang, Runshun; Liu, Baoyan; Zhou, Xuezhong; Zhai, ChengXiang.

AMIA Annu Symp Proc ; 2017: 940-949, 2017.

Article in English | MEDLINE | ID: mdl-29854161

ABSTRACT

We present a study of electronic medical record (EMR) retrieval that emulates situations in which a doctor treats a new patient. Given a query consisting of a new patient's symptoms, the retrieval system returns the set of most relevant records of previously treated patients. However, due to semantic, functional, and treatment synonyms in medical terminology, queries are often incomplete and thus require enhancement. In this paper, we present a topic model that frames symptoms and treatments as separate languages. Our experimental results show that this method improves retrieval performance over several baselines with statistical significance. These baselines include methods used in prior studies as well as state-of-the-art embedding techniques. Finally, we show that our proposed topic model discovers all three types of synonyms to improve medical record retrieval.
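To make the idea of treating symptoms and treatments as two "languages" sharing the same topics more concrete, here is a toy sketch of query expansion from an already fitted model. It is an assumption-laden illustration, not the authors' model: the topic distributions, the term names, the uniform topic prior, and the functions `topic_posterior` and `expand_query` are all hypothetical.

```python
from math import prod

# Hypothetical fitted model: every topic has one distribution over symptom
# terms and one over treatment terms (the two "languages").
SYMPTOM_GIVEN_TOPIC = [
    {"cough": 0.6, "fever": 0.3, "insomnia": 0.1},
    {"cough": 0.1, "fever": 0.1, "insomnia": 0.8},
]
TREATMENT_GIVEN_TOPIC = [
    {"treatment_a": 0.7, "treatment_b": 0.2, "treatment_c": 0.1},
    {"treatment_a": 0.1, "treatment_b": 0.2, "treatment_c": 0.7},
]


def topic_posterior(symptoms):
    """P(topic | query symptoms), assuming a uniform prior over topics."""
    scores = [
        prod(dist.get(s, 1e-6) for s in symptoms)  # small floor for unseen terms
        for dist in SYMPTOM_GIVEN_TOPIC
    ]
    total = sum(scores)
    return [score / total for score in scores]


def expand_query(symptoms, n_terms=1):
    """Append the treatment terms most probable under the query's topics."""
    posterior = topic_posterior(symptoms)
    treatment_scores = {}
    for weight, dist in zip(posterior, TREATMENT_GIVEN_TOPIC):
        for term, prob in dist.items():
            treatment_scores[term] = treatment_scores.get(term, 0.0) + weight * prob
    ranked = sorted(treatment_scores, key=treatment_scores.get, reverse=True)
    return list(symptoms) + ranked[:n_terms]


print(expand_query(["cough", "fever"]))  # ['cough', 'fever', 'treatment_a']
print(expand_query(["insomnia"]))        # ['insomnia', 'treatment_c']
```

The expanded query can then be matched against both the symptom and the treatment fields of stored records, which is one plausible way to bridge the synonym gap the abstract describes.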

Subjects

Electronic Health Records , Information Storage and Retrieval/methods , Multilingualism , Humans , Natural Language Processing , Semantics , Terminology as Topic

19.

Enhancing text categorization with semantic-enriched representation and training data augmentation.

Lu, Xinghua; Zheng, Bin; Velivelli, Atulya; Zhai, Chengxiang.

J Am Med Inform Assoc ; 13(5): 526-35, 2006.

Article in English | MEDLINE | ID: mdl-16799127

ABSTRACT

OBJECTIVE: Acquiring and representing biomedical knowledge is an increasingly important component of contemporary bioinformatics. A critical step of the process is to efficiently identify and retrieve relevant documents from the vast volume of modern biomedical literature. In the real world, many information retrieval tasks are difficult because of high data dimensionality and the lack of annotated examples to train a retrieval algorithm. Under such a scenario, the performance of information retrieval algorithms is often unsatisfactory, and improvements are needed. DESIGN: We studied two approaches that enhance text categorization performance on sparse, high-dimensional data: (1) semantic-preserving dimension reduction by representing text with semantic-enriched features; and (2) augmenting training data with semi-supervised learning. A probabilistic topic model was applied to extract major semantic topics from a corpus of text of interest. The representation of documents was projected from the high-dimensional vocabulary space onto a semantic topic space with reduced dimensionality. A semi-supervised learning algorithm based on graph theory was applied to identify potential positive training cases, which were then used to augment the training data. The effects of data transformation and augmentation on text categorization by support vector machine (SVM) were evaluated. RESULTS AND CONCLUSION: Semantic-enriched data transformation and training data augmented with pseudo-positive cases enhance the efficiency and performance of text categorization by SVM.
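The abstract does not say which graph-based semi-supervised algorithm was used. As a hedged illustration of the general idea, the sketch below uses simple label propagation over a document similarity graph to pick pseudo-positive cases. The graph, the seed labels, the 0.8 threshold, and the function names are all assumptions made for the example.

```python
def propagate_scores(weights, seeds, n_iter=50):
    """Spread seed labels over a similarity graph.

    weights: {node: {neighbor: similarity}} (symmetric).
    seeds:   {node: 1.0 for known positives, 0.0 for known negatives}.
    """
    scores = {node: seeds.get(node, 0.5) for node in weights}
    for _ in range(n_iter):
        updated = {}
        for node, neighbors in weights.items():
            if node in seeds:
                updated[node] = seeds[node]  # labeled nodes stay clamped
                continue
            total = sum(neighbors.values())
            if total == 0:
                updated[node] = scores[node]  # isolated node: keep its score
                continue
            # Weighted average of the neighbors' current scores.
            updated[node] = sum(w * scores[nb] for nb, w in neighbors.items()) / total
        scores = updated
    return scores


def pseudo_positives(weights, seeds, threshold=0.8):
    """Unlabeled nodes whose propagated score reaches the threshold."""
    scores = propagate_scores(weights, seeds)
    return sorted(n for n, s in scores.items() if n not in seeds and s >= threshold)


# Toy graph: d1 is a labeled positive, d2 a labeled negative.
graph = {
    "d1": {"d3": 0.9},
    "d2": {"d4": 0.9},
    "d3": {"d1": 0.9, "d4": 0.1},
    "d4": {"d3": 0.1, "d2": 0.9},
}
print(pseudo_positives(graph, {"d1": 1.0, "d2": 0.0}))  # ['d3']
```

Documents selected this way would be added to the positive training set before fitting the classifier.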

Subjects

Artificial Intelligence , Information Storage and Retrieval/methods , Algorithms , Classification , Natural Language Processing , ROC Curve , Semantics , Vocabulary, Controlled

20.

Understanding user intents in online health forums.

Zhang, Thomas; Cho, Jason H D; Zhai, Chengxiang.

IEEE J Biomed Health Inform ; 19(4): 1392-8, 2015 Jul.

Article in English | MEDLINE | ID: mdl-25823052

ABSTRACT

Online health forums provide a convenient way for patients to obtain medical information and connect with physicians and peers outside of clinical settings. However, large quantities of unstructured and diversified content generated on these forums make it difficult for users to digest and extract useful information. Understanding user intents would enable forums to find and recommend relevant information to users by filtering out threads that do not match particular intents. In this paper, we derive a taxonomy of intents to capture user information needs in online health forums and propose novel pattern-based features for use with a multiclass support vector machine (SVM) classifier to classify original thread posts according to their underlying intents. Since no dataset existed for this task, we employ three annotators to manually label a dataset of 1192 HealthBoards posts spanning four forum topics. Experimental results show that an SVM using pattern-based features is highly capable of identifying user intents in forum posts, reaching a maximum precision of 75%, and that an SVM-based hierarchical classifier using both pattern and word features outperforms its SVM counterpart that uses only word features. Furthermore, comparable classification performance can be achieved by training and testing on posts from different forum topics.
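The abstract does not list the patterns themselves. As a rough sketch of what pattern-based features could look like, the example below turns a post into a binary vector with one entry per regular expression. The pattern names and expressions are hypothetical and are not taken from the paper.

```python
import re
from typing import List

# Hypothetical intent patterns, for illustration only.
INTENT_PATTERNS = {
    "asks_question": re.compile(
        r"\?\s*$|^\s*(what|how|why|should|can|is|does)\b", re.IGNORECASE
    ),
    "mentions_treatment": re.compile(
        r"\b(medication|treatment|dose|surgery|therapy)\b", re.IGNORECASE
    ),
    "describes_symptom": re.compile(
        r"\bi (have|feel|am having|keep getting)\b", re.IGNORECASE
    ),
    "seeks_experience": re.compile(r"\b(anyone|anybody) (else|here)\b", re.IGNORECASE),
}


def pattern_features(post: str) -> List[int]:
    """Return one 0/1 feature per pattern, in the order defined above."""
    return [1 if pattern.search(post) else 0 for pattern in INTENT_PATTERNS.values()]


print(pattern_features("Has anyone else had headaches on this medication?"))
# [1, 1, 0, 1]
print(pattern_features("I have a sore throat and feel tired."))
# [0, 0, 1, 0]
```

Vectors of this kind, alone or concatenated with word features, could then be passed to a multiclass SVM implementation of the reader's choice.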

Subjects

Health Information Exchange/classification , Intention , Internet , Support Vector Machine , Computational Biology , Humans , Machine Learning , Pattern Recognition, Automated/methods
