Results 1 - 20 of 119
1.
Article in English | MEDLINE | ID: mdl-37610905

ABSTRACT

Given the overwhelming and rapidly increasing volumes of the published biomedical literature, automatic biomedical text summarization has long been a highly important task. Recently, great advances in the performance of biomedical text summarization have been facilitated by pre-trained language models (PLMs) based on fine-tuning. However, existing summarization methods based on PLMs do not capture domain-specific knowledge. This can result in generated summaries with low coherence, including redundant sentences, or excluding important domain knowledge conveyed in the full-text document. Furthermore, the black-box nature of the transformers means that they lack explainability, i.e. it is not clear to users how and why the summary was generated. The domain-specific knowledge and explainability are crucial for the accuracy and transparency of biomedical text summarization methods. In this article, we aim to address these issues by proposing a novel domain knowledge-enhanced graph topic transformer (DORIS) for explainable biomedical text summarization. The model integrates the graph neural topic model and the domain-specific knowledge from the Unified Medical Language System (UMLS) into the transformer-based PLM, to improve the explainability and accuracy. Experimental results on four biomedical literature datasets show that our model outperforms existing state-of-the-art (SOTA) PLM-based summarization methods on biomedical extractive summarization. Furthermore, our use of graph neural topic modeling means that our model possesses the desirable property of being explainable, i.e. it is straightforward for users to understand how and why the model selects particular sentences for inclusion in the summary. The domain-specific knowledge helps our model to learn more coherent topics, to better explain the performance.

2.
J Biomed Inform ; 141: 104347, 2023 05.
Article in English | MEDLINE | ID: mdl-37030658

ABSTRACT

Automatic extraction of patient medication histories from free-text clinical notes can increase the amount of relevant information available to clinicians for developing treatment plans. In addition to detecting medication events, clinical text mining systems must also be able to predict event context, such as negation, uncertainty, and time of occurrence, in order to construct accurate patient timelines. Towards this goal, we introduce Levitated Context Markers (LCMs), a novel transformer-based model for contextualized event extraction. LCMs are an adaptation of levitated markers (originally developed for relation extraction) that allow pretrained transformer models to utilize global input representations while also focusing on event-related subspans using a sparse attention mechanism. In addition to outperforming a strong baseline model on the Contextualized Medication Event Dataset, we show that LCMs' sparse attention can provide interpretable predictions by detecting relevant context cues in an unsupervised manner.


Subjects
Data Mining , Records , Humans , Natural Language Processing
3.
IEEE J Biomed Health Inform ; 27(2): 1096-1105, 2023 02.
Article in English | MEDLINE | ID: mdl-36395134

ABSTRACT

Automatic extraction of relations between gene mutations and cancer entities occurring in the cancer literature using text mining can rapidly provide vital information to support precision cancer medicine. However, mutation-cancer relation extraction is more challenging than general relation extraction from free text, since it is often not possible without cancer-specific background knowledge and thus the model relies on a deeper understanding of complex surrounding tokens. We propose a deep learning model that jointly extracts mutations and their associated cancers. Background knowledge comes from two different knowledge bases (KBs) which store different types of information about mutations. Given the different ways in which knowledge is stored in these two resources, we propose two separate methods for embedding knowledge, namely sentence-based knowledge integration and attribute-aware knowledge integration. The evaluation demonstrated that our model outperforms a number of baseline models and achieves F1 scores of 96.00%, 92.57% and 94.57% on three public datasets, EMU BCa, EMU PCa, and BRONCO, respectively, thus illustrating the effectiveness of our knowledge integration approach. The auxiliary experiments show that our models can utilize more informative text from the KBs and link the mutations to their corresponding cancer disease even when the input text provides insufficient context.


Subjects
Neoplasms , Humans , Mutation/genetics , Neoplasms/genetics , Data Mining/methods
4.
J Med Internet Res ; 24(10): e40323, 2022 10 05.
Article in English | MEDLINE | ID: mdl-36150046

ABSTRACT

BACKGROUND: In recent years, the COVID-19 pandemic has brought great changes to public health, society, and the economy. Social media provide a platform for people to discuss health concerns, living conditions, and policies during the epidemic, allowing policymakers to use this content to analyze the public emotions and attitudes for decision-making. OBJECTIVE: The aim of this study was to use deep learning-based methods to understand public emotions on topics related to the COVID-19 pandemic in the United Kingdom through a comparative geolocation and text mining analysis on Twitter. METHODS: Over 500,000 tweets related to COVID-19 from 48 different cities in the United Kingdom were extracted, with the data covering the period of the last 2 years (from February 2020 to November 2021). We leveraged three advanced deep learning-based models for topic modeling to geospatially analyze the sentiment, emotion, and topics of tweets in the United Kingdom: SenticNet 6 for sentiment analysis, SpanEmo for emotion recognition, and combined topic modeling (CTM). RESULTS: We observed a significant change in the number of tweets as the epidemiological situation and vaccination situation shifted over the 2 years. There was a sharp increase in the number of tweets from January 2020 to February 2020 due to the outbreak of COVID-19 in the United Kingdom. Then, the number of tweets gradually declined as of February 2020. Moreover, with identification of the COVID-19 Omicron variant in the United Kingdom in November 2021, the number of tweets grew again. Our findings reveal people's attitudes and emotions toward topics related to COVID-19. For sentiment, approximately 60% of tweets were positive, 20% were neutral, and 20% were negative. For emotion, people tended to express highly positive emotions in the beginning of 2020, while expressing highly negative emotions over time toward the end of 2021. The topics also changed during the pandemic. 
CONCLUSIONS: Through large-scale text mining of Twitter, our study found meaningful differences in public emotions and topics regarding the COVID-19 pandemic among different UK cities. Furthermore, efficient location-based and time-based comparative analysis can be used to track people's thoughts and feelings, and to understand their behaviors. Based on our analysis, positive attitudes were common during the pandemic; optimism and anticipation were the dominant emotions. With the outbreak and epidemiological change, the government developed control measures and vaccination policies, and the topics also shifted over time. Overall, the proportion and expressions of emojis, sentiments, emotions, and topics varied geographically and temporally. Therefore, our approach of exploring public emotions and topics on the pandemic from Twitter can potentially lead to informing how public policies are received in a particular geographical area.


Subjects
COVID-19 , Social Media , COVID-19/epidemiology , Data Mining , Emotions , Humans , Pandemics , SARS-CoV-2
5.
Article in English | MEDLINE | ID: mdl-35886395

ABSTRACT

The evolution of the Exposome concept revolutionised research in exposure assessment and epidemiology by introducing the need for a more holistic approach to exploring the relationship between the environment and disease. At the same time, further and more dramatic changes have also occurred in the working environment, adding to its already dynamic nature. Natural Language Processing (NLP) refers to a collection of methods for identifying, reading, extracting and ultimately transforming large collections of language. In this work, we aim to give an overview of how NLP has successfully been applied thus far in Exposome research. METHODS: We conduct a literature search on PubMed, Scopus and Web of Science for scientific articles published between 2011 and 2021. We use both quantitative and qualitative methods to screen papers and provide insights into the inclusion and exclusion criteria. We outline our approach for article selection and provide an overview of our findings. This is followed by a more detailed insight into selected articles. RESULTS: Overall, 6420 articles were screened for suitability for this review, of which we review 37 articles in depth. Finally, we discuss future avenues of research and outline challenges in existing work. CONCLUSIONS: Our results show that (i) there has been an increase in articles published that focus on applying NLP to exposure and epidemiology research, (ii) most work uses existing NLP tools and (iii) traditional machine learning is the most popular approach.


Subjects
Exposome , Natural Language Processing , Machine Learning , Narration , PubMed
6.
BMC Bioinformatics ; 23(1): 211, 2022 Jun 02.
Article in English | MEDLINE | ID: mdl-35655127

ABSTRACT

BACKGROUND: Nested and overlapping events are particularly frequent and informative structures in biomedical event extraction. However, state-of-the-art neural models either neglect those structures during learning or use syntactic features and external tools to detect them. To overcome these limitations, this paper presents and compares two neural models: a novel EXhaustive Neural Network (EXNN) and a Search-Based Neural Network (SBNN) for detection of nested and overlapping events. RESULTS: We evaluate the proposed models as an event detection component in isolation and within a pipeline setting. Evaluation in several annotated biomedical event extraction datasets shows that both EXNN and SBNN achieve higher performance in detecting nested and overlapping events, compared to the state-of-the-art model Turku Event Extraction System (TEES). CONCLUSIONS: The experimental results reveal that both EXNN and SBNN are effective for biomedical event extraction. Furthermore, results on a pipeline setting indicate that our models improve detection of events compared to models that use either gold or predicted named entities.


Subjects
Models, Biological , Neural Networks, Computer
7.
NPJ Digit Med ; 5(1): 46, 2022 Apr 08.
Article in English | MEDLINE | ID: mdl-35396451

ABSTRACT

Mental illness is highly prevalent nowadays, constituting a major cause of distress in people's lives, with an impact on society's health and well-being. Mental illness is a complex multi-factorial disease associated with individual risk factors and a variety of socioeconomic and clinical associations. In order to capture these complex associations expressed in a wide variety of textual data, including social media posts, interviews, and clinical notes, natural language processing (NLP) methods demonstrate promising improvements to empower proactive mental healthcare and assist early diagnosis. We provide a narrative review of mental illness detection using NLP in the past decade, to understand methods, trends, challenges and future directions. A total of 399 studies from 10,467 records were included. The review reveals that there is an upward trend in mental illness detection NLP research. Deep learning methods receive more attention and perform better than traditional machine learning methods. We also provide some recommendations for future studies, including the development of novel detection methods, deep learning paradigms and interpretable models.

8.
Bioinformatics ; 38(3): 872-874, 2022 01 12.
Article in English | MEDLINE | ID: mdl-34636886

ABSTRACT

SUMMARY: Large-scale pre-trained language models (PLMs) have advanced state-of-the-art (SOTA) performance on various biomedical text mining tasks. The power of such PLMs can be combined with the advantages of deep generative models. Examples of such combinations exist; however, they are trained only on general domain text, and biomedical models are still missing. In this work, we describe BioVAE, the first large-scale pre-trained latent variable language model for the biomedical domain, which uses the OPTIMUS framework to train on large volumes of biomedical text. The model shows SOTA performance on several biomedical text mining tasks when compared to existing publicly available biomedical PLMs. In addition, our model can generate more accurate biomedical sentences than the original OPTIMUS output. AVAILABILITY AND IMPLEMENTATION: Our source code and pre-trained models are freely available: https://github.com/aistairc/BioVAE. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Data Mining , Language , Software , Natural Language Processing
9.
JAMIA Open ; 4(4): ooab104, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34927002

ABSTRACT

The COVID-19 pandemic resulted in an unprecedented production of scientific literature spanning several fields. To facilitate navigation of the scientific literature related to various aspects of the pandemic, we developed an exploratory search system. The system is based on automatically identified technical terms, document citations, and their visualization, accelerating identification of relevant documents. It offers a multi-view interactive search and navigation interface, bringing together unsupervised approaches of term extraction and citation analysis. We conducted a user evaluation with domain experts, including epidemiologists, biochemists, medicinal chemists, and medical students. In general, most users were satisfied with the relevance and speed of the search results. More interestingly, participants mostly agreed on the capacity of the system to enable exploration and discovery of the search space using the graph visualization and filters. The system is updated on a weekly basis and is publicly available at http://www.nactem.ac.uk/cord/.

10.
JMIR Res Protoc ; 10(11): e29398, 2021 Nov 29.
Article in English | MEDLINE | ID: mdl-34847061

ABSTRACT

BACKGROUND: A barrier to practicing evidence-based medicine is the rapidly increasing body of biomedical literature. Use of method terms to limit the search can help reduce the burden of screening articles for clinical relevance; however, such terms are limited by their partial dependence on indexing terms and usually produce low precision, especially when high sensitivity is required. Machine learning has been applied to the identification of high-quality literature with the potential to achieve high precision without sacrificing sensitivity. The use of artificial intelligence has shown promise to improve the efficiency of identifying sound evidence. OBJECTIVE: The primary objective of this research is to derive and validate deep learning machine models using iterations of Bidirectional Encoder Representations from Transformers (BERT) to retrieve high-quality, high-relevance evidence for clinical consideration from the biomedical literature. METHODS: Using the HuggingFace Transformers library, we will experiment with variations of BERT models, including BERT, BioBERT, BlueBERT, and PubMedBERT, to determine which have the best performance in article identification based on quality criteria. Our experiments will utilize a large data set of over 150,000 PubMed citations from 2012 to 2020 that have been manually labeled based on their methodological rigor for clinical use. We will evaluate and report on the performance of the classifiers in categorizing articles based on their likelihood of meeting quality criteria. We will report fine-tuning hyperparameters for each model, as well as their performance metrics, including recall (sensitivity), specificity, precision, accuracy, F-score, the number of articles that need to be read before finding one that is positive (meets criteria), and classification probability scores. RESULTS: Initial model development is underway, with further development planned for early 2022. 
Performance testing is expected to start in February 2022. Results will be published in 2022. CONCLUSIONS: The experiments will aim to improve the precision of retrieving high-quality articles by applying a machine learning classifier to PubMed searching. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): DERR1-10.2196/29398.
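
The performance metrics named in this protocol can all be derived from a binary confusion matrix. The Python sketch below is only an illustration: the function name and the example counts are hypothetical, and reading "the number of articles that need to be read before finding one that is positive" as the reciprocal of precision is an assumption, not a detail stated in the abstract.

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive common classifier metrics from confusion-matrix counts.

    tp/fp/tn/fn are true/false positives/negatives for the
    "meets quality criteria" class.
    """
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_score = 2 * precision * recall / (precision + recall)
    # Assumption: number needed to read = 1 / precision.
    number_needed_to_read = 1 / precision
    return {
        "recall": recall,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_score": f_score,
        "number_needed_to_read": number_needed_to_read,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = screening_metrics(tp=80, fp=120, tn=780, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, high recall coexists with low precision, which is the trade-off the abstract describes for search filters that favour sensitivity.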

11.
NPJ Syst Biol Appl ; 7(1): 38, 2021 10 20.
Article in English | MEDLINE | ID: mdl-34671039

ABSTRACT

Machine reading (MR) is essential for unlocking valuable knowledge contained in millions of existing biomedical documents. Over the last two decades, the most dramatic advances in MR have followed in the wake of critical corpus development. Large, well-annotated corpora have been associated with punctuated advances in MR methodology and automated knowledge extraction systems in the same way that ImageNet was fundamental for developing machine vision techniques. This study contributes six components to an advanced, named entity analysis tool for biomedicine: (a) a new, Named Entity Recognition Ontology (NERO) developed specifically for describing textual entities in biomedical texts, which accounts for diverse levels of ambiguity, bridging the scientific sublanguages of molecular biology, genetics, biochemistry, and medicine; (b) detailed guidelines for human experts annotating hundreds of named entity classes; (c) pictographs for all named entities, to simplify the burden of annotation for curators; (d) an original, annotated corpus comprising 35,865 sentences, which encapsulate 190,679 named entities and 43,438 events connecting two or more entities; (e) validated, off-the-shelf, named entity recognition (NER) automated extraction; and (f) embedding models that demonstrate the promise of biomedical associations embedded within this corpus.

12.
Internet Interv ; 25: 100422, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34401381

ABSTRACT

Suicide is one of the leading causes of death worldwide. At the same time, the widespread use of social media has led to an increase in people posting their suicide notes online. Therefore, designing a learning model that can aid the detection of suicide notes online is of great importance. However, current methods cannot capture both local and global semantic features. In this paper, we propose a transformer-based model named TransformerRNN, which can effectively extract contextual and long-term dependency information by using a transformer encoder and a Bi-directional Long Short-Term Memory (BiLSTM) structure. We evaluate our model against baseline approaches on a dataset collected from online sources (including 659 suicide notes, 431 last statements, and 2000 neutral posts). Our proposed TransformerRNN achieves 95.0% precision, 94.9% recall and 94.9% F1-score, and therefore outperforms comparable machine learning and state-of-the-art deep learning models. The proposed model is effective for classifying suicide notes, which, in turn, may help to develop suicide prevention technologies for social media.

13.
Neurocomputing (Amst) ; 413: 431-443, 2020 Nov 06.
Article in English | MEDLINE | ID: mdl-33162674

ABSTRACT

Most deep language understanding models depend only on word representations, which are mainly based on language modelling derived from a large amount of raw text. These models encode distributional knowledge without considering syntactic structural information, although several studies have shown benefits of including such information. Therefore, we propose new syntactically-informed word representations (SIWRs), which allow us to enrich the pre-trained word representations with syntactic information without training language models from scratch. To obtain SIWRs, a graph-based neural model is built on top of either static or contextualised word representations such as GloVe, ELMo and BERT. The model is first pre-trained with only a relatively modest amount of task-independent data that are automatically annotated using existing syntactic tools. SIWRs are then obtained by applying the model to downstream task data and extracting the intermediate word representations. We finally replace word representations in downstream models with SIWRs for applications. We evaluate SIWRs on three information extraction tasks, namely nested named entity recognition (NER), binary and n-ary relation extractions (REs). The results demonstrate that our SIWRs yield performance gains over the base representations in these NLP tasks with 3-9% relative error reduction. Our SIWRs also perform better than fine-tuning BERT in binary RE. We also conduct extensive experiments to analyse the proposed method.
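
The "3-9% relative error reduction" quoted above can be made concrete with a small calculation. The Python sketch below assumes that error is measured as 100 minus the F1 score; the abstract does not state which metric the error refers to, and the function name and scores are hypothetical.

```python
def relative_error_reduction(base_score: float, new_score: float) -> float:
    """Fraction of the baseline error removed by the new system.

    Assumption: scores are percentages and error = 100 - score.
    """
    base_error = 100.0 - base_score
    new_error = 100.0 - new_score
    return (base_error - new_error) / base_error


# Hypothetical scores: a one-point gain over an 80.0 baseline.
rer = relative_error_reduction(base_score=80.0, new_score=81.0)
print(f"{rer:.1%}")  # prints "5.0%"
```

Under this reading, a one-point absolute gain removes 5% of the remaining error when the baseline is 80.0, and a larger share when the baseline is already higher.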

14.
Bioinformatics ; 36(19): 4910-4917, 2020 12 08.
Article in English | MEDLINE | ID: mdl-33141147

ABSTRACT

MOTIVATION: Recent neural approaches to event extraction from text mainly focus on flat events in the general domain, while there are fewer attempts to detect nested and overlapping events. These existing systems are built on given entities and they depend on external syntactic tools. RESULTS: We propose an end-to-end neural nested event extraction model named DeepEventMine that extracts multiple overlapping directed acyclic graph structures from a raw sentence. On top of the bidirectional encoder representations from transformers model, our model detects nested entities and triggers, roles, nested events and their modifications in an end-to-end manner without any syntactic tools. Our DeepEventMine model achieves new state-of-the-art performance on seven biomedical nested event extraction tasks. Even when gold entities are unavailable, our model can detect events from raw text with promising performance. AVAILABILITY AND IMPLEMENTATION: Our code and models to reproduce the results are available at: https://github.com/aistairc/DeepEventMine. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Language , Research Design
15.
BMJ Open ; 10(10): e039759, 2020 10 21.
Article in English | MEDLINE | ID: mdl-33087376

ABSTRACT

OBJECTIVE: To determine how the representation of women's health has changed in clinical studies over the course of 70 years. DESIGN: Observational study of 71 866 research articles published between 1948 and 2018 in The BMJ. MAIN OUTCOME MEASURES: The incidence of women-specific health topics over time. General linear, additive and segmented regression models were used to estimate trends. RESULTS: Over 70 years, the overall odds that a word in a BMJ research article was 'woman' or 'women' increased by an annual factor of 1.023, but this rate of increase varied by clinical specialty with some showing little or no change. The odds that an article was about some aspect of women-specific health increased much more slowly, by an annual factor of 1.004. The incidence of articles about particular areas of women-specific medicine such as pregnancy did not show a general increase, but rather fluctuated over time. The incidence of articles making any mention of women, gender or sex declined between 1948 and 2005, after which it rose steeply so that by 2018 few papers made no mention of them at all. CONCLUSIONS: Over time women have become ever more prominent in BMJ research articles. However, the importance of women-specific health topics has waxed and waned as researchers responded ephemerally to medical advances, public health programmes, and sociolegal changes. The appointment of a woman editor-in-chief in 2005 may have had a dramatic effect on whether women were mentioned in research articles.
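
To get a feel for what the annual factors of 1.023 and 1.004 imply, they can be compounded over the study period. The Python sketch below assumes, for illustration only, that each factor applies uniformly in every one of the 70 years; the study itself also used additive and segmented models, so this is a simplification and not the authors' calculation. The function name is hypothetical.

```python
def cumulative_odds_factor(annual_factor: float, years: int) -> float:
    """Compound a constant annual multiplicative change in odds."""
    return annual_factor ** years


# Annual factors are taken from the abstract; the 70-year span is the
# study period, and constant compounding is an assumption.
word_odds = cumulative_odds_factor(1.023, 70)
topic_odds = cumulative_odds_factor(1.004, 70)

print(f"odds of the word 'woman'/'women': x{word_odds:.2f}")
print(f"odds of a women-specific health article: x{topic_odds:.2f}")
```

Under that assumption, the odds of the word appearing grow roughly 4.9-fold over 70 years, while the odds of a women-specific health article grow only about 1.3-fold, which matches the contrast drawn in the results.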


Subjects
Data Science , Women's Health , Female , Humans , Pregnancy , Public Health , Publishing , Research Personnel
16.
Nat Hum Behav ; 4(4): 352-360, 2020 04.
Article in English | MEDLINE | ID: mdl-31959923

ABSTRACT

Here we investigate the evolutionary dynamics of several kinds of modern cultural artefacts (pop music, novels, the clinical literature and cars) as well as a collection of organic populations. In contrast to the general belief that modern culture evolves very quickly, we show that rates of modern cultural evolution are comparable to those of many animal populations. Using time-series methods, we show that much of modern culture is shaped by either stabilizing or directional forces or both and that these forces partly regulate the rates at which different traits evolve. We suggest that these forces are probably cultural selection and that the evolution of many artefact traits can be explained by a shifting-optimum model of cultural selection that, in turn, rests on known psychological biases in aesthetic appreciation. In sum, our results demonstrate the deep unity of the processes and patterns of cultural and organic evolution.


Subjects
Cultural Evolution , Culture , Animals , Biological Evolution , Humans , Models, Theoretical , Time Factors
17.
J Am Med Inform Assoc ; 27(1): 22-30, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31197355

ABSTRACT

OBJECTIVE: This article describes an ensembling system to automatically extract adverse drug events and drug-related entities from clinical narratives, which was developed for the 2018 n2c2 Shared Task Track 2. MATERIALS AND METHODS: We designed a neural model to tackle both nested (entities embedded in other entities) and polysemous entities (entities annotated with multiple semantic types) based on MIMIC III discharge summaries. To better represent rare and unknown words in entities, we further tokenized the MIMIC III data set by splitting the words into finer-grained subwords. We finally combined all the models to boost the performance. Additionally, we implemented a feature-based conditional random field model and created an ensemble to combine its predictions with those of the neural model. RESULTS: Our method achieved a 92.78% lenient micro F1-score, with 95.99% lenient precision and 89.79% lenient recall. Experimental results showed that combining the predictions of either multiple models, or of a single model with different settings, can improve performance. DISCUSSION: Analysis of the development set showed that our neural models can detect more informative text regions than feature-based conditional random field models. Furthermore, most entity types significantly benefit from subword representation, which also allows us to extract sparse entities, especially nested entities. CONCLUSION: The overall results have demonstrated that the ensemble method can accurately recognize entities, including nested and polysemous entities. Additionally, our method can recognize sparse entities by reconsidering the clinical narratives at a finer-grained subword level, rather than at the word level.
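
The three scores reported above are linked by the usual definition of F1 as the harmonic mean of precision and recall. The Python sketch below recomputes F1 from the rounded precision and recall given in the abstract; the function name is hypothetical, and it is an assumption that the lenient micro F1-score was computed with this standard formula.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Rounded lenient precision and recall as reported in the abstract (in %).
f1 = f1_score(precision=95.99, recall=89.79)
print(f"F1 = {f1:.2f}%")
```

The result agrees with the reported 92.78% to within about 0.01 percentage points; a gap of that size is what one would expect from working with already rounded inputs.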


Subjects
Drug-Related Side Effects and Adverse Reactions , Electronic Health Records , Information Storage and Retrieval/methods , Natural Language Processing , Neural Networks, Computer , Humans , Narration
18.
J Am Med Inform Assoc ; 27(1): 39-46, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31390003

ABSTRACT

OBJECTIVE: Identification of drugs, associated medication entities, and interactions among them is crucial to prevent unwanted effects of drug therapy, known as adverse drug events. This article describes our participation in the n2c2 shared task on extracting relations between medication-related entities in electronic health records (EHRs). MATERIALS AND METHODS: We proposed an ensemble approach for relation extraction and classification between drugs and medication-related entities. We incorporated state-of-the-art named-entity recognition (NER) models based on bidirectional long short-term memory (BiLSTM) networks and conditional random fields (CRF) for end-to-end extraction. We additionally developed separate models for intra- and inter-sentence relation extraction and combined them using an ensemble method. The intra-sentence models rely on bidirectional long short-term memory networks and attention mechanisms and are able to capture dependencies between multiple related pairs in the same sentence. For the inter-sentence relations, we adopted a neural architecture that utilizes the Transformer network to improve performance in longer sequences. RESULTS: Our team ranked third with micro-averaged F1 scores of 94.72% and 87.65% for relation and end-to-end relation extraction, respectively (Tracks 2 and 3). Our ensemble effectively takes advantage of our proposed models. Analysis of the reported results indicated that our proposed approach is more generalizable than the top-performing system, which employs additional training data and corpus-driven processing techniques. CONCLUSIONS: We proposed a relation extraction system to identify relations between drugs and medication-related entities. The proposed approach is independent of external syntactic tools. Analysis showed that by using latent Drug-Drug interactions we were able to significantly improve the performance of non-Drug-Drug pairs in EHRs.


Subjects
Deep Learning , Drug-Related Side Effects and Adverse Reactions , Electronic Health Records , Information Storage and Retrieval/methods , Natural Language Processing , Drug Interactions , Humans , Neural Networks, Computer
19.
BMC Med Inform Decis Mak ; 19(1): 256, 2019 12 05.
Article in English | MEDLINE | ID: mdl-31805934

ABSTRACT

BACKGROUND: Machine learning can assist with multiple tasks during systematic reviews to facilitate the rapid retrieval of relevant references during screening and to identify and extract information relevant to the study characteristics, which include the PICO elements of patient/population, intervention, comparator, and outcomes. The latter requires techniques for identifying and categorising fragments of text, known as named entity recognition. METHODS: A publicly available corpus of PICO annotations on biomedical abstracts is used to train a named entity recognition model, which is implemented as a recurrent neural network. This model is then applied to a separate collection of abstracts for references from systematic reviews within biomedical and health domains. The occurrences of words tagged in specific PICO contexts are used as additional features for a relevancy classification model. Simulations of the machine learning-assisted screening are used to evaluate the work saved by the relevancy model with and without the PICO features. Chi-squared and statistical significance of positive predictive values are used to identify words that are more indicative of relevancy within PICO contexts. RESULTS: Inclusion of PICO features improves the performance metric on 15 of the 20 collections, with substantial gains on certain systematic reviews. Examples of words whose PICO context is more precise can explain this increase. CONCLUSIONS: Words within PICO-tagged segments in abstracts are predictive features for determining inclusion. Combining a PICO annotation model with the relevancy classification pipeline is a promising approach. The annotations may be useful on their own to aid users in pinpointing necessary information for data extraction, or to facilitate semantic search.


Subjects
Databases, Genetic , Information Dissemination , Machine Learning , Semantics , Systematic Reviews as Topic , Humans
20.
BMC Med Inform Decis Mak ; 19(Suppl 7): 273, 2019 12 23.
Article in English | MEDLINE | ID: mdl-31865903

ABSTRACT

BACKGROUND: Clinical Named Entity Recognition aims to find the names of diseases, body parts and other related terms in a given text. Because the Chinese language is quite different from the English language, the machine cannot simply get the graphical and phonetic information from Chinese characters. The method for Chinese should be different from that for English. Chinese characters present abundant information through their graphical features, and recent research on Chinese word embedding tries to use graphical information as subwords. This paper uses both graphical and phonetic features to improve Chinese Clinical Named Entity Recognition based on the presence of phono-semantic characters. METHODS: This paper proposed three different embedding models and tested them on the annotated data. The data have been divided into two sections for exploring the effect of the proportion of phono-semantic characters. RESULTS: The model using primary radical and pinyin can improve Clinical Named Entity Recognition in Chinese and achieves an F-measure of 0.712. More phono-semantic characters do not give a better result. CONCLUSIONS: The paper shows that the combination of graphical and phonetic features can improve Clinical Named Entity Recognition in Chinese.


Subjects
Language , Machine Learning , Natural Language Processing , Phonetics , Data Curation , Electronic Health Records , Humans , Semantics