1.
Database (Oxford) ; 2022, 2022 08 31.
Article in English | MEDLINE | ID: mdl-36043400

ABSTRACT

The coronavirus disease 2019 (COVID-19) pandemic has been severely impacting global society since December 2019. Related findings, such as those on vaccine and drug development, have been reported in the biomedical literature at a rate of about 10 000 articles on COVID-19 per month. Such rapid growth significantly challenges manual curation and interpretation. For instance, LitCovid is a literature database of COVID-19-related articles in PubMed, which has accumulated more than 200 000 articles with millions of accesses each month by users worldwide. One primary curation task is to assign up to eight topics (e.g. Diagnosis and Treatment) to the articles in LitCovid. The annotated topics have been widely used for navigating the COVID literature, rapidly locating articles of interest and other downstream studies. However, annotating the topics has been the bottleneck of manual curation. Despite the continuing advances in biomedical text-mining methods, few have been dedicated to topic annotations in COVID-19 literature. To close the gap, we organized the BioCreative LitCovid track to call for a community effort to tackle automated topic annotation for COVID-19 literature. The BioCreative LitCovid dataset, consisting of over 30 000 articles with manually reviewed topics, was created for training and testing. It is one of the largest multi-label classification datasets in biomedical scientific literature. Nineteen teams worldwide participated and made 80 submissions in total. Most teams used hybrid systems based on transformers. The highest performing submissions achieved 0.8875, 0.9181 and 0.9394 for macro-F1-score, micro-F1-score and instance-based F1-score, respectively. Notably, these scores are substantially higher (e.g. 12% higher for macro F1-score) than the corresponding scores of the state-of-the-art multi-label classification method. The level of participation and results demonstrate a successful track and help close the gap between dataset curation and method development.
The dataset is publicly available via https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/ for benchmarking and further development. Database URL: https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/.
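
The three scores named in this abstract differ only in how per-label and per-article counts are averaged. The following is a minimal sketch of the standard definitions, not the track's official evaluation script; the function names, the three topic labels and the toy gold/predicted sets are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(
    gold: Sequence[Set[str]], pred: Sequence[Set[str]], labels: List[str]
) -> Tuple[float, float, float]:
    """Return (macro-F1, micro-F1, instance-based F1) for multi-label output."""
    # Per-label [tp, fp, fn] counts accumulated over all articles.
    counts: Dict[str, List[int]] = {lab: [0, 0, 0] for lab in labels}
    for g, p in zip(gold, pred):
        for lab in labels:
            if lab in g and lab in p:
                counts[lab][0] += 1
            elif lab in p:
                counts[lab][1] += 1
            elif lab in g:
                counts[lab][2] += 1

    # Macro: average the per-label F1 scores (every label weighs the same).
    macro = sum(f1(*c) for c in counts.values()) / len(labels)

    # Micro: pool the counts over all labels, then compute a single F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)

    # Instance-based: compute F1 per article, then average over articles.
    instance = sum(
        f1(len(g & p), len(p - g), len(g - p)) for g, p in zip(gold, pred)
    ) / len(gold)
    return macro, micro, instance


# Hypothetical toy data: three articles, three topics.
LABELS = ["Diagnosis", "Treatment", "Prevention"]
GOLD = [{"Diagnosis", "Treatment"}, {"Treatment"}, {"Prevention"}]
PRED = [{"Diagnosis"}, {"Treatment", "Prevention"}, {"Prevention"}]

if __name__ == "__main__":
    print(multilabel_f1(GOLD, PRED, LABELS))
```

Macro averaging gives rare topics the same weight as frequent ones, which is one common reason a macro score sits below the micro score on imbalanced label sets.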


Subject(s)
COVID-19 , COVID-19/epidemiology , Data Mining/methods , Databases, Factual , Humans , PubMed , Publications
2.
J Biomed Inform ; 110: 103552, 2020 10.
Article in English | MEDLINE | ID: mdl-32890727

ABSTRACT

Adverse drug events (ADEs) are unintended incidents that involve the taking of a medication. ADEs pose significant health and financial problems worldwide. Information about ADEs can inform health care and improve patient safety. However, much of this information is buried in narrative texts and needs to be extracted with Natural Language Processing techniques in order to be useful to computerized methods. ADEs can be found on drug labels, in different sections such as descriptions of the drug's active components or, more prominently, descriptions of studied side effects. Extracting these automatically could be useful in triaging and processing drug reports. In this paper, we present three base methods consisting of a Conditional Random Field (CRF), a bi-directional Long Short-Term Memory unit with a CRF layer (biLSTM+CRF), and a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model. We also present several ensembles of the CRF and biLSTM+CRF methods for extracting ADEs and their Reason from FDA drug labels. We show that all three methods perform well on our task, and that combining the models through different ensemble methods can improve results, providing increases in recall for the majority class and improving precision for all other classes. We also show the potential of framing ADE extraction from drug labels as a multi-class classification task on the Reason, or type, of ADE.
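
The abstract does not say which ensemble methods were used. As one common possibility, the sketch below combines token-level tag sequences from several sequence labelers by majority vote. The function name, the BIO-style tag names and the three model outputs are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(
    predictions: Sequence[Sequence[str]], fallback: str = "O"
) -> List[str]:
    """Combine per-token tags from several models by majority vote.

    `predictions` holds one tag sequence per model, all of the same length.
    When the top two tags tie for a token, `fallback` is emitted instead.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all models must tag the same number of tokens")

    voted: List[str] = []
    for tags in zip(*predictions):
        # most_common() sorts tags by descending vote count.
        (top_tag, top_count), *rest = Counter(tags).most_common()
        if rest and rest[0][1] == top_count:
            voted.append(fallback)  # no clear winner for this token
        else:
            voted.append(top_tag)
    return voted


# Hypothetical outputs of three taggers on the same four tokens.
CRF_TAGS = ["O", "B-ADE", "I-ADE", "O"]
BILSTM_CRF_TAGS = ["O", "B-ADE", "O", "O"]
BERT_TAGS = ["O", "B-ADE", "I-ADE", "B-ADE"]

if __name__ == "__main__":
    print(majority_vote([CRF_TAGS, BILSTM_CRF_TAGS, BERT_TAGS]))
```

Token-wise voting can produce tag sequences that are not valid BIO spans (for example an `I-` tag with no preceding `B-` tag), so a practical system would likely add a repair step or vote on whole spans instead.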


Subject(s)
Deep Learning , Drug-Related Side Effects and Adverse Reactions , Pharmaceutical Preparations , Drug Labeling , Humans , Natural Language Processing
3.
AMIA Annu Symp Proc ; 2018: 616-623, 2018.
Article in English | MEDLINE | ID: mdl-30815103

ABSTRACT

As the cost of DNA sequencing continues to fall, an increasing amount of information on human genetic variation is being produced that could help advance precision medicine. However, information about such mutations is typically first made available in the scientific literature, and is then later manually curated into more standardized genomic databases. This curation process is expensive and time-consuming, and many variants do not end up being fully curated, if at all. Detecting mutations in the literature is the first key step towards automating this process. However, most of the current methods have focused on identifying mutations that follow existing nomenclatures. In this work, we show that there is a large number of mutations that are missed by using this standard approach. Furthermore, we implement the first mutation annotator to cover an extended mutation landscape, and we show that its F1 performance is comparable to that of human annotation (F1 78.29 for manual annotation vs F1 79.56 for automatic annotation).
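
To illustrate why nomenclature-driven detection misses mentions, here is a deliberately simplified sketch of a pattern-based detector for protein substitutions written in one-letter form (for example `V600E` or `p.V600E`). The regular expression and function name are assumptions made for illustration; they are not the pattern of any existing tool or of the annotator described in the abstract.

```python
import re
from typing import List, Tuple

# One-letter codes of the 20 standard amino acids.
AA1 = "ACDEFGHIKLMNPQRSTVWY"

# Illustrative pattern: optional "p." prefix, reference residue,
# position, substituted residue (e.g. "V600E", "p.T790M").
PROTEIN_SUB = re.compile(rf"\b(?:p\.)?([{AA1}])(\d+)([{AA1}])\b")


def find_substitutions(text: str) -> List[Tuple[str, int, str]]:
    """Return (reference residue, position, new residue) for each match."""
    return [
        (m.group(1), int(m.group(2)), m.group(3))
        for m in PROTEIN_SUB.finditer(text)
    ]


if __name__ == "__main__":
    # A mention that follows the nomenclature is found ...
    print(find_substitutions("The BRAF V600E mutation was detected."))
    # ... while the same event described in natural language is missed.
    print(find_substitutions(
        "A valine to glutamic acid substitution at codon 600 was detected."
    ))
```

The second sentence describes the same kind of event but yields no match, which is the sort of gap a pattern-only approach leaves open.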


Subject(s)
Data Mining/methods , Databases, Genetic , Deep Learning , Mutation , DNA Mutational Analysis , Humans , Machine Learning
4.
AMIA Annu Symp Proc ; 2017: 1215-1224, 2017.
Article in English | MEDLINE | ID: mdl-29854190

ABSTRACT

Adverse Drug Reactions (ADRs) are unintentional reactions caused by a drug or combination of drugs taken by a patient. Current ADR reporting systems inevitably have delays in reporting such events. The broad scope of social media conversations on sites such as Twitter means that health-related topics will inevitably be covered. These sites could therefore be used to detect potentially novel ADRs with less latency for subsequent further investigation. In this work, we investigate ADR surveillance using a large corpus of Twitter data, containing around 50 billion tweets spanning 3 years (2012-2014), and evaluate against over 3000 drugs reported in the FAERS database. This is both a larger corpus and broader selection of drugs than previous work in the domain. We compare the ADRs identified using our method to the FDA Adverse Event Reporting System (FAERS) database of ADRs reported using more traditional techniques, and find that Twitter is a useful resource for ADR detection, with up to 72% micro-averaged precision. Micro-averaged recall of 6% is achievable using only 10% of Twitter, indicating that with a higher-volume or targeted feed it would be possible to detect a large percentage of ADRs.


Subject(s)
Adverse Drug Reaction Reporting Systems , Drug-Related Side Effects and Adverse Reactions , Product Surveillance, Postmarketing/methods , Social Media , Databases, Factual , Humans , United States , United States Food and Drug Administration
5.
Article in English | MEDLINE | ID: mdl-24980129

ABSTRACT

BioC is a new, simple XML format for sharing biomedical text and annotations, together with libraries to read and write that format. This promotes the development of interoperable tools for natural language processing (NLP) of biomedical text. The interoperability track at the BioCreative IV workshop featured contributions using or highlighting the BioC format. These contributions included additional implementations of BioC, many new corpora in the format, biomedical NLP tools consuming and producing the format and online services using the format. The ease of use, broad support and rapidly growing number of tools demonstrate the need for and value of the BioC format. Database URL: http://bioc.sourceforge.net/.
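
As a rough illustration of what consuming the format can look like, the sketch below reads annotations from a small hand-written BioC-style collection using only the Python standard library rather than one of the BioC libraries mentioned in the abstract. The sample document, its identifiers and the function name are invented for illustration, and the code assumes that annotation offsets are counted from the start of the document in characters.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# Hand-written, simplified sample (hypothetical content, not a real corpus).
SAMPLE = """<collection>
  <source>example</source>
  <date>20130101</date>
  <key>example.key</key>
  <document>
    <id>12345</id>
    <passage>
      <infon key="type">title</infon>
      <offset>0</offset>
      <text>BRAF mutations in melanoma</text>
      <annotation id="A1">
        <infon key="type">gene</infon>
        <location offset="0" length="4"/>
        <text>BRAF</text>
      </annotation>
    </passage>
  </document>
</collection>"""

Row = Tuple[Optional[str], Optional[str], Optional[str], int, int, str]


def read_annotations(xml_text: str) -> List[Row]:
    """Return (document id, annotation id, type, offset, length, span) rows."""
    root = ET.fromstring(xml_text)
    rows: List[Row] = []
    for doc in root.iter("document"):
        doc_id = doc.findtext("id")
        for passage in doc.iter("passage"):
            passage_offset = int(passage.findtext("offset", default="0"))
            passage_text = passage.findtext("text", default="")
            for ann in passage.iter("annotation"):
                loc = ann.find("location")
                if loc is None:
                    continue  # annotation without a text span
                start = int(loc.get("offset", "0"))
                length = int(loc.get("length", "0"))
                # Assumption: offsets are document-level, so shift by the
                # passage offset before slicing the passage text.
                local = start - passage_offset
                span = passage_text[local:local + length]
                ann_type = next(
                    (i.text for i in ann.findall("infon")
                     if i.get("key") == "type"),
                    None,
                )
                rows.append(
                    (doc_id, ann.get("id"), ann_type, start, length, span)
                )
    return rows


if __name__ == "__main__":
    for row in read_annotations(SAMPLE):
        print(row)
```

Recovering the span from the offsets, instead of trusting the annotation's own `<text>` element, gives a cheap consistency check when exchanging corpora between tools.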


Subject(s)
Computational Biology , Data Mining , Natural Language Processing , Software , Biomedical Research , Database Management Systems , Databases, Factual , Internet
6.
BMC Bioinformatics ; 14: 146, 2013 Apr 30.
Article in English | MEDLINE | ID: mdl-23631733

ABSTRACT

BACKGROUND: Most of the institutional and research information in the biomedical domain is available in the form of English text. Even in countries where English is an official language, such as the United States, language can be a barrier for accessing biomedical information for non-native speakers. Recent progress in machine translation suggests that this technique could help make English texts accessible to speakers of other languages. However, the lack of adequate specialized corpora needed to train statistical models currently limits the quality of automatic translations in the biomedical domain. RESULTS: We show how a large-sized parallel corpus can automatically be obtained for the biomedical domain, using the MEDLINE database. The corpus generated in this work comprises article titles obtained from MEDLINE and abstract text automatically retrieved from journal websites, which substantially extends the corpora used in previous work. After assessing the quality of the corpus for two language pairs (English/French and English/Spanish) we use the Moses package to train a statistical machine translation model that outperforms previous models for automatic translation of biomedical text. CONCLUSIONS: We have built translation data sets in the biomedical domain that can easily be extended to other languages available in MEDLINE. These sets can successfully be applied to train statistical machine translation models. While further progress should be made by incorporating out-of-domain corpora and domain-specific lexicons, we believe that this work improves the automatic translation of biomedical texts.


Subject(s)
MEDLINE , Translating , Linguistics/methods , Models, Statistical , Publishing