Your browser doesn't support javascript.
loading
PharmBERT: a domain-specific BERT model for drug labels.
ValizadehAslani, Taha; Shi, Yiwen; Ren, Ping; Wang, Jing; Zhang, Yi; Hu, Meng; Zhao, Liang; Liang, Hualou.
Afiliação
  • ValizadehAslani T; Department of Electrical and Computer Engineering, College of Engineering, Drexel University, Philadelphia, PA, USA.
  • Shi Y; College of Computing and Informatics, Drexel University, Philadelphia, PA, USA.
  • Ren P; Office of Research and Standards, Office of Generic Drugs, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD, USA.
  • Wang J; Office of Research and Standards, Office of Generic Drugs, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD, USA.
  • Zhang Y; Office of Research and Standards, Office of Generic Drugs, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD, USA.
  • Hu M; Office of Research and Standards, Office of Generic Drugs, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD, USA.
  • Zhao L; Office of Research and Standards, Office of Generic Drugs, Center for Drug Evaluation and Research, United States Food and Drug Administration, Silver Spring, MD, USA.
  • Liang H; School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, PA, USA.
Brief Bioinform ; 24(4)2023 07 20.
Article em En | MEDLINE | ID: mdl-37317617
ABSTRACT
Human prescription drug labeling contains a summary of the essential scientific information needed for the safe and effective use of the drug and includes the Prescribing Information, FDA-approved patient labeling (Medication Guides, Patient Package Inserts and/or Instructions for Use), and/or carton and container labeling. Drug labeling contains critical information about drug products, such as pharmacokinetics and adverse events. Automatic information extraction from drug labels may facilitate finding the adverse reaction of the drugs or finding the interaction of one drug with another drug. Natural language processing (NLP) techniques, especially recently developed Bidirectional Encoder Representations from Transformers (BERT), have exhibited exceptional merits in text-based information extraction. A common paradigm in training BERT is to pretrain the model on large unlabeled generic language corpora, so that the model learns the distribution of the words in the language, and then fine-tune on a downstream task. In this paper, first, we show the uniqueness of language used in drug labels, which therefore cannot be optimally handled by other BERT models. Then, we present the developed PharmBERT, which is a BERT model specifically pretrained on the drug labels (publicly available at Hugging Face). We demonstrate that our model outperforms the vanilla BERT, ClinicalBERT and BioBERT in multiple NLP tasks in the drug label domain. Moreover, how the domain-specific pretraining has contributed to the superior performance of PharmBERT is demonstrated by analyzing different layers of PharmBERT, and more insight into how it understands different linguistic aspects of the data is gained.
Assuntos
Palavras-chave

Texto completo: 1 Coleções: 01-internacional Base de dados: MEDLINE Assunto principal: Armazenamento e Recuperação da Informação / Rotulagem de Medicamentos Tipo de estudo: Prognostic_studies Limite: Humans Idioma: En Revista: Brief Bioinform Assunto da revista: BIOLOGIA / INFORMATICA MEDICA Ano de publicação: 2023 Tipo de documento: Article País de afiliação: Estados Unidos

Texto completo: 1 Coleções: 01-internacional Base de dados: MEDLINE Assunto principal: Armazenamento e Recuperação da Informação / Rotulagem de Medicamentos Tipo de estudo: Prognostic_studies Limite: Humans Idioma: En Revista: Brief Bioinform Assunto da revista: BIOLOGIA / INFORMATICA MEDICA Ano de publicação: 2023 Tipo de documento: Article País de afiliação: Estados Unidos