Search | Biblioteca Virtual em Saúde (Virtual Health Library)

Language models for the prediction of SARS-CoV-2 inhibitors.

Blanchard, Andrew E; Gounley, John; Bhowmik, Debsindhu; Chandra Shekar, Mayanka; Lyngaas, Isaac; Gao, Shang; Yin, Junqi; Tsaris, Aristeidis; Wang, Feiyi; Glaser, Jens.

Int J High Perform Comput Appl ; 36(5-6): 587-602, 2022 Nov.

Article in English | MEDLINE | ID: mdl-38603308

ABSTRACT

The COVID-19 pandemic highlights the need for computational tools to automate and accelerate drug design for novel protein targets. We leverage deep learning language models to generate and score drug candidates based on predicted protein binding affinity. We pre-trained a deep learning language model (BERT) on ~9.6 billion molecules and achieved peak performance of 603 petaflops in mixed precision. Our work reduces pre-training time from days to hours, compared to previous efforts with this architecture, while also increasing the dataset size by nearly an order of magnitude. For scoring, we fine-tuned the language model using an assembled set of thousands of protein targets with binding affinity data and searched for inhibitors of specific protein targets, SARS-CoV-2 Mpro and PLpro. We utilized a genetic algorithm approach for finding optimal candidates using the generation and scoring capabilities of the language model. Our generalizable models accelerate the identification of inhibitors for emerging therapeutic targets.
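The generate-and-score loop described in this abstract can be illustrated with a minimal genetic search. This is only a sketch under strong assumptions, not the authors' implementation: `toy_mutate` is a hypothetical stand-in for candidate generation by the language model, `toy_score` is a hypothetical stand-in for the fine-tuned binding affinity predictor, and `ALPHABET` is a toy token set rather than a real molecular vocabulary.

```python
import random

ALPHABET = "CNOF"  # toy token set (assumption); not a real molecular vocabulary


def toy_score(candidate: str) -> float:
    """Hypothetical stand-in for a predicted binding affinity (higher is better)."""
    return candidate.count("N") + 0.5 * candidate.count("O")


def toy_mutate(candidate: str, rng: random.Random) -> str:
    """Hypothetical stand-in for model-proposed edits: replace one random token."""
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(ALPHABET) + candidate[i + 1:]


def genetic_search(population, generations, rng):
    """Mutate every candidate each generation, then keep the top scorers."""
    size = len(population)
    for _ in range(generations):
        children = [toy_mutate(c, rng) for c in population]
        # Elitist selection: parents compete with children, so the best
        # score in the population can never decrease between generations.
        ranked = sorted(population + children, key=toy_score, reverse=True)
        population = ranked[:size]
    return population


if __name__ == "__main__":
    rng = random.Random(0)
    start = ["CCCCCC", "CCOCCC", "FCCCCF", "CNCCCC"]
    for candidate in genetic_search(start, generations=20, rng=rng):
        print(candidate, toy_score(candidate))
```

Keeping parents in the selection pool is one simple design choice; it trades diversity for a guarantee that good candidates are not lost. The abstract does not say which selection scheme the authors used.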

Path-BigBird: An AI-Driven Transformer Approach to Classification of Cancer Pathology Reports.

Chandrashekar, Mayanka; Lyngaas, Isaac; Hanson, Heidi A; Gao, Shang; Wu, Xiao-Cheng; Gounley, John.

JCO Clin Cancer Inform ; 8: e2300148, 2024 02.

Article in English | MEDLINE | ID: mdl-38412383

ABSTRACT

PURPOSE: Surgical pathology reports are critical for cancer diagnosis and management. To accurately extract information about tumor characteristics from pathology reports in near real time, we explore the impact of using domain-specific transformer models that understand cancer pathology reports.

METHODS: We built a pathology transformer model, Path-BigBird, by using 2.7 million pathology reports from six SEER cancer registries. We then compare different variations of Path-BigBird with two less computationally intensive methods: Hierarchical Self-Attention Network (HiSAN) classification model and an off-the-shelf clinical transformer model (Clinical BigBird). We use five pathology information extraction tasks for evaluation: site, subsite, laterality, histology, and behavior. Model performance is evaluated by using macro and micro F1 scores.

RESULTS: We found that Path-BigBird and Clinical BigBird outperformed the HiSAN in all tasks. Clinical BigBird performed better on the site and laterality tasks. Versions of the Path-BigBird model performed best on the two most difficult tasks: subsite (micro F1 score of 72.53, macro F1 score of 35.76) and histology (micro F1 score of 80.96, macro F1 score of 37.94). The largest performance gains over the HiSAN model were for histology, for which a Path-BigBird model increased the micro F1 score by 1.44 points and the macro F1 score by 3.55 points. Overall, the results suggest that a Path-BigBird model with a vocabulary derived from well-curated and deidentified data is the best-performing model.

CONCLUSION: The Path-BigBird pathology transformer model improves automated information extraction from pathology reports. Although Path-BigBird outperforms Clinical BigBird and HiSAN, these less computationally expensive models still have utility when resources are constrained.
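The micro and macro F1 scores reported here weight classes differently, which is one common reason the two can diverge widely on tasks with many rare labels. The sketch below uses the standard definitions for single-label multiclass classification; the function name `micro_macro_f1` and the example labels are illustrative assumptions, and the abstract does not describe the authors' evaluation code.

```python
from collections import Counter


def micro_macro_f1(y_true, y_pred):
    """Return (micro F1, macro F1) for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    # Macro: average the per-class F1 scores, so rare classes count equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


if __name__ == "__main__":
    # Illustrative data: class "A" is common, class "B" is rare.
    y_true = ["A"] * 8 + ["B"] * 2
    y_pred = ["A"] * 9 + ["B"] * 1
    micro, macro = micro_macro_f1(y_true, y_pred)
    print(f"micro F1 = {micro:.3f}, macro F1 = {macro:.3f}")
```

In this toy example a single error on the rare class lowers the macro score more than the micro score, because the rare class contributes half of the macro average but only a fifth of the pooled counts.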

Subjects

Neoplasms, Humans, Neoplasms/diagnosis, Information Storage and Retrieval, Registries, Artificial Intelligence
