Search | VHL (BVS) Search Portal

Language models for the prediction of SARS-CoV-2 inhibitors.

Blanchard, Andrew E; Gounley, John; Bhowmik, Debsindhu; Chandra Shekar, Mayanka; Lyngaas, Isaac; Gao, Shang; Yin, Junqi; Tsaris, Aristeidis; Wang, Feiyi; Glaser, Jens.

Int J High Perform Comput Appl ; 36(5-6): 587-602, 2022 Nov.

Article in English | MEDLINE | ID: mdl-38603308

ABSTRACT

The COVID-19 pandemic highlights the need for computational tools to automate and accelerate drug design for novel protein targets. We leverage deep learning language models to generate and score drug candidates based on predicted protein binding affinity. We pre-trained a deep learning language model (BERT) on ~9.6 billion molecules and achieved peak performance of 603 petaflops in mixed precision. Our work reduces pre-training time from days to hours, compared to previous efforts with this architecture, while also increasing the dataset size by nearly an order of magnitude. For scoring, we fine-tuned the language model using an assembled set of thousands of protein targets with binding affinity data and searched for inhibitors of specific protein targets, SARS-CoV-2 Mpro and PLpro. We utilized a genetic algorithm approach for finding optimal candidates using the generation and scoring capabilities of the language model. Our generalizable models accelerate the identification of inhibitors for emerging therapeutic targets.
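The generate-and-score loop mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_score` stands in for the model-predicted binding affinity, `mutate` stands in for language-model generation, and the candidates are toy token strings rather than valid molecular representations. All names and parameters here are hypothetical.

```python
import random

# Toy vocabulary (assumption); real candidates would be molecular strings.
TOKENS = list("CNOS")


def toy_score(candidate):
    """Hypothetical stand-in for a model-predicted binding affinity."""
    return candidate.count("N") + 0.5 * candidate.count("O")


def mutate(candidate, rng):
    """Hypothetical stand-in for language-model generation:
    replace one randomly chosen token with a random token."""
    pos = rng.randrange(len(candidate))
    return candidate[:pos] + rng.choice(TOKENS) + candidate[pos + 1:]


def genetic_search(population, score_fn, generations=20, survivors=4, seed=0):
    """Elitist genetic search: mutate, pool parents with children,
    keep the top-scoring candidates, repeat."""
    rng = random.Random(seed)
    population = list(population)
    for _ in range(generations):
        children = [mutate(rng.choice(population), rng)
                    for _ in range(len(population))]
        pool = set(population) | set(children)
        # Sort by descending score; break ties alphabetically so the
        # result is deterministic for a given seed.
        ranked = sorted(pool, key=lambda c: (-score_fn(c), c))
        population = ranked[:survivors]
    return population


initial = ["CCCCCC", "CCOCCC", "CSCCCC", "CCCCSC"]
final = genetic_search(initial, toy_score)
print(final[0], toy_score(final[0]))
```

Because parents stay in the selection pool, the best score never decreases from one generation to the next. In a realistic setting the scoring call would dominate the cost, so candidates would typically be scored in batches.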

In with the old, in with the new: machine learning for time to event biomedical research.

Danciu, Ioana; Agasthya, Greeshma; Tate, Janet P; Chandra-Shekar, Mayanka; Goethert, Ian; Ovchinnikova, Olga S; McMahon, Benjamin H; Justice, Amy C.

J Am Med Inform Assoc ; 29(10): 1737-1743, 2022 09 12.

Article in English | MEDLINE | ID: mdl-35920306

ABSTRACT

The predictive modeling literature for biomedical applications is dominated by biostatistical methods for survival analysis and, more recently, some out-of-the-box machine learning approaches. In this article, we present a machine learning method appropriate for time-to-event modeling in the area of prostate cancer long-term disease progression. Using XGBoost adapted to long-term disease progression, we developed a predictive model for 118,788 patients with localized prostate cancer at diagnosis from the Department of Veterans Affairs (VA). Our model accounted for patient censoring. Harrell's c-index for our model using only features available at the time of diagnosis was 0.757 (95% confidence interval [0.756, 0.757]). Our results show that machine learning methods like XGBoost can be adapted to use accelerated failure time (AFT) with censoring to model long-term risk of disease progression. The long median survival justifies and requires censoring. Overall, we show that an existing machine learning approach can be used for AFT outcome modeling in prostate cancer, and more generally for other chronic diseases with long observation times.
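To make the reported metric concrete, here is a minimal sketch of Harrell's c-index for right-censored data. It is an illustration rather than the authors' evaluation code: the function name, the argument names, and the choice to skip pairs with tied observed times are all assumptions.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    times       -- observed follow-up time per patient
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- predicted risk; higher means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: tied observed times are skipped
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            # The earlier time is censored, so the true ordering of
            # the two events is unknown and the pair is not comparable.
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (fabricated for illustration only): the third patient is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.4, 0.1]
print(harrell_c_index(times, events, risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The censoring check is what distinguishes this from a plain rank correlation: pairs whose earlier time is censored carry no usable ordering information and are left out.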

Subjects

Biomedical Research , Prostatic Neoplasms , Disease Progression , Humans , Machine Learning , Male , Prostatic Neoplasms/diagnosis , Survival Analysis
