Search | BVS - Ministry of Health

Automating biomedical literature review for rapid drug discovery: Leveraging GPT-4 to expedite pandemic response.

Yang, Jingmei; Walker, Kenji C; Bekar-Cesaretli, Ayse A; Hao, Boran; Bhadelia, Nahid; Joseph-McCarthy, Diane; Paschalidis, Ioannis Ch.

Int J Med Inform ; 189: 105500, 2024 May 24.

Article in English | MEDLINE | ID: mdl-38815316

ABSTRACT

OBJECTIVE: The rapid expansion of the biomedical literature challenges traditional review methods, especially during outbreaks of emerging infectious diseases when quick action is critical. Our study aims to explore the potential of ChatGPT to automate the biomedical literature review for rapid drug discovery. MATERIALS AND METHODS: We introduce a novel automated pipeline that helps identify drugs for a given virus in response to a potential future global health threat. Our approach can be used to select PubMed articles identifying a drug target for the given virus. We tested our approach on two known pathogens: SARS-CoV-2, where the literature is vast, and Nipah, where the literature is sparse. Specifically, a panel of three experts reviewed a set of PubMed articles and labeled them as either describing a drug target for the given virus or not. The same task was given to the automated pipeline, and its performance was evaluated by whether it labeled the articles similarly to the human experts. We applied a number of prompt engineering techniques to improve the performance of ChatGPT. RESULTS: Our best configuration used GPT-4 by OpenAI and achieved an out-of-sample validation performance with accuracy/F1-score/sensitivity/specificity of 92.87%/88.43%/83.38%/97.82% for SARS-CoV-2 and 87.40%/73.90%/74.72%/91.36% for Nipah. CONCLUSION: These results highlight the utility of ChatGPT in drug discovery and development and reveal its potential to enable rapid drug target identification during a pandemic-level health emergency.
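The four reported metrics all follow from comparing the pipeline's labels against the expert labels. The sketch below shows that arithmetic on a toy example. It is not the authors' evaluation code: the function names and the eight toy labels are hypothetical, and the expert label is assumed to be the ground truth.

```python
def confusion_counts(expert, predicted):
    """Count TP, FP, TN, FN, treating the expert label as ground truth."""
    pairs = list(zip(expert, predicted))
    tp = sum(1 for e, p in pairs if e and p)
    fp = sum(1 for e, p in pairs if not e and p)
    tn = sum(1 for e, p in pairs if not e and not p)
    fn = sum(1 for e, p in pairs if e and not p)
    return tp, fp, tn, fn


def screening_metrics(expert, predicted):
    """Return accuracy, F1, sensitivity and specificity as fractions in [0, 1]."""
    tp, fp, tn, fn = confusion_counts(expert, predicted)
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on "drug target" articles
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on "no drug target" articles
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "accuracy": accuracy,
        "f1": f1,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Toy data (hypothetical): True means "article describes a drug target".
expert_labels = [True, True, True, False, False, False, False, False]
pipeline_labels = [True, True, False, False, False, False, True, False]

metrics = screening_metrics(expert_labels, pipeline_labels)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Reporting sensitivity and specificity separately matters when the classes are imbalanced, as is likely when most retrieved articles do not describe a drug target.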

Machine learning for the identification of respiratory viral attachment machinery from sequences data.

Walker, Kenji C; Shwarts, Maïa; Demidikin, Stepan; Chakravarty, Arijit; Joseph-McCarthy, Diane.

PLoS One ; 18(3): e0281642, 2023.

Article in English | MEDLINE | ID: mdl-36862685

ABSTRACT

At the outset of an emergent viral respiratory pandemic, sequence data is among the first molecular information available. As viral attachment machinery is a key target for therapeutic and prophylactic interventions, rapid identification of viral "spike" proteins from sequence can significantly accelerate the development of medical countermeasures. For six families of respiratory viruses, covering the vast majority of airborne and droplet-transmitted diseases, host cell entry is mediated by the binding of viral surface glycoproteins that interact with a host cell receptor. In this report it is shown that sequence data for an unknown virus belonging to one of the six families above provides sufficient information to identify the protein(s) responsible for viral attachment. Random forest models that take as input a set of respiratory viral sequences can classify the protein as "spike" vs. non-spike based on predicted secondary structure elements alone (with 97.3% correctly classified) or in combination with N-glycosylation related features (with 97.0% correctly classified). Models were validated through 10-fold cross-validation, bootstrapping on a class-balanced set, and an out-of-sample extra-familial validation set. Surprisingly, we showed that secondary structural elements and N-glycosylation features were sufficient for model generation. The ability to rapidly identify viral attachment machinery directly from sequence data holds the potential to accelerate the design of medical countermeasures for future pandemics. Furthermore, this approach may be extendable for the identification of other potential viral targets and for viral sequence annotation in general in the future.
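To make the two feature types concrete, the sketch below derives a small feature vector from a protein sequence and a predicted secondary-structure string. It is not the paper's feature set, which the abstract does not specify. It assumes a three-state secondary-structure string (H, E, C) produced by some external predictor, and it counts the canonical N-glycosylation sequon N-X-S/T, where X is any residue except proline. All function names are hypothetical.

```python
import re

# Canonical N-glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping sequons (e.g. "NNTS") each be counted.
SEQUON = re.compile(r"(?=N[^P][ST])")


def count_sequons(protein):
    """Count candidate N-glycosylation sequons in a one-letter protein sequence."""
    return len(SEQUON.findall(protein.upper()))


def ss_fractions(ss_string):
    """Fractions of helix (H), strand (E) and coil (C) in a predicted
    secondary-structure string (three-state alphabet assumed)."""
    n = len(ss_string)
    if n == 0:
        return {"H": 0.0, "E": 0.0, "C": 0.0}
    return {state: ss_string.count(state) / n for state in "HEC"}


def feature_vector(protein, ss_string):
    """Build [helix, strand, coil fractions, sequons per residue] for one protein."""
    fractions = ss_fractions(ss_string)
    length = len(protein)
    sequon_density = count_sequons(protein) / length if length else 0.0
    return [fractions["H"], fractions["E"], fractions["C"], sequon_density]


# Toy inputs (hypothetical, not real viral sequences).
toy_protein = "MNGSANPTNNTS"
toy_ss = "CCHHHHHHEECC"

print(count_sequons(toy_protein))
print(feature_vector(toy_protein, toy_ss))
```

Vectors like these, one per protein, could then be passed to a classifier such as a random forest; the classifier itself is omitted here.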

Subjects

Medical Countermeasures, Viruses, Virus Attachment, Machine Learning, Glycosylation
