Current strategies to address data scarcity in artificial intelligence-based drug discovery: A comprehensive review.

Gangwal, Amit; Ansari, Azim; Ahmad, Iqrar; Azad, Abul Kalam; Wan Sulaiman, Wan Mohd Azizi

Gangwal A; Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule, 424001, Maharashtra, India. Electronic address: gangwal.amit@gmail.com.
Ansari A; Computer Aided Drug Design Center, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy, Dhule, 424001, Maharashtra, India.
Ahmad I; Department of Pharmaceutical Chemistry, Prof. Ravindra Nikam College of Pharmacy, Gondur, Dhule, 424002, Maharashtra, India. Electronic address: ansariiqrar50@gmail.com.
Azad AK; Faculty of Pharmacy, University College of MAIWP International, Batu Caves, 68100, Kuala Lumpur, Malaysia. Electronic address: azad2011iium@gmail.com.
Wan Sulaiman WMA; Faculty of Pharmacy, University College of MAIWP International, Batu Caves, 68100, Kuala Lumpur, Malaysia. Electronic address: drwanazizi@ucmi.edu.my.

Comput Biol Med ; 179: 108734, 2024 Sep.

Article in English | MEDLINE | ID: mdl-38964243

ABSTRACT

Artificial intelligence (AI) has played a vital role in computer-aided drug design (CADD). This development has been further accelerated by the increasing use of machine learning (ML), mainly deep learning (DL), and by advances in computing hardware and software. As a result, initial doubts about the application of AI in drug discovery have been dispelled, leading to significant benefits in medicinal chemistry. At the same time, it is crucial to recognize that AI is still in its infancy and faces several limitations that need to be addressed to harness its full potential in drug discovery. Notable limitations include insufficient, unlabeled, and non-uniform data; the resemblance of some AI-generated molecules to existing molecules; the unavailability of adequate benchmarks; intellectual property rights (IPR)-related hurdles in data sharing; poor understanding of biology; a focus on proxy data and ligands; and the lack of holistic methods for representing input (molecular structures) that would avoid pre-processing of input molecules (feature engineering). The major component of AI infrastructure is input data, as most of the successes of AI-driven efforts to improve drug discovery depend on the quality and quantity of the data used to train and test AI algorithms, besides a few other factors. Additionally, data-hungry DL approaches may fail to live up to their promise without sufficient data. The current literature suggests a few methods that, to a certain extent, handle low-data settings effectively and yield better output from AI models in the context of drug discovery. These include transfer learning (TL), active learning (AL), single or one-shot learning (OSL), multi-task learning (MTL), data augmentation (DA), and data synthesis (DS). A different method, federated learning (FL), enables the sharing of proprietary data on a common platform to train ML models without compromising data privacy.
In this review, we compare and discuss these methods, their recent applications, and their limitations in modeling small-molecule data to improve the output of AI methods in drug discovery. The article also summarizes some other novel methods for handling inadequate data.
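
As an illustration of the federated learning idea mentioned in the abstract, the sketch below shows one common scheme, federated averaging, in which each site trains on its own private data and only model weights are pooled. It is a minimal sketch under stated assumptions and is not taken from the review: the linear model, the function names (`local_update`, `federated_average`, `train_federated`), the learning rate, and the toy data are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

# One site's private data: (feature vector, target) pairs. Hypothetical toy format.
Dataset = List[Tuple[Sequence[float], float]]


def local_update(weights: Sequence[float], data: Dataset,
                 lr: float = 0.05, epochs: int = 20) -> List[float]:
    """Run a few epochs of gradient descent on a linear model using one site's data."""
    w = list(weights)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / len(data)  # gradient of mean squared error
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def federated_average(site_weights: List[List[float]],
                      site_sizes: List[int]) -> List[float]:
    """Combine site models, weighting each by its number of training examples."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
            for j in range(dim)]


def train_federated(sites: List[Dataset], dim: int, rounds: int = 30) -> List[float]:
    """Alternate local training and averaging; raw data never leaves a site."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in sites]
        global_w = federated_average(updates, [len(d) for d in sites])
    return global_w


if __name__ == "__main__":
    # Two hypothetical sites whose data follow y = 2*x1 - 1*x2.
    site_a: Dataset = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
    site_b: Dataset = [([2.0, 1.0], 3.0), ([1.0, 2.0], 0.0)]
    print(train_federated([site_a, site_b], dim=2))
```

In this sketch only the weight vectors cross site boundaries, which is the property the abstract attributes to FL. Practical systems typically add protections such as secure aggregation, which are omitted here.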

Subject(s)

Artificial Intelligence; Drug Discovery; Drug Discovery/methods; Humans; Machine Learning; Deep Learning

Keywords

Active learning; Artificial intelligence; Federated learning; Data augmentation; Data privacy; Data synthesis; Deep learning; Drug discovery; Machine learning; Multi-task learning; One-shot learning; Transfer learning

Full text: 1 | Database: MEDLINE | Main subject: Artificial Intelligence / Drug Discovery | Limits: Humans | Language: English | Year: 2024 | Document type: Article
