Comprehensive ensemble in QSAR prediction for drug discovery.
Kwon, Sunyoung; Bae, Ho; Jo, Jeonghee; Yoon, Sungroh.
Affiliation
  • Kwon S; Department of Electrical and Computer Engineering, Seoul National University, Seoul, 08826, South Korea.
  • Bae H; Clova AI Research, NAVER Corp., Seongnam, 13561, South Korea.
  • Jo J; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, South Korea.
  • Yoon S; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 08826, South Korea.
BMC Bioinformatics ; 20(1): 521, 2019 Oct 26.
Article in En | MEDLINE | ID: mdl-31655545
ABSTRACT

BACKGROUND:

Quantitative structure-activity relationship (QSAR) is a computational modeling method for revealing relationships between structural properties of chemical compounds and biological activities. QSAR modeling is essential for drug discovery, but it has many constraints. Ensemble-based machine learning approaches have been used to overcome these constraints and obtain reliable predictions. Ensemble learning builds a set of diversified models and combines them. However, the most prevalent approach, random forest, and other ensemble approaches in QSAR prediction limit their model diversity to a single subject.

RESULTS:

The proposed ensemble method consistently outperformed 13 individual models on 19 bioassay datasets and demonstrated superiority over other ensemble approaches that are limited to a single subject. The comprehensive ensemble method is publicly available at http://data.snu.ac.kr/QSAR/.

CONCLUSIONS:

We propose a comprehensive ensemble method that builds multi-subject diversified models and combines them through second-level meta-learning. In addition, we propose an end-to-end neural network-based individual classifier that can automatically extract sequential features from a simplified molecular-input line-entry system (SMILES). The proposed individual model did not show impressive results as a single model, but it was considered the most important predictor when combined, according to the interpretation of the meta-learning.
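
To illustrate what combining models through second-level meta-learning can look like, the following is a minimal sketch and not the authors' implementation. It assumes two hypothetical first-level classifiers whose predicted probabilities are simulated (one informative, one uninformative), and it assumes a logistic-regression meta-learner trained by plain gradient descent; the abstract does not specify the meta-learner, and the names `fit_meta_learner` and `predict_meta` are illustrative. In a real stacking setup the first-level probabilities would come from held-out folds so that the meta-learner does not see predictions made on training data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(first_level_probs, labels, lr=0.5, epochs=2000):
    """Fit a logistic-regression meta-learner (assumed choice) on the
    probabilities predicted by the first-level models.

    first_level_probs: list of rows, one probability per first-level model.
    Returns (weights, bias); one weight per first-level model.
    """
    n_models = len(first_level_probs[0])
    n = len(labels)
    weights = [0.0] * n_models
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_models
        grad_b = 0.0
        for row, y in zip(first_level_probs, labels):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
            err = p - y  # gradient of the log-loss w.r.t. the logit
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_meta(weights, bias, row):
    """Combined probability of the 'active' class for one compound."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


# Simulated first-level outputs (hypothetical data, fixed seed).
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
first_level_probs = [
    [
        (0.8 if y else 0.2) + rng.uniform(-0.15, 0.15),  # informative model
        rng.random(),                                    # uninformative model
    ]
    for y in labels
]

weights, bias = fit_meta_learner(first_level_probs, labels)
predictions = [
    int(predict_meta(weights, bias, row) >= 0.5) for row in first_level_probs
]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

print("meta-learner weights:", weights)
print("training accuracy:", accuracy)
```

Under these assumptions, the learned weights give one simple way to read which first-level model the meta-learner relies on most, which is the kind of interpretation the conclusion refers to.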

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Quantitative Structure-Activity Relationship Type of study: Prognostic_studies / Risk_factors_studies Language: En Journal: BMC Bioinformatics Journal subject: INFORMATICA MEDICA Year: 2019 Document type: Article Affiliation country: South Korea Publication country: ENGLAND / ESCOCIA / GB / GREAT BRITAIN / INGLATERRA / REINO UNIDO / SCOTLAND / UK / UNITED KINGDOM