iAMY-SCM: Improved prediction and analysis of amyloid proteins using a scoring card method with propensity scores of dipeptides.

Charoenkwan, Phasit; Kanthawong, Sakawrat; Nantasenamat, Chanin; Hasan, Md Mehedi; Shoombuatong, Watshara

Affiliation

Charoenkwan P; Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand.
Kanthawong S; Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen 40002, Thailand.
Nantasenamat C; Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
Hasan MM; Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
Shoombuatong W; Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand. Electronic address: watshara.sho@mahidol.ac.th.

Genomics ; 113(1 Pt 2): 689-698, 2021 01.

Article in En | MEDLINE | ID: mdl-33017626

ABSTRACT

Fast, accurate identification and characterization of amyloid proteins at a large scale is essential for understanding their role in therapeutic intervention strategies. To date, only one in silico model exists for amyloid protein identification: RFAmy, which uses a random forest (RF) model in conjunction with various feature types. However, it suffers from low interpretability for biologists. Thus, it is highly desirable to develop a simple and easily interpretable prediction method with robust accuracy as compared to the existing complicated model. In this study, we propose iAMY-SCM, the first scoring card method-based predictor for predicting and analyzing amyloid proteins. iAMY-SCM uses a simple weighted-sum function in conjunction with the propensity scores of dipeptides for amyloid protein identification. Cross-validation results indicated that iAMY-SCM provided an accuracy of 0.895, corresponding to 10-22% higher performance than that of widely used machine learning models. Furthermore, iAMY-SCM achieved an accuracy of 0.827 as evaluated by an independent test, which was found to be comparable to that of RFAmy and was approximately 9-13% higher than that of widely used machine learning models. In addition, the estimated propensity scores of amino acids and dipeptides were analyzed to provide insights into the biophysical and biochemical properties of amyloid proteins. These results demonstrate that the proposed iAMY-SCM is efficient and reliable in terms of simplicity, interpretability and implementation. To facilitate ease of use of the proposed iAMY-SCM, a user-friendly and publicly accessible web server has been established at http://camt.pythonanywhere.com/iAMY-SCM. We anticipate that iAMY-SCM will be an important tool for facilitating the large-scale prediction and characterization of amyloid proteins.
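
The weighted-sum scoring described in the abstract can be illustrated with a minimal Python sketch. It assumes, as is common for scoring card methods, that each dipeptide's weight is its relative frequency in the sequence and that a sequence is called amyloid when its score exceeds a threshold. The function names, the toy propensity values and the threshold are hypothetical and are not taken from the paper; the actual iAMY-SCM scores are estimated from training data.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def dipeptide_composition(sequence):
    """Return the relative frequency of each overlapping dipeptide."""
    seq = sequence.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    # Skip dipeptides that contain non-standard residues.
    pairs = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not pairs:
        return {}
    counts = Counter(pairs)
    total = len(pairs)
    return {dp: n / total for dp, n in counts.items()}


def scm_score(sequence, propensity, default=0.0):
    """Weighted sum: dipeptide frequency times dipeptide propensity score."""
    composition = dipeptide_composition(sequence)
    return sum(freq * propensity.get(dp, default)
               for dp, freq in composition.items())


def predict_amyloid(sequence, propensity, threshold):
    """Label a sequence as amyloid if its score exceeds the threshold."""
    return scm_score(sequence, propensity) > threshold


# Hypothetical toy scores (NOT the published iAMY-SCM values):
# every dipeptide gets 500, with two illustrative exceptions.
toy_propensity = {a + b: 500.0 for a, b in product(AMINO_ACIDS, repeat=2)}
toy_propensity["VV"] = 900.0
toy_propensity["KK"] = 100.0

if __name__ == "__main__":
    for peptide in ("VVVV", "KKVV", "KKKK"):
        score = scm_score(peptide, toy_propensity)
        label = predict_amyloid(peptide, toy_propensity, threshold=600.0)
        print(peptide, round(score, 1), label)
```

Because the score is a plain sum over dipeptides, each dipeptide's contribution to a prediction can be read off directly, which is the interpretability advantage the abstract attributes to the scoring card approach.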

Subject(s)

Amyloid/chemistry; Sequence Analysis, Protein/methods; Software; Amyloid/genetics; Amyloid/metabolism; Machine Learning; Propensity Score; Protein Conformation; Protein Multimerization

Key words

Amyloid protein; Classification; Machine learning; Propensity score; Protein function; Scoring card method

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Main subject: Software / Sequence Analysis, Protein / Amyloid | Type of study: Prognostic_studies / Risk_factors_studies | Language: En | Journal: Genomics | Journal subject: GENETICA | Year: 2021 | Type: Article | Affiliation country: Thailand
