ABSTRACT
Nucleic acids adopt a repertoire of conformations depending on their sequence and environment. Circular dichroism (CD) is a valuable tool for monitoring these secondary structural conformations. Nonetheless, the CD spectral diversity associated with these structures makes it challenging to obtain quantitative information about the secondary structural content of a given CD spectrum. To this end, the extreme gradient boosting decision-tree (XGBoost), Kohonen, and neural network (nnet) algorithms have been exploited here to predict the diverse secondary structures of nucleic acids. A curated library of 450 CD spectra corresponding to 16 different secondary structures of nucleic acids has been created and used as a training dataset. The hyper-parameters of these algorithms have been optimized using holdout and k-fold (here, k = 5) cross-validation methods. For a test dataset of 150 CD spectra, the nnet and XGBoost algorithms exhibited similar prediction accuracies in the range of 85% to 87%, with XGBoost being slightly more accurate. Thus, the nnet and XGBoost algorithms tested here can be employed for predicting hybrid nucleic acid topologies in the future. For the sake of accessibility, the entire process has been automated and implemented as a web server, CD-NuSS (CD to nucleic acids secondary structure), which is freely accessible at https://project.iith.ac.in/cdnuss/.
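The holdout and 5-fold cross-validation schemes mentioned above can be illustrated with a minimal Python sketch of how the index splits might be formed for a 450-spectrum training library. The function names, the 20% holdout fraction, and the random seed are assumptions made for illustration only; this is not the authors' pipeline, and the actual model fitting (XGBoost, Kohonen, nnet) is omitted.

```python
import random

def holdout_split(n_samples, validation_fraction=0.2, seed=0):
    """Shuffle sample indices and set aside a fraction for validation.

    The 20% default is a hypothetical choice, not taken from the paper.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = int(round(n_samples * validation_fraction))
    return indices[n_val:], indices[:n_val]

def k_fold_splits(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation.

    Every sample appears in exactly one validation fold.
    """
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

# 450 spectra in the training library, as stated in the abstract.
N_TRAIN = 450

holdout_train, holdout_val = holdout_split(N_TRAIN)
cv_folds = list(k_fold_splits(list(range(N_TRAIN)), k=5))

for fold_number, (train, validation) in enumerate(cv_folds, start=1):
    print(f"fold {fold_number}: {len(train)} train, {len(validation)} validation")
```

With 450 spectra and k = 5, each fold trains on 360 spectra and validates on the remaining 90; a candidate hyper-parameter setting would then be scored by averaging its validation accuracy over the five folds.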