ABSTRACT
Auditory neuropathy spectrum disorder (ANSD) associated with mutations of the OTOF gene is one of the common types of hereditary sensorineural hearing loss. Due to its high genetic heterogeneity, ANSD is considered one of the most difficult hearing disorders to diagnose. A dataset of 270 known annotated single amino acid substitutions (SAVs) related to ANSD was created and used to estimate the accuracy of pathogenicity prediction by known methods (from dbNSFP4.4) and by a new one. The new method (ConStruct) builds a protein-centric classification model using Random Forest to analyze missense variants in exons of the OTOF gene. A system of predictor variables was developed based on the current understanding of the structure and function of the otoferlin protein; these variables reflect where in the tertiary structure of the protein the changes caused by OTOF mutations are located. Conservation values of nucleotide substitutions across the genomes of 100 vertebrates and 30 primates were also used as variables. The mean balanced accuracy and AUC, calculated by 5-fold cross-validation, were 0.866 and 0.903, respectively. The model shows good results for interpreting data from targeted sequencing of the OTOF gene and can be implemented as an auxiliary tool for the diagnosis of ANSD in the early stages of ontogenesis. The created model, together with pathogenicity predictions for SAVs from other known accurate methods, was used to evaluate a manually created set of 1302 variants of uncertain significance (VUS) related to ANSD. Based on the analysis of the predictions, 16 SAVs were selected as the new most probable pathogenic variants.
Subjects
Central Hearing Loss, Sensorineural Hearing Loss, Membrane Proteins, Animals, Central Hearing Loss/diagnosis, Central Hearing Loss/genetics, Sensorineural Hearing Loss/genetics, Mutation, Missense Mutation, Membrane Proteins/genetics, Humans
ABSTRACT
Resistance to anticancer drugs is a serious complication in patients with cancer. Typically, drug resistance arises from amino acid substitutions (AASs) in drug target proteins. The study aimed to develop and validate a new approach to building structure-property relationships (SPR) classification models that predict AASs leading to resistance to inhibitors of tyrosine-protein kinase ABL1. The approach represents AASs as peptides described by their structural formulas. Data on drug-resistant and non-resistant AAS variants for two isoforms of ABL1 were extracted from the COSMIC database. The resulting training sets (approximately 700 missense variants) were used to create SPR models in the MultiPASS software, which uses substructural atom-centric multilevel neighborhoods of atoms (MNA) descriptors to describe the structural formulas of protein fragments and a Bayesian-like algorithm to reveal structure-property relationships. MNA descriptors of the 6th level combined with peptides of 11 amino acid residues were the best combination for ABL1 isoform 1, giving a prediction accuracy (AUC) of 0.897 for resistance to imatinib and 0.996 for resistance to dasatinib. For ABL1 isoform 2 (resistance to imatinib), the best combination was MNA descriptors of the 6th level with peptides of 15 amino acid residues (AUC of 0.909). Possible drug-resistant AASs were then predicted for dbSNP and gnomAD data. The six selected most probable imatinib-resistant AASs were additionally validated by molecular modeling and docking, which confirmed the possibility of resistance for the E334V and T392I variants.
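The peptide representation mentioned in this abstract can be illustrated with a minimal sketch. It assumes that the peptide is a fixed-length window centered on the substituted residue and cut short at the protein termini; the abstract does not say how the window is positioned. The function name `variant_peptide` and the toy sequence are hypothetical, and the generation of MNA descriptors in MultiPASS is not reproduced here.

```python
def variant_peptide(sequence: str, position: int, new_residue: str, length: int = 11) -> str:
    """Return the peptide of up to `length` residues around a substitution.

    `position` is 1-based, as in variant names such as E334V.
    Assumption: the window is centered on the substituted residue and is
    simply truncated if it runs past either end of the protein.
    """
    if length % 2 == 0:
        raise ValueError("length must be odd so the window can be centered")
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    index = position - 1
    # Apply the substitution to the full sequence first.
    mutated = sequence[:index] + new_residue + sequence[index + 1:]
    half = length // 2
    start = max(0, index - half)
    end = min(len(mutated), index + half + 1)
    return mutated[start:end]


# Toy 20-residue sequence, not the real ABL1 sequence.
toy = "ACDEFGHIKLMNPQRSTVWY"
print(variant_peptide(toy, 10, "W", length=11))
```

Under these assumptions, comparing window lengths such as 11 and 15 residues amounts to changing the `length` argument before the descriptors are computed.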
ABSTRACT
Next Generation Sequencing (NGS) technologies are rapidly entering clinical practice. A promising area for their use is newborn screening. Mass screening of newborns using NGS leads to the discovery of a large number of new missense variants that need to be assessed for association with the development of hereditary diseases. Currently, the primary analysis and identification of pathogenic variants is carried out using bioinformatic tools. Although extensive efforts have been made in computational variant interpretation, there is currently no generally accepted pathogenicity predictor. In this study, we used the sequence-structure-property relationships (SSPR) approach, which represents protein fragments by their molecular structural formulas. The approach predicts the pathogenic effect of single amino acid substitutions in proteins associated with twenty-five monogenic heritable diseases from the Uniform Screening Panel for Major Conditions recommended by the Advisory Committee on Heritable Disorders in Newborns and Children. To create SSPR classification models, we modified MultiPASS, a piece of cheminformatics software originally developed for predicting the activity spectra of drug-like substances. The created SSPR models were compared with traditional bioinformatic tools (SIFT 4G, Polyphen-2 HDIV, MutationAssessor, PROVEAN and FATHMM). The average AUC of our approach was 0.804 ± 0.040. Better quality scores were achieved for 15 of 25 proteins, with significantly higher accuracy for some proteins (IVD, HADHB, HBB). The best SSPR classification models are freely available in the online resource SAV-Pred (Single Amino acid Variants Predictor).
Subjects
Neonatal Screening, Software, Newborn Infant, Child, Humans, Amino Acid Substitution, Missense Mutation, Computational Biology
ABSTRACT
Estimating the interaction of drug-like compounds with antitargets is important for assessing possible toxic effects during drug development. Publicly available online databases provide experimental data on chemical interactions with antitargets, which can be used to create (Q)SAR models. The structures and experimental Ki and IC50 values of compounds tested for inhibition of 30 antitargets were taken from the ChEMBL 20 database. Data sets of more than 100 compounds with Ki and IC50 values were created for each antitarget. The (Q)SAR models were created with the GUSAR software using quantitative neighborhoods of atoms (QNA) and multilevel neighborhoods of atoms (MNA) descriptors and self-consistent regression. The accuracy of the (Q)SAR models was validated by fivefold cross-validation. The balanced accuracy was higher for qualitative SAR models (0.80 and 0.81 for Ki and IC50 values, respectively) than for quantitative QSAR models (0.73 and 0.76 for Ki and IC50 values, respectively). In most cases, sensitivity was higher for SAR models than for QSAR models, whereas specificity was higher for QSAR models. The mean R² and RMSE were 0.64 and 0.77 for Ki values and 0.59 and 0.73 for IC50 values, respectively. The number of compounds falling within the applicability domain was higher for SAR models than for the test sets.
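For reference, the metrics reported in this abstract can be sketched from their standard definitions in plain Python. The function names and toy values below are hypothetical. The R² shown is the coefficient of determination, which is an assumption, since the abstract does not state which R² variant GUSAR reports. The classification function also assumes that both classes occur in the true labels.

```python
import math


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # fraction of actives recovered
    specificity = tn / (tn + fp)  # fraction of inactives recovered
    return (sensitivity + specificity) / 2


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Toy values for illustration only; they are not data from the study.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
calls = [1, 1, 0, 0, 0, 1, 0, 1]
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(balanced_accuracy(labels, calls), rmse(observed, predicted), r_squared(observed, predicted))
```

Comparing a QSAR model with a SAR model on balanced accuracy, as the abstract does, requires first turning the predicted Ki or IC50 values into active/inactive calls with an activity threshold; the threshold used in the study is not given in the abstract.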