ABSTRACT
Systematically predicting the effects of mutations on protein fitness is essential for understanding genetic diseases. Indeed, predictions complement experimental efforts in analyzing how variants lead to dysfunctional proteins that in turn can cause diseases. Here we present our new fitness predictor, FiTMuSiC, which leverages structural, evolutionary and coevolutionary information. We show that FiTMuSiC predicts fitness with high accuracy despite the simplicity of its underlying model: it was among the top predictors on the hydroxymethylbilane synthase (HMBS) target of the sixth round of the Critical Assessment of Genome Interpretation challenge (CAGI6) and performs as well as much more complex deep learning models such as AlphaMissense. To further demonstrate FiTMuSiC's robustness, we compared its predictions with in vitro activity data on HMBS, variant fitness data on human glucokinase (GCK), and variant deleteriousness data on HMBS and GCK. These analyses further confirm FiTMuSiC's qualities and accuracy, which compare favorably with those of other predictors. Additionally, FiTMuSiC returns two scores that separately describe the functional and structural effects of the variant, thus providing mechanistic insight into why the variant leads to fitness loss or gain. We also provide an easy-to-use webserver at https://babylone.ulb.ac.be/FiTMuSiC, which is freely available for academic use and requires no bioinformatics expertise, making the tool accessible to the entire scientific community.
Subject(s)
Proteins; Humans; Mutation
ABSTRACT
BACKGROUND: Predicting the functional consequences or clinical impacts of genetic variants in human diseases, such as cancer, remains an important challenge. An increasing number of genetic variants in cancer have been discovered and documented in public databases such as COSMIC, but the vast majority of them have no functional or clinical annotations. Some databases with manually annotated functional mutations, such as CIViC, are available, but they are small because they rely on human annotation. Since unlabeled data (millions of variants) typically outnumber labeled data (thousands of variants), computational tools that take advantage of unlabeled data may improve prediction accuracy. RESULTS: To leverage unlabeled data to predict the functional importance of genetic variants, we introduced a method using semi-supervised generative adversarial networks (SGAN), incorporating features from both labeled and unlabeled data. Our SGAN model incorporated features from clinical guidelines and predictive scores from other computational tools. We also performed a comparative analysis of factors that influence prediction accuracy, such as the choice of algorithm, the types of features, and the training sample size, to provide more insight into variant prioritization. We found that SGAN can achieve competitive performance with small labeled training samples by incorporating unlabeled samples, which is a unique advantage compared to traditional machine learning methods. We also found that manually curated samples can achieve a more stable predictive performance than publicly available datasets. CONCLUSIONS: By incorporating much larger samples of unlabeled data, the SGAN method can improve the ability to detect novel oncogenic variants, compared to other machine-learning algorithms that use only labeled datasets. SGAN could potentially be used to predict the pathogenicity of more complex variants, such as structural variants or non-coding variants, as more training samples and informative features become available.
Subject(s)
Algorithms; Neoplasms; Humans; Machine Learning; Neoplasms/genetics; Databases, Factual; Supervised Machine Learning
ABSTRACT
BACKGROUND: Despite efforts to standardize the interpretation of variants, in some cases their pathogenicity remains unclear, and sometimes their interpretation does not help clinicians establish a clinical correlation from genetic test results. This study aims to shed more light on these challenging variants. METHODS: In a clinical setting, the variants found by 81 array CGH and 79 whole exome sequencing (WES) tests in patients with congenital anomalies were interpreted based on American College of Medical Genetics and Genomics guidelines. RESULTS: In this study, interpreting the disease-causing variants and the variants of uncertain clinical significance detected by WES was far more challenging than interpreting the variants detected by array CGH. Unreported clinical symptoms, incomplete penetrance, variable expressivity, parents' reluctance to analyze segregation in the family, and the limitations of prenatal tests were among the factors that complicated the interpretation of variants. CONCLUSION: A careful study of the pedigree and the mode of inheritance of the disease, as well as a careful clinical examination of the carrier parents in diseases with autosomal dominant inheritance, are among the primary strategies for determining the clinical significance of the variants. Continued efforts to mitigate these challenges are needed to improve the interpretation of variants.
Subject(s)
Biological Variation, Population; Prenatal Diagnosis; Pregnancy; Female; Humans; Pedigree; Exome Sequencing; Genomics
ABSTRACT
Interpretation of mitochondrial protein-encoding (mt-mRNA) variants has been challenging due to mitochondrial characteristics that have not been addressed by American College of Medical Genetics and Genomics guidelines. We developed criteria for the interpretation of mt-mRNA variants via literature review of reported variants, tested and refined these criteria using our new cases, and then used the verified criteria to interpret 421 novel variants in our clinical database. A total of 32 of 56 previously reported pathogenic (P) variants had convincing evidence for pathogenicity. These variants are either null variants, well-known disease-causing variants, or have robust functional data or strong phenotypic correlation with heteroplasmy levels. Based on our criteria, 65.7% (730/1,111) of variants of unknown significance (VUS) were reclassified as benign (B) or likely benign (LB), and one variant was scored as likely pathogenic (LP). Furthermore, using our criteria we classified 2, 12, and 23 of the 421 novel variants as P, LP, and LB, respectively. The remaining variants (91.2%) stayed as VUS. Appropriate interpretation of mt-mRNA variants is the basis for clinical diagnosis and genetic counseling. Mutation type, heteroplasmy levels in different tissues of the probands and matrilineal relatives, in silico predictions, population data, and functional studies are key points for pathogenicity assessments.
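The percentages quoted in this abstract can be checked from the counts it reports. The following is only a minimal arithmetic sketch: the variable names are illustrative and do not come from the paper, and the only inputs are the figures stated above.

```python
# Counts quoted in the abstract; variable names are illustrative assumptions.
vus_total = 1111          # previously reported VUS that were re-evaluated
vus_reclassified = 730    # of those, reclassified as benign or likely benign
novel_total = 421         # novel variants in the clinical database
novel_p, novel_lp, novel_lb = 2, 12, 23  # novel variants classified P, LP, LB

# Share of reported VUS moved to benign / likely benign.
reclassified_pct = 100 * vus_reclassified / vus_total

# Novel variants left as VUS after removing the P, LP and LB calls.
novel_vus = novel_total - (novel_p + novel_lp + novel_lb)
novel_vus_pct = 100 * novel_vus / novel_total

print(f"{reclassified_pct:.1f}% of reported VUS reclassified as B/LB")
print(f"{novel_vus} of {novel_total} novel variants remain VUS ({novel_vus_pct:.1f}%)")
```

Both values agree with the abstract: 730/1,111 rounds to 65.7%, and the 384 novel variants without a P, LP, or LB call make up 91.2% of 421.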
Subject(s)
Genetic Predisposition to Disease; Genomics; Genetic Counseling; Humans; Mutation; RNA, Messenger/genetics; United States
ABSTRACT
PURPOSE: To develop criteria to interpret mitochondrial transfer RNA (mt-tRNA) variants based on unique characteristics of mitochondrial genetics and conserved structural/functional properties of tRNA. METHODS: We developed rules on a set of established pathogenic/benign variants by examining heteroplasmy correlations with phenotype, tissue distribution, family members, and among unrelated families from published literature. We validated these deduced rules using our new cases and applied them to classify novel variants. RESULTS: Evaluation of previously reported pathogenic variants found that 80.6% had sufficient evidence to support phenotypic correlation with heteroplasmy levels among and within families. The remaining variants were downgraded due to the lack of similar evidence. Application of the verified criteria resulted in rescoring 80.8% of reported variants of uncertain significance (VUS) to benign and likely benign. Among 97 novel variants, none met pathogenic criteria. A large proportion of novel variants (84.5%) remained as VUS, while only 10.3% were likely pathogenic. Detection of these novel variants in additional individuals would facilitate their classification. CONCLUSION: Proper interpretation of mt-tRNA variants is crucial for accurate clinical diagnosis and genetic counseling. Correlations with tissue distribution, heteroplasmy levels, predicted perturbations to tRNA structure, and phenotypes provide important evidence for determining the clinical significance of mt-tRNA variants.
Subject(s)
Mitochondria; RNA, Transfer; Humans; Mitochondria/genetics; Phenotype; RNA, Mitochondrial/genetics; RNA, Transfer/genetics
ABSTRACT
In the context of the Critical Assessment of Genome Interpretation, 6th edition (CAGI6), the Genetics of Neurodevelopmental Disorders Lab in Padua proposed a new ID-challenge to offer an opportunity to develop computational methods for predicting patients' phenotypes and causal variants. Eight research teams, with 30 models in total, had access to the phenotype details and real genetic data, based on the sequences of 74 genes (VCF format) in 415 pediatric patients affected by Neurodevelopmental Disorders (NDDs). NDDs are clinically and genetically heterogeneous conditions with onset in infancy. In this study we evaluate the ability and accuracy of computational methods to predict comorbid phenotypes, based on the clinical features described in each patient, and causal variants. Finally, we asked participants to develop a method to find new possible genetic causes for patients without a genetic diagnosis. As in CAGI5, seven clinical features (ID, ASD, ataxia, epilepsy, microcephaly, macrocephaly, hypotonia) and variants (causative, putative pathogenic and contributing factors) were provided. Considering the overall clinical manifestation of our cohort, we provided the variant data and phenotypic traits of the 150 patients from the CAGI5 ID-Challenge as training and validation data for developing the prediction methods.
ABSTRACT
BACKGROUND: At least 10% of adults and most of the children who receive renal replacement therapy have inherited kidney diseases. These disorders substantially decrease their quality of life and have a large effect on the health-care system. Multisystem complications, together with challenges typical of rare disorders, including variable phenotypes and fragmented clinical and biological data, make genetic diagnosis of inherited kidney disorders difficult. In current clinical practice, genetic diagnosis is important for clinical management, estimating disease development, and applying personalized treatment for patients. SUMMARY: Inherited kidney diseases comprise hundreds of different disorders. Here, we have summarized various monogenic kidney disorders. These disorders are caused by mutations in genes coding for a wide range of proteins including receptors, channels/transporters, enzymes, transcription factors, and structural components that might also have a role in extrarenal organs (bone, eyes, brain, skin, ear, etc.). With the development of next-generation sequencing technologies, genetic testing and analysis have become more accessible, promoting our understanding of the pathophysiologic mechanisms of inherited kidney diseases. However, challenges exist in interpreting the significance of genetic variants and translating them to guide clinical management. Alport syndrome is chosen as an example to introduce the practical application of genetic testing and diagnosis in inherited kidney diseases, considering its clinical features, genetic background, and the genetic testing used to make a genetic diagnosis. KEY MESSAGES: Recent advances in genomics have highlighted the complexity of Mendelian disorders, which is due to allelic heterogeneity (distinct mutations in the same gene produce distinct phenotypes), locus heterogeneity (mutations in distinct genes result in similar phenotypes), reduced penetrance, variable expressivity, modifier genes, and/or environmental factors. Implementation of precision medicine in clinical nephrology can improve the clinical diagnostic rate and treatment efficiency of kidney diseases, but it requires nephrologists to have a good understanding of genetics.
ABSTRACT
INTRODUCTION: Cystic fibrosis was among the first diseases to have general population genetic screening tests and is one of the most common indications for prenatal and preimplantation genetic diagnosis of single-gene disorders. During the past twenty years, thanks to the evolution of diagnostic techniques, our knowledge of CFTR genetics and of the pathophysiological mechanisms involved in cystic fibrosis has significantly improved. Areas covered: Sanger sequencing and quantitative methods greatly contributed to the identification of more than 2,000 sequence variations reported worldwide in the CFTR gene. We are now entering a new technological age with the generalization of high-throughput approaches such as Next Generation Sequencing and Droplet Digital PCR technologies in diagnostic laboratories. These powerful technologies open up new perspectives for scanning the entire CFTR locus, exploring modifier factors that possibly influence the clinical evolution of patients, and for preimplantation and prenatal diagnosis. Expert commentary: Such breakthroughs would, however, require powerful bioinformatics tools and relevant functional tests of variants for analysis and interpretation of the resulting data. Ultimately, an optimal use of all those resources may improve patient care and therapeutic decision-making.
Subject(s)
Cystic Fibrosis/diagnosis; Genetic Testing; Cystic Fibrosis/genetics; Cystic Fibrosis/metabolism; Cystic Fibrosis Transmembrane Conductance Regulator/genetics; Humans; Mutation; Prenatal Diagnosis; Sequence Analysis, DNA
ABSTRACT
The rapid evolution and widespread use of next generation sequencing (NGS) in clinical laboratories have enabled remarkable progress in the genetic diagnosis of several inherited disorders. However, the new technologies have brought new challenges. In this review we consider the important issue of NGS data analysis, as well as the interpretation of unknown genetic variants and the management of incidental findings. Moreover, we focus on the new professional role of the bioinformatician and the new role of medical geneticists in the clinical management of patients. Furthermore, we consider some of the main clinical applications of NGS, bearing in mind that this field is expected to keep advancing in the near future.
ABSTRACT
Mutations in thyroglobulin (TG) are common genetic causes of congenital hypothyroidism (CH). However, the TG mutation spectrum and its frequency in Chinese CH patients have not been investigated. Here we conducted a genetic screening of the TG gene in a cohort of 382 Chinese CH patients. We identified 22 rare non-polymorphic variants, including six truncating variants and 16 missense variants of unknown significance (VUS). Seven patients carried homozygous pathogenic variants, and three patients carried homozygous or compound heterozygous VUS. A total of 48 of the 382 patients carried one of 18 heterozygous VUS, significantly more often than these variants occurred in the control cohort (P < 0.0001). The c.274+2T>G variant, which is unique to the Asian population, is the most common pathogenic variant, with an allele frequency of 0.021. The prevalence of CH due to TG gene defects in the Chinese population was estimated to be approximately 1/101,000. Our study uncovered an ethnicity-specific TG mutation spectrum and frequency.