Results 1 - 20 of 81
1.
Am J Hum Genet ; 111(5): 877-895, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38614076

ABSTRACT

Infertility, affecting ∼10% of men, is predominantly caused by primary spermatogenic failure (SPGF). We screened likely pathogenic and pathogenic (LP/P) variants in 638 candidate genes for male infertility in 521 individuals presenting idiopathic SPGF and 323 normozoospermic men in the ESTAND cohort. Molecular diagnosis was reached for 64 men with SPGF (12%), with findings in 39 genes (6%). The yield did not differ significantly between the subgroups with azoospermia (20/185, 11%), oligozoospermia (18/181, 10%), and primary cryptorchidism with SPGF (26/155, 17%). Notably, 19 of 64 LP/P variants (30%) identified in 28 subjects represented recurrent findings in this study and/or with other male infertility cohorts. NR5A1 was the most frequently affected gene, with seven LP/P variants in six SPGF-affected men and two normozoospermic men. The link to SPGF was validated for recently proposed candidate genes ACTRT1, ASZ1, GLUD2, GREB1L, LEO1, RBM5, ROS1, and TGIF2LY. Heterozygous truncating variants in BNC1, reported in female infertility, emerged as plausible causes of severe oligozoospermia. Data suggested that several infertile men may present congenital conditions with less pronounced or pleiotropic phenotypes affecting the development and function of the reproductive system. Genes regulating the hypothalamic-pituitary-gonadal axis were affected in >30% of subjects with LP/P variants. Six individuals had more than one LP/P variant, including five with two findings from the gene panel. A 4-fold increased prevalence of cancer was observed in men with genetic infertility compared to the general male population (8% vs. 2%; p = 4.4 × 10⁻³). Expanding genetic testing in andrology will contribute to the multidisciplinary management of SPGF.
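
The subgroup yields quoted in this abstract can be re-derived from the stated counts. The Python sketch below does that arithmetic and, purely as an illustration, runs a Pearson chi-square test of homogeneity across the three subgroups. The abstract does not say which statistical test the authors used, so the choice of test here is an assumption, and the function names are hypothetical.

```python
import math

# Counts quoted in the abstract: (men with a molecular diagnosis, men tested).
subgroups = {
    "azoospermia": (20, 185),
    "oligozoospermia": (18, 181),
    "cryptorchidism with SPGF": (26, 155),
}


def yield_percent(diagnosed, tested):
    """Diagnostic yield as a whole-number percentage."""
    return round(100 * diagnosed / tested)


def chi_square_homogeneity(groups):
    """Pearson chi-square statistic for a k x 2 table (diagnosed vs. not diagnosed)."""
    total_dx = sum(d for d, _ in groups)
    total_n = sum(n for _, n in groups)
    stat = 0.0
    for d, n in groups:
        exp_dx = n * total_dx / total_n  # expected diagnosed under equal yield
        exp_no = n - exp_dx              # expected undiagnosed under equal yield
        stat += (d - exp_dx) ** 2 / exp_dx + ((n - d) - exp_no) ** 2 / exp_no
    return stat


yields = {name: yield_percent(d, n) for name, (d, n) in subgroups.items()}
overall = yield_percent(
    sum(d for d, _ in subgroups.values()),
    sum(n for _, n in subgroups.values()),
)

stat = chi_square_homogeneity(list(subgroups.values()))
# Three subgroups give 2 degrees of freedom, and the chi-square survival
# function for 2 degrees of freedom is exactly exp(-x / 2).
p_value = math.exp(-stat / 2)

print(yields, overall, p_value)
```

Under this assumed test the p-value stays above 0.05, which agrees with the abstract's statement that the yield did not differ significantly between subgroups.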


Subject(s)
Infertility, Male , Humans , Male , Infertility, Male/genetics , Adult , Exome Sequencing , Steroidogenic Factor 1/genetics , Azoospermia/genetics , Oligospermia/genetics , Mutation , Spermatogenesis/genetics , Cohort Studies
2.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37974506

ABSTRACT

Over the past years, progress made in next-generation sequencing technologies and bioinformatics has sparked a surge in association studies. In particular, genome-wide association studies (GWASs) have demonstrated their effectiveness in identifying disease associations with common genetic variants. Yet, rare variants can contribute to additional disease risk or trait heterogeneity. Because GWASs are underpowered for detecting association with such variants, numerous statistical methods have been recently proposed. Aggregation tests collapse multiple rare variants within a genetic region (e.g. gene, gene set, genomic loci) to test for association. An increasing number of studies using such methods successfully identified trait-associated rare variants and led to a better understanding of the underlying disease mechanism. In this review, we compare existing aggregation tests, their statistical features and scope of application, splitting them into the five classical classes: burden, adaptive burden, variance-component, omnibus and other. Finally, we describe some limitations of current aggregation tests, highlighting potential directions for further investigations.
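
To make the idea of "collapsing" rare variants concrete, here is a minimal Python sketch of the simplest class named above, a burden test. It assumes a toy genotype matrix, a hypothetical MAF threshold of 0.01, and a one-sided permutation test on the difference in mean burden. None of these choices come from the review itself, and real burden tests typically use regression with covariates.

```python
import random


def burden_scores(genotypes, mafs, maf_threshold=0.01):
    """Collapse rare variants: per-individual count of minor alleles
    at variants whose minor allele frequency is below the threshold."""
    rare = [j for j, maf in enumerate(mafs) if maf < maf_threshold]
    return [sum(row[j] for j in rare) for row in genotypes]


def permutation_burden_test(scores, is_case, n_perm=2000, seed=0):
    """One-sided permutation test: is the mean burden higher in cases?"""
    rng = random.Random(seed)

    def mean_difference(labels):
        case = [s for s, c in zip(scores, labels) if c]
        ctrl = [s for s, c in zip(scores, labels) if not c]
        return sum(case) / len(case) - sum(ctrl) / len(ctrl)

    observed = mean_difference(is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between burden and status
        if mean_difference(labels) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data (hypothetical): rows are individuals, columns are variants in one gene.
# Variants 0 and 1 are rare; variant 2 is common and is ignored by the burden score.
mafs = [0.002, 0.004, 0.2]
genotypes = [
    [1, 0, 1], [0, 1, 0], [1, 0, 2], [0, 1, 1], [1, 0, 0], [0, 1, 1],  # cases
    [0, 0, 1], [0, 0, 0], [0, 0, 2], [0, 0, 1], [0, 0, 0], [0, 0, 1],  # controls
]
is_case = [True] * 6 + [False] * 6

scores = burden_scores(genotypes, mafs)
observed, p_value = permutation_burden_test(scores, is_case)
print(scores, observed, p_value)
```

A burden score like this assumes all rare variants act in the same direction, which is the limitation that the variance-component and omnibus classes are designed to relax.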


Subject(s)
Genetic Variation , Genome-Wide Association Study , Humans , Phenotype , Case-Control Studies , Models, Genetic
3.
Bioinformatics ; 40(4)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38603604

ABSTRACT

MOTIVATION: Whole exome sequencing (WES) has emerged as a powerful tool for genetic research, enabling the collection of a tremendous amount of data about human genetic variation. However, properly identifying which variants are causative of a genetic disease remains an important challenge, often due to the number of variants that need to be screened. Expanding the screening to combinations of variants in two or more genes, as would be required under the oligogenic inheritance model, simply blows this problem out of proportion. RESULTS: We present here the High-throughput oligogenic prioritizer (Hop), a novel prioritization method that uses direct oligogenic information at the variant, gene and gene pair level to detect digenic variant combinations in WES data. This method leverages information from a knowledge graph, together with specialized pathogenicity predictions in order to effectively rank variant combinations based on how likely they are to explain the patient's phenotype. The performance of Hop is evaluated in cross-validation on 36 120 synthetic exomes for training and 14 280 additional synthetic exomes for independent testing. Whereas the known pathogenic variant combinations are found in the top 20 in approximately 60% of the cross-validation exomes, 71% are found in the same ranking range when considering the independent set. These results provide a significant improvement over alternative approaches that depend simply on a monogenic assessment of pathogenicity, including early attempts for digenic ranking using monogenic pathogenicity scores. AVAILABILITY AND IMPLEMENTATION: Hop is available at https://github.com/oligogenic/HOP.
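
The "found in the top 20" figures above are a top-k recall measure. The Python sketch below shows how such a figure could be computed from ranked candidate lists; the data layout, identifiers, and function name are illustrative assumptions and do not reflect Hop's actual code or output format.

```python
def top_k_recall(ranked_lists, true_combinations, k=20):
    """Fraction of exomes whose known pathogenic combination
    appears among the first k ranked candidates."""
    hits = 0
    for ranking, truth in zip(ranked_lists, true_combinations):
        if truth in ranking[:k]:
            hits += 1
    return hits / len(ranked_lists)


# Toy example (hypothetical identifiers): each exome has a ranked list of
# variant pairs, best first, and one known pathogenic pair.
ranked_lists = [
    [("v1", "v2"), ("v3", "v4"), ("v5", "v6")],
    [("v7", "v8"), ("v9", "v10"), ("v11", "v12")],
    [("v13", "v14"), ("v15", "v16"), ("v17", "v18")],
]
true_combinations = [("v3", "v4"), ("v11", "v12"), ("v13", "v14")]

recall_at_2 = top_k_recall(ranked_lists, true_combinations, k=2)
print(recall_at_2)
```

In this toy case two of the three known pairs sit within the top 2, so the recall is two thirds; the abstract reports the analogous quantity at k = 20.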


Subject(s)
Exome , Humans , Exome Sequencing/methods , Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Computational Biology/methods
4.
Hum Genomics ; 17(1): 16, 2023 03 02.
Article in English | MEDLINE | ID: mdl-36859317

ABSTRACT

BACKGROUND: Congenital hydrocephalus is characterized by ventriculomegaly, defined as a dilatation of cerebral ventricles, and thought to be due to impaired cerebrospinal fluid (CSF) homeostasis. Primary congenital hydrocephalus is a subset of cases with prenatal onset and absence of another primary cause, e.g., brain hemorrhage. Published series report a Mendelian cause in only a minority of cases. In this study, we analyzed exome data of PCH patients in search of novel causal genes and addressed the possibility of an underlying oligogenic mode of inheritance for PCH. MATERIALS AND METHODS: We sequenced the exome in 28 unrelated probands with PCH, 12 of whom were from families with at least two affected siblings and 9 of whom were consanguineous, thereby increasing the contribution of genetic causes. Patient exome data were first analyzed for rare (MAF < 0.005) transmitted or de novo variants. Population stratification of unrelated PCH patients and controls was determined by principal component analysis, and outliers identified using a Mahalanobis distance cutoff of 5%. Patient and control exome data for genes biologically related to cilia (SYScilia database) were analyzed by mutation burden test. RESULTS: In 18% of probands, we identify a causal (pathogenic or likely pathogenic) variant of a known hydrocephalus gene, including genes for postnatal, syndromic hydrocephalus, not previously reported in isolated PCH. In a further 11%, we identify mutations in novel candidate genes. Through mutation burden tests, we demonstrate a significant burden of genetic variants in genes coding for proteins of the primary cilium in PCH patients compared to controls. CONCLUSION: Our study confirms the low contribution of Mendelian mutations in PCH and reports PCH as a phenotypic presentation of some genes known for syndromic, postnatal hydrocephalus. Furthermore, this study identifies novel Mendelian candidate genes, and provides evidence for oligogenic inheritance implicating primary cilia in PCH.
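
The outlier step in the methods can be sketched as follows. The abstract does not specify how the 5% cutoff was applied, so this Python example makes explicit assumptions: only the first two principal components are used, and a sample is flagged when its squared Mahalanobis distance exceeds the 95th percentile of a chi-square distribution with 2 degrees of freedom. The coordinates are invented toy values.

```python
import math


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean,
    using the sample covariance matrix (n - 1 denominator)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # determinant of the 2 x 2 covariance matrix
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Closed-form d^T S^{-1} d for a symmetric 2 x 2 matrix S.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


# Assumption: 5% refers to the upper tail of a chi-square with 2 degrees
# of freedom, whose quantile at 1 - alpha is exactly -2 * ln(alpha).
alpha = 0.05
cutoff = -2 * math.log(alpha)

# Toy PC1/PC2 coordinates: a tight cluster of 20 samples plus one distant sample.
cluster = [(0.1 * x, 0.1 * y) for x in range(-2, 3) for y in range(-2, 2)]
samples = cluster + [(10.0, 10.0)]

d2 = mahalanobis_sq_2d(samples)
flagged = [i for i, value in enumerate(d2) if value > cutoff]
print(cutoff, flagged)
```

Because the covariance is estimated from all samples including the outlier, a robust covariance estimate would usually be preferable in practice; the plain estimate is kept here for brevity.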


Subject(s)
Hydrocephalus , Multifactorial Inheritance , Female , Pregnancy , Humans , Mutation , Consanguinity , Databases, Factual
5.
PLoS Comput Biol ; 19(9): e1011488, 2023 09.
Article in English | MEDLINE | ID: mdl-37708232

ABSTRACT

The development of high-throughput next-generation sequencing technologies and large-scale genetic association studies produced numerous advances in the biostatistics field. Various aggregation tests, i.e. statistical methods that analyze associations of a trait with multiple markers within a genomic region, have produced a variety of novel discoveries. Notwithstanding their usefulness, there is no single test that fits all needs, each suffering from specific drawbacks. Selecting the right aggregation test, while considering an unknown underlying genetic model of the disease, remains an important challenge. Here we propose a new ensemble method, called Excalibur, based on an optimal combination of 36 aggregation tests created after an in-depth study of the limitations of each test and their impact on the quality of result. Our findings demonstrate the ability of our method to control type I error and illustrate that it offers the best average power across all scenarios. The proposed method allows for novel advances in Whole Exome/Genome sequencing association studies, able to handle a wide range of association models, providing researchers with an optimal aggregation analysis for the genetic regions of interest.
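
The abstract does not describe how Excalibur combines its 36 tests. As a stand-in only, the Python sketch below shows one generic way to merge p-values from several aggregation tests into a single p-value, the Cauchy combination rule. This is a swapped-in technique chosen for illustration and should not be read as Excalibur's method.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination rule.

    Each p-value is mapped to a standard Cauchy variate, the variates are
    averaged with the given weights, and the average is mapped back to a
    p-value. Inputs must lie strictly between 0 and 1.
    """
    k = len(p_values)
    if weights is None:
        weights = [1.0 / k] * k
    total = sum(weights)
    statistic = sum(
        w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values)
    ) / total
    return 0.5 - math.atan(statistic) / math.pi


# Hypothetical p-values from three different aggregation tests on one gene.
combined = cauchy_combination([0.001, 0.5, 0.8])
print(combined)
```

A rule of this kind is dominated by the smallest p-values, so one well-suited test can carry the signal even when the others are poorly matched to the underlying genetic model.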


Subject(s)
Genetic Variation , Genome-Wide Association Study , Computer Simulation , Genetic Association Studies , Genomics , Models, Genetic , High-Throughput Nucleotide Sequencing
6.
BMC Bioinformatics ; 24(1): 324, 2023 Aug 29.
Article in English | MEDLINE | ID: mdl-37644440

ABSTRACT

BACKGROUND: Understanding the impact of gene interactions on disease phenotypes is increasingly recognised as a crucial aspect of genetic disease research. This trend is reflected by the growing amount of clinical research on oligogenic diseases, where disease manifestations are influenced by combinations of variants on a few specific genes. Although statistical machine-learning methods have been developed to identify relevant genetic variant or gene combinations associated with oligogenic diseases, they rely on abstract features and black-box models, posing challenges to interpretability for medical experts and impeding their ability to comprehend and validate predictions. In this work, we present a novel, interpretable predictive approach based on a knowledge graph that not only provides accurate predictions of disease-causing gene interactions but also offers explanations for these results. RESULTS: We introduce BOCK, a knowledge graph constructed to explore disease-causing genetic interactions, integrating curated information on oligogenic diseases from clinical cases with relevant biomedical networks and ontologies. Using this graph, we developed a novel predictive framework based on heterogeneous paths connecting gene pairs. This method trains an interpretable decision set model that not only accurately predicts pathogenic gene interactions, but also unveils the patterns associated with these diseases. A unique aspect of our approach is its ability to offer, along with each positive prediction, explanations in the form of subgraphs, revealing the specific entities and relationships that led to each pathogenic prediction. CONCLUSION: Our method, built with interpretability in mind, leverages heterogeneous path information in knowledge graphs to predict pathogenic gene interactions and generate meaningful explanations. This not only broadens our understanding of the molecular mechanisms underlying oligogenic diseases, but also presents a novel application of knowledge graphs in creating more transparent and insightful predictors for genetic research.
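
One way to picture "heterogeneous paths connecting gene pairs" is to enumerate the typed paths between two genes in a small graph and count them by type signature. The Python sketch below does this on an invented toy graph. The node names, relation names, and feature layout are all hypothetical and are not taken from BOCK.

```python
from collections import Counter

# Toy heterogeneous graph (hypothetical entities and relations).
node_type = {
    "GENE_A": "Gene",
    "GENE_B": "Gene",
    "GENE_C": "Gene",
    "PATHWAY_1": "Pathway",
    "PHENO_1": "Phenotype",
}
edges = [
    ("GENE_A", "participates_in", "PATHWAY_1"),
    ("GENE_B", "participates_in", "PATHWAY_1"),
    ("GENE_A", "associated_with", "PHENO_1"),
    ("GENE_B", "associated_with", "PHENO_1"),
    ("GENE_A", "interacts_with", "GENE_C"),
    ("GENE_C", "interacts_with", "GENE_B"),
]


def build_adjacency(edge_list):
    """Undirected adjacency: node -> list of (relation, neighbour)."""
    adjacency = {}
    for source, relation, target in edge_list:
        adjacency.setdefault(source, []).append((relation, target))
        adjacency.setdefault(target, []).append((relation, source))
    return adjacency


def metapath_counts(adjacency, types, source, target, max_edges=3):
    """Count simple paths from source to target, grouped by the sequence
    of node types and relation names along the path."""
    counts = Counter()

    def walk(node, visited, signature, depth):
        if node == target and depth > 0:
            counts[" > ".join(signature)] += 1
            return
        if depth == max_edges:
            return
        for relation, neighbour in adjacency.get(node, []):
            if neighbour not in visited:
                walk(
                    neighbour,
                    visited | {neighbour},
                    signature + [relation, types[neighbour]],
                    depth + 1,
                )

    walk(source, {source}, [types[source]], 0)
    return counts


adjacency = build_adjacency(edges)
features = metapath_counts(adjacency, node_type, "GENE_A", "GENE_B")
for signature, count in sorted(features.items()):
    print(count, signature)
```

Counts of this kind are human-readable, which is the property that makes rule-based models trained on them easier to explain than models trained on abstract embeddings.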


Subject(s)
Epistasis, Genetic , Pattern Recognition, Automated , Machine Learning , Phenotype , Gene Ontology
7.
BMC Bioinformatics ; 24(1): 179, 2023 May 01.
Article in English | MEDLINE | ID: mdl-37127601

ABSTRACT

BACKGROUND: The prediction of potentially pathogenic variant combinations in patients remains a key task in the field of medical genetics for the understanding and detection of oligogenic/multilocus diseases. Models tailored towards such cases can help shorten the gap of missing diagnoses and can aid researchers in dealing with the high complexity of the derived data. The predictor VarCoPP (Variant Combinations Pathogenicity Predictor) that was published in 2019 and identified potentially pathogenic variant combinations in gene pairs (bilocus variant combinations), was the first important step in this direction. Despite its usefulness and applicability, several issues still remained that hindered a better performance, such as its False Positive (FP) rate, the quality of its training set and its complex architecture. RESULTS: We present VarCoPP2.0: the successor of VarCoPP that is a simplified, faster and more accurate predictive model identifying potentially pathogenic bilocus variant combinations. Results from cross-validation and on independent data sets reveal that VarCoPP2.0 has improved in terms of both sensitivity (95% in cross-validation and 98% during testing) and specificity (5% FP rate). At the same time, its running time shows a significant 150-fold decrease due to the selection of a simpler Balanced Random Forest model. Its positive training set now consists of variant combinations that are more confidently linked with evidence of pathogenicity, based on the confidence scores present in OLIDA, the Oligogenic Diseases Database ( https://olida.ibsquare.be ). The improvement of its performance is also attributed to a more careful selection of up-to-date features identified via an original wrapper method. We show that the combination of different variant and gene pair features together is important for predictions, highlighting the usefulness of integrating biological information at different levels. CONCLUSIONS: Through its improved performance and faster execution time, VarCoPP2.0 enables a more accurate analysis of larger data sets linked to oligogenic diseases. Users can access the ORVAL platform ( https://orval.ibsquare.be ) to apply VarCoPP2.0 on their data.
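
For readers relating the quoted sensitivity to the quoted FP rate, the Python sketch below states the standard definitions. The confusion-matrix counts are invented so that they reproduce the 95% and 5% figures; they are not the study's data.

```python
def sensitivity(tp, fn):
    """True positive rate: share of pathogenic combinations that are detected."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """Share of neutral combinations wrongly predicted as pathogenic."""
    return fp / (fp + tn)


def specificity(fp, tn):
    """True negative rate, the complement of the false positive rate."""
    return tn / (fp + tn)


# Hypothetical counts chosen only to match the percentages in the abstract.
tp, fn, fp, tn = 95, 5, 5, 95

print(sensitivity(tp, fn), false_positive_rate(fp, tn), specificity(fp, tn))
```

A 5% FP rate therefore corresponds to 95% specificity.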

8.
Proc Natl Acad Sci U S A ; 116(24): 11878-11887, 2019 06 11.
Article in English | MEDLINE | ID: mdl-31127050

ABSTRACT

Notwithstanding important advances in the context of single-variant pathogenicity identification, novel breakthroughs in discerning the origins of many rare diseases require methods able to identify more complex genetic models. We present here the Variant Combinations Pathogenicity Predictor (VarCoPP), a machine-learning approach that identifies pathogenic variant combinations in gene pairs (called digenic or bilocus variant combinations). We show that the results produced by this method are highly accurate and precise, an efficacy that is endorsed when validating the method on recently published independent disease-causing data. Confidence labels of 95% and 99% are identified, representing the probability of a bilocus combination being a true pathogenic result, providing geneticists with rational markers to evaluate the most relevant pathogenic combinations and limit the search space and time. Finally, the VarCoPP has been designed to act as an interpretable method that can provide explanations on why a bilocus combination is predicted as pathogenic and which biological information is important for that prediction. This work provides an important step toward the genetic understanding of rare diseases, paving the way to clinical knowledge and improved patient care.


Subject(s)
Genetic Predisposition to Disease/genetics , Genetic Variation/genetics , Rare Diseases/genetics , Genetic Markers/genetics , Humans
9.
Clin Endocrinol (Oxf) ; 94(4): 656-666, 2021 04.
Article in English | MEDLINE | ID: mdl-33296094

ABSTRACT

OBJECTIVE: The study aimed to identify the genetic basis of partial gonadal dysgenesis (PGD) in a non-consanguineous family from Estonia. PATIENTS: Cousins P (proband) 1 (12 years; 46,XY) and P2 (18 years; 46,XY) presented bilateral cryptorchidism, severe penoscrotal hypospadias, low bitesticular volume and azoospermia in P2. Their distant relative, P3 (30 years; 46,XY), presented bilateral cryptorchidism and cryptozoospermia. DESIGN: Exome sequencing was targeted to P1-P3 and five unaffected family members. RESULTS: P1-P2 were identified as heterozygous carriers of NR5A1 c.991-1G > C. NR5A1 encodes the steroidogenic factor-1 essential in gonadal development and specifically expressed in adrenal, spleen, pituitary and testes. Together with a previous PGD case from Belgium (Robevska et al 2018), c.991-1G > C represents the first recurrent NR5A1 splice-site mutation identified in patients. The majority of previous reports on NR5A1 mutation carriers have not included phenotype-genotype data of the family members. Segregation analysis across three generations showed incomplete penetrance (<50%) and phenotypic variability among the carriers of NR5A1 c.991-1G > C. The variant pathogenicity was possibly modulated by rare heterozygous variants inherited from the other parent, OTX2 p.P134R (P1) or PROP1 c.301_302delAG (P2). For P3, the pedigree structure supported a distinct genetic cause. He carries a previously undescribed likely pathogenic variant SOS1 p.Y136H. SOS1, critical in Ras/MAPK signalling and foetal development, is a strong novel candidate gene for cryptorchidism. CONCLUSIONS: Detailed genetic profiling facilitates counselling and clinical management of the probands, and supports unaffected mutation carriers in the family for their reproductive decision making.


Subject(s)
Gonadal Dysgenesis, 46,XY , Penetrance , Steroidogenic Factor 1 , Biological Variation, Population , Gonadal Dysgenesis, 46,XY/genetics , Humans , Male , Mutation , Steroidogenic Factor 1/genetics , Testis
10.
Nucleic Acids Res ; 47(W1): W93-W98, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31147699

ABSTRACT

A tremendous amount of DNA sequencing data is being produced around the world with the ambition to capture in more detail the mechanisms underlying human diseases. While numerous bioinformatics tools exist that allow the discovery of causal variants in Mendelian diseases, little to no support is provided to do the same for variant combinations, an essential task for the discovery of the causes of oligogenic diseases. ORVAL (the Oligogenic Resource for Variant AnaLysis), which is presented here, provides an answer to this problem by focusing on generating networks of candidate pathogenic variant combinations in gene pairs, as opposed to isolated variants in unique genes. This online platform integrates innovative machine learning methods for combinatorial variant pathogenicity prediction with visualization techniques, offering several interactive and exploratory tools, such as pathogenic gene and protein interaction networks, a ranking of pathogenic gene pairs, as well as visual mappings of the cellular location and pathway information. ORVAL is the first web-based exploration platform dedicated to identifying networks of candidate pathogenic variant combinations with the sole ambition to help in uncovering oligogenic causes for patients that cannot rely on the classical disease analysis tools. ORVAL is available at https://orval.ibsquare.be.


Subject(s)
Genetic Diseases, Inborn/genetics , Genetic Predisposition to Disease , Multifactorial Inheritance/genetics , Software , Computational Biology , Genetic Diseases, Inborn/diagnosis , Humans , Mutation/genetics , Sequence Analysis, DNA
11.
Hum Mutat ; 41(2): 512-524, 2020 02.
Article in English | MEDLINE | ID: mdl-31696992

ABSTRACT

Primary microcephaly (PM) is characterized by a small head since birth and is vastly heterogeneous both genetically and phenotypically. While most cases are monogenic, genetic interactions between Aspm and Wdr62 have recently been described in a mouse model of PM. Here, we used two complementary, holistic in vivo approaches: high throughput DNA sequencing of multiple PM genes in human patients with PM, and genome-edited zebrafish modeling for the digenic inheritance of PM. Exomes of patients with PM showed a significant burden of variants in 75 PM genes, that persisted after removing monogenic causes of PM (e.g., biallelic pathogenic variants in CEP152). This observation was replicated in an independent cohort of patients with PM, where a PM gene panel showed in addition that the burden was carried by six centrosomal genes. Allelic frequencies were consistent with digenic inheritance. In zebrafish, non-centrosomal gene casc5 -/- produced a severe PM phenotype, that was not modified by centrosomal genes aspm or wdr62 invalidation. A digenic, quadriallelic PM phenotype was produced by aspm and wdr62. Our observations provide strong evidence for digenic inheritance of human PM, involving centrosomal genes. Absence of genetic interaction between casc5 and aspm or wdr62 further delineates centrosomal and non-centrosomal pathways in PM.


Subject(s)
Centrosome/metabolism , Genetic Association Studies , Genetic Predisposition to Disease , Inheritance Patterns , Microcephaly/diagnosis , Microcephaly/genetics , Animals , Databases, Genetic , Genetic Association Studies/methods , Humans , Mutation , Open Reading Frames , Phenotype , Signal Transduction , Exome Sequencing , Zebrafish
12.
PLoS Comput Biol ; 14(6): e1006133, 2018 06.
Article in English | MEDLINE | ID: mdl-29912864

ABSTRACT

Paroxysmal nocturnal hemoglobinuria (PNH) is an acquired clonal blood disorder characterized by hemolysis and a high risk of thrombosis, that is due to a deficiency in several cell surface proteins that prevent complement activation. Its origin has been traced to a somatic mutation in the PIG-A gene within hematopoietic stem cells (HSC). However, to date the question of how this mutant clone expands in size to contribute significantly to hematopoiesis remains under debate. One hypothesis posits the existence of a selective advantage of PIG-A mutated cells due to an immune mediated attack on normal HSC, but the evidence supporting this hypothesis is inconclusive. An alternative (and simpler) explanation attributes clonal expansion to neutral drift, in which case selection neither favours nor inhibits expansion of PIG-A mutated HSC. Here we examine the implications of the neutral drift model by numerically evolving a Markov chain for the probabilities of all possible outcomes, and investigate the possible occurrence and evolution, within this framework, of multiple independently arising clones within the HSC pool. Predictions of the model agree well with the known incidence of the disease and average age at diagnosis. Notwithstanding the slight difference in clonal expansion rates between our results and those reported in the literature, our model results lead to a relative stability of clone size when averaging multiple cases, in accord with what has been observed in human trials. The probability of a patient harbouring a second clone in the HSC pool was found to be extremely low ([Formula: see text]). Thus our results suggest that in clinical cases of PNH where two independent clones of mutant cells are observed, only one of those is likely to have originated in the HSC pool.
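
"Numerically evolving a Markov chain for the probabilities of all possible outcomes" can be illustrated with a neutral Moran process, a standard model of drift in a fixed-size cell pool. The Python sketch below is a simplified stand-in: the pool size of 20, the single initial mutant, the number of steps, and the absence of new mutations are all assumptions made for illustration and do not reproduce the paper's model.

```python
def moran_step(prob):
    """Advance the distribution over mutant counts by one replacement event.

    prob[k] is the probability that k of the n cells are mutant. Under
    neutral drift the count rises or falls by one with equal probability
    k * (n - k) / n**2, and states 0 and n are absorbing.
    """
    n = len(prob) - 1
    new = [0.0] * (n + 1)
    for k, p in enumerate(prob):
        if p == 0.0:
            continue
        t = k * (n - k) / (n * n)
        new[k] += p * (1 - 2 * t)
        if t > 0:
            new[k + 1] += p * t
            new[k - 1] += p * t
    return new


pool_size = 20                       # hypothetical stem cell pool size
prob = [0.0] * (pool_size + 1)
prob[1] = 1.0                        # start with exactly one mutant cell

for _ in range(500):
    prob = moran_step(prob)

total = sum(prob)
mean_mutants = sum(k * p for k, p in enumerate(prob))
print(total, mean_mutants, prob[0], prob[pool_size])
```

Under neutrality the expected number of mutant cells stays constant while the distribution spreads toward extinction or fixation, which is why the mean is a useful sanity check on the numerical evolution.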


Subject(s)
Hemoglobinuria, Paroxysmal/genetics , Hemoglobinuria, Paroxysmal/physiopathology , Clone Cells , Evolution, Molecular , Hematopoiesis/genetics , Hematopoietic Stem Cells , Hemoglobinuria/genetics , Hemoglobinuria/physiopathology , Humans , Membrane Proteins/genetics , Membrane Proteins/metabolism , Models, Biological , Mutation
13.
Nucleic Acids Res ; 45(15): e140, 2017 Sep 06.
Article in English | MEDLINE | ID: mdl-28911095

ABSTRACT

To further our understanding of the complexity and genetic heterogeneity of rare diseases, it has become essential to shed light on how combinations of variants in different genes are responsible for a disease phenotype. With the appearance of a resource on digenic diseases, it has become possible to evaluate how digenic combinations differ in terms of the phenotypes they produce. All instances in this resource were assigned to two classes of digenic effects, annotated as true digenic and composite classes. Whereas in the true digenic class variants in both genes are required for developing the disease, in the composite class, a variant in one gene is sufficient to produce the phenotype, but an additional variant in a second gene impacts the disease phenotype or alters the age of onset. We show that a combination of variant, gene and higher-level features can differentiate between these two classes with high accuracy. Moreover, we show via the analysis of three digenic disorders that a digenic effect decision profile, extracted from the predictive model, motivates why an instance was assigned to either of the two classes. Together, our results show that digenic disease data generates novel insights, providing a glimpse into the oligogenic realm.


Subject(s)
Epistasis, Genetic/physiology , Genetic Diseases, Inborn/genetics , Mutation/physiology , Computational Biology/methods , Datasets as Topic , Genetic Association Studies/methods , Genetic Diseases, Inborn/diagnosis , Genetic Predisposition to Disease , Humans , Models, Genetic , Phenotype , Prognosis , Validation Studies as Topic
14.
Nucleic Acids Res ; 45(W1): W201-W206, 2017 07 03.
Article in English | MEDLINE | ID: mdl-28498993

ABSTRACT

High-throughput sequencing methods are generating enormous amounts of genomic data, giving unprecedented insights into human genetic variation and its relation to disease. An individual human genome contains millions of Single Nucleotide Variants: to discriminate the deleterious from the benign ones, a variety of methods have been developed that predict whether a protein-coding variant likely affects the carrier individual's health. We present such a method, DEOGEN2, which incorporates heterogeneous information about the molecular effects of the variants, the domains involved, the relevance of the gene and the interactions in which it participates. This extensive contextual information is non-linearly mapped into one single deleteriousness score for each variant. Since for the non-expert user it is sometimes still difficult to assess what this score means, how it relates to the encoded protein, and where it originates from, we developed an interactive online framework (http://deogen2.mutaframe.com/) to better present the DEOGEN2 deleteriousness predictions of all possible variants in all human proteins. The prediction is visualized so both expert and non-expert users can gain insights into the meaning, protein context and origins of each prediction.


Subject(s)
Amino Acid Substitution , Proteins/genetics , Software , Computer Graphics , Genetic Variation , Humans , Internet , Protein Domains/genetics , Protein Folding
15.
Bioinformatics ; 33(24): 3902-3908, 2017 Dec 15.
Article in English | MEDLINE | ID: mdl-28666322

ABSTRACT

MOTIVATION: Methods able to provide reliable protein alignments are crucial for many bioinformatics applications. In recent years many different algorithms have been developed and various kinds of information, from sequence conservation to secondary structure, have been used to improve alignment performance. This is especially relevant for proteins with highly divergent sequences. However, recent works suggest that different features may have different importance in diverse protein classes and it would be an advantage to have more customizable approaches, capable of dealing with different alignment definitions. RESULTS: Here we present Rigapollo, a highly flexible pairwise alignment method based on a pairwise HMM-SVM that can use any type of information to build alignments. Rigapollo lets the user decide the optimal features to align their protein class of interest. It outperforms current state of the art methods on two well-known benchmark datasets when aligning highly divergent sequences. AVAILABILITY AND IMPLEMENTATION: A Python implementation of the algorithm is available at http://ibsquare.be/rigapollo. CONTACT: wim.vranken@vub.be. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Sequence Alignment/methods , Sequence Analysis, Protein/methods , Support Vector Machine , Algorithms , Markov Chains , Protein Structure, Secondary , Proteins/chemistry , Software
16.
Nucleic Acids Res ; 44(D1): D900-7, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26481352

ABSTRACT

DIDA (DIgenic diseases DAtabase) is a novel database that provides for the first time detailed information on genes and associated genetic variants involved in digenic diseases, the simplest form of oligogenic inheritance. The database is accessible via http://dida.ibsquare.be and currently includes 213 digenic combinations involved in 44 different digenic diseases. These combinations are composed of 364 distinct variants, which are distributed over 136 distinct genes. The web interface provides browsing and search functionalities, as well as documentation and help pages, general database statistics and references to the original publications from which the data have been collected. The possibility to submit novel digenic data to DIDA is also provided. Creating this new repository was essential as current databases do not allow one to retrieve detailed records regarding digenic combinations. Genes, variants, diseases and digenic combinations in DIDA are annotated with manually curated information and information mined from other online resources. Next to providing a unique resource for the development of new analysis methods, DIDA gives clinical and molecular geneticists a tool to find the most comprehensive information on the digenic nature of their diseases of interest.


Subject(s)
Databases, Genetic , Disease/genetics , Multifactorial Inheritance , Genes , Genetic Variation , Humans , Molecular Sequence Annotation
17.
Bioinformatics ; 32(12): 1797-804, 2016 06 15.
Article in English | MEDLINE | ID: mdl-27153718

ABSTRACT

MOTIVATION: There are now many predictors capable of identifying the likely phenotypic effects of single nucleotide variants (SNVs) or short in-frame Insertions or Deletions (INDELs) on the increasing amount of genome sequence data. Most of these predictors focus on SNVs and use a combination of features related to sequence conservation, biophysical, and/or structural properties to link the observed variant to either neutral or disease phenotype. Despite notable successes, the mapping between genetic variants and their phenotypic effects is riddled with levels of complexity that are not yet fully understood and that are often not taken into account in the predictions, despite their promise of significantly improving the prediction of deleterious mutants. RESULTS: We present DEOGEN, a novel variant effect predictor that can handle both missense SNVs and in-frame INDELs. By integrating information from different biological scales and mimicking the complex mixture of effects that lead from the variant to the phenotype, we obtain significant improvements in the variant-effect prediction results. Next to the typical variant-oriented features based on the evolutionary conservation of the mutated positions, we added a collection of protein-oriented features that are based on functional aspects of the gene affected. We cross-validated DEOGEN on 36 825 polymorphisms, 20 821 deleterious SNVs, and 1038 INDELs from SwissProt. The multilevel contextualization of each (variant, protein) pair in DEOGEN provides a 10% improvement of MCC with respect to current state-of-the-art tools. AVAILABILITY AND IMPLEMENTATION: The software and the data presented here are publicly available at http://ibsquare.be/deogen. CONTACT: wvranken@vub.ac.be. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
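
MCC in this abstract refers to the Matthews correlation coefficient, a summary of binary classification quality that stays informative when classes are imbalanced. The Python sketch below gives the standard formula; the example counts are invented and returning 0.0 for an undefined denominator is a common convention adopted here as an assumption.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for a deleterious-vs-neutral variant predictor.
print(matthews_corrcoef(tp=80, tn=90, fp=10, fn=20))
```

Because the coefficient uses all four cells of the confusion matrix, a predictor cannot score well simply by favouring the majority class.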


Subject(s)
Proteins/genetics , Databases, Protein , Genetic Variation , INDEL Mutation , Software
18.
PLoS Comput Biol ; 12(5): e1004938, 2016 05.
Article in English | MEDLINE | ID: mdl-27213566

ABSTRACT

Src Homology 3 (SH3) domains are ubiquitous small interaction modules known to act as docking sites and regulatory elements in a wide range of proteins. Prior experimental NMR work on the SH3 domain of Src showed that ligand binding induces long-range dynamic changes consistent with an induced fit mechanism. Identifying the residues that participate in this mechanism produces a chart that allows for the exploration of the regulatory role of such domains in the activity of the encompassing protein. Here we show that a computational approach focusing on the changes in side chain dynamics upon ligand binding identifies equivalent long-range effects in the Src SH3 domain. Mutation of a subset of the predicted residues elicits long-range effects on the binding energetics, emphasizing the relevance of these positions in the definition of intramolecular cooperative networks of signal transduction in this domain. We find further support for this mechanism through the analysis of seven other publicly available SH3 domain structures whose sequences represent diverse SH3 classes. By comparing the eight predictions, we find that, in addition to a dynamic pathway that is relatively conserved throughout all SH3 domains, there are dynamic aspects specific to each domain and to homologous subgroups. Our work shows, for the first time from a structural perspective, which transduction mechanisms are common between a subset of closely related and distal SH3 domains, while at the same time highlighting the differences in signal transduction that make each family member unique. These results resolve the missing link between structural predictions of dynamic changes and the domain sectors recently identified for SH3 domains through sequence analysis.


Subject(s)
src Homology Domains , Amino Acid Sequence , Animals , Computational Biology , Computer Simulation , Evolution, Molecular , Humans , Ligands , Models, Molecular , Mutation , Protein Binding , Sequence Alignment , Thermodynamics , src Homology Domains/genetics
19.
Nucleic Acids Res ; 42(Web Server issue): W264-70, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24728994

ABSTRACT

Protein dynamics are important for understanding protein function. Unfortunately, accurate protein dynamics information is difficult to obtain. Here we present the DynaMine webserver, which provides predictions for the fast backbone movements of proteins directly from their amino-acid sequence. DynaMine rapidly produces a profile describing the statistical potential for such movements at residue-level resolution. The predicted values have meaning on an absolute scale and go beyond the traditional binary classification of residues as ordered or disordered, thus allowing for direct dynamics comparisons between protein regions. Through this webserver, we provide molecular biologists with an efficient and easy-to-use tool for predicting the dynamical characteristics of any protein of interest, even in the absence of experimental observations. The prediction results are visualized and can be directly downloaded. The DynaMine webserver, including instructive examples describing the meaning of the profiles, is available at http://dynamine.ibsquare.be.


Subject(s)
Proteins/chemistry , Software , Internet , Sequence Analysis, Protein
20.
BMC Bioinformatics ; 15: 309, 2014 Sep 19.
Article in English | MEDLINE | ID: mdl-25238967

ABSTRACT

BACKGROUND: Viruses are typically characterized by high mutation rates, which allow them to quickly develop drug-resistant mutations. Mining relevant rules from mutation data can be extremely useful for understanding the virus adaptation mechanism and for designing drugs that effectively counter potentially resistant mutants. RESULTS: We propose a simple statistical relational learning approach for mutant prediction where the input consists of mutation data with drug-resistance information, either as sets of mutations conferring resistance to a certain drug, or as sets of mutants with information on their susceptibility to the drug. The algorithm learns a set of relational rules characterizing drug resistance and uses them to generate a set of potentially resistant mutants. Learning a weighted combination of rules makes it possible to assign each generated mutant a resistance score, as predicted by the statistical relational model, and to select only the highest-scoring ones. CONCLUSIONS: Promising results were obtained in generating resistant mutations for both nucleoside and non-nucleoside HIV reverse transcriptase inhibitors. The approach can be generalized quite easily to learning mutants characterized by more complex rules correlating multiple mutations.
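
The scoring-and-selection step described above (a weighted combination of rules that assigns each generated mutant a score, followed by keeping the highest-scoring ones) can be illustrated with a small sketch. This is not the authors' statistical relational learner: the rules and weights below are fixed by hand rather than learned, and every name (`resistance_score`, `top_candidates`, `RULES`, the `mut_*` labels) is a hypothetical placeholder.

```python
from typing import Callable, FrozenSet, List, Tuple

# A mutant is modelled as the set of mutations it carries (placeholder labels).
Mutant = FrozenSet[str]
# A rule is a predicate over a mutant plus a weight (hand-set here, not learned).
Rule = Tuple[Callable[[Mutant], bool], float]


def resistance_score(mutant: Mutant, rules: List[Rule]) -> float:
    """Sum the weights of all rules that the mutant satisfies."""
    return sum(weight for applies, weight in rules if applies(mutant))


def top_candidates(
    candidates: List[Mutant], rules: List[Rule], k: int
) -> List[Tuple[float, Mutant]]:
    """Score every candidate and return the k highest-scoring ones."""
    scored = [(resistance_score(m, rules), m) for m in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Illustrative rules only; the labels and weights carry no biological meaning.
RULES: List[Rule] = [
    (lambda m: "mut_a" in m, 1.5),
    (lambda m: "mut_b" in m, 0.5),
    (lambda m: {"mut_a", "mut_c"} <= m, 2.0),  # rule correlating two mutations
]

CANDIDATES: List[Mutant] = [
    frozenset({"mut_a"}),
    frozenset({"mut_b"}),
    frozenset({"mut_a", "mut_c"}),
    frozenset({"mut_d"}),
]
```

Calling `top_candidates(CANDIDATES, RULES, 2)` keeps the two candidates with the largest summed rule weights. The third rule shows how a rule correlating multiple mutations, as mentioned in the conclusions, fits the same scheme.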


Subject(s)
Algorithms , Drug Resistance, Viral , HIV Infections/virology , HIV/genetics , Models, Genetic , Mutation , Reverse Transcriptase Inhibitors/pharmacology , Amino Acid Sequence , Artificial Intelligence , HIV/drug effects , HIV/enzymology , HIV Infections/drug therapy , HIV Reverse Transcriptase/chemistry , HIV Reverse Transcriptase/metabolism , Humans , Models, Biological , Models, Statistical , Molecular Sequence Data , Nucleosides/chemistry , Nucleosides/pharmacology , Reverse Transcriptase Inhibitors/chemistry