Results 1 - 20 of 59
1.
Nucleic Acids Res ; 52(W1): W61-W64, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38726873

ABSTRACT

Creating pedigree charts is a recurring task in biomedical research, but few online tools for drawing complex human pedigrees are available, and even fewer are free. With DrawPed, we aim to close this gap. DrawPed automatically draws pedigree charts from standard PED-format pedigree files. Users can also create pedigrees from scratch and interactively edit existing pedigrees. The application can display conditions not captured in a PED file, such as deceased persons or suspected consanguinity of parents. Pedigree charts are displayed as SVGs, which are scalable and hence publication-ready. Pedigrees can be exported as PED files for storage, exchange, or use in other applications. DrawPed is open source and freely available at https://www.genecascade.org/DrawPed/.


Subject(s)
Pedigree , Software , Humans , Computer Graphics , Consanguinity
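
The PED format mentioned in the DrawPed abstract can be read with a few lines of code. The following is a minimal sketch, assuming the common six-column, whitespace-separated layout (family ID, individual ID, father ID, mother ID, sex, phenotype) with "0" marking an unknown parent. The names `Person`, `parse_ped`, and `founders` are hypothetical and are not part of DrawPed.

```python
from dataclasses import dataclass
from typing import List, Optional

# Conventional PED codes (assumed here): sex 1 = male, 2 = female;
# phenotype 1 = unaffected, 2 = affected; anything else = unknown.
SEX_CODES = {"1": "male", "2": "female"}
PHENOTYPE_CODES = {"1": False, "2": True}


@dataclass
class Person:
    family: str
    ident: str
    father: Optional[str]
    mother: Optional[str]
    sex: str
    affected: Optional[bool]


def parse_ped(text: str) -> List[Person]:
    """Parse six-column PED text into a list of Person records."""
    people = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        family, ident, father, mother, sex, phenotype = line.split()[:6]
        people.append(
            Person(
                family=family,
                ident=ident,
                father=None if father == "0" else father,
                mother=None if mother == "0" else mother,
                sex=SEX_CODES.get(sex, "unknown"),
                affected=PHENOTYPE_CODES.get(phenotype),
            )
        )
    return people


def founders(people: List[Person]) -> List[str]:
    """Return IDs of individuals with no recorded parents."""
    return [p.ident for p in people if p.father is None and p.mother is None]


# Hypothetical trio: two unaffected parents and one affected son.
EXAMPLE_PED = """\
FAM1 father 0 0 1 1
FAM1 mother 0 0 2 1
FAM1 child father mother 1 2
"""

trio = parse_ped(EXAMPLE_PED)
```

As the abstract notes, attributes such as deceased status or suspected consanguinity are not captured in these six columns, so a drawing tool has to store them separately.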
2.
Nucleic Acids Res ; 52(W1): W148-W158, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38769069

ABSTRACT

In the era of high throughput sequencing, special software is required for the clinical evaluation of genetic variants. We developed REEV (Review, Evaluate and Explain Variants), a user-friendly platform for clinicians and researchers in the field of rare disease genetics. Supporting data was aggregated from public data sources. We compared REEV with seven other tools for clinical variant evaluation. REEV (semi-)automatically fills individual ACMG criteria facilitating variant interpretation. REEV can store disease and phenotype data related to a case to use these for phenotype similarity measures. Users can create public permanent links for individual variants that can be saved as browser bookmarks and shared. REEV may help in the fast diagnostic assessment of genetic variants in a clinical as well as in a research context. REEV (https://reev.bihealth.org/) is free and open to all users and there is no login requirement.


Subject(s)
Genetic Variation , Software , Humans , Phenotype , High-Throughput Nucleotide Sequencing , Rare Diseases/genetics , Rare Diseases/diagnosis , Databases, Genetic
3.
Hum Genet ; 143(5): 683-694, 2024 May.
Article in English | MEDLINE | ID: mdl-38592547

ABSTRACT

Generalized lipodystrophy is a feature of various hereditary disorders, often leading to a progeroid appearance. In the present study we identified a missense and a frameshift variant in a compound heterozygous state in SUPT7L in a boy with intrauterine growth retardation, generalized lipodystrophy, and additional progeroid features. SUPT7L encodes a component of the transcriptional coactivator complex STAGA. By transcriptome sequencing, we showed the predicted missense variant to cause aberrant splicing, leading to exon truncation and thereby to a complete absence of SUPT7L in dermal fibroblasts. In addition, we found altered expression of genes encoding DNA repair pathway components. This pathway was further investigated and an increased rate of DNA damage was detected in proband-derived fibroblasts and genome-edited HeLa cells. Finally, we performed transient overexpression of wildtype SUPT7L in both cellular systems, which normalized the number of DNA damage events. Our findings suggest SUPT7L as a novel disease gene and underline the link between genome instability and progeroid phenotypes.


Subject(s)
Loss of Function Mutation , Humans , Male , HeLa Cells , Lipodystrophy, Congenital Generalized/genetics , Fibroblasts/metabolism , DNA Damage , Mutation, Missense , DNA Repair/genetics , Lipodystrophy/genetics , Transcription Factors/genetics , Fetal Growth Retardation/genetics
4.
Nucleic Acids Res ; 50(W1): W322-W329, 2022 Jul 05.
Article in English | MEDLINE | ID: mdl-35639768

ABSTRACT

While great advances in predicting the effects of coding variants have been made, the assessment of non-coding variants remains challenging. This is especially problematic for variants within promoter regions which can lead to over-expression of a gene or reduce or even abolish its expression. The binding of transcription factors to the DNA can be predicted using position weight matrices (PWMs). More recently, transcription factor flexible models (TFFMs) have been introduced and shown to be more accurate than PWMs. TFFMs are based on hidden Markov models and can account for complex positional dependencies. Our new web-based application FABIAN-variant uses 1224 TFFMs and 3790 PWMs to predict whether and to which degree DNA variants affect the binding of 1387 different human transcription factors. For each variant and transcription factor, the software combines the results of different models for a final prediction of the resulting binding-affinity change. The software is written in C++ for speed but variants can be entered through a web interface. Alternatively, a VCF file can be uploaded to assess variants identified by high-throughput sequencing. The search can be restricted to variants in the vicinity of candidate genes. FABIAN-variant is available freely at https://www.genecascade.org/fabian/.


Subject(s)
DNA-Binding Proteins , DNA , Genetic Variation , Software , Transcription Factors , Humans , Binding Sites/genetics , DNA/genetics , DNA/metabolism , Position-Specific Scoring Matrices , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism , Genetic Variation/genetics , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Internet , Programming Languages
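
The FABIAN-variant abstract refers to position weight matrices (PWMs) for predicting how a variant changes transcription factor binding. The idea can be illustrated with a minimal sketch, assuming a log-odds PWM built from base counts with a pseudocount and a uniform background. This is not FABIAN-variant's implementation (which is written in C++ and also combines TFFMs); the function names and the toy motif are hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column[base] for base in BASES) + 4 * pseudocount
        matrix.append(
            {
                base: math.log2(((column[base] + pseudocount) / total) / background)
                for base in BASES
            }
        )
    return matrix


def best_score(matrix, sequence):
    """Highest PWM score over all windows of the sequence (forward strand only)."""
    width = len(matrix)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the matrix")
    return max(
        sum(matrix[offset][sequence[start + offset]] for offset in range(width))
        for start in range(len(sequence) - width + 1)
    )


def binding_change(matrix, ref_sequence, alt_sequence):
    """Alt minus ref best score; negative values suggest weakened binding."""
    return best_score(matrix, alt_sequence) - best_score(matrix, ref_sequence)


# Toy three-column motif that strongly prefers "TGA" (made-up counts).
TOY_COUNTS = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
toy_pwm = log_odds_matrix(TOY_COUNTS)

# A G>C substitution inside the motif lowers the best achievable score.
delta = binding_change(toy_pwm, "CCTGACC", "CCTCACC")
```

A PWM scores each position independently, which is the limitation the abstract attributes to PWMs when it notes that TFFMs can account for positional dependencies.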
5.
Nucleic Acids Res ; 50(W1): W677-W681, 2022 Jul 05.
Article in English | MEDLINE | ID: mdl-35524573

ABSTRACT

Precision medicine needs precise phenotypes. The Human Phenotype Ontology (HPO) uses clinical signs instead of diagnoses and has become the standard annotation for patients' phenotypes when describing single gene disorders. Use of the HPO beyond human genetics is, however, still limited. With SAMS (Symptom Annotation Made Simple), we want to bring sign-based phenotyping to routine clinical care, to hospital patients as well as to outpatients. Our web-based application provides access to three widely used annotation systems: HPO, OMIM, and Orphanet. Whilst data can be stored in our database, phenotypes can also be imported and exported as Global Alliance for Genomics and Health (GA4GH) Phenopackets without using the database. The web interface can easily be integrated into local databases, e.g. clinical information systems. SAMS allows users to share their data with others, empowering patients to record their own signs and symptoms (or those of their children) and thus provide their doctors with additional information. We think that our approach will lead to better characterised patients, which is helpful not only for finding disease mutations but also for better understanding the pathophysiology of diseases and for recruiting patients for studies and clinical trials. SAMS is freely available at https://www.genecascade.org/SAMS/.


Subject(s)
Databases, Genetic , Software , Child , Humans , Genomics , Phenotype , Mutation
6.
Nucleic Acids Res ; 50(W1): W83-W89, 2022 Jul 05.
Article in English | MEDLINE | ID: mdl-35489060

ABSTRACT

With the shift from SNP arrays to high-throughput sequencing, most researchers studying diseases in consanguineous families do not rely on linkage analysis any longer, but simply search for deleterious variants which are homozygous in all patients. AutozygosityMapper allows the fast and convenient identification of disease mutations in patients from consanguineous pedigrees by focussing on homozygous segments shared by all patients. Users can upload multi-sample VCF files, including WGS data, without any pre-processing. Genome-wide runs of homozygosity and the underlying genotypes are presented in graphical interfaces. AutozygosityMapper extends the functions of its predecessor, HomozygosityMapper, to the search for autozygous regions, in which all patients share the same homozygous genotype. We provide export of VCF files containing only the variants found in homozygous regions; this usually reduces the number of variants by two orders of magnitude. These regions can also directly be analysed with our disease mutation identification tool MutationDistiller. The application comes with simple and intuitive graphical interfaces for data upload, analysis, and results. We kept the structure of HomozygosityMapper so that previous users will find it easy to switch. With AutozygosityMapper, we provide a fast web-based way to identify disease mutations in consanguineous families. AutozygosityMapper is freely available at https://www.genecascade.org/AutozygosityMapper/.


Subject(s)
Consanguinity , DNA Mutational Analysis , Humans , Genotype , Homozygote , Mutation , Pedigree , Polymorphism, Single Nucleotide , DNA Mutational Analysis/methods
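
The AutozygosityMapper abstract describes autozygous regions as stretches in which all patients share the same homozygous genotype. A minimal sketch of that idea follows, assuming genotypes are given as ordered markers along one chromosome with one "allele/allele" string per patient. The abstract does not describe the tool's actual algorithm; the function names and the `min_markers` threshold are hypothetical, and a real implementation would also need to tolerate genotyping errors and missing calls.

```python
def is_shared_homozygous(marker):
    """True if every patient carries the same homozygous genotype at this marker."""
    first = marker[0]
    allele_a, allele_b = first.split("/")
    return allele_a == allele_b and all(genotype == first for genotype in marker)


def shared_homozygous_runs(genotypes, min_markers=3):
    """Return (start, end) marker index pairs of runs shared by all patients.

    genotypes: list of tuples, one tuple per marker, one genotype per patient.
    Runs shorter than min_markers are discarded.
    """
    runs = []
    start = None
    for index, marker in enumerate(genotypes):
        if is_shared_homozygous(marker):
            if start is None:
                start = index
        elif start is not None:
            if index - start >= min_markers:
                runs.append((start, index - 1))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and len(genotypes) - start >= min_markers:
        runs.append((start, len(genotypes) - 1))
    return runs


# Two hypothetical patients, ten ordered markers.
EXAMPLE_GENOTYPES = [
    ("A/A", "A/A"),
    ("G/G", "G/G"),
    ("C/C", "C/C"),
    ("A/G", "A/A"),  # patient 1 is heterozygous: breaks the run
    ("T/T", "T/T"),
    ("T/T", "C/C"),  # both homozygous but for different alleles: not autozygous
    ("G/G", "G/G"),
    ("A/A", "A/A"),
    ("C/C", "C/C"),
    ("T/T", "T/T"),
]

example_runs = shared_homozygous_runs(EXAMPLE_GENOTYPES)
```

Marker 5 in the example shows the distinction the abstract draws: both patients are homozygous there, but for different alleles, so the position does not count as shared.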
7.
BMC Genomics ; 24(1): 736, 2023 Dec 04.
Article in English | MEDLINE | ID: mdl-38049725

ABSTRACT

BACKGROUND: Transcription factors regulate gene expression by binding to transcription factor binding sites (TFBSs). Most models for predicting TFBSs are based on position weight matrices (PWMs), which require a specific motif to be present in the DNA sequence and do not consider interdependencies of nucleotides. Novel approaches such as Transcription Factor Flexible Models or recurrent neural networks consequently provide higher accuracies. However, it is unclear whether such approaches can uncover novel non-canonical, hitherto unexpected TFBSs relevant to human transcriptional regulation. RESULTS: In this study, we trained a convolutional recurrent neural network with HT-SELEX data for GRHL1 binding and applied it to a set of GRHL1 binding sites obtained from ChIP-Seq experiments from human cells. We identified 46 non-canonical GRHL1 binding sites, which were not found by a conventional PWM approach. Unexpectedly, some of the newly predicted binding sequences lacked the CNNG core motif, so far considered obligatory for GRHL1 binding. Using isothermal titration calorimetry, we experimentally confirmed binding between the GRHL1-DNA binding domain and predicted GRHL1 binding sites, including a non-canonical GRHL1 binding site. Mutagenesis of individual nucleotides revealed a correlation between predicted binding strength and experimentally validated binding affinity across representative sequences. This correlation was neither observed with a PWM-based nor another deep learning approach. CONCLUSIONS: Our results show that convolutional recurrent neural networks may uncover unanticipated binding sites and facilitate quantitative transcription factor binding predictions.


Subject(s)
Gene Expression Regulation , Transcription Factors , Humans , Transcription Factors/metabolism , Binding Sites , Protein Binding , Neural Networks, Computer , Nucleotides/metabolism , Repressor Proteins/genetics
8.
J Med Genet ; 59(7): 662-668, 2022 Jul.
Article in English | MEDLINE | ID: mdl-34379057

ABSTRACT

BACKGROUND: Genes implicated in the Golgi and endosomal trafficking machinery are crucial for brain development, and mutations in them are particularly associated with postnatal microcephaly (POM). METHODS: Exome sequencing was performed in three affected individuals from two unrelated consanguineous families presenting with delayed neurodevelopment, intellectual disability of variable degree, POM and failure to thrive. Patient-derived fibroblasts were tested for functional effects of the variants. RESULTS: We detected homozygous truncating variants in ATP9A. While the variant in family A is predicted to result in an early premature termination codon, the variant in family B affects a canonical splice site. Both variants lead to a substantial reduction of ATP9A mRNA expression. It has been shown previously that ATP9A localises to early and recycling endosomes, whereas its depletion leads to altered gene expression of components from this compartment. Consistent with previous findings, we also observed overexpression of ARPC3 and SNX3, genes strongly interacting with ATP9A. CONCLUSION: In aggregate, our findings show that pathogenic variants in ATP9A cause a novel autosomal recessive neurodevelopmental disorder with POM. While the physiological function of endogenous ATP9A is still largely elusive, our results underline a crucial role of this gene in endosomal transport in brain tissue.


Subject(s)
Adenosine Triphosphatases/genetics , Intellectual Disability , Membrane Transport Proteins/genetics , Microcephaly , Nervous System Malformations , Neurodevelopmental Disorders , Failure to Thrive , Homozygote , Humans , Intellectual Disability/genetics , Microcephaly/pathology , Neurodevelopmental Disorders/genetics , Pedigree
9.
Nucleic Acids Res ; 49(W1): W46-W51, 2021 Jul 02.
Article in English | MEDLINE | ID: mdl-34038559

ABSTRACT

With Aviator, we present a web service and repository that facilitates surveillance of online tools. Aviator consists of a user-friendly website and two modules: a general module based on literature mining and a manually curated module. The general module currently checks 9417 websites twice a day with respect to their availability and stores many features (frontend and backend response time, required RAM and size of the web page, security certificates, analytic tools and trackers embedded in the webpage and others) in a data warehouse. Aviator is also equipped with an analysis functionality; for example, authors can check and evaluate the availability of their own tools or those of their peers. Likewise, users can check the availability of a certain tool they intend to use in research or teaching to avoid including unstable tools. The curated section of Aviator offers additional services. We provide API snippets for common programming languages (Perl, PHP, Python, JavaScript) as well as OpenAPI documentation that authors can embed in the backend of their own web services to test their function automatically. We query the respective APIs twice a day and send automated notifications in case of an unexpected result. Naturally, the same analysis functionality as for the literature-based module is available for the curated section. Aviator can freely be used at https://www.ccb.uni-saarland.de/aviator.


Subject(s)
Computer Graphics , Software , Drug Repositioning , Humans , Internet , Melanoma/metabolism , Receptors, Odorant/metabolism , Signal Transduction , COVID-19 Drug Treatment
10.
Nucleic Acids Res ; 49(W1): W446-W451, 2021 Jul 02.
Article in English | MEDLINE | ID: mdl-33893808

ABSTRACT

Here we present an update to MutationTaster, our DNA variant effect prediction tool. The new version uses a different prediction model and attains higher accuracy than its predecessor, especially for rare benign variants. In addition, we have integrated many sources of data that only became available after the last release (such as gnomAD and ExAC pLI scores) and changed the splice site prediction model. To more easily assess the relevance of detected known disease mutations to the clinical phenotype of the patient, MutationTaster now provides information on the diseases they cause. Further changes represent a major overhaul of the interfaces to increase user-friendliness whilst many changes under the hood have been designed to accelerate the processing of uploaded VCF files. We also offer an API for the rapid automated query of smaller numbers of variants from within other software. MutationTaster2021 integrates our disease mutation search engine, MutationDistiller, to prioritise variants from VCF files using the patient's clinical phenotype. The novel version is available at https://www.genecascade.org/MutationTaster2021/. This website is free and open to all users and there is no login requirement.


Subject(s)
Disease/genetics , Mutation , Software , Humans , Phenotype , RNA Splice Sites , Untranslated Regions
11.
Nucleic Acids Res ; 48(10): 5306-5317, 2020 Jun 04.
Article in English | MEDLINE | ID: mdl-32338759

ABSTRACT

The temporal and spatial expression of genes is controlled by promoters and enhancers. Findings obtained over the last decade that not only promoters but also enhancers are characterized by bidirectional, divergent transcription have challenged the traditional notion that promoters and enhancers represent distinct classes of regulatory elements. Over half of human promoters are associated with CpG islands (CGIs), relatively CpG-rich stretches of generally several hundred nucleotides that are often associated with housekeeping genes. Only about 6% of transcribed enhancers defined by CAGE-tag analysis are associated with CGIs. Here, we present an analysis of enhancer and promoter characteristics and relate them to the presence or absence of CGIs. We show that transcribed enhancers share a number of CGI-dependent characteristics with promoters, including statistically significant local overrepresentation of core promoter elements. CGI-associated enhancers are longer, display higher directionality of transcription, greater expression, a lesser degree of tissue specificity, and a higher frequency of transcription-factor binding events than non-CGI-associated enhancers. Genes putatively regulated by CGI-associated enhancers are enriched for transcription regulator activity. Our findings show that CGI-associated transcribed enhancers display a series of characteristics related to sequence, expression and function that distinguish them from enhancers not associated with CGIs.


Subject(s)
CpG Islands , Enhancer Elements, Genetic , Promoter Regions, Genetic , Transcription, Genetic , Gene Expression Regulation , Histone Code , Humans , Organ Specificity , TATA Box , Transcription Factors/metabolism , Transcription Initiation, Genetic
12.
Nucleic Acids Res ; 48(W1): W162-W169, 2020 Jul 02.
Article in English | MEDLINE | ID: mdl-32338743

ABSTRACT

VarFish is a user-friendly web application for the quality control, filtering, prioritization, analysis, and user-based annotation of DNA variant data with a focus on rare disease genetics. It is capable of processing variant call files with single or multiple samples. The variants are automatically annotated with population frequencies, molecular impact, and presence in databases such as ClinVar. Further, it provides support for pathogenicity scores including CADD, MutationTaster, and phenotypic similarity scores. Users can filter variants based on these annotations and presumed inheritance pattern and sort the results by these scores. Variants passing the filter are listed with their annotations and many useful link-outs to genome browsers, other gene/variant data portals, and external tools for variant assessment. VarFish allows users to create their own annotations including support for variant assessment following ACMG-AMP guidelines. In close collaboration with medical practitioners, VarFish was designed for variant analysis and prioritization in diagnostic and research settings as described in the software's extensive manual. The user interface has been optimized for supporting these protocols. Users can install VarFish on their own in-house servers where it provides additional lab notebook features for collaborative analysis and allows re-analysis of cases, e.g. after update of genotype or phenotype databases.


Subject(s)
Genetic Variation , Rare Diseases/genetics , Software , Humans , Molecular Sequence Annotation , Rare Diseases/diagnosis , Research , User-Computer Interface
13.
Nucleic Acids Res ; 52(W1): W1-W6, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38888116
14.
Nucleic Acids Res ; 47(W1): W114-W120, 2019 Jul 02.
Article in English | MEDLINE | ID: mdl-31106342

ABSTRACT

MutationDistiller is a freely available online tool for user-driven analyses of Whole Exome Sequencing data. It offers a user-friendly interface aimed at clinicians and researchers, who are not necessarily bioinformaticians. MutationDistiller combines MutationTaster's pathogenicity predictions with a phenotype-based approach. Phenotypic information is not limited to symptoms included in the Human Phenotype Ontology (HPO), but may also comprise clinical diagnoses and the suspected mode of inheritance. The search can be restricted to lists of candidate genes (e.g. virtual gene panels) and by tissue-specific gene expression. The inclusion of GeneOntology (GO) and metabolic pathways facilitates the discovery of hitherto unknown disease genes. In a novel approach, we trained MutationDistiller's HPO-based prioritization on authentic genotype-phenotype sets obtained from ClinVar and found it to match or outcompete current prioritization tools in terms of accuracy. In the output, the program provides a list of potential disease mutations ordered by the likelihood of the affected genes to cause the phenotype. MutationDistiller provides links to gene-related information from various resources. It has been extensively tested by clinicians and their suggestions have been valued in many iterative cycles of revisions. The tool, a comprehensive documentation and examples are freely available at https://www.mutationdistiller.org/.


Subject(s)
DNA/genetics , Genetic Diseases, Inborn/genetics , Genetic Variation/genetics , Software , Databases, Genetic , Exome/genetics , Humans , Mutation/genetics , Phenotype , User-Computer Interface , Exome Sequencing
15.
Nucleic Acids Res ; 47(W1): W106-W113, 2019 Jul 02.
Article in English | MEDLINE | ID: mdl-31106382

ABSTRACT

RegulationSpotter is a web-based tool for the user-friendly annotation and interpretation of DNA variants located outside of protein-coding transcripts (extratranscriptic variants). It is designed for clinicians and researchers who wish to assess the potential impact of the considerable number of non-coding variants found in Whole Genome Sequencing runs. It annotates individual variants with underlying regulatory features in an intuitive way by assessing over 100 genome-wide annotations. Additionally, it calculates a score, which reflects the regulatory potential of the variant region. Its dichotomous classifications, 'functional' or 'non-functional', and a human-readable presentation of the underlying evidence allow a biologically meaningful interpretation of the score. The output shows key aspects of every variant and allows rapid access to more detailed information about its possible role in gene regulation. RegulationSpotter can either analyse single variants or complete VCF files. Variants located within protein-coding transcripts are automatically assessed by MutationTaster as well as by RegulationSpotter to account for possible intragenic regulatory effects. RegulationSpotter offers the possibility of using phenotypic data to focus on known disease genes or genomic elements interacting with them. RegulationSpotter is freely available at https://www.regulationspotter.org.


Subject(s)
DNA/genetics , Genetic Diseases, Inborn/genetics , Genetic Variation/genetics , Software , Databases, Genetic , Genome/genetics , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA
16.
J Med Genet ; 56(3): 164-175, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30487246

ABSTRACT

BACKGROUND: Very long-chain fatty acids (VLCFAs) are essential for functioning of biological membranes. ELOVL fatty acid elongase 1 catalyses elongation of saturated and monounsaturated C22-C26-VLCFAs. We studied two patients with a dominant ELOVL1 mutation. Independently, Kutkowska-Kazmierczak et al. had investigated the same patients and found the same mutation. We extended our study towards additional biochemical, functional, and therapeutic aspects. METHODS: We did mutation screening by whole exome sequencing. RNA-sequencing was performed in patient and control fibroblasts. Ceramide and sphingomyelin levels were measured by LC-MS/MS. ELOVL1 activity was determined by a stable isotope-labelled [13C]malonyl-CoA elongation assay. ELOVL1 expression patterns were investigated by immunofluorescence, in situ hybridisation and RT-qPCR. As treatment option, we investigated VLCFA loading of fibroblasts. RESULTS: Both patients carried an identical heterozygous de novo ELOVL1 mutation (c.494C>T, NM_001256399; p.S165F) not deriving from a founder allele. Patients suffered from epidermal hyperproliferation and increased keratinisation (ichthyosis). Hypomyelination of the central white matter explained spastic paraplegia and central nystagmus, while optic atrophy was causative for reduction of peripheral vision and visual acuity. The mutation abrogated ELOVL1 enzymatic activity and reduced ≥C24 ceramides and sphingomyelins in patient cells. Fibroblast loading with C22:0-VLCFAs increased C24:0-ceramides and sphingomyelins. We found competitive inhibition for ceramide and sphingomyelin synthesis between saturated and monounsaturated VLCFAs. Transcriptome analysis revealed upregulation of modules involved in epidermal development and keratinisation, and downregulation of genes for neurodevelopment, myelination, and synaptogenesis. Many regulated genes carried consensus proliferator-activated receptor (PPAR)α and PPARγ binding motifs in their 5'-regions. 
CONCLUSION: A dominant ELOVL1 mutation causes a neuro-ichthyotic disorder possibly amenable to treatment with PPAR-modulating drugs.


Subject(s)
Acanthosis Nigricans/genetics , Deafness/genetics , Demyelinating Diseases/genetics , Fatty Acid Elongases/genetics , Ichthyosis/genetics , Mutation , Optic Atrophy/genetics , Paraplegia/genetics , Acanthosis Nigricans/diagnosis , Adolescent , Amino Acid Sequence , Biomarkers , Biopsy , Child, Preschool , Deafness/diagnosis , Demyelinating Diseases/diagnosis , Female , Fibroblasts/metabolism , Gene Expression , Genetic Predisposition to Disease , Genotype , Humans , Ichthyosis/diagnosis , Magnetic Resonance Imaging , Male , Optic Atrophy/diagnosis , Paraplegia/diagnosis , Pedigree , Peroxisome Proliferator-Activated Receptors/metabolism , Phenotype , Exome Sequencing
18.
Clin Genet ; 95(2): 287-292, 2019 Feb.
Article in English | MEDLINE | ID: mdl-30417324

ABSTRACT

In clinical genetics, the Human Phenotype Ontology as well as disease ontologies are often used for deep phenotyping of patients and coding of clinical diagnoses. However, assigning ontology classes to patient descriptions is often disconnected from writing patient reports or manuscripts in word processing software. This additional workload and the requirement to install dedicated software may discourage usage of ontologies for parts of the target audience. Here we present Phenotero, a freely available and simple solution to annotate patient phenotypes and diseases at the time of writing clinical reports or manuscripts. We adopt Zotero, a citation management program, to create a tool that allows users to reference ontology classes within text at the time of writing. We expect this approach to decrease the additional workload to a minimum while ensuring high quality associations with ontology classes. Standardized collection of phenotypic information at the time of describing the patient allows for streamlining the clinic workflow and efficient data entry. It will subsequently promote clinical and molecular diagnosis with the ultimate goal of better understanding genetic diseases. Thus, we believe that Phenotero eases the usage of ontologies and controlled vocabularies in the field of clinical genetics.


Subject(s)
Databases, Factual , Genetics, Medical/methods , Phenotype , Software , Databases, Genetic , Humans , User-Computer Interface , Web Browser , Workflow
19.
Genome Res ; 24(2): 340-8, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24162188

ABSTRACT

Numerous new disease-gene associations have been identified by whole-exome sequencing studies in the last few years. However, many cases remain unsolved due to the sheer number of candidate variants remaining after common filtering strategies such as removing low quality and common variants and those deemed unlikely to be pathogenic. The observation that each of our genomes contains about 100 genuine loss-of-function variants makes identification of the causative mutation problematic when using these strategies alone. We propose using the wealth of genotype to phenotype data that already exists from model organism studies to assess the potential impact of these exome variants. Here, we introduce PHenotypic Interpretation of Variants in Exomes (PHIVE), an algorithm that integrates the calculation of phenotype similarity between human diseases and genetically modified mouse models with evaluation of the variants according to allele frequency, pathogenicity, and mode of inheritance approaches in our Exomiser tool. Large-scale validation of PHIVE analysis using 100,000 exomes containing known mutations demonstrated a substantial improvement (up to 54.1-fold) over purely variant-based (frequency and pathogenicity) methods with the correct gene recalled as the top hit in up to 83% of samples, corresponding to an area under the ROC curve of >95%. We conclude that incorporation of phenotype data can play a vital role in translational bioinformatics and propose that exome sequencing projects should systematically capture clinical phenotypes to take advantage of the strategy presented here.


Subject(s)
Exome/genetics , Genetic Association Studies , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide/genetics , Algorithms , Animals , Computational Biology , Databases, Genetic , Humans , Mice , Phenotype , Sequence Analysis, DNA , Software
20.
BMC Genomics ; 17: 388, 2016 May 21.
Article in English | MEDLINE | ID: mdl-27209209

ABSTRACT

BACKGROUND: The modelling of gene regulation is a major challenge in biomedical research. This process is dominated by transcription factors (TFs) and mutations in their binding sites (TFBSs) may cause the misregulation of genes, eventually leading to disease. The consequences of DNA variants on TF binding are modelled in silico using binding matrices, but it remains unclear whether these are capable of accurately representing in vivo binding. In this study, we present a systematic comparison of binding models for 82 human TFs from three freely available sources: JASPAR matrices, HT-SELEX-generated models and matrices derived from protein binding microarrays (PBMs). We determined their ability to detect experimentally verified "real" in vivo TFBSs derived from ENCODE ChIP-seq data. As negative controls we chose random downstream exonic sequences, which are unlikely to harbour TFBSs. All models were assessed by receiver operating characteristics (ROC) analysis. RESULTS: While the area-under-curve was low for most of the tested models with only 47% reaching a score of 0.7 or higher, we noticed strong differences between the various position-specific scoring matrices with JASPAR and HT-SELEX models showing higher success rates than PBM-derived models. In addition, we found that while TFBS sequences showed a higher degree of conservation than randomly chosen sequences, there was a high variability between individual TFBSs. CONCLUSIONS: Our results show that only a few of the matrix-based models used to predict potential TFBSs are able to reliably detect experimentally confirmed TFBSs. We compiled our findings in a freely accessible web application called ePOSSUM (http://mutationtaster.charite.de/ePOSSUM/) which uses a Bayes classifier to assess the impact of genetic alterations on TF binding in user-defined sequences. Additionally, ePOSSUM provides information on the reliability of the prediction using our test set of experimentally confirmed binding sites.


Subject(s)
Computational Biology , Transcription Factors/metabolism , Binding Sites , Mutation , Polymorphism, Single Nucleotide , Transcription Factors/chemistry , Transcription Factors/genetics
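
The ePOSSUM abstract assesses binding models by ROC analysis and reports the area under the curve (AUC). The following is a minimal sketch of how an AUC can be computed from model scores for true binding sites and negative controls, using the equivalent pairwise-ranking formulation (the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half). The function name and the scores are hypothetical; the study's own evaluation pipeline is not described in the abstract.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for positive in positive_scores:
        for negative in negative_scores:
            if positive > negative:
                wins += 1.0
            elif positive == negative:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positive_scores) * len(negative_scores))


# Made-up scores: three confirmed binding sites and three control sequences.
confirmed_sites = [0.9, 0.8, 0.4]
control_sequences = [0.7, 0.3, 0.2]

example_auc = roc_auc(confirmed_sites, control_sequences)
```

In this toy case eight of the nine positive/negative pairs are ranked correctly. An AUC of 0.5 corresponds to random ranking, which gives context for the 0.7 threshold quoted in the abstract.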