1.
Hum Genomics ; 18(1): 20, 2024 Feb 23.
Article in English | MEDLINE | ID: mdl-38395944

ABSTRACT

BACKGROUND: De novo mutations (DNMs) are variants that occur anew in the offspring of noncarrier parents. They are not inherited from either parent but rather result from endogenous mutational processes involving errors of DNA repair/replication. These spontaneous errors play a significant role in the causation of genetic disorders, and their importance in the context of molecular diagnostic medicine has become steadily more apparent as more DNMs have been reported in the literature. In this study, we examined 46,489 disease-associated DNMs annotated by the Human Gene Mutation Database (HGMD) to ascertain their distribution across gene and disease categories. RESULTS: Most disease-associated DNMs reported to date are found to be associated with developmental and psychiatric disorders, a reflection of the focus of sequencing efforts over the last decade. Of the 13,277 human genes in which DNMs have so far been found, the top-10 genes with the highest proportions of DNM relative to gene size were H3-3A, DDX3X, CSNK2B, PURA, ZC4H2, STXBP1, SCN1A, SATB2, H3-3B and TUBA1A. The distribution of CADD and REVEL scores for both disease-associated DNMs and those mutations not reported to be de novo revealed a trend towards higher deleteriousness for DNMs, consistent with the likely lower selection pressure impacting them. This contrasts with the non-DNMs, which are presumed to have been subject to continuous negative selection over multiple generations. CONCLUSION: This meta-analysis provides important information on the occurrence and distribution of disease-associated DNMs in association with heritable disease and should make a significant contribution to our understanding of this major type of mutation.


Subject(s)
Germ Cells , Parents , Humans , Mutation
2.
Nature ; 571(7766): 505-509, 2019 07.
Article in English | MEDLINE | ID: mdl-31243369

ABSTRACT

The evolution of gene expression in mammalian organ development remains largely uncharacterized. Here we report the transcriptomes of seven organs (cerebrum, cerebellum, heart, kidney, liver, ovary and testis) across developmental time points from early organogenesis to adulthood for human, rhesus macaque, mouse, rat, rabbit, opossum and chicken. Comparisons of gene expression patterns identified correspondences of developmental stages across species, and differences in the timing of key events during the development of the gonads. We found that the breadth of gene expression and the extent of purifying selection gradually decrease during development, whereas the amount of positive selection and expression of new genes increase. We identified differences in the temporal trajectories of expression of individual genes across species, with brain tissues showing the smallest percentage of trajectory changes, and the liver and testis showing the largest. Our work provides a resource of developmental transcriptomes of seven organs across seven species, and comparative analyses that characterize the development and evolution of mammalian organs.


Subject(s)
Gene Expression Regulation, Developmental , Organogenesis/genetics , Transcriptome/genetics , Animals , Biological Evolution , Chickens/genetics , Female , Humans , Macaca mulatta/genetics , Male , Mice , Opossums/genetics , Rabbits , Rats
3.
Genome Res ; 31(2): 327-336, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33468550

ABSTRACT

Recent evidence from proteomics and deep massively parallel sequencing studies has revealed that eukaryotic genomes contain substantial numbers of as-yet-uncharacterized open reading frames (ORFs). We define these uncharacterized ORFs as novel ORFs (nORFs). nORFs in humans are mostly under 100 codons and are found in diverse regions of the genome, including in long noncoding RNAs, pseudogenes, 3' UTRs, 5' UTRs, and alternative reading frames of canonical protein coding exons. There is therefore a pressing need to evaluate the potential functional importance of these unannotated transcripts and proteins in biological pathways and human disease on a larger scale, rather than one at a time. In this study, we outline the creation of a valuable nORFs data set with experimental evidence of translation for the community, use measures of heritability and selection that reveal signals for functional importance, and show the potential implications for functional interpretation of genetic variants in nORFs. Our results indicate that some variants that were previously classified as being benign or of uncertain significance may have to be reinterpreted.

4.
Hum Genet ; 142(2): 245-274, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36344696

ABSTRACT

Whilst DNA repeat expansions cause numerous heritable human disorders, their origins and underlying pathological mechanisms are often unclear. We collated a dataset comprising 224 human repeat expansions encompassing 203 different genes, and performed a systematic analysis with respect to key topological features at the DNA, RNA and protein levels. Comparison with controls without known pathogenicity and genomic regions lacking repeats, allowed the construction of the first tool to discriminate repeat regions harboring pathogenic repeat expansions (DPREx). At the DNA level, pathogenic repeat expansions exhibited stronger signals for DNA regulatory factors (e.g. H3K4me3, transcription factor-binding sites) in exons, promoters, 5'UTRs and 5'genes but were not significantly different from controls in introns, 3'UTRs and 3'genes. Additionally, pathogenic repeat expansions were also found to be enriched in non-B DNA structures. At the RNA level, pathogenic repeat expansions were characterized by lower free energy for forming RNA secondary structure and were closer to splice sites in introns, exons, promoters and 5'genes than controls. At the protein level, pathogenic repeat expansions exhibited a preference to form coil rather than other types of secondary structure, and tended to encode surface-located protein domains. Guided by these features, DPREx ( http://biomed.nscc-gz.cn/zhaolab/geneprediction/# ) achieved an Area Under the Curve (AUC) value of 0.88 in a test on an independent dataset. Pathogenic repeat expansions are thus located such that they exert a synergistic influence on the gene expression pathway involving inter-molecular connections at the DNA, RNA and protein levels.


Subject(s)
DNA Repeat Expansion , DNA , Humans , Introns/genetics , RNA , Trinucleotide Repeat Expansion
5.
Nucleic Acids Res ; 49(1): 221-243, 2021 01 11.
Article in English | MEDLINE | ID: mdl-33300026

ABSTRACT

Human genome stability requires efficient repair of oxidized bases, which is initiated via damage recognition and excision by NEIL1 and other base excision repair (BER) pathway DNA glycosylases (DGs). However, the biological mechanisms underlying detection of damaged bases among the million-fold excess of undamaged bases remain enigmatic. Indeed, mutation rates vary greatly within individual genomes, and lesion recognition by purified DGs in the chromatin context is inefficient. Employing super-resolution microscopy and co-immunoprecipitation assays, we find that acetylated NEIL1 (AcNEIL1), but not its non-acetylated form, is predominantly localized in the nucleus in association with epigenetic marks of uncondensed chromatin. Furthermore, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) revealed non-random AcNEIL1 binding near transcription start sites of weakly transcribed genes and along highly transcribed chromatin domains. Bioinformatic analyses revealed a striking correspondence between AcNEIL1 occupancy along the genome and mutation rates, with AcNEIL1-occupied sites exhibiting fewer mutations compared to AcNEIL1-free domains, both in cancer genomes and in population variation. Intriguingly, from the evolutionarily conserved unstructured domain that targets NEIL1 to open chromatin, its damage surveillance of highly oxidation-susceptible sites to preserve essential gene function and to limit instability and cancer likely originated ∼500 million years ago during the buildup of free atmospheric oxygen.


Subject(s)
Chromatin/physiology , DNA Glycosylases/metabolism , DNA Repair , Protein Processing, Post-Translational , Acetylation , Animals , Cell Line, Tumor , Cell Nucleus/metabolism , Chromatin/ultrastructure , DNA Glycosylases/chemistry , DNA Glycosylases/physiology , DNA Repair/genetics , Datasets as Topic , Evolution, Molecular , Genes, Helminth , Genes, Homeobox , HEK293 Cells , Helminth Proteins/genetics , Humans , Invertebrates/genetics , Invertebrates/metabolism , Lysine/chemistry , Mutation , Neoplasm Proteins/metabolism , Neoplasms/genetics , Neoplasms/metabolism , Neoplasms/mortality , Oxidation-Reduction , Proteome , Sequence Alignment , Sequence Homology, Amino Acid , Transcription Initiation Site , Vertebrates/genetics , Vertebrates/metabolism
6.
Hum Genet ; 139(10): 1197-1207, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32596782

ABSTRACT

The Human Gene Mutation Database (HGMD®) constitutes a comprehensive collection of published germline mutations in nuclear genes that are thought to underlie, or are closely associated with human inherited disease. At the time of writing (June 2020), the database contains in excess of 289,000 different gene lesions identified in over 11,100 genes manually curated from 72,987 articles published in over 3100 peer-reviewed journals. There are primarily two main groups of users who utilise HGMD on a regular basis; research scientists and clinical diagnosticians. This review aims to highlight how to make the most out of HGMD data in each setting.


Subject(s)
Databases, Genetic , Genome, Human , Germ-Line Mutation , Polymorphism, Genetic , Bibliometrics , Biomedical Research/methods , Genetic Predisposition to Disease , Humans , Public-Private Sector Partnerships
7.
PLoS Comput Biol ; 15(6): e1007112, 2019 06.
Article in English | MEDLINE | ID: mdl-31199787

ABSTRACT

Differentiation between phenotypically neutral and disease-causing genetic variation remains an open and relevant problem. Among different types of variation, non-frameshifting insertions and deletions (indels) represent an understudied group with widespread phenotypic consequences. To address this challenge, we present a machine learning method, MutPred-Indel, that predicts pathogenicity and identifies types of functional residues impacted by non-frameshifting insertion/deletion variation. The model shows good predictive performance as well as the ability to identify impacted structural and functional residues including secondary structure, intrinsic disorder, metal and macromolecular binding, post-translational modifications, allosteric sites, and catalytic residues. We identify structural and functional mechanisms impacted preferentially by germline variation from the Human Gene Mutation Database, recurrent somatic variation from COSMIC in the context of different cancers, as well as de novo variants from families with autism spectrum disorder. Further, the distributions of pathogenicity prediction scores generated by MutPred-Indel are shown to differentiate highly recurrent from non-recurrent somatic variation. Collectively, we present a framework to facilitate the interrogation of both pathogenicity and the functional effects of non-frameshifting insertion/deletion variants. The MutPred-Indel webserver is available at http://mutpred.mutdb.org/.


Subject(s)
Genetic Predisposition to Disease/genetics , Genome, Human , INDEL Mutation , Autism Spectrum Disorder/genetics , Autism Spectrum Disorder/physiopathology , Computational Biology , Databases, Genetic , Genome, Human/genetics , Genome, Human/physiology , Humans , INDEL Mutation/genetics , INDEL Mutation/physiology , Machine Learning , ROC Curve
8.
Hum Mutat ; 40(10): 1856-1873, 2019 10.
Article in English | MEDLINE | ID: mdl-31131953

ABSTRACT

It has long been known that canonical 5' splice site (5'SS) GT>GC variants may be compatible with normal splicing. However, to date, the actual scale of canonical 5'SSs capable of generating wild-type transcripts in the case of GT>GC substitutions remains unknown. Herein, combining data derived from a meta-analysis of 45 human disease-causing 5'SS GT>GC variants and a cell culture-based full-length gene splicing assay of 103 5'SS GT>GC substitutions, we estimate that ~15-18% of canonical GT 5'SSs retain their capacity to generate between 1% and 84% normal transcripts when GT is substituted by GC. We further demonstrate that the canonical 5'SSs in which substitution of GT by GC generated normal transcripts exhibit stronger complementarity to the 5' end of U1 snRNA than those sites whose substitutions of GT by GC did not lead to the generation of normal transcripts. We also observed a correlation between the generation of wild-type transcripts and a milder than expected clinical phenotype but found that none of the available splicing prediction tools were capable of reliably distinguishing 5'SS GT>GC variants that generated wild-type transcripts from those that did not. Our findings imply that 5'SS GT>GC variants in human disease genes may not invariably be pathogenic.


Subject(s)
Alternative Splicing , Base Sequence , Gene Expression Regulation , Genetic Variation , RNA Splice Sites , Cells, Cultured , Computational Biology/methods , Databases, Nucleic Acid , Exons , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Introns , Nucleotide Motifs , Position-Specific Scoring Matrices , Sequence Analysis, DNA
9.
Bioinformatics ; 34(3): 511-513, 2018 02 01.
Article in English | MEDLINE | ID: mdl-28968714

ABSTRACT

Summary: We present FATHMM-XF, a method for predicting pathogenic point mutations in the human genome. Drawing on an extensive feature set, FATHMM-XF outperforms competitors on benchmark tests, particularly in non-coding regions where the majority of pathogenic mutations are likely to be found. Availability and implementation: The FATHMM-XF web server is available at http://fathmm.biocompute.org.uk/fathmm-xf/, and as tracks on the Genome Tolerance Browser: http://gtb.biocompute.org.uk. Predictions are provided for human genome version GRCh37/hg19. The data used for this project can be downloaded from: http://fathmm.biocompute.org.uk/fathmm-xf/. Contact: mark.rogers@bristol.ac.uk or c.campbell@bristol.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics/methods , Point Mutation , Sequence Analysis, DNA/methods , Software , Genome, Human , Humans
10.
Nucleic Acids Res ; 45(3): e13, 2017 02 17.
Article in English | MEDLINE | ID: mdl-28180317

ABSTRACT

The in silico prediction of the functional consequences of mutations is an important goal of human pathogenetics. However, bioinformatic tools that classify mutations according to their functionality employ different algorithms so that predictions may vary markedly between tools. We therefore integrated nine popular prediction tools (PolyPhen-2, SNPs&GO, MutPred, SIFT, MutationTaster2, Mutation Assessor and FATHMM as well as conservation-based Grantham Score and PhyloP) into a single predictor. The optimal combination of these tools was selected by means of a wide range of statistical modeling techniques, drawing upon 10 029 disease-causing single nucleotide variants (SNVs) from Human Gene Mutation Database and 10 002 putatively 'benign' non-synonymous SNVs from UCSC. Predictive performance was found to be markedly improved by model-based integration, whilst maximum predictive capability was obtained with either random forest, decision tree or logistic regression analysis. A combination of PolyPhen-2, SNPs&GO, MutPred, MutationTaster2 and FATHMM was found to perform as well as all tools combined. Comparison of our approach with other integrative approaches such as Condel, CoVEC, CAROL, CADD, MetaSVM and MetaLR using an independent validation dataset, revealed the superiority of our newly proposed integrative approach. An online implementation of this approach, IMHOTEP ('Integrating Molecular Heuristics and Other Tools for Effect Prediction'), is provided at http://www.uni-kiel.de/medinfo/cgi-bin/predictor/.


Subject(s)
Genetic Variation , Software , Algorithms , Computational Biology/methods , Computer Simulation , Humans , Mutation , Polymorphism, Single Nucleotide
11.
Hum Mutat ; 39(2): 292-301, 2018 02.
Article in English | MEDLINE | ID: mdl-29044887

ABSTRACT

Many genetic diseases exhibit considerable epidemiological comorbidity and common symptoms, which provokes debate about the extent of their etiological overlap. The rapid growth in the number of known disease-causing mutations in the Human Gene Mutation Database (HGMD) has allowed us to characterize genetic similarities between diseases by ascertaining the extent to which identical genetic mutations are shared between diseases. Using this approach, we show that 41.6% of disease pairs in all possible pairs (42,083) exhibit a significant sharing of mutations (P value < 0.05). These mutation-related disease pairs are in agreement with heritability-based disease-disease relations in 48 neurological and psychiatric disease pairs (Spearman's correlation coefficient = 0.50; P value = 3.4 × 10⁻⁵), and share over-expressed genes significantly more often than unrelated disease pairs (1.5-1.8-fold higher; P value ≤ 1.6 × 10⁻⁴). The usefulness of mutation-related disease pairs was further demonstrated for predicting novel mutations and identifying individuals susceptible to Crohn disease. Moreover, the mutation-based disease network concurs closely with that based on phenotypes.


Subject(s)
Mutation/genetics , Genetic Predisposition to Disease/genetics , Humans , Phenotype , RNA, Messenger/genetics
12.
Proteins ; 86 Suppl 1: 374-386, 2018 03.
Article in English | MEDLINE | ID: mdl-28975675

ABSTRACT

Our goal is to answer the question: compared with experimental structures, how useful are predicted models for functional annotation? We assessed the functional utility of predicted models by comparing the performances of a suite of methods for functional characterization on the predictions and the experimental structures. We identified 28 sites in 25 protein targets to perform functional assessment. These 28 sites included nine sites with known ligand binding (holo-sites), nine sites that are expected or suggested by experimental authors for small molecule binding (apo-sites), and ten sites containing important motifs, loops, or key residues with important disease-associated mutations. We evaluated the utility of the predictions by comparing their microenvironments to the experimental structures. Overall structural quality correlates with functional utility. However, the best-ranked predictions (global) may not have the best functional quality (local). Our assessment provides an ability to discriminate between predictions with high structural quality. When assessing ligand-binding sites, most prediction methods have higher performance on apo-sites than holo-sites. Some servers show consistently high performance for certain types of functional sites. Finally, many functional sites are associated with protein-protein interaction. We also analyzed biologically relevant features from the protein assemblies of two targets where the active site spanned the protein-protein interface. For the assembly targets, we find that the features in the models are mainly determined by the choice of template.


Subject(s)
Biological Products/metabolism , Computational Biology/methods , Models, Molecular , Models, Statistical , Protein Conformation , Proteins/chemistry , Proteins/metabolism , Binding Sites , Catalytic Domain , Humans , Ligands , Protein Binding
13.
Bioinformatics ; 33(14): i389-i398, 2017 Jul 15.
Article in English | MEDLINE | ID: mdl-28882004

ABSTRACT

MOTIVATION: Loss-of-function genetic variants are frequently associated with severe clinical phenotypes, yet many are present in the genomes of healthy individuals. The available methods to assess the impact of these variants rely primarily upon evolutionary conservation with little to no consideration of the structural and functional implications for the protein. They further do not provide information to the user regarding specific molecular alterations potentially causative of disease. RESULTS: To address this, we investigate protein features underlying loss-of-function genetic variation and develop a machine learning method, MutPred-LOF, for the discrimination of pathogenic and tolerated variants that can also generate hypotheses on specific molecular events disrupted by the variant. We investigate a large set of human variants derived from the Human Gene Mutation Database, ClinVar and the Exome Aggregation Consortium. Our prediction method shows an area under the Receiver Operating Characteristic curve of 0.85 for all loss-of-function variants and 0.75 for proteins in which both pathogenic and neutral variants have been observed. We applied MutPred-LOF to a set of 1142 de novo variants from neurodevelopmental disorders and find enrichment of pathogenic variants in affected individuals. Overall, our results highlight the potential of computational tools to elucidate causal mechanisms underlying loss of protein function in loss-of-function variants. AVAILABILITY AND IMPLEMENTATION: http://mutpred.mutdb.org. CONTACT: predrag@indiana.edu.


Subject(s)
Loss of Function Mutation , Machine Learning , Proteins/genetics , Sequence Analysis, Protein/methods , Software , Computational Biology/methods , Humans , Protein Conformation , Proteins/metabolism , Proteins/physiology
14.
BMC Med Genet ; 19(1): 183, 2018 10 11.
Article in English | MEDLINE | ID: mdl-30305043

ABSTRACT

BACKGROUND: Mucopolysaccharidosis-IVA (Morquio A disease) is a lysosomal disorder in which the abnormal accumulation of keratan sulfate and chondroitin-6-sulfate is consequent to mutations in the galactosamine-6-sulfatase (GALNS) gene. Since standard DNA sequencing analysis fails to detect about 16% of GALNS mutant alleles, gross DNA rearrangement screening and uniparental disomy evaluation are required to complete the molecular diagnosis. Despite this, the second pathogenic GALNS allele generally remains unidentified in ~ 5% of Morquio-A disease patients. METHODS: In an attempt to bridge the residual gap between clinical and molecular diagnosis, we performed an mRNA-based evaluation of three Morquio-A disease patients in whom the second mutant GALNS allele had not been identified. We also performed sequence analysis of the entire GALNS gene in two patients. RESULTS: Different aberrant GALNS mRNA transcripts were characterized in each patient. Analysis of these transcripts then allowed the identification, in one patient, of a disease-causing deep intronic GALNS mutation. The aberrant mRNA products identified in the other two individuals resulted in partial exon loss. Despite sequencing the entire GALNS gene region in these patients, the identity of a single underlying pathological lesion could not be unequivocally determined. We postulate that a combination of multiple variants, acting in cis, may synergise in terms of their impact on the splicing machinery. CONCLUSIONS: We have identified GALNS variants located within deep intronic regions that have the potential to impact splicing. These findings have prompted us to incorporate mRNA analysis into our diagnostic flow procedure for the molecular analysis of Morquio A disease.


Subject(s)
Chondroitinsulfatases/genetics , Mucopolysaccharidosis IV/genetics , Mutation , RNA Splicing , RNA, Messenger/genetics , Adolescent , Base Sequence , Chondroitinsulfatases/metabolism , DNA Mutational Analysis , Decision Trees , Exons , Female , Genotype , Humans , Introns , Male , Mucopolysaccharidosis IV/diagnosis , Mucopolysaccharidosis IV/metabolism , Mucopolysaccharidosis IV/physiopathology , RNA, Messenger/metabolism
15.
Gastrointest Endosc ; 88(4): 665-673, 2018 10.
Article in English | MEDLINE | ID: mdl-29702101

ABSTRACT

BACKGROUND AND AIMS: Duodenal polyposis and cancer have become a key issue for patients with familial adenomatous polyposis (FAP) and MUTYH-associated polyposis (MAP). Almost all patients with FAP will develop duodenal adenomas, and 5% will develop cancer. The incidence of duodenal adenomas in MAP appears to be lower than in FAP, but the limited available data suggest a comparable increase in the relative risk and lifetime risk of duodenal cancer. Current surveillance recommendations, however, are the same for FAP and MAP, using the Spigelman score (incorporating polyp number, size, dysplasia, and histology) for risk stratification and determination of surveillance intervals. Previous studies have demonstrated a benefit of enhanced detection rates of adenomas by use of chromoendoscopy both in sporadic colorectal disease and in groups at high risk of colorectal cancer. We aimed to assess the effect of chromoendoscopy on duodenal adenoma detection, to determine the impact on Spigelman stage and to compare this in individuals with known pathogenic mutations in order to determine the difference in duodenal involvement between MAP and FAP. METHODS: A prospective study examined the impact of chromoendoscopy on the assessment of the duodenum in 51 consecutive patients with MAP and FAP in 2 academic centers in the United Kingdom (University Hospital Llandough, Cardiff, and St Mark's Hospital, London) from 2011 to 2014. RESULTS: Enhanced adenoma detection of 3 times the number of adenomas after chromoendoscopy was demonstrated in both MAP (P = .013) and FAP (P = .002), but did not affect adenoma size. In both conditions, there was a significant increase in Spigelman stage after chromoendoscopy compared with endoscopy without dye spray. Spigelman scores and overall adenoma detection were significantly lower in MAP compared with FAP. CONCLUSIONS: Chromoendoscopy improved the diagnostic yield of adenomas in MAP and FAP 3-fold, and in both MAP and FAP this resulted in a clinically significant upstaging in Spigelman score. Further studies are required to determine the impact of improved adenoma detection on the management and outcome of duodenal polyposis.


Subject(s)
Adenomatous Polyposis Coli/diagnostic imaging , Duodenal Neoplasms/diagnostic imaging , Endoscopy, Gastrointestinal/methods , Population Surveillance/methods , Adenomatous Polyposis Coli/genetics , Adenomatous Polyposis Coli/pathology , Adult , Aged , Aged, 80 and over , Coloring Agents , DNA Glycosylases/genetics , Duodenal Neoplasms/genetics , Duodenal Neoplasms/pathology , Female , Humans , Indigo Carmine , Male , Middle Aged , Neoplasm Staging , Prospective Studies , Tumor Burden
16.
Nature ; 483(7388): 169-75, 2012 Mar 07.
Article in English | MEDLINE | ID: mdl-22398555

ABSTRACT

Gorillas are humans' closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human-chimpanzee and human-chimpanzee-gorilla speciation events at approximately 6 and 10 million years ago. In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution.


Subject(s)
Evolution, Molecular , Genetic Speciation , Genome/genetics , Gorilla gorilla/genetics , Animals , Female , Gene Expression Regulation , Genetic Variation/genetics , Genomics , Humans , Macaca mulatta/genetics , Molecular Sequence Data , Pan troglodytes/genetics , Phylogeny , Pongo/genetics , Proteins/genetics , Sequence Alignment , Species Specificity , Transcription, Genetic
17.
BMC Bioinformatics ; 18(1): 442, 2017 Oct 06.
Article in English | MEDLINE | ID: mdl-28985712

ABSTRACT

BACKGROUND: Small insertions and deletions (indels) have a significant influence in human disease and, in terms of frequency, they are second only to single nucleotide variants as pathogenic mutations. As the majority of mutations associated with complex traits are located outside the exome, it is crucial to investigate the potential pathogenic impact of indels in non-coding regions of the human genome. RESULTS: We present FATHMM-indel, an integrative approach to predict the functional effect, pathogenic or neutral, of indels in non-coding regions of the human genome. Our method exploits various genomic annotations in addition to sequence data. When validated on benchmark data, FATHMM-indel significantly outperforms CADD and GAVIN, state of the art models in assessing the pathogenic impact of non-coding variants. FATHMM-indel is available via a web server at indels.biocompute.org.uk. CONCLUSIONS: FATHMM-indel can accurately predict the functional impact and prioritise small indels throughout the whole non-coding genome.


Subject(s)
Computational Biology/methods , DNA, Intergenic/genetics , Genome, Human , INDEL Mutation/genetics , Genetics, Population , Humans , Phenotype , ROC Curve , Reproducibility of Results , Software
18.
Hum Mutat ; 38(10): 1336-1347, 2017 10.
Article in English | MEDLINE | ID: mdl-28649752

ABSTRACT

Synonymous single-nucleotide variants (SNVs), although they do not alter the encoded protein sequences, have been implicated in many genetic diseases. Experimental studies indicate that synonymous SNVs can lead to changes in the secondary and tertiary structures of DNA and RNA, thereby affecting translational efficiency, cotranslational protein folding as well as the binding of DNA-/RNA-binding proteins. However, the importance of these various features in disease phenotypes is not clearly understood. Here, we have built a support vector machine (SVM) model (termed DDIG-SN) as a means to discriminate disease-causing synonymous variants. The model was trained and evaluated on nearly 900 disease-causing variants. The method achieves robust performance with the area under the receiver operating characteristic curve of 0.84 and 0.85 for protein-stratified 10-fold cross-validation and independent testing, respectively. We were able to show that the disease-causing effects in the immediate proximity to exon-intron junctions (1-3 bp) are driven by the loss of splicing motif strength, whereas the gain of splicing motif strength is the primary cause in regions further away from the splice site (4-69 bp). The method is available as a part of the DDIG server at http://sparks-lab.org/ddig.


Subject(s)
DNA-Binding Proteins/genetics , DNA/genetics , Proteins/genetics , Silent Mutation/genetics , DNA/chemistry , DNA-Binding Proteins/chemistry , Genetic Predisposition to Disease , Humans , Nucleic Acid Conformation , Polymorphism, Single Nucleotide/genetics , Protein Folding , Proteins/chemistry , RNA/chemistry , RNA/genetics
19.
Hum Mutat ; 38(1): 16-24, 2017 01.
Article in English | MEDLINE | ID: mdl-27604408

ABSTRACT

Alternative splicing (AS) is a closely regulated process that allows a single gene to encode multiple protein isoforms, thereby contributing to the diversity of the proteome. Dysregulation of the splicing process has been found to be associated with many inherited diseases. However, among the pathogenic AS events, there are numerous "passenger" events whose inclusion or exclusion does not lead to significant changes with respect to protein function. In this study, we evaluate the secondary and tertiary structural features of proteins associated with disease-causing and neutral AS events, and show that several structural features are strongly associated with the pathological impact of exon inclusion. We further develop a machine-learning-based computational model, ExonImpact, for prioritizing and evaluating the functional consequences of hitherto uncharacterized AS events. We evaluated our model using several strategies including cross-validation, and data from the Gene-Tissue Expression (GTEx) and ClinVar databases. ExonImpact is freely available at http://watson.compbio.iupui.edu/ExonImpact.


Subject(s)
Alternative Splicing , Computational Biology/methods , Exons , Genetic Association Studies/methods , Software , Algorithms , Brain/metabolism , Databases, Nucleic Acid , Genetic Predisposition to Disease , Humans , Machine Learning , Protein Domains , Protein Isoforms/chemistry , Protein Isoforms/genetics , Protein Isoforms/metabolism , Structure-Activity Relationship , Web Browser
20.
Hum Genet ; 136(6): 665-677, 2017 06.
Article in English | MEDLINE | ID: mdl-28349240

ABSTRACT

The Human Gene Mutation Database (HGMD®) constitutes a comprehensive collection of published germline mutations in nuclear genes that underlie, or are closely associated with human inherited disease. At the time of writing (March 2017), the database contained in excess of 203,000 different gene lesions identified in over 8000 genes manually curated from over 2600 journals. With new mutation entries currently accumulating at a rate exceeding 17,000 per annum, HGMD represents de facto the central unified gene/disease-oriented repository of heritable mutations causing human genetic disease used worldwide by researchers, clinicians, diagnostic laboratories and genetic counsellors, and is an essential tool for the annotation of next-generation sequencing data. The public version of HGMD ( http://www.hgmd.org ) is freely available to registered users from academic institutions and non-profit organisations whilst the subscription version (HGMD Professional) is available to academic, clinical and commercial users under license via QIAGEN Inc.


Subject(s)
Databases, Genetic , Mutation , Humans , Molecular Diagnostic Techniques