1.
Proc Natl Acad Sci U S A ; 120(46): e2314225120, 2023 Nov 14.
Article in English | MEDLINE | ID: mdl-37931111

ABSTRACT

Human genetic variants that introduce an AG into the intronic region between the branchpoint (BP) and the canonical splice acceptor site (ACC) of protein-coding genes can disrupt pre-mRNA splicing. Using our genome-wide BP database, we delineated the BP-ACC segments of all human introns and found extreme depletion of AG/YAG in the [BP+8, ACC-4] high-risk region. We developed AGAIN as a genome-wide computational approach to systematically and precisely pinpoint intronic AG-gain variants within the BP-ACC regions. AGAIN identified 350 AG-gain variants from the Human Gene Mutation Database, all of which alter splicing and cause disease. Among them, 74% created new acceptor sites, whereas 31% resulted in complete exon skipping. AGAIN also predicts the protein-level products resulting from these two consequences. We applied AGAIN to our exome/genome database of patients with severe infectious diseases but without known genetic etiology and identified a private homozygous intronic AG-gain variant in the antimycobacterial gene SPPL2A in a patient with mycobacterial disease. AGAIN also predicted the retention of six intronic nucleotides that encode an in-frame stop codon, turning AG-gain into stop-gain. This allele was then confirmed experimentally to lead to loss of function by disrupting splicing. Through two case studies, in STAT1 and IRF7, we further showed that AG-gain variants inside the high-risk region led to misspliced products, whereas those outside the region did not. We finally evaluated AGAIN on our 14 paired exome-RNAseq samples and found that 82% of AG-gain variants in high-risk regions showed evidence of missplicing. AGAIN is publicly available from https://hgidsoft.rockefeller.edu/AGAIN and https://github.com/casanova-lab/AGAIN.


Subjects
RNA Splice Sites, RNA Splicing, Humans, Introns, Mutation, Genome
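The AGAIN abstract (entry 1) describes searching the [BP+8, ACC-4] high-risk region for newly created AG dinucleotides. The Python sketch below illustrates only that positional check for a single-nucleotide substitution; it is not the AGAIN implementation. Its conventions are assumptions: the intron is given 5' to 3' on the transcribed strand, positions are 0-based, `acc` is the index of the G of the canonical terminal AG, and a new AG counts only if both of its nucleotides lie inside the window. The function names are hypothetical.

```python
def ag_positions(seq: str, start: int, end: int) -> set[int]:
    """Indices i with seq[i:i+2] == "AG" where both i and i+1 lie in [start, end]."""
    hits = set()
    # Stop early enough that i + 1 never exceeds `end` or the last index of `seq`.
    for i in range(max(start, 0), min(end, len(seq) - 1)):
        if seq[i:i + 2] == "AG":
            hits.add(i)
    return hits


def ag_gains_in_high_risk_region(ref_intron: str, pos: int, alt_base: str,
                                 bp: int, acc: int) -> list[int]:
    """Start indices of AGs present after the substitution but absent before,
    restricted to the assumed high-risk window [bp + 8, acc - 4]."""
    if not 0 <= pos < len(ref_intron):
        raise IndexError("variant position outside the intron")
    alt_intron = ref_intron[:pos] + alt_base + ref_intron[pos + 1:]
    start, end = bp + 8, acc - 4
    gained = ag_positions(alt_intron, start, end) - ag_positions(ref_intron, start, end)
    return sorted(gained)
```

Comparing the AG sets before and after the substitution, rather than inspecting only the mutated base, also catches AGs created when the variant supplies the G rather than the A.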
2.
Am J Hum Genet ; 109(3): 457-470, 2022 03 03.
Article in English | MEDLINE | ID: mdl-35120630

ABSTRACT

We used a machine learning approach to analyze the within-gene distribution of missense variants observed in hereditary conditions and cancer. When applied to 840 genes from the ClinVar database, this approach detected a significant non-random distribution of pathogenic and benign variants in 387 (46%) and 172 (20%) genes, respectively, revealing that variant clustering is widespread across the human exome. This clustering likely occurs as a consequence of mechanisms shaping pathogenicity at the protein level, as illustrated by the overlap of some clusters with known functional domains. We then took advantage of these findings to develop a pathogenicity predictor, MutScore, that integrates qualitative features of DNA substitutions with the new additional information derived from this positional clustering. Using a random forest approach, MutScore was able to identify pathogenic missense mutations with very high accuracy, outperforming existing predictive tools, especially for variants associated with autosomal-dominant disease and cancer. Thus, the within-gene clustering of pathogenic and benign DNA changes is an important and previously underappreciated feature of the human exome, which can be harnessed to improve the prediction of pathogenicity and disambiguation of DNA variants of uncertain significance.


Subjects
Human Genome, Missense Mutation, Cluster Analysis, Exome/genetics, Human Genome/genetics, Humans, Missense Mutation/genetics, Virulence
3.
Proc Natl Acad Sci U S A ; 119(44): e2211194119, 2022 11.
Article in English | MEDLINE | ID: mdl-36306325

ABSTRACT

Pre-messenger RNA splicing is initiated with the recognition of a single-nucleotide intronic branchpoint (BP) within a BP motif by spliceosome elements. Forty-eight rare variants in 43 human genes have been reported to alter splicing and cause disease by disrupting BP. However, until now, no computational approach was available to efficiently detect such variants in massively parallel sequencing data. We established a comprehensive human genome-wide BP database by integrating existing BP data and generating new BP data from RNA sequencing of lariat debranching enzyme DBR1-mutated patients and from machine-learning predictions. We characterized multiple features of BP in major and minor introns and found that BP and BP-2 (two nucleotides upstream of BP) positions exhibit a lower rate of variation in human populations and higher evolutionary conservation than the intronic background, while being comparable to the exonic background. We developed BPHunter as a genome-wide computational approach to systematically and efficiently detect intronic variants that may disrupt BP recognition. BPHunter retrospectively identified 40 of the 48 known pathogenic BP variants, and from these we summarized a strategy for prioritizing BP variant candidates. The remaining eight variants all create AG-dinucleotides between the BP and acceptor site, which is the likely reason for missplicing. We demonstrated the practical utility of BPHunter prospectively by using it to identify a novel germline heterozygous BP variant of STAT2 in a patient with critical COVID-19 pneumonia and a novel somatic intronic 59-nucleotide deletion of ITPKB in a lymphoma patient, both of which were validated experimentally. BPHunter is publicly available from https://hgidsoft.rockefeller.edu/BPHunter and https://github.com/casanova-lab/BPHunter.


Subjects
COVID-19, Humans, Introns/genetics, Retrospective Studies, COVID-19/genetics, RNA Splicing/genetics, Nucleotides
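The BPHunter abstract (entry 3) singles out the BP and BP-2 positions as the most constrained. A minimal sketch, not BPHunter itself, of flagging a substitution that lands on either position, assuming 1-based genomic coordinates, a known gene strand, and a precomputed list of BP positions (for example, from a BP database). On the minus strand, "two nucleotides upstream" corresponds to a higher genomic coordinate. The function name is hypothetical.

```python
from typing import Iterable


def bp_hits(var_pos: int, bp_positions: Iterable[int], strand: str) -> list[tuple[int, str]]:
    """Return (bp, label) pairs where the variant sits on a BP or its BP-2 position."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Genomic offset of BP-2 relative to BP: upstream is lower on '+', higher on '-'.
    offset = -2 if strand == "+" else 2
    hits = []
    for bp in bp_positions:
        if var_pos == bp:
            hits.append((bp, "BP"))
        elif var_pos == bp + offset:
            hits.append((bp, "BP-2"))
    return hits
```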
4.
Am J Hum Genet ; 108(12): 2301-2318, 2021 12 02.
Article in English | MEDLINE | ID: mdl-34762822

ABSTRACT

Identifying whether a given genetic mutation results in a gene product with increased (gain-of-function; GOF) or diminished (loss-of-function; LOF) activity is an important step toward understanding disease mechanisms because they may result in markedly different clinical phenotypes. Here, we generated an extensive database of documented germline GOF and LOF pathogenic variants by employing natural language processing (NLP) on the available abstracts in the Human Gene Mutation Database. We then investigated various gene- and protein-level features of GOF and LOF variants and applied machine learning and statistical analyses to identify discriminative features. We found that GOF variants were enriched in essential genes, for autosomal-dominant inheritance, and in protein binding and interaction domains, whereas LOF variants were enriched in singleton genes, for protein-truncating variants, and in protein core regions. We developed a user-friendly web-based interface that enables the extraction of selected subsets from the GOF/LOF database by a broad set of annotated features and downloading of up-to-date versions. These results improve our understanding of how variants affect gene/protein function and may ultimately guide future treatment options.


Subjects
Genetic Databases, Gain of Function Mutation, Loss of Function Mutation, Proteins/genetics, Cloud Computing, Genetic Predisposition to Disease, Human Genome, Germline Mutation, Humans, Internet-Based Intervention, Machine Learning
5.
Proc Natl Acad Sci U S A ; 118(36), 2021 09 07.
Article in English | MEDLINE | ID: mdl-34426522

ABSTRACT

The construction of population-based variomes has contributed substantially to our understanding of the genetic basis of human inherited disease. Here, we investigated the genetic structure of Turkey from 3,362 unrelated subjects whose whole exomes (n = 2,589) or whole genomes (n = 773) were sequenced to generate a Turkish (TR) Variome that should serve to facilitate disease gene discovery in Turkey. Consistent with the history of present-day Turkey as a crossroads between Europe and Asia, we found extensive admixture between Balkan, Caucasus, Middle Eastern, and European populations with a closer genetic relationship of the TR population to Europeans than hitherto appreciated. We determined that 50% of TR individuals had high inbreeding coefficients (≥0.0156) with runs of homozygosity longer than 4 Mb being found exclusively in the TR population when compared to 1000 Genomes Project populations. We also found that 28% of exome and 49% of genome variants in the very rare range (allele frequency < 0.005) are unique to the modern TR population. We annotated these variants based on their functional consequences to establish a TR Variome containing alleles of potential medical relevance, a repository of homozygous loss-of-function variants and a TR reference panel for genotype imputation using high-quality haplotypes, to facilitate genome-wide association studies. In addition to providing information on the genetic structure of the modern TR population, these data provide an invaluable resource for future studies to identify variants that are associated with specific phenotypes as well as establishing the phenotypic consequences of mutations in specific genes.


Subjects
Genetic Variation/genetics, Human Genome/genetics, Alleles, Consanguinity, Exome, Gene Frequency/genetics, Genetic Drift, Population Genetics/methods, Genome-Wide Association Study/methods, Genotype, Haplotypes/genetics, Human Migration/trends, Humans, Turkey/ethnology, Exome Sequencing/methods
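The TR Variome abstract (entry 5) uses an inbreeding-coefficient cutoff of 0.0156, which is approximately 1/64, the expected inbreeding coefficient for the offspring of second cousins, and also reports runs of homozygosity (ROH) longer than 4 Mb. The abstract does not say how F was estimated. The sketch below assumes one common genomic estimator, F_ROH (the fraction of the autosomal genome covered by ROH above a minimum length); the 1 Mb minimum and the caller-supplied autosomal length are illustrative assumptions, not values from the paper.

```python
# Expected inbreeding coefficient for the offspring of second cousins.
SECOND_COUSIN_F = 1 / 64  # 0.015625, the ~0.0156 threshold quoted in the abstract


def f_roh(roh_lengths_bp, autosome_length_bp, min_roh_bp=1_000_000):
    """Fraction of the autosomal genome covered by ROH of at least `min_roh_bp`.

    Assumes the ROH segments do not overlap; the 1 Mb default is illustrative.
    """
    covered = sum(length for length in roh_lengths_bp if length >= min_roh_bp)
    return covered / autosome_length_bp


def has_roh_longer_than(roh_lengths_bp, threshold_bp=4_000_000):
    """True if any ROH exceeds the 4 Mb length mentioned in the abstract."""
    return any(length > threshold_bp for length in roh_lengths_bp)
```

An individual would then be flagged as highly inbred under this assumed estimator when `f_roh(...) >= SECOND_COUSIN_F`.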
6.
Hum Genet ; 142(2): 275-288, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36352240

ABSTRACT

Epilepsy (EP) and congenital heart disease (CHD) are two apparently unrelated diseases that nevertheless display substantial mutual comorbidity. For example, congenital heart defects are associated with an elevated risk of developing epilepsy, and the incidence of epilepsy in CHD patients correlates with CHD severity. Although genetic determinants have been postulated to underlie the comorbidity of EP and CHD, the precise genetic etiology is unknown. We performed variant and gene association analyses on EP and CHD patients separately, using whole exomes of genetically identified Europeans from the UK Biobank and Mount Sinai BioMe Biobank. We prioritized biologically plausible candidate genes and investigated the enriched pathways and other identified comorbidities by biological proximity calculation, pathway analyses, and gene-level phenome-wide association studies. Our variant- and gene-level results point to the Voltage-Gated Calcium Channels (VGCC) pathway as being a unifying framework for EP and CHD comorbidity. Additionally, pathway-level analyses indicated that the functions of disease-associated genes partially overlap between the two disease entities. Finally, phenome-wide association analyses of prioritized candidate genes revealed that cerebral blood flow and ulcerative colitis constitute the two main traits associated with both EP and CHD.


Subjects
Epilepsy, Congenital Heart Defects, Humans, European People, Congenital Heart Defects/genetics, Epilepsy/epidemiology, Epilepsy/genetics, Genetic Association Studies, Phenotype
7.
Hum Genet ; 142(2): 245-274, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36344696

ABSTRACT

Whilst DNA repeat expansions cause numerous heritable human disorders, their origins and underlying pathological mechanisms are often unclear. We collated a dataset comprising 224 human repeat expansions encompassing 203 different genes, and performed a systematic analysis with respect to key topological features at the DNA, RNA and protein levels. Comparison with controls without known pathogenicity and with genomic regions lacking repeats allowed the construction of the first tool to discriminate repeat regions harboring pathogenic repeat expansions (DPREx). At the DNA level, pathogenic repeat expansions exhibited stronger signals for DNA regulatory factors (e.g. H3K4me3, transcription factor-binding sites) in exons, promoters, 5'UTRs and 5' genes but were not significantly different from controls in introns, 3'UTRs and 3' genes. Additionally, pathogenic repeat expansions were also found to be enriched in non-B DNA structures. At the RNA level, pathogenic repeat expansions were characterized by lower free energy for forming RNA secondary structure and were closer to splice sites in introns, exons, promoters and 5' genes than controls. At the protein level, pathogenic repeat expansions exhibited a preference to form coil rather than other types of secondary structure, and tended to encode surface-located protein domains. Guided by these features, DPREx (http://biomed.nscc-gz.cn/zhaolab/geneprediction/#) achieved an Area Under the Curve (AUC) value of 0.88 in a test on an independent dataset. Pathogenic repeat expansions are thus located such that they exert a synergistic influence on the gene expression pathway involving inter-molecular connections at the DNA, RNA and protein levels.


Subjects
DNA Repeat Expansion, DNA, Humans, Introns/genetics, RNA, Trinucleotide Repeat Expansion
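DPREx (entry 7) is evaluated by the area under the ROC curve. As a reminder of what its AUC of 0.88 measures: the AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The quadratic-time sketch below computes it directly from that definition; it is a generic illustration, not the DPREx evaluation code, and the function name is hypothetical.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

For large test sets, a rank-based (Mann-Whitney U) computation gives the same value in O(n log n) time.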
8.
Proc Natl Acad Sci U S A ; 117(24): 13626-13636, 2020 06 16.
Article in English | MEDLINE | ID: mdl-32487729

ABSTRACT

Humans homozygous or hemizygous for variants predicted to cause a loss of function (LoF) of the corresponding protein do not necessarily present with overt clinical phenotypes. We report here 190 autosomal genes with 207 predicted LoF variants, for which the frequency of homozygous individuals exceeds 1% in at least one human population from five major ancestry groups. No such genes were identified on the X and Y chromosomes. Manual curation revealed that 28 variants (15%) had been misannotated as LoF. Of the 179 remaining variants in 166 genes, only 11 alleles in 11 genes had previously been confirmed experimentally to be LoF. The set of 166 dispensable genes was enriched in olfactory receptor genes (41 genes). The 41 dispensable olfactory receptor genes displayed a relaxation of selective constraints similar to that observed for other olfactory receptor genes. The 125 dispensable nonolfactory receptor genes also displayed a relaxation of selective constraints consistent with greater redundancy. Sixty-two of these 125 genes were found to be dispensable in at least three human populations, suggesting possible evolution toward pseudogenes. Of the 179 LoF variants, 68 could be tested for two neutrality statistics, and 8 displayed robust signals of positive selection. These latter variants included a known FUT2 variant that confers resistance to intestinal viruses, and an APOL3 variant involved in resistance to parasitic infections. Overall, the identification of 166 genes for which a sizeable proportion of humans are homozygous for predicted LoF alleles reveals both redundancies and advantages of such deficiencies for human survival.


Subjects
Human Genetics, Loss of Function Mutation, Alleles, Apolipoproteins L/genetics, Fucosyltransferases/genetics, Genetic Variation, Homozygote, Humans, Proteins/genetics, Sex Chromosomes/genetics, Galactoside 2-alpha-L-fucosyltransferase
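The selection criterion in entry 8 (homozygote frequency above 1% in at least one population) is easy to state as code. A sketch assuming that per-population counts of homozygous and genotyped individuals are available, for example from a population database export; the data layout, population labels, and function names are hypothetical.

```python
from typing import Mapping


def populations_above_hom_threshold(hom_counts: Mapping[str, tuple[int, int]],
                                    threshold: float = 0.01) -> list[str]:
    """Populations in which (homozygous individuals / genotyped individuals) > threshold.

    `hom_counts` maps a population label to (n_homozygous, n_genotyped).
    """
    return sorted(
        pop
        for pop, (n_hom, n_total) in hom_counts.items()
        if n_total > 0 and n_hom / n_total > threshold
    )


def passes_hom_frequency_filter(hom_counts: Mapping[str, tuple[int, int]],
                                threshold: float = 0.01) -> bool:
    """True if the homozygote frequency exceeds the threshold in at least one population."""
    return bool(populations_above_hom_threshold(hom_counts, threshold))
```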
9.
Hum Mutat ; 43(3): 328-346, 2022 03.
Article in English | MEDLINE | ID: mdl-34918412

ABSTRACT

Microdeletions and gross deletions are important causes (~20%) of human inherited disease and their genomic locations are strongly influenced by the local DNA sequence environment. This notwithstanding, no study has systematically examined their underlying generative mechanisms. Here, we obtained 42,098 pathogenic microdeletions and gross deletions from the Human Gene Mutation Database (HGMD) that together form a continuum of germline deletions ranging in size from 1 to 28,394,429 bp. We analyzed the DNA sequence within 1 kb of the breakpoint junctions and found that the frequencies of non-B DNA-forming repeats, GC-content, and the presence of seven of 78 specific sequence motifs in the vicinity of pathogenic deletions correlated with deletion length for deletions of length ≤30 bp. Further, we found that the presence of direct repeats (DR), G-quadruplex-forming repeats (GQ), and short tandem repeats (STR) is important for the formation of longer deletions (>30 bp) but not for the formation of shorter deletions (≤30 bp), while significantly more microhomologies (χ² test, p < 2 × 10⁻¹⁶) were identified flanking short deletions than long deletions (length >30 bp). We provide evidence to support a functional distinction between microdeletions and gross deletions. Finally, we propose that a deletion length cut-off of 25-30 bp may serve as an objective means to functionally distinguish microdeletions from gross deletions.


Subjects
DNA, Human Genome, Base Composition, Base Sequence, DNA/genetics, Human Genome/genetics, Humans, Mutation, Sequence Deletion
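Entry 9 contrasts breakpoint microhomology in short versus long deletions, split at 30 bp. A minimal sketch of one common way to measure microhomology, assuming a left-normalized deletion given as a 0-based start index and a length on a reference string: count how many leading bases of the deleted segment are repeated immediately downstream of it (capped at the deletion length). This definition is an illustrative assumption and not necessarily the one used in the study; the function names are hypothetical.

```python
def microhomology_length(ref: str, start: int, length: int) -> int:
    """Number of leading deleted bases that are repeated right after the deletion."""
    deleted = ref[start:start + length]
    downstream = ref[start + length:start + 2 * length]
    n = 0
    for a, b in zip(deleted, downstream):
        if a != b:
            break
        n += 1
    return n


def deletion_class(length: int, cutoff: int = 30) -> str:
    """'short' for deletions of at most `cutoff` bp, 'long' otherwise (30 bp as in the abstract)."""
    return "short" if length <= cutoff else "long"
```

In the test sequence GGT[CTGA]CTTTT, the deleted CTGA shares the 2-bp prefix CT with the sequence that follows it, so the breakpoint is ambiguous over those two bases.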
10.
Nature ; 536(7616): 285-91, 2016 08 18.
Article in English | MEDLINE | ID: mdl-27535533

ABSTRACT

Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation, identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, 72% of which have no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.


Subjects
Exome/genetics, Genetic Variation/genetics, DNA Mutational Analysis, Datasets as Topic, Humans, Phenotype, Proteome/genetics, Rare Diseases/genetics, Sample Size
11.
Proc Natl Acad Sci U S A ; 116(3): 950-959, 2019 01 15.
Article in English | MEDLINE | ID: mdl-30591557

ABSTRACT

Computational analyses of human patient exomes aim to filter out as many nonpathogenic genetic variants (NPVs) as possible, without removing the true disease-causing mutations. This involves comparing the patient's exome with public databases to remove reported variants inconsistent with disease prevalence, mode of inheritance, or clinical penetrance. However, variants frequent in a given exome cohort, but absent or rare in public databases, have also been reported and treated as NPVs, without rigorous exploration. We report the generation of a blacklist of variants frequent within an in-house cohort of 3,104 exomes. This blacklist did not remove known pathogenic mutations from the exomes of 129 patients and decreased the number of NPVs remaining in the 3,104 individual exomes by a median of 62%. We validated this approach by testing three other independent cohorts of 400, 902, and 3,869 exomes. The blacklist generated from any given cohort removed a substantial proportion of NPVs (11-65%). We analyzed the blacklisted variants computationally and experimentally. Most of the blacklisted variants corresponded to false signals generated by incomplete reference genome assembly, location in low-complexity regions, bioinformatic misprocessing, or limitations inherent to cohort-specific private alleles (e.g., due to sequencing kits and genetic ancestries). Finally, we provide our precalculated blacklists, together with ReFiNE, a program for generating customized blacklists from any medium-sized or large in-house cohort of exome (or other next-generation sequencing) data via a user-friendly public web server. This work demonstrates the power of extracting variant blacklists from private databases as a specific in-house but broadly applicable tool for optimizing exome analysis.


Subjects
Nucleic Acid Databases, Exome, Genetic Variation, Human Genome, DNA Sequence Analysis, Software, Cohort Studies, Female, Humans, Male
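The core idea in entry 11 is to drop variants that are too common within an in-house cohort to plausibly cause a rare disease. A minimal sketch of that idea, not ReFiNE's algorithm: each exome is a set of (chrom, pos, ref, alt) tuples, and a variant is blacklisted when the fraction of cohort exomes carrying it exceeds a threshold. The variant representation, the threshold (which in practice would depend on prevalence, mode of inheritance, and penetrance), and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable

Variant = tuple[str, int, str, str]  # (chrom, pos, ref, alt); hypothetical representation


def build_blacklist(cohort: Iterable[set[Variant]], max_carrier_fraction: float) -> set[Variant]:
    """Variants carried by more than `max_carrier_fraction` of the cohort's exomes."""
    exomes = list(cohort)
    if not exomes:
        return set()
    # Each exome is a set, so a variant is counted at most once per individual.
    carriers = Counter(v for exome in exomes for v in exome)
    n = len(exomes)
    return {v for v, count in carriers.items() if count / n > max_carrier_fraction}


def apply_blacklist(exome: set[Variant], blacklist: set[Variant]) -> set[Variant]:
    """Remove blacklisted variants from one patient's exome."""
    return exome - blacklist
```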
12.
Hum Genet ; 140(9): 1329-1342, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34173867

ABSTRACT

A non-negligible proportion of human pathogenic variants are known to be present as wild type in at least some non-human mammalian species. The standard explanation for this finding is that molecular mechanisms of compensatory epistasis can alleviate the mutations' otherwise pathogenic effects. Examples of compensated variants have been described in the literature but the interacting residue(s) postulated to play a compensatory role have rarely been ascertained. In this study, the examination of five human X-chromosomally encoded proteins (FIX, GLA, HPRT1, NDP and OTC) allowed us to identify several candidate compensated variants. Strong evidence for a compensated/compensatory pair of amino acids in the coagulation FIXa protein (involving residues 270 and 271) was found in a variety of mammalian species. Both amino acid residues are located within the 60-loop, spatially close to the 39-loop that performs a key role in coagulation serine proteases. To understand the nature of the underlying interactions, molecular dynamics simulations were performed. The predicted conformational change in the 39-loop consequent to the Glu270Lys substitution (associated with hemophilia B) appears to impair the protein's interaction with its substrate but, importantly, such steric hindrance is largely mitigated in those proteins that carry the compensatory residue (Pro271) at the neighboring amino acid position.


Subjects
Human X Chromosome/genetics, Genetic Epistasis, Factor IXa, Molecular Dynamics Simulation, Missense Mutation, Amino Acid Substitution, Factor IXa/chemistry, Factor IXa/genetics, Humans
13.
Nucleic Acids Res ; 47(W1): W623-W631, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31045209

ABSTRACT

Human whole-genome sequencing reveals about 4,000,000 genomic variants per individual. These data are mostly stored as VCF-format files. Although many variant analysis methods accept VCF as input, many other tools require DNA or protein sequences, particularly for splicing prediction, sequence alignment, phylogenetic analysis, and structure prediction. However, there is no existing webserver capable of extracting DNA/protein sequences for genomic variants from VCF files in a user-friendly and efficient manner. We developed the SeqTailor webserver to bridge this gap, by enabling rapid extraction of (i) DNA sequences around genomic variants, with customizable window sizes and options to annotate the splice sites closest to the variants and to consider the neighboring variants within the window; and (ii) protein sequences encoded by the DNA sequences around genomic variants, with built-in SnpEff annotator and customizable window sizes. SeqTailor supports 11 species, including: human (GRCh37/GRCh38), chimpanzee, mouse, rat, cow, chicken, lizard, zebrafish, fruitfly, Arabidopsis and rice. Standalone programs are provided for command-line-based needs. SeqTailor streamlines the sequence extraction process, and accelerates the analysis of genomic variants with software requiring DNA/protein sequences. It will facilitate the study of genomic variation, by increasing the feasibility of sequence-based analysis and prediction. The SeqTailor webserver is freely available at http://shiva.rockefeller.edu/SeqTailor/.


Subjects
High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Protein Sequence Analysis/methods, Software, Animals, Cattle, Genetic Variation, Humans, INDEL Mutation, Internet, Mice, Rats
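SeqTailor's DNA mode (entry 13) extracts a customizable window around each variant. A minimal sketch of that idea for a single variant, not SeqTailor's code: it assumes a VCF-style 1-based POS relative to the supplied reference string, VCF-style REF/ALT alleles, and a symmetric flank size. It ignores neighboring variants, splice-site annotation, and strand. The function name is hypothetical.

```python
def variant_window(ref_seq: str, pos: int, ref: str, alt: str, flank: int) -> tuple[str, str]:
    """Return (reference window, alternate window) around one variant.

    `pos` is the 1-based position of the first REF base within `ref_seq`;
    up to `flank` bases are taken on each side, truncated at sequence ends.
    """
    if pos < 1 or flank < 0:
        raise ValueError("pos must be >= 1 and flank must be >= 0")
    i = pos - 1  # convert to a 0-based index
    if ref_seq[i:i + len(ref)] != ref:
        raise ValueError("REF allele does not match the reference sequence")
    left = ref_seq[max(0, i - flank):i]
    right = ref_seq[i + len(ref):i + len(ref) + flank]
    return left + ref + right, left + alt + right
```

Checking REF against the reference before building the window catches coordinate or genome-build mismatches early, which is a frequent source of silently wrong extracted sequences.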
14.
Hum Genet ; 139(10): 1197-1207, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32596782

ABSTRACT

The Human Gene Mutation Database (HGMD®) constitutes a comprehensive collection of published germline mutations in nuclear genes that are thought to underlie, or are closely associated with, human inherited disease. At the time of writing (June 2020), the database contains in excess of 289,000 different gene lesions identified in over 11,100 genes manually curated from 72,987 articles published in over 3100 peer-reviewed journals. There are two main groups of users who utilise HGMD on a regular basis: research scientists and clinical diagnosticians. This review aims to highlight how to make the most out of HGMD data in each setting.


Subjects
Genetic Databases, Human Genome, Germline Mutation, Genetic Polymorphism, Bibliometrics, Biomedical Research/methods, Genetic Predisposition to Disease, Humans, Public-Private Sector Partnerships
15.
Genet Med ; 22(2): 362-370, 2020 02.
Article in English | MEDLINE | ID: mdl-31467448

ABSTRACT

PURPOSE: Both monogenic pathogenic variant cataloging and clinical patient diagnosis start with variant-level evidence retrieval followed by expert evidence integration in search of diagnostic variants and genes. Here, we try to accelerate pathogenic variant evidence retrieval by an automatic approach. METHODS: Automatic VAriant evidence DAtabase (AVADA) is a novel machine learning tool that uses natural language processing to automatically identify pathogenic genetic variant evidence in full-text primary literature about monogenic disease and convert it to genomic coordinates. RESULTS: AVADA automatically retrieved almost 60% of likely disease-causing variants deposited in the Human Gene Mutation Database (HGMD), a 4.4-fold improvement over the current best open source automated variant extractor. AVADA contains over 60,000 likely disease-causing variants that are in HGMD but not in ClinVar. AVADA also highlights the challenges of automated variant mapping and pathogenicity curation. However, when combined with manual validation, on 245 diagnosed patients, AVADA provides valuable evidence for an additional 18 diagnostic variants, on top of ClinVar's 21, versus only 2 using the best current automated approach. CONCLUSION: AVADA advances automated retrieval of pathogenic monogenic variant evidence from full-text literature. Although far from perfect, AVADA is much faster than PubMed/Google Scholar search, and careful curation of AVADA-retrieved evidence can aid both database curation and patient diagnosis.


Subjects
Electronic Data Processing/methods, Genomics/methods, Information Storage and Retrieval/methods, Data Management/methods, Factual Databases, Genetic Databases, Humans, Natural Language Processing, PubMed, Publications
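AVADA (entry 15) uses machine learning to find variant mentions and then maps them to genomic coordinates. For contrast, the naive pattern-matching baseline below only recognizes simple HGVS-style coding DNA substitutions such as c.1234A>G, including intronic offsets such as c.85+1G>A, in free text. It does not handle deletions, insertions, protein-level or legacy nomenclature, and it performs no coordinate mapping. The pattern and function name are illustrative assumptions, not part of AVADA.

```python
import re

# Simple HGVS-style coding substitutions: c.<position>[+/-offset]<ref>><alt>
CDNA_SUB = re.compile(r"\bc\.(\d+(?:[+-]\d+)?)([ACGT])>([ACGT])\b")


def extract_cdna_substitutions(text: str) -> list[tuple[str, str, str]]:
    """Return (position, ref base, alt base) for each simple c. substitution found."""
    return [m.groups() for m in CDNA_SUB.finditer(text)]
```

A baseline like this finds only well-formed mentions, which is one reason literature-mining tools combine learned models with curated normalization.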
16.
Am J Hum Genet ; 96(6): 913-25, 2015 Jun 04.
Article in English | MEDLINE | ID: mdl-26046366

ABSTRACT

Next-generation sequencing provides the opportunity to practice predictive medicine based on identified variants. Putative loss-of-function (pLOF) variants are common in genomes and understanding their contribution to disease is critical for predictive medicine. To this end, we characterized the consequences of pLOF variants in an exome cohort by iterative phenotyping. Exome data were generated on 951 participants from the ClinSeq cohort and filtered for pLOF variants in genes likely to cause a phenotype in heterozygotes. 103 of 951 exomes had such a pLOF variant and 79 participants were evaluated. Of those 79, 34 had findings or family histories that could be attributed to the variant (28 variants in 18 genes), 2 had indeterminate findings (2 variants in 2 genes), and 43 had no findings or a negative family history for the trait (34 variants in 28 genes). The presence of a phenotype was correlated with two mutation attributes: prior report of pathogenicity for the variant (p = 0.0001) and prior report of other mutations in the same exon (p = 0.0001). We conclude that 1/30 unselected individuals harbor a pLOF mutation associated with a phenotype either in themselves or their family. This is more common than has been assumed and has implications for the setting of prior probabilities of affection status for predictive medicine.


Subjects
Atherosclerosis/genetics, Genome-Wide Association Study/methods, High-Throughput Nucleotide Sequencing/methods, Mutation/genetics, Phenotype, Precision Medicine/methods, Computational Biology, Exome/genetics, Female, Genome-Wide Association Study/trends, Humans, Male, Middle Aged
17.
Nature ; 483(7388): 169-75, 2012 Mar 07.
Article in English | MEDLINE | ID: mdl-22398555

ABSTRACT

Gorillas are humans' closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human-chimpanzee and human-chimpanzee-gorilla speciation events at approximately 6 and 10 million years ago. In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time of 1.75 million years, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution.


Subjects
Molecular Evolution, Genetic Speciation, Genome/genetics, Gorilla gorilla/genetics, Animals, Female, Gene Expression Regulation, Genetic Variation/genetics, Genomics, Humans, Macaca mulatta/genetics, Molecular Sequence Data, Pan troglodytes/genetics, Phylogeny, Pongo/genetics, Proteins/genetics, Sequence Alignment, Species Specificity, Genetic Transcription
18.
Proc Natl Acad Sci U S A ; 112(44): 13615-20, 2015 Nov 03.
Article in English | MEDLINE | ID: mdl-26483451

ABSTRACT

The protein-coding exome of a patient with a monogenic disease contains about 20,000 variants, only one or two of which are disease causing. We found that 58% of rare variants in the protein-coding exome of the general population are located in only 2% of the genes. Prompted by this observation, we aimed to develop a gene-level approach for predicting whether a given human protein-coding gene is likely to harbor disease-causing mutations. To this end, we derived the gene damage index (GDI): a genome-wide, gene-level metric of the mutational damage that has accumulated in the general population. We found that the GDI was correlated with selective evolutionary pressure, protein complexity, coding sequence length, and the number of paralogs. We compared GDI with the leading gene-level approaches (genic intolerance and de novo excess) and demonstrated that GDI performed best for the detection of false positives (i.e., removing exome variants in genes irrelevant to disease), whereas genic intolerance and de novo excess performed better for the detection of true positives (i.e., assessing de novo mutations in genes likely to be disease causing). The GDI server, data, and software are freely available to noncommercial users from lab.rockefeller.edu/casanova/GDI.


Subjects
Exome, Inborn Genetic Diseases/genetics, Humans, ROC Curve
19.
Hum Mol Genet ; 24(21): 5995-6002, 2015 Nov 01.
Article in English | MEDLINE | ID: mdl-26246501

ABSTRACT

The role of rare missense variants in disease causation remains difficult to interpret. We explore whether the clustering pattern of rare missense variants (MAF < 0.01) in a protein is associated with mode of inheritance. Mutations in genes associated with autosomal dominant (AD) conditions are known to result in either loss or gain of function, whereas mutations in genes associated with autosomal recessive (AR) conditions invariably result in loss-of-function. Loss-of-function mutations tend to be distributed uniformly along protein sequence, whereas gain-of-function mutations tend to localize to key regions. It has not previously been ascertained whether these patterns hold in general for rare missense mutations. We consider the extent to which rare missense variants are located within annotated protein domains and whether they form clusters, using a new unbiased method called CLUstering by Mutation Position (CLUMP). These approaches quantified a significant difference in clustering between AD and AR diseases. Proteins linked to AD diseases exhibited more clustering of rare missense mutations than those linked to AR diseases (Wilcoxon P = 5.7 × 10⁻⁴, permutation P = 8.4 × 10⁻⁴). Rare missense mutations in proteins linked to either AD or AR diseases were more clustered than controls (1000G) (Wilcoxon P = 2.8 × 10⁻¹⁵ for AD and P = 4.5 × 10⁻⁴ for AR, permutation P = 3.1 × 10⁻¹² for AD and P = 0.03 for AR). The differences in clustering patterns persisted even after removal of the most prominent genes. Testing for such non-random patterns may reveal novel aspects of disease etiology in large sample studies.


Subjects
Dominant Genes, Recessive Genes, Inborn Genetic Diseases/genetics, Missense Mutation, Proteins/genetics, Computational Biology, Genetic Databases, Human Genome, Humans, Molecular Sequence Annotation, Multigene Family
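Entry 19 reports permutation P values for positional clustering of missense variants. The CLUMP score itself is not reproduced here; as a simpler, openly swapped-in statistic, the sketch below tests whether variant positions along a protein are closer together than expected under uniform random placement, using the mean nearest-neighbor distance and a one-sided permutation P value. The uniform null over residues 1..L, the fixed seed, and the function names are assumptions.

```python
import random
from typing import Sequence


def mean_nn_distance(positions: Sequence[int]) -> float:
    """Mean distance from each position to its nearest other position."""
    s = sorted(positions)
    total = 0
    for i, p in enumerate(s):
        gaps = []
        if i > 0:
            gaps.append(p - s[i - 1])
        if i < len(s) - 1:
            gaps.append(s[i + 1] - p)
        total += min(gaps)
    return total / len(s)


def clustering_p_value(positions: Sequence[int], protein_length: int,
                       n_perm: int = 10_000, seed: int = 0) -> float:
    """One-sided permutation P: how often uniform random placements are at least as clustered."""
    if len(positions) < 2:
        raise ValueError("need at least two variant positions")
    rng = random.Random(seed)
    observed = mean_nn_distance(positions)
    k = len(positions)
    as_extreme = 0
    for _ in range(n_perm):
        simulated = [rng.randint(1, protein_length) for _ in range(k)]
        if mean_nn_distance(simulated) <= observed:
            as_extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (as_extreme + 1) / (n_perm + 1)
```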
20.
Hum Genet ; 136(6): 665-677, 2017 06.
Article in English | MEDLINE | ID: mdl-28349240

ABSTRACT

The Human Gene Mutation Database (HGMD®) constitutes a comprehensive collection of published germline mutations in nuclear genes that underlie, or are closely associated with, human inherited disease. At the time of writing (March 2017), the database contained in excess of 203,000 different gene lesions identified in over 8000 genes manually curated from over 2600 journals. With new mutation entries currently accumulating at a rate exceeding 17,000 per annum, HGMD represents de facto the central unified gene/disease-oriented repository of heritable mutations causing human genetic disease used worldwide by researchers, clinicians, diagnostic laboratories and genetic counsellors, and is an essential tool for the annotation of next-generation sequencing data. The public version of HGMD (http://www.hgmd.org) is freely available to registered users from academic institutions and non-profit organisations whilst the subscription version (HGMD Professional) is available to academic, clinical and commercial users under license via QIAGEN Inc.


Subjects
Genetic Databases, Mutation, Humans, Molecular Diagnostic Techniques