Results 1 - 20 of 44
1.
Nat Biotechnol ; 2019 Aug 12.
Article in English | MEDLINE | ID: mdl-31406327

ABSTRACT

The DNA sequencing technologies in use today produce either highly accurate short reads or less-accurate long reads. We report the optimization of circular consensus sequencing (CCS) to improve the accuracy of single-molecule real-time (SMRT) sequencing (PacBio) and generate highly accurate (99.8%) long high-fidelity (HiFi) reads with an average length of 13.5 kilobases (kb). We applied our approach to sequence the well-characterized human HG002/NA24385 genome and obtained precision and recall rates of at least 99.91% for single-nucleotide variants (SNVs), 95.98% for insertions and deletions <50 bp (indels) and 95.99% for structural variants. Our CCS method matches or exceeds the ability of short-read sequencing to detect small variants and structural variants. We estimate that 2,434 discordances are correctable mistakes in the 'genome in a bottle' (GIAB) benchmark set. Nearly all (99.64%) variants can be phased into haplotypes, further improving variant detection. De novo genome assembly using CCS reads alone produced a contiguous and accurate genome with a contig N50 of >15 megabases (Mb) and concordance of 99.997%, substantially outperforming assembly with less-accurate long reads.
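
The contig N50 quoted in this abstract is a contiguity statistic: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal Python sketch of that calculation follows; the function name `contig_n50` and the toy contig lengths are illustrative assumptions, not data from the study.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (in bases)."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    covered = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length


# Hypothetical toy assembly (lengths in bases), not taken from the paper.
toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(contig_n50(toy_contigs))
```

Comparing `2 * covered` with `total` keeps the arithmetic in integers, so assemblies with an odd total length need no rounding.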

2.
Pac Symp Biocomput ; 24: 224-235, 2019.
Article in English | MEDLINE | ID: mdl-30864325

ABSTRACT

Copy number variants (CNVs) are an important type of genetic variation that play a causal role in many diseases. The ability to identify high quality CNVs is of substantial clinical relevance. However, CNVs are notoriously difficult to identify accurately from array-based methods and next-generation sequencing (NGS) data, particularly for small (< 10kbp) CNVs. Manual curation by experts widely remains the gold standard but cannot scale with the pace of sequencing, particularly in fast-growing clinical applications. We present the first proof-of-principle study demonstrating high throughput manual curation of putative CNVs by non-experts. We developed a crowdsourcing framework, called CrowdVariant, that leverages Google's high-throughput crowdsourcing platform to create a high confidence set of deletions for NA24385 (NIST HG002/RM 8391), an Ashkenazim reference sample developed in partnership with the Genome In A Bottle (GIAB) Consortium. We show that non-experts tend to agree both with each other and with experts on putative CNVs. We show that crowdsourced non-expert classifications can be used to accurately assign copy number status to putative CNV calls and identify 1,781 high confidence deletions in a reference sample. Multiple lines of evidence suggest these calls are a substantial improvement over existing CNV callsets and can also be useful in benchmarking and improving CNV calling algorithms. Our crowdsourcing methodology takes the first step toward showing the clinical potential for manual curation of CNVs at scale and can further guide other crowdsourcing genomics applications.


Subjects
Crowdsourcing/methods , DNA Copy Number Variations , Algorithms , Computational Biology/methods , Data Curation , Genome, Human , Genomics/methods , Genomics/statistics & numerical data , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Sequence Analysis, DNA/statistics & numerical data
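
One simple way to turn several non-expert classifications of the same putative CNV into a single call is a majority vote with an agreement threshold. The sketch below illustrates that idea only; it is not the aggregation method used by CrowdVariant, and the label names and the 0.7 threshold are hypothetical.

```python
from collections import Counter


def consensus_label(votes, min_agreement=0.7):
    """Return the majority label if enough voters agree, otherwise None."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # Keep the call only when the winning label reaches the agreement threshold.
    return label if count / len(votes) >= min_agreement else None


# Hypothetical votes for one candidate deletion.
votes = ["het_del"] * 8 + ["no_del"] * 2
print(consensus_label(votes))
```

Sites that return `None` could be routed to expert review rather than dropped.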
3.
Bioinformatics ; 2019 Mar 27.
Article in English | MEDLINE | ID: mdl-30916319

ABSTRACT

SUMMARY: Reference genomes are refined to reflect error corrections and other improvements. While this process improves novel data generation and analysis, incorporating data analyzed on an older reference genome assembly requires transforming the coordinates and representations of the data to the new assembly. Multiple tools exist to perform this transformation for coordinate-only data types, but none supports accurate transformation of genome-wide short variation. Here we present GenomeWarp, a tool for efficiently transforming variants between genome assemblies. GenomeWarp transforms regions and short variants in a conservative manner to minimize false positive and negative variants in the target genome, and converts over 99% of regions and short variants from a representative human genome. AVAILABILITY AND IMPLEMENTATION: GenomeWarp is written in Java. All source code and the user manual are freely available at https://github.com/verilylifesciences/genomewarp. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.
Nat Med ; 25(1): 24-29, 2019 Jan.
Article in English | MEDLINE | ID: mdl-30617335

ABSTRACT

Here we present deep-learning techniques for healthcare, centering our discussion on deep learning in computer vision, natural language processing, reinforcement learning, and generalized methods. We describe how these computational techniques can impact a few key areas of medicine and explore how to build end-to-end systems. Our discussion of computer vision focuses largely on medical imaging, and we describe the application of natural language processing to domains such as electronic health record data. Similarly, reinforcement learning is discussed in the context of robotic-assisted surgery, and generalized deep-learning methods for genomics are reviewed.


Subjects
Deep Learning , Delivery of Health Care , Diagnostic Imaging , Electronic Health Records , Humans , Natural Language Processing
5.
Nat Biotechnol ; 36(10): 983-987, 2018 Nov.
Article in English | MEDLINE | ID: mdl-30247488

ABSTRACT

Despite rapid advances in sequencing technologies, accurately calling genetic variants present in an individual genome from billions of short, errorful sequence reads remains challenging. Here we show that a deep convolutional neural network can call genetic variation in aligned next-generation sequencing read data by learning statistical relationships between images of read pileups around putative variant and true genotype calls. The approach, called DeepVariant, outperforms existing state-of-the-art tools. The learned model generalizes across genome builds and mammalian species, allowing nonhuman sequencing projects to benefit from the wealth of human ground-truth data. We further show that DeepVariant can learn to call variants in a variety of sequencing technologies and experimental designs, including deep whole genomes from 10X Genomics and Ion Ampliseq exomes, highlighting the benefits of using more automated and generalizable techniques for variant calling.

6.
Hum Mol Genet ; 27(R1): R63-R71, 2018 May 01.
Article in English | MEDLINE | ID: mdl-29648622

ABSTRACT

The human genome is now investigated through high-throughput functional assays, and through the generation of population genomic data. These advances support the identification of functional genetic variants and the prediction of traits (e.g. deleterious variants and disease). This review summarizes lessons learned from the large-scale analyses of genome and exome data sets, modeling of population data and machine-learning strategies to solve complex genomic sequence regions. The review also portrays the rapid adoption of artificial intelligence/deep neural networks in genomics; in particular, deep learning approaches are well suited to model the complex dependencies in the regulatory landscape of the genome, and to provide predictors for genetic variant calling and interpretation.

7.
Proc Natl Acad Sci U S A ; 115(2): 379-384, 2018 Jan 09.
Article in English | MEDLINE | ID: mdl-29279374

ABSTRACT

A major challenge in evaluating the contribution of rare variants to complex disease is identifying enough copies of the rare alleles to permit informative statistical analysis. To investigate the contribution of rare variants to the risk of type 2 diabetes (T2D) and related traits, we performed deep whole-genome analysis of 1,034 members of 20 large Mexican-American families with high prevalence of T2D. If rare variants of large effect accounted for much of the diabetes risk in these families, our experiment was powered to detect association. Using gene expression data on 21,677 transcripts for 643 pedigree members, we identified evidence for large-effect rare-variant cis-expression quantitative trait loci that could not be detected in population studies, validating our approach. However, we did not identify any rare variants of large effect associated with T2D, or the related traits of fasting glucose and insulin, suggesting that large-effect rare variants account for only a modest fraction of the genetic risk of these traits in this sample of families. Reliable identification of large-effect rare variants will require larger samples of extended pedigrees or different study designs that further enrich for such variants.


Subjects
Diabetes Mellitus, Type 2/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation , Mexican Americans/genetics , Diabetes Mellitus, Type 2/ethnology , Diabetes Mellitus, Type 2/pathology , Family Health , Female , Gene Frequency , Genetic Predisposition to Disease/ethnology , Genome-Wide Association Study/methods , Genotype , Humans , Male , Pedigree , Phenotype , Quantitative Trait Loci/genetics , Whole Genome Sequencing/methods
8.
Eur J Hum Genet ; 25(2): 227-233, 2017 Feb.
Article in English | MEDLINE | ID: mdl-27876817

ABSTRACT

Germline mutation detection from human DNA sequence data is challenging due to the rarity of such events relative to the intrinsic error rates of sequencing technologies and the uneven coverage across the genome. We developed PhaseByTransmission (PBT) to identify de novo single nucleotide variants and short insertions and deletions (indels) from sequence data collected in parent-offspring trios. We compute the joint probability of the data given the genotype likelihoods in the individual family members, the known familial relationships and a prior probability for the mutation rate. Candidate de novo mutations (DNMs) are reported along with their posterior probability, providing a systematic way to prioritize them for validation. Our tool is integrated in the Genome Analysis Toolkit and can be used together with the ReadBackedPhasing module to infer the parental origin of DNMs based on phase-informative reads. Using simulated data, we show that PBT outperforms existing tools, especially in low coverage data and on the X chromosome. We further show that PBT displays high validation rates on empirical parent-offspring sequencing data for whole-exome data from 104 trios and X-chromosome data from 249 parent-offspring families. Finally, we demonstrate an association between father's age at conception and the number of DNMs in female offspring's X chromosome, consistent with previous literature reports.


Subjects
Genome-Wide Association Study/methods , Germ-Line Mutation , Pedigree , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Software , Adult , Child , Chromosomes, Human, X/genetics , Exome , Female , Genotype , Humans , Male , Models, Genetic
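
The trio calculation described in this abstract (genotype likelihoods per family member, combined with the familial relationship and a mutation-rate prior) can be illustrated with a toy model for one biallelic autosomal site. This is a simplified sketch, not the PhaseByTransmission implementation: the flat prior on parental genotypes, the way the mutation mass `mu` is spread over Mendelian-inconsistent child genotypes, and all function names are assumptions made for illustration.

```python
from itertools import product

GENOTYPES = (0, 1, 2)  # number of alternate alleles carried


def mendel_prob(child, mother, father):
    """P(child genotype | parental genotypes) under Mendelian inheritance."""
    prob = 0.0
    for a, b in product((0, 1), repeat=2):  # allele passed by mother, by father
        p_a = mother / 2 if a else 1 - mother / 2
        p_b = father / 2 if b else 1 - father / 2
        if a + b == child:
            prob += p_a * p_b
    return prob


def transmission_prob(child, mother, father, mu):
    """Mendelian transmission with mass mu reserved for inconsistent genotypes."""
    mendel = mendel_prob(child, mother, father)
    violations = [g for g in GENOTYPES if mendel_prob(g, mother, father) == 0.0]
    if not violations:  # every child genotype is Mendelian-consistent
        return mendel
    if mendel > 0.0:
        return (1 - mu) * mendel
    return mu / len(violations)


def de_novo_posterior(lik_mother, lik_father, lik_child, mu=1e-8):
    """Posterior probability that the trio is Mendelian-inconsistent (flat parental prior)."""
    de_novo = total = 0.0
    for m, f, c in product(GENOTYPES, repeat=3):
        joint = (lik_mother[m] * lik_father[f] * lik_child[c]
                 * transmission_prob(c, m, f, mu))
        total += joint
        if mendel_prob(c, m, f) == 0.0:
            de_novo += joint
    return de_novo / total


# Hypothetical genotype likelihoods, indexed by alternate-allele count.
print(de_novo_posterior([0.999, 0.001, 0.0], [0.999, 0.001, 0.0], [0.001, 0.998, 0.001]))
```

With a small `mu`, even a slight chance that a parent is heterozygous outweighs the de novo explanation, which is one reason to rank candidates by posterior probability before validation.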
9.
Nature ; 536(7616): 285-91, 2016 Aug 18.
Article in English | MEDLINE | ID: mdl-27535533

ABSTRACT

Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.


Subjects
Exome/genetics , Genetic Variation/genetics , DNA Mutational Analysis , Datasets as Topic , Humans , Phenotype , Proteome/genetics , Rare Diseases/genetics , Sample Size
10.
Nature ; 518(7537): 102-6, 2015 Feb 05.
Article in English | MEDLINE | ID: mdl-25487149

ABSTRACT

Myocardial infarction (MI), a leading cause of death around the world, displays a complex pattern of inheritance. When MI occurs early in life, genetic inheritance is a major component to risk. Previously, rare mutations in low-density lipoprotein (LDL) genes have been shown to contribute to MI risk in individual families, whereas common variants at more than 45 loci have been associated with MI risk in the population. Here we evaluate how rare mutations contribute to early-onset MI risk in the population. We sequenced the protein-coding regions of 9,793 genomes from patients with MI at an early age (≤50 years in males and ≤60 years in females) along with MI-free controls. We identified two genes in which rare coding-sequence mutations were more frequent in MI cases versus controls at exome-wide significance. At low-density lipoprotein receptor (LDLR), carriers of rare non-synonymous mutations were at 4.2-fold increased risk for MI; carriers of null alleles at LDLR were at even higher risk (13-fold difference). Approximately 2% of early MI cases harbour a rare, damaging mutation in LDLR; this estimate is similar to one made more than 40 years ago using an analysis of total cholesterol. Among controls, about 1 in 217 carried an LDLR coding-sequence mutation and had plasma LDL cholesterol > 190 mg dl⁻¹. At apolipoprotein A-V (APOA5), carriers of rare non-synonymous mutations were at 2.2-fold increased risk for MI. When compared with non-carriers, LDLR mutation carriers had higher plasma LDL cholesterol, whereas APOA5 mutation carriers had higher plasma triglycerides. Recent evidence has connected MI risk with coding-sequence mutations at two genes functionally related to APOA5, namely lipoprotein lipase and apolipoprotein C-III (refs 18, 19). Combined, these observations suggest that, as well as LDL cholesterol, disordered metabolism of triglyceride-rich lipoproteins contributes to MI risk.


Subjects
Alleles , Apolipoproteins A/genetics , Exome/genetics , Genetic Predisposition to Disease/genetics , Myocardial Infarction/genetics , Receptors, LDL/genetics , Age Factors , Age of Onset , Apolipoprotein A-V , Case-Control Studies , Cholesterol, LDL/blood , Coronary Artery Disease/genetics , Female , Genetics, Population , Heterozygote , Humans , Male , Middle Aged , Mutation/genetics , Myocardial Infarction/blood , National Heart, Lung, and Blood Institute (U.S.) , Triglycerides/blood , United States
11.
Nat Genet ; 46(9): 944-50, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25086666

ABSTRACT

Spontaneously arising (de novo) mutations have an important role in medical genetics. For diseases with extensive locus heterogeneity, such as autism spectrum disorders (ASDs), the signal from de novo mutations is distributed across many genes, making it difficult to distinguish disease-relevant mutations from background variation. Here we provide a statistical framework for the analysis of excesses in de novo mutation per gene and gene set by calibrating a model of de novo mutation. We applied this framework to de novo mutations collected from 1,078 ASD family trios, and, whereas we affirmed a significant role for loss-of-function mutations, we found no excess of de novo loss-of-function mutations in cases with IQ above 100, suggesting that the role of de novo mutations in ASDs might reside in fundamental neurodevelopmental processes. We also used our model to identify ∼1,000 genes that are significantly lacking in functional coding variation in non-ASD samples and are enriched for de novo loss-of-function mutations identified in ASD cases.


Subjects
Child Development Disorders, Pervasive/genetics , Mutation , Exome , Female , Genetic Code , Genetic Predisposition to Disease , Genetics, Medical/methods , Humans , Male
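
A per-gene test for an excess of de novo mutations can be sketched as a one-sided Poisson test: given an expected count derived from a per-gene mutation rate and the number of trios, ask how likely it is to observe at least the number of mutations actually seen. The Poisson form, the function names, and the mutation rate below are assumptions for illustration; only the trio count (1,078) comes from the abstract.

```python
import math


def expected_de_novo(n_trios, rate_per_chromosome):
    """Expected de novo count in one gene: two transmitted chromosomes per child."""
    return 2 * n_trios * rate_per_chromosome


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # advance from P(X = i) to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


# Hypothetical per-gene, per-chromosome loss-of-function mutation rate.
lam = expected_de_novo(n_trios=1078, rate_per_chromosome=1e-5)
print(lam, poisson_upper_tail(3, lam))
```

Subtracting the lower tail from 1 loses precision for extremely small p-values; a production analysis would more likely use a survival function from a statistics library.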
12.
Genome Biol ; 15(6): R88, 2014 Jun 30.
Article in English | MEDLINE | ID: mdl-24980144

ABSTRACT

BACKGROUND: Population differentiation has proved to be effective for identifying loci under geographically localized positive selection, and has the potential to identify loci subject to balancing selection. We have previously investigated the pattern of genetic differentiation among human populations at 36.8 million genomic variants to identify sites in the genome showing high frequency differences. Here, we extend this dataset to include additional variants, survey sites with low levels of differentiation, and evaluate the extent to which highly differentiated sites are likely to result from selective or other processes. RESULTS: We demonstrate that while sites with low differentiation represent sampling effects rather than balancing selection, sites showing extremely high population differentiation are enriched for positive selection events and that one half may be the result of classic selective sweeps. Among these, we rediscover known examples, where we actually identify the established functional SNP, and discover novel examples including the genes ABCA12, CALD1 and ZNF804, which we speculate may be linked to adaptations in skin, calcium metabolism and defense, respectively. CONCLUSIONS: We identify known and many novel candidate regions for geographically restricted positive selection, and suggest several directions for further research.


Subjects
Genome, Human , INDEL Mutation , Polymorphism, Single Nucleotide , Gene Frequency , Genetic Drift , Humans , Selection, Genetic , Sequence Analysis, DNA
13.
N Engl J Med ; 371(1): 22-31, 2014 Jul 03.
Article in English | MEDLINE | ID: mdl-24941081

ABSTRACT

BACKGROUND: Plasma triglyceride levels are heritable and are correlated with the risk of coronary heart disease. Sequencing of the protein-coding regions of the human genome (the exome) has the potential to identify rare mutations that have a large effect on phenotype. METHODS: We sequenced the protein-coding regions of 18,666 genes in each of 3734 participants of European or African ancestry in the Exome Sequencing Project. We conducted tests to determine whether rare mutations in coding sequence, individually or in aggregate within a gene, were associated with plasma triglyceride levels. For mutations associated with triglyceride levels, we subsequently evaluated their association with the risk of coronary heart disease in 110,970 persons. RESULTS: An aggregate of rare mutations in the gene encoding apolipoprotein C3 (APOC3) was associated with lower plasma triglyceride levels. Among the four mutations that drove this result, three were loss-of-function mutations: a nonsense mutation (R19X) and two splice-site mutations (IVS2+1G→A and IVS3+1G→T). The fourth was a missense mutation (A43T). Approximately 1 in 150 persons in the study was a heterozygous carrier of at least one of these four mutations. Triglyceride levels in the carriers were 39% lower than levels in noncarriers (P<1×10⁻²⁰), and circulating levels of APOC3 in carriers were 46% lower than levels in noncarriers (P=8×10⁻¹⁰). The risk of coronary heart disease among 498 carriers of any rare APOC3 mutation was 40% lower than the risk among 110,472 noncarriers (odds ratio, 0.60; 95% confidence interval, 0.47 to 0.75; P=4×10⁻⁶). CONCLUSIONS: Rare mutations that disrupt APOC3 function were associated with lower levels of plasma triglycerides and APOC3. Carriers of these mutations were found to have a reduced risk of coronary heart disease. (Funded by the National Heart, Lung, and Blood Institute and others.).


Subjects
Apolipoprotein C-III/genetics , Coronary Disease/genetics , Mutation , Triglycerides/blood , African Continental Ancestry Group/genetics , Apolipoprotein C-III/blood , Coronary Disease/blood , European Continental Ancestry Group/genetics , Exome , Genotype , Heterozygote , Humans , Liver/pathology , Risk Factors , Sequence Analysis, DNA
14.
Nature ; 506(7487): 185-90, 2014 Feb 13.
Article in English | MEDLINE | ID: mdl-24463508

ABSTRACT

Schizophrenia is a common disease with a complex aetiology, probably involving multiple and heterogeneous genetic factors. Here, by analysing the exome sequences of 2,536 schizophrenia cases and 2,543 controls, we demonstrate a polygenic burden primarily arising from rare (less than 1 in 10,000), disruptive mutations distributed across many genes. Particularly enriched gene sets include the voltage-gated calcium ion channel and the signalling complex formed by the activity-regulated cytoskeleton-associated scaffold protein (ARC) of the postsynaptic density, sets previously implicated by genome-wide association and copy-number variation studies. Similar to reports in autism, targets of the fragile X mental retardation protein (FMRP, product of FMR1) are enriched for case mutations. No individual gene-based test achieves significance after correction for multiple testing and we do not detect any alleles of moderately low frequency (approximately 0.5 to 1 per cent) and moderately large effect. Taken together, these data suggest that population-based exome sequencing can discover risk alleles and complements established gene-mapping paradigms in neuropsychiatric disease.


Subjects
Multifactorial Inheritance/genetics , Mutation/genetics , Schizophrenia/genetics , Autistic Disorder/genetics , Calcium Channels/genetics , Cytoskeletal Proteins/genetics , DNA Copy Number Variations/genetics , Disks Large Homolog 4 Protein , Female , Fragile X Mental Retardation Protein/metabolism , Genome-Wide Association Study , Humans , Intellectual Disability/genetics , Intracellular Signaling Peptides and Proteins/genetics , Male , Membrane Proteins/genetics , Nerve Tissue Proteins/genetics , Receptors, N-Methyl-D-Aspartate/genetics
15.
Neuron ; 77(2): 235-42, 2013 Jan 23.
Article in English | MEDLINE | ID: mdl-23352160

ABSTRACT

To characterize the role of rare complete human knockouts in autism spectrum disorders (ASDs), we identify genes with homozygous or compound heterozygous loss-of-function (LoF) variants (defined as nonsense and essential splice sites) from exome sequencing of 933 cases and 869 controls. We identify a 2-fold increase in complete knockouts of autosomal genes with low rates of LoF variation (≤ 5% frequency) in cases and estimate a 3% contribution to ASD risk by these events, confirming this observation in an independent set of 563 probands and 4,605 controls. Outside the pseudoautosomal regions on the X chromosome, we similarly observe a significant 1.5-fold increase in rare hemizygous knockouts in males, contributing to another 2% of ASDs in males. Taken together, these results provide compelling evidence that rare autosomal and X chromosome complete gene knockouts are important inherited risk factors for ASD.


Subjects
Child Development Disorders, Pervasive/diagnosis , Child Development Disorders, Pervasive/genetics , Demography/methods , Gene Deletion , Loss of Heterozygosity/genetics , Case-Control Studies , Child Development Disorders, Pervasive/epidemiology , Child, Preschool , Chromosomes, Human, X/genetics , Female , Genetic Variation/genetics , Homozygote , Humans , Linkage Disequilibrium/genetics , Male , Risk Factors
16.
Curr Protoc Bioinformatics ; 43: 11.10.1-33, 2013.
Article in English | MEDLINE | ID: mdl-25431634

ABSTRACT

This unit describes how to use BWA and the Genome Analysis Toolkit (GATK) to map genome sequencing data to a reference and produce high-quality variant calls that can be used in downstream analyses. The complete workflow includes the core NGS data processing steps that are necessary to make the raw data suitable for analysis by the GATK, as well as the key methods involved in variant discovery using the GATK.


Subjects
Genetic Variation , Genome, Human , Software , Calibration , Databases, Genetic , Haploidy , Haplotypes/genetics , Humans , Molecular Sequence Annotation , Polymorphism, Single Nucleotide/genetics , Sequence Alignment
17.
Nature ; 491(7422): 56-65, 2012 Nov 01.
Article in English | MEDLINE | ID: mdl-23128226

ABSTRACT

By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.


Subjects
Genetic Variation/genetics , Genetics, Population , Genome, Human/genetics , Genomics , Alleles , Binding Sites/genetics , Conserved Sequence/genetics , Continental Population Groups/genetics , Evolution, Molecular , Genetics, Medical , Genome-Wide Association Study , Haplotypes/genetics , Humans , Nucleotide Motifs , Polymorphism, Single Nucleotide/genetics , Sequence Deletion/genetics , Transcription Factors/metabolism
18.
BMC Genomics ; 13: 375, 2012 Aug 05.
Article in English | MEDLINE | ID: mdl-22863213

ABSTRACT

BACKGROUND: Pacific Biosciences technology provides a fundamentally new data type that provides the potential to overcome some limitations of current next generation sequencing platforms by providing significantly longer reads, single molecule sequencing, low composition bias and an error profile that is orthogonal to other platforms. With these potential advantages in mind, we here evaluate the utility of the Pacific Biosciences RS platform for human medical amplicon resequencing projects. RESULTS: We evaluated the Pacific Biosciences technology for SNP discovery in medical resequencing projects using the Genome Analysis Toolkit, observing high sensitivity and specificity for calling differences in amplicons containing known true or false SNPs. We assessed data quality: most errors were indels (~14%) with few apparent miscalls (~1%). In this work, we define a custom data processing pipeline for Pacific Biosciences data for human data analysis. CONCLUSION: Critically, the error properties were largely free of the context-specific effects that affect other sequencing technologies. These data show excellent utility for follow-up validation and extension studies in human data and medical genetics projects, but can be extended to other organisms with a reference genome.


Subjects
Sequence Analysis, DNA , Genetic Variation , Genome, Human , Genotype , Humans , Polymorphism, Single Nucleotide , Software , User-Computer Interface
19.
Nature ; 485(7397): 242-5, 2012 Apr 04.
Article in English | MEDLINE | ID: mdl-22495311

ABSTRACT

Autism spectrum disorders (ASD) are believed to have genetic and environmental origins, yet in only a modest fraction of individuals can specific causes be identified. To identify further genetic risk factors, here we assess the role of de novo mutations in ASD by sequencing the exomes of ASD cases and their parents (n = 175 trios). Fewer than half of the cases (46.3%) carry a missense or nonsense de novo variant, and the overall rate of mutation is only modestly higher than the expected rate. In contrast, the proteins encoded by genes that harboured de novo missense or nonsense mutations showed a higher degree of connectivity among themselves and to previous ASD genes as indexed by protein-protein interaction screens. The small increase in the rate of de novo events, when taken together with the protein interaction results, are consistent with an important but limited role for de novo point mutations in ASD, similar to that documented for de novo copy number variants. Genetic models incorporating these data indicate that most of the observed de novo events are unconnected to ASD; those that do confer risk are distributed across many genes and are incompletely penetrant (that is, not necessarily sufficient for disease). Our results support polygenic models in which spontaneous coding mutations in any of a large number of genes increases risk by 5- to 20-fold. Despite the challenge posed by such models, results from de novo events and a large parallel case-control study provide strong evidence in favour of CHD8 and KATNAL2 as genuine autism risk factors.


Subjects
Autistic Disorder/genetics , DNA-Binding Proteins/genetics , Exons/genetics , Genetic Predisposition to Disease/genetics , Mutation/genetics , Transcription Factors/genetics , Case-Control Studies , Exome/genetics , Family Health , Humans , Models, Genetic , Multifactorial Inheritance/genetics , Phenotype , Poisson Distribution , Protein Interaction Maps
20.
Science ; 335(6070): 823-8, 2012 Feb 17.
Article in English | MEDLINE | ID: mdl-22344438

ABSTRACT

Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.


Subjects
Genetic Variation , Genome, Human , Proteins/genetics , Disease/genetics , Gene Expression , Gene Frequency , Humans , Phenotype , Polymorphism, Single Nucleotide , Selection, Genetic