Results 1 - 20 of 56
1.
Nature ; 601(7893): 422-427, 2022 01.
Article in English | MEDLINE | ID: mdl-34987224

ABSTRACT

Maternal morbidity and mortality continue to rise, and pre-eclampsia is a major driver of this burden (ref. 1). Yet the ability to assess underlying pathophysiology before clinical presentation to enable identification of pregnancies at risk remains elusive. Here we demonstrate the ability of plasma cell-free RNA (cfRNA) to reveal patterns of normal pregnancy progression and determine the risk of developing pre-eclampsia months before clinical presentation. Our results centre on comprehensive transcriptome data from eight independent prospectively collected cohorts comprising 1,840 racially diverse pregnancies and retrospective analysis of 2,539 banked plasma samples. The pre-eclampsia data include 524 samples (72 cases and 452 non-cases) from two diverse independent cohorts collected 14.5 weeks (s.d., 4.5 weeks) before delivery. We show that cfRNA signatures from a single blood draw can track pregnancy progression at the placental, maternal and fetal levels and can robustly predict pre-eclampsia, with a sensitivity of 75% and a positive predictive value of 32.3% (s.d., 3%), which is superior to the state-of-the-art method (ref. 2). cfRNA signatures of normal pregnancy progression and pre-eclampsia are independent of clinical factors, such as maternal age, body mass index and race, which cumulatively account for less than 1% of model variance. Further, the cfRNA signature for pre-eclampsia contains gene features linked to biological processes implicated in the underlying pathophysiology of pre-eclampsia.


Subjects
Cell-Free Nucleic Acids , Pre-Eclampsia , RNA , Cell-Free Nucleic Acids/blood , Female , Humans , Pre-Eclampsia/diagnosis , Pre-Eclampsia/genetics , Predictive Value of Tests , Pregnancy , RNA/blood , Retrospective Studies , Sensitivity and Specificity
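The abstract above reports performance as sensitivity and positive predictive value (PPV). As a hedged illustration of how these quantities relate, the sketch below computes them from confusion-matrix counts and shows PPV derived with Bayes' rule from sensitivity, specificity and prevalence. The function names are hypothetical, and any specificity you plug in is an assumption; the abstract does not report one.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true cases that the test flags as positive."""
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Fraction of positive calls that are true cases."""
    return tp / (tp + fp)


def ppv_from_rates(sens: float, spec: float, prevalence: float) -> float:
    """PPV via Bayes' rule from sensitivity, specificity and prevalence."""
    true_pos = sens * prevalence                  # P(test+, case)
    false_pos = (1.0 - spec) * (1.0 - prevalence)  # P(test+, non-case)
    return true_pos / (true_pos + false_pos)
```

For example, `ppv_from_rates(0.75, spec, 72 / 524)` uses the case mix quoted in the abstract (72 cases among 524 samples) with an assumed `spec`. For fixed sensitivity and specificity, PPV falls as prevalence falls, which is why PPV is usually reported alongside the cohort's case fraction.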
2.
Nature ; 536(7616): 285-91, 2016 08 18.
Article in English | MEDLINE | ID: mdl-27535533

ABSTRACT

Large-scale reference data sets of human genetic variation are critical for the medical and functional interpretation of DNA sequence changes. Here we describe the aggregation and analysis of high-quality exome (protein-coding region) DNA sequence data for 60,706 individuals of diverse ancestries generated as part of the Exome Aggregation Consortium (ExAC). This catalogue of human genetic diversity contains an average of one variant every eight bases of the exome, and provides direct evidence for the presence of widespread mutational recurrence. We have used this catalogue to calculate objective metrics of pathogenicity for sequence variants, and to identify genes subject to strong selection against various classes of mutation; identifying 3,230 genes with near-complete depletion of predicted protein-truncating variants, with 72% of these genes having no currently established human disease phenotype. Finally, we demonstrate that these data can be used for the efficient filtering of candidate disease-causing variants, and for the discovery of human 'knockout' variants in protein-coding genes.


Subjects
Exome/genetics , Genetic Variation/genetics , DNA Mutational Analysis , Datasets as Topic , Humans , Phenotype , Proteome/genetics , Rare Diseases/genetics , Sample Size
3.
Nature ; 518(7537): 102-6, 2015 Feb 05.
Article in English | MEDLINE | ID: mdl-25487149

ABSTRACT

Myocardial infarction (MI), a leading cause of death around the world, displays a complex pattern of inheritance. When MI occurs early in life, genetic inheritance is a major component to risk. Previously, rare mutations in low-density lipoprotein (LDL) genes have been shown to contribute to MI risk in individual families, whereas common variants at more than 45 loci have been associated with MI risk in the population. Here we evaluate how rare mutations contribute to early-onset MI risk in the population. We sequenced the protein-coding regions of 9,793 genomes from patients with MI at an early age (≤50 years in males and ≤60 years in females) along with MI-free controls. We identified two genes in which rare coding-sequence mutations were more frequent in MI cases versus controls at exome-wide significance. At low-density lipoprotein receptor (LDLR), carriers of rare non-synonymous mutations were at 4.2-fold increased risk for MI; carriers of null alleles at LDLR were at even higher risk (13-fold difference). Approximately 2% of early MI cases harbour a rare, damaging mutation in LDLR; this estimate is similar to one made more than 40 years ago using an analysis of total cholesterol. Among controls, about 1 in 217 carried an LDLR coding-sequence mutation and had plasma LDL cholesterol > 190 mg dl⁻¹. At apolipoprotein A-V (APOA5), carriers of rare non-synonymous mutations were at 2.2-fold increased risk for MI. When compared with non-carriers, LDLR mutation carriers had higher plasma LDL cholesterol, whereas APOA5 mutation carriers had higher plasma triglycerides. Recent evidence has connected MI risk with coding-sequence mutations at two genes functionally related to APOA5, namely lipoprotein lipase and apolipoprotein C-III (refs 18, 19). Combined, these observations suggest that, as well as LDL cholesterol, disordered metabolism of triglyceride-rich lipoproteins contributes to MI risk.


Subjects
Alleles , Apolipoproteins A/genetics , Exome/genetics , Genetic Predisposition to Disease/genetics , Myocardial Infarction/genetics , Receptors, LDL/genetics , Age Factors , Age of Onset , Apolipoprotein A-V , Case-Control Studies , Cholesterol, LDL/blood , Coronary Artery Disease/genetics , Female , Genetics, Population , Heterozygote , Humans , Male , Middle Aged , Mutation/genetics , Myocardial Infarction/blood , National Heart, Lung, and Blood Institute (U.S.) , Triglycerides/blood , United States
4.
Proc Natl Acad Sci U S A ; 115(2): 379-384, 2018 01 09.
Article in English | MEDLINE | ID: mdl-29279374

ABSTRACT

A major challenge in evaluating the contribution of rare variants to complex disease is identifying enough copies of the rare alleles to permit informative statistical analysis. To investigate the contribution of rare variants to the risk of type 2 diabetes (T2D) and related traits, we performed deep whole-genome analysis of 1,034 members of 20 large Mexican-American families with high prevalence of T2D. If rare variants of large effect accounted for much of the diabetes risk in these families, our experiment was powered to detect association. Using gene expression data on 21,677 transcripts for 643 pedigree members, we identified evidence for large-effect rare-variant cis-expression quantitative trait loci that could not be detected in population studies, validating our approach. However, we did not identify any rare variants of large effect associated with T2D, or the related traits of fasting glucose and insulin, suggesting that large-effect rare variants account for only a modest fraction of the genetic risk of these traits in this sample of families. Reliable identification of large-effect rare variants will require larger samples of extended pedigrees or different study designs that further enrich for such variants.


Subjects
Diabetes Mellitus, Type 2/genetics , Genetic Predisposition to Disease/genetics , Genetic Variation , Mexican Americans/genetics , Diabetes Mellitus, Type 2/ethnology , Diabetes Mellitus, Type 2/pathology , Family Health , Female , Gene Frequency , Genetic Predisposition to Disease/ethnology , Genome-Wide Association Study/methods , Genotype , Humans , Male , Pedigree , Phenotype , Quantitative Trait Loci/genetics , Whole Genome Sequencing/methods
5.
Hum Mol Genet ; 27(R1): R63-R71, 2018 05 01.
Article in English | MEDLINE | ID: mdl-29648622

ABSTRACT

The human genome is now investigated through high-throughput functional assays, and through the generation of population genomic data. These advances support the identification of functional genetic variants and the prediction of traits (e.g. deleterious variants and disease). This review summarizes lessons learned from the large-scale analyses of genome and exome data sets, modeling of population data and machine-learning strategies to solve complex genomic sequence regions. The review also portrays the rapid adoption of artificial intelligence/deep neural networks in genomics; in particular, deep learning approaches are well suited to model the complex dependencies in the regulatory landscape of the genome, and to provide predictors for genetic variant calling and interpretation.


Subjects
Deep Learning/trends , Gene Regulatory Networks/genetics , Genome, Human/genetics , Genomics/trends , Exome/genetics , High-Throughput Nucleotide Sequencing/trends , Humans , Sequence Analysis, DNA , Software
6.
Bioinformatics ; 35(21): 4389-4391, 2019 11 01.
Article in English | MEDLINE | ID: mdl-30916319

ABSTRACT

SUMMARY: Reference genomes are refined to reflect error corrections and other improvements. While this process improves novel data generation and analysis, incorporating data analyzed on an older reference genome assembly requires transforming the coordinates and representations of the data to the new assembly. Multiple tools exist to perform this transformation for coordinate-only data types, but none supports accurate transformation of genome-wide short variation. Here we present GenomeWarp, a tool for efficiently transforming variants between genome assemblies. GenomeWarp transforms regions and short variants in a conservative manner to minimize false positive and negative variants in the target genome, and converts over 99% of regions and short variants from a representative human genome. AVAILABILITY AND IMPLEMENTATION: GenomeWarp is written in Java. All source code and the user manual are freely available at https://github.com/verilylifesciences/genomewarp. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genomics , Software , Genome, Human , Humans
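The GenomeWarp abstract above describes transforming coordinates between assemblies "in a conservative manner". The sketch below is a generic illustration of that idea, not GenomeWarp's algorithm or API (GenomeWarp is written in Java). It assumes a hypothetical list of ungapped alignment blocks and drops anything that cannot be mapped unambiguously instead of guessing.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical ungapped alignment blocks: (source_start, target_start, length),
# 0-based, sorted by source_start and non-overlapping.
Block = Tuple[int, int, int]


def lift_position(blocks: List[Block], pos: int) -> Optional[int]:
    """Map a 0-based source position to the target assembly, or return None."""
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    src, tgt, length = blocks[i]
    if pos < src + length:
        return tgt + (pos - src)
    return None  # position lies in an unaligned gap: drop it


def lift_snv(blocks: List[Block], pos: int, ref: str, target_seq: str) -> Optional[int]:
    """Lift a single-base variant, keeping it only if REF matches the target base."""
    new_pos = lift_position(blocks, pos)
    if new_pos is None or target_seq[new_pos] != ref:
        return None
    return new_pos
```

Checking the reference base after mapping is one simple conservative rule: a variant whose REF allele no longer matches the new assembly is discarded rather than silently rewritten.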
7.
Nature ; 506(7487): 185-90, 2014 Feb 13.
Article in English | MEDLINE | ID: mdl-24463508

ABSTRACT

Schizophrenia is a common disease with a complex aetiology, probably involving multiple and heterogeneous genetic factors. Here, by analysing the exome sequences of 2,536 schizophrenia cases and 2,543 controls, we demonstrate a polygenic burden primarily arising from rare (less than 1 in 10,000), disruptive mutations distributed across many genes. Particularly enriched gene sets include the voltage-gated calcium ion channel and the signalling complex formed by the activity-regulated cytoskeleton-associated scaffold protein (ARC) of the postsynaptic density, sets previously implicated by genome-wide association and copy-number variation studies. Similar to reports in autism, targets of the fragile X mental retardation protein (FMRP, product of FMR1) are enriched for case mutations. No individual gene-based test achieves significance after correction for multiple testing and we do not detect any alleles of moderately low frequency (approximately 0.5 to 1 per cent) and moderately large effect. Taken together, these data suggest that population-based exome sequencing can discover risk alleles and complements established gene-mapping paradigms in neuropsychiatric disease.


Subjects
Multifactorial Inheritance/genetics , Mutation/genetics , Schizophrenia/genetics , Autistic Disorder/genetics , Calcium Channels/genetics , Cytoskeletal Proteins/genetics , DNA Copy Number Variations/genetics , Disks Large Homolog 4 Protein , Female , Fragile X Mental Retardation Protein/metabolism , Genome-Wide Association Study , Humans , Intellectual Disability/genetics , Intracellular Signaling Peptides and Proteins/genetics , Male , Membrane Proteins/genetics , Nerve Tissue Proteins/genetics , Receptors, N-Methyl-D-Aspartate/genetics
8.
Nature ; 491(7422): 56-65, 2012 Nov 01.
Article in English | MEDLINE | ID: mdl-23128226

ABSTRACT

By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.


Subjects
Genetic Variation/genetics , Genetics, Population , Genome, Human/genetics , Genomics , Alleles , Binding Sites/genetics , Conserved Sequence/genetics , Evolution, Molecular , Genetics, Medical , Genome-Wide Association Study , Haplotypes/genetics , Humans , Nucleotide Motifs , Polymorphism, Single Nucleotide/genetics , Racial Groups/genetics , Sequence Deletion/genetics , Transcription Factors/metabolism
9.
Nature ; 485(7397): 242-5, 2012 Apr 04.
Article in English | MEDLINE | ID: mdl-22495311

ABSTRACT

Autism spectrum disorders (ASD) are believed to have genetic and environmental origins, yet in only a modest fraction of individuals can specific causes be identified. To identify further genetic risk factors, here we assess the role of de novo mutations in ASD by sequencing the exomes of ASD cases and their parents (n = 175 trios). Fewer than half of the cases (46.3%) carry a missense or nonsense de novo variant, and the overall rate of mutation is only modestly higher than the expected rate. In contrast, the proteins encoded by genes that harboured de novo missense or nonsense mutations showed a higher degree of connectivity among themselves and to previous ASD genes as indexed by protein-protein interaction screens. The small increase in the rate of de novo events, when taken together with the protein interaction results, is consistent with an important but limited role for de novo point mutations in ASD, similar to that documented for de novo copy number variants. Genetic models incorporating these data indicate that most of the observed de novo events are unconnected to ASD; those that do confer risk are distributed across many genes and are incompletely penetrant (that is, not necessarily sufficient for disease). Our results support polygenic models in which spontaneous coding mutations in any of a large number of genes increase risk by 5- to 20-fold. Despite the challenge posed by such models, results from de novo events and a large parallel case-control study provide strong evidence in favour of CHD8 and KATNAL2 as genuine autism risk factors.


Subjects
Autistic Disorder/genetics , DNA-Binding Proteins/genetics , Exons/genetics , Genetic Predisposition to Disease/genetics , Mutation/genetics , Transcription Factors/genetics , Case-Control Studies , Exome/genetics , Family Health , Humans , Models, Genetic , Multifactorial Inheritance/genetics , Phenotype , Poisson Distribution , Protein Interaction Maps
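The abstract above compares the observed de novo mutation rate with an expected rate, and its subject terms include the Poisson distribution. A common way to frame such a comparison is a one-sided Poisson test: how likely are at least k events if the true mean were lambda? The sketch below is that generic test, not the authors' model; the expected count would come from an assumed per-exome mutation rate times the number of trios.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    cdf = 0.0
    term = math.exp(-lam)  # P(X = 0)
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


# Hypothetical usage: `observed` de novo events versus an `expected` count.
# p_value = poisson_sf(observed, expected)
```

Subtracting the CDF from 1 loses precision for very small tail probabilities; a production analysis would use a library routine such as a regularized incomplete gamma function.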
10.
Mol Cell ; 37(3): 311-20, 2010 Feb 12.
Article in English | MEDLINE | ID: mdl-20159551

ABSTRACT

Antibiotic resistance arises through mechanisms such as selection of naturally occurring resistant mutants and horizontal gene transfer. Recently, oxidative stress has been implicated as one of the mechanisms whereby bactericidal antibiotics kill bacteria. Here, we show that sublethal levels of bactericidal antibiotics induce mutagenesis, resulting in heterogeneous increases in the minimum inhibitory concentration for a range of antibiotics, irrespective of the drug target. This increase in mutagenesis correlates with an increase in reactive oxygen species (ROS) and is prevented by the ROS scavenger thiourea and by anaerobic conditions, indicating that sublethal concentrations of antibiotics induce mutagenesis by stimulating the production of ROS. We demonstrate that these effects can lead to mutant strains that are sensitive to the applied antibiotic but resistant to other antibiotics. This work establishes a radical-based molecular mechanism whereby sublethal levels of antibiotics can lead to multidrug resistance, which has important implications for the widespread use and misuse of antibiotics.


Subjects
Anti-Bacterial Agents/pharmacology , Drug Resistance, Multiple, Bacterial/drug effects , Escherichia coli/drug effects , Mutagenesis , Amino Acid Sequence , Base Sequence , Drug Resistance, Multiple, Bacterial/genetics , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Transfer, Horizontal/drug effects , Microbial Sensitivity Tests , Molecular Sequence Data , Reactive Oxygen Species/metabolism , Sequence Alignment , Sequence Analysis, Protein
11.
N Engl J Med ; 371(1): 22-31, 2014 Jul 03.
Article in English | MEDLINE | ID: mdl-24941081

ABSTRACT

BACKGROUND: Plasma triglyceride levels are heritable and are correlated with the risk of coronary heart disease. Sequencing of the protein-coding regions of the human genome (the exome) has the potential to identify rare mutations that have a large effect on phenotype. METHODS: We sequenced the protein-coding regions of 18,666 genes in each of 3734 participants of European or African ancestry in the Exome Sequencing Project. We conducted tests to determine whether rare mutations in coding sequence, individually or in aggregate within a gene, were associated with plasma triglyceride levels. For mutations associated with triglyceride levels, we subsequently evaluated their association with the risk of coronary heart disease in 110,970 persons. RESULTS: An aggregate of rare mutations in the gene encoding apolipoprotein C3 (APOC3) was associated with lower plasma triglyceride levels. Among the four mutations that drove this result, three were loss-of-function mutations: a nonsense mutation (R19X) and two splice-site mutations (IVS2+1G→A and IVS3+1G→T). The fourth was a missense mutation (A43T). Approximately 1 in 150 persons in the study was a heterozygous carrier of at least one of these four mutations. Triglyceride levels in the carriers were 39% lower than levels in noncarriers (P<1×10⁻²⁰), and circulating levels of APOC3 in carriers were 46% lower than levels in noncarriers (P=8×10⁻¹⁰). The risk of coronary heart disease among 498 carriers of any rare APOC3 mutation was 40% lower than the risk among 110,472 noncarriers (odds ratio, 0.60; 95% confidence interval, 0.47 to 0.75; P=4×10⁻⁶). CONCLUSIONS: Rare mutations that disrupt APOC3 function were associated with lower levels of plasma triglycerides and APOC3. Carriers of these mutations were found to have a reduced risk of coronary heart disease. (Funded by the National Heart, Lung, and Blood Institute and others.).


Subjects
Apolipoprotein C-III/genetics , Coronary Disease/genetics , Mutation , Triglycerides/blood , Apolipoprotein C-III/blood , Black People/genetics , Coronary Disease/blood , Exome , Genotype , Heterozygote , Humans , Liver/pathology , Risk Factors , Sequence Analysis, DNA , White People/genetics
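The APOC3 abstract above reports an odds ratio of 0.60 with a 95% confidence interval, which it summarizes as a "40% lower" risk (1 minus the odds ratio). As a hedged sketch of how an unadjusted odds ratio and its Wald (Woolf) interval are computed from a 2x2 carrier-by-disease table, consider the following; the study's actual estimate may include adjustments not shown here, and any counts you supply are your own.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Wald (Woolf) confidence interval.

    a = carriers with disease,    b = carriers without disease,
    c = noncarriers with disease, d = noncarriers without disease.
    All four cells must be non-zero.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper
```

The interval is symmetric on the log scale rather than on the odds-ratio scale, which is why published intervals such as 0.47 to 0.75 around 0.60 look lopsided.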
12.
PLoS Genet ; 9(4): e1003443, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23593035

ABSTRACT

We report on results from whole-exome sequencing (WES) of 1,039 subjects diagnosed with autism spectrum disorders (ASD) and 870 controls selected from the NIMH repository to be of similar ancestry to cases. The WES data came from two centers using different methods to produce sequence and to call variants from it. Therefore, an initial goal was to ensure the distribution of rare variation was similar for data from different centers. This proved straightforward by filtering called variants by fraction of missing data, read depth, and balance of alternative to reference reads. Results were evaluated using seven samples sequenced at both centers and by results from the association study. Next we addressed how the data and/or results from the centers should be combined. Gene-based analyses of association were an obvious choice, but should statistics for association be combined across centers (meta-analysis) or should data be combined and then analyzed (mega-analysis)? Because of the nature of many gene-based tests, we showed by theory and simulations that mega-analysis has better power than meta-analysis. Finally, before analyzing the data for association, we explored the impact of population structure on rare variant analysis in these data. Like other recent studies, we found evidence that population structure can confound case-control studies by the clustering of rare variants in ancestry space; yet, unlike some recent studies, for these data we found that principal component-based analyses were sufficient to control for ancestry and produce test statistics with appropriate distributions. After using a variety of gene-based tests and both meta- and mega-analysis, we found no new risk genes for ASD in this sample. Our results suggest that standard gene-based tests will require much larger samples of cases and controls before being effective for gene discovery, even for a disorder like ASD.


Subjects
Child Development Disorders, Pervasive/genetics , Exome , Genome-Wide Association Study , Case-Control Studies , Child , Child Development Disorders, Pervasive/physiopathology , Genetic Predisposition to Disease , Genetic Variation , Humans , Population Control , Sequence Analysis, DNA , Software
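The abstract above harmonizes two centers' data by filtering called variants on missingness, read depth and allele balance. The sketch below shows what such a site-level filter can look like. The `VariantSite` record, its field names and every threshold are illustrative assumptions, not the values used in the study.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class VariantSite:
    # Hypothetical per-site summary; field names are illustrative.
    site_id: str
    missing_fraction: float  # fraction of samples with no genotype call
    mean_depth: float        # mean read depth across called samples
    alt_reads: int           # alt-supporting reads pooled over heterozygous calls
    ref_reads: int           # ref-supporting reads pooled over heterozygous calls


def passes_qc(v: VariantSite, max_missing: float = 0.1, min_depth: float = 10.0,
              ab_range: Tuple[float, float] = (0.3, 0.7)) -> bool:
    """Keep a site only if missingness, depth and allele balance look sane."""
    total = v.alt_reads + v.ref_reads
    if total == 0:
        return False
    allele_balance = v.alt_reads / total  # about 0.5 expected for true heterozygotes
    return (v.missing_fraction <= max_missing
            and v.mean_depth >= min_depth
            and ab_range[0] <= allele_balance <= ab_range[1])


def filter_sites(sites: Iterable[VariantSite]) -> List[VariantSite]:
    return [v for v in sites if passes_qc(v)]
```

Applying the same thresholds to both centers' call sets is the point: the goal is comparable rare-variant distributions, not the filter values themselves.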
14.
Hum Mol Genet ; 20(7): 1285-9, 2011 Apr 01.
Article in English | MEDLINE | ID: mdl-21212097

ABSTRACT

Exome sequencing is a powerful tool for the discovery of Mendelian disease genes. Previously, we reported a novel locus for autosomal recessive non-syndromic mental retardation (NSMR) in a consanguineous family [Nolan, D.K., Chen, P., Das, S., Ober, C. and Waggoner, D. (2008) Fine mapping of a locus for nonsyndromic mental retardation on chromosome 19p13. Am. J. Med. Genet. A, 146A, 1414-1422]. Using linkage and homozygosity mapping, we previously localized the gene to chromosome 19p13. The parents of this sibship were recently included in an exome sequencing project. Using a series of filters, we narrowed the putative causal mutation to a single variant site that segregated with NSMR: the mutation was homozygous in five affected siblings but in none of eight unaffected siblings. This mutation causes a substitution of a leucine for a highly conserved proline at amino acid 182 in TECR (trans-2,3-enoyl-CoA reductase), a synaptic glycoprotein. Our results reveal the value of massively parallel sequencing for identification of novel disease genes that could not be found using traditional approaches and identify only the seventh causal mutation for autosomal recessive NSMR.


Subjects
Chromosomes, Human, Pair 19/genetics , Genetic Diseases, Inborn/genetics , Intellectual Disability/genetics , Membrane Glycoproteins/genetics , Mutation , Oxidoreductases/genetics , Synaptic Membranes/genetics , Female , Genetic Diseases, Inborn/enzymology , Humans , Intellectual Disability/enzymology , Male , Membrane Glycoproteins/metabolism , Oxidoreductases/metabolism , Pedigree , Synaptic Membranes/enzymology
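The TECR abstract above narrows candidates to a site that "segregated with NSMR": homozygous in all affected siblings and in none of the unaffected ones. A minimal sketch of that one segregation filter follows, assuming genotypes coded as alternate-allele counts (0, 1, 2) in a hypothetical per-site dictionary. The study's other filters (linkage region, frequency, conservation) are not modeled.

```python
from typing import Dict, List

# Hypothetical encoding: sample name -> number of alternate alleles (0, 1 or 2).
Genotypes = Dict[str, int]


def segregates_recessive(gts: Genotypes, affected: List[str],
                         unaffected: List[str]) -> bool:
    """True if every affected sib is hom-alt and no unaffected sib is."""
    if any(s not in gts for s in affected + unaffected):
        return False  # treat missing calls conservatively
    return (all(gts[s] == 2 for s in affected)
            and all(gts[s] != 2 for s in unaffected))


def candidate_sites(sites: Dict[str, Genotypes], affected: List[str],
                    unaffected: List[str]) -> List[str]:
    """Return site IDs consistent with autosomal recessive segregation."""
    return [site for site, gts in sites.items()
            if segregates_recessive(gts, affected, unaffected)]
```

Unaffected heterozygous carriers are allowed, as expected under a recessive model; only unaffected homozygotes rule a site out.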
15.
N Engl J Med ; 363(23): 2220-7, 2010 Dec 02.
Article in English | MEDLINE | ID: mdl-20942659

ABSTRACT

We sequenced all protein-coding regions of the genome (the "exome") in two family members with combined hypolipidemia, marked by extremely low plasma levels of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, and triglycerides. These two participants were compound heterozygotes for two distinct nonsense mutations in ANGPTL3 (encoding the angiopoietin-like 3 protein). ANGPTL3 has been reported to inhibit lipoprotein lipase and endothelial lipase, thereby increasing plasma triglyceride and HDL cholesterol levels in rodents. Our finding of ANGPTL3 mutations highlights a role for the gene in LDL cholesterol metabolism in humans and shows the usefulness of exome sequencing for identification of novel genetic causes of inherited disorders. (Funded by the National Human Genome Research Institute and others.).


Subjects
Angiopoietins/genetics , Codon, Nonsense , Hypobetalipoproteinemias/genetics , Angiopoietin-Like Protein 3 , Angiopoietin-like Proteins , Cholesterol, HDL/blood , Cholesterol, HDL/genetics , Cholesterol, LDL/blood , Cholesterol, LDL/genetics , DNA Mutational Analysis , Female , Genetic Linkage , Humans , Male , Pedigree
16.
Genome Res ; 20(9): 1297-303, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20644199

ABSTRACT

Next-generation DNA sequencing (NGS) projects, such as the 1000 Genomes Project, are already revolutionizing our understanding of genetic variation among individuals. However, the massive data sets generated by NGS (the 1000 Genomes pilot alone includes nearly five terabases) make writing feature-rich, efficient, and robust analysis tools difficult for even computationally sophisticated individuals. Indeed, many professionals are limited in the scope and the ease with which they can answer scientific questions by the complexity of accessing and manipulating the data produced by these machines. Here, we discuss our Genome Analysis Toolkit (GATK), a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers using the functional programming philosophy of MapReduce. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. Separating specific analysis calculations from common data management infrastructure enables us to optimize the GATK framework for correctness, stability, and CPU and memory efficiency and to enable distributed and shared memory parallelization. We highlight the capabilities of the GATK by describing the implementation and application of robust, scale-tolerant tools like coverage calculators and single nucleotide polymorphism (SNP) calling. We conclude that the GATK programming framework enables developers and analysts to quickly and easily write efficient and robust NGS tools, many of which have already been incorporated into large-scale sequencing projects like the 1000 Genomes Project and The Cancer Genome Atlas.


Subjects
Genome , Genomics/methods , Sequence Analysis, DNA/methods , Software , Base Sequence
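The GATK abstract above describes separating per-item analysis from data management using the MapReduce style, with a coverage calculator as one example tool. The sketch below illustrates only that map/reduce split for depth of coverage. It is not the GATK API (GATK is Java-based with its own traversal classes), and the simplified `(contig, start, end)` read tuples are an assumption.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, Tuple

# Hypothetical simplified read: (contig, start, end), 0-based, half-open.
Read = Tuple[str, int, int]


def map_read(read: Read) -> Counter:
    """Map step: emit a count of 1 for every locus the read covers."""
    contig, start, end = read
    return Counter({(contig, pos): 1 for pos in range(start, end)})


def reduce_counts(acc: Counter, part: Counter) -> Counter:
    """Reduce step: merge partial per-locus depth counts."""
    acc.update(part)  # Counter.update adds counts rather than replacing them
    return acc


def depth_of_coverage(reads: Iterable[Read]) -> Counter:
    return reduce(reduce_counts, map(map_read, reads), Counter())
```

Because the reduce step is associative, partial counts from different genomic shards can be merged in any grouping, which is what makes this pattern easy to parallelize.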
17.
PLoS Comput Biol ; 8(7): e1002604, 2012.
Article in English | MEDLINE | ID: mdl-22807667

ABSTRACT

High coverage whole genome sequencing provides near complete information about genetic variation. However, other technologies can be more efficient in some settings by (a) reducing redundant coverage within samples and (b) exploiting patterns of genetic variation across samples. To characterize as many samples as possible, many genetic studies therefore employ lower coverage sequencing or SNP array genotyping coupled to statistical imputation. To compare these approaches individually and in conjunction, we developed a statistical framework to estimate genotypes jointly from sequence reads, array intensities, and imputation. In European samples, we find similar sensitivity (89%) and specificity (99.6%) from imputation with either 1× sequencing or 1 M SNP arrays. Sensitivity is increased, particularly for low-frequency polymorphisms (MAF < 5%), when low coverage sequence reads are added to dense genome-wide SNP arrays; the converse, however, is not true. At sites where sequence reads and array intensities produce different sample genotypes, joint analysis reduces genotype errors and identifies novel error modes. Our joint framework informs the use of next-generation sequencing in genome wide association studies and supports development of improved methods for genotype calling.


Subjects
Genomics/methods , Oligonucleotide Array Sequence Analysis/methods , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Algorithms , Cluster Analysis , Databases, Genetic , Genome-Wide Association Study , Genotype , Humans , Sensitivity and Specificity , White People
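The abstract above estimates genotypes "jointly from sequence reads, array intensities, and imputation". One standard way to combine evidence sources is Bayes' rule, assuming the sources are conditionally independent given the true genotype. The sketch below shows only that combination step with a Hardy-Weinberg prior. It is an assumed simplification, not the authors' framework, and the likelihood vectors are placeholders you would obtain from read and intensity models.

```python
from typing import List, Sequence


def hwe_prior(p: float) -> List[float]:
    """Hardy-Weinberg genotype prior for (AA, AB, BB) given B-allele frequency p."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def joint_posterior(prior: Sequence[float], *likelihoods: Sequence[float]) -> List[float]:
    """Posterior over (AA, AB, BB) from a prior and per-source likelihoods.

    Each likelihood vector holds P(data from one source | genotype). Assuming
    sources are conditionally independent given the genotype, the posterior is
    proportional to prior * product of likelihoods.
    """
    unnorm = list(prior)
    for lik in likelihoods:
        unnorm = [u * l for u, l in zip(unnorm, lik)]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

When reads and array intensities disagree, their likelihood vectors pull the posterior in different directions and the result becomes less certain. A simplification like this cannot tell a true disagreement apart from an error in one source, which is part of what the authors' joint analysis addresses.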
18.
BMC Genomics ; 13: 375, 2012 Aug 05.
Article in English | MEDLINE | ID: mdl-22863213

ABSTRACT

BACKGROUND: Pacific Biosciences technology provides a fundamentally new data type that provides the potential to overcome some limitations of current next generation sequencing platforms by providing significantly longer reads, single molecule sequencing, low composition bias and an error profile that is orthogonal to other platforms. With these potential advantages in mind, we here evaluate the utility of the Pacific Biosciences RS platform for human medical amplicon resequencing projects. RESULTS: We evaluated the Pacific Biosciences technology for SNP discovery in medical resequencing projects using the Genome Analysis Toolkit, observing high sensitivity and specificity for calling differences in amplicons containing known true or false SNPs. We assessed data quality: most errors were indels (~14%) with few apparent miscalls (~1%). In this work, we define a custom data processing pipeline for Pacific Biosciences data for human data analysis. CONCLUSION: Critically, the error properties were largely free of the context-specific effects that affect other sequencing technologies. These data show excellent utility for follow-up validation and extension studies in human data and medical genetics projects, but can be extended to other organisms with a reference genome.


Subjects
Sequence Analysis, DNA , Genetic Variation , Genome, Human , Genotype , Humans , Polymorphism, Single Nucleotide , Software , User-Computer Interface
19.
Bioinformatics ; 27(18): 2601-2, 2011 Sep 15.
Article in English | MEDLINE | ID: mdl-21803805

ABSTRACT

SUMMARY: Here, we present ContEst, a tool for estimating the level of cross-individual contamination in next-generation sequencing data. We demonstrate the accuracy of ContEst across a range of contamination levels, sources and read depths using sequencing data mixed in silico at known concentrations. We applied our tool to published cancer sequencing datasets and report their estimated contamination levels. AVAILABILITY AND IMPLEMENTATION: ContEst is a GATK module, and distributed under a BSD style license at http://www.broadinstitute.org/cancer/cga/contest CONTACT: kcibul@broadinstitute.org; gadgetz@broadinstitute.org SUPPLEMENTARY INFORMATION: Supplementary data is available at Bioinformatics online.


Subjects
Neoplasms/genetics , Sequence Analysis, DNA/methods , Base Sequence , Bayes Theorem , False Positive Reactions , Genotype , Humans , Models, Genetic , Software
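ContEst, described in the abstract above, estimates cross-individual contamination, and its subject terms point to a Bayesian model. The sketch below is a deliberately simpler method-of-moments estimator that conveys the core intuition. It is not ContEst's algorithm. At sites where the sample is homozygous reference, a contaminating fraction c of reads from a random member of the population should show the alternate allele at an expected rate of c times the population alternate-allele frequency f. Sequencing error is ignored.

```python
from typing import Sequence


def estimate_contamination(alt_reads: Sequence[int], depths: Sequence[int],
                           pop_alt_freq: Sequence[float]) -> float:
    """Moment estimate of the contamination fraction at hom-ref sites.

    Expected alt reads at a site = depth * c * f, so summing over sites gives
    c_hat = sum(alt_reads) / sum(depth * f). The result is capped at 1.
    """
    expected_per_unit_c = sum(d * f for d, f in zip(depths, pop_alt_freq))
    if expected_per_unit_c == 0:
        raise ValueError("no informative sites (zero depth or zero allele frequency)")
    return min(1.0, sum(alt_reads) / expected_per_unit_c)
```

Sites with common alternate alleles carry the most information, since a rare allele is unlikely to appear in the contaminant at all. A real tool must also model sequencing error, which this sketch would misread as contamination.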
20.
Bioinformatics ; 27(15): 2156-8, 2011 Aug 01.
Article in English | MEDLINE | ID: mdl-21653522

ABSTRACT

SUMMARY: The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. The format was developed for the 1000 Genomes Project, and has also been adopted by other projects such as UK10K, dbSNP and the NHLBI Exome Project. VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. AVAILABILITY: http://vcftools.sourceforge.net


Subjects
Genetic Variation , Genomics/methods , Information Storage and Retrieval/methods , Software , Alleles , Genome, Human , Genotype , Humans
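The VCFtools abstract above describes VCF as a tab-delimited format with fixed columns and semicolon-separated INFO annotations. To make that structure concrete, the sketch below parses the eight fixed columns of a single data line. It is a hypothetical helper, not part of VCFtools (whose API is in Perl). It skips header parsing, typed INFO values and per-sample FORMAT fields, so real work should use a dedicated VCF library.

```python
from typing import Dict, Union

InfoValue = Union[str, bool]


def parse_vcf_line(line: str) -> dict:
    """Parse one VCF data line (not a '#' header line) into a dict.

    Only the eight fixed columns are interpreted; any FORMAT and per-sample
    columns are returned unparsed under "samples".
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line has at least 8 tab-separated columns")
    chrom, pos, var_id, ref, alt, qual, filt, info_str = fields[:8]

    info: Dict[str, InfoValue] = {}
    if info_str != ".":
        for entry in info_str.split(";"):
            key, sep, value = entry.partition("=")
            info[key] = value if sep else True  # flag keys carry no "=value"

    return {
        "chrom": chrom,
        "pos": int(pos),        # VCF positions are 1-based
        "id": var_id,
        "ref": ref,
        "alt": alt.split(","),  # multiple alternate alleles are comma-separated
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info,
        "samples": fields[8:],
    }
```

For example, the line `20	14370	rs6054257	G	A,T	29	PASS	NS=3;DP=14;DB` yields two alternate alleles and an INFO dictionary in which `DB` is a flag.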