1.
Gigascience ; 10(2), 2021 01 29.
Article in English | MEDLINE | ID: mdl-33511994

ABSTRACT

BACKGROUND: Rapid and thorough quality assessment of sequenced genomes on an ultra-high-throughput scale is crucial for successful large-scale genomic studies. Comprehensive quality assessment typically requires full genome alignment, which costs a substantial amount of computational resources and turnaround time. Existing tools are either computationally expensive owing to full alignment or lack essential quality metrics because they skip read alignment. FINDINGS: We developed a set of rapid and accurate methods to produce comprehensive quality metrics directly from a subset of raw sequence reads (from whole-genome or whole-exome sequencing) without full alignment. Our methods offer orders of magnitude faster turnaround time than existing full alignment-based methods while providing comprehensive and sophisticated quality metrics, including estimates of genetic ancestry and cross-sample contamination. CONCLUSIONS: By rapidly and comprehensively performing the quality assessment, our tool will help investigators detect potential issues in ultra-high-throughput sequence reads in real time and at low computational cost at the early stages of the analyses, ensuring high-quality downstream results and preventing unexpected loss in time, money, and invaluable specimens.


Subjects
Genomics , High-Throughput Nucleotide Sequencing , DNA Sequence Analysis , Software , Exome Sequencing
2.
Nat Commun ; 10(1): 1847, 2019 04 23.
Article in English | MEDLINE | ID: mdl-31015462

ABSTRACT

Chronic kidney disease (CKD) is a growing health burden currently affecting 10-15% of adults worldwide. Estimated glomerular filtration rate (eGFR) as a marker of kidney function is commonly used to diagnose CKD. We analyze eGFR data from the Nord-Trøndelag Health Study and Michigan Genomics Initiative and perform a GWAS meta-analysis with public summary statistics, more than doubling the sample size of previous meta-analyses. We identify 147 loci (53 novel) associated with eGFR, including genes involved in transcriptional regulation, kidney development, cellular signaling, metabolism, and solute transport. Additionally, sex-stratified analysis identifies one locus with more significant effects in women than men. Using genetic risk scores constructed from these eGFR meta-analysis results, we show that associated variants are generally predictive of CKD with only modest improvements in detection compared with other known clinical risk factors. Collectively, these results yield additional insight into the genetic factors underlying kidney function and progression to CKD.


Subjects
Genetic Loci , Genome-Wide Association Study , Glomerular Filtration Rate/genetics , Chronic Renal Insufficiency/genetics , Female , Global Burden of Disease , Humans , Kidney/physiopathology , Male , Prognosis , Chronic Renal Insufficiency/diagnosis , Chronic Renal Insufficiency/epidemiology , Chronic Renal Insufficiency/physiopathology , Risk Assessment/methods , Risk Factors , Sex Factors
3.
G3 (Bethesda) ; 8(10): 3255-3267, 2018 10 03.
Article in English | MEDLINE | ID: mdl-30131328

ABSTRACT

The emergence of very large cohorts in genomic research has facilitated a focus on genotype-imputation strategies to power rare variant association. These strategies have benefited from improvements in imputation methods and association tests; however, little attention has been paid to ways in which array design can increase rare variant association power. Therefore, we developed a novel framework to select tag SNPs using the reference panel of 26 populations from Phase 3 of the 1000 Genomes Project. We evaluate tag SNP performance via mean imputed r² at untyped sites using leave-one-out internal validation and standard imputation methods, rather than pairwise linkage disequilibrium. Moving beyond pairwise metrics allows us to account for haplotype diversity across the genome to improve imputation accuracy and demonstrates population-specific biases from pairwise estimates. We also examine array design strategies that contrast multi-ethnic cohorts vs. single populations, and show that a boost in performance for the former can be obtained by prioritizing tag SNPs that contribute information across multiple populations simultaneously. Using our framework, we demonstrate increased imputation accuracy for rare variants (frequency < 1%) by 0.5-3.1% for an array of one million sites and 0.7-7.1% for an array of 500,000 sites, depending on the population. Finally, we show how recent explosive growth in non-African populations means tag SNPs capture on average 30% fewer other variants than in African populations. The unified framework presented here will enable investigators to make informed decisions for the design of new arrays, and help empower the next phase of rare variant association for global health.
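
The "mean imputed r²" metric mentioned in this abstract can be illustrated with a minimal sketch. It assumes that r² at a site is the squared Pearson correlation between true genotypes (0, 1, or 2 copies of an allele) and imputed dosages, averaged over sites. The function names and the toy data are hypothetical, and treating an undefined correlation at a monomorphic site as 0.0 is a convention chosen here, not something the abstract states. The leave-one-out validation itself is not reproduced.

```python
from statistics import mean


def squared_correlation(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages."""
    mean_true = mean(true_genotypes)
    mean_imputed = mean(imputed_dosages)
    cov = sum((t - mean_true) * (d - mean_imputed)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_true = sum((t - mean_true) ** 2 for t in true_genotypes)
    var_imputed = sum((d - mean_imputed) ** 2 for d in imputed_dosages)
    if var_true == 0 or var_imputed == 0:
        # Assumption: a site with no variation contributes r^2 = 0.
        return 0.0
    return cov * cov / (var_true * var_imputed)


def mean_imputed_r2(sites):
    """Average r^2 over (true_genotypes, imputed_dosages) pairs, one per site."""
    return mean(squared_correlation(t, d) for t, d in sites)


# Hypothetical toy data: one perfectly imputed site, one monomorphic site.
toy_sites = [
    ([0, 1, 2, 1], [0.0, 1.0, 2.0, 1.0]),
    ([0, 0, 0, 0], [0.1, 0.0, 0.2, 0.0]),
]
toy_mean_r2 = mean_imputed_r2(toy_sites)
print(toy_mean_r2)
```

Averaging per-site r² rewards arrays whose tag SNPs let an imputation method recover untyped genotypes, which is a different target from maximizing pairwise linkage disequilibrium with any single tag.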


Subjects
Ethnicity/genetics , Genetic Association Studies , Population Genetics , Single Nucleotide Polymorphism , Genetic Selection , Computational Biology/methods , Nucleic Acid Databases , Genome-Wide Association Study , Humans , Linkage Disequilibrium , Genetic Models , Reproducibility of Results
4.
Bioinformatics ; 31(22): 3682-4, 2015 Nov 15.
Article in English | MEDLINE | ID: mdl-26209433

ABSTRACT

When performing DNA sequencing to diagnose affected individuals with monogenic forms of rare diseases, accurate attribution of causality to detected variants is imperative but imperfect. Even if a gene has variants already known to cause a disease, rare disruptive variants predicted to be causal are not always so, mainly due to imperfect ability to predict the pathogenicity of variants. Existing population-scale sequence resources such as 1000 Genomes are useful to quantify the 'background prevalence' of an unaffected individual being falsely predicted to carry causal variants. We developed GeneVetter to allow users to quantify the 'background prevalence' of subjects with predicted causal variants within specific genes under user-specified filtering parameters. GeneVetter helps quantify uncertainty in monogenic diagnosis and design genetic studies with support for power and sample size calculations for specific genes with specific filtering criteria. GeneVetter also allows users to analyze their own sequence data without sending genotype information over the Internet. Overall, GeneVetter is an interactive web tool that facilitates quantifying and accounting for the background prevalence of predicted pathogenic variants in a population. AVAILABILITY AND IMPLEMENTATION: GeneVetter is available at http://genevetter.org/ . CONTACT: mgsamps@med.umich.edu or hmkang@umich.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Internet , Rare Diseases/genetics , Software , Type 2 Diabetes Mellitus/genetics , Humans , Nephrotic Syndrome/genetics , DNA Sequence Analysis
5.
Biomed Res Int ; 2013: 865181, 2013.
Article in English | MEDLINE | ID: mdl-24319692

ABSTRACT

BACKGROUND: Next generation sequencing (NGS) is being widely used to identify genetic variants associated with human disease. Although the approach is cost effective, the underlying data are susceptible to many types of error. Importantly, since NGS technologies and protocols are rapidly evolving, with constantly changing steps ranging from sample preparation to data processing software updates, it is important to enable researchers to routinely assess the quality of sequencing and alignment data prior to downstream analyses. RESULTS: Here we describe QPLOT, an automated tool that can facilitate the quality assessment of sequencing run performance. Taking standard sequence alignments as input, QPLOT generates a series of diagnostic metrics summarizing run quality and produces convenient graphical summaries for these metrics. QPLOT is computationally efficient, generates webpages for interactive exploration of detailed results, and can handle the joint output of many sequencing runs. CONCLUSION: QPLOT is an automated tool that facilitates assessment of sequence run quality. We routinely apply QPLOT to ensure quick detection and diagnosis of sequencing run problems. We hope that QPLOT will be useful to the community as well.


Subjects
High-Throughput Nucleotide Sequencing/standards , Software , Statistical Data Interpretation , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Quality Control , Sequence Alignment/standards , Sequence Alignment/statistics & numerical data , RNA Sequence Analysis/standards , RNA Sequence Analysis/statistics & numerical data
6.
Am J Hum Genet ; 93(5): 891-9, 2013 Nov 07.
Article in English | MEDLINE | ID: mdl-24210252

ABSTRACT

Estimates of the ancestry of specific chromosomal regions in admixed individuals are useful for studies of human evolutionary history and for genetic association studies. Previously, this ancestry inference relied on high-quality genotypes from genome-wide association study (GWAS) arrays. These high-quality genotypes are not always available when samples are exome sequenced, and exome sequencing is the strategy of choice for many ongoing genetic studies. Here we show that off-target reads generated during exome-sequencing experiments can be combined with on-target reads to accurately estimate the ancestry of each chromosomal segment in an admixed individual. To reconstruct local ancestry, our method SEQMIX models aligned bases directly instead of relying on hard genotype calls. We evaluate the accuracy of our method through simulations and analysis of samples sequenced by the 1000 Genomes Project and the NHLBI Grand Opportunity Exome Sequencing Project. In African Americans, we show that local-ancestry estimates derived by our method are very similar to those derived with Illumina's Omni 2.5M genotyping array and much improved in relation to estimates that use only exome genotypes and ignore off-target sequencing reads. Software implementing this method, SEQMIX, can be applied to analysis of human population history or used for genetic association studies in admixed individuals.


Subjects
Exome , Genetic Association Studies/methods , Population Genetics/methods , DNA Sequence Analysis/methods , Black or African American/genetics , Algorithms , Chromosome Mapping , Computer Simulation , Empirical Research , Human Genome , Genotype , Humans , Linkage Disequilibrium , Markov Chains , Genetic Models , Oligonucleotide Array Sequence Analysis , Software
7.
Am J Hum Genet ; 92(4): 547-57, 2013 Apr 04.
Article in English | MEDLINE | ID: mdl-23541341

ABSTRACT

Clinical trials for preventative therapies are complex and costly endeavors focused on individuals likely to develop disease in a short time frame, randomizing them to treatment groups, and following them over time. In such trials, statistical power is governed by the rate of disease events in each group and cost is determined by randomization, treatment, and follow-up. Strategies that increase the rate of disease events by enrolling individuals with high risk of disease can significantly reduce study size, duration, and cost. Comprehensive study of common, complex diseases has resulted in a growing list of robustly associated genetic markers. Here, we evaluate the utility--in terms of trial size, duration, and cost--of enriching prevention trial samples by combining clinical information with genetic risk scores to identify individuals at greater risk of disease. We also describe a framework for utilizing genetic risk scores in these trials and evaluating the associated cost and time savings. With type 1 diabetes (T1D), type 2 diabetes (T2D), myocardial infarction (MI), and advanced age-related macular degeneration (AMD) as examples, we illustrate the potential and limitations of using genetic data for prevention trial design. We illustrate settings where incorporating genetic information could reduce trial cost or duration considerably, as well as settings where potential savings are negligible. Results are strongly dependent on the genetic architecture of the disease, but we also show that these benefits should increase as the list of robustly associated markers for each disease grows and as large samples of genotyped individuals become available.
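
The link this abstract draws between event rates and trial size can be made concrete with a minimal sketch. It uses the standard normal-approximation sample-size formula for comparing two proportions, which is a textbook calculation and not the framework described in the paper. The event rates, the relative risk of 0.7, and the function name are hypothetical values chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def per_arm_sample_size(p_control, p_treated, alpha=0.05, power=0.80):
    """Participants per arm to detect p_control vs p_treated (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)


# Hypothetical: treatment multiplies the event rate by 0.7 in both settings.
RELATIVE_RISK = 0.7

# Unselected cohort with a 5% event rate during follow-up (assumed).
unenriched = per_arm_sample_size(0.05, 0.05 * RELATIVE_RISK)

# Cohort enriched for high-risk individuals, 20% event rate (assumed).
enriched = per_arm_sample_size(0.20, 0.20 * RELATIVE_RISK)

print(unenriched, enriched)
```

For a fixed relative effect, a higher baseline event rate widens the absolute difference between arms, so fewer participants are needed. That is the mechanism by which enrolling individuals with elevated clinical or genetic risk can shrink a prevention trial.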


Subjects
Type 1 Diabetes Mellitus/prevention & control , Type 2 Diabetes Mellitus/prevention & control , Genetic Testing/statistics & numerical data , Genetic Variation/genetics , Genotype , Macular Degeneration/prevention & control , Myocardial Infarction/prevention & control , Research Design , Clinical Trials as Topic , Cost-Benefit Analysis , Type 1 Diabetes Mellitus/genetics , Type 2 Diabetes Mellitus/genetics , Humans , Macular Degeneration/genetics , Statistical Models , Myocardial Infarction/genetics , Phenotype , Risk Factors
8.
Genome Res ; 21(6): 940-51, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21460063

ABSTRACT

New sequencing technologies allow genomic variation to be surveyed in much greater detail than previously possible. While detailed analysis of a single individual typically requires deep sequencing, when many individuals are sequenced it is possible to combine shallow sequence data across individuals to generate accurate calls in shared stretches of chromosome. Here, we show that, as progressively larger numbers of individuals are sequenced, increasingly accurate genotype calls can be generated for a given sequence depth. We evaluate the implications of low-coverage sequencing for complex trait association studies. We systematically compare study designs based on genotyping of tagSNPs, sequencing of many individuals at depths ranging between 2× and 30×, and imputation of variants discovered by sequencing a subset of individuals into the remainder of the sample. We show that sequencing many individuals at low depth is an attractive strategy for studies of complex trait genetics. For example, for disease-associated variants with frequency >0.2%, sequencing 3000 individuals at 4× depth provides similar power to deep sequencing of >2000 individuals at 30× depth but requires only ~20% of the sequencing effort. We also show low-coverage sequencing can be used to build a reference panel that can drive imputation into additional samples to increase power further. We provide guidance for investigators wishing to combine results from sequenced, genotyped, and imputed samples.
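
The "~20% of the sequencing effort" figure in this abstract follows from simple arithmetic, sketched below. The sketch assumes that sequencing effort is proportional to the number of individuals multiplied by depth, and it takes the lower bound of 2000 individuals for the deep-sequencing design. The function name is hypothetical.

```python
def sequencing_effort(n_individuals, depth):
    """Total effort in genome-equivalents, assuming effort = individuals x depth."""
    return n_individuals * depth


low_coverage = sequencing_effort(3000, 4)    # 3000 individuals at 4x
deep = sequencing_effort(2000, 30)           # 2000 individuals at 30x
effort_ratio = low_coverage / deep

print(f"{effort_ratio:.0%}")  # prints "20%"
```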


Subjects
Genetic Variation , Genome-Wide Association Study/methods , DNA Sequence Analysis/methods , Computational Biology/methods , Likelihood Functions , Markov Chains , Single Nucleotide Polymorphism/genetics
9.
J Comput Biol ; 17(3): 547-60, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20377463

ABSTRACT

Genome-wide association studies have proven to be a highly successful method for identification of genetic loci for complex phenotypes in both humans and model organisms. These large-scale studies rely on the collection of hundreds of thousands of single nucleotide polymorphisms (SNPs) across the genome. Standard high-throughput genotyping technologies capture only a fraction of the total genetic variation. Recent efforts have shown that it is possible to "impute" with high accuracy the genotypes of SNPs that are not collected in the study, provided that they are present in a reference data set which contains both SNPs collected in the study and other SNPs. Here we introduce a novel HMM-based technique to solve the imputation problem that addresses several shortcomings of existing methods. First, our method is adaptive, which lets it estimate population genetic parameters from the data and be applied to model organisms that have very different evolutionary histories. Compared to previous methods, our method is up to ten times more accurate on model organisms such as mouse. Second, our algorithm's memory usage scales with the number of collected markers rather than the number of known SNPs. This issue is very relevant due to the size of the reference data sets currently being generated. We compare our method to existing methods on mouse and human data sets, and show that on each it has comparable or better performance and much lower memory usage. The method is available for download at http://genetics.cs.ucla.edu/eminim.


Subjects
Algorithms , Haplotypes/genetics , Animals , Case-Control Studies , Diploidy , Humans , Markov Chains , Mice , Inbred Mice , Genetic Models , Single Nucleotide Polymorphism/genetics