1.
Nat Commun ; 12(1): 2845, 2021 05 14.
Article in English | MEDLINE | ID: mdl-33990588

ABSTRACT

Quantifying the overall magnitude of every single locus' genetic effect on the widely measured human phenome is a great challenge. We introduce a unified modelling technique that can consistently provide a total genetic contribution assessment (TGCA) of a gene or genetic variant without thresholding genetic association signals. Genome-wide TGCA in five UK Biobank phenotype domains highlights loci such as the HLA locus for medical conditions, the bone mineral density locus WNT16 for physical measures, and the skin tanning locus MC1R and smoking behaviour locus CHRNA3 for lifestyle. Tissue-specificity investigation reveals several tissues associated with total genetic contributions, including the brain tissues for mental health. Such associations are driven by tissue-specific gene expressions, which share genetic basis with the total genetic contributions. TGCA can provide a genome-wide atlas for the overall genetic contributions in each particular domain of human complex traits.


Subjects
Genome, Human; Models, Genetic; Biological Specimen Banks/statistics & numerical data; Computer Simulation; Genome-Wide Association Study/statistics & numerical data; Humans; Molecular Sequence Annotation/statistics & numerical data; Multifactorial Inheritance/genetics; Organ Specificity/genetics; Phenotype; Polymorphism, Single Nucleotide; Quantitative Trait Loci
2.
J Comput Biol ; 27(7): 1171-1179, 2020 07.
Article in English | MEDLINE | ID: mdl-31692371

ABSTRACT

Logistic regression is an effective tool in case-control analysis. With advances in high-throughput technology, the search for a fast and efficient method for fitting high-dimensional logistic regression has gained much interest. An empirical Bayes model for logistic regression is considered in this article. A spike-and-slab prior is used for variable selection, which plays a vital role in building an effective predictive model while keeping the model interpretable. To increase the power of variable selection, we incorporate biological knowledge through the Ising prior. We propose an iterated conditional modes/medians (ICM/M) algorithm to fit the logistic model; it has a computational advantage over Markov Chain Monte Carlo (MCMC) algorithms. The implementation of the ICM/M algorithm for both linear and logistic models can be found in the R package icmm, which is freely available on the Comprehensive R Archive Network (CRAN). Simulation studies were carried out to assess the performance of our method, with lasso and adaptive lasso as benchmarks. Overall, the simulation studies show that ICM/M outperforms the others in terms of the number of false positives and has competitive predictive ability. An application to a real data set from a Parkinson's disease study was also carried out for illustration. To identify important variables, our approach provides the flexibility to select variables based on local posterior probabilities while controlling the false discovery rate at a desired level, rather than relying only on regression coefficients.


Subjects
Algorithms; Case-Control Studies; Genomics/statistics & numerical data; Parkinson Disease/genetics; Bayes Theorem; Gene Frequency; Gene Regulatory Networks; Genome-Wide Association Study/statistics & numerical data; Humans; Logistic Models; Markov Chains; Polymorphism, Single Nucleotide
3.
Pac Symp Biocomput ; 24: 403-414, 2019.
Article in English | MEDLINE | ID: mdl-30963078

ABSTRACT

The proliferation of sequencing technologies in biomedical research has raised many new privacy concerns. These include concerns over the publication of aggregate data at a genomic scale (e.g. minor allele frequencies, regression coefficients). Methods such as differential privacy can overcome these concerns by providing strong privacy guarantees, but come at the cost of greatly perturbing the results of the analysis of interest. Here we investigate an alternative approach for achieving privacy-preserving aggregate genomic data sharing without the high cost to accuracy of differentially private methods. In particular, we demonstrate how other ideas from the statistical disclosure control literature (in particular, the idea of disclosure risk) can be applied to aggregate data to help ensure privacy. This is achieved by combining minimal amounts of perturbation with Bayesian statistics and Markov Chain Monte Carlo techniques. We test our technique on a GWAS dataset to demonstrate its utility in practice.
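
For context on why differentially private release can perturb results so strongly, here is a minimal sketch of the standard Laplace mechanism applied to a single allele frequency. It is not the method proposed in the abstract. The function names and the sensitivity bound (one diploid individual shifts a frequency by at most 1/n) are assumptions made for illustration.

```python
import random


def laplace_scale(n_individuals: int, epsilon: float) -> float:
    """Scale b of the Laplace noise for one released allele frequency.

    Assumption: replacing one diploid individual changes the allele count
    by at most 2 out of 2n alleles, so the frequency moves by at most 1/n.
    """
    sensitivity = 1.0 / n_individuals
    return sensitivity / epsilon


def noisy_frequency(freq: float, n_individuals: int, epsilon: float,
                    rng: random.Random) -> float:
    """Return freq plus Laplace(0, b) noise, clipped to [0, 1]."""
    b = laplace_scale(n_individuals, epsilon)
    # The difference of two i.i.d. exponentials with mean b is Laplace(0, b).
    noise = rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
    return min(1.0, max(0.0, freq + noise))


if __name__ == "__main__":
    rng = random.Random(1)
    print(laplace_scale(1000, 0.1))            # noise scale for one SNP
    print(noisy_frequency(0.02, 1000, 0.1, rng))
```

Under this sketch, releasing many SNPs from the same cohort would require splitting the privacy budget epsilon across them, which inflates the per-SNP noise. That is one way to read the accuracy cost the abstract refers to.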


Subjects
Genetic Privacy; Bayes Theorem; Computational Biology; Disclosure; Gene Frequency; Genome-Wide Association Study/statistics & numerical data; Genomics/statistics & numerical data; Humans; Information Dissemination; Markov Chains; Models, Statistical; Monte Carlo Method; Polymorphism, Single Nucleotide
4.
Heredity (Edinb) ; 123(3): 287-306, 2019 09.
Article in English | MEDLINE | ID: mdl-30858595

ABSTRACT

Power calculation prior to a genetic experiment can help investigators choose the optimal sample size to detect a quantitative trait locus (QTL). Without the guidance of power analysis, an experiment may be underpowered or overpowered. Either way will result in wasted resources. QTL mapping and genome-wide association studies (GWAS) are often conducted using a linear mixed model (LMM) with controls of population structure and polygenic background using markers of the whole genome. Power analysis for such a mixed model is often conducted via Monte Carlo simulations. In this study, we derived a non-centrality parameter for the Wald test statistic for association, which allows analytical power analysis. We show that large samples are not necessary to detect a biologically meaningful QTL, say explaining 5% of the phenotypic variance. Several R functions are provided so that users can perform power analysis to determine the minimum sample size required to detect a given QTL with a certain statistical power or calculate the statistical power with given sample size and known values of other population parameters.
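
The idea of analytical power from a non-centrality parameter can be sketched as follows. This is a simplified illustration, not the LMM derivation from the article: it assumes a plain linear regression with no polygenic background or kinship, for which the non-centrality parameter of the 1-df Wald test is approximately n·h²/(1−h²), where h² is the proportion of phenotypic variance explained by the QTL. The function names and the default significance level of 0.05 are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def wald_power(n: int, h2_qtl: float, alpha: float = 0.05) -> float:
    """Approximate power of a 1-df Wald test for a QTL.

    Assumption: simple linear model, so the non-centrality parameter is
    ncp = n * h2 / (1 - h2). A non-central chi-square with 1 df is the
    square of a normal variable with mean sqrt(ncp) and unit variance.
    """
    ncp = n * h2_qtl / (1.0 - h2_qtl)
    z = NormalDist()
    crit = z.inv_cdf(1.0 - alpha / 2.0)   # two-sided critical value
    shift = sqrt(ncp)
    return (1.0 - z.cdf(crit - shift)) + z.cdf(-crit - shift)


def min_sample_size(h2_qtl: float, target_power: float = 0.8,
                    alpha: float = 0.05) -> int:
    """Smallest n whose approximate power reaches target_power."""
    n = 2
    while wald_power(n, h2_qtl, alpha) < target_power:
        n += 1
    return n


if __name__ == "__main__":
    print(wald_power(200, 0.05))
    print(min_sample_size(0.05))
```

A genome-wide scan would normally use a much smaller alpha than the default here, which raises the required sample size accordingly.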


Subjects
Genome-Wide Association Study/statistics & numerical data; Genome; Models, Statistical; Quantitative Trait Loci; Quantitative Trait, Heritable; Genetic Markers; Genotype; Humans; Monte Carlo Method; Oryza/genetics; Phenotype; Sample Size
5.
Biometrics ; 75(1): 163-171, 2019 03.
Article in English | MEDLINE | ID: mdl-30039847

ABSTRACT

Assessing the statistical significance of risk factors when screening large numbers of 2×2 tables that cross-classify disease status with each type of exposure poses a challenging multiple testing problem. The problem is especially acute in large-scale genomic case-control studies. We develop a potentially more powerful and computationally efficient approach (compared with existing methods, including Bonferroni and permutation testing) by taking into account the presence of complex dependencies between the 2×2 tables. Our approach gains its power by exploiting Monte Carlo simulation from the estimated null distribution of a maximally selected log-odds ratio. We apply the method to case-control data from a study of a large collection of genetic variants related to the risk of early onset stroke.
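
To make the setting concrete, the sketch below computes the log-odds ratio of each 2×2 table and compares the largest absolute z-score against a Bonferroni critical value, which is one of the baseline methods the abstract names. It does not implement the authors' Monte Carlo approach. The function names, the cell layout [[a, b], [c, d]], and the 0.5 continuity correction for empty cells are assumptions.

```python
from math import log, sqrt
from statistics import NormalDist


def log_odds_ratio(a: float, b: float, c: float, d: float):
    """Return (log OR, Woolf standard error) for the table [[a, b], [c, d]]."""
    if 0 in (a, b, c, d):
        # Assumed convention: add 0.5 to every cell when any cell is empty.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    lor = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return lor, se


def bonferroni_max_test(tables, alpha: float = 0.05):
    """Compare the maximal |z| across tables with a Bonferroni threshold."""
    z_scores = []
    for table in tables:
        lor, se = log_odds_ratio(*table)
        z_scores.append(abs(lor / se))
    # Two-sided test, alpha split evenly across all tables.
    crit = NormalDist().inv_cdf(1 - alpha / (2 * len(tables)))
    z_max = max(z_scores)
    return z_max, crit, z_max > crit


if __name__ == "__main__":
    print(bonferroni_max_test([(10, 20, 20, 10), (15, 15, 15, 15)]))
```

Bonferroni treats the tables as if they were independent, so it becomes conservative when the tables are strongly dependent. That dependence is what the approach in the abstract aims to exploit.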


Subjects
Case-Control Studies; Data Interpretation, Statistical; Mass Screening/methods; Polymorphism, Single Nucleotide; Computer Simulation; Genome-Wide Association Study/statistics & numerical data; Humans; Mass Screening/statistics & numerical data; Monte Carlo Method; Risk Factors; Stroke/genetics; Time Factors
6.
Proc Natl Acad Sci U S A ; 115(22): E4970-E4979, 2018 05 29.
Article in English | MEDLINE | ID: mdl-29686100

ABSTRACT

Identifying causal effects in nonexperimental data is an enduring challenge. One proposed solution that recently gained popularity is the idea to use genes as instrumental variables [i.e., Mendelian randomization (MR)]. However, this approach is problematic because many variables of interest are genetically correlated, which implies the possibility that many genes could affect both the exposure and the outcome directly or via unobserved confounding factors. Thus, pleiotropic effects of genes are themselves a source of bias in nonexperimental data that would also undermine the ability of MR to correct for endogeneity bias from nongenetic sources. Here, we propose an alternative approach, genetic instrumental variable (GIV) regression, that provides estimates for the effect of an exposure on an outcome in the presence of pleiotropy. As a valuable byproduct, GIV regression also provides accurate estimates of the chip heritability of the outcome variable. GIV regression uses polygenic scores (PGSs) for the outcome of interest which can be constructed from genome-wide association study (GWAS) results. By splitting the GWAS sample for the outcome into nonoverlapping subsamples, we obtain multiple indicators of the outcome PGSs that can be used as instruments for each other and, in combination with other methods such as sibling fixed effects, can address endogeneity bias from both pleiotropy and the environment. In two empirical applications, we demonstrate that our approach produces reasonable estimates of the chip heritability of educational attainment (EA) and show that standard regression and MR provide upwardly biased estimates of the effect of body height on EA.
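
The general instrumental-variable idea behind this abstract can be illustrated with a toy simulation. This is a generic single-instrument estimator on synthetic data, not GIV regression itself, and it models only unobserved confounding, not pleiotropy. All names, the data-generating model, and the true effect of 0.5 are assumptions.

```python
import random


def cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def simulate(n=20000, beta=0.5, seed=0):
    """Synthetic data: instrument z, exposure x, outcome y, hidden confounder u."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)              # instrument, independent of u
        ui = rng.gauss(0, 1)              # unobserved confounder
        xi = zi + ui + rng.gauss(0, 1)    # exposure
        yi = beta * xi + ui + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


def ols_slope(x, y):
    """Naive regression slope; biased when u affects both x and y."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Instrumental-variable (ratio) estimate of the effect of x on y."""
    return cov(z, y) / cov(z, x)


if __name__ == "__main__":
    z, x, y = simulate()
    print("OLS:", ols_slope(x, y))
    print("IV: ", iv_slope(z, x, y))
```

In this toy model the naive slope is pulled upward by the confounder while the ratio estimate stays close to the true effect. If the instrument also affected the outcome directly, as under pleiotropy, the ratio estimate would be biased as well, which is the problem the abstract addresses.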


Subjects
Genetic Pleiotropy; Genetic Variation; Genome-Wide Association Study; Socioeconomic Factors; Body Height/physiology; Educational Status; Genome-Wide Association Study/standards; Genome-Wide Association Study/statistics & numerical data; Humans; Outcome Assessment, Health Care
7.
Stat Methods Med Res ; 27(3): 905-919, 2018 03.
Article in English | MEDLINE | ID: mdl-27215414

ABSTRACT

We propose a new genome-wide association test for mixed binary and continuous phenotypes that uses an efficient numerical method to estimate the empirical distribution of Fisher's combination statistic under the null hypothesis. Our simulation study shows that the proposed method controls the type I error rate and also maintains its power at the level of the permutation method. More importantly, the computational efficiency of the proposed method is much higher than that of the permutation method. The simulation results also indicate that the power of the test increases when the genetic effect increases, the minor allele frequency increases, and the correlation between responses decreases. The statistical analysis on the database of the Study of Addiction: Genetics and Environment demonstrates that the proposed method combining multiple phenotypes can increase the power of identifying markers that may not otherwise be chosen using marginal tests.


Subjects
Biostatistics/methods; Genome-Wide Association Study/methods; Substance-Related Disorders/genetics; Computer Simulation; Databases, Genetic/statistics & numerical data; Gene-Environment Interaction; Genome-Wide Association Study/statistics & numerical data; Humans; Models, Genetic; Models, Statistical; Monte Carlo Method; Multifactorial Inheritance; Multivariate Analysis; Phenotype
8.
Stat Methods Med Res ; 27(3): 933-954, 2018 03.
Article in English | MEDLINE | ID: mdl-27177884

ABSTRACT

In the field of aging research, family-based sampling study designs are commonly used to study the lifespans of long-lived family members. However, the specific sampling procedure should be carefully taken into account in order to avoid biases. This work is motivated by the Leiden Longevity Study, a family-based cohort of long-lived siblings. Families were invited to participate in the study if at least two siblings were 'long-lived', where 'long-lived' meant being older than 89 years for men or older than 91 years for women. As a result, more than 400 families were included in the study and followed for around 10 years. For estimation of marker-specific survival probabilities and correlations among life times of family members, delayed entry due to outcome-dependent sampling mechanisms has to be taken into account. We consider shared frailty models to model left-truncated correlated survival data. The treatment of left truncation in shared frailty models is still an open issue and the literature on this topic is scarce. We show that the current approaches provide, in general, biased estimates and we propose a new method to tackle this selection problem by applying a correction on the likelihood estimation by means of inverse probability weighting at the family level.


Subjects
Longevity/genetics; Survival Analysis; Aged, 80 and over; Apolipoprotein E2/genetics; Apolipoprotein E4/genetics; Bias; Biostatistics/methods; Cohort Studies; Computer Simulation; Family; Female; Genome-Wide Association Study/statistics & numerical data; Humans; Likelihood Functions; Male; Models, Statistical; Monte Carlo Method; Netherlands; Probability; Software
9.
Nature ; 544(7648): 20-22, 2017 04 05.
Article in English | MEDLINE | ID: mdl-28383002

Subjects
Black People/genetics; Genetics, Medical/trends; Genomics/trends; Precision Medicine/trends; Public Health/trends; Africa/epidemiology; Africa/ethnology; Alkynes; Anti-HIV Agents/adverse effects; Anti-HIV Agents/metabolism; Anti-HIV Agents/therapeutic use; Apolipoprotein L1; Apolipoproteins/genetics; Benzoxazines/administration & dosage; Benzoxazines/adverse effects; Benzoxazines/metabolism; Benzoxazines/therapeutic use; Charities/economics; Cyclopropanes; Diabetes Mellitus, Type 2/epidemiology; Diabetes Mellitus, Type 2/genetics; Genetic Predisposition to Disease; Genetics, Medical/economics; Genome-Wide Association Study/statistics & numerical data; Genomics/economics; HIV Infections/drug therapy; HIV Infections/epidemiology; HIV Infections/genetics; Heterocyclic Compounds, 3-Ring/therapeutic use; Humans; Kidney Diseases/economics; Kidney Diseases/epidemiology; Kidney Diseases/genetics; Kidney Diseases/therapy; Lipoproteins, HDL/genetics; National Institutes of Health (U.S.)/economics; Neoplasms/genetics; Neoplasms/radiotherapy; Neoplasms/therapy; Oxazines; Piperazines; Polymorphism, Single Nucleotide/genetics; Precision Medicine/economics; Public Health/economics; Pyridones; Reverse Transcriptase Inhibitors/adverse effects; Reverse Transcriptase Inhibitors/metabolism; Reverse Transcriptase Inhibitors/therapeutic use; Stroke/epidemiology; Stroke/genetics; Trypanosomiasis, African/epidemiology; Trypanosomiasis, African/genetics; United States; White People/genetics
10.
Genet Epidemiol ; 41(2): 145-151, 2017 Feb.
Article in English | MEDLINE | ID: mdl-27990689

ABSTRACT

Genome-wide association studies (GWAS) of common disease have been hugely successful in implicating loci that modify disease risk. The bulk of these associations have proven robust and reproducible, in part due to community adoption of statistical criteria for claiming significant genotype-phenotype associations. As the cost of sequencing continues to drop, assembling large samples in global populations is becoming increasingly feasible. Sequencing studies interrogate not only common variants, as was true for genotyping-based GWAS, but variation across the full allele frequency spectrum, yielding many more (independent) statistical tests. We sought to empirically determine genome-wide significance thresholds for various analysis scenarios. Using whole-genome sequence data, we simulated sequencing-based disease studies of varying sample size and ancestry. We determined that future sequencing efforts in >2,000 samples of European, Asian, or admixed ancestry should set genome-wide significance at approximately P = 5 × 10⁻⁹, and studies of African samples should apply a more stringent genome-wide significance threshold of P = 1 × 10⁻⁹. Adoption of a revised multiple test correction will be crucial in avoiding irreproducible claims of association.
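
One way to read these thresholds is through the Bonferroni relationship between a family-wise error rate and the number of independent tests. The sketch below assumes a family-wise alpha of 0.05, which the abstract does not state. The function names are hypothetical, and the authors derived their thresholds empirically by simulation rather than from this formula.

```python
def bonferroni_threshold(n_tests: float, alpha: float = 0.05) -> float:
    """Per-test P-value threshold for n_tests independent tests."""
    return alpha / n_tests


def implied_independent_tests(p_threshold: float, alpha: float = 0.05) -> float:
    """Number of independent tests that a per-test threshold corresponds to."""
    return alpha / p_threshold


if __name__ == "__main__":
    # Thresholds quoted in the abstract, under the assumed alpha of 0.05.
    for label, p in [("European/Asian/admixed", 5e-9), ("African", 1e-9)]:
        print(label, round(implied_independent_tests(p)))
```

Under that assumption, the quoted thresholds correspond to roughly ten million and fifty million effectively independent tests respectively.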


Subjects
Ethnicity/genetics; Genome, Human; Genome-Wide Association Study/methods; Genome-Wide Association Study/statistics & numerical data; High-Throughput Nucleotide Sequencing/statistics & numerical data; Metagenomics; Polymorphism, Single Nucleotide/genetics; Genotype; Global Health; High-Throughput Nucleotide Sequencing/methods; Humans
11.
J Bioinform Comput Biol ; 13(5): 1543001, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26493682

ABSTRACT

Automated assignment of protein function has received considerable attention in recent years for genome-wide study. With the rapid accumulation of genome sequencing data produced by high-throughput experimental techniques, the process of manually predicting functional properties of proteins has become increasingly cumbersome. Such large genomics data sets can only be annotated computationally. However, automated assignment of functions to unknown proteins is challenging due to its inherent difficulty and complexity. Previous studies have revealed that solving problems involving complicated objects with multiple semantic meanings using the multi-instance multi-label (MIML) framework is effective. For the protein function prediction problems, each protein object in nature may associate with distinct structural units (instances) and multiple functional properties (class labels) where each unit is described by an instance and each functional property is considered as a class label. Thus, it is convenient and natural to tackle the protein function prediction problem by using the MIML framework. In this paper, we propose a sparse Markov chain-based semi-supervised MIML method, called Sparse-Markov. A sparse transductive probability graph is constructed to encode the affinity information of the data based on an ensemble of Hausdorff distance metrics. Our goal is to exploit the affinity between protein objects in the sparse transductive probability graph to seek a sparse steady state probability of the Markov chain model to do protein function prediction, such that two proteins are given similar functional labels if they are close to each other in terms of an ensemble Hausdorff distance in the graph. Experimental results on seven real-world organism data sets covering three biological domains show that our proposed Sparse-Markov method is able to achieve better performance than four state-of-the-art MIML learning algorithms.


Subjects
Markov Chains; Proteins/chemistry; Proteins/physiology; Supervised Machine Learning; Algorithms; Animals; Computational Biology; Databases, Protein/statistics & numerical data; Genome-Wide Association Study/statistics & numerical data; Proteins/genetics
12.
Pac Symp Biocomput ; : 241-52, 2014.
Article in English | MEDLINE | ID: mdl-24297551

ABSTRACT

A striking finding from recent large-scale sequencing efforts is that the vast majority of variants in the human genome are rare and found within single populations or lineages. These observations hold important implications for the design of the next round of disease variant discovery efforts: if genetic variants that influence disease risk follow the same trend, then we expect to see population-specific disease associations that require large sample sizes for detection. To address this challenge, and due to the still prohibitive cost of sequencing large cohorts, researchers have developed a new generation of low-cost genotyping arrays that assay rare variation previously identified from large exome sequencing studies. Genotyping approaches rely not only on directly observing variants, but also on phasing and imputation methods that use publicly available reference panels to infer unobserved variants in a study cohort. Rare variant exome arrays are intentionally enriched for variants likely to be disease causing, and here we assay the ability of the first commercially available rare exome variant array (the Illumina Infinium HumanExome BeadChip) to also tag other potentially damaging variants not molecularly assayed. Using full sequence data from chromosome 22 from the phase I 1000 Genomes Project, we evaluate three methods for imputation (BEAGLE, MaCH-Admix, and SHAPEIT2/IMPUTE2) with the rare exome variant array under varied study panel sizes, reference panel sizes, and LD structures via population differences. We find that imputation is more accurate across both the genome and exome for common variant arrays than the next generation array for all allele frequencies, including rare alleles. We also find that imputation is the least accurate in African populations, and accuracy is substantially improved for rare variants when the same population is included in the reference panel. Depending on the goals of GWAS researchers, our results will aid budget decisions by helping determine whether money is best spent sequencing the genomes of smaller sample sizes, genotyping larger sample sizes with rare and/or common variant arrays and imputing SNPs, or some combination of the two.


Subjects
Exome; Genetic Variation; High-Throughput Nucleotide Sequencing/statistics & numerical data; Algorithms; Computational Biology; Genetics, Population/statistics & numerical data; Genome, Human; Genome-Wide Association Study/statistics & numerical data; Genotype; Human Genome Project; Humans; Oligonucleotide Array Sequence Analysis/statistics & numerical data; Polymorphism, Single Nucleotide; Precision Medicine/statistics & numerical data; Sample Size
13.
Comput Math Methods Med ; 2013: 235825, 2013.
Article in English | MEDLINE | ID: mdl-23737858

ABSTRACT

In family-based genetic association studies, it is possible to encounter missing genotype information for one of the parents. This leads to a study consisting of both case-parent trios and case-parent pairs. One of the approaches to this problem is the permutation-based combined transmission disequilibrium test statistic. However, it is still unknown how powerful this test statistic is with small sample sizes. In this paper, a simulation study is carried out to estimate the power and false positive rate of this test across different sample sizes for a family-based genome-wide association study. It is observed that a statistical power of over 80% and a reasonable false positive rate estimate can be achieved even with a combination of 50 trios and 30 pairs when 2% of the SNPs are assumed to be associated. Moreover, even smaller samples provide high power when smaller percentages of SNPs are associated with the disease.
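
For readers unfamiliar with the transmission disequilibrium test, the classic trio-only version can be sketched as below. This is the standard McNemar-type statistic with an asymptotic chi-square P-value, not the permutation-based combined statistic for trios and pairs studied in the article. The function name and argument names are hypothetical.

```python
from math import erfc, sqrt


def tdt(transmitted: int, untransmitted: int):
    """Classic TDT for one SNP from complete case-parent trios.

    transmitted / untransmitted: counts of heterozygous parents who did or
    did not pass the test allele to the affected child.
    Returns (chi-square statistic with 1 df, asymptotic P-value).
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    stat = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x/2)).
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    print(tdt(60, 40))
```

With small samples such as those considered in the abstract, the asymptotic P-value used here may be unreliable, which is one motivation for permutation-based versions.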


Subjects
Genome-Wide Association Study/statistics & numerical data; Computational Biology; Computer Simulation; Family; Female; Genotype; Humans; Male; Models, Statistical; Monte Carlo Method; Polymorphism, Single Nucleotide; Sample Size
14.
Methods Mol Biol ; 1019: 237-74, 2013.
Article in English | MEDLINE | ID: mdl-23756894

ABSTRACT

Bayesian multiple-regression methods are being successfully used for genomic prediction and selection. These regression models simultaneously fit many more markers than the number of observations available for the analysis. Thus, the Bayes theorem is used to combine prior beliefs of marker effects, which are expressed in terms of prior distributions, with information from data for inference. Often, the analyses are too complex for closed-form solutions and Markov chain Monte Carlo (MCMC) sampling is used to draw inferences from posterior distributions. This chapter describes how these Bayesian multiple-regression analyses can be used for GWAS. In most GWAS, false positives are controlled by limiting the genome-wise error rate, which is the probability of one or more false-positive results, to a small value. As the number of tests in GWAS is very large, this results in very low power. Here we show how in Bayesian GWAS false positives can be controlled by limiting the proportion of false-positive results among all positives to some small value. The advantage of this approach is that the power of detecting associations is not inversely related to the number of markers.
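
The control of false positives described here can be sketched with posterior probabilities of association. The example assumes each marker (or genomic window) already has a posterior probability of being associated, for example estimated from MCMC samples. It then keeps the largest top-ranked set whose expected proportion of false positives stays below a chosen limit. The function name, the input format, and the 0.05 default are assumptions, not the chapter's software.

```python
def select_by_posterior(post_probs, max_pfp: float = 0.05):
    """Return indices of markers declared associated.

    post_probs: posterior probability of association for each marker.
    The expected proportion of false positives in a selected set is the
    mean of (1 - posterior probability) over that set.
    """
    order = sorted(range(len(post_probs)),
                   key=lambda i: post_probs[i], reverse=True)
    selected, error_sum = [], 0.0
    for i in order:
        new_sum = error_sum + (1.0 - post_probs[i])
        if new_sum / (len(selected) + 1) > max_pfp:
            break  # adding this marker would exceed the allowed proportion
        selected.append(i)
        error_sum = new_sum
    return selected


if __name__ == "__main__":
    probs = [0.99, 0.20, 0.97, 0.60, 0.95]
    print(select_by_posterior(probs))
```

The selection rule depends on the posterior probabilities of the chosen markers and not on the total number of markers tested, which matches the advantage stated at the end of the abstract.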


Subjects
Genome-Wide Association Study/statistics & numerical data; Software; Bayes Theorem; Genomics; Genotype; Humans; Markov Chains; Monte Carlo Method; Phenotype; Research Design
15.
Methods Mol Biol ; 1019: 347-80, 2013.
Article in English | MEDLINE | ID: mdl-23756899

ABSTRACT

We herein present a haplotype-based method to perform genome-wide association studies. The method relies on hidden Markov models to describe haplotypes from a population as a mosaic of a set of ancestral haplotypes. For a given position in the genome, haplotypes deriving from the same ancestral haplotype are also likely to carry the same risk alleles. Therefore, the model can be used in several applications such as haplotype reconstruction, imputation, association studies or genomic predictions. We then illustrate the model with two applications: the fine-mapping of a QTL affecting live weight in cattle and association studies in a stratified cattle population. Both applications show the potential of the method and the high linkage disequilibrium between ancestral haplotypes and causative variants.


Subjects
Genome-Wide Association Study/statistics & numerical data; Genome; Haplotypes; Models, Genetic; Quantitative Trait Loci/genetics; Software; Alleles; Animals; Body Weight/genetics; Cattle; Chromosome Mapping; Humans; Linkage Disequilibrium; Markov Chains; Phenotype
16.
Gene ; 510(1): 87-92, 2012 Nov 15.
Article in English | MEDLINE | ID: mdl-22951808

ABSTRACT

In genome-wide association studies (GWAS), multiple diseases with shared controls is one of the case-control study designs. If data obtained from these studies are appropriately analyzed, this design can have several advantages such as improving statistical power in detecting associations and reducing the time and cost in the data collection process. In this paper, we propose a study design for GWAS which involves multiple diseases but without controls. We also propose corresponding statistical data analysis strategy for GWAS with multiple diseases but no controls. Through a simulation study, we show that the statistical association test with the proposed study design is more powerful than the test with single disease sharing common controls, and it has comparable power to the overall test based on the whole dataset including the controls. We also apply the proposed method to a real GWAS dataset to illustrate the methodologies and the advantages of the proposed design. Some possible limitations of this study design and testing method and their solutions are also discussed. Our findings indicate that the proposed study design and statistical analysis strategy could be more efficient than the usual case-control GWAS as well as those with shared controls.


Subjects
Genetic Predisposition to Disease/genetics; Genome, Human/genetics; Genome-Wide Association Study/methods; Polymorphism, Single Nucleotide; Algorithms; Case-Control Studies; Chi-Square Distribution; Computer Simulation; Control Groups; Genome-Wide Association Study/statistics & numerical data; Humans; Monte Carlo Method; Research Design
17.
Am J Epidemiol ; 175(8): 739-49, 2012 Apr 15.
Article in English | MEDLINE | ID: mdl-22427610

ABSTRACT

There has been a steep increase in the number of meta-analyses of genome-wide association (GWA) studies aimed at identifying genetic variants with increasingly smaller effects, but pressure to publish findings of new genetic associations has limited the time available for careful consideration of all of their methodological aspects. The authors surveyed the literature (2007-2010) to provide empirical evidence on the methods used in GWA meta-analyses, including their organization, requirements about the uniformity of methods used in primary studies, methods for data pooling, investigation of between-study heterogeneity, and quality of reporting. This review showed that a great variety of methods are being used, but the rationale for their choice is often unclear. It also highlights how important methodological aspects have received insufficient attention, potentially leading to missed opportunities for improving gene discovery and characterization. Evaluation of power to replicate findings was inadequate, and the number of variants selected for replication was not associated with replication sample size. A low proportion of GWA meta-analyses investigated the presence and magnitude of heterogeneity, even when there was little uniformity in methods used in primary studies. More methodological work is required before clear guidance can be offered as to optimal methods or tradeoffs between alternative methods. However, there is a clear need for guidelines for reporting the results of GWA meta-analyses.


Subjects
Genome-Wide Association Study; Meta-Analysis as Topic; Data Interpretation, Statistical; Genome-Wide Association Study/methods; Genome-Wide Association Study/statistics & numerical data; Guidelines as Topic; Humans; Polymorphism, Single Nucleotide; Research Report; Sample Size
18.
Genet Epidemiol ; 35(8): 861-6, 2011 Dec.
Article in English | MEDLINE | ID: mdl-22006681

ABSTRACT

We describe implementation of a set-based method to assess the significance of findings from genomewide association study data. Our method, implemented in PLINK, is based on theoretical approximation of Fisher's statistics such that the combination of P-values at a gene or across a pathway is carried out in a manner that accounts for the correlation structure, or linkage disequilibrium, between single nucleotide polymorphisms. We compare our method to a permutation-based product of P-values approach and show a typical correlation in excess of 0.98 for a number of comparisons. The method gives Type I error rates that are less than or equal to the corresponding nominal significance levels, making it robust to the effects of false positives. We show that in broadly similar populations, reference data sets of markers are an appropriate substrate for deriving marker-marker linkage disequilibrium (LD), negating the need to access individual level genotypes, greatly facilitating its generic applicability. We show that the method is thus robust to LD-associated bias and has equivalent performance to permutation-based methods, with a significantly shorter runtime. This is particularly relevant at a time of increasing public availability of significantly larger genetic data sets and should go a long way to assist in the rapid analysis of these data sets.
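
As background, Fisher's combination statistic in its textbook form can be sketched as follows. This version assumes the P-values are independent and therefore does not include the LD correction that is the point of the article. The function name is hypothetical and nothing here reflects PLINK's actual implementation.

```python
from math import exp, log


def fisher_combined_p(p_values):
    """Fisher's method for k independent P-values, each in (0, 1].

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    Returns (X, combined P-value).
    """
    k = len(p_values)
    stat = -2.0 * sum(log(p) for p in p_values)
    half = stat / 2.0
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i      # builds (X/2)^i / i! incrementally
        total += term
    return stat, exp(-half) * total


if __name__ == "__main__":
    print(fisher_combined_p([0.05, 0.05]))
```

When SNPs within a gene are in linkage disequilibrium their P-values are positively correlated, and this independence-based reference distribution tends to give P-values that are too small. That is the bias the set-based method described above is designed to avoid.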


Subjects
Genome-Wide Association Study/statistics & numerical data; Linkage Disequilibrium; Polymorphism, Single Nucleotide; Schizophrenia/genetics; Humans; United Kingdom
19.
PLoS Genet ; 7(4): e1001371, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21541012

ABSTRACT

While genome-wide association studies (GWAS) have primarily examined populations of European ancestry, more recent studies often involve additional populations, including admixed populations such as African Americans and Latinos. In admixed populations, linkage disequilibrium (LD) exists both at a fine scale in ancestral populations and at a coarse scale (admixture-LD) due to chromosomal segments of distinct ancestry. Disease association statistics in admixed populations have previously considered SNP association (LD mapping) or admixture association (mapping by admixture-LD), but not both. Here, we introduce a new statistical framework for combining SNP and admixture association in case-control studies, as well as methods for local ancestry-aware imputation. We illustrate the gain in statistical power achieved by these methods by analyzing data of 6,209 unrelated African Americans from the CARe project genotyped on the Affymetrix 6.0 chip, in conjunction with both simulated and real phenotypes, as well as by analyzing the FGFR2 locus using breast cancer GWAS data from 5,761 African-American women. We show that, at typed SNPs, our method yields an 8% increase in statistical power for finding disease risk loci compared to the power achieved by standard methods in case-control studies. At imputed SNPs, we observe an 11% increase in statistical power for mapping disease loci when our local ancestry-aware imputation framework and the new scoring statistic are jointly employed. Finally, we show that our method increases statistical power in regions harboring the causal SNP in the case when the causal SNP is untyped and cannot be imputed. Our methods and our publicly available software are broadly applicable to GWAS in admixed populations.


Subjects
Black or African American/genetics; Breast Neoplasms/genetics; Genome, Human; Genome-Wide Association Study/methods; Receptor, Fibroblast Growth Factor, Type 2/genetics; Black or African American/statistics & numerical data; Algorithms; Chromosome Mapping; Coronary Disease/genetics; Diabetes Mellitus, Type 2/genetics; Female; Gene Frequency; Genetic Variation; Genetics, Population/statistics & numerical data; Genome-Wide Association Study/statistics & numerical data; Genotype; Humans; Linkage Disequilibrium; Male; Odds Ratio; Phenotype; Polymorphism, Single Nucleotide; Principal Component Analysis; Software