Results 1 - 20 of 549
3.
PLoS Genet ; 15(4): e1008009, 2019 04.
Article in English | MEDLINE | ID: mdl-30951530

ABSTRACT

Recent and classical work has revealed biologically and medically significant subtypes in complex diseases and traits. However, relevant subtypes are often unknown, unmeasured, or actively debated, making automated statistical approaches to subtype definition valuable. We propose reverse GWAS (RGWAS) to identify and validate subtypes using genetics and multiple traits: while GWAS seeks the genetic basis of a given trait, RGWAS seeks to define trait subtypes with distinct genetic bases. Unlike existing approaches relying on off-the-shelf clustering methods, RGWAS uses a novel decomposition, MFMR, to model covariates, binary traits, and population structure. We use extensive simulations to show that modelling these features can be crucial for power and calibration. We validate RGWAS in practice by recovering a recently discovered stress subtype in major depression. We then show the utility of RGWAS by identifying three novel subtypes of metabolic traits. We biologically validate these metabolic subtypes with SNP-level tests and a novel polygenic test: the former recover known metabolic GxE SNPs; the latter suggests subtypes may explain substantial missing heritability. Crucially, statins, which are widely prescribed and theorized to increase diabetes risk, have opposing effects on blood glucose across metabolic subtypes, suggesting the subtypes have potential translational value.


Subjects
Genome-Wide Association Study/methods , Genetic Models , Multifactorial Inheritance , Phenotype , Algorithms , Blood Glucose/drug effects , Blood Glucose/genetics , Cluster Analysis , Computer Simulation , Coronary Disease/blood , Coronary Disease/drug therapy , Coronary Disease/genetics , Major Depressive Disorder/classification , Major Depressive Disorder/genetics , Type 2 Diabetes Mellitus/blood , Type 2 Diabetes Mellitus/drug therapy , Type 2 Diabetes Mellitus/genetics , Genome-Wide Association Study/statistics & numerical data , Humans , Hydroxymethylglutaryl-CoA Reductase Inhibitors/pharmacology , Lipids/blood , Single Nucleotide Polymorphism , Prediabetic State/genetics , Quantitative Trait Loci
4.
PLoS Genet ; 15(3): e1007530, 2019 03.
Article in English | MEDLINE | ID: mdl-30875371

ABSTRACT

A common complementary strategy in Genome-Wide Association Studies (GWAS) is to perform Gene Set Analysis (GSA), which tests for the association between one phenotype of interest and an entire set of Single Nucleotide Polymorphisms (SNPs) residing in selected genes. While there exist many tools for performing GSA, popular methods often include a number of ad-hoc steps that are difficult to justify statistically, provide complicated interpretations based on permutation inference, and demonstrate poor operating characteristics. Additionally, the lack of gold standard gene set lists can produce misleading results and create difficulties in comparing analyses even across the same phenotype. We introduce the Generalized Berk-Jones (GBJ) statistic for GSA, a permutation-free parametric framework that offers asymptotic power guarantees in certain set-based testing settings. To adjust for confounding introduced by different gene set lists, we further develop a GBJ step-down inference technique that can discriminate between gene sets driven to significance by single genes and those demonstrating group-level effects. We compare GBJ to popular alternatives through simulation and re-analysis of summary statistics from a large breast cancer GWAS, and we show how GBJ can increase power by incorporating information from multiple signals in the same gene. In addition, we illustrate how breast cancer pathway analysis can be confounded by the frequency of FGFR2 in pathway lists. Our approach is further validated on two other datasets of summary statistics generated from GWAS of height and schizophrenia.


Subjects
Genome-Wide Association Study/statistics & numerical data , Body Height/genetics , Breast Neoplasms/genetics , Chromosome Mapping/statistics & numerical data , Computational Biology/methods , Computer Simulation , Genetic Databases , Female , Gene Regulatory Networks , Humans , Genetic Models , Statistical Models , Single Nucleotide Polymorphism , Fibroblast Growth Factor Receptor Type 2/genetics , Schizophrenia/genetics
5.
Pac Symp Biocomput ; 24: 184-195, 2019.
Article in English | MEDLINE | ID: mdl-30864321

ABSTRACT

Genetic variations of the human genome are linked to many disease phenotypes. While whole-genome sequencing and genome-wide association studies (GWAS) have uncovered a number of genotype-phenotype associations, their functional interpretation remains challenging given most single nucleotide polymorphisms (SNPs) fall into the non-coding region of the genome. Advances in chromatin immunoprecipitation sequencing (ChIP-seq) have made large-scale repositories of epigenetic data available, allowing investigation of coordinated mechanisms of epigenetic markers and transcriptional regulation and their influence on biological function. To address this, we propose SNPs2ChIP, a method to infer biological functions of non-coding variants through unsupervised statistical learning methods applied to publicly-available epigenetic datasets. We systematically characterized latent factors by applying singular value decomposition to ChIP-seq tracks of lymphoblastoid cell lines, and annotated the biological function of each latent factor using the genomic region enrichment analysis tool. Using these annotated latent factors as reference, we developed SNPs2ChIP, a pipeline that takes genomic region(s) as an input, identifies the relevant latent factors with quantitative scores, and returns them along with their inferred functions. As a case study, we focused on systemic lupus erythematosus and demonstrated our method's ability to infer relevant biological function. We systematically applied SNPs2ChIP on publicly available datasets, including known GWAS associations from the GWAS catalogue and ChIP-seq peaks from a previously published study. Our approach to leverage latent patterns across genome-wide epigenetic datasets to infer the biological function will advance understanding of the genetics of human diseases by accelerating the interpretation of non-coding genomes.


Subjects
Chromatin Immunoprecipitation/statistics & numerical data , Single Nucleotide Polymorphism , Algorithms , Cell Line , Computational Biology/methods , Nucleic Acid Databases/statistics & numerical data , Genetic Epigenesis , Genetic Association Studies , Human Genome , Genome-Wide Association Study/statistics & numerical data , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Systemic Lupus Erythematosus/genetics , Lymphocytes/metabolism , Calcitriol Receptors/genetics
6.
Pac Symp Biocomput ; 24: 76-87, 2019.
Article in English | MEDLINE | ID: mdl-30864312

ABSTRACT

Noncoding single nucleotide polymorphisms (SNPs) and their target genes are important components of the heritability of diseases and other polygenic traits. Identifying these SNPs and target genes could potentially reveal new molecular mechanisms and advance precision medicine. For polygenic traits, genome-wide association studies (GWAS) are preferred tools for identifying trait-associated regions. However, identifying causal noncoding SNPs within such regions is a difficult problem in computational biology. The DNA sequence context of a noncoding SNP is well-established as an important source of information that is beneficial for discriminating functional from nonfunctional noncoding SNPs. We describe the use of a deep residual network (ResNet)-based model, entitled Res2s2aM, that fuses flanking DNA sequence information with additional SNP annotation information to discriminate functional from nonfunctional noncoding SNPs. On a ground-truth set of disease-associated SNPs compiled from the Genome-wide Repository of Associations between SNPs and Phenotypes (GRASP) database, Res2s2aM improves the prediction accuracy of functional SNPs significantly in comparison to models based only on sequence information as well as a leading tool for post-GWAS noncoding SNP prioritization (RegulomeDB).


Subjects
Deep Learning , Neural Networks (Computer) , Single Nucleotide Polymorphism , Algorithms , Computational Biology , Nucleic Acid Databases/statistics & numerical data , Genome-Wide Association Study/statistics & numerical data , Humans , Genetic Models , Molecular Sequence Annotation , DNA Sequence Analysis
7.
Hum Genet ; 138(4): 307-326, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30820706

ABSTRACT

Genome-wide association studies have reported 56 independently associated colorectal cancer (CRC) risk variants, most of which are non-coding and believed to exert their effects by modulating gene expression. The computational method PrediXcan uses cis-regulatory variant predictors to impute expression and perform gene-level association tests in GWAS without directly measured transcriptomes. In this study, we used reference datasets from colon (n = 169) and whole blood (n = 922) transcriptomes to test CRC association with genetically determined expression levels in a genome-wide analysis of 12,186 cases and 14,718 controls. Three novel associations were discovered from colon transverse models at FDR ≤ 0.2 and further evaluated in an independent replication including 32,825 cases and 39,933 controls. After adjusting for multiple comparisons, we found statistically significant associations using colon transcriptome models with TRIM4 (discovery P = 2.2 × 10⁻⁴, replication P = 0.01), and PYGL (discovery P = 2.3 × 10⁻⁴, replication P = 6.7 × 10⁻⁴). Interestingly, both genes encode proteins that influence redox homeostasis and are related to cellular metabolic reprogramming in tumors, implicating a novel CRC pathway linked to cell growth and proliferation. Defining CRC risk regions as one megabase up- and downstream of one of the 56 independent risk variants, we defined 44 non-overlapping CRC-risk regions. Among these risk regions, we identified genes associated with CRC (P < 0.05) in 34/44 CRC-risk regions. Importantly, CRC association was found for two genes in the previously reported 2q25 locus, CXCR1 and CXCR2, which are potential cancer therapeutic targets. These findings provide strong candidate genes to prioritize for subsequent laboratory follow-up of GWAS loci.
This study is the first to implement PrediXcan in a large colorectal cancer study and findings highlight the utility of integrating transcriptome data in GWAS for discovery of, and biological insight into, risk loci.
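
The risk-region definition in the abstract above (a window of one megabase on each side of every risk variant, with overlapping windows collapsed into non-overlapping regions) can be illustrated with a minimal sketch. The function name, chromosome labels, and positions below are hypothetical and are not taken from the study; this is not the authors' code.

```python
def merge_risk_regions(variants, flank=1_000_000):
    """Collapse +/- `flank` bp windows around variants into non-overlapping regions.

    variants: list of (chromosome, position) tuples (hypothetical input format).
    Returns a list of (chromosome, start, end) tuples sorted by chromosome and start.
    """
    # Build one window per variant, clipping the start at position 0.
    windows = sorted((chrom, max(0, pos - flank), pos + flank) for chrom, pos in variants)
    merged = []
    for chrom, start, end in windows:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start <= last[2]:
            # Overlaps the previous region on the same chromosome: extend it.
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Made-up positions: the first two windows overlap and merge into one region.
example_variants = [("chr1", 5_000_000), ("chr1", 6_500_000),
                    ("chr1", 20_000_000), ("chr2", 500_000)]
regions = merge_risk_regions(example_variants)
print(regions)
```

Under this kind of merging, the number of regions is smaller than the number of variants whenever variants lie within two megabases of each other, which is consistent with 56 variants yielding 44 regions.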


Subjects
Colorectal Neoplasms/diagnosis , Colorectal Neoplasms/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Single Nucleotide Polymorphism , Case-Control Studies , Colorectal Neoplasms/epidemiology , Gene Expression , Neoplastic Gene Expression Regulation , Gene Frequency , Genome-Wide Association Study/statistics & numerical data , Humans , Predictive Value of Tests , Prognosis , Risk Factors
8.
Hum Genet ; 138(4): 293-305, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30840129

ABSTRACT

The understanding that differences in biological epistasis may impact disease risk, diagnosis, or disease management stands in wide contrast to the unavailability of widely accepted large-scale epistasis analysis protocols. Several choices in the analysis workflow will impact false-positive and false-negative rates. One of these choices relates to the exploitation of particular modelling or testing strategies. The strengths and limitations of these need to be well understood, as well as the contexts in which these hold. This will contribute to determining the potentially complementary value of epistasis detection workflows and is expected to increase replication success with biological relevance. In this contribution, we take a recently introduced regression-based epistasis detection tool as a leading example to review the key elements that need to be considered to fully appreciate the value of analytical epistasis detection performance assessments. We point out unresolved hurdles and give our perspectives towards overcoming these.


Subjects
Statistical Data Interpretation , Genetic Epistasis/physiology , Genome-Wide Association Study/statistics & numerical data , Culture , False Positive Reactions , Genetic Testing/methods , Genetic Testing/statistics & numerical data , Genome-Wide Association Study/methods , Humans , Single Nucleotide Polymorphism
9.
PLoS Genet ; 15(2): e1007978, 2019 02.
Article in English | MEDLINE | ID: mdl-30735486

ABSTRACT

Linear mixed effect models are powerful tools used to account for population structure in genome-wide association studies (GWASs) and estimate the genetic architecture of complex traits. However, fully-specified models are computationally demanding and common simplifications often lead to reduced power or biased inference. We describe Grid-LMM (https://github.com/deruncie/GridLMM), an extendable algorithm for repeatedly fitting complex linear models that account for multiple sources of heterogeneity, such as additive and non-additive genetic variance, spatial heterogeneity, and genotype-environment interactions. Grid-LMM can compute approximate (yet highly accurate) frequentist test statistics or Bayesian posterior summaries at a genome-wide scale in a fraction of the time compared to existing general-purpose methods. We apply Grid-LMM to two types of quantitative genetic analyses. The first is focused on accounting for spatial variability and non-additive genetic variance while scanning for QTL; and the second aims to identify gene expression traits affected by non-additive genetic variation. In both cases, modeling multiple sources of heterogeneity leads to new discoveries.


Subjects
Algorithms , Linear Models , Genetic Models , Animals , Arabidopsis/genetics , Arabidopsis/growth & development , Bayes Theorem , Body Weight/genetics , Computer Simulation , Flowers/genetics , Flowers/growth & development , Gene-Environment Interaction , Genetic Markers , Genetic Variation , Genome-Wide Association Study/statistics & numerical data , Humans , Mice , Quantitative Trait Loci
10.
Genet Epidemiol ; 43(4): 356-364, 2019 06.
Article in English | MEDLINE | ID: mdl-30657194

ABSTRACT

When interpreting genome-wide association peaks, it is common to annotate each peak by searching for genes with plausible relationships to the trait. However, "all that glitters is not gold": one might interpret apparent patterns in the data as plausible even when the peak is a false positive. Accordingly, we sought to see how human annotators interpreted association results containing a mixture of peaks from both the original trait and a genetically uncorrelated "synthetic" trait. Two of us prepared a mix of original and synthetic peaks of three significance categories from five different scans along with relevant literature search results, and then we all annotated these regions. Three annotators also scored the strength of evidence connecting each peak to the scanned trait and the likelihood of further studying that region. While annotators found original peaks to have stronger evidence (Bonferroni-corrected p = 0.017) and higher likelihood of further study (Bonferroni-corrected p = 0.006) than synthetic peaks, annotators often made convincing connections between the synthetic peaks and the original trait, finding these connections 55% of the time. These results show that it is not difficult for annotators to make convincing connections between synthetic association signals and genes found in those regions.


Subjects
Data Curation , Statistical Data Interpretation , False Positive Reactions , Genome-Wide Association Study/statistics & numerical data , Data Curation/methods , Data Curation/standards , Data Curation/statistics & numerical data , Deception , Genome-Wide Association Study/standards , Humans , Phenotype , Single Nucleotide Polymorphism
11.
BMC Bioinformatics ; 20(1): 22, 2019 Jan 11.
Article in English | MEDLINE | ID: mdl-30634901

ABSTRACT

BACKGROUND: Selection of interesting regions from genome-wide association studies (GWAS) is typically performed by eyeballing Manhattan plots. This is no longer possible with thousands of different phenotypes. There is a need for tools that can automatically detect genomic regions that correspond to what the experienced researcher perceives as peaks worthy of further study. RESULTS: We developed Manhattan Harvester, a tool designed for "peak extraction" from GWAS summary files and computation of parameters characterizing various aspects of individual peaks. We present the algorithms used and a model for creating a general quality score that evaluates peaks similarly to a human researcher. Our tool Cropper utilizes a graphical interface for inspecting, cropping and subsetting Manhattan plot regions. Cropper is used to validate and visualize the regions detected by Manhattan Harvester. CONCLUSIONS: We conclude that our tools fill the current void in automatically screening large numbers of GWAS output files in batch mode. The interesting regions are detected and quantified by various parameters by Manhattan Harvester. Cropper offers graphical tools for in-depth inspection of the regions. The tools are open source and freely available.


Subjects
Computer Graphics , Statistical Data Interpretation , Data Mining/methods , Genome-Wide Association Study/statistics & numerical data , Genomics/methods , Software , Humans , Phenotype , Single Nucleotide Polymorphism
12.
Acta Oncol ; 58(2): 135-146, 2019 Feb.
Article in English | MEDLINE | ID: mdl-30606073

ABSTRACT

INTRODUCTION: Heredity is the most important risk factor for breast cancer. About 15-20% of breast cancer is familial, referring to affected women who have one or more first- or second-degree relatives with the disease. The heritable component in these families is substantial, especially in families with aggregation of breast cancer with low age at onset. IDENTIFYING BREAST CANCER SUSCEPTIBILITY GENES: Since the discovery of the highly penetrant autosomal dominant susceptibility genes BRCA1 and BRCA2 in the 1990s, several more breast cancer genes that confer a moderate to high risk of breast cancer have been identified. Furthermore, during the last decade, advances in genomic technologies have led to large-scale genotyping in genome-wide association studies that have identified a considerable number of common low-penetrance loci. In total, the high-risk genes BRCA1, BRCA2, TP53, STK11, CD1 and PTEN account for approximately 20% of the familial risk. Moderate-risk variants account for up to 5% of the inherited familial risk. The more than 180 identified low-risk loci explain 18% of the familial risk. Altogether, more than half of the genetic background in familial breast cancer remains unclear. Other genes and low-risk loci that explain part of the remaining fraction will probably be identified. CLINICAL ASPECTS AND FUTURE PERSPECTIVES: Definitive clinical recommendations can be drawn only for carriers of germline variants in a limited number of high- and moderate-risk genes for which an association with breast cancer has been established. Future progress in evaluating previously identified breast cancer candidate variants and low-risk loci, as well as exploring new ones, can play an important role in improving individual risk prediction in familial breast cancer.


Subjects
Breast Neoplasms/genetics , Genetic Background , Genetic Predisposition to Disease , Breast Neoplasms/classification , Breast Neoplasms/epidemiology , DNA Mutational Analysis/statistics & numerical data , Female , Neoplasm Genes , Genome-Wide Association Study/statistics & numerical data , Humans
13.
BMC Musculoskelet Disord ; 20(1): 24, 2019 Jan 15.
Article in English | MEDLINE | ID: mdl-30646882

ABSTRACT

BACKGROUND: Rare variants of HSPG2 have recently been reported as a potential contributor to susceptibility to adolescent idiopathic scoliosis (AIS) in Caucasians. A replication study in a different population is warranted to validate the role of HSPG2 in AIS. The aim of this study was to determine the association between HSPG2 and AIS in Chinese patients and to further investigate its influence on the phenotype of the patients. METHODS: The SNV p.Asn786Ser of HSPG2 was genotyped in 1752 patients and 1584 normal controls using multiple ligase detection reactions. The mRNA expression of HSPG2 in the paraspinal muscles was quantified for 90 patients and 26 controls. Student's t test was used for the inter-group comparison of HSPG2 expression. The relationship between HSPG2 expression and the curve magnitude of the patients was analyzed by Pearson correlation analysis. RESULTS: No case of mutation in the reported SNV p.Asn786Ser of HSPG2 was found in our cohort. The mRNA expression of HSPG2 in patients was comparable with that in the controls (0.0016 ± 0.0013 vs. 0.0019 ± 0.0012, p = 0.29). 42 patients with curve magnitude > 60 degrees were assigned to the severe curve group. The other 58 patients were assigned to the moderate curve group. These two groups were found to have comparable HSPG2 expression (0.0015 ± 0.0011 vs. 0.0017 ± 0.0014, p = 0.57). There was no significant correlation between the expression level of HSPG2 and the curve severity (r = 0.131, p = 0.71). CONCLUSIONS: The HSPG2 gene was not associated with the susceptibility or the phenotypes of AIS in the Chinese population. The whole HSPG2 gene can be sequenced in more AIS patients to identify potentially causative mutations.


Subjects
Asian Continental Ancestry Group/genetics , Genetic Predisposition to Disease , Heparan Sulfate Proteoglycans/genetics , Scoliosis/genetics , Adolescent , Adult , Case-Control Studies , Child , Cohort Studies , DNA Mutational Analysis , Female , Genome-Wide Association Study/statistics & numerical data , Glycosides/genetics , Humans , Single Nucleotide Polymorphism , Sterols , Young Adult
14.
PLoS Genet ; 15(1): e1007889, 2019 01.
Article in English | MEDLINE | ID: mdl-30668570

ABSTRACT

Integration of genome-wide association studies (GWAS) and expression quantitative trait loci (eQTL) studies is needed to improve our understanding of the biological mechanisms underlying GWAS hits, and our ability to identify therapeutic targets. Gene-level association methods such as PrediXcan can prioritize candidate targets. However, limited eQTL sample sizes and absence of relevant developmental and disease context restrict our ability to detect associations. Here we propose an efficient statistical method (MultiXcan) that leverages the substantial sharing of eQTLs across tissues and contexts to improve our ability to identify potential target genes. MultiXcan integrates evidence across multiple panels using multivariate regression, which naturally takes into account the correlation structure. We apply our method to simulated and real traits from the UK Biobank and show that, in realistic settings, we can detect a larger set of significantly associated genes than using each panel separately. To improve applicability, we developed a summary result-based extension called S-MultiXcan, which we show yields highly concordant results with the individual level version when LD is well matched. Our multivariate model-based approach allowed us to use the individual level results as a gold standard to calibrate and develop a robust implementation of the summary-based extension. Results from our analysis as well as software and necessary resources to apply our method are publicly available.


Subjects
Genome-Wide Association Study/statistics & numerical data , Quantitative Trait Loci/genetics , Transcriptome/genetics , Gene Expression/genetics , Humans , Single Nucleotide Polymorphism/genetics , Software/statistics & numerical data
15.
J Anim Breed Genet ; 136(2): 113-117, 2019 Mar.
Article in English | MEDLINE | ID: mdl-30614572

ABSTRACT

A curious result from mixed linear models applied to genome-wide association studies was expanded. In particular, a model in which one or more markers are considered as fixed but are allowed to contribute to the covariance structure by treating such markers as random as well was examined. The best linear unbiased estimator of marker effects is invariant with respect to whether those markers are employed in constructing a genomic relationship matrix or are ignored, provided marker effects are uncorrelated with those not being tested. Also, the implications of regarding some marker effects as fixed when, in fact, these possess a non-trivial covariance structure with those declared as random were examined.
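
The invariance stated in the abstract above can be checked numerically. The sketch below is an illustration under simplifying assumptions, not the paper's derivation: one tested marker is the only fixed effect (no intercept), background marker effects are independent with a common variance, and the genotypes, phenotype, and variance components are made up. It compares the generalized least squares (GLS) estimate of the tested marker's effect when that marker is left out of, or added to, the marker-based covariance matrix.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - rest) / m[i][i]
    return x


def gls_effect(x, y, v):
    """GLS estimate of a single fixed effect: (x' V^-1 y) / (x' V^-1 x)."""
    vinv_x = solve(v, x)  # V is symmetric, so x' V^-1 equals (V^-1 x)'
    return sum(a * b for a, b in zip(vinv_x, y)) / sum(a * b for a, b in zip(vinv_x, x))


random.seed(1)
n, n_background = 8, 5
x = [0.0, 1.0, 2.0, 1.0, 0.0, 2.0, 1.0, 1.0]  # tested marker (hypothetical genotypes)
z = [[float(random.choice([0, 1, 2])) for _ in range(n_background)] for _ in range(n)]
y = [0.5 * xi + random.gauss(0.0, 1.0) for xi in x]  # simulated phenotype
var_marker, var_resid = 0.3, 1.0  # assumed variance components


def covariance(include_tested):
    """V = var_marker * Z Z' (+ var_marker * x x' if include_tested) + var_resid * I."""
    v = [[var_marker * sum(z[i][k] * z[j][k] for k in range(n_background))
          for j in range(n)] for i in range(n)]
    for i in range(n):
        if include_tested:
            for j in range(n):
                v[i][j] += var_marker * x[i] * x[j]
        v[i][i] += var_resid
    return v


beta_ignored = gls_effect(x, y, covariance(False))
beta_included = gls_effect(x, y, covariance(True))
print(beta_ignored, beta_included)
```

The two estimates agree up to floating-point error because adding a multiple of x x' to V rescales V⁻¹x by a scalar, which cancels in the ratio that defines the GLS estimate.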


Subjects
Genome-Wide Association Study/statistics & numerical data , Linear Models , Genetic Models , Statistical Models , Animals , Breeding , Genome/genetics , Genomics , Single Nucleotide Polymorphism
16.
J Bioinform Comput Biol ; 16(6): 1840026, 2018 12.
Article in English | MEDLINE | ID: mdl-30567476

ABSTRACT

Although genome-wide association studies (GWAS) have successfully identified thousands of single nucleotide polymorphisms (SNPs) associated with common diseases, these observations do not fully explain the "missing heritability". Determining gene-gene interactions (GGI) is one possible avenue for addressing the missing heritability problem. While many statistical approaches have been proposed to detect GGI, most of these focus primarily on SNP-to-SNP interactions. Although gene-based GGI analyses have many advantages, such as reducing the burden of multiple-testing correction and increasing power by aggregating multiple causal signals across SNPs in specific genes, only a few methods are available. In this study, we proposed a new statistical approach for gene-based GGI analysis, "Hierarchical structural CoMponent analysis of Gene-Gene Interactions" (HisCoM-GGI). HisCoM-GGI is based on generalized structured component analysis, and can consider hierarchical structural relationships between genes and SNPs. For a pair of genes, HisCoM-GGI first effectively summarizes all possible pairwise SNP-SNP interactions into a latent variable, from which it then performs GGI analysis. HisCoM-GGI can evaluate both gene-level and SNP-level interactions. Through simulation studies, HisCoM-GGI demonstrated higher statistical power than existing gene-based GGI methods. In analyzing a GWAS of a Korean population to identify GGI associated with body mass index, HisCoM-GGI identified 14 potential GGI, two of which, (NCOR2 × SPOCK1) and (LINGO2 × ZNF385D), were successfully replicated in independent datasets.
We conclude that the HisCoM-GGI method may be a valuable tool for identifying GGI underlying missing heritability, allowing us to better understand the biological genetic mechanisms of complex traits. An implementation of HisCoM-GGI can be downloaded from the website (http://statgen.snu.ac.kr/software/hiscom-ggi).


Subjects
Body Mass Index , Genetic Epistasis , Genomics/methods , Single Nucleotide Polymorphism , Asian Continental Ancestry Group/genetics , Genetic Databases , Female , Genome-Wide Association Study/statistics & numerical data , Genomics/statistics & numerical data , Humans , Male , Membrane Proteins/genetics , Genetic Models , Nerve Tissue Proteins/genetics , Nuclear Receptor Co-Repressor 2/genetics , Proteoglycans/genetics , Transcription Factors/genetics
17.
Genet Sel Evol ; 50(1): 67, 2018 Dec 18.
Article in English | MEDLINE | ID: mdl-30563452

ABSTRACT

BACKGROUND: In this paper, we extend multi-locus iterative peeling to provide a computationally efficient method for calling, phasing, and imputing sequence data of any coverage in small or large pedigrees. Our method, called hybrid peeling, uses multi-locus iterative peeling to estimate shared chromosome segments between parents and their offspring at a subset of loci, and then uses single-locus iterative peeling to aggregate genomic information across multiple generations at the remaining loci. RESULTS: Using a synthetic dataset, we first analysed the performance of hybrid peeling for calling and phasing genotypes in disconnected families, which contained only a focal individual and its parents and grandparents. Second, we analysed the performance of hybrid peeling for calling and phasing genotypes in the context of a full general pedigree. Third, we analysed the performance of hybrid peeling for imputing whole-genome sequence data to non-sequenced individuals in the population. We found that hybrid peeling substantially increased the number of called and phased genotypes by leveraging sequence information on related individuals. The calling rate and accuracy increased when the full pedigree was used compared to a reduced pedigree of just parents and grandparents. Finally, hybrid peeling accurately imputed whole-genome sequence data to non-sequenced individuals. CONCLUSIONS: We believe that this algorithm will enable the generation of low-cost and high-accuracy whole-genome sequence data in many pedigreed populations. We make this algorithm available as a standalone program called AlphaPeel.


Subjects
Computational Biology/methods , Genotyping Techniques/methods , DNA Sequence Analysis/methods , Algorithms , Alleles , Animals , Gene Frequency/genetics , Genetic Variation/genetics , Genome/genetics , Genome-Wide Association Study/methods , Genome-Wide Association Study/statistics & numerical data , Genomics/methods , Genotype , Haplotypes , High-Throughput Nucleotide Sequencing/methods , Humans , Genetic Models , Single Nucleotide Polymorphism/genetics , DNA Sequence Analysis/statistics & numerical data
18.
PLoS Genet ; 14(12): e1007309, 2018 12.
Article in English | MEDLINE | ID: mdl-30589851

ABSTRACT

A genome-wide association study (GWAS) seeks to identify genetic variants that contribute to the development and progression of a specific disease. Over the past 10 years, new approaches using mixed models have emerged to mitigate the deleterious effects of population structure and relatedness in association studies. However, developing GWAS techniques to accurately test for association while correcting for population structure is a computational and statistical challenge. Using laboratory mouse strains as an example, our review characterizes the problem of population structure in association studies and describes how it can cause false positive associations. We then motivate mixed models in the context of unmodeled factors.


Subjects
Population Genetics , Genome-Wide Association Study/methods , Genetic Models , Animals , Bias , Disease/genetics , Female , Genome-Wide Association Study/statistics & numerical data , Humans , Linear Models , Male , Mice , Statistical Models , Pedigree , Phenotype , Phylogeny , Single Nucleotide Polymorphism
19.
Genet Epidemiol ; 42(6): 539-550, 2018 09.
Article in English | MEDLINE | ID: mdl-29900581

ABSTRACT

In a genome-wide association study (GWAS), association between genotype and phenotype at autosomal loci is generally tested by regression models. However, X-chromosome data are often excluded from published analyses of autosomes because of the difference between males and females in number of X chromosomes. Failure to analyze X-chromosome data at all is obviously less than ideal, and can lead to missed discoveries. Even when X-chromosome data are included, they are often analyzed with suboptimal statistics. Several mathematically sensible statistics for X-chromosome association have been proposed. The optimality of these statistics, however, is based on very specific simple genetic models. In addition, while previous simulation studies of these statistics have been informative, they have focused on single-marker tests and have not considered the types of error that occur even under the null hypothesis when the entire X chromosome is scanned. In this study, we comprehensively tested several X-chromosome association statistics using simulation studies that include the entire chromosome. We also considered a wide range of trait models for sex differences and phenotypic effects of X inactivation. We found that models that do not incorporate a sex effect can have large type I error in some cases. We also found that many of the best statistics perform well even under modest deviations from assumptions, such as trait variance differences between the sexes or small sex differences in allele frequencies.


Subjects
Chromosomes, Human, X/genetics , Genome-Wide Association Study/statistics & numerical data , Alleles , Female , Gene Frequency/genetics , Genotype , Humans , Male , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide/genetics , Quantitative Trait, Heritable , Regression Analysis , X Chromosome Inactivation/genetics
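
One way omitting a sex effect can inflate type I error is easy to reproduce in a toy setting. The sketch below is not the simulation design of the study: it tests a single marker rather than the whole chromosome, codes male genotypes as 0/1 and female genotypes as 0/1/2, and uses hypothetical values for the sample size, allele frequency, and sex difference in trait mean. Under this coding the mean genotype differs by sex, so any sex difference in the trait leaks into the genotype test unless sex is in the model.

```python
import numpy as np

def t_stat_last(y, X):
    """OLS t-statistic for the coefficient on the last column of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[-1] / np.sqrt(cov[-1, -1])

def simulate_null(rng, n=1000, freq=0.3, sex_effect=0.5):
    """One null dataset: the trait depends on sex but not on genotype."""
    male = rng.integers(0, 2, size=n)
    copies = np.where(male == 1, 1, 2)      # males carry one X, females two
    geno = rng.binomial(copies, freq).astype(float)
    pheno = sex_effect * male + rng.normal(size=n)
    return male.astype(float), geno, pheno

rng = np.random.default_rng(1)
reps = 500
critical = 1.96                              # two-sided 5% level, normal approximation
reject_no_sex = 0
reject_with_sex = 0
for _ in range(reps):
    male, geno, pheno = simulate_null(rng)
    ones = np.ones_like(geno)
    t_no_sex = t_stat_last(pheno, np.column_stack([ones, geno]))
    t_with_sex = t_stat_last(pheno, np.column_stack([ones, male, geno]))
    reject_no_sex += int(abs(t_no_sex) > critical)
    reject_with_sex += int(abs(t_with_sex) > critical)

rate_no_sex = reject_no_sex / reps
rate_with_sex = reject_with_sex / reps
print(f"empirical type I error without sex covariate: {rate_no_sex:.2f}")
print(f"empirical type I error with sex covariate:    {rate_with_sex:.2f}")
```

The rejection rate of the model without a sex covariate is far above the nominal 5%, while the model including sex stays close to it.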
20.
Proc Natl Acad Sci U S A ; 115(22): E4970-E4979, 2018 05 29.
Article in English | MEDLINE | ID: mdl-29686100

ABSTRACT

Identifying causal effects in nonexperimental data is an enduring challenge. One proposed solution that recently gained popularity is the idea to use genes as instrumental variables [i.e., Mendelian randomization (MR)]. However, this approach is problematic because many variables of interest are genetically correlated, which implies the possibility that many genes could affect both the exposure and the outcome directly or via unobserved confounding factors. Thus, pleiotropic effects of genes are themselves a source of bias in nonexperimental data that would also undermine the ability of MR to correct for endogeneity bias from nongenetic sources. Here, we propose an alternative approach, genetic instrumental variable (GIV) regression, that provides estimates for the effect of an exposure on an outcome in the presence of pleiotropy. As a valuable byproduct, GIV regression also provides accurate estimates of the chip heritability of the outcome variable. GIV regression uses polygenic scores (PGSs) for the outcome of interest which can be constructed from genome-wide association study (GWAS) results. By splitting the GWAS sample for the outcome into nonoverlapping subsamples, we obtain multiple indicators of the outcome PGSs that can be used as instruments for each other and, in combination with other methods such as sibling fixed effects, can address endogeneity bias from both pleiotropy and the environment. In two empirical applications, we demonstrate that our approach produces reasonable estimates of the chip heritability of educational attainment (EA) and show that standard regression and MR provide upwardly biased estimates of the effect of body height on EA.


Subjects
Genetic Pleiotropy , Genetic Variation , Genome-Wide Association Study , Socioeconomic Factors , Body Height/physiology , Educational Status , Genome-Wide Association Study/standards , Genome-Wide Association Study/statistics & numerical data , Humans , Outcome Assessment, Health Care
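
The idea of using polygenic scores from nonoverlapping GWAS subsamples as instruments for each other can be illustrated with a toy simulation. This sketch covers only the measurement-error step, not the full GIV regression with an exposure variable, and every quantity in it is an assumption made for illustration: a true genetic score with variance 0.25, two score estimates with independent estimation noise, and the simple ratio form of the instrumental-variable estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000

# Hypothetical setup: the true genetic score explains 25% of outcome variance.
g = rng.normal(scale=0.5, size=n)
y = g + rng.normal(scale=np.sqrt(0.75), size=n)

# Two noisy polygenic scores, standing in for scores built from
# nonoverlapping GWAS subsamples; their estimation errors are independent.
pgs1 = g + rng.normal(scale=0.5, size=n)
pgs2 = g + rng.normal(scale=0.5, size=n)

def cov(a, b):
    """Sample covariance between two 1-D arrays."""
    return np.cov(a, b)[0, 1]

# OLS of the outcome on one noisy score is attenuated toward zero.
beta_ols = cov(pgs1, y) / cov(pgs1, pgs1)

# Instrumenting pgs1 with pgs2 removes the attenuation, because the
# estimation error in pgs2 is uncorrelated with the error in pgs1.
beta_iv = cov(pgs2, y) / cov(pgs2, pgs1)

print(f"OLS slope: {beta_ols:.2f} (attenuated)")
print(f"IV slope:  {beta_iv:.2f} (true coefficient on g is 1)")
```

In this setup the OLS slope settles near 0.5, reflecting the noise in the score, while the IV slope is close to the true value of 1. This is the sense in which split-sample scores can correct for attenuation when estimating the variance a score explains.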