1.
Am J Hum Genet ; 108(4): 656-668, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33770507

ABSTRACT

Genetic studies in underrepresented populations identify disproportionate numbers of novel associations. However, most genetic studies use genotyping arrays and sequenced reference panels that best capture variation most common in European ancestry populations. To compare data generation strategies best suited for underrepresented populations, we sequenced the whole genomes of 91 individuals to high coverage as part of the Neuropsychiatric Genetics of African Population-Psychosis (NeuroGAP-Psychosis) study with participants from Ethiopia, Kenya, South Africa, and Uganda. We used a downsampling approach to evaluate the quality of two cost-effective data generation strategies, GWAS arrays versus low-coverage sequencing, by calculating the concordance of imputed variants from these technologies with those from deep whole-genome sequencing data. We show that low-coverage sequencing at a depth of ≥4× captures variants of all frequencies more accurately than all commonly used GWAS arrays investigated and at a comparable cost. Lower depths of sequencing (0.5-1×) performed comparably to commonly used low-density GWAS arrays. Low-coverage sequencing is also sensitive to novel variation; 4× sequencing detects 45% of singletons and 95% of common variants identified in high-coverage African whole genomes. Low-coverage sequencing approaches surmount the problems induced by the ascertainment of common genotyping arrays, effectively identify novel variation particularly in underrepresented populations, and present opportunities to enhance variant discovery at a cost similar to traditional approaches.


Subject(s)
DNA Mutational Analysis/economics, DNA Mutational Analysis/standards, Genetic Variation/genetics, Population Genetics/economics, Africa, DNA Mutational Analysis/methods, Population Genetics/methods, Human Genome/genetics, Genome-Wide Association Study, Health Equity, Humans, Microbiota, Whole Genome Sequencing/economics, Whole Genome Sequencing/standards
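
The abstract in entry 1 evaluates each strategy by the concordance of imputed variants with deep whole-genome sequencing calls, but does not say which concordance statistic was used. A minimal sketch, assuming non-reference concordance (sites where both call sets are homozygous reference are ignored); the function name and the toy genotypes are hypothetical:

```python
def non_reference_concordance(truth, imputed):
    """Share of compared sites at which the imputed genotype matches the truth.

    Genotypes are alternate-allele counts (0, 1 or 2); None marks a missing call.
    Sites where both call sets are homozygous reference are skipped, so the
    statistic is not inflated by the many sites that carry no variant.
    This choice of metric is an assumption, not taken from the abstract.
    """
    compared = 0
    matches = 0
    for t, g in zip(truth, imputed):
        if t is None or g is None:
            continue  # cannot compare a missing call
        if t == 0 and g == 0:
            continue  # homozygous-reference agreement is not counted
        compared += 1
        if t == g:
            matches += 1
    return matches / compared if compared else float("nan")


# Toy data: deep-sequencing calls versus imputed calls for one sample.
truth = [0, 1, 2, 0, 1, None, 2]
imputed = [0, 1, 1, 1, 1, 2, 2]
print(non_reference_concordance(truth, imputed))
```

Counting only non-reference sites is one common way to keep rare-variant errors visible; a study may equally report overall or per-frequency-bin concordance.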
2.
Genome Res ; 31(4): 529-537, 2021 04.
Article in English | MEDLINE | ID: mdl-33536225

ABSTRACT

Low-pass sequencing (sequencing a genome to an average depth less than 1× coverage) combined with genotype imputation has been proposed as an alternative to genotyping arrays for trait mapping and calculation of polygenic scores. To empirically assess the relative performance of these technologies for different applications, we performed low-pass sequencing (targeting coverage levels of 0.5× and 1×) and array genotyping (using the Illumina Global Screening Array [GSA]) on 120 DNA samples derived from African- and European-ancestry individuals that are part of the 1000 Genomes Project. We then imputed both the sequencing data and the genotyping array data to the 1000 Genomes Phase 3 haplotype reference panel using a leave-one-out design. We evaluated overall imputation accuracy from these different assays as well as overall power for GWAS from imputed data and computed polygenic risk scores for coronary artery disease and breast cancer using previously derived weights. We conclude that low-pass sequencing plus imputation, in addition to providing a substantial increase in statistical power for genome-wide association studies, provides increased accuracy for polygenic risk prediction at effective coverages of ∼0.5× and higher compared to the Illumina GSA.


Subject(s)
Genome-Wide Association Study, Genotype, High-Throughput Nucleotide Sequencing, Human Genome, Genome-Wide Association Study/methods, Genome-Wide Association Study/standards, Haplotypes, Humans, Risk Factors
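
Entry 2 computes polygenic risk scores from imputed data "using previously derived weights". In its simplest form such a score is a weighted sum of effect-allele dosages. A minimal sketch under that assumption; the dosages and weights below are invented for illustration and are not the published weights:

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one individual.

    dosages: expected effect-allele counts in [0, 2], one per variant
             (imputation yields fractional values).
    weights: per-variant effect sizes, in the same variant order.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same variants")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical four-variant score.
example_dosages = [0.0, 1.0, 2.0, 0.5]
example_weights = [0.1, -0.2, 0.05, 0.4]
print(polygenic_score(example_dosages, example_weights))
```

Because the score is linear in the dosages, errors in imputation carry straight through to the score, which is why imputation accuracy and score accuracy are evaluated together.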
3.
Nature ; 533(7604): 539-42, 2016 05 26.
Article in English | MEDLINE | ID: mdl-27225129

ABSTRACT

Educational attainment is strongly influenced by social and other environmental factors, but genetic factors are estimated to account for at least 20% of the variation across individuals. Here we report the results of a genome-wide association study (GWAS) for educational attainment that extends our earlier discovery sample of 101,069 individuals to 293,723 individuals, and a replication study in an independent sample of 111,349 individuals from the UK Biobank. We identify 74 genome-wide significant loci associated with the number of years of schooling completed. Single-nucleotide polymorphisms associated with educational attainment are disproportionately found in genomic regions regulating gene expression in the fetal brain. Candidate genes are preferentially expressed in neural tissue, especially during the prenatal period, and enriched for biological pathways involved in neural development. Our findings demonstrate that, even for a behavioural phenotype that is mostly environmentally determined, a well-powered GWAS identifies replicable associated genetic variants that suggest biologically relevant pathways. Because educational attainment is measured in large numbers of individuals, it will continue to be useful as a proxy phenotype in efforts to characterize the genetic influences of related phenotypes, including cognition and neuropsychiatric diseases.


Subject(s)
Brain/metabolism, Educational Status, Fetus/metabolism, Gene Expression Regulation/genetics, Genome-Wide Association Study, Single Nucleotide Polymorphism/genetics, Alzheimer Disease/genetics, Bipolar Disorder/genetics, Cognition, Computational Biology, Gene-Environment Interaction, Humans, Molecular Sequence Annotation, Schizophrenia/genetics, United Kingdom
4.
BMC Genomics ; 22(1): 197, 2021 Mar 20.
Article in English | MEDLINE | ID: mdl-33743587

ABSTRACT

BACKGROUND: Low pass sequencing has been proposed as a cost-effective alternative to genotyping arrays to identify genetic variants that influence multifactorial traits in humans. For common diseases this typically has required both large sample sizes and comprehensive variant discovery. Genotyping arrays are also routinely used to perform pharmacogenetic (PGx) experiments where sample sizes are likely to be significantly smaller, but clinically relevant effect sizes likely to be larger. RESULTS: To assess how low pass sequencing would compare to array based genotyping for PGx we compared a low-pass assay (in which 1x coverage or less of a target genome is sequenced) along with software for genotype imputation to standard approaches. We sequenced 79 individuals to 1x genome coverage and genotyped the same samples on the Affymetrix Axiom Biobank Precision Medicine Research Array (PMRA). We then down-sampled the sequencing data to 0.8x, 0.6x, and 0.4x coverage, and performed imputation. Both the genotype data and the sequencing data were further used to impute human leukocyte antigen (HLA) genotypes for all samples. We compared the sequencing data and the genotyping array data in terms of four metrics: overall concordance, concordance at single nucleotide polymorphisms in pharmacogenetics-related genes, concordance in imputed HLA genotypes, and imputation r². Overall concordance between the two assays ranged from 98.2% (for 0.4x coverage sequencing) to 99.2% (for 1x coverage sequencing), with qualitatively similar numbers for the subsets of variants most important in pharmacogenetics. At common single nucleotide polymorphisms (SNPs), the mean imputation r² from the genotyping array was 0.90, which was comparable to the imputation r² from 0.4x coverage sequencing, while the mean imputation r² from 1x sequencing data was 0.96.
CONCLUSIONS: These results indicate that low-pass sequencing to a depth above 0.4x coverage attains higher power for association studies when compared to the PMRA and should be considered as a competitive alternative to genotyping arrays for trait mapping in pharmacogenetics.


Subject(s)
Genome-Wide Association Study, Pharmacogenetics, Genotype, Genotyping Techniques, High-Throughput Nucleotide Sequencing, Humans, Single Nucleotide Polymorphism
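
Entry 4 reports a mean imputation r-squared per assay without defining it. A minimal sketch, assuming the usual definition as the squared Pearson correlation between true genotypes and imputed dosages at a single variant; the sample values are invented:

```python
def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Both inputs hold one value per sample for a single variant.
    Returns NaN when either input has no variance (monomorphic in the sample).
    """
    n = len(true_genotypes)
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        return float("nan")
    return cov * cov / (var_t * var_d)


# Four samples at one variant: sequenced genotypes versus imputed dosages.
print(dosage_r2([0, 1, 2, 1], [0.1, 0.9, 1.8, 1.2]))
```

Averaging this per-variant value over all common SNPs would give a summary comparable in form to the means quoted in the abstract.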
5.
Nature ; 528(7583): 499-503, 2015 Dec 24.
Article in English | MEDLINE | ID: mdl-26595274

ABSTRACT

Ancient DNA makes it possible to observe natural selection directly by analysing samples from populations before, during and after adaptation events. Here we report a genome-wide scan for selection using ancient DNA, capitalizing on the largest ancient DNA data set yet assembled: 230 West Eurasians who lived between 6500 and 300 bc, including 163 with newly reported data. The new samples include, to our knowledge, the first genome-wide ancient DNA from Anatolian Neolithic farmers, whose genetic material we obtained by extracting from petrous bones, and who we show were members of the population that was the source of Europe's first farmers. We also report a transect of the steppe region in Samara between 5600 and 300 bc, which allows us to identify admixture into the steppe from at least two external sources. We detect selection at loci associated with diet, pigmentation and immunity, and two independent episodes of selection on height.


Subject(s)
Human Genome/genetics, Genetic Selection/genetics, Agriculture/history, Asia/ethnology, Body Height/genetics, Bone and Bones, DNA/genetics, DNA/isolation & purification, Diet/history, Europe/ethnology, Population Genetics, Haplotypes/genetics, Ancient History, Humans, Immunity/genetics, Male, Multifactorial Inheritance/genetics, Pigmentation/genetics, DNA Sequence Analysis
6.
PLoS Genet ; 14(7): e1007499, 2018 07.
Article in English | MEDLINE | ID: mdl-29965964

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pgen.1006915.].

7.
PLoS Biol ; 15(9): e2002458, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28873088

ABSTRACT

A number of open questions in human evolutionary genetics would become tractable if we were able to directly measure evolutionary fitness. As a step towards this goal, we developed a method to examine whether individual genetic variants, or sets of genetic variants, currently influence viability. The approach consists in testing whether the frequency of an allele varies across ages, accounting for variation in ancestry. We applied it to the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort and to the parents of participants in the UK Biobank. Across the genome, we found only a few common variants with large effects on age-specific mortality: tagging the APOE ε4 allele and near CHRNA3. These results suggest that when large, even late-onset effects are kept at low frequency by purifying selection. Testing viability effects of sets of genetic variants that jointly influence 1 of 42 traits, we detected a number of strong signals. In participants of the UK Biobank of British ancestry, we found that variants that delay puberty timing are associated with a longer parental life span (P~6.2 × 10⁻⁶ for fathers and P~2.0 × 10⁻³ for mothers), consistent with epidemiological studies. Similarly, variants associated with later age at first birth are associated with a longer maternal life span (P~1.4 × 10⁻³). Signals are also observed for variants influencing cholesterol levels, risk of coronary artery disease (CAD), body mass index, as well as risk of asthma. These signals exhibit consistent effects in the GERA cohort and among participants of the UK Biobank of non-British ancestry. We also found marked differences between males and females, most notably at the CHRNA3 locus, and variants associated with risk of CAD and cholesterol levels. Beyond our findings, the analysis serves as a proof of principle for how upcoming biomedical data sets can be used to learn about selection effects in contemporary humans.


Subject(s)
Molecular Evolution, Genetic Fitness, Population Genetics/methods, Genetic Models, Genetic Selection, Cohort Studies, Female, Gene Frequency, Genetic Variation, Humans, Male
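
The core idea in entry 7 is to test whether an allele's frequency varies across ages. The descriptive first step can be sketched as a per-age-bin frequency table. This is an illustration only: it omits the ancestry adjustment and the statistical test that the abstract mentions, and the bin width, function name, and data are hypothetical:

```python
from collections import defaultdict


def allele_frequency_by_age_bin(ages, genotypes, bin_width=10):
    """Alternate-allele frequency within each age bin.

    ages: integer age per individual.
    genotypes: alternate-allele count (0, 1 or 2) per individual.
    Returns {bin_start: frequency}, ordered by bin.
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [alt alleles, total alleles]
    for age, genotype in zip(ages, genotypes):
        bin_start = (age // bin_width) * bin_width
        counts[bin_start][0] += genotype
        counts[bin_start][1] += 2  # diploid: two alleles per individual
    return {b: alt / total for b, (alt, total) in sorted(counts.items())}


# Six hypothetical individuals; the allele becomes rarer in older bins.
ages = [45, 48, 52, 57, 63, 66]
genotypes = [1, 1, 1, 0, 0, 0]
print(allele_frequency_by_age_bin(ages, genotypes))
```

A falling frequency with age, as in the toy data, is the pattern a variant that reduces viability would be expected to produce, provided ancestry and cohort effects have been accounted for.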
8.
Nature ; 505(7481): 43-9, 2014 Jan 02.
Article in English | MEDLINE | ID: mdl-24352235

ABSTRACT

We present a high-quality genome sequence of a Neanderthal woman from Siberia. We show that her parents were related at the level of half-siblings and that mating among close relatives was common among her recent ancestors. We also sequenced the genome of a Neanderthal from the Caucasus to low coverage. An analysis of the relationships and population history of available archaic genomes and 25 present-day human genomes shows that several gene flow events occurred among Neanderthals, Denisovans and early modern humans, possibly including gene flow into Denisovans from an unknown archaic group. Thus, interbreeding, albeit of low magnitude, occurred among many hominin groups in the Late Pleistocene. In addition, the high-quality Neanderthal genome allows us to establish a definitive list of substitutions that became fixed in modern humans after their separation from the ancestors of Neanderthals and Denisovans.


Subject(s)
Fossils, Genome/genetics, Neanderthals/genetics, Africa, Animals, Caves, DNA Copy Number Variations/genetics, Female, Gene Flow/genetics, Gene Frequency, Heterozygote, Humans, Inbreeding, Genetic Models, Neanderthals/classification, Phylogeny, Population Density, Siberia/ethnology, Toe Phalanges/anatomy & histology
9.
PLoS Genet ; 13(9): e1006915, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28957316

ABSTRACT

Do the frequencies of disease mutations in human populations reflect a simple balance between mutation and purifying selection? What other factors shape the prevalence of disease mutations? To begin to answer these questions, we focused on one of the simplest cases: recessive mutations that alone cause lethal diseases or complete sterility. To this end, we generated a hand-curated set of 417 Mendelian mutations in 32 genes reported to cause a recessive, lethal Mendelian disease. We then considered analytic models of mutation-selection balance in infinite and finite populations of constant sizes and simulations of purifying selection in a more realistic demographic setting, and tested how well these models fit allele frequencies estimated from 33,370 individuals of European ancestry. In doing so, we distinguished between CpG transitions, which occur at a substantially elevated rate, and three other mutation types. Intriguingly, the observed frequency for CpG transitions is slightly higher than expectation but close, whereas the frequencies observed for the three other mutation types are an order of magnitude higher than expected, with a bigger deviation from expectation seen for less mutable types. This discrepancy is even larger when subtle fitness effects in heterozygotes or lethal compound heterozygotes are taken into account. In principle, higher than expected frequencies of disease mutations could be due to widespread errors in reporting causal variants, compensation by other mutations, or balancing selection. It is unclear why these factors would have a greater impact on disease mutations that occur at lower rates, however. 
We argue instead that the unexpectedly high frequency of disease mutations and the relationship to the mutation rate likely reflect an ascertainment bias: of all the mutations that cause recessive lethal diseases, those that by chance have reached higher frequencies are more likely to have been identified and thus to have been included in this study. Beyond the specific application, this study highlights the parameters likely to be important in shaping the frequencies of Mendelian disease alleles.


Subject(s)
Lethal Genes/genetics, Inborn Genetic Diseases/genetics, Population Genetics, Genetic Selection/genetics, Gene Frequency, Recessive Genes, Heterozygote, Humans, Genetic Models, Mutation
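
Entry 9 compares observed frequencies with models of mutation-selection balance. For a fully recessive allele in an infinite, randomly mating population, the textbook equilibrium frequency is the square root of u/s, where u is the mutation rate and s the selection coefficient against homozygotes (s = 1 for a lethal). A minimal sketch of that baseline only; the mutation rates are illustrative values, not estimates from the study, and the finite-population and demographic models mentioned in the abstract are not reproduced:

```python
import math


def recessive_equilibrium_frequency(mutation_rate, selection_coefficient=1.0):
    """Equilibrium allele frequency under mutation-selection balance.

    Assumes a fully recessive deleterious allele, an infinite population,
    and random mating. With selection_coefficient = 1 (lethal or sterile
    homozygotes) the result is sqrt(mutation_rate).
    """
    return math.sqrt(mutation_rate / selection_coefficient)


# Illustrative per-site rates: a more mutable and a less mutable site class.
for label, u in [("higher rate", 1e-7), ("lower rate", 1e-8)]:
    print(label, recessive_equilibrium_frequency(u))
```

Because the expected frequency scales with the square root of the mutation rate, a tenfold difference in rate predicts only about a threefold difference in frequency, which is the kind of relationship the abstract tests across mutation types.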
10.
Nature ; 482(7385): 390-4, 2012 Feb 05.
Article in English | MEDLINE | ID: mdl-22307276

ABSTRACT

The mapping of expression quantitative trait loci (eQTLs) has emerged as an important tool for linking genetic variation to changes in gene regulation. However, it remains difficult to identify the causal variants underlying eQTLs, and little is known about the regulatory mechanisms by which they act. Here we show that genetic variants that modify chromatin accessibility and transcription factor binding are a major mechanism through which genetic variation leads to gene expression differences among humans. We used DNase I sequencing to measure chromatin accessibility in 70 Yoruba lymphoblastoid cell lines, for which genome-wide genotypes and estimates of gene expression levels are also available. We obtained a total of 2.7 billion uniquely mapped DNase I-sequencing (DNase-seq) reads, which allowed us to produce genome-wide maps of chromatin accessibility for each individual. We identified 8,902 locations at which the DNase-seq read depth correlated significantly with genotype at a nearby single nucleotide polymorphism or insertion/deletion (false discovery rate = 10%). We call such variants 'DNase I sensitivity quantitative trait loci' (dsQTLs). We found that dsQTLs are strongly enriched within inferred transcription factor binding sites and are frequently associated with allele-specific changes in transcription factor binding. A substantial fraction (16%) of dsQTLs are also associated with variation in the expression levels of nearby genes (that is, these loci are also classified as eQTLs). Conversely, we estimate that as many as 55% of eQTL single nucleotide polymorphisms are also dsQTLs. Our observations indicate that dsQTLs are highly abundant in the human genome and are likely to be important contributors to phenotypic variation.


Subject(s)
DNA Footprinting, Deoxyribonuclease I/metabolism, Gene Expression Regulation/genetics, Genetic Variation/genetics, Quantitative Trait Loci/genetics, Chromatin/genetics, Chromatin/metabolism, Gene Expression Profiling, Human Genome/genetics, Humans, Phenotype, Single Nucleotide Polymorphism/genetics, DNA Sequence Analysis, Transcription Factors/metabolism
11.
Trends Genet ; 30(9): 377-89, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25168683

ABSTRACT

Genetic information contains a record of the history of our species, and technological advances have transformed our ability to access this record. Many studies have used genome-wide data from populations today to learn about the peopling of the globe and subsequent adaptation to local conditions. Implicit in this research is the assumption that the geographic locations of people today are informative about the geographic locations of their ancestors in the distant past. However, it is now clear that long-range migration, admixture, and population replacement subsequent to the initial out-of-Africa expansion have altered the genetic structure of most of the world's human populations. In light of this we argue that it is time to critically reevaluate current models of the peopling of the globe, as well as the importance of natural selection in determining the geographic distribution of phenotypes. We specifically highlight the transformative potential of ancient DNA. By accessing the genetic make-up of populations living at archaeologically known times and places, ancient DNA makes it possible to directly track migrations and responses to natural selection.


Subject(s)
DNA/genetics, DNA/history, Population Genetics, Human Genome, Geography, Genetic Selection/genetics, Africa, Molecular Evolution, Ancient History, Humans, Phenotype
12.
Am J Hum Genet ; 94(4): 559-73, 2014 Apr 03.
Article in English | MEDLINE | ID: mdl-24702953

ABSTRACT

Annotations of gene structures and regulatory elements can inform genome-wide association studies (GWASs). However, choosing the relevant annotations for interpreting an association study of a given trait remains challenging. I describe a statistical model that uses association statistics computed across the genome to identify classes of genomic elements that are enriched with or depleted of loci influencing a trait. The model naturally incorporates multiple types of annotations. I applied the model to GWASs of 18 human traits, including red blood cell traits, platelet traits, glucose levels, lipid levels, height, body mass index, and Crohn disease. For each trait, I used the model to evaluate the relevance of 450 different genomic annotations, including protein-coding genes, enhancers, and DNase-I hypersensitive sites in over 100 tissues and cell lines. The fraction of phenotype-associated SNPs influencing protein sequence ranged from around 2% (for platelet volume) up to around 20% (for low-density lipoprotein cholesterol), repressed chromatin was significantly depleted for SNPs associated with several traits, and cell-type-specific DNase-I hypersensitive sites were enriched with SNPs associated with several traits (for example, the spleen in platelet volume). Finally, reweighting each GWAS by using information from functional genomics increased the number of loci with high-confidence associations by around 5%.


Subject(s)
Genome-Wide Association Study, Genetic Models, Bayes Theorem, Humans, Phenotype, Single Nucleotide Polymorphism
13.
Bioinformatics ; 32(2): 283-5, 2016 Jan 15.
Article in English | MEDLINE | ID: mdl-26395773

ABSTRACT

We present a method to identify approximately independent blocks of linkage disequilibrium in the human genome. These blocks enable automated analysis of multiple genome-wide association studies. AVAILABILITY AND IMPLEMENTATION: code: http://bitbucket.org/nygcresearch/ldetect; data: http://bitbucket.org/nygcresearch/ldetect-data. CONTACT: tberisa@nygenome.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Chromosome Mapping/methods, Human Genome, Genome-Wide Association Study, Software, Algorithms, Genetic Markers, Humans, Linkage Disequilibrium
14.
Proc Natl Acad Sci U S A ; 111(7): 2632-7, 2014 Feb 18.
Article in English | MEDLINE | ID: mdl-24550290

ABSTRACT

The history of southern Africa involved interactions between indigenous hunter-gatherers and a range of populations that moved into the region. Here we use genome-wide genetic data to show that there are at least two admixture events in the history of Khoisan populations (southern African hunter-gatherers and pastoralists who speak non-Bantu languages with click consonants). One involved populations related to Niger-Congo-speaking African populations, and the other introduced ancestry most closely related to west Eurasian (European or Middle Eastern) populations. We date this latter admixture event to ∼900-1,800 y ago and show that it had the largest demographic impact in Khoisan populations that speak Khoe-Kwadi languages. A similar signal of west Eurasian ancestry is present throughout eastern Africa. In particular, we also find evidence for two admixture events in the history of Kenyan, Tanzanian, and Ethiopian populations, the earlier of which involved populations related to west Eurasians and which we date to ∼2,700-3,300 y ago. We reconstruct the allele frequencies of the putative west Eurasian population in eastern Africa and show that this population is a good proxy for the west Eurasian ancestry in southern Africa. The most parsimonious explanation for these findings is that west Eurasian ancestry entered southern Africa indirectly through eastern Africa.


Subject(s)
Demography, Emigration and Immigration, Ethnicity/genetics, Population Genetics/methods, White People/genetics, Eastern Africa, Southern Africa, Computer Simulation, Europe/ethnology, Gene Flow, Gene Frequency, Genotype, Humans, Linkage Disequilibrium, Genetic Models
15.
Nature ; 464(7289): 768-72, 2010 Apr 01.
Article in English | MEDLINE | ID: mdl-20220758

ABSTRACT

Understanding the genetic mechanisms underlying natural variation in gene expression is a central goal of both medical and evolutionary genetics, and studies of expression quantitative trait loci (eQTLs) have become an important tool for achieving this goal. Although all eQTL studies so far have assayed messenger RNA levels using expression microarrays, recent advances in RNA sequencing enable the analysis of transcript variation at unprecedented resolution. We sequenced RNA from 69 lymphoblastoid cell lines derived from unrelated Nigerian individuals that have been extensively genotyped by the International HapMap Project. By pooling data from all individuals, we generated a map of the transcriptional landscape of these cells, identifying extensive use of unannotated untranslated regions and more than 100 new putative protein-coding exons. Using the genotypes from the HapMap project, we identified more than a thousand genes at which genetic variation influences overall expression levels or splicing. We demonstrate that eQTLs near genes generally act by a mechanism involving allele-specific expression, and that variation that influences the inclusion of an exon is enriched within and near the consensus splice sites. Our results illustrate the power of high-throughput sequencing for the joint analysis of variation in transcription, splicing and allele-specific expression across individuals.


Subject(s)
Gene Expression Profiling, Gene Expression Regulation/genetics, Genetic Variation/genetics, Messenger RNA/analysis, Messenger RNA/genetics, Genetic Transcription/genetics, Alleles, Black People/genetics, Consensus Sequence/genetics, Complementary DNA/genetics, Exons/genetics, Humans, Nigeria, Single Nucleotide Polymorphism/genetics, Quantitative Trait Loci/genetics, RNA Splice Sites/genetics, RNA Sequence Analysis
16.
Genome Res ; 22(4): 602-10, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22207615

ABSTRACT

Comparative genomic studies in primates have yielded important insights into the evolutionary forces that shape genetic diversity and revealed the likely genetic basis for certain species-specific adaptations. To date, however, these studies have focused on only a small number of species. For the majority of nonhuman primates, including some of the most critically endangered, genome-level data are not yet available. In this study, we have taken the first steps toward addressing this gap by sequencing RNA from the livers of multiple individuals from each of 16 mammalian species, including humans and 11 nonhuman primates. Of the nonhuman primate species, five are lemurs and two are lorisoids, for which little or no genomic data were previously available. To analyze these data, we developed a method for de novo assembly and alignment of orthologous gene sequences across species. We assembled an average of 5721 gene sequences per species and characterized diversity and divergence of both gene sequences and gene expression levels. We identified patterns of variation that are consistent with the action of positive or directional selection, including an 18-fold enrichment of peroxisomal genes among genes whose regulation likely evolved under directional selection in the ancestral primate lineage. Importantly, we found no relationship between genetic diversity and endangered status, with the two most endangered species in our study, the black and white ruffed lemur and the Coquerel's sifaka, having the highest genetic diversity among all primates. Our observations imply that many endangered lemur populations still harbor considerable genetic variation. Timely efforts to conserve these species alongside their habitats have, therefore, strong potential to achieve long-term success.


Subject(s)
Genetic Variation, Primates/genetics, RNA Sequence Analysis/methods, Transcriptome/genetics, Animals, Endangered Species, Molecular Evolution, Genome/genetics, Humans, Liver/metabolism, Phylogeny, Primates/classification, Species Specificity
17.
Bioinformatics ; 30(20): 2906-14, 2014 Oct 15.
Article in English | MEDLINE | ID: mdl-24990607

ABSTRACT

MOTIVATION: Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. RESULTS: In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case-control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of [Formula: see text] association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. 
We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. AVAILABILITY AND IMPLEMENTATION: Publicly available software package available at http://bogdan.bioinformatics.ucla.edu/software/. CONTACT: bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu SUPPLEMENTARY INFORMATION: Supplementary materials are available at Bioinformatics online.


Subject(s)
Biostatistics/methods, Genome-Wide Association Study/methods, Algorithms, Case-Control Studies, Cohort Studies, Genotype, Humans, Linkage Disequilibrium, Phenotype, Single Nucleotide Polymorphism, Software, Time Factors
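
Entry 17 describes Gaussian imputation of summary association statistics. The general idea can be sketched as a conditional-mean calculation: treating z-scores as multivariate normal with the reference-panel LD matrix as their correlation, the z-score at an untyped SNP is predicted from the typed SNPs. In the sketch below, the `ridge` term added to the diagonal is a stand-in for the abstract's adjustment for limited reference-panel size; its form and value are assumptions, as are all function names and numbers:

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting.

    Intended for small, non-singular systems; no singularity check is made.
    """
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def impute_z_score(ld_typed, ld_untyped_to_typed, z_typed, ridge=0.1):
    """Conditional-mean z-score at one untyped SNP given typed-SNP z-scores.

    ld_typed: LD (correlation) matrix among the typed SNPs.
    ld_untyped_to_typed: correlations between the untyped SNP and each typed SNP.
    ridge: value added to the diagonal of ld_typed (assumed regularisation).
    Returns (imputed z-score, variance explained by the typed SNPs).
    """
    n = len(z_typed)
    regularised = [[ld_typed[i][j] + (ridge if i == j else 0.0)
                    for j in range(n)] for i in range(n)]
    # Weights w solve (Sigma_tt + ridge * I) w = Sigma_tu.
    weights = solve_linear_system(regularised, ld_untyped_to_typed)
    z_imputed = sum(w * z for w, z in zip(weights, z_typed))
    variance = sum(w * c for w, c in zip(weights, ld_untyped_to_typed))
    return z_imputed, variance


# Two typed SNPs in moderate LD with each other and with the untyped SNP.
ld_typed = [[1.0, 0.5], [0.5, 1.0]]
ld_untyped_to_typed = [0.8, 0.6]
print(impute_z_score(ld_typed, ld_untyped_to_typed, [3.0, 2.0], ridge=0.0))
```

The returned variance is below 1 whenever the typed SNPs tag the target imperfectly, so an imputed z-score is typically rescaled or filtered on it before being interpreted as an association statistic.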
18.
PLoS Genet ; 8(11): e1002967, 2012.
Article in English | MEDLINE | ID: mdl-23166502

ABSTRACT

Many aspects of the historical relationships between populations in a species are reflected in genetic data. Inferring these relationships from genetic data, however, remains a challenging task. In this paper, we present a statistical model for inferring the patterns of population splits and mixtures in multiple populations. In our model, the sampled populations in a species are related to their common ancestor through a graph of ancestral populations. Using genome-wide allele frequency data and a Gaussian approximation to genetic drift, we infer the structure of this graph. We applied this method to a set of 55 human populations and a set of 82 dog breeds and wild canids. In both species, we show that a simple bifurcating tree does not fully describe the data; in contrast, we infer many migration events. While some of the migration events that we find have been detected previously, many have not. For example, in the human data, we infer that Cambodians trace approximately 16% of their ancestry to a population ancestral to other extant East Asian populations. In the dog data, we infer that both the boxer and basenji trace a considerable fraction of their ancestry (9% and 25%, respectively) to wolves subsequent to domestication and that East Asian toy breeds (the Shih Tzu and the Pekingese) result from admixture between modern toy breeds and "ancient" Asian breeds. Software implementing the model described here, called TreeMix, is available at http://treemix.googlecode.com.


Subject(s)
Gene Frequency , Genetic Drift , Genome-Wide Association Study , Algorithms , Animals , Breeding , Dogs , Humans , Models, Genetic , Polymorphism, Single Nucleotide , Population/genetics , Wolves/genetics
19.
PLoS Genet ; 8(10): e1003000, 2012.
Article in English | MEDLINE | ID: mdl-23071454

ABSTRACT

Recent gene expression QTL (eQTL) mapping studies have provided considerable insight into the genetic basis for inter-individual regulatory variation. However, a limitation of all eQTL studies to date, which have used measurements of steady-state gene expression levels, is the inability to directly distinguish between variation in transcription and decay rates. To address this gap, we performed a genome-wide study of variation in gene-specific mRNA decay rates across individuals. Using a time-course study design, we estimated mRNA decay rates for over 16,000 genes in 70 Yoruban HapMap lymphoblastoid cell lines (LCLs), for which extensive genotyping data are available. Considering mRNA decay rates across genes, we found that: (i) as expected, highly expressed genes are generally associated with lower mRNA decay rates, (ii) genes with rapid mRNA decay rates are enriched with putative binding sites for miRNA and RNA binding proteins, and (iii) genes with similar functional roles tend to exhibit correlated rates of mRNA decay. Focusing on variation in mRNA decay across individuals, we estimate that steady-state expression levels are significantly correlated with variation in decay rates in 10% of genes. Somewhat counter-intuitively, for about half of these genes, higher expression is associated with faster decay rates, possibly due to a coupling of mRNA decay with transcriptional processes in genes involved in rapid cellular responses. Finally, we used these data to map genetic variation that is specifically associated with variation in mRNA decay rates across individuals. We found 195 such loci, which we named RNA decay quantitative trait loci ("rdQTLs"). All the observed rdQTLs are located near the regulated genes and therefore are assumed to act in cis. By analyzing our data within the context of known steady-state eQTLs, we estimate that a substantial fraction of eQTLs are associated with inter-individual variation in mRNA decay rates.
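As a rough illustration of how a decay rate can be estimated from a time course, the sketch below assumes simple first-order (exponential) decay and fits a straight line to log expression against time. The abstract does not state the study's actual estimation procedure, so the model, the function names, and the example values are all assumptions made for illustration.

```python
import math

def estimate_decay_rate(times, expression):
    """Estimate a first-order decay rate k from a time course.

    Assumes expression(t) = expression(0) * exp(-k * t), so log(expression)
    is linear in t with slope -k. Uses ordinary least squares.
    """
    logs = [math.log(x) for x in expression]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    return -sxy / sxx

def half_life(k):
    """Half-life implied by a first-order decay rate k."""
    return math.log(2) / k

# Hypothetical time points (hours) and noise-free expression values.
times = [0.0, 0.5, 1.0, 2.0, 4.0]
expression = [100.0 * math.exp(-0.3 * t) for t in times]
k_hat = estimate_decay_rate(times, expression)
```

With per-individual estimates of this kind for each gene, decay rate can be treated as a quantitative trait and tested for association with nearby genotypes, which is the idea behind the rdQTL mapping described in the abstract.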


Subject(s)
Gene Expression , Genetic Variation , Quantitative Trait Loci , RNA Stability , Cell Line , Chromosome Mapping , Gene Expression Profiling , Gene Expression Regulation , Genome-Wide Association Study , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , RNA Interference
20.
Proc Biol Sci ; 281(1789): 20140930, 2014 Aug 22.
Article in English | MEDLINE | ID: mdl-24990677

ABSTRACT

While gene flow between distantly related populations is increasingly recognized as a potentially important source of adaptive genetic variation for humans, fully characterized examples are rare. In addition, the role that natural selection for resistance to vivax malaria may have played in the extreme distribution of the protective Duffy-null allele, which is nearly completely fixed in mainland sub-Saharan Africa and absent elsewhere, is controversial. We address both these issues by investigating the evolution of the Duffy-null allele in the Malagasy, a recently admixed population with major ancestry components from both East Asia and mainland sub-Saharan Africa. We used genome-wide genetic data and extensive computer simulations to show that the high frequency of the Duffy-null allele in Madagascar can only be explained in the absence of positive natural selection under extreme demographic scenarios involving high genetic drift. However, the observed genomic single nucleotide polymorphism diversity in the Malagasy is incompatible with such extreme demographic scenarios, indicating that positive selection for the Duffy-null allele best explains the high frequency of the allele in Madagascar. We estimate the selection coefficient to be 0.066. Because vivax malaria is endemic to Madagascar, this result supports the hypothesis that malaria resistance drove fixation of the Duffy-null allele in mainland sub-Saharan Africa.
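To give a feel for what a selection coefficient of 0.066 implies, the sketch below iterates a deterministic allele-frequency recursion under a simple genic selection model, p' = p(1 + s) / (1 + s·p), ignoring drift and admixture. This model is an assumption chosen for illustration; the abstract does not specify the dominance model or the simulation framework used in the study, and the starting frequency and number of generations are hypothetical.

```python
import math

def next_frequency(p, s):
    """One generation of genic selection: the favoured allele has relative fitness 1 + s."""
    return p * (1.0 + s) / (1.0 + s * p)

def trajectory(p0, s, generations):
    """Allele frequencies from generation 0 through `generations`, inclusive."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs

def generations_to_reach(p0, p_target, s):
    """Closed form: under this model the odds p / (1 - p) grow by a factor (1 + s) per generation."""
    def logit(p):
        return math.log(p / (1.0 - p))
    return (logit(p_target) - logit(p0)) / math.log(1.0 + s)

# Hypothetical starting frequency and time span; s is the estimate quoted in the abstract.
s = 0.066
freqs = trajectory(0.1, s, 50)
elapsed = generations_to_reach(0.1, freqs[-1], s)
```

The closed form makes the scale explicit: under these assumptions the odds of the allele roughly double every ln 2 / ln(1 + s) generations, about 11 generations for s = 0.066.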


Subject(s)
Duffy Blood-Group System/genetics , Gene Frequency , Receptors, Cell Surface/genetics , Selection, Genetic , Africa South of the Sahara , Asian People/genetics , Black People/genetics , Computer Simulation , Genetic Drift , Genetics, Population , Humans , Madagascar , Models, Genetic , Polymorphism, Single Nucleotide