Results 1 - 20 of 70
1.
Cell ; 173(1): 53-61.e9, 2018 03 22.
Article in English | MEDLINE | ID: mdl-29551270

ABSTRACT

Anatomically modern humans interbred with Neanderthals and with a related archaic population known as Denisovans. Genomes of several Neanderthals and one Denisovan have been sequenced, and these reference genomes have been used to detect introgressed genetic material in present-day human genomes. Segments of introgression also can be detected without use of reference genomes, and doing so can be advantageous for finding introgressed segments that are less closely related to the sequenced archaic genomes. We apply a new reference-free method for detecting archaic introgression to 5,639 whole-genome sequences from Eurasia and Oceania. We find Denisovan ancestry in populations from East and South Asia and Papuans. Denisovan ancestry comprises two components with differing similarity to the sequenced Altai Denisovan individual. This indicates that at least two distinct instances of Denisovan admixture into modern humans occurred, involving Denisovan populations that had different levels of relatedness to the sequenced Altai Denisovan.


Subjects
Genome, Human; Animals; Asian People/genetics; Humans; Neanderthals/genetics; Selection, Genetic; Exome Sequencing
2.
Am J Hum Genet ; 111(4): 691-700, 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-38513668

ABSTRACT

We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more computationally efficient inference of identity by descent (IBD) than approaches that infer pairwise IBD segments and provides locus-specific IBD clusters rather than IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2,900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach. Our IBD clustering method is implemented in the open-source ibd-cluster software package.


Subjects
Biological Specimen Banks; Gene Conversion; Humans; Software; Haplotypes/genetics; Chromosomes; Polymorphism, Single Nucleotide
3.
PLoS Genet ; 20(5): e1011297, 2024 May.
Article in English | MEDLINE | ID: mdl-38787916

ABSTRACT

Genotype data include errors that may influence conclusions reached by downstream statistical analyses. Previous studies have estimated genotype error rates from discrepancies in human pedigree data, such as Mendelian inconsistent genotypes or apparent phase violations. However, uncalled deletions, which generally have not been accounted for in these studies, can lead to biased error rate estimates. In this study, we propose a genotype error model that considers both genotype errors and uncalled deletions when calculating the likelihood of the observed genotypes in parent-offspring trios. Using simulations, we show that when there are uncalled deletions, our model produces genotype error rate estimates that are less biased than estimates from a model that does not account for these deletions. We applied our model to SNVs in 77 sequenced White British parent-offspring trios in the UK Biobank. We use the Akaike information criterion to show that our model fits the data better than a model that does not account for uncalled deletions. We estimate the genotype error rate at SNVs with minor allele frequency > 0.001 in these data to be [Formula: see text]. We estimate that 77% of the genotype errors at these markers are attributable to uncalled deletions [Formula: see text].


Subjects
Genotype; Whole Genome Sequencing; Humans; Polymorphism, Single Nucleotide/genetics; Models, Genetic; Gene Frequency; Genome, Human; Pedigree; Sequence Deletion; Computer Simulation
4.
Am J Hum Genet ; 110(1): 161-165, 2023 01 05.
Article in English | MEDLINE | ID: mdl-36450278

ABSTRACT

The first release of UK Biobank whole-genome sequence data contains 150,119 genomes. We present an open-source pipeline for filtering, phasing, and indexing these genomes on the cloud-based UK Biobank Research Analysis Platform. This pipeline makes it possible to apply haplotype-based methods to UK Biobank whole-genome sequence data. The pipeline uses BCFtools for marker filtering, Beagle for genotype phasing, and Tabix for VCF indexing. We used the pipeline to phase 406 million single-nucleotide variants on chromosomes 1-22 and X at a cost of £2,309. The maximum time required to process a chromosome was 2.6 days. In order to assess phase accuracy, we modified the pipeline to exclude trio parents. We observed a switch error rate of 0.0016 on chromosome 20 in the White British trio offspring. If we exclude markers with nonmajor allele frequency < 0.1% after phasing, this switch error rate decreases by 80% to 0.00032.


Subjects
Biological Specimen Banks; Genome; Humans; Dogs; Animals; Genotype; Haplotypes/genetics; Polymorphism, Single Nucleotide/genetics; United Kingdom; Algorithms; Sequence Analysis, DNA/methods
5.
Am J Hum Genet ; 110(2): 326-335, 2023 02 02.
Article in English | MEDLINE | ID: mdl-36610402

ABSTRACT

Local ancestry is the source ancestry at each point in the genome of an admixed individual. Inferred local ancestry is used for admixture mapping and population genetic analyses. We present FLARE (fast local ancestry estimation), a method for local ancestry inference. FLARE achieves high accuracy through the use of an extended Li and Stephens model, and it achieves exceptional computational performance through incorporation of computational techniques developed for genotype imputation. Memory requirements are reduced through on-the-fly compression of reference haplotypes and stored checkpoints. Computation time is reduced through the use of composite reference haplotypes. These techniques allow FLARE to scale to datasets with hundreds of thousands of sequenced individuals and to provide superior accuracy on large-scale data. FLARE is open source and available at https://github.com/browning-lab/flare.


Subjects
Genetics, Population; Genome, Human; Humans; Ethnicity; Genotype; Haplotypes/genetics
6.
Am J Hum Genet ; 109(6): 1016-1025, 2022 06 02.
Article in English | MEDLINE | ID: mdl-35659928

ABSTRACT

Haplotypes can be estimated from unphased genotype data via statistical methods. When parent-offspring trios are available for inferring the true phase from Mendelian inheritance rules, the accuracy of statistical phasing is usually measured by the switch error rate, which is the proportion of pairs of consecutive heterozygotes that are incorrectly phased. We present a method for estimating the genotype error rate from parent-offspring trios and a method for estimating the bias that occurs in the observed switch error rate as a result of genotype error. We apply these methods to 485,301 genotyped UK Biobank samples that include 898 White British trios and to 38,387 sequenced TOPMed samples that include 217 African Caribbean trios and 669 European American trios. We show that genotype error inflates the observed switch error rate and that the relative bias increases with sample size. For the UK Biobank White British trios, the observed switch error rate in the trio offspring is 2.4 times larger than the estimated true switch error rate (1.4 × 10⁻³ vs. 5.8 × 10⁻⁴). We propose an alternate definition of phase error that counts two consecutive switch errors as a single error because back-to-back switch errors arise when a single heterozygote is incorrectly phased with respect to the surrounding heterozygotes. With this definition, we estimate that the average distance between phase errors is 64 megabases in the UK Biobank White British individuals.
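
The two error definitions in this abstract can be illustrated with a minimal Python sketch. It is not the authors' software: the function names and the 0/1 encoding (which allele sits on the first haplotype at each heterozygous site) are assumptions made only for illustration.

```python
from typing import List, Sequence


def switch_flags(true_phase: Sequence[int], inferred_phase: Sequence[int]) -> List[bool]:
    """One flag per pair of consecutive heterozygotes: True if the pair is mis-phased.

    Each input holds one 0/1 value per heterozygous site (hypothetical encoding).
    """
    if len(true_phase) != len(inferred_phase):
        raise ValueError("phase vectors must have the same length")
    # A site "agrees" when the inferred orientation matches the true one.
    agree = [t == i for t, i in zip(true_phase, inferred_phase)]
    # A switch occurs where agreement changes between neighbouring sites.
    return [agree[k] != agree[k + 1] for k in range(len(agree) - 1)]


def switch_error_rate(true_phase: Sequence[int], inferred_phase: Sequence[int]) -> float:
    """Proportion of consecutive heterozygote pairs that are incorrectly phased."""
    flags = switch_flags(true_phase, inferred_phase)
    return sum(flags) / len(flags) if flags else 0.0


def phase_error_count(true_phase: Sequence[int], inferred_phase: Sequence[int]) -> int:
    """Alternate definition: two back-to-back switch errors count as one error."""
    flags = switch_flags(true_phase, inferred_phase)
    count = 0
    k = 0
    while k < len(flags):
        if flags[k]:
            count += 1
            if k + 1 < len(flags) and flags[k + 1]:
                k += 1  # skip the paired switch caused by the same heterozygote
        k += 1
    return count
```

With a single mis-phased heterozygote in the middle of six sites, the sketch reports two switch errors under the usual definition but one phase error under the alternate definition.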


Subjects
Heredity; Polymorphism, Single Nucleotide; Bias; Genotype; Haplotypes/genetics; Humans; Polymorphism, Single Nucleotide/genetics
7.
Am J Hum Genet ; 109(12): 2178-2184, 2022 12 01.
Article in English | MEDLINE | ID: mdl-36370709

ABSTRACT

We provide a method for estimating the genome-wide mutation rate from sequence data on unrelated individuals by using segments of identity by descent (IBD). The length of an IBD segment indicates the time to shared ancestor of the segment, and mutations that have occurred since the shared ancestor result in discordances between the two IBD haplotypes. Previous methods for IBD-based estimation of mutation rate have required the use of family data for accurate phasing of the genotypes. This has limited the scope of application of IBD-based mutation rate estimation. Here, we develop an IBD-based method for mutation rate estimation from population data, and we apply it to whole-genome sequence data on 4,166 European American individuals from the TOPMed Framingham Heart Study, 2,996 European American individuals from the TOPMed My Life, Our Future study, and 1,586 African American individuals from the TOPMed Hypertension Genetic Epidemiology Network study. Although mutation rates may differ between populations as a result of genetic factors, demographic factors such as average parental age, and environmental exposures, our results are consistent with equal genome-wide average mutation rates across these three populations. Our overall estimate of the average genome-wide mutation rate per 10⁸ base pairs per generation for single-nucleotide variants is 1.24 (95% CI 1.18-1.33).


Subjects
Genome, Human; Mutation Rate; Humans; Genome, Human/genetics; Polymorphism, Single Nucleotide/genetics; Haplotypes; Genotype
8.
Am J Hum Genet ; 108(10): 1880-1890, 2021 10 07.
Article in English | MEDLINE | ID: mdl-34478634

ABSTRACT

Haplotype phasing is the estimation of haplotypes from genotype data. We present a fast, accurate, and memory-efficient haplotype phasing method that scales to large-scale SNP array and sequence data. The method uses marker windowing and composite reference haplotypes to reduce memory usage and computation time. It incorporates a progressive phasing algorithm that identifies confidently phased heterozygotes in each iteration and fixes the phase of these heterozygotes in subsequent iterations. For data with many low-frequency variants, such as whole-genome sequence data, the method employs a two-stage phasing algorithm that phases high-frequency markers via progressive phasing in the first stage and phases low-frequency markers via genotype imputation in the second stage. This haplotype phasing method is implemented in the open-source Beagle 5.2 software package. We compare Beagle 5.2 and SHAPEIT 4.2.1 by using expanding subsets of 485,301 UK Biobank samples and 38,387 TOPMed samples. Both methods have very similar accuracy and computation time for UK Biobank SNP array data. However, for TOPMed sequence data, Beagle is more than 20 times faster than SHAPEIT, achieves similar accuracy, and scales to larger sample sizes.


Subjects
Asthma/genetics; Atrial Fibrillation/genetics; Data Interpretation, Statistical; Genome, Human; Haplotypes; Polymorphism, Single Nucleotide; Software; Algorithms; Female; Genome-Wide Association Study; Genotype; Humans; Male
9.
Am J Hum Genet ; 107(5): 895-910, 2020 11 05.
Article in English | MEDLINE | ID: mdl-33053335

ABSTRACT

Most methods for fast detection of identity by descent (IBD) segments report identity by state segments without any quantification of the uncertainty in the endpoints and lengths of the IBD segments. We present a method for determining the posterior probability distribution of IBD segment endpoints. Our approach accounts for genotype errors, recent mutations, and gene conversions which disrupt DNA sequence identity within IBD segments, and it can be applied to large cohorts with whole-genome sequence or SNP array data. We find that our method's estimates of uncertainty are well calibrated for homogeneous samples. We quantify endpoint uncertainty for 77.7 billion IBD segments from 408,883 individuals of white British ancestry in the UK Biobank, and we use these IBD segments to find regions showing evidence of recent natural selection. We show that many spurious selection signals are eliminated by the use of unbiased estimates of IBD segment endpoints and a pedigree-based genetic map. Eleven of the twelve regions with the greatest evidence for recent selection in our scan have been identified as selected in previous analyses using different approaches. Our computationally efficient method for quantifying IBD segment endpoint uncertainty is implemented in the open source ibd-ends software package.


Subjects
Biometric Identification/methods; Chromosome Mapping/statistics & numerical data; Genome, Human; Inheritance Patterns; Models, Statistical; Polymorphism, Single Nucleotide; Biological Specimen Banks; Family; Female; Genome-Wide Association Study; Humans; Male; Pedigree; Software; Uncertainty; United Kingdom
10.
Am J Hum Genet ; 106(4): 426-437, 2020 04 02.
Article in English | MEDLINE | ID: mdl-32169169

ABSTRACT

Segments of identity by descent (IBD) are used in many genetic analyses. We present a method for detecting identical-by-descent haplotype segments in phased genotype data. Our method, called hap-IBD, combines a compressed representation of haplotype data, the positional Burrows-Wheeler transform, and multi-threaded execution to produce very fast analysis times. An attractive feature of hap-IBD is its simplicity: the input parameters clearly and precisely define the IBD segments that are reported, so that program correctness can be confirmed by users. We evaluate hap-IBD and four state-of-the-art IBD segment detection methods (GERMLINE, iLASH, RaPID, and TRUFFLE) using UK Biobank chromosome 20 data and simulated sequence data. We show that hap-IBD detects IBD segments faster and more accurately than competing methods, and that hap-IBD is the only method that can rapidly and accurately detect short 2-4 centiMorgan (cM) IBD segments in the full UK Biobank data. Analysis of 485,346 UK Biobank samples through the use of hap-IBD with 12 computational threads detects 231.5 billion autosomal IBD segments with length ≥2 cM in 24.4 h.
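
The positional Burrows-Wheeler transform mentioned in this abstract rests on a simple per-site re-sorting step, sketched below in Python. This is the generic textbook step under assumed names and a binary-allele encoding, not hap-IBD's implementation, which also uses compressed data and multi-threading.

```python
from typing import List, Sequence


def pbwt_prefix_orders(haplotypes: Sequence[Sequence[int]]) -> List[List[int]]:
    """Return the haplotype ordering after each site (positional prefix arrays).

    After site k, haplotype indices are sorted by their reversed prefixes
    (alleles at sites k, k-1, ..., 0), so haplotypes that match over a long
    stretch ending at site k end up next to each other.
    """
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("all haplotypes must cover the same sites")

    order = list(range(len(haplotypes)))  # initial order: input order
    orders: List[List[int]] = []
    for site in range(n_sites):
        zeros: List[int] = []
        ones: List[int] = []
        # Stable partition by the allele at this site preserves the
        # ordering established by earlier sites within each allele group.
        for hap in order:
            allele = haplotypes[hap][site]
            if allele == 0:
                zeros.append(hap)
            elif allele == 1:
                ones.append(hap)
            else:
                raise ValueError("alleles must be coded 0 or 1")
        order = zeros + ones
        orders.append(order)
    return orders
```

Because each site needs only one pass over the haplotypes, the sorting cost grows linearly with the number of haplotypes and sites, which is why this transform suits very large cohorts.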


Subjects
Genome, Human/genetics; Sequence Analysis, DNA/methods; Alleles; Chromosomes/genetics; Computer Simulation; Data Analysis; Genetic Markers/genetics; Genetics, Population/methods; Genotype; Haplotypes/genetics; Humans; Polymorphism, Single Nucleotide/genetics; Software
11.
Am J Hum Genet ; 107(1): 137-148, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32533945

ABSTRACT

Recombination rates vary significantly across the genome, and estimates of recombination rates are needed for downstream analyses such as haplotype phasing and genotype imputation. Existing methods for recombination rate estimation are limited by insufficient amounts of informative genetic data or by high computational cost. We present a method and software, called IBDrecomb, for using segments of identity by descent to infer recombination rates. IBDrecomb can be applied to sequenced population cohorts to obtain high-resolution, population-specific recombination maps. In simulated admixed data, IBDrecomb obtains higher accuracy than admixture-based estimation of recombination rates. When applied to 2,500 simulated individuals, IBDrecomb obtains similar accuracy to a linkage-disequilibrium (LD)-based method applied to 96 individuals (the largest number for which computation is tractable). Compared to LD-based maps, our IBD-based maps have the advantage of estimating recombination rates in the recent past rather than the distant past. We used IBDrecomb to generate new recombination maps for European Americans and for African Americans from TOPMed sequence data from the Framingham Heart Study (1,626 unrelated individuals) and the Jackson Heart Study (2,046 unrelated individuals), and we compare them to LD-based, admixture-based, and family-based maps.


Subjects
Recombination, Genetic/genetics; Black or African American/genetics; Genetics, Population/methods; Genome, Human/genetics; Haplotypes/genetics; Humans; Linkage Disequilibrium/genetics; Polymorphism, Single Nucleotide/genetics; White People/genetics
12.
Proc Natl Acad Sci U S A ; 117(5): 2560-2569, 2020 02 04.
Article in English | MEDLINE | ID: mdl-31964835

ABSTRACT

De novo mutations (DNMs), or mutations that appear in an individual despite not being seen in their parents, are an important source of genetic variation whose impact is relevant to studies of human evolution, genetics, and disease. Utilizing high-coverage whole-genome sequencing data as part of the Trans-Omics for Precision Medicine (TOPMed) Program, we called 93,325 single-nucleotide DNMs across 1,465 trios from an array of diverse human populations, and used them to directly estimate and analyze DNM counts, rates, and spectra. We find a significant positive correlation between local recombination rate and local DNM rate, and that DNM rate explains a substantial portion (8.98 to 34.92%, depending on the model) of the genome-wide variation in population-level genetic variation from 41K unrelated TOPMed samples. Genome-wide heterozygosity does correlate with DNM rate, but only explains <1% of variation. While we are underpowered to see small differences, we do not find significant differences in DNM rate between individuals of European, African, and Latino ancestry, nor across ancestrally distinct segments within admixed individuals. However, we did find significantly fewer DNMs in Amish individuals, even when compared with other Europeans, and even after accounting for parental age and sequencing center. Specifically, we found significant reductions in the number of C→A and T→C mutations in the Amish, which seem to underpin their overall reduction in DNMs. Finally, we calculated near-zero estimates of narrow sense heritability (h²), which suggest that variation in DNM rate is significantly shaped by nonadditive genetic effects and the environment.


Subjects
Amish/genetics; Genome, Human; Adult; Cohort Studies; DNA Mutational Analysis; Female; Genetics, Population; Heterozygote; Humans; Male; Mutation; Pedigree; Whole Genome Sequencing; Young Adult
13.
Am J Hum Genet ; 105(5): 883-893, 2019 11 07.
Article in English | MEDLINE | ID: mdl-31587867

ABSTRACT

The two primary methods for estimating the genome-wide mutation rate have been counting de novo mutations in parent-offspring trios and comparing sequence data between closely related species. With parent-offspring trio analysis it is difficult to control for genotype error, and resolution is limited because each trio provides information from only two meioses. Inter-species comparison is difficult to calibrate due to uncertainty in the number of meioses separating species, and it can be biased by selection and by changing mutation rates over time. An alternative class of approaches for estimating mutation rates that avoids these limitations is based on identity by descent (IBD) segments that arise from common ancestry within the past few thousand years. Existing IBD-based methods are limited to highly inbred samples, or lack robustness to genotype error and error in the estimated demographic history. We present an IBD-based method that uses sharing of IBD segments among sets of three individuals to estimate the mutation rate. Our method is applicable to accurately phased genotype data, such as parent-offspring trio data phased using Mendelian rules of inheritance. Unlike standard parent-offspring analysis, our method utilizes distant relationships and is robust to genotype error. We apply our method to data from 1,307 European-ancestry individuals in the Framingham Heart Study sequenced by the NHLBI TOPMed project. We obtain an estimate of 1.29 × 10⁻⁸ mutations per base pair per meiosis with a 95% confidence interval of [1.02 × 10⁻⁸, 1.56 × 10⁻⁸].


Subjects
Genome, Human/genetics; Mutation/genetics; Genotype; Heredity/genetics; Humans; Meiosis/genetics; Mutation Rate; Pedigree; Polymorphism, Single Nucleotide/genetics
14.
Am J Hum Genet ; 104(3): 454-465, 2019 03 07.
Article in English | MEDLINE | ID: mdl-30773276

ABSTRACT

Admixture mapping studies have become more common in recent years, due in part to technological advances and growing international efforts to increase the diversity of genetic studies. However, many open questions remain about appropriate implementation of admixture mapping studies, including how best to control for multiple testing, particularly in the presence of population structure. In this study, we develop a theoretical framework to characterize the correlation of local ancestry and admixture mapping test statistics in admixed populations with contributions from any number of ancestral populations and arbitrary population structure. Based on this framework, we develop an analytical approach for obtaining genome-wide significance thresholds for admixture mapping studies. We validate our approach via analysis of simulated traits with real genotype data for 8,064 unrelated African American and 3,425 Hispanic/Latina women from the Women's Health Initiative SNP Health Association Resource (WHI SHARe). In an application to these WHI SHARe data, our approach yields genome-wide significant p value thresholds of 2.1 × 10⁻⁵ and 4.5 × 10⁻⁶ for admixture mapping studies in the African American and Hispanic/Latina cohorts, respectively. Compared to other commonly used multiple testing correction procedures, our method is fast, easy to implement (using our publicly available R package), and controls the family-wise error rate even in structured populations. Importantly, we note that the appropriate admixture mapping significance threshold depends on the number of ancestral populations, generations since admixture, and population structure of the sample; as a result, significance thresholds are not, in general, transferable across studies.


Subjects
Black or African American/genetics; Computational Biology/methods; Genetics, Population; Genome, Human; Genome-Wide Association Study; Hispanic or Latino/genetics; White People/genetics; Aged; Chromosome Mapping; Female; Genotype; Humans; Middle Aged; Phenotype; Postmenopause
15.
Hum Mol Genet ; 28(4): 675-687, 2019 02 15.
Article in English | MEDLINE | ID: mdl-30403821

ABSTRACT

Obstructive sleep apnea (OSA) is a common disorder associated with increased risk of cardiovascular disease and mortality. Its prevalence and severity vary across ancestral background. Although OSA traits are heritable, few genetic associations have been identified. To identify genetic regions associated with OSA and improve statistical power, we applied admixture mapping on three primary OSA traits [the apnea hypopnea index (AHI), overnight average oxyhemoglobin saturation (SaO2) and percentage time SaO2 < 90%] and a secondary trait (respiratory event duration) in a Hispanic/Latino American population study of 11,575 individuals with significant variation in ancestral background. Linear mixed models were performed using previously inferred African, European and Amerindian local genetic ancestry markers. Global African ancestry was associated with a lower AHI, higher SaO2 and shorter event duration. Admixture mapping analysis of the primary OSA traits identified local African ancestry at the chromosomal region 2q37 as genome-wide significantly associated with AHI (P < 5.7 × 10⁻⁵), and European and Amerindian ancestries at 18q21 suggestively associated with both AHI and percentage time SaO2 < 90% (P < 10⁻³). Follow-up joint ancestry-SNP association analyses identified novel variants in ferrochelatase (FECH), significantly associated with AHI and percentage time SaO2 < 90% after adjusting for multiple tests (P < 8 × 10⁻⁶). These signals contributed to the admixture mapping associations and were replicated in independent cohorts. In this first admixture mapping study of OSA, novel associations with variants in the iron/heme metabolism pathway suggest a role for iron in influencing respiratory traits underlying OSA.


Subjects
Ferrochelatase/genetics; Genome-Wide Association Study; Sleep Apnea, Obstructive/genetics; Aged; Chromosome Mapping; Female; Genotype; Hispanic or Latino/genetics; Humans; Male; Middle Aged; Polymorphism, Single Nucleotide/genetics; Polysomnography; Sleep Apnea, Obstructive/diagnostic imaging; Sleep Apnea, Obstructive/physiopathology; White People/genetics
16.
Am J Hum Genet ; 103(3): 338-348, 2018 09 06.
Article in English | MEDLINE | ID: mdl-30100085

ABSTRACT

Genotype imputation is commonly performed in genome-wide association studies because it greatly increases the number of markers that can be tested for association with a trait. In general, one should perform genotype imputation using the largest reference panel that is available because the number of accurately imputed variants increases with reference panel size. However, one impediment to using larger reference panels is the increased computational cost of imputation. We present a new genotype imputation method, Beagle 5.0, which greatly reduces the computational cost of imputation from large reference panels. We compare Beagle 5.0 with Beagle 4.1, Impute4, Minimac3, and Minimac4 using 1000 Genomes Project data, Haplotype Reference Consortium data, and simulated data for 10k, 100k, 1M, and 10M reference samples. All methods produce nearly identical accuracy, but Beagle 5.0 has the lowest computation time and the best scaling of computation time with increasing reference panel size. For 10k, 100k, 1M, and 10M reference samples and 1,000 phased target samples, Beagle 5.0's computation time is 3× (10k), 12× (100k), 43× (1M), and 533× (10M) faster than the fastest alternative method. Cost data from the Amazon Elastic Compute Cloud show that Beagle 5.0 can perform genome-wide imputation from 10M reference samples into 1,000 phased target samples at a cost of less than one US cent per sample.


Subjects
Genome, Human/genetics; Computational Biology/methods; Genome-Wide Association Study/methods; Haplotypes/genetics; Humans; Software
17.
Bioinformatics ; 36(16): 4519-4520, 2020 08 15.
Article in English | MEDLINE | ID: mdl-32844204

ABSTRACT

MOTIVATION: Estimation of pairwise kinship coefficients in large datasets is computationally challenging because the number of related individuals increases quadratically with sample size. RESULTS: We present IBDkin, a software package written in C for estimating kinship coefficients from identity by descent (IBD) segments. We use IBDkin to estimate kinship coefficients for 7.95 billion pairs of individuals in the UK Biobank who share at least one detected IBD segment with length ≥ 4 cM. AVAILABILITY AND IMPLEMENTATION: https://github.com/YingZhou001/IBDkin. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
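
The link between IBD segments and a kinship coefficient can be sketched with the standard identity kinship = k1/4 + k2/2, where k1 and k2 are the genome fractions shared IBD on one and on both haplotypes. The Python below is only an illustration under that identity: the abstract does not describe IBDkin's estimator, and the function name, the (length, state) input format, and the genome length parameter are all assumptions.

```python
from typing import Iterable, Tuple


def kinship_from_ibd(segments: Iterable[Tuple[float, int]], genome_length_cm: float) -> float:
    """Estimate a pairwise kinship coefficient from IBD segments.

    segments: (length in cM, IBD state) pairs, where state 1 means one
        shared haplotype and state 2 means both haplotypes are shared
        (hypothetical input format).
    genome_length_cm: total genetic length of the analysed genome in cM.
    """
    if genome_length_cm <= 0:
        raise ValueError("genome length must be positive")
    ibd1_cm = 0.0
    ibd2_cm = 0.0
    for length_cm, state in segments:
        if state == 1:
            ibd1_cm += length_cm
        elif state == 2:
            ibd2_cm += length_cm
        else:
            raise ValueError("IBD state must be 1 or 2")
    k1 = ibd1_cm / genome_length_cm  # fraction of genome shared IBD1
    k2 = ibd2_cm / genome_length_cm  # fraction of genome shared IBD2
    return 0.25 * k1 + 0.5 * k2
```

For a pair sharing the entire genome IBD1, as a parent and child do, the sketch returns 0.25, the expected parent-offspring kinship.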


Subjects
Software; Humans
18.
PLoS Genet ; 14(5): e1007385, 2018 05.
Article in English | MEDLINE | ID: mdl-29795556

ABSTRACT

Populations change in size over time due to factors such as population growth, migration, bottleneck events, natural disasters, and disease. The historical effective size of a population affects the power and resolution of genetic association studies. For admixed populations, it is not only the overall effective population size that is of interest, but also the effective sizes of the component ancestral populations. We use identity by descent and local ancestry inferred from genome-wide genetic data to estimate overall and ancestry-specific effective population size during the past hundred generations for nine admixed American populations from the Hispanic Community Health Study/Study of Latinos, and for African-American and European-American populations from two US cities. In these populations, the estimated pre-admixture effective sizes of the ancestral populations vary by sampled population, suggesting that the ancestors of different sampled populations were drawn from different sub-populations. In addition, we estimate that overall effective population sizes dropped substantially in the generations immediately after the commencement of European and African immigration, reaching a minimum around 12 generations ago, but rebounded within a small number of generations afterwards. Of the populations that we considered, the population of individuals originating from Puerto Rico has the smallest bottleneck size of one thousand, while the Pittsburgh African-American population has the largest bottleneck size of two hundred thousand.


Subjects
Black or African American/genetics; Genome, Human/genetics; Hispanic or Latino/genetics; White People/genetics; Black or African American/statistics & numerical data; Americas; Computer Simulation; Genetic Association Studies/methods; Genetics, Population/methods; Haplotypes; Hispanic or Latino/statistics & numerical data; Humans; Population Density; United States; White People/statistics & numerical data
19.
Annu Rev Genet ; 46: 617-33, 2012.
Article in English | MEDLINE | ID: mdl-22994355

ABSTRACT

Short segments of identity by descent (IBD) between individuals with no known relationship can be detected using genome-wide single nucleotide polymorphism data and recently developed statistical methodology. Emerging applications for the detected IBD segments include IBD mapping, haplotype phase inference, genotype imputation, and inference of population structure. In this review, we explain the principles behind methods for IBD segment detection, describe recently developed methods, discuss approaches to comparing methods, and give an overview of applications.


Subjects
Chromosome Mapping/methods; Genetics, Population/methods; Pedigree; Alleles; Chromosomes, Human/genetics; Computational Biology/methods; Computer Simulation; Gene Frequency; Genetic Diseases, Inborn/diagnosis; Genetic Diseases, Inborn/genetics; Genetic Predisposition to Disease; Genome-Wide Association Study/methods; Haplotypes; Humans; Inheritance Patterns; Linkage Disequilibrium; Polymorphism, Single Nucleotide
20.
PLoS Genet ; 13(4): e1006760, 2017 04.
Article in English | MEDLINE | ID: mdl-28453575

ABSTRACT

Prior GWAS have identified loci associated with red blood cell (RBC) traits in populations of European, African, and Asian ancestry. These studies have not included individuals with an Amerindian ancestral background, such as Hispanics/Latinos, nor evaluated the full spectrum of genomic variation beyond single nucleotide variants. Using a custom genotyping array enriched for Amerindian ancestral content and 1000 Genomes imputation, we performed GWAS in 12,502 participants of Hispanic Community Health Study and Study of Latinos (HCHS/SOL) for hematocrit, hemoglobin, RBC count, RBC distribution width (RDW), and RBC indices. Approximately 60% of previously reported RBC trait loci generalized to HCHS/SOL Hispanics/Latinos, including African ancestral alpha- and beta-globin gene variants. In addition to the known 3.8kb alpha-globin copy number variant, we identified an Amerindian ancestral association in an alpha-globin regulatory region on chromosome 16p13.3 for mean corpuscular volume and mean corpuscular hemoglobin. We also discovered and replicated three genome-wide significant variants in previously unreported loci for RDW (SLC12A2 rs17764730, PSMB5 rs941718), and hematocrit (PROX1 rs3754140). Among the proxy variants at the SLC12A2 locus we identified rs3812049, located in a bi-directional promoter between SLC12A2 (which encodes a red cell membrane ion-transport protein) and an upstream anti-sense long-noncoding RNA, LINC01184, as the likely causal variant. We further demonstrate that disruption of the regulatory element harboring rs3812049 affects transcription of SLC12A2 and LINC01184 in human erythroid progenitor cells. Together, these results reinforce the importance of genetic study of diverse ancestral populations, in particular Hispanics/Latinos.


Subjects
Homeodomain Proteins/genetics; Proteasome Endopeptidase Complex/genetics; RNA, Long Noncoding/genetics; Solute Carrier Family 12, Member 2/genetics; Tumor Suppressor Proteins/genetics; alpha-Globins/genetics; Erythrocyte Count; Erythrocytes; Female; Genome-Wide Association Study; Hemoglobins/genetics; Hispanic or Latino/genetics; Humans; Male; Polymorphism, Single Nucleotide; beta-Globins/genetics