Results 1 - 20 of 69

1.
Cell; 173(1): 53-61.e9, 2018 Mar 22.
Article in English | MEDLINE | ID: mdl-29551270

ABSTRACT

Anatomically modern humans interbred with Neanderthals and with a related archaic population known as Denisovans. Genomes of several Neanderthals and one Denisovan have been sequenced, and these reference genomes have been used to detect introgressed genetic material in present-day human genomes. Segments of introgression also can be detected without use of reference genomes, and doing so can be advantageous for finding introgressed segments that are less closely related to the sequenced archaic genomes. We apply a new reference-free method for detecting archaic introgression to 5,639 whole-genome sequences from Eurasia and Oceania. We find Denisovan ancestry in populations from East and South Asia and Papuans. Denisovan ancestry comprises two components with differing similarity to the sequenced Altai Denisovan individual. This indicates that at least two distinct instances of Denisovan admixture into modern humans occurred, involving Denisovan populations that had different levels of relatedness to the sequenced Altai Denisovan.


Subject(s)
Genome, Human; Animals; Asian People/genetics; Humans; Neanderthals/genetics; Selection, Genetic; Exome Sequencing
2.
Am J Hum Genet; 111(4): 691-700, 2024 Apr 04.
Article in English | MEDLINE | ID: mdl-38513668

ABSTRACT

We present a method for efficiently identifying clusters of identical-by-descent haplotypes in biobank-scale sequence data. Our multi-individual approach enables much more computationally efficient inference of identity by descent (IBD) than approaches that infer pairwise IBD segments and provides locus-specific IBD clusters rather than IBD segments. Our method's computation time, memory requirements, and output size scale linearly with the number of individuals in the dataset. We also present a method for using multi-individual IBD to detect alleles changed by gene conversion. Application of our methods to the autosomal sequence data for 125,361 White British individuals in the UK Biobank detects more than 9 million converted alleles. This is 2,900 times more alleles changed by gene conversion than were detected in a previous analysis of familial data. We estimate that more than 250,000 sequenced probands and a much larger number of additional genomes from multi-generational family members would be required to find a similar number of alleles changed by gene conversion using a family-based approach. Our IBD clustering method is implemented in the open-source ibd-cluster software package.


Subject(s)
Biological Specimen Banks; Gene Conversion; Humans; Software; Haplotypes/genetics; Chromosomes; Polymorphism, Single Nucleotide
3.
Nature; 590(7845): 290-299, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33568819

ABSTRACT

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.


Subject(s)
Genetic Variation/genetics; Genome, Human/genetics; Genomics; National Heart, Lung, and Blood Institute (U.S.); Precision Medicine; Cytochrome P-450 CYP2D6/genetics; Haplotypes/genetics; Heterozygote; Humans; INDEL Mutation; Loss of Function Mutation; Mutagenesis; Phenotype; Polymorphism, Single Nucleotide; Population Density; Precision Medicine/standards; Quality Control; Sample Size; United States; Whole Genome Sequencing/standards
4.
PLoS Genet; 20(5): e1011297, 2024 May.
Article in English | MEDLINE | ID: mdl-38787916

ABSTRACT

Genotype data include errors that may influence conclusions reached by downstream statistical analyses. Previous studies have estimated genotype error rates from discrepancies in human pedigree data, such as Mendelian inconsistent genotypes or apparent phase violations. However, uncalled deletions, which generally have not been accounted for in these studies, can lead to biased error rate estimates. In this study, we propose a genotype error model that considers both genotype errors and uncalled deletions when calculating the likelihood of the observed genotypes in parent-offspring trios. Using simulations, we show that when there are uncalled deletions, our model produces genotype error rate estimates that are less biased than estimates from a model that does not account for these deletions. We applied our model to SNVs in 77 sequenced White British parent-offspring trios in the UK Biobank. We use the Akaike information criterion to show that our model fits the data better than a model that does not account for uncalled deletions. We estimate the genotype error rate at SNVs with minor allele frequency > 0.001 in these data to be [Formula: see text]. We estimate that 77% of the genotype errors at these markers are attributable to uncalled deletions [Formula: see text].
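
The model comparison mentioned in this abstract relies on the Akaike information criterion, AIC = 2k - 2 ln L, where k is the number of fitted parameters and L is the maximized likelihood. The following is a minimal sketch of such a comparison, not the authors' code: the model names, log-likelihoods, and parameter counts are hypothetical placeholders chosen only to illustrate the calculation.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def preferred_model(models):
    """Return the name of the model with the lowest AIC.

    `models` maps a model name to a (log_likelihood, n_params) tuple.
    """
    return min(models, key=lambda name: aic(*models[name]))


# Hypothetical values, for illustration only (not taken from the paper).
candidate_models = {
    "errors_only": (-1050.0, 1),            # genotype error rate only
    "errors_plus_deletions": (-1000.0, 2),  # adds an uncalled-deletion parameter
}

for name, (loglik, k) in candidate_models.items():
    print(name, aic(loglik, k))
print("preferred:", preferred_model(candidate_models))
```

The extra parameter costs 2 AIC units, so the richer model is preferred only when it raises the log-likelihood by more than 1.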


Subject(s)
Genotype; Whole Genome Sequencing; Humans; Polymorphism, Single Nucleotide/genetics; Models, Genetic; Gene Frequency; Genome, Human; Pedigree; Sequence Deletion; Computer Simulation
5.
Am J Hum Genet; 110(1): 161-165, 2023 Jan 05.
Article in English | MEDLINE | ID: mdl-36450278

ABSTRACT

The first release of UK Biobank whole-genome sequence data contains 150,119 genomes. We present an open-source pipeline for filtering, phasing, and indexing these genomes on the cloud-based UK Biobank Research Analysis Platform. This pipeline makes it possible to apply haplotype-based methods to UK Biobank whole-genome sequence data. The pipeline uses BCFtools for marker filtering, Beagle for genotype phasing, and Tabix for VCF indexing. We used the pipeline to phase 406 million single-nucleotide variants on chromosomes 1-22 and X at a cost of £2,309. The maximum time required to process a chromosome was 2.6 days. In order to assess phase accuracy, we modified the pipeline to exclude trio parents. We observed a switch error rate of 0.0016 on chromosome 20 in the White British trio offspring. If we exclude markers with nonmajor allele frequency < 0.1% after phasing, this switch error rate decreases by 80% to 0.00032.


Subject(s)
Biological Specimen Banks; Genome; Humans; Dogs; Animals; Genotype; Haplotypes/genetics; Polymorphism, Single Nucleotide/genetics; United Kingdom; Algorithms; Sequence Analysis, DNA/methods
6.
Am J Hum Genet; 110(2): 326-335, 2023 Feb 02.
Article in English | MEDLINE | ID: mdl-36610402

ABSTRACT

Local ancestry is the source ancestry at each point in the genome of an admixed individual. Inferred local ancestry is used for admixture mapping and population genetic analyses. We present FLARE (fast local ancestry estimation), a method for local ancestry inference. FLARE achieves high accuracy through the use of an extended Li and Stephens model, and it achieves exceptional computational performance through incorporation of computational techniques developed for genotype imputation. Memory requirements are reduced through on-the-fly compression of reference haplotypes and stored checkpoints. Computation time is reduced through the use of composite reference haplotypes. These techniques allow FLARE to scale to datasets with hundreds of thousands of sequenced individuals and to provide superior accuracy on large-scale data. FLARE is open source and available at https://github.com/browning-lab/flare.


Subject(s)
Genetics, Population; Genome, Human; Humans; Ethnicity; Genotype; Haplotypes/genetics
7.
Am J Hum Genet; 109(6): 1016-1025, 2022 Jun 02.
Article in English | MEDLINE | ID: mdl-35659928

ABSTRACT

Haplotypes can be estimated from unphased genotype data via statistical methods. When parent-offspring trios are available for inferring the true phase from Mendelian inheritance rules, the accuracy of statistical phasing is usually measured by the switch error rate, which is the proportion of pairs of consecutive heterozygotes that are incorrectly phased. We present a method for estimating the genotype error rate from parent-offspring trios and a method for estimating the bias that occurs in the observed switch error rate as a result of genotype error. We apply these methods to 485,301 genotyped UK Biobank samples that include 898 White British trios and to 38,387 sequenced TOPMed samples that include 217 African Caribbean trios and 669 European American trios. We show that genotype error inflates the observed switch error rate and that the relative bias increases with sample size. For the UK Biobank White British trios, the observed switch error rate in the trio offspring is 2.4 times larger than the estimated true switch error rate (1.4 × 10⁻³ vs 5.8 × 10⁻⁴). We propose an alternate definition of phase error that counts two consecutive switch errors as a single error because back-to-back switch errors arise when a single heterozygote is incorrectly phased with respect to the surrounding heterozygotes. With this definition, we estimate that the average distance between phase errors is 64 megabases in the UK Biobank White British individuals.
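
The two error definitions in this abstract can be illustrated with a short sketch. It assumes a hypothetical input, `flips`, holding one 0/1 value per heterozygous site (1 when the inferred phase is flipped relative to the trio-derived truth). The greedy pairing of back-to-back switches is one simple reading of the alternate definition, not necessarily the authors' exact procedure.

```python
def switch_indicators(flips):
    """True at position i when the phase orientation changes between
    heterozygote i and heterozygote i + 1."""
    return [a != b for a, b in zip(flips, flips[1:])]


def switch_error_rate(flips):
    """Proportion of consecutive heterozygote pairs that are incorrectly phased."""
    switches = switch_indicators(flips)
    if not switches:
        raise ValueError("need at least two heterozygous sites")
    return sum(switches) / len(switches)


def phase_error_count(flips):
    """Count phase errors, treating two back-to-back switch errors as one
    error (assumption: pairs are formed greedily from left to right)."""
    switches = switch_indicators(flips)
    count = 0
    i = 0
    while i < len(switches):
        if switches[i]:
            count += 1
            # Absorb an immediately following switch into the same error.
            if i + 1 < len(switches) and switches[i + 1]:
                i += 1
        i += 1
    return count


# Hypothetical example: one isolated mis-phased heterozygote (index 2)
# followed by a genuine switch before index 6.
flips = [0, 0, 1, 0, 0, 0, 1, 1, 1]
print(switch_error_rate(flips))
print(phase_error_count(flips))
```

In this example the single mis-phased heterozygote produces two switch errors but only one phase error under the alternate definition.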


Subject(s)
Heredity; Polymorphism, Single Nucleotide; Bias; Genotype; Haplotypes/genetics; Humans; Polymorphism, Single Nucleotide/genetics
8.
Am J Hum Genet; 108(10): 1880-1890, 2021 Oct 07.
Article in English | MEDLINE | ID: mdl-34478634

ABSTRACT

Haplotype phasing is the estimation of haplotypes from genotype data. We present a fast, accurate, and memory-efficient haplotype phasing method that scales to large-scale SNP array and sequence data. The method uses marker windowing and composite reference haplotypes to reduce memory usage and computation time. It incorporates a progressive phasing algorithm that identifies confidently phased heterozygotes in each iteration and fixes the phase of these heterozygotes in subsequent iterations. For data with many low-frequency variants, such as whole-genome sequence data, the method employs a two-stage phasing algorithm that phases high-frequency markers via progressive phasing in the first stage and phases low-frequency markers via genotype imputation in the second stage. This haplotype phasing method is implemented in the open-source Beagle 5.2 software package. We compare Beagle 5.2 and SHAPEIT 4.2.1 by using expanding subsets of 485,301 UK Biobank samples and 38,387 TOPMed samples. Both methods have very similar accuracy and computation time for UK Biobank SNP array data. However, for TOPMed sequence data, Beagle is more than 20 times faster than SHAPEIT, achieves similar accuracy, and scales to larger sample sizes.


Subject(s)
Asthma/genetics; Atrial Fibrillation/genetics; Data Interpretation, Statistical; Genome, Human; Haplotypes; Polymorphism, Single Nucleotide; Software; Algorithms; Female; Genome-Wide Association Study; Genotype; Humans; Male
9.
Am J Hum Genet; 107(5): 895-910, 2020 Nov 05.
Article in English | MEDLINE | ID: mdl-33053335

ABSTRACT

Most methods for fast detection of identity by descent (IBD) segments report identity by state segments without any quantification of the uncertainty in the endpoints and lengths of the IBD segments. We present a method for determining the posterior probability distribution of IBD segment endpoints. Our approach accounts for genotype errors, recent mutations, and gene conversions which disrupt DNA sequence identity within IBD segments, and it can be applied to large cohorts with whole-genome sequence or SNP array data. We find that our method's estimates of uncertainty are well calibrated for homogeneous samples. We quantify endpoint uncertainty for 77.7 billion IBD segments from 408,883 individuals of white British ancestry in the UK Biobank, and we use these IBD segments to find regions showing evidence of recent natural selection. We show that many spurious selection signals are eliminated by the use of unbiased estimates of IBD segment endpoints and a pedigree-based genetic map. Eleven of the twelve regions with the greatest evidence for recent selection in our scan have been identified as selected in previous analyses using different approaches. Our computationally efficient method for quantifying IBD segment endpoint uncertainty is implemented in the open source ibd-ends software package.


Subject(s)
Biometric Identification/methods; Chromosome Mapping/statistics & numerical data; Genome, Human; Inheritance Patterns; Models, Statistical; Polymorphism, Single Nucleotide; Biological Specimen Banks; Family; Female; Genome-Wide Association Study; Humans; Male; Pedigree; Software; Uncertainty; United Kingdom
10.
Am J Hum Genet; 106(4): 426-437, 2020 Apr 02.
Article in English | MEDLINE | ID: mdl-32169169

ABSTRACT

Segments of identity by descent (IBD) are used in many genetic analyses. We present a method for detecting identical-by-descent haplotype segments in phased genotype data. Our method, called hap-IBD, combines a compressed representation of haplotype data, the positional Burrows-Wheeler transform, and multi-threaded execution to produce very fast analysis times. An attractive feature of hap-IBD is its simplicity: the input parameters clearly and precisely define the IBD segments that are reported, so that program correctness can be confirmed by users. We evaluate hap-IBD and four state-of-the-art IBD segment detection methods (GERMLINE, iLASH, RaPID, and TRUFFLE) using UK Biobank chromosome 20 data and simulated sequence data. We show that hap-IBD detects IBD segments faster and more accurately than competing methods, and that hap-IBD is the only method that can rapidly and accurately detect short 2-4 centiMorgan (cM) IBD segments in the full UK Biobank data. Analysis of 485,346 UK Biobank samples through the use of hap-IBD with 12 computational threads detects 231.5 billion autosomal IBD segments with length ≥2 cM in 24.4 h.


Subject(s)
Genome, Human/genetics; Sequence Analysis, DNA/methods; Alleles; Chromosomes/genetics; Computer Simulation; Data Analysis; Genetic Markers/genetics; Genetics, Population/methods; Genotype; Haplotypes/genetics; Humans; Polymorphism, Single Nucleotide/genetics; Software
11.
Am J Hum Genet; 107(1): 137-148, 2020 Jul 02.
Article in English | MEDLINE | ID: mdl-32533945

ABSTRACT

Recombination rates vary significantly across the genome, and estimates of recombination rates are needed for downstream analyses such as haplotype phasing and genotype imputation. Existing methods for recombination rate estimation are limited by insufficient amounts of informative genetic data or by high computational cost. We present a method and software, called IBDrecomb, for using segments of identity by descent to infer recombination rates. IBDrecomb can be applied to sequenced population cohorts to obtain high-resolution, population-specific recombination maps. In simulated admixed data, IBDrecomb obtains higher accuracy than admixture-based estimation of recombination rates. When applied to 2,500 simulated individuals, IBDrecomb obtains similar accuracy to a linkage-disequilibrium (LD)-based method applied to 96 individuals (the largest number for which computation is tractable). Compared to LD-based maps, our IBD-based maps have the advantage of estimating recombination rates in the recent past rather than the distant past. We used IBDrecomb to generate new recombination maps for European Americans and for African Americans from TOPMed sequence data from the Framingham Heart Study (1,626 unrelated individuals) and the Jackson Heart Study (2,046 unrelated individuals), and we compare them to LD-based, admixture-based, and family-based maps.


Subject(s)
Recombination, Genetic/genetics; Black or African American/genetics; Genetics, Population/methods; Genome, Human/genetics; Haplotypes/genetics; Humans; Linkage Disequilibrium/genetics; Polymorphism, Single Nucleotide/genetics; White People/genetics
12.
Am J Hum Genet; 105(5): 883-893, 2019 Nov 07.
Article in English | MEDLINE | ID: mdl-31587867

ABSTRACT

The two primary methods for estimating the genome-wide mutation rate have been counting de novo mutations in parent-offspring trios and comparing sequence data between closely related species. With parent-offspring trio analysis it is difficult to control for genotype error, and resolution is limited because each trio provides information from only two meioses. Inter-species comparison is difficult to calibrate due to uncertainty in the number of meioses separating species, and it can be biased by selection and by changing mutation rates over time. An alternative class of approaches for estimating mutation rates that avoids these limitations is based on identity by descent (IBD) segments that arise from common ancestry within the past few thousand years. Existing IBD-based methods are limited to highly inbred samples, or lack robustness to genotype error and error in the estimated demographic history. We present an IBD-based method that uses sharing of IBD segments among sets of three individuals to estimate the mutation rate. Our method is applicable to accurately phased genotype data, such as parent-offspring trio data phased using Mendelian rules of inheritance. Unlike standard parent-offspring analysis, our method utilizes distant relationships and is robust to genotype error. We apply our method to data from 1,307 European-ancestry individuals in the Framingham Heart Study sequenced by the NHLBI TOPMed project. We obtain an estimate of 1.29 × 10⁻⁸ mutations per base pair per meiosis with a 95% confidence interval of [1.02 × 10⁻⁸, 1.56 × 10⁻⁸].


Subject(s)
Genome, Human/genetics; Mutation/genetics; Genotype; Heredity/genetics; Humans; Meiosis/genetics; Mutation Rate; Pedigree; Polymorphism, Single Nucleotide/genetics
13.
Annu Rev Genomics Hum Genet; 19: 73-96, 2018 Aug 31.
Article in English | MEDLINE | ID: mdl-29799802

ABSTRACT

Genotype imputation has become a standard tool in genome-wide association studies because it enables researchers to inexpensively approximate whole-genome sequence data from genome-wide single-nucleotide polymorphism array data. Genotype imputation increases statistical power, facilitates fine mapping of causal variants, and plays a key role in meta-analyses of genome-wide association studies. Only variants that were previously observed in a reference panel of sequenced individuals can be imputed. However, the rapid increase in the number of deeply sequenced individuals will soon make it possible to assemble enormous reference panels that greatly increase the number of imputable variants. In this review, we present an overview of genotype imputation and describe the computational techniques that make it possible to impute genotypes from reference panels with millions of individuals.


Subject(s)
Genotype; Computer Simulation; Genome-Wide Association Study/methods; Humans; Models, Genetic; Polymorphism, Single Nucleotide
14.
Am J Hum Genet; 103(3): 338-348, 2018 Sep 06.
Article in English | MEDLINE | ID: mdl-30100085

ABSTRACT

Genotype imputation is commonly performed in genome-wide association studies because it greatly increases the number of markers that can be tested for association with a trait. In general, one should perform genotype imputation using the largest reference panel that is available because the number of accurately imputed variants increases with reference panel size. However, one impediment to using larger reference panels is the increased computational cost of imputation. We present a new genotype imputation method, Beagle 5.0, which greatly reduces the computational cost of imputation from large reference panels. We compare Beagle 5.0 with Beagle 4.1, Impute4, Minimac3, and Minimac4 using 1000 Genomes Project data, Haplotype Reference Consortium data, and simulated data for 10k, 100k, 1M, and 10M reference samples. All methods produce nearly identical accuracy, but Beagle 5.0 has the lowest computation time and the best scaling of computation time with increasing reference panel size. For 10k, 100k, 1M, and 10M reference samples and 1,000 phased target samples, Beagle 5.0's computation time is 3× (10k), 12× (100k), 43× (1M), and 533× (10M) faster than the fastest alternative method. Cost data from the Amazon Elastic Compute Cloud show that Beagle 5.0 can perform genome-wide imputation from 10M reference samples into 1,000 phased target samples at a cost of less than one US cent per sample.


Subject(s)
Genome, Human/genetics; Computational Biology/methods; Genome-Wide Association Study/methods; Haplotypes/genetics; Humans; Software
15.
Bioinformatics; 36(16): 4519-4520, 2020 Aug 15.
Article in English | MEDLINE | ID: mdl-32844204

ABSTRACT

MOTIVATION: Estimation of pairwise kinship coefficients in large datasets is computationally challenging because the number of related individuals increases quadratically with sample size. RESULTS: We present IBDkin, a software package written in C for estimating kinship coefficients from identity by descent (IBD) segments. We use IBDkin to estimate kinship coefficients for 7.95 billion pairs of individuals in the UK Biobank who share at least one detected IBD segment with length ≥ 4 cM. AVAILABILITY AND IMPLEMENTATION: https://github.com/YingZhou001/IBDkin. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
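
As a rough illustration of the scale problem and of how a kinship coefficient relates to IBD sharing, the sketch below uses the standard identity kinship = k1/4 + k2/2, where k1 and k2 are the fractions of the genome shared IBD on one and on both haplotypes. It is not IBDkin's implementation; the function names and the 3,400 cM genome length are assumptions made for the example.

```python
import math


def n_pairs(n_individuals):
    """Number of distinct pairs among n individuals: n(n - 1)/2."""
    return math.comb(n_individuals, 2)


def kinship_from_ibd(ibd1_cm, ibd2_cm, genome_cm):
    """Kinship coefficient from total IBD1 and IBD2 lengths (in cM).

    Uses kinship = k1/4 + k2/2, with k1 and k2 the genome fractions
    shared IBD1 and IBD2.
    """
    if genome_cm <= 0:
        raise ValueError("genome_cm must be positive")
    k1 = ibd1_cm / genome_cm
    k2 = ibd2_cm / genome_cm
    return k1 / 4 + k2 / 2


GENOME_CM = 3400.0  # assumed autosomal map length, for illustration only

# Doubling the sample size roughly quadruples the number of pairs.
print(n_pairs(1000), n_pairs(2000))

# Parent-offspring pair: IBD1 across the whole genome.
print(kinship_from_ibd(GENOME_CM, 0.0, GENOME_CM))
# Full siblings at their expected sharing: half IBD1, a quarter IBD2.
print(kinship_from_ibd(GENOME_CM / 2, GENOME_CM / 4, GENOME_CM))
```

Both relationship examples give a kinship coefficient of 0.25, which is why IBD1 and IBD2 sharing need to be tracked separately to tell them apart.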


Subject(s)
Software; Humans
16.
PLoS Genet; 14(5): e1007385, 2018 May.
Article in English | MEDLINE | ID: mdl-29795556

ABSTRACT

Populations change in size over time due to factors such as population growth, migration, bottleneck events, natural disasters, and disease. The historical effective size of a population affects the power and resolution of genetic association studies. For admixed populations, it is not only the overall effective population size that is of interest, but also the effective sizes of the component ancestral populations. We use identity by descent and local ancestry inferred from genome-wide genetic data to estimate overall and ancestry-specific effective population size during the past hundred generations for nine admixed American populations from the Hispanic Community Health Study/Study of Latinos, and for African-American and European-American populations from two US cities. In these populations, the estimated pre-admixture effective sizes of the ancestral populations vary by sampled population, suggesting that the ancestors of different sampled populations were drawn from different sub-populations. In addition, we estimate that overall effective population sizes dropped substantially in the generations immediately after the commencement of European and African immigration, reaching a minimum around 12 generations ago, but rebounded within a small number of generations afterwards. Of the populations that we considered, the population of individuals originating from Puerto Rico has the smallest bottleneck size of one thousand, while the Pittsburgh African-American population has the largest bottleneck size of two hundred thousand.


Subject(s)
Black or African American/genetics; Genome, Human/genetics; Hispanic or Latino/genetics; White People/genetics; Black or African American/statistics & numerical data; Americas; Computer Simulation; Genetic Association Studies/methods; Genetics, Population/methods; Haplotypes; Hispanic or Latino/statistics & numerical data; Humans; Population Density; United States; White People/statistics & numerical data
17.
Breast Cancer Res; 22(1): 108, 2020 Oct 21.
Article in English | MEDLINE | ID: mdl-33087180

ABSTRACT

BACKGROUND: The BRCA1 c.3331_3334delCAAG founder mutation has been reported in hereditary breast and ovarian cancer families from multiple Hispanic groups. We aimed to evaluate BRCA1 c.3331_3334delCAAG haplotype diversity in cases of European, African, and Latin American ancestry. METHODS: BC mutation carrier cases from Colombia (n = 32), Spain (n = 13), Portugal (n = 2), Chile (n = 10), Africa (n = 1), and Brazil (n = 2) were genotyped with genome-wide single nucleotide polymorphism (SNP) arrays to evaluate haplotype diversity around BRCA1 c.3331_3334delCAAG. Additional Portuguese (n = 13) and Brazilian (n = 18) BC mutation carriers were genotyped for 15 informative SNPs surrounding BRCA1. Data were phased using SHAPEIT2, and identical-by-descent regions were determined using BEAGLE and GERMLINE. DMLE+ was used to date the mutation in Colombia and Iberia. RESULTS: The haplotype reconstruction revealed a shared 264.4-kb region among carriers from all six countries. The estimated mutation age was ~100 generations in Iberia, and the mutation was estimated to have been introduced to South America early during the European colonization period. CONCLUSIONS: Our results suggest that this mutation originated in Iberia and was later introduced to Colombia and South America at the time of Spanish colonization during the early 1500s. We also found that the Colombian mutation carriers had higher European ancestry, at the BRCA1 gene harboring chromosome 17, than controls, which further supported the European origin of the mutation. Understanding founder mutations in diverse populations has implications in implementing cost-effective, ancestry-informed screening.


Subject(s)
BRCA1 Protein/genetics; Breast Neoplasms/epidemiology; Breast Neoplasms/genetics; Genetic Predisposition to Disease; Germ-Line Mutation; Haplotypes; Polymorphism, Single Nucleotide; Africa/epidemiology; Brazil/epidemiology; Chile/epidemiology; Chromosomes, Human, Pair 17/genetics; Colombia/epidemiology; Female; Founder Effect; Genome-Wide Association Study/methods; Humans; Portugal/epidemiology; Spain/epidemiology
18.
Annu Rev Genet; 46: 617-33, 2012.
Article in English | MEDLINE | ID: mdl-22994355

ABSTRACT

Short segments of identity by descent (IBD) between individuals with no known relationship can be detected using genome-wide single nucleotide polymorphism data and recently developed statistical methodology. Emerging applications for the detected IBD segments include IBD mapping, haplotype phase inference, genotype imputation, and inference of population structure. In this review, we explain the principles behind methods for IBD segment detection, describe recently developed methods, discuss approaches to comparing methods, and give an overview of applications.


Subject(s)
Chromosome Mapping/methods; Genetics, Population/methods; Pedigree; Alleles; Chromosomes, Human/genetics; Computational Biology/methods; Computer Simulation; Gene Frequency; Genetic Diseases, Inborn/diagnosis; Genetic Diseases, Inborn/genetics; Genetic Predisposition to Disease; Genome-Wide Association Study/methods; Haplotypes; Humans; Inheritance Patterns; Linkage Disequilibrium; Polymorphism, Single Nucleotide
19.
PLoS Genet; 13(4): e1006760, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28453575

ABSTRACT

Prior GWAS have identified loci associated with red blood cell (RBC) traits in populations of European, African, and Asian ancestry. These studies have not included individuals with an Amerindian ancestral background, such as Hispanics/Latinos, nor evaluated the full spectrum of genomic variation beyond single nucleotide variants. Using a custom genotyping array enriched for Amerindian ancestral content and 1000 Genomes imputation, we performed GWAS in 12,502 participants of Hispanic Community Health Study and Study of Latinos (HCHS/SOL) for hematocrit, hemoglobin, RBC count, RBC distribution width (RDW), and RBC indices. Approximately 60% of previously reported RBC trait loci generalized to HCHS/SOL Hispanics/Latinos, including African ancestral alpha- and beta-globin gene variants. In addition to the known 3.8kb alpha-globin copy number variant, we identified an Amerindian ancestral association in an alpha-globin regulatory region on chromosome 16p13.3 for mean corpuscular volume and mean corpuscular hemoglobin. We also discovered and replicated three genome-wide significant variants in previously unreported loci for RDW (SLC12A2 rs17764730, PSMB5 rs941718), and hematocrit (PROX1 rs3754140). Among the proxy variants at the SLC12A2 locus we identified rs3812049, located in a bi-directional promoter between SLC12A2 (which encodes a red cell membrane ion-transport protein) and an upstream anti-sense long-noncoding RNA, LINC01184, as the likely causal variant. We further demonstrate that disruption of the regulatory element harboring rs3812049 affects transcription of SLC12A2 and LINC01184 in human erythroid progenitor cells. Together, these results reinforce the importance of genetic study of diverse ancestral populations, in particular Hispanics/Latinos.


Subject(s)
Homeodomain Proteins/genetics; Proteasome Endopeptidase Complex/genetics; RNA, Long Noncoding/genetics; Solute Carrier Family 12, Member 2/genetics; Tumor Suppressor Proteins/genetics; alpha-Globins/genetics; Erythrocyte Count; Erythrocytes; Female; Genome-Wide Association Study; Hemoglobins/genetics; Hispanic or Latino/genetics; Humans; Male; Polymorphism, Single Nucleotide; beta-Globins/genetics
20.
Hum Mol Genet; 26(6): 1193-1204, 2017 Mar 15.
Article in English | MEDLINE | ID: mdl-28158719

ABSTRACT

Circulating white blood cell (WBC) counts (neutrophils, monocytes, lymphocytes, eosinophils, basophils) differ by ethnicity. The genetic factors underlying basal WBC traits in Hispanics/Latinos are unknown. We performed a genome-wide association study of total WBC and differential counts in a large, ethnically diverse US population sample of Hispanics/Latinos ascertained by the Hispanic Community Health Study and Study of Latinos (HCHS/SOL). We demonstrate that several previously known WBC-associated genetic loci (e.g. the African Duffy antigen receptor for chemokines null variant for neutrophil count) are generalizable to WBC traits in Hispanics/Latinos. We identified and replicated common and rare germ-line variants at FLT3 (a gene often somatically mutated in leukemia) associated with monocyte count. The common FLT3 variant rs76428106 has a large allele frequency differential between African and non-African populations. We also identified several novel genetic loci involving or regulating hematopoietic transcription factors (CEBPE-SLC7A7, CEBPA and CRBN-TRNT1) associated with basophil count. The minor allele of the CEBPE variant associated with lower basophil count has been previously associated with Amerindian ancestry and higher risk of acute lymphoblastic leukemia in Hispanics. Together, these data suggest that germline genetic variation affecting transcriptional and signaling pathways that underlie WBC development and lineage specification can contribute to inter-individual as well as ethnic differences in peripheral blood cell counts (normal hematopoiesis) in addition to susceptibility to leukemia (malignant hematopoiesis).


Subject(s)
CCAAT-Enhancer-Binding Proteins/genetics; Genome-Wide Association Study; Leukocyte Count; fms-Like Tyrosine Kinase 3/genetics; Black or African American/genetics; Basophils/cytology; Female; Gene Frequency; Hispanic or Latino/genetics; Humans; Lymphocytes/cytology; Male; Monocytes/cytology; Neutrophils/cytology; United States/epidemiology; White People/genetics