ABSTRACT
The Cycladic, the Minoan, and the Helladic (Mycenaean) cultures define the Bronze Age (BA) of Greece. Urbanism, complex social structures, craft and agricultural specialization, and the earliest forms of writing characterize this iconic period. We sequenced six Early to Middle BA whole genomes, along with 11 mitochondrial genomes, sampled from the three BA cultures of the Aegean Sea. The Early BA (EBA) genomes are homogeneous and derive most of their ancestry from Neolithic Aegeans, contrary to earlier hypotheses that the Neolithic-EBA cultural transition was due to massive population turnover. EBA Aegeans were shaped by relatively small-scale migration from east of the Aegean, as evidenced by the Caucasus-related ancestry also detected in Anatolians. In contrast, Middle BA (MBA) individuals of northern Greece differ from EBA populations in showing ~50% Pontic-Caspian Steppe-related ancestry, dated at ca. 2,600-2,000 BCE. Such gene flow events during the MBA contributed toward shaping present-day Greek genomes.
Subject(s)
Civilization/history , Human Genome , Mitochondrial Genome , Human Migration/history , Ancient DNA , Ancient Greece , Ancient History , Humans
ABSTRACT
Many common variants have been associated with hematological traits, but identification of causal genes and pathways has proven challenging. We performed a genome-wide association analysis in the UK Biobank and INTERVAL studies, testing 29.5 million genetic variants for association with 36 red cell, white cell, and platelet properties in 173,480 European-ancestry participants. This effort yielded hundreds of low frequency (<5%) and rare (<1%) variants with a strong impact on blood cell phenotypes. Our data highlight general properties of the allelic architecture of complex traits, including the proportion of the heritable component of each blood trait explained by the polygenic signal across different genome regulatory domains. Finally, through Mendelian randomization, we provide evidence of shared genetic pathways linking blood cell indices with complex pathologies, including autoimmune diseases, schizophrenia, and coronary heart disease and evidence suggesting previously reported population associations between blood cell indices and cardiovascular disease may be non-causal.
Subject(s)
Genetic Variation , Genome-Wide Association Study , Hematopoietic Stem Cells/metabolism , Immune System Diseases/genetics , Alleles , Cell Differentiation , Genetic Predisposition to Disease , Hematopoietic Stem Cells/pathology , Humans , Immune System Diseases/pathology , Single Nucleotide Polymorphism , Quantitative Trait Loci , White People/genetics
ABSTRACT
Chromatin state variation at gene regulatory elements is abundant across individuals, yet we understand little about the genetic basis of this variability. Here, we profiled several histone modifications, the transcription factor (TF) PU.1, RNA polymerase II, and gene expression in lymphoblastoid cell lines from 47 whole-genome sequenced individuals. We observed that distinct cis-regulatory elements exhibit coordinated chromatin variation across individuals in the form of variable chromatin modules (VCMs) at sub-Mb scale. VCMs were associated with thousands of genes and preferentially cluster within chromosomal contact domains. We mapped strong proximal and weak, yet more ubiquitous, distal-acting chromatin quantitative trait loci (cQTL) that frequently explain this variation. cQTLs were associated with molecular activity at clusters of cis-regulatory elements and mapped preferentially within TF-bound regions. We propose that local, sequence-independent chromatin variation emerges as a result of genetic perturbations in cooperative interactions between cis-regulatory elements that are located within the same genomic domain.
Subject(s)
Chromatin/chemistry , Gene Expression Regulation , Genetic Variation , Human Genome , Chromatin/metabolism , Human Chromosomes/chemistry , Population Genetics , Humans , Quantitative Trait Loci , Nucleic Acid Regulatory Sequences , Transcription Factors/metabolism
ABSTRACT
Rapa Nui (also known as Easter Island) is one of the most isolated inhabited places in the world. It has captured the imagination of many owing to its archaeological record, which includes iconic megalithic statues called moai [1]. Two prominent contentions have arisen from the extensive study of Rapa Nui. First, the history of the Rapanui has been presented as a warning tale of resource overexploitation that would have culminated in a major population collapse (the 'ecocide' theory [2-4]). Second, the possibility of trans-Pacific voyages to the Americas pre-dating European contact is still debated [5-7]. Here, to address these questions, we reconstructed the genomic history of the Rapanui on the basis of 15 ancient Rapanui individuals that we radiocarbon dated (1670-1950 CE) and whole-genome sequenced (0.4-25.6×). We find that these individuals are Polynesian in origin and most closely related to present-day Rapanui, a finding that will contribute to repatriation efforts. Through effective population size reconstructions and extensive population genetics simulations, we reject a scenario involving a severe population bottleneck during the 1600s, as proposed by the ecocide theory. Furthermore, the ancient and present-day Rapanui carry similar proportions of Native American admixture (about 10%). Using a Bayesian approach integrating genetic and radiocarbon dates, we estimate that this admixture event occurred about 1250-1430 CE.
Subject(s)
American Indian or Alaska Native , Ancient DNA , European People , Population Genetics , Human Genome , Human Migration , Native Hawaiian or Other Pacific Islander , Female , Humans , Male , American Indian or Alaska Native/genetics , American Indian or Alaska Native/history , Americas/ethnology , Bayes Theorem , Ancient DNA/analysis , Europe/ethnology , European People/genetics , European People/history , Human Genome/genetics , 17th Century History , 18th Century History , 19th Century History , 20th Century History , Ancient History , Medieval History , Human Migration/history , Native Hawaiian or Other Pacific Islander/genetics , Native Hawaiian or Other Pacific Islander/history , Phylogeny , Polynesia/ethnology , Population Density , Radiometric Dating , Whole Genome Sequencing
ABSTRACT
Haplotype estimation, or phasing, has gained significant traction in large-scale projects due to its valuable contributions to population genetics, variant analysis, and the creation of reference panels for imputation and phasing of new samples. To scale with the growing number of samples, haplotype estimation methods designed for population scale rely on highly optimized statistical models to phase genotype data, and usually ignore read-level information. Statistical methods excel at resolving common variants; however, they still struggle with rare variants due to the lack of statistical information. In this study we introduce SAPPHIRE, a new method that leverages whole-genome sequencing data to enhance the precision of haplotype calls produced by statistical phasing. SAPPHIRE achieves this by refining haplotype estimates through the realignment of sequencing reads, particularly targeting low-confidence phase calls. Our findings demonstrate that SAPPHIRE significantly enhances the accuracy of haplotypes obtained from state-of-the-art methods and also provides the subset of phase calls that are validated by sequencing reads. Finally, we show that our method scales to large data sets by its successful application to the extensive 3.6 Petabytes of sequencing data of the last UK Biobank 200,031 sample release.
Subject(s)
Population Genetics , Haplotypes , Whole Genome Sequencing , Whole Genome Sequencing/methods , Humans , Population Genetics/methods , Human Genome , Single Nucleotide Polymorphism/genetics , Genome-Wide Association Study/methods , Algorithms
ABSTRACT
SUMMARY: We introduce mapache, a flexible, robust and scalable pipeline to map, quantify and impute ancient and present-day DNA in a reproducible way. Mapache is implemented in the workflow manager Snakemake and is optimized for low-space consumption, allowing large datasets, such as reference panels and multiple extracts and libraries per sample, to be efficiently (re)mapped to one or several genomes. Mapache can easily be customized or combined with other Snakemake tools. AVAILABILITY AND IMPLEMENTATION: Mapache is freely available on GitHub (https://github.com/sneuensc/mapache). An extensive manual is provided at https://github.com/sneuensc/mapache/wiki. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Ancient DNA , Software , Genome , Workflow
ABSTRACT
The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.
Subject(s)
Factual Databases , Genomics , Phenotype , Adult , Aged , Alleles , Biomarkers/blood , Biomarkers/urine , Body Height/genetics , Brain/diagnostic imaging , Cohort Studies , Genetic Databases , Electronic Health Records , Family , Female , Genome-Wide Association Study , Haplotypes/genetics , Humans , Life Style , Major Histocompatibility Complex/genetics , Male , Middle Aged , Quality Control , Racial Groups/genetics , United Kingdom
ABSTRACT
The combined analysis of haplotype panels with phenotype clinical cohorts is a common approach to explore the genetic architecture of human diseases. However, genetic studies are mainly based on single nucleotide variants (SNVs) and small insertions and deletions (indels). Here, we help to fill this gap by generating a dense haplotype map focused on the identification, characterization, and phasing of structural variants (SVs). By integrating multiple variant identification methods and Logistic Regression Models (LRMs), we present a catalogue of 35 431 441 variants, including 89 178 SVs (≥50 bp), 30 325 064 SNVs and 5 017 199 indels, across 785 Illumina high coverage (30x) whole-genomes from the Iberian GCAT Cohort, containing a median of 3.52M SNVs, 606 336 indels and 6393 SVs per individual. The haplotype panel is able to impute up to 14 360 728 SNVs/indels and 23 179 SVs, showing a 2.7-fold increase for SVs compared with available genetic variation panels. The value of this panel for SV analysis is shown through an imputed rare Alu element located in a new locus associated with Mononeuritis of lower limb, a rare neuromuscular disease. This study represents the first deep characterization of genetic variation within the Iberian population and the first operational haplotype panel to systematically include SVs in genome-wide genetic studies.
Subject(s)
Human Genome , Haplotypes , INDEL Mutation , Acyltransferases , Europe , High-Throughput Nucleotide Sequencing , Humans , Lipase , Single Nucleotide Polymorphism , Whole Genome Sequencing/methods
ABSTRACT
MOTIVATION: Generation of genotype data has been growing exponentially over the last decade. With the large size of recent datasets comes a storage and computational burden with ever increasing costs. To reduce this burden, we propose XSI, a file format with reduced storage footprint that also allows computation on the compressed data and we show how this can improve future analyses. RESULTS: We show that xSqueezeIt (XSI) allows for a file size reduction of 4-20× compared with compressed BCF and demonstrate its potential for 'compressive genomics' on the UK Biobank whole-genome sequencing genotypes with 8× faster loading times, 5× faster runs of homozygosity computation, 30× faster dot product computation and 280× faster allele counts. AVAILABILITY AND IMPLEMENTATION: The XSI file format specifications, API and command line tool are released under open-source (MIT) license and are available at https://github.com/rwk-unil/xSqueezeIt. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Data Compression , Software , Biological Specimen Banks , Genomics , Genotype
ABSTRACT
Genotype imputation is the process of predicting unobserved genotypes in a sample of individuals using a reference panel of haplotypes. In the last 10 years reference panels have increased in size by more than 100-fold. Increasing reference panel size improves accuracy of markers with low minor allele frequencies but poses ever increasing computational challenges for imputation methods. Here we present IMPUTE5, a genotype imputation method that can scale to reference panels with millions of samples. This method continues to refine the observation made in the IMPUTE2 method, that accuracy is optimized via use of a custom subset of haplotypes when imputing each individual. It achieves fast, accurate, and memory-efficient imputation by selecting haplotypes using the Positional Burrows-Wheeler Transform (PBWT). By using the PBWT data structure at genotyped markers, IMPUTE5 identifies locally best matching haplotypes and long identical-by-state segments. The method then uses the selected haplotypes as conditioning states within the IMPUTE model. Using the HRC reference panel, which has ~65,000 haplotypes, we show that IMPUTE5 is up to 30x faster than MINIMAC4 and up to 3x faster than BEAGLE5.1, and uses less memory than both these methods. Using simulated reference panels we show that IMPUTE5 scales sub-linearly with reference panel size. For example, keeping the number of imputed markers constant, increasing the reference panel size from 10,000 to 1 million haplotypes requires less than twice the computation time. As the reference panel increases in size IMPUTE5 is able to utilize a smaller number of reference haplotypes, thus reducing computational cost.
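The idea of using a PBWT ordering to pick locally matching haplotypes can be illustrated with a minimal Python sketch. This is not IMPUTE5's implementation: the function names, the toy panel, the choice to place the target inside the panel, and the rule of taking the immediate neighbours of the target in the prefix order are all assumptions made for illustration.

```python
def pbwt_orders(haps):
    """Return, for every site k, the haplotype indices sorted by reversed prefix.

    haps is a list of equal-length lists of 0/1 alleles. After site k, haplotypes
    that are adjacent in the order tend to share long matches ending at k.
    """
    order = list(range(len(haps)))
    orders = []
    for k in range(len(haps[0])):
        # One PBWT update step: a stable partition on the allele at site k.
        zeros = [h for h in order if haps[h][k] == 0]
        ones = [h for h in order if haps[h][k] == 1]
        order = zeros + ones
        orders.append(order)
    return orders


def select_neighbours(haps, target, n_each_side=1):
    """Collect haplotypes adjacent to `target` in the PBWT order at any site.

    Hypothetical selection rule: neighbours in the prefix order are treated as
    candidate conditioning states for the target haplotype.
    """
    selected = set()
    for order in pbwt_orders(haps):
        i = order.index(target)
        lo = max(0, i - n_each_side)
        hi = min(len(order), i + n_each_side + 1)
        selected.update(order[lo:hi])
    selected.discard(target)
    return sorted(selected)


# Toy panel (assumed data): haplotype 1 is identical to haplotype 0.
panel = [
    [0, 1, 0, 1],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [1, 0, 1, 0],
]
print(select_neighbours(panel, target=0))
```

In this sketch the candidate set is the union of neighbours across sites, so its size is bounded by the number of sites times the neighbourhood width rather than by the panel size, which is the property that makes this kind of selection attractive for very large panels.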
Subject(s)
Computational Biology/methods , Genome-Wide Association Study/methods , Haplotypes/genetics , Alleles , Forecasting/methods , Gene Frequency/genetics , Genotype , Humans , Theoretical Models , Single Nucleotide Polymorphism/genetics
ABSTRACT
The HLA (Human Leukocyte Antigens) genes are well-documented targets of balancing selection, and variation at these loci is associated with many disease phenotypes. Variation in expression levels also influences disease susceptibility and resistance, but little information exists about the regulation and population-level patterns of expression. This results from the difficulty in mapping short reads originated from these highly polymorphic loci, and in accounting for the existence of several paralogues. We developed a computational pipeline to accurately estimate expression for HLA genes based on RNA-seq, improving both locus-level and allele-level estimates. First, reads are aligned to all known HLA sequences in order to infer HLA genotypes, then quantification of expression is carried out using a personalized index. We use simulations to show that expression estimates obtained in this way are not biased due to divergence from the reference genome. We applied our pipeline to the GEUVADIS dataset, and compared the quantifications to those obtained with reference transcriptome. Although the personalized pipeline recovers more reads, we found that using the reference transcriptome produces estimates similar to the personalized pipeline (r ≥ 0.87) with the exception of HLA-DQA1. We describe the impact of the HLA-personalized approach on downstream analyses for nine classical HLA loci (HLA-A, HLA-C, HLA-B, HLA-DRA, HLA-DRB1, HLA-DQA1, HLA-DQB1, HLA-DPA1, HLA-DPB1). Although the influence of the HLA-personalized approach is modest for eQTL mapping, the p-values and the causality of the eQTLs obtained are better than when the reference transcriptome is used. We investigate how the eQTLs we identified explain variation in expression among lineages of HLA alleles. Finally, we discuss possible causes underlying differences between expression estimates obtained using RNA-seq, antibody-based approaches and qPCR.
Subject(s)
Chromosome Mapping , Gene Expression , HLA Antigens/genetics , Quantitative Trait Loci , Alleles , Computational Biology/methods , Gene Frequency , Genotype , Haplotypes , Humans , Transcriptome
ABSTRACT
The study of gene expression in mammalian single cells via genomic technologies now provides the possibility to investigate the patterns of allelic gene expression. We used single-cell RNA sequencing to detect the allele-specific mRNA level in 203 single human primary fibroblasts over 133,633 unique heterozygous single-nucleotide variants (hetSNVs). We observed that at the snapshot of analyses, each cell contained mostly transcripts from one allele from the majority of genes; indeed, 76.4% of the hetSNVs displayed stochastic monoallelic expression in single cells. Remarkably, adjacent hetSNVs exhibited a haplotype-consistent allelic ratio; in contrast, distant sites located in two different genes were independent of the haplotype structure. Moreover, the allele-specific expression in single cells correlated with the abundance of the cellular transcript. We observed that genes expressing both alleles in the majority of the single cells at a given time point were rare and enriched with highly expressed genes. The relative abundance of each allele in a cell was controlled by some regulatory mechanisms given that we observed related single-cell allelic profiles according to genes. Overall, these results have direct implications in cellular phenotypic variability.
Subject(s)
Alleles , Fibroblasts/cytology , Human Genome , RNA Sequence Analysis , Complementary DNA/genetics , Complementary DNA/metabolism , Haplotypes , Heterozygote , Humans , Phenotype , Messenger RNA/genetics , Messenger RNA/metabolism , Single-Cell Analysis
ABSTRACT
MOTIVATION: Large genomic datasets combining genotype and sequence data, such as for expression quantitative trait loci (eQTL) detection, require perfect matching between both data types. RESULTS: We describe here MBV (Match BAM to VCF), a method to quickly solve sample mislabeling and detect cross-sample contamination and PCR amplification bias. AVAILABILITY AND IMPLEMENTATION: MBV is implemented in C++ as an independent component of the QTLtools software package; the binary and source code are freely available at https://qtltools.github.io/qtltools/. CONTACT: olivier.delaneau@unige.ch or emmanouil.dermitzakis@unige.ch. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Genotyping Techniques/methods , Quantitative Trait Loci , DNA Sequence Analysis/methods , Software , Bias , Genomics/methods , Genomics/standards , Genotyping Techniques/standards , Humans , DNA Sequence Analysis/standards
ABSTRACT
MOTIVATION: There is growing recognition that estimating haplotypes from high coverage sequencing of single samples in clinical settings is an important problem. At the same time very large datasets consisting of tens and hundreds of thousands of high-coverage sequenced samples will soon be available. We describe a method that takes advantage of these huge human genetic variation resources and rare variant sharing patterns to estimate haplotypes on single sequenced samples. Sharing rare variants between two individuals is more likely to arise from a recent common ancestor and, hence, also more likely to indicate similar shared haplotypes over a substantial flanking region of sequence. RESULTS: Our method exploits this idea to select a small set of highly informative copying states within a Hidden Markov Model (HMM) phasing algorithm. Using rare variants in this way allows us to avoid iterative MCMC methods to infer haplotypes. Compared to other approaches that do not explicitly use rare variants we obtain significant gains in phasing accuracy, less variation over phasing runs and improvements in speed. For example, using a reference panel of 7420 haplotypes from the UK10K project, we are able to reduce switch error rates by up to 50% when phasing samples sequenced at high-coverage. In addition, a single step rephasing of the UK10K panel, using rare variant information, has a downstream impact on phasing performance. These results represent a proof of concept that rare variant sharing patterns can be utilized to phase large high-coverage sequencing studies such as the 100 000 Genomes Project dataset. AVAILABILITY AND IMPLEMENTATION: A webserver that includes an implementation of this new method and allows phasing of high-coverage clinical samples is available at https://phasingserver.stats.ox.ac.uk/ CONTACT: marchini@stats.ox.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
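As a rough illustration of selecting copying states from rare variant sharing, the Python sketch below ranks panel haplotypes by how many rare alleles they share with a target. It is not the published method: the function name, the rarity threshold `max_count`, the number of states `k`, and the toy data are all hypothetical.

```python
def select_copying_states(target, panel, max_count=2, k=2):
    """Rank panel haplotypes by the number of rare alleles shared with the target.

    target: list of 0/1 flags, 1 if the target carries the non-reference allele.
    panel:  list of haplotypes, each a list of 0/1 alleles at the same sites.
    A site counts as rare when its non-reference allele appears between 1 and
    max_count times in the panel (assumed threshold).
    """
    n_sites = len(target)
    counts = [sum(hap[s] for hap in panel) for s in range(n_sites)]
    rare_sites = [s for s in range(n_sites) if 0 < counts[s] <= max_count]

    scores = []
    for idx, hap in enumerate(panel):
        shared = sum(1 for s in rare_sites if target[s] == 1 and hap[s] == 1)
        if shared > 0:
            scores.append((shared, idx))

    # Highest sharing first; ties are broken by panel index.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scores[:k]]


# Toy data (assumed): site 3 is common, sites 0-2 are rare in this panel.
panel = [
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
target = [1, 1, 0, 1]
print(select_copying_states(target, panel))
```

Only rare sites contribute to the score, so the common allele at the last site, which every haplotype carries, does not influence which states are kept.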
Subject(s)
Computational Biology/methods , Genetic Variation , Haplotypes , Algorithms , Alleles , Genotype , Humans
ABSTRACT
MOTIVATION: In order to discover quantitative trait loci, multi-dimensional genomic datasets combining DNA-seq and ChIP-/RNA-seq require methods that rapidly correlate tens of thousands of molecular phenotypes with millions of genetic variants while appropriately controlling for multiple testing. RESULTS: We have developed FastQTL, a method that implements a popular cis-QTL mapping strategy in a user- and cluster-friendly tool. FastQTL also proposes an efficient permutation procedure to control for multiple testing. The outcome of permutations is modeled using beta distributions trained from a few permutations and from which adjusted P-values can be estimated at any level of significance with little computational cost. The Geuvadis & GTEx pilot datasets can now be easily analyzed an order of magnitude faster than previous approaches. AVAILABILITY AND IMPLEMENTATION: Source code, binaries and comprehensive documentation of FastQTL are freely available to download at http://fastqtl.sourceforge.net/ CONTACT: emmanouil.dermitzakis@unige.ch or olivier.delaneau@unige.ch SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
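The beta approximation of permutation outcomes can be sketched in Python under a strong simplifying assumption: the smallest null P-value per permutation is modeled as Beta(1, n), so only one parameter (an effective number of independent tests) is fitted. The abstract does not specify the parameterization, so this one-parameter form, the function names, and the simulated null are assumptions, not FastQTL's code.

```python
import math
import random


def fit_effective_tests(null_min_pvalues):
    """Maximum-likelihood estimate of n, assuming min-P ~ Beta(1, n).

    The Beta(1, n) density is n * (1 - x)^(n - 1), which gives the closed form
    n = -k / sum(log(1 - m_i)) for k observed minima m_i.
    """
    log_sum = sum(math.log1p(-m) for m in null_min_pvalues)
    return -len(null_min_pvalues) / log_sum


def adjusted_pvalue(nominal_p, n_eff):
    """Beta(1, n_eff) CDF at nominal_p, i.e. 1 - (1 - p)^n_eff."""
    if nominal_p >= 1.0:
        return 1.0
    return -math.expm1(n_eff * math.log1p(-nominal_p))


# Simulated null (assumed): each permutation yields the smallest of 50
# independent uniform P-values, so the fitted n should be close to 50.
rng = random.Random(0)
null_mins = [min(rng.random() for _ in range(50)) for _ in range(200)]
n_eff = fit_effective_tests(null_mins)
print(n_eff, adjusted_pvalue(1e-4, n_eff))
```

Because the fitted distribution has a closed-form CDF, an adjusted P-value can be read off for an arbitrarily small nominal P-value, whereas direct permutation cannot resolve anything below one over the number of permutations.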
Subject(s)
Quantitative Trait Loci , Genomics , Phenotype , Software , Statistical Distributions
ABSTRACT
Many existing cohorts contain a range of relatedness between genotyped individuals, either by design or by chance. Haplotype estimation in such cohorts is a central step in many downstream analyses. Using genotypes from six cohorts from isolated populations and two cohorts from non-isolated populations, we have investigated the performance of different phasing methods designed for nominally 'unrelated' individuals. We find that SHAPEIT2 produces much lower switch error rates in all cohorts compared to other methods, including those designed specifically for isolated populations. In particular, when large amounts of IBD sharing is present, SHAPEIT2 infers close to perfect haplotypes. Based on these results we have developed a general strategy for phasing cohorts with any level of implicit or explicit relatedness between individuals. First SHAPEIT2 is run ignoring all explicit family information. We then apply a novel HMM method (duoHMM) to combine the SHAPEIT2 haplotypes with any family information to infer the inheritance pattern of each meiosis at all sites across each chromosome. This allows the correction of switch errors, detection of recombination events and genotyping errors. We show that the method detects numbers of recombination events that align very well with expectations based on genetic maps, and that it infers far fewer spurious recombination events than Merlin. The method can also detect genotyping errors and infer recombination events in otherwise uninformative families, such as trios and duos. The detected recombination events can be used in association scans for recombination phenotypes. The method provides a simple and unified approach to haplotype estimation, that will be of interest to researchers in the fields of human, animal and plant genetics.
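The switch error rate used to compare phasing methods is commonly computed as the fraction of consecutive heterozygous site pairs whose relative phase differs between the estimated and the true haplotypes. A minimal Python sketch, with hypothetical function and variable names and toy data, could look like this:

```python
def switch_error_rate(true_hap, est_hap, het_sites):
    """Fraction of consecutive heterozygous site pairs with a phase switch.

    true_hap, est_hap: one haplotype (0/1 alleles) of the same individual.
    het_sites: indices of that individual's heterozygous sites, in order.
    """
    # At a heterozygous site the estimate either agrees with the true
    # haplotype or carries the allele of the other haplotype.
    agree = [true_hap[i] == est_hap[i] for i in het_sites]
    # A switch occurs whenever agreement flips between neighbouring sites.
    switches = sum(1 for a, b in zip(agree, agree[1:]) if a != b)
    opportunities = len(het_sites) - 1
    return switches / opportunities if opportunities > 0 else 0.0


# Toy example (assumed data): the estimate flips phase once, after the second site.
truth = [0, 1, 0, 1, 0, 1]
estimate = [0, 1, 1, 0, 1, 0]
print(switch_error_rate(truth, estimate, het_sites=range(6)))
```

An estimate that is the exact complement of the true haplotype scores zero switches, since swapping the two haplotype labels does not change the phasing.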
Subject(s)
Haplotypes/genetics , Chromosome Mapping/methods , Cohort Effect , Family , Genotype , Humans , Genetic Models , Pedigree , Phenotype , Genetic Recombination/genetics
ABSTRACT
High-throughput sequencing technologies produce short sequence reads that can contain phase information if they span two or more heterozygote genotypes. This information is not routinely used by current methods that infer haplotypes from genotype data. We have extended the SHAPEIT2 method to use phase-informative sequencing reads to improve phasing accuracy. Our model incorporates the read information in a probabilistic model through base quality scores within each read. The method is primarily designed for high-coverage sequence data or data sets that already have genotypes called. One important application is phasing of single samples sequenced at high coverage for use in medical sequencing and studies of rare diseases. Our method can also use existing panels of reference haplotypes. We tested the method by using a mother-father-child trio sequenced at high-coverage by Illumina together with the low-coverage sequence data from the 1000 Genomes Project (1000GP). We found that use of phase-informative reads increases the mean distance between switch errors by 22% from 274.4 kb to 328.6 kb. We also used male chromosome X haplotypes from the 1000GP samples to simulate sequencing reads with varying insert size, read length, and base error rate. When using short 100 bp paired-end reads, we found that using mixtures of insert sizes produced the best results. When using longer reads with high error rates (5-20 kb read with 4%-15% error per base), phasing performance was substantially improved.
Subject(s)
Human Genome , Haplotypes/genetics , DNA Sequence Analysis/methods , Child , Fathers , Female , Genotype , Humans , Male , Genetic Models , Mothers , Single Nucleotide Polymorphism
ABSTRACT
Multiple genome-wide association studies (GWAS) have been performed in HIV-1 infected individuals, identifying common genetic influences on viral control and disease course. Similarly, common genetic correlates of acquisition of HIV-1 after exposure have been interrogated using GWAS, although in generally small samples. Under the auspices of the International Collaboration for the Genomics of HIV, we have combined the genome-wide single nucleotide polymorphism (SNP) data collected by 25 cohorts, studies, or institutions on HIV-1 infected individuals and compared them to carefully matched population-level data sets (a list of all collaborators appears in Note S1 in Text S1). After imputation using the 1,000 Genomes Project reference panel, we tested approximately 8 million common DNA variants (SNPs and indels) for association with HIV-1 acquisition in 6,334 infected patients and 7,247 population samples of European ancestry. Initial association testing identified the SNP rs4418214, the C allele of which is known to tag the HLA-B*57:01 and B*27:05 alleles, as genome-wide significant (p = 3.6 × 10⁻¹¹). However, restricting analysis to individuals with a known date of seroconversion suggested that this association was due to the frailty bias in studies of lethal diseases. Further analyses including testing recessive genetic models, testing for bulk effects of non-genome-wide significant variants, stratifying by sexual or parenteral transmission risk and testing previously reported associations showed no evidence for genetic influence on HIV-1 acquisition (with the exception of CCR5Δ32 homozygosity). Thus, these data suggest that genetic influences on HIV acquisition are either rare or have smaller effects than can be detected by this sample size.
Subject(s)
HIV Infections/genetics , HIV-1/physiology , Host-Pathogen Interactions , Single Nucleotide Polymorphism , Case-Control Studies , Cohort Studies , Genetic Predisposition to Disease , Genome-Wide Association Study , HIV Infections/virology , Humans , White People
ABSTRACT
Past genome-wide association studies (GWAS) involving individuals with AIDS have mainly identified associations in the HLA region. Using the latest software, we imputed 7 million single-nucleotide polymorphisms (SNPs)/indels of the 1000 Genomes Project from the GWAS-determined genotypes of individuals in the Genomics of Resistance to Immunodeficiency Virus AIDS nonprogression cohort and compared them with those of control cohorts. The strongest signals were in MICA, the gene encoding major histocompatibility class I polypeptide-related sequence A (P = 3.31 × 10⁻¹²), with a particular exonic deletion (P = 1.59 × 10⁻⁸) in full linkage disequilibrium with the reference HCP5 rs2395029 SNP. Haplotype analysis also revealed an additive effect between HLA-C, HLA-B, and MICA variants. These data suggest a role for MICA in progression and elite control of human immunodeficiency virus type 1 infection.
Subject(s)
Disease Resistance , HIV Infections/immunology , HIV-1/immunology , Histocompatibility Antigens Class I/genetics , Adult , Cohort Studies , Female , Genetic Association Studies , HIV Infections/virology , Haplotypes , Humans , Linkage Disequilibrium , Major Histocompatibility Complex/genetics , Male , Middle Aged , Single Nucleotide Polymorphism , Long Noncoding RNA , Untranslated RNA , Young Adult
ABSTRACT
Human-disease etiology can be better understood with phase information about diploid sequences. We present a method for estimating haplotypes, using genotype data from unrelated samples or small nuclear families, that leads to improved accuracy and speed compared to several widely used methods. The method, segmented haplotype estimation and imputation tool (SHAPEIT), scales linearly with the number of haplotypes used in each iteration and can be run efficiently on whole chromosomes.