Results 1 - 20 of 97
1.
Am J Hum Genet ; 111(2): 295-308, 2024 Feb 01.
Article in English | MEDLINE | ID: mdl-38232728

ABSTRACT

Infectious agents contribute significantly to the global burden of diseases through both acute infection and their chronic sequelae. We leveraged the UK Biobank to identify genetic loci that influence humoral immune response to multiple infections. From 45 genome-wide association studies in 9,611 participants from UK Biobank, we identified NFKB1 as a locus associated with quantitative antibody responses to multiple pathogens, including those from the herpes, retro-, and polyoma-virus families. An insertion-deletion variant thought to affect NFKB1 expression (rs28362491) was mapped as the likely causal variant and could play a key role in regulation of the immune response. Using 121 infection- and inflammation-related traits in 487,297 UK Biobank participants, we show that the deletion allele was associated with an increased risk of infection from diverse pathogens but had a protective effect against allergic disease. We propose that altered expression of NFKB1, as a result of the deletion, modulates hematopoietic pathways and likely impacts cell survival, antibody production, and inflammation. Taken together, we show that disruptions to the tightly regulated immune processes may tip the balance between exacerbated immune responses and allergy, or increased risk of infection and impaired resolution of inflammation.


Subjects
Genetic Predisposition to Disease, Hypersensitivity, Inflammation, Humans, Genome-Wide Association Study, Hypersensitivity/genetics, Inflammation/genetics, NF-kappa B p50 Subunit/genetics, UK Biobank
2.
PLoS Biol ; 20(5): e3001669, 2022 05.
Article in English | MEDLINE | ID: mdl-35639797

ABSTRACT

The field of population genomics has grown rapidly in response to the recent advent of affordable, large-scale sequencing technologies. As opposed to the situation during the majority of the 20th century, in which the development of theoretical and statistical population genetic insights outpaced the generation of data to which they could be applied, genomic data are now being produced at a far greater rate than they can be meaningfully analyzed and interpreted. With this wealth of data has come a tendency to focus on fitting specific (and often rather idiosyncratic) models to data, at the expense of a careful exploration of the range of possible underlying evolutionary processes. For example, the approach of directly investigating models of adaptive evolution in each newly sequenced population or species often neglects the fact that a thorough characterization of ubiquitous nonadaptive processes is a prerequisite for accurate inference. We here describe the perils of these tendencies, present our consensus views on current best practices in population genomic data analysis, and highlight areas of statistical inference and theory that are in need of further attention. Thereby, we argue for the importance of defining a biologically relevant baseline model tuned to the details of each new analysis, of skepticism and scrutiny in interpreting model fitting results, and of carefully defining addressable hypotheses and underlying uncertainties.


Subjects
Genomics, Metagenomics, Genomics/methods
3.
Nature ; 562(7726): 203-209, 2018 10.
Article in English | MEDLINE | ID: mdl-30305743

ABSTRACT

The UK Biobank project is a prospective cohort study with deep genetic and phenotypic data collected on approximately 500,000 individuals from across the United Kingdom, aged between 40 and 69 at recruitment. The open resource is unique in its size and scope. A rich variety of phenotypic and health-related information is available on each participant, including biological measurements, lifestyle indicators, biomarkers in blood and urine, and imaging of the body and brain. Follow-up information is provided by linking health and medical records. Genome-wide genotype data have been collected on all participants, providing many opportunities for the discovery of new genetic associations and the genetic bases of complex traits. Here we describe the centralized analysis of the genetic data, including genotype quality, properties of population structure and relatedness of the genetic data, and efficient phasing and genotype imputation that increases the number of testable variants to around 96 million. Classical allelic variation at 11 human leukocyte antigen genes was imputed, resulting in the recovery of signals with known associations between human leukocyte antigen alleles and many diseases.


Subjects
Factual Databases, Genomics, Phenotype, Adult, Aged, Alleles, Biomarkers/blood, Biomarkers/urine, Body Height/genetics, Brain/diagnostic imaging, Cohort Studies, Genetic Databases, Electronic Health Records, Family, Female, Genome-Wide Association Study, Haplotypes/genetics, Humans, Life Style, Major Histocompatibility Complex/genetics, Male, Middle Aged, Quality Control, Racial Groups/genetics, United Kingdom
4.
PLoS Genet ; 17(8): e1009723, 2021 08.
Article in English | MEDLINE | ID: mdl-34437535

ABSTRACT

Inherited genetic variation contributes to individual risk for many complex diseases and is increasingly being used for predictive patient stratification. Previous work has shown that genetic factors are not equally relevant to human traits across age and other contexts, though the reasons for such variation are not clear. Here, we introduce methods to infer the form of the longitudinal relationship between genetic relative risk for disease and age and to test whether all genetic risk factors behave similarly. We use a proportional hazards model within an interval-based censoring methodology to estimate age-varying individual variant contributions to genetic relative risk for 24 common diseases within the British ancestry subset of UK Biobank, applying a Bayesian clustering approach to group variants by their relative risk profile over age and permutation tests for age dependency and multiplicity of profiles. We find evidence for age-varying relative risk profiles in nine diseases, including hypertension, skin cancer, atherosclerotic heart disease, hypothyroidism and calculus of gallbladder, several of which show evidence, albeit weak, for multiple distinct profiles of genetic relative risk. The predominant pattern shows genetic risk factors having the greatest relative impact on risk of early disease, with a monotonic decrease over time, at least for the majority of variants, although the magnitude and form of the decrease varies among diseases. As a consequence, for diseases where genetic relative risk decreases over age, genetic risk factors have stronger explanatory power among younger populations, compared to older ones. We show that these patterns cannot be explained by a simple model involving the presence of unobserved covariates such as environmental factors. We discuss possible models that can explain our observations and the implications for genetic risk prediction.


Subjects
Age Factors, Disease/genetics, Bayes Theorem, Humans, Statistical Models, Proportional Hazards Models, Risk Factors
5.
Genome Res ; 30(8): 1154-1169, 2020 08.
Article in English | MEDLINE | ID: mdl-32817236

ABSTRACT

The characterization of de novo mutations in regions of high sequence and structural diversity from whole-genome sequencing data remains highly challenging. Complex structural variants tend to arise in regions of high repetitiveness and low complexity, challenging both de novo assembly, in which short reads do not capture the long-range context required for resolution, and mapping approaches, in which improper alignment of reads to a reference genome that is highly diverged from that of the sample can lead to false or partial calls. Long-read technologies can potentially solve such problems but are currently unfeasible to use at scale. Here we present Corticall, a graph-based method that combines the advantages of multiple technologies and prior data sources to detect arbitrary classes of genetic variant. We construct multisample, colored de Bruijn graphs from short-read data for all samples, align long-read-derived haplotypes and multiple reference data sources to restore graph connectivity information, and call variants using graph path-finding algorithms and a model for simultaneous alignment and recombination. We validate and evaluate the approach using extensive simulations and use it to characterize the rate and spectrum of de novo mutation events in 119 progeny from four Plasmodium falciparum experimental crosses, using long-read data on the parents to inform reconstructions of the progeny and to detect several known and novel nonallelic homologous recombination events.


Subjects
Protozoan Genome/genetics, High-Throughput Nucleotide Sequencing/methods, Mutation/genetics, Plasmodium falciparum/genetics, Whole Genome Sequencing/methods, Algorithms, Base Sequence, Genetic Variation/genetics, Sequence Alignment, DNA Sequence Analysis/methods, Software
6.
PLoS Biol ; 18(1): e3000586, 2020 01.
Article in English | MEDLINE | ID: mdl-31951611

ABSTRACT

The origin and fate of new mutations within species is the fundamental process underlying evolution. However, while much attention has been focused on characterizing the presence, frequency, and phenotypic impact of genetic variation, the evolutionary histories of most variants are largely unexplored. We have developed a nonparametric approach for estimating the date of origin of genetic variants in large-scale sequencing data sets. The accuracy and robustness of the approach are demonstrated through simulation. Using data from two publicly available human genomic diversity resources, we estimated the age of more than 45 million single-nucleotide polymorphisms (SNPs) in the human genome and release the Atlas of Variant Age as a public online database. We characterize the relationship between variant age and frequency in different geographical regions and demonstrate the value of age information in interpreting variants of functional and selective importance. Finally, we use allele age estimates to power a rapid approach for inferring the ancestry shared between individual genomes and to quantify genealogical relationships at different points in the past, as well as to describe and explore the evolutionary history of modern human populations.


Subjects
Genetic Speciation, Population Genetics/methods, Single Nucleotide Polymorphism, Racial Groups/genetics, Age Factors, Alleles, Computer Simulation, Datasets as Topic, Molecular Evolution, Gene Frequency, Genetic Variation, Human Genome, High-Throughput Nucleotide Sequencing, Humans, Pedigree, Phylogeny, DNA Sequence Analysis, Statistics as Topic/methods, Time Factors
7.
PLoS Genet ; 16(5): e1008619, 2020 05.
Article in English | MEDLINE | ID: mdl-32369493

ABSTRACT

Coalescent simulations are widely used to examine the effects of evolution and demographic history on the genetic makeup of populations. Thanks to recent progress in algorithms and data structures, simulators such as the widely-used msprime now provide genome-wide simulations for millions of individuals. However, this software relies on classic coalescent theory and its assumptions that sample sizes are small and that the region being simulated is short. Here we show that coalescent simulations of long regions of the genome exhibit large biases in identity-by-descent (IBD), long-range linkage disequilibrium (LD), and ancestry patterns, particularly when the sample size is large. We present a Wright-Fisher extension to msprime, and show that it produces more realistic distributions of IBD, LD, and ancestry proportions, while also addressing more subtle biases of the coalescent. Further, these extensions are more computationally efficient than state-of-the-art coalescent simulations when simulating long regions, including whole-genome data. For shorter regions, efficiency can be maintained via a hybrid model which simulates the recent past under the Wright-Fisher model and uses coalescent simulations in the distant past.


Subjects
Algorithms, Base Sequence/physiology, Population Genetics/methods, Genome-Wide Association Study/methods, Genetic Models, Cohort Studies, Computer Simulation, Molecular Evolution, Genome/genetics, Genome-Wide Association Study/statistics & numerical data, Humans, Linkage Disequilibrium, Genetic Recombination/physiology, Sample Size
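
The contrast drawn in record 7 between classic coalescent theory and a Wright-Fisher model can be illustrated with a toy example. The sketch below is not msprime code and is not the authors' implementation: it assumes a single non-recombining locus in a haploid population of constant size, and the function names are hypothetical. It traces lineages backward one generation at a time, so that several lineages may pick the same parent in one generation, and compares the result with the standard coalescent expectation of 2N(1 - 1/n) generations for the time to the most recent common ancestor.

```python
import random


def wright_fisher_tmrca(n_samples, pop_size, rng):
    """Generations until n_samples lineages find a common ancestor.

    Hypothetical helper: backward-in-time Wright-Fisher model for one
    non-recombining locus in a haploid population of constant size.
    """
    if n_samples < 1 or pop_size < 1:
        raise ValueError("n_samples and pop_size must be positive")
    lineages = n_samples
    generations = 0
    while lineages > 1:
        # Every lineage picks its parent uniformly at random. Lineages
        # that pick the same parent merge, and more than two may merge
        # in a single generation, which the coalescent does not allow.
        parents = {rng.randrange(pop_size) for _ in range(lineages)}
        lineages = len(parents)
        generations += 1
    return generations


def coalescent_expected_tmrca(n_samples, pop_size):
    """Expected TMRCA in generations under the standard coalescent."""
    return 2.0 * pop_size * (1.0 - 1.0 / n_samples)


if __name__ == "__main__":
    rng = random.Random(1)
    n, big_n, reps = 50, 100, 2000
    mean_wf = sum(wright_fisher_tmrca(n, big_n, rng) for _ in range(reps)) / reps
    print("Wright-Fisher mean TMRCA:", mean_wf)
    print("Coalescent expectation:  ", coalescent_expected_tmrca(n, big_n))
```

The sample size here is deliberately large relative to the population size, the regime in which the abstract reports that coalescent assumptions break down. Recombination, which drives the IBD and LD biases described in the abstract, is outside the scope of this sketch.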
8.
PLoS Comput Biol ; 17(8): e1009287, 2021 08.
Article in English | MEDLINE | ID: mdl-34411093

ABSTRACT

There is an abundance of malaria genetic data being collected from the field, yet using these data to understand the drivers of regional epidemiology remains a challenge. A key issue is the lack of models that relate parasite genetic diversity to epidemiological parameters. Classical models in population genetics characterize changes in genetic diversity in relation to demographic parameters, but fail to account for the unique features of the malaria life cycle. In contrast, epidemiological models, such as the Ross-Macdonald model, capture malaria transmission dynamics but do not consider genetics. Here, we have developed an integrated model encompassing both parasite evolution and regional epidemiology. We achieve this by combining the Ross-Macdonald model with an intra-host continuous-time Moran model, thus explicitly representing the evolution of individual parasite genomes in a traditional epidemiological framework. Implemented as a stochastic simulation, we use the model to explore relationships between measures of parasite genetic diversity and parasite prevalence, a widely-used metric of transmission intensity. First, we explore how varying parasite prevalence influences genetic diversity at equilibrium. We find that multiple genetic diversity statistics are correlated with prevalence, but the strength of the relationships depends on whether variation in prevalence is driven by host- or vector-related factors. Next, we assess the responsiveness of a variety of statistics to malaria control interventions, finding that those related to mixed infections respond quickly (∼months) whereas other statistics, such as nucleotide diversity, may take decades to respond. These findings provide insights into the opportunities and challenges associated with using genetic data to monitor malaria epidemiology.


Subjects
Genetic Variation, Falciparum Malaria/epidemiology, Plasmodium falciparum/pathogenicity, Animals, Humans, Falciparum Malaria/parasitology, Falciparum Malaria/transmission, Theoretical Models, Plasmodium falciparum/genetics, Prevalence
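
Record 8 builds on the Ross-Macdonald model of malaria transmission. As a rough illustration of that epidemiological component only, the sketch below integrates a common textbook form of the model with a fixed-step Euler scheme. It is an assumption-laden simplification rather than the authors' stochastic simulation: it omits incubation delays and the intra-host Moran model entirely, and the parameter names (m, a, b, c, r, g) and values are illustrative. Here x is the fraction of infected hosts (parasite prevalence) and z the fraction of infectious vectors.

```python
def ross_macdonald(m, a, b, c, r, g, x0=0.1, z0=0.0, days=2000.0, dt=0.1):
    """Euler integration of a simplified Ross-Macdonald model.

    m: vectors per host, a: bites per vector per day,
    b: vector-to-host transmission probability per bite,
    c: host-to-vector transmission probability per bite,
    r: host recovery rate per day, g: vector death rate per day.
    Returns (host prevalence, infectious vector fraction) at the end.
    """
    x, z = x0, z0
    for _ in range(int(round(days / dt))):
        dx = m * a * b * z * (1.0 - x) - r * x  # new host infections minus recoveries
        dz = a * c * x * (1.0 - z) - g * z      # new vector infections minus deaths
        x += dt * dx
        z += dt * dz
    return x, z


def basic_reproduction_number(m, a, b, c, r, g):
    """R0 for the simplified model above."""
    return m * a * a * b * c / (r * g)


def equilibrium_prevalence(m, a, b, c, r, g):
    """Analytic equilibrium host prevalence (zero when R0 <= 1)."""
    r0 = basic_reproduction_number(m, a, b, c, r, g)
    if r0 <= 1.0:
        return 0.0
    return (r0 - 1.0) / (r0 + a * c / g)


if __name__ == "__main__":
    params = dict(m=10.0, a=0.3, b=0.5, c=0.5, r=0.01, g=0.1)
    x, z = ross_macdonald(**params)
    print("simulated prevalence:", x)
    print("analytic prevalence: ", equilibrium_prevalence(**params))
```

In this form, prevalence can be varied through a host-related parameter (such as r) or a vector-related one (such as m or g), which is the distinction the abstract draws when relating prevalence to genetic diversity.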
9.
Nature ; 526(7571): 68-74, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26432245

ABSTRACT

The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.


Subjects
Genetic Variation/genetics, Population Genetics/standards, Human Genome/genetics, Genomics/standards, Internationality, Datasets as Topic, Demography, Disease Susceptibility, Exome/genetics, Medical Genetics, Genome-Wide Association Study, Genotype, Haplotypes/genetics, High-Throughput Nucleotide Sequencing, Humans, INDEL Mutation/genetics, Physical Chromosome Mapping, Single Nucleotide Polymorphism/genetics, Quantitative Trait Loci/genetics, Rare Diseases/genetics, Reference Standards, DNA Sequence Analysis
10.
Genome Res ; 27(1): 157-164, 2017 01.
Article in English | MEDLINE | ID: mdl-27903644

ABSTRACT

Improvement of variant calling in next-generation sequence data requires a comprehensive, genome-wide catalog of high-confidence variants called in a set of genomes for use as a benchmark. We generated deep, whole-genome sequence data of 17 individuals in a three-generation pedigree and called variants in each genome using a range of currently available algorithms. We used haplotype transmission information to create a phased "Platinum" variant catalog of 4.7 million single-nucleotide variants (SNVs) plus 0.7 million small (1-50 bp) insertions and deletions (indels) that are consistent with the pattern of inheritance in the parents and 11 children of this pedigree. Platinum genotypes are highly concordant with the current catalog of the National Institute of Standards and Technology for both SNVs (>99.99%) and indels (99.92%) and add a validated truth catalog that has 26% more SNVs and 45% more indels. Analysis of 334,652 SNVs that were consistent between informatics pipelines yet inconsistent with haplotype transmission ("nonplatinum") revealed that the majority of these variants are de novo and cell-line mutations or reside within previously unidentified duplications and deletions. The reference materials from this study are a resource for objective assessment of the accuracy of variant calls throughout genomes.


Subjects
Human Genome/genetics, Genomics, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Algorithms, Genetic Databases, Exome/genetics, Genotype, Humans, INDEL Mutation/genetics, Pedigree, Single Nucleotide Polymorphism, Software
11.
Bioinformatics ; 35(21): 4394-4396, 2019 11 01.
Article in English | MEDLINE | ID: mdl-30942877

ABSTRACT

SUMMARY: HLA*LA implements a new graph alignment model for human leukocyte antigen (HLA) type inference, based on the projection of linear alignments onto a variation graph. It enables accurate HLA type inference from whole-genome (99% accuracy) and whole-exome (93% accuracy) Illumina data; from long-read Oxford Nanopore and Pacific Biosciences data (98% accuracy for whole-genome and targeted data) and from genome assemblies. Computational requirements for a typical sample vary between 0.7 and 14 CPU hours per sample. AVAILABILITY AND IMPLEMENTATION: HLA*LA is implemented in C++ and Perl and freely available as a bioconda package or from https://github.com/DiltheyLab/HLA-LA (GPL v3). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
High-Throughput Nucleotide Sequencing, Software, Genome, Histocompatibility Testing, Humans, DNA Sequence Analysis
12.
J Infect Dis ; 220(11): 1738-1749, 2019 10 22.
Article in English | MEDLINE | ID: mdl-30668735

ABSTRACT

The Horn of Africa harbors the largest reservoir of Plasmodium vivax in the continent. Most of sub-Saharan Africa has remained relatively vivax-free due to a high prevalence of the human Duffy-negative trait, but the emergence of strains able to invade Duffy-negative reticulocytes poses a major public health threat. We undertook the first population genomic investigation of P. vivax from the region, comparing the genomes of 24 Ethiopian isolates against data from Southeast Asia to identify important local adaptations. The prevalence of the Duffy binding protein amplification in Ethiopia was 79%, potentially reflecting adaptation to Duffy negativity. There was also evidence of selection in a region upstream of the chloroquine resistance transporter, a putative chloroquine-resistance determinant. Strong signals of selection were observed in genes involved in immune evasion and regulation of gene expression, highlighting the need for a multifaceted intervention approach to combat P. vivax in the region.


Subjects
Genotype, Vivax Malaria/parasitology, Plasmodium vivax/genetics, Plasmodium vivax/isolation & purification, Genetic Selection, Biological Adaptation, Adolescent, Animals, Child, Preschool Child, Ethiopia, Female, Humans, Infant, Newborn Infant, Male, Plasmodium vivax/classification, Prevalence
13.
Hum Mol Genet ; 26(20): 3869-3882, 2017 10 15.
Article in English | MEDLINE | ID: mdl-29016847

ABSTRACT

The discovery of genetic variants influencing sleep patterns can shed light on the physiological processes underlying sleep. As part of a large clinical sequencing project, WGS500, we sequenced a family in which the two male children had severe developmental delay and a dramatically disturbed sleep-wake cycle, with very long wake and sleep durations, reaching up to 106-h awake and 48-h asleep. The most likely causal variant identified was a novel missense variant in the X-linked GRIA3 gene, which has been implicated in intellectual disability. GRIA3 encodes GluA3, a subunit of AMPA-type ionotropic glutamate receptors (AMPARs). The mutation (A653T) falls within the highly conserved transmembrane domain of the ion channel gate, immediately adjacent to the analogous residue in the Grid2 (glutamate receptor) gene, which is mutated in the mouse neurobehavioral mutant, Lurcher. In vitro, the GRIA3(A653T) mutation stabilizes the channel in a closed conformation, in contrast to Lurcher. We introduced the orthologous mutation into a mouse strain by CRISPR-Cas9 mutagenesis and found that hemizygous mutants displayed significant differences in the structure of their activity and sleep compared to wild-type littermates. Typically, mice are polyphasic, exhibiting multiple bouts of sleep several minutes long within a 24-h period. The Gria3A653T mouse showed significantly fewer brief bouts of activity and sleep than the wild-types. Furthermore, Gria3A653T mice showed enhanced period lengthening under constant light compared to wild-type mice, suggesting an increased sensitivity to light. Our results suggest a role for GluA3 channel activity in the regulation of sleep behavior in both mice and humans.


Subjects
Intellectual Disability/genetics, Point Mutation, AMPA Receptors/genetics, AMPA Receptors/metabolism, Sleep Wake Disorders/genetics, Adult, Amino Acid Sequence, Animals, Base Sequence, Animal Disease Models, Humans, Male, Mice, Inbred C57BL Mice
14.
Genome Res ; 26(9): 1288-99, 2016 09.
Article in English | MEDLINE | ID: mdl-27531718

ABSTRACT

The malaria parasite Plasmodium falciparum has a great capacity for evolutionary adaptation to evade host immunity and develop drug resistance. Current understanding of parasite evolution is impeded by the fact that a large fraction of the genome is either highly repetitive or highly variable and thus difficult to analyze using short-read sequencing technologies. Here, we describe a resource of deep sequencing data on parents and progeny from genetic crosses, which has enabled us to perform the first genome-wide, integrated analysis of SNP, indel and complex polymorphisms, using Mendelian error rates as an indicator of genotypic accuracy. These data reveal that indels are exceptionally abundant, being more common than SNPs and thus the dominant mode of polymorphism within the core genome. We use the high density of SNP and indel markers to analyze patterns of meiotic recombination, confirming a high rate of crossover events and providing the first estimates for the rate of non-crossover events and the length of conversion tracts. We observe several instances of meiotic recombination within copy number variants associated with drug resistance, demonstrating a mechanism whereby fitness costs associated with resistance mutations could be compensated and greater phenotypic plasticity could be acquired.


Subjects
Drug Resistance/genetics, Genetic Variation, Falciparum Malaria/genetics, Plasmodium falciparum/genetics, Chromosome Mapping, DNA Copy Number Variations/genetics, Protozoan Genome/genetics, High-Throughput Nucleotide Sequencing, Humans, INDEL Mutation, Falciparum Malaria/drug therapy, Falciparum Malaria/parasitology, Meiosis/genetics, Plasmodium falciparum/drug effects, Plasmodium falciparum/pathogenicity, Single Nucleotide Polymorphism, Genetic Recombination/genetics
15.
Bioinformatics ; 34(1): 9-15, 2018 01 01.
Article in English | MEDLINE | ID: mdl-28961721

ABSTRACT

Motivation: The presence of multiple infecting strains of the malarial parasite Plasmodium falciparum affects key phenotypic traits, including drug resistance and risk of severe disease. Advances in protocols and sequencing technology have made it possible to obtain high-coverage genome-wide sequencing data from blood samples and blood spots taken in the field. However, analyzing and interpreting such data is challenging because of the high rate of multiple infections present. Results: We have developed a statistical method and implementation for deconvolving multiple genome sequences present in an individual with mixed infections. The software package DEploid uses haplotype structure within a reference panel of clonal isolates as a prior for haplotypes present in a given sample. It estimates the number of strains, their relative proportions and the haplotypes presented in a sample, allowing researchers to study multiple infection in malaria with an unprecedented level of detail. Availability and implementation: The open source implementation DEploid is freely available at https://github.com/mcveanlab/DEploid under the conditions of the GPLv3 license. An R version is available at https://github.com/mcveanlab/DEploid-r. Contact: joe.zhu@bdi.ox.ac.uk or gil.mcvean@bdi.ox.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
High-Throughput Nucleotide Sequencing/methods, Falciparum Malaria/genetics, Plasmodium falciparum/genetics, DNA Sequence Analysis/methods, Software, Coinfection, Haplotypes, Humans, Plasmodium falciparum/pathogenicity
16.
Bioinformatics ; 34(15): 2556-2565, 2018 08 01.
Article in English | MEDLINE | ID: mdl-29554215

ABSTRACT

Motivation: The de Bruijn graph is a simple and efficient data structure that is used in many areas of sequence analysis including genome assembly, read error correction and variant calling. The data structure has a single parameter k, is straightforward to implement and is tractable for large genomes with high sequencing depth. It also enables representation of multiple samples simultaneously to facilitate comparison. However, unlike the string graph, a de Bruijn graph does not retain long range information that is inherent in the read data. For this reason, applications that rely on de Bruijn graphs can produce sub-optimal results given their input data. Results: We present a novel assembly graph data structure: the Linked de Bruijn Graph (LdBG). Constructed by adding annotations on top of a de Bruijn graph, it stores long range connectivity information through the graph. We show that with error-free data it is possible to losslessly store and recover sequence from a Linked de Bruijn graph. With assembly simulations we demonstrate that the LdBG data structure outperforms both our de Bruijn graph and the String Graph Assembler (SGA). Finally we apply the LdBG to Klebsiella pneumoniae short read data to make large (12 kbp) variant calls, which we validate using PacBio sequencing data, and to characterize the genomic context of drug-resistance genes. Availability and implementation: Linked de Bruijn Graphs and associated algorithms are implemented as part of McCortex, which is available under the MIT license at https://github.com/mcveanlab/mccortex. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Data Visualization, Genomics/methods, High-Throughput Nucleotide Sequencing/methods, DNA Sequence Analysis/methods, Software, Algorithms, Humans, Klebsiella pneumoniae/genetics
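
The limitation that record 16 describes, a plain de Bruijn graph losing long-range read information, can be seen in a few lines. The following is a minimal sketch with a hypothetical function name; it is not McCortex code and does not implement the link annotations of the LdBG. It builds a node-centric de Bruijn graph in which nodes are the k-mers observed in the reads and an edge joins two k-mers that overlap by k - 1 bases.

```python
def build_de_bruijn(reads, k):
    """Map each observed k-mer to the set of observed k-mers that follow it.

    Node-centric construction: an edge u -> v exists whenever the last
    k - 1 bases of u equal the first k - 1 bases of v, regardless of
    which read each k-mer came from.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    nodes = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            nodes.add(read[i:i + k])
    graph = {}
    for kmer in nodes:
        suffix = kmer[1:]
        graph[kmer] = {suffix + base for base in "ACGT" if suffix + base in nodes}
    return graph


if __name__ == "__main__":
    # Two toy reads that share the k-mer ACG.
    graph = build_de_bruijn(["AACGT", "TACGG"], k=3)
    for kmer in sorted(graph):
        print(kmer, "->", sorted(graph[kmer]))
```

In this toy input, both reads pass through ACG, so the graph records two successors for it (CGG and CGT) but no longer records that AAC was followed by CGT and TAC by CGG. That pairing is the kind of long-range connectivity the abstract says the Linked de Bruijn Graph stores as annotations on top of the graph.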
17.
PLoS Genet ; 12(7): e1006179, 2016 07.
Article in English | MEDLINE | ID: mdl-27415776

ABSTRACT

Meiotic crossover frequency varies extensively along chromosomes and is typically concentrated in hotspots. As recombination increases genetic diversity, hotspots are predicted to occur at immunity genes, where variation may be beneficial. A major component of plant immunity is recognition of pathogen Avirulence (Avr) effectors by resistance (R) genes that encode NBS-LRR domain proteins. Therefore, we sought to test whether NBS-LRR genes would overlap with meiotic crossover hotspots using experimental genetics in Arabidopsis thaliana. NBS-LRR genes tend to physically cluster in plant genomes; for example, in Arabidopsis most are located in large clusters on the south arms of chromosomes 1 and 5. We experimentally mapped 1,439 crossovers within these clusters and observed NBS-LRR gene associated hotspots, which were also detected as historical hotspots via analysis of linkage disequilibrium. However, we also observed NBS-LRR gene coldspots, which in some cases correlate with structural heterozygosity. To study recombination at the fine-scale we used high-throughput sequencing to analyze ~1,000 crossovers within the RESISTANCE TO ALBUGO CANDIDA1 (RAC1) R gene hotspot. This revealed elevated intragenic crossovers, overlapping nucleosome-occupied exons that encode the TIR, NBS and LRR domains. The highest RAC1 recombination frequency was promoter-proximal and overlapped CTT-repeat DNA sequence motifs, which have previously been associated with plant crossover hotspots. Additionally, we show a significant influence of natural genetic variation on NBS-LRR cluster recombination rates, using crosses between Arabidopsis ecotypes. In conclusion, we show that a subset of NBS-LRR genes are strong hotspots, whereas others are coldspots. This reveals a complex recombination landscape in Arabidopsis NBS-LRR genes, which we propose results from varying coevolutionary pressures exerted by host-pathogen relationships, and is influenced by structural heterozygosity.


Subjects
Arabidopsis/genetics, Disease Resistance/genetics, Genetic Recombination, Alleles, Arabidopsis Proteins/genetics, Genetic Crosses, Plant Genes, Genetic Variation, Heterozygote, Linkage Disequilibrium, Meiosis, Multigene Family, Nucleic Acid Hybridization, Plant Diseases/genetics, Pollen/metabolism
18.
Am J Hum Genet ; 97(4): 593-607, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26430804

ABSTRACT

Large population studies of immune system genes are essential for characterizing their role in diseases, including autoimmune conditions. Of key interest are a group of genes encoding the killer cell immunoglobulin-like receptors (KIRs), which have known and hypothesized roles in autoimmune diseases, resistance to viruses, reproductive conditions, and cancer. These genes are highly polymorphic, which makes typing expensive and time consuming. Consequently, despite their importance, KIRs have been little studied in large cohorts. Statistical imputation methods developed for other complex loci (e.g., human leukocyte antigen [HLA]) on the basis of SNP data provide an inexpensive high-throughput alternative to direct laboratory typing of these loci and have enabled important findings and insights for many diseases. We present KIR*IMP, a method for imputation of KIR copy number. We show that KIR*IMP is highly accurate and thus allows the study of KIRs in large cohorts and enables detailed investigation of the role of KIRs in human disease.


Subjects
Asthma/genetics, DNA Copy Number Variations/genetics, Atopic Dermatitis/genetics, Genetic Predisposition to Disease, Single Nucleotide Polymorphism/genetics, KIR Receptors/classification, KIR Receptors/genetics, Case-Control Studies, Cohort Studies, Europe, Family, Female, Genotype, High-Throughput Nucleotide Sequencing, Humans, Male, DNA Sequence Analysis
19.
PLoS Biol ; 13(7): e1002216, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26225775

ABSTRACT

The last few decades have utterly transformed genetics and genomics, but what might the next ten years bring? PLOS Biology asked eight leaders spanning a range of related areas to give us their predictions. Without exception, the predictions are for more data on a massive scale and of more diverse types. All are optimistic and predict enormous positive impact on scientific understanding, while a recurring theme is the benefit of such data for the transformation and personalization of medicine. Several also point out that the biggest changes will very likely be those that we don't foresee, even now.


Subjects
Genomics/trends, Forecasting
20.
Nature ; 491(7422): 56-65, 2012 Nov 01.
Article in English | MEDLINE | ID: mdl-23128226

ABSTRACT

By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.


Subjects
Genetic Variation/genetics, Population Genetics, Human Genome/genetics, Genomics, Alleles, Binding Sites/genetics, Conserved Sequence/genetics, Molecular Evolution, Medical Genetics, Genome-Wide Association Study, Haplotypes/genetics, Humans, Nucleotide Motifs, Single Nucleotide Polymorphism/genetics, Racial Groups/genetics, Sequence Deletion/genetics, Transcription Factors/metabolism