ABSTRACT
Although several hundred regions of the human genome harbor signals of positive natural selection, few of the relevant adaptive traits and variants have been elucidated. Using full-genome sequence variation from the 1000 Genomes (1000G) Project and the composite of multiple signals (CMS) test, we investigated 412 candidate signals and leveraged functional annotation, protein structure modeling, epigenetics, and association studies to identify and extensively annotate candidate causal variants. The resulting catalog provides a tractable list for experimental follow-up; it includes 35 high-scoring nonsynonymous variants, 59 variants associated with expression levels of a nearby coding gene or lincRNA, and numerous variants associated with susceptibility to infectious disease and other phenotypes. We experimentally characterized one candidate nonsynonymous variant in Toll-like receptor 5 (TLR5) and show that it leads to altered NF-κB signaling in response to bacterial flagellin.
Subjects
Genetic Techniques , Human Genome , Genome-Wide Association Study , Mutation , Animals , Bacteria/metabolism , Flagellin/metabolism , HapMap Project , Humans , NF-kappa B/metabolism , Quantitative Trait Loci , Transcriptional Regulatory Elements , Signal Transduction , Toll-Like Receptor 5/genetics , Toll-Like Receptor 5/metabolism
ABSTRACT
MOTIVATION: Efficient simulation of population genetic samples under a given demographic model is a prerequisite for many analyses. Coalescent theory provides an efficient framework for such simulations, but simulating longer regions and higher recombination rates remains challenging. Simulators based on a Markovian approximation to the coalescent scale well, but do not support simulation of selection. Gene conversion is not supported by any published coalescent simulators that support selection. RESULTS: We describe cosi2, an efficient simulator that supports both exact and approximate coalescent simulation with positive selection. cosi2 improves on the speed of existing exact simulators, and permits further speedup in approximate mode while retaining support for selection. cosi2 supports a wide range of demographic scenarios, including recombination hot spots, gene conversion, population size changes, population structure and migration. cosi2 implements coalescent machinery efficiently by tracking only a small subset of the Ancestral Recombination Graph, sampling only relevant recombination events, and using augmented skip lists to represent tracked genetic segments. To preserve support for selection in approximate mode, the Markov approximation is implemented not by moving along the chromosome but by performing a standard backwards-in-time coalescent simulation while restricting coalescence to node pairs with overlapping or near-overlapping genetic material. We describe the algorithms used by cosi2 and present comparisons with existing selection simulators. AVAILABILITY AND IMPLEMENTATION: A free C++ implementation of cosi2 is available at http://broadinstitute.org/mpg/cosi2.
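The approximate mode described above restricts coalescence to node pairs whose genetic material overlaps or nearly overlaps. A minimal sketch of that eligibility rule follows, assuming each node's ancestral material is a sorted list of non-overlapping half-open intervals on a chromosome scaled to [0, 1). cosi2 itself is written in C++ and stores segments in augmented skip lists; the plain Python lists, the function names (`min_gap`, `may_coalesce`, `eligible_pairs`), and the `max_gap` threshold here are hypothetical and only illustrate the idea.

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical representation of a node's ancestral material: sorted,
# non-overlapping half-open intervals [start, end) on a [0, 1) chromosome.
Segments = List[Tuple[float, float]]


def min_gap(a: Segments, b: Segments) -> float:
    """Smallest distance between any segment of `a` and any segment of `b`.

    Returns 0.0 as soon as two segments overlap. A merge-style sweep keeps
    this O(len(a) + len(b)).
    """
    i = j = 0
    best = float("inf")
    while i < len(a) and j < len(b):
        a_start, a_end = a[i]
        b_start, b_end = b[j]
        if a_start < b_end and b_start < a_end:
            return 0.0  # the two nodes share ancestral material
        gap = b_start - a_end if a_end <= b_start else a_start - b_end
        best = min(best, gap)
        # Advance whichever segment ends first; the other one may still be
        # closer to the next segment of the opposite list.
        if a_end <= b_end:
            i += 1
        else:
            j += 1
    return best


def may_coalesce(a: Segments, b: Segments, max_gap: float) -> bool:
    """Approximate-mode rule (hypothetical threshold): overlapping or near-overlapping only."""
    return min_gap(a, b) <= max_gap


def eligible_pairs(nodes: List[Segments], max_gap: float) -> List[Tuple[int, int]]:
    """Index pairs of nodes that the restricted coalescent would allow to merge."""
    return [
        (i, j)
        for i, j in combinations(range(len(nodes)), 2)
        if may_coalesce(nodes[i], nodes[j], max_gap)
    ]


node1 = [(0.00, 0.20), (0.50, 0.60)]
node2 = [(0.25, 0.40)]
node3 = [(0.90, 1.00)]
print(eligible_pairs([node1, node2, node3], max_gap=0.1))  # [(0, 1)]
```

In exact mode every pair of lineages carrying ancestral material would be eligible; the restriction above is what lets an approximate simulation skip coalescences between lineages whose material lies far apart while still running backwards in time, which is why selection can remain supported.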
Subjects
Algorithms , Population Genetics/methods , Genetic Selection , Chromosomes , Demography , Gene Conversion , Markov Chains , Genetic Models , Population Density , Genetic Recombination
ABSTRACT
The Plasmodium falciparum parasite's ability to adapt to environmental pressures, such as the human immune system and antimalarial drugs, makes malaria an enduring burden to public health. Understanding the genetic basis of these adaptations is critical to intervening successfully against malaria. To that end, we created a high-density genotyping array that assays over 17,000 single nucleotide polymorphisms (~1 SNP/kb), and applied it to 57 culture-adapted parasites from three continents. We characterized genome-wide genetic diversity within and between populations and identified numerous loci with signals of natural selection, suggesting their role in recent adaptation. In addition, we performed a genome-wide association study (GWAS), searching for loci correlated with resistance to thirteen antimalarials; we detected both known and novel resistance loci, including a new halofantrine resistance locus, PF10_0355. Through functional testing we demonstrated that PF10_0355 overexpression decreases sensitivity to halofantrine, mefloquine, and lumefantrine, but not to structurally unrelated antimalarials, and that increased gene copy number mediates resistance. Our GWAS and follow-on functional validation demonstrate the potential of genome-wide studies to elucidate functionally important loci in the malaria parasite genome.
Subjects
Antimalarials/pharmacology , Drug Resistance/genetics , Genetic Loci , Plasmodium falciparum/genetics , Ethanolamines/pharmacology , Fluorenes/pharmacology , Gene Dosage , Gene Expression , Genetic Association Studies , Genetic Variation , Genotype , Haplotypes , Linkage Disequilibrium , Lumefantrine , Falciparum Malaria/parasitology , Falciparum Malaria/prevention & control , Mefloquine/pharmacology , Phenanthrenes/pharmacology , Plasmodium falciparum/drug effects , Single Nucleotide Polymorphism , Genetic Selection
ABSTRACT
As an ancient disease with high fatality, cholera has likely exerted strong selective pressure on affected human populations. We performed a genome-wide study of natural selection in a population from the Ganges River Delta, the historic geographic epicenter of cholera. We identified 305 candidate selected regions using the composite of multiple signals (CMS) method. The regions were enriched for potassium channel genes involved in cyclic adenosine monophosphate-mediated chloride secretion and for components of the innate immune system involved in nuclear factor κB (NF-κB) signaling. We demonstrate that a number of these strongly selected genes are associated with cholera susceptibility in two separate cohorts. We further identify repeated examples of selection and association in an NF-κB/inflammasome-dependent pathway that is activated in vitro by Vibrio cholerae. Our findings shed light on the genetic basis of cholera resistance in a population from the Ganges River Delta and present a promising approach for identifying genetic factors influencing susceptibility to infectious diseases.
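The abstract does not say how enrichment of gene categories was assessed. One common way to ask whether a category (for example, potassium channel genes) is over-represented among genes inside candidate selected regions is a one-sided hypergeometric test. The sketch below assumes that test; the function name and the toy counts are hypothetical, not taken from the study, and a real analysis would also need to account for factors such as region size and clustering of related genes.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background genes considered
    K: background genes belonging to the category of interest
    n: genes falling inside candidate selected regions
    k: of those n, how many belong to the category
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # Python integers are exact, so the tail sum has no rounding error
    # until the final division.
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return tail / comb(N, n)


# Toy counts only: all 3 genes drawn fall in a 3-gene category out of 10.
print(hypergeom_enrichment_p(k=3, n=3, K=3, N=10))  # 1/120, about 0.00833
```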
Subjects
Cholera/genetics , Genetic Predisposition to Disease/genetics , Humans , Inflammasomes/metabolism , NF-kappa B/genetics , Rivers , Genetic Selection/genetics , Vibrio cholerae/pathogenicity
ABSTRACT
The human genome contains hundreds of regions whose patterns of genetic variation indicate recent positive natural selection, yet for most the underlying gene and the advantageous mutation remain unknown. We developed a method, composite of multiple signals (CMS), that combines tests for multiple signals of selection and increases resolution by up to 100-fold. By applying CMS to candidate regions from the International Haplotype Map, we localized population-specific selective signals to 55 kilobases (median), identifying known and novel causal variants. CMS can not only identify individual loci but also implicate the precise variants selected by evolution.
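The abstract says CMS combines several selection tests but does not give the formula. A minimal sketch, assuming each component test's score is turned into a posterior probability of selection via Bayes' rule (using likelihoods under selection and under neutrality, e.g. estimated from simulations) and that the composite is the product of these posteriors under an independence assumption, is shown below. The function names, the dictionary layout, and the toy likelihoods are hypothetical; the uniform prior of 1 / (number of candidate variants) is an illustrative choice.

```python
from typing import Dict, List, Sequence, Tuple


def cms_score(lik_sel: Sequence[float], lik_neut: Sequence[float], prior: float) -> float:
    """Composite score for one variant, assuming independent component tests.

    lik_sel[i]:  probability of test i's observed score if this variant is the selected one
    lik_neut[i]: probability of that score if the variant is not selected
    prior:       prior probability that the variant is the selected one
    """
    if len(lik_sel) != len(lik_neut):
        raise ValueError("need one (selected, neutral) likelihood pair per test")
    score = 1.0
    for ls, ln in zip(lik_sel, lik_neut):
        # Bayes' rule for a single test, then multiply across tests
        score *= (ls * prior) / (ls * prior + ln * (1.0 - prior))
    return score


def rank_variants(variants: Dict[str, Tuple[List[float], List[float]]], prior: float) -> List[str]:
    """Variant IDs sorted from highest to lowest composite score."""
    return sorted(variants, key=lambda v: cms_score(*variants[v], prior), reverse=True)


# Toy region with two candidate variants, so a uniform prior of 1/2.
variants = {
    "var_A": ([0.3, 0.5], [0.1, 0.5]),  # composite approximately 0.375
    "var_B": ([0.2, 0.2], [0.4, 0.4]),  # composite approximately 0.111
}
print(rank_variants(variants, prior=1 / len(variants)))  # ['var_A', 'var_B']
```

Ranking every variant in a candidate region this way is one way a combined score can narrow a broad signal down to a few top candidates, which is the kind of localization the abstract describes.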
Subjects
Genetic Variation , Human Genome , Genetic Selection , Computational Biology/methods , Intergenic DNA/genetics , Molecular Evolution , Genetic Loci , Haplotypes , Humans , Genetic Polymorphism , Population Groups/genetics , Nucleic Acid Regulatory Sequences/genetics , Software
ABSTRACT
We demonstrate that genome sequences approaching finished quality can be generated from short paired reads. Using 36-base (fragment) and 26-base (jumping) reads from five microbial genomes of varied GC composition and sizes up to 40 Mb, ALLPATHS2 generated assemblies with long, accurate contigs and scaffolds. Velvet and EULER-SR were less accurate. For example, for Escherichia coli, the fraction of 10-kb stretches that were perfect was 99.8% (ALLPATHS2), 68.7% (Velvet), and 42.1% (EULER-SR).
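The abstract reports the fraction of 10-kb stretches that were perfect but does not define the metric precisely. A minimal sketch under one plausible reading follows: tile the reference with fixed-size windows and count a window as perfect if it appears verbatim, on either strand, inside a single contig. The function names, the window step, and the toy sequences are hypothetical, and plain substring search stands in for the alignment or index a real evaluation would use.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def fraction_perfect_windows(
    reference: str, contigs: List[str], window: int = 10_000, step: int = 1_000
) -> float:
    """Fraction of reference windows reproduced exactly within one contig (either strand).

    Assumed definition for illustration; substring search is only practical
    for small inputs.
    """
    starts = range(0, len(reference) - window + 1, step)
    if len(starts) == 0:
        raise ValueError("reference shorter than one window")
    searchable = contigs + [revcomp(c) for c in contigs]
    perfect = sum(
        1 for s in starts if any(reference[s:s + window] in c for c in searchable)
    )
    return perfect / len(starts)


# Toy example with 4-base windows: "AACC" matches the contig directly and
# "GGTT" matches its reverse complement, so 2 of 4 windows are perfect.
print(fraction_perfect_windows("AACCGGTTAA", ["AACCG"], window=4, step=2))  # 0.5
```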
Subjects
Bacteria/genetics , Fungi/genetics , Genome/genetics , Genomics/methods , Software , Base Pairing/genetics , Reproducibility of Results
ABSTRACT
New DNA sequencing technologies deliver data at dramatically lower costs but demand new analytical methods to take full advantage of the very short reads that they produce. We provide an initial, theoretical solution to the challenge of de novo assembly from whole-genome shotgun "microreads." For 11 genomes of sizes up to 39 Mb, we generated high-quality assemblies from 80x coverage by paired 30-base simulated reads modeled after real Illumina-Solexa reads. The bacterial genomes of Campylobacter jejuni and Escherichia coli assemble optimally, yielding single perfect contigs, and larger genomes yield assemblies that are highly connected and accurate. Assemblies are presented in a graph form that retains intrinsic ambiguities such as those arising from polymorphism, thereby providing information that has been absent from previous genome assemblies. For both C. jejuni and E. coli, this assembly graph is a single edge encompassing the entire genome. Larger genomes produce more complicated graphs, but the vast majority of the bases in their assemblies are present in long edges that are nearly always perfect. We describe a general method for genome assembly that can be applied to all types of DNA sequence data: not only short reads but also conventional sequence reads.
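The abstract describes assemblies as a graph whose edges are unambiguous sequences, with each small bacterial genome collapsing to a single edge. It does not describe how the graph is built. One common construction, assumed here purely for illustration, is a de Bruijn graph on (k-1)-mers whose maximal unbranched paths are merged into single edges. The sketch ignores sequencing errors, the reverse strand, and paired-read information, and its function names and toy reads are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def kmers_from_reads(reads: List[str], k: int) -> Set[str]:
    """All distinct k-mers in the reads (one strand only, no error correction)."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def debruijn(kmers: Set[str]) -> Dict[str, List[str]]:
    """Nodes are (k-1)-mers; each k-mer becomes an edge prefix -> suffix."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for km in sorted(kmers):
        graph[km[:-1]].append(km[1:])
    return graph


def unipaths(graph: Dict[str, List[str]]) -> List[str]:
    """Spell each maximal unbranched path of the graph as one sequence (one edge)."""
    indeg: Dict[str, int] = defaultdict(int)
    for succs in graph.values():
        for v in succs:
            indeg[v] += 1
    nodes = set(graph) | set(indeg)

    def simple(n: str) -> bool:
        # Exactly one way in and one way out: interior of an unbranched path.
        return indeg[n] == 1 and len(graph.get(n, [])) == 1

    paths: List[str] = []
    used: Set[str] = set()
    for u in sorted(nodes):
        if simple(u):
            continue
        for v in graph.get(u, []):
            seq = u + v[-1]
            while simple(v):
                used.add(v)
                v = graph[v][0]
                seq += v[-1]
            paths.append(seq)
    # Simple nodes not reached above form isolated cycles, e.g. a fully covered
    # circular genome; the spelled cycle repeats its first k-1 bases at the end.
    for start in sorted(nodes):
        if simple(start) and start not in used:
            seq, v = start, start
            while True:
                used.add(v)
                v = graph[v][0]
                seq += v[-1]
                if v == start:
                    break
            paths.append(seq)
    return paths


linear_reads = ["ATGGCGT", "GCGTACC", "TACCTT"]
print(unipaths(debruijn(kmers_from_reads(linear_reads, 4))))  # ['ATGGCGTACCTT']
print(unipaths(debruijn(kmers_from_reads(["AACGTAA"], 3))))  # ['AACGTAA']
```

When no (k-1)-mer is repeated, the whole genome collapses into one edge, as in the toy linear and circular examples; a repeat longer than k creates nodes with two incoming or outgoing edges, which split the sequence into several edges and make the graph more complicated.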