Results 1 - 20 of 38
1.
Nature ; 620(7976): 1080-1088, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37612508

ABSTRACT

Chromosomal instability (CIN) is a driver of cancer metastasis (refs. 1-4), yet the extent to which this effect depends on the immune system remains unknown. Using ContactTracing, a newly developed, validated and benchmarked tool to infer the nature and conditional dependence of cell-cell interactions from single-cell transcriptomic data, we show that CIN-induced chronic activation of the cGAS-STING pathway promotes downstream signal re-wiring in cancer cells, leading to a pro-metastatic tumour microenvironment. This re-wiring is manifested by type I interferon tachyphylaxis selectively downstream of STING and a corresponding increase in cancer cell-derived endoplasmic reticulum (ER) stress response. Reversal of CIN, depletion of cancer cell STING or inhibition of ER stress response signalling abrogates CIN-dependent effects on the tumour microenvironment and suppresses metastasis in immune competent, but not severely immune compromised, settings. Treatment with STING inhibitors reduces CIN-driven metastasis in melanoma, breast and colorectal cancers in a manner dependent on tumour cell-intrinsic STING. Finally, we show that CIN and pervasive cGAS activation in micronuclei are associated with ER stress signalling, immune suppression and metastasis in human triple-negative breast cancer, highlighting a viable strategy to identify and therapeutically intervene in tumours spurred by CIN-induced inflammation.


Subject(s)
Chromosomal Instability , Disease Progression , Neoplasms , Humans , Benchmarking , Cell Communication , Colorectal Neoplasms/drug therapy , Colorectal Neoplasms/genetics , Colorectal Neoplasms/immunology , Colorectal Neoplasms/pathology , Melanoma/drug therapy , Melanoma/genetics , Melanoma/immunology , Melanoma/pathology , Tumor Microenvironment , Interferon Type I/immunology , Neoplasm Metastasis , Endoplasmic Reticulum Stress , Signal Transduction , Triple Negative Breast Neoplasms/drug therapy , Triple Negative Breast Neoplasms/genetics , Triple Negative Breast Neoplasms/immunology , Triple Negative Breast Neoplasms/pathology , Neoplasms/genetics , Neoplasms/immunology , Neoplasms/pathology
2.
Proc Natl Acad Sci U S A ; 120(10): e2213896120, 2023 03 07.
Article in English | MEDLINE | ID: mdl-36848554

ABSTRACT

DNA is replicated according to a defined spatiotemporal program that is linked to both gene regulation and genome stability. The evolutionary forces that have shaped replication timing programs in eukaryotic species are largely unknown. Here, we studied the molecular causes and consequences of replication timing evolution across 94 humans, 95 chimpanzees, and 23 rhesus macaques. Replication timing differences recapitulated the species' phylogenetic tree, suggesting continuous evolution of the DNA replication timing program in primates. Hundreds of genomic regions had significant replication timing variation between humans and chimpanzees, of which 66 showed advances in replication origin firing in humans, while 57 were delayed. Genes overlapping these regions displayed correlated changes in expression levels and chromatin structure. Many human-chimpanzee variants also exhibited interindividual replication timing variation, pointing to ongoing evolution of replication timing at these loci. Association of replication timing variation with genetic variation revealed that DNA sequence evolution can explain replication timing variation between species. Taken together, DNA replication timing shows substantial and ongoing evolution in the human lineage that is driven by sequence alterations and could impact regulatory evolution at specific genomic sites.


Subject(s)
DNA Replication Timing , Pan troglodytes , Animals , Humans , Pan troglodytes/genetics , DNA Replication Timing/genetics , Macaca mulatta/genetics , Phylogeny , Eukaryota
3.
PLoS Genet ; 16(8): e1008895, 2020 08.
Article in English | MEDLINE | ID: mdl-32760067

ABSTRACT

The sequencing of Neanderthal and Denisovan genomes has yielded many new insights about interbreeding events between extinct hominins and the ancestors of modern humans. While much attention has been paid to the relatively recent gene flow from Neanderthals and Denisovans into modern humans, other instances of introgression leave more subtle genomic evidence and have received less attention. Here, we present a major extension of the ARGweaver algorithm, called ARGweaver-D, which can infer local genetic relationships under a user-defined demographic model that includes population splits and migration events. This Bayesian algorithm probabilistically samples ancestral recombination graphs (ARGs) that specify not only tree topologies and branch lengths along the genome, but also indicate migrant lineages. The sampled ARGs can therefore be parsed to produce probabilities of introgression along the genome. We show that this method is well powered to detect the archaic migration into modern humans, even with only a few samples. We then show that the method can also detect introgressed regions stemming from older migration events, or from unsampled populations. We apply it to human, Neanderthal, and Denisovan genomes, looking for signatures of older proposed migration events, including ancient humans into Neanderthal, and unknown archaic hominins into Denisovans. We identify 3% of the Neanderthal genome that is putatively introgressed from ancient humans, and estimate that the gene flow occurred between 200 and 300 kya. We find no convincing evidence that negative selection acted against these regions. Finally, we predict that 1% of the Denisovan genome was introgressed from an unsequenced, but highly diverged, archaic hominin ancestor. About 15% of these "super-archaic" regions (comprising at least about 4 Mb) were, in turn, introgressed into modern humans and continue to exist in the genomes of people alive today.


Subject(s)
Gene Flow , Models, Genetic , Neanderthals/genetics , Population/genetics , Recombination, Genetic , Animals , Evolution, Molecular , Human Migration , Humans
4.
Proc Natl Acad Sci U S A ; 117(48): 30554-30565, 2020 12 01.
Article in English | MEDLINE | ID: mdl-33199636

ABSTRACT

Numerous studies of emerging species have identified genomic "islands" of elevated differentiation against a background of relative homogeneity. The causes of these islands remain unclear, however, with some signs pointing toward "speciation genes" that locally restrict gene flow and others suggesting selective sweeps that have occurred within nascent species after speciation. Here, we examine this question through the lens of genome sequence data for five species of southern capuchino seedeaters, finch-like birds from South America that have undergone a species radiation during the last ∼50,000 generations. By applying newly developed statistical methods for ancestral recombination graph inference and machine-learning methods for the prediction of selective sweeps, we show that previously identified islands of differentiation in these birds appear to be generally associated with relatively recent, species-specific selective sweeps, most of which are predicted to be soft sweeps acting on standing genetic variation. Many of these sweeps coincide with genes associated with melanin-based variation in plumage, suggesting a prominent role for sexual selection. At the same time, a few loci also exhibit indications of possible selection against gene flow. These observations shed light on the complex manner in which natural selection shapes genome sequences during speciation.


Subject(s)
Genomic Islands , Models, Genetic , Animals , Biodiversity , Genetic Variation , Machine Learning
5.
Nature ; 530(7591): 429-33, 2016 Feb 25.
Article in English | MEDLINE | ID: mdl-26886800

ABSTRACT

It has been shown that Neanderthals contributed genetically to modern humans outside Africa 47,000-65,000 years ago. Here we analyse the genomes of a Neanderthal and a Denisovan from the Altai Mountains in Siberia together with the sequences of chromosome 21 of two Neanderthals from Spain and Croatia. We find that a population that diverged early from other modern humans in Africa contributed genetically to the ancestors of Neanderthals from the Altai Mountains roughly 100,000 years ago. By contrast, we do not detect such a genetic contribution in the Denisovan or the two European Neanderthals. We conclude that in addition to later interbreeding events, the ancestors of Neanderthals from the Altai Mountains and early modern humans met and interbred, possibly in the Near East, many thousands of years earlier than previously thought.


Subject(s)
Gene Flow/genetics , Neanderthals/genetics , Altitude , Animals , Bayes Theorem , Chromosomes, Human, Pair 21/genetics , Croatia/ethnology , Genome, Human/genetics , Genomics , Haplotypes/genetics , Heterozygote , Humans , Hybridization, Genetic/genetics , Phylogeny , Population Density , Siberia , Spain/ethnology , Time Factors
6.
Proc Natl Acad Sci U S A ; 116(48): 24174-24183, 2019 11 26.
Article in English | MEDLINE | ID: mdl-31712408

ABSTRACT

Color pattern mimicry in Heliconius butterflies is a classic case study of complex trait adaptation via selection on a few large effect genes. Association studies have linked color pattern variation to a handful of noncoding regions, yet the presumptive cis-regulatory elements (CREs) that control color patterning remain unknown. Here we combine chromatin assays, DNA sequence associations, and genome editing to functionally characterize 5 cis-regulatory elements of the color pattern gene optix. We were surprised to find that the cis-regulatory architecture of optix is characterized by pleiotropy and regulatory fragility, where deletion of individual cis-regulatory elements has broad effects on both color pattern and wing vein development. Remarkably, we found orthologous cis-regulatory elements associate with wing pattern convergence of distantly related comimics, suggesting that parallel coevolution of ancestral elements facilitated pattern mimicry. Our results support a model of color pattern evolution in Heliconius where changes to ancient, multifunctional cis-regulatory elements underlie adaptive radiation.


Subject(s)
Butterflies/physiology , Enhancer Elements, Genetic , Genetic Pleiotropy , Pigmentation/physiology , Wings, Animal/physiology , Adaptation, Physiological/genetics , Animals , CRISPR-Cas Systems , Chimera , Evolution, Molecular , Genome, Insect , Genome-Wide Association Study , Insect Proteins/genetics , Phylogeny , Pigmentation/genetics , Promoter Regions, Genetic , Regulatory Sequences, Nucleic Acid
7.
Bioinformatics ; 32(12): 1895-7, 2016 06 15.
Article in English | MEDLINE | ID: mdl-27153702

ABSTRACT

SweepFinder is a widely used program that implements a powerful likelihood-based method for detecting recent positive selection, or selective sweeps. Here, we present SweepFinder2, an extension of SweepFinder with increased sensitivity and robustness to the confounding effects of mutation rate variation and background selection. Moreover, SweepFinder2 has increased flexibility that enables the user to specify test sites, set the distance between test sites and utilize a recombination map. AVAILABILITY AND IMPLEMENTATION: SweepFinder2 is a freely available (www.personal.psu.edu/mxd60/sf2.html) software package that is written in C and can be run from a Unix command line. CONTACT: mxd60@psu.edu.


Subject(s)
Mutation Rate , Selection, Genetic , Software , Evolution, Molecular , Humans , Likelihood Functions
8.
Nature ; 478(7370): 476-82, 2011 Oct 12.
Article in English | MEDLINE | ID: mdl-21993624

ABSTRACT

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.


Subject(s)
Evolution, Molecular , Genome, Human/genetics , Genome/genetics , Mammals/genetics , Animals , Disease , Exons/genetics , Genomics , Health , Humans , Molecular Sequence Annotation , Phylogeny , RNA/classification , RNA/genetics , Selection, Genetic/genetics , Sequence Alignment , Sequence Analysis, DNA
9.
PLoS Genet ; 10(5): e1004342, 2014.
Article in English | MEDLINE | ID: mdl-24831947

ABSTRACT

The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the "ancestral recombination graph" (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of [Formula: see text] chromosomes conditional on an ARG of [Formula: see text] chromosomes, an operation we call "threading." Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps.


Subject(s)
Evolution, Molecular , Genome, Human , Recombination, Genetic , Selection, Genetic/genetics , Algorithms , Computer Simulation , Humans , Markov Chains , Models, Genetic , Monte Carlo Method
10.
PLoS Genet ; 9(8): e1003684, 2013.
Article in English | MEDLINE | ID: mdl-23966869

ABSTRACT

GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts about 1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. They are also significantly enriched for disease-associated polymorphisms, suggesting that they contribute to the fixation of deleterious alleles. The gBGC tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages. They supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available.


Subject(s)
Evolution, Molecular , Gene Conversion/genetics , Pan troglodytes/genetics , Phylogeny , Selection, Genetic , Animals , Base Sequence , Chromosome Mapping , Genome , Humans , Mammals , Models, Theoretical , Recombination, Genetic , Sequence Alignment
11.
Nature ; 451(7181): 994-7, 2008 Feb 21.
Article in English | MEDLINE | ID: mdl-18288194

ABSTRACT

Quantifying the number of deleterious mutations per diploid human genome is of crucial concern to both evolutionary and medical geneticists. Here we combine genome-wide polymorphism data from PCR-based exon resequencing, comparative genomic data across mammalian species, and protein structure predictions to estimate the number of functionally consequential single-nucleotide polymorphisms (SNPs) carried by each of 15 African American (AA) and 20 European American (EA) individuals. We find that AAs show significantly higher levels of nucleotide heterozygosity than do EAs for all categories of functional SNPs considered, including synonymous, non-synonymous, predicted 'benign', predicted 'possibly damaging' and predicted 'probably damaging' SNPs. This result is wholly consistent with previous work showing higher overall levels of nucleotide variation in African populations than in Europeans. EA individuals, in contrast, have significantly more genotypes homozygous for the derived allele at synonymous and non-synonymous SNPs and for the damaging allele at 'probably damaging' SNPs than AAs do. For SNPs segregating only in one population or the other, the proportion of non-synonymous SNPs is significantly higher in the EA sample (55.4%) than in the AA sample (47.0%; P < 2.3 × 10^-37). We observe a similar proportional excess of SNPs that are inferred to be 'probably damaging' (15.9% in EA; 12.1% in AA; P < 3.3 × 10^-11). Using extensive simulations, we show that this excess proportion of segregating damaging alleles in Europeans is probably a consequence of a bottleneck that Europeans experienced at about the time of the migration out of Africa.


Subject(s)
Genome, Human/genetics , Polymorphism, Single Nucleotide/genetics , Africa/ethnology , Alleles , Computational Biology , Emigration and Immigration , Europe/ethnology , Exons/genetics , Heterozygote , Homozygote , Humans , Polymerase Chain Reaction , United States
12.
Science ; 383(6683): eadj1415, 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38330136

ABSTRACT

Lung adenocarcinoma (LUAD) and small cell lung cancer (SCLC) are thought to originate from different epithelial cell types in the lung. Intriguingly, LUAD can histologically transform into SCLC after treatment with targeted therapies. In this study, we designed models to follow the conversion of LUAD to SCLC and found that the barrier to histological transformation converges on tolerance to Myc, which we implicate as a lineage-specific driver of the pulmonary neuroendocrine cell. Histological transformations are frequently accompanied by activation of the Akt pathway. Manipulating this pathway permitted tolerance to Myc as an oncogenic driver, producing rare, stem-like cells that transcriptionally resemble the pulmonary basal lineage. These findings suggest that histological transformation may require the plasticity inherent to the basal stem cell, enabling tolerance to previously incompatible oncogenic driver programs.


Subject(s)
Adenocarcinoma of Lung , Lung Neoplasms , Proto-Oncogene Proteins c-akt , Proto-Oncogene Proteins c-myc , Small Cell Lung Carcinoma , Humans , Adenocarcinoma of Lung/genetics , Adenocarcinoma of Lung/pathology , Adenocarcinoma of Lung/therapy , Epithelial Cells/pathology , Lung/pathology , Lung Neoplasms/genetics , Lung Neoplasms/pathology , Lung Neoplasms/therapy , Small Cell Lung Carcinoma/genetics , Small Cell Lung Carcinoma/pathology , Small Cell Lung Carcinoma/therapy , Oncogenes , Cell Lineage , Proto-Oncogene Proteins c-myc/genetics , Proto-Oncogene Proteins c-akt/genetics , Molecular Targeted Therapy
13.
Mol Biol Evol ; 29(3): 1047-57, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22075116

ABSTRACT

GC-biased gene conversion (gBGC) is a recombination-associated evolutionary process that accelerates the fixation of guanine or cytosine alleles, regardless of their effects on fitness. gBGC can increase the overall rate of substitutions, a hallmark of positive selection. Many fast-evolving genes and noncoding sequences in the human genome have GC-biased substitution patterns, suggesting that gBGC (in contrast to adaptive processes) may have driven the human changes in these sequences. To investigate this hypothesis, we developed a substitution model for DNA sequence evolution that quantifies the nonlinear interacting effects of selection and gBGC on substitution rates and patterns. Based on this model, we used a series of lineage-specific likelihood ratio tests to evaluate sequence alignments for evidence of changes in mode of selection, action of gBGC, or both. With a false positive rate of less than 5% for individual tests, we found that the majority (76%) of previously identified human accelerated regions are best explained without gBGC, whereas a substantial minority (19%) are best explained by the action of gBGC alone. Further, more than half (55%) have substitution rates that significantly exceed local estimates of the neutral rate, suggesting that these regions may have been shaped by positive selection rather than by relaxation of constraint. By distinguishing the effects of gBGC, relaxation of constraint, and positive selection we provide an integrated analysis of the evolutionary forces that shaped the fastest evolving regions of the human genome, which facilitates the design of targeted functional studies of adaptation in humans.


Subject(s)
Evolution, Molecular , Gene Conversion/genetics , Genome, Human/genetics , Models, Genetic , Selection, Genetic , Base Composition/genetics , Base Sequence , Humans , Likelihood Functions , Sequence Alignment
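Note: the abstract above describes a substitution model in which selection and gBGC interact nonlinearly. The sketch below illustrates one way such an interaction can arise. It assumes the standard weak-selection approximation for the substitution rate relative to neutrality, S / (1 - e^(-S)), and further assumes that a gBGC strength B is added to the scaled selection coefficient S for weak-to-strong (A/T to G/C) changes and subtracted for strong-to-weak changes. The function names and numeric values are hypothetical, and this is not the authors' implementation.

```python
import math


def relative_rate(scaled_coeff: float) -> float:
    """Substitution rate relative to the neutral rate for a scaled coefficient.

    Uses the weak-selection approximation S / (1 - exp(-S)); the limit at
    S = 0 is 1 (neutrality), handled explicitly to avoid division by zero.
    """
    if abs(scaled_coeff) < 1e-8:
        return 1.0
    return scaled_coeff / (1.0 - math.exp(-scaled_coeff))


def rate_with_gbgc(s: float, b: float, weak_to_strong: bool) -> float:
    """Relative rate when gBGC (strength b) shifts the selection coefficient s.

    Assumption for this sketch: gBGC favours weak-to-strong changes (+b)
    and disfavours strong-to-weak changes (-b).
    """
    effective = s + b if weak_to_strong else s - b
    return relative_rate(effective)


if __name__ == "__main__":
    # Hypothetical values: mild negative selection (s = -2) and gBGC (b = 2).
    for label, w2s in (("weak-to-strong", True), ("strong-to-weak", False)):
        print(label, round(rate_with_gbgc(-2.0, 2.0, w2s), 3))
```

In this toy setting, gBGC of the same magnitude as negative selection restores a neutral-like rate for weak-to-strong changes while further suppressing strong-to-weak changes, which is why elevated, GC-biased substitution rates alone cannot be read as evidence of positive selection.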
14.
Mol Biol Evol ; 29(11): 3309-20, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22617954

ABSTRACT

The prominent role of Horizontal Gene Transfer (HGT) in the evolution of bacteria is now well documented, but few studies have differentiated between evolutionary events that predominantly cause genes in one lineage to be replaced by homologs from another lineage ("replacing HGT") and events that result in the addition of substantial new genomic material ("additive HGT"). Herein, we make use of the distinct phylogenetic signatures of replacing and additive HGTs in a genome-wide study of the important human pathogen Streptococcus pyogenes (SPY) and its close relatives S. dysgalactiae subspecies equisimilis (SDE) and S. dysgalactiae subspecies dysgalactiae (SDD). Using recently developed statistical models and computational methods, we find evidence for abundant gene flow of both kinds within each of the SPY and SDE clades and of reduced levels of exchange between SPY and SDD. In addition, our analysis strongly supports a pronounced asymmetry in SPY-SDE gene flow, favoring the SPY-to-SDE direction. This finding is of particular interest in light of the recent increase in virulence of pathogenic SDE. We find much stronger evidence for SPY-SDE gene flow among replacing than among additive transfers, suggesting a primary influence from homologous recombination between co-occurring SPY and SDE cells in human hosts. Putative virulence genes are correlated with transfer events, but this correlation is found to be driven by additive, not replacing, HGTs. The genes affected by additive HGTs are enriched for functions having to do with transposition, recombination, and DNA integration, consistent with previous findings, whereas replacing HGTs seem to influence a more diverse set of genes. Additive transfers are also found to be associated with evidence of positive selection. These findings shed new light on the manner in which HGT has shaped pathogenic bacterial genomes.


Subject(s)
Gene Transfer, Horizontal/genetics , Phylogeny , Streptococcus/genetics , Gene Duplication/genetics , Genes, Bacterial/genetics , Genes, Essential/genetics , Humans , Models, Genetic , Selection, Genetic
15.
Genome Res ; 20(1): 110-21, 2010 Jan.
Article in English | MEDLINE | ID: mdl-19858363

ABSTRACT

Methods for detecting nucleotide substitution rates that are faster or slower than expected under neutral drift are widely used to identify candidate functional elements in genomic sequences. However, most existing methods consider either reductions (conservation) or increases (acceleration) in rate but not both, or assume that selection acts uniformly across the branches of a phylogeny. Here we examine the more general problem of detecting departures from the neutral rate of substitution in either direction, possibly in a clade-specific manner. We consider four statistical, phylogenetic tests for addressing this problem: a likelihood ratio test, a score test, a test based on exact distributions of numbers of substitutions, and the genomic evolutionary rate profiling (GERP) test. All four tests have been implemented in a freely available program called phyloP. Based on extensive simulation experiments, these tests are remarkably similar in statistical power. With 36 mammalian species, they all appear to be capable of fairly good sensitivity with low false-positive rates in detecting strong selection at individual nucleotides, moderate selection in 3-bp elements, and weaker or clade-specific selection in longer elements. By applying phyloP to mammalian multiple alignments from the ENCODE project, we shed light on patterns of conservation/acceleration in known and predicted functional elements, approximate fractions of sites subject to constraint, and differences in clade-specific selection in the primate and glires clades. We also describe new "Conservation" tracks in the UCSC Genome Browser that display both phyloP and phastCons scores for genome-wide alignments of 44 vertebrate species.


Subject(s)
Base Sequence , Evolution, Molecular , Mammals/genetics , Phylogeny , Selection, Genetic , Animals , Computer Simulation , Conserved Sequence , Humans , Likelihood Functions , Mammals/classification , Models, Genetic , Models, Statistical , Primates/genetics , Sequence Alignment , Software , Species Specificity
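Note: one of the four tests named in the abstract above is a likelihood ratio test. The sketch below shows the generic form of such a test, not phyloP's implementation: it assumes the alternative model has exactly one extra free parameter (for example, a rate scale factor) and that the test statistic is approximately chi-square distributed with 1 degree of freedom under the null. The function name and the log-likelihood values are hypothetical.

```python
import math


def lrt_pvalue(loglik_null: float, loglik_alt: float) -> tuple[float, float]:
    """Return (statistic, p-value) for a 1-degree-of-freedom likelihood ratio test.

    statistic = 2 * (lnL_alt - lnL_null), clipped at 0 to absorb small
    numerical errors. For 1 degree of freedom the chi-square survival
    function equals erfc(sqrt(x / 2)), so no external library is needed.
    """
    stat = max(2.0 * (loglik_alt - loglik_null), 0.0)
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods for a neutral model and a scaled-rate model.
    stat, p = lrt_pvalue(loglik_null=-1234.5, loglik_alt=-1230.0)
    print(f"statistic = {stat:.2f}, p = {p:.4g}")
```

Because departures in either direction (conservation or acceleration) increase the statistic, the sign of the fitted rate change would have to be reported separately to tell the two apart.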
16.
Brief Bioinform ; 12(1): 41-51, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21278375

ABSTRACT

The PHylogenetic Analysis with Space/Time models (PHAST) software package consists of a collection of command-line programs and supporting libraries for comparative genomics. PHAST is best known as the engine behind the Conservation tracks in the University of California, Santa Cruz (UCSC) Genome Browser. However, it also includes several other tools for phylogenetic modeling and functional element identification, as well as utilities for manipulating alignments, trees and genomic annotations. PHAST has been in development since 2002 and has now been downloaded more than 1000 times, but so far it has been released only as provisional ('beta') software. Here, we describe the first official release (v1.0) of PHAST, with improved stability, portability and documentation and several new features. We outline the components of the package and detail recent improvements. In addition, we introduce a new interface to the PHAST libraries from the R statistical computing environment, called RPHAST, and illustrate its use in a series of vignettes. We demonstrate that RPHAST can be particularly useful in applications involving both large-scale phylogenomics and complex statistical analyses. The R interface also makes the PHAST libraries accessible to non-C programmers, and is useful for rapid prototyping. PHAST v1.0 and RPHAST v1.0 are available for download at http://compgen.bscb.cornell.edu/phast, under the terms of an unrestrictive BSD-style license. RPHAST can also be obtained from the Comprehensive R Archive Network (CRAN; http://cran.r-project.org).


Subject(s)
Genomics/methods , Phylogeny , Software , Databases, Genetic , Genome , Information Storage and Retrieval/methods , Internet
17.
PLoS Biol ; 8(8): e1000451, 2010 Aug 10.
Article in English | MEDLINE | ID: mdl-20711490

ABSTRACT

Domestic dogs exhibit tremendous phenotypic diversity, including a greater variation in body size than any other terrestrial mammal. Here, we generate a high density map of canine genetic variation by genotyping 915 dogs from 80 domestic dog breeds, 83 wild canids, and 10 outbred African shelter dogs across 60,968 single-nucleotide polymorphisms (SNPs). Coupling this genomic resource with external measurements from breed standards and individuals as well as skeletal measurements from museum specimens, we identify 51 regions of the dog genome associated with phenotypic variation among breeds in 57 traits. The complex traits include average breed body size and external body dimensions and cranial, dental, and long bone shape and size with and without allometric scaling. In contrast to the results from association mapping of quantitative traits in humans and domesticated plants, we find that across dog breeds, a small number of quantitative trait loci (≤ 3) explain the majority of phenotypic variation for most of the traits we studied. In addition, many genomic regions show signatures of recent selection, with most of the highly differentiated regions being associated with breed-defining traits such as body size, coat characteristics, and ear floppiness. Our results demonstrate the efficacy of mapping multiple traits in the domestic dog using a database of genotyped individuals and highlight the important role human-directed selection has played in altering the genetic architecture of key traits in this important species.


Subject(s)
Animals, Domestic/anatomy & histology , Animals, Domestic/genetics , Dogs/anatomy & histology , Genetic Variation , Animals , Body Size , Genome , Genome-Wide Association Study , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait Loci
18.
Nature ; 433(7023): E6; discussion E7-8, 2005 Jan 20.
Article in English | MEDLINE | ID: mdl-15662372

ABSTRACT

Positive selection at the molecular level is usually indicated by an increase in the ratio of non-synonymous to synonymous substitutions (dN/dS) in comparative data. However, Plotkin et al. describe a new method for detecting positive selection based on a single nucleotide sequence. We show here that this method is particularly sensitive to assumptions regarding the underlying mutational processes and does not provide a reliable way to identify positive selection.


Subject(s)
Biological Evolution , Codon/genetics , Genomics/methods , Models, Genetic , Selection, Genetic , Animals , Genome , Mutagenesis/genetics , Mutation, Missense/genetics , Plasmodium falciparum/genetics , Reproducibility of Results
19.
PLoS Genet ; 4(8): e1000144, 2008 Aug 01.
Article in English | MEDLINE | ID: mdl-18670650

ABSTRACT

Genome-wide scans for positively selected genes (PSGs) in mammals have provided insight into the dynamics of genome evolution, the genetic basis of differences between species, and the functions of individual genes. However, previous scans have been limited in power and accuracy owing to small numbers of available genomes. Here we present the most comprehensive examination of mammalian PSGs to date, using the six high-coverage genome assemblies now available for eutherian mammals. The increased phylogenetic depth of this dataset results in substantially improved statistical power, and permits several new lineage- and clade-specific tests to be applied. Of approximately 16,500 human genes with high-confidence orthologs in at least two other species, 400 genes showed significant evidence of positive selection (FDR<0.05), according to a standard likelihood ratio test. An additional 144 genes showed evidence of positive selection on particular lineages or clades. As in previous studies, the identified PSGs were enriched for roles in defense/immunity, chemosensory perception, and reproduction, but enrichments were also evident for more specific functions, such as complement-mediated immunity and taste perception. Several pathways were strongly enriched for PSGs, suggesting possible co-evolution of interacting genes. A novel Bayesian analysis of the possible "selection histories" of each gene indicated that most PSGs have switched multiple times between positive selection and nonselection, suggesting that positive selection is often episodic. A detailed analysis of Affymetrix exon array data indicated that PSGs are expressed at significantly lower levels, and in a more tissue-specific manner, than non-PSGs. Genes that are specifically expressed in the spleen, testes, liver, and breast are significantly enriched for PSGs, but no evidence was found for an enrichment for PSGs among brain-specific genes. 
This study provides additional evidence for widespread positive selection in mammalian evolution and new genome-wide insights into the functional implications of positive selection.


Subject(s)
Evolution, Molecular , Genome , Mammals/genetics , Selection, Genetic , Animals , Bayes Theorem , Databases, Genetic , Dogs , Gene Expression , Humans , Likelihood Functions , Macaca mulatta , Mammals/classification , Mice , Pan troglodytes , Phylogeny , Primates , Rats , Rodentia , Sequence Alignment
20.
Mol Biol Evol ; 26(12): 2755-64, 2009 Dec.
Article in English | MEDLINE | ID: mdl-19713326

ABSTRACT

Balancing selection is potentially an important biological force for maintaining advantageous genetic diversity in populations, including variation that is responsible for long-term adaptation to the environment. By serving as a means to maintain genetic variation, it may be particularly relevant to maintaining phenotypic variation in natural populations. Nevertheless, its prevalence and specific targets in the human genome remain largely unknown. We have analyzed the patterns of diversity and divergence of 13,400 genes in two human populations using an unbiased single-nucleotide polymorphism data set, a genome-wide approach, and a method that incorporates demography in neutrality tests. We identified an unbiased catalog of genes with signatures of long-term balancing selection, which includes immunity genes as well as genes encoding keratins and membrane channels; the catalog also shows enrichment in functional categories involved in cellular structure. Patterns are mostly concordant in the two populations, with a small fraction of genes showing population-specific signatures of selection. Power considerations indicate that our findings represent a subset of all targets in the genome, suggesting that although balancing selection may not have an obvious impact on a large proportion of human genes, it is a key force affecting the evolution of a number of genes in humans.


Subject(s)
Genome, Human/genetics , Selection, Genetic , Alleles , Chromosome Segregation/genetics , Demography , Haplotypes/genetics , Humans , Quantitative Trait, Heritable , Sequence Analysis, DNA