Results 1 - 20 of 38
1.
Science ; 383(6683): eadj1415, 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38330136

ABSTRACT

Lung adenocarcinoma (LUAD) and small cell lung cancer (SCLC) are thought to originate from different epithelial cell types in the lung. Intriguingly, LUAD can histologically transform into SCLC after treatment with targeted therapies. In this study, we designed models to follow the conversion of LUAD to SCLC and found that the barrier to histological transformation converges on tolerance to Myc, which we implicate as a lineage-specific driver of the pulmonary neuroendocrine cell. Histological transformations are frequently accompanied by activation of the Akt pathway. Manipulating this pathway permitted tolerance to Myc as an oncogenic driver, producing rare, stem-like cells that transcriptionally resemble the pulmonary basal lineage. These findings suggest that histological transformation may require the plasticity inherent to the basal stem cell, enabling tolerance to previously incompatible oncogenic driver programs.


Subject(s)
Adenocarcinoma of Lung , Lung Neoplasms , Proto-Oncogene Proteins c-akt , Proto-Oncogene Proteins c-myc , Small Cell Lung Carcinoma , Humans , Adenocarcinoma of Lung/genetics , Adenocarcinoma of Lung/pathology , Adenocarcinoma of Lung/therapy , Epithelial Cells/pathology , Lung/pathology , Lung Neoplasms/genetics , Lung Neoplasms/pathology , Lung Neoplasms/therapy , Small Cell Lung Carcinoma/genetics , Small Cell Lung Carcinoma/pathology , Small Cell Lung Carcinoma/therapy , Oncogenes , Cell Lineage , Proto-Oncogene Proteins c-myc/genetics , Proto-Oncogene Proteins c-akt/genetics , Molecular Targeted Therapy
2.
Nature ; 620(7976): 1080-1088, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37612508

ABSTRACT

Chromosomal instability (CIN) is a driver of cancer metastasis (refs. 1-4), yet the extent to which this effect depends on the immune system remains unknown. Using ContactTracing, a newly developed, validated and benchmarked tool to infer the nature and conditional dependence of cell-cell interactions from single-cell transcriptomic data, we show that CIN-induced chronic activation of the cGAS-STING pathway promotes downstream signal re-wiring in cancer cells, leading to a pro-metastatic tumour microenvironment. This re-wiring is manifested by type I interferon tachyphylaxis selectively downstream of STING and a corresponding increase in cancer cell-derived endoplasmic reticulum (ER) stress response. Reversal of CIN, depletion of cancer cell STING or inhibition of ER stress response signalling abrogates CIN-dependent effects on the tumour microenvironment and suppresses metastasis in immune competent, but not severely immune compromised, settings. Treatment with STING inhibitors reduces CIN-driven metastasis in melanoma, breast and colorectal cancers in a manner dependent on tumour cell-intrinsic STING. Finally, we show that CIN and pervasive cGAS activation in micronuclei are associated with ER stress signalling, immune suppression and metastasis in human triple-negative breast cancer, highlighting a viable strategy to identify and therapeutically intervene in tumours spurred by CIN-induced inflammation.


Subject(s)
Chromosomal Instability , Disease Progression , Neoplasms , Humans , Benchmarking , Cell Communication , Colorectal Neoplasms/drug therapy , Colorectal Neoplasms/genetics , Colorectal Neoplasms/immunology , Colorectal Neoplasms/pathology , Melanoma/drug therapy , Melanoma/genetics , Melanoma/immunology , Melanoma/pathology , Tumor Microenvironment , Interferon Type I/immunology , Neoplasm Metastasis , Endoplasmic Reticulum Stress , Signal Transduction , Triple Negative Breast Neoplasms/drug therapy , Triple Negative Breast Neoplasms/genetics , Triple Negative Breast Neoplasms/immunology , Triple Negative Breast Neoplasms/pathology , Neoplasms/genetics , Neoplasms/immunology , Neoplasms/pathology
3.
Proc Natl Acad Sci U S A ; 120(10): e2213896120, 2023 Mar 07.
Article in English | MEDLINE | ID: mdl-36848554

ABSTRACT

DNA is replicated according to a defined spatiotemporal program that is linked to both gene regulation and genome stability. The evolutionary forces that have shaped replication timing programs in eukaryotic species are largely unknown. Here, we studied the molecular causes and consequences of replication timing evolution across 94 humans, 95 chimpanzees, and 23 rhesus macaques. Replication timing differences recapitulated the species' phylogenetic tree, suggesting continuous evolution of the DNA replication timing program in primates. Hundreds of genomic regions had significant replication timing variation between humans and chimpanzees, of which 66 showed advances in replication origin firing in humans, while 57 were delayed. Genes overlapping these regions displayed correlated changes in expression levels and chromatin structure. Many human-chimpanzee variants also exhibited interindividual replication timing variation, pointing to ongoing evolution of replication timing at these loci. Association of replication timing variation with genetic variation revealed that DNA sequence evolution can explain replication timing variation between species. Taken together, DNA replication timing shows substantial and ongoing evolution in the human lineage that is driven by sequence alterations and could impact regulatory evolution at specific genomic sites.


Subject(s)
DNA Replication Timing , Pan troglodytes , Animals , Humans , Pan troglodytes/genetics , DNA Replication Timing/genetics , Macaca mulatta/genetics , Phylogeny , Eukaryota
4.
Proc Natl Acad Sci U S A ; 117(48): 30554-30565, 2020 Dec 01.
Article in English | MEDLINE | ID: mdl-33199636

ABSTRACT

Numerous studies of emerging species have identified genomic "islands" of elevated differentiation against a background of relative homogeneity. The causes of these islands remain unclear, however, with some signs pointing toward "speciation genes" that locally restrict gene flow and others suggesting selective sweeps that have occurred within nascent species after speciation. Here, we examine this question through the lens of genome sequence data for five species of southern capuchino seedeaters, finch-like birds from South America that have undergone a species radiation during the last ∼50,000 generations. By applying newly developed statistical methods for ancestral recombination graph inference and machine-learning methods for the prediction of selective sweeps, we show that previously identified islands of differentiation in these birds appear to be generally associated with relatively recent, species-specific selective sweeps, most of which are predicted to be soft sweeps acting on standing genetic variation. Many of these sweeps coincide with genes associated with melanin-based variation in plumage, suggesting a prominent role for sexual selection. At the same time, a few loci also exhibit indications of possible selection against gene flow. These observations shed light on the complex manner in which natural selection shapes genome sequences during speciation.


Subject(s)
Genomic Islands , Models, Genetic , Animals , Biodiversity , Genetic Variation , Machine Learning
5.
PLoS Genet ; 16(8): e1008895, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32760067

ABSTRACT

The sequencing of Neanderthal and Denisovan genomes has yielded many new insights about interbreeding events between extinct hominins and the ancestors of modern humans. While much attention has been paid to the relatively recent gene flow from Neanderthals and Denisovans into modern humans, other instances of introgression leave more subtle genomic evidence and have received less attention. Here, we present a major extension of the ARGweaver algorithm, called ARGweaver-D, which can infer local genetic relationships under a user-defined demographic model that includes population splits and migration events. This Bayesian algorithm probabilistically samples ancestral recombination graphs (ARGs) that specify not only tree topologies and branch lengths along the genome, but also indicate migrant lineages. The sampled ARGs can therefore be parsed to produce probabilities of introgression along the genome. We show that this method is well powered to detect the archaic migration into modern humans, even with only a few samples. We then show that the method can also detect introgressed regions stemming from older migration events, or from unsampled populations. We apply it to human, Neanderthal, and Denisovan genomes, looking for signatures of older proposed migration events, including ancient humans into Neanderthal, and unknown archaic hominins into Denisovans. We identify 3% of the Neanderthal genome that is putatively introgressed from ancient humans, and estimate that the gene flow occurred 200-300 kya. We find no convincing evidence that negative selection acted against these regions. Finally, we predict that 1% of the Denisovan genome was introgressed from an unsequenced, but highly diverged, archaic hominin ancestor. About 15% of these "super-archaic" regions (comprising at least about 4 Mb) were, in turn, introgressed into modern humans and continue to exist in the genomes of people alive today.


Subject(s)
Gene Flow , Models, Genetic , Neanderthals/genetics , Population/genetics , Recombination, Genetic , Animals , Evolution, Molecular , Human Migration , Humans
6.
Proc Natl Acad Sci U S A ; 116(48): 24174-24183, 2019 Nov 26.
Article in English | MEDLINE | ID: mdl-31712408

ABSTRACT

Color pattern mimicry in Heliconius butterflies is a classic case study of complex trait adaptation via selection on a few large effect genes. Association studies have linked color pattern variation to a handful of noncoding regions, yet the presumptive cis-regulatory elements (CREs) that control color patterning remain unknown. Here we combine chromatin assays, DNA sequence associations, and genome editing to functionally characterize 5 cis-regulatory elements of the color pattern gene optix. We were surprised to find that the cis-regulatory architecture of optix is characterized by pleiotropy and regulatory fragility, where deletion of individual cis-regulatory elements has broad effects on both color pattern and wing vein development. Remarkably, we found orthologous cis-regulatory elements associate with wing pattern convergence of distantly related comimics, suggesting that parallel coevolution of ancestral elements facilitated pattern mimicry. Our results support a model of color pattern evolution in Heliconius where changes to ancient, multifunctional cis-regulatory elements underlie adaptive radiation.


Subject(s)
Butterflies/physiology , Enhancer Elements, Genetic , Genetic Pleiotropy , Pigmentation/physiology , Wings, Animal/physiology , Adaptation, Physiological/genetics , Animals , CRISPR-Cas Systems , Chimera , Evolution, Molecular , Genome, Insect , Genome-Wide Association Study , Insect Proteins/genetics , Phylogeny , Pigmentation/genetics , Promoter Regions, Genetic , Regulatory Sequences, Nucleic Acid
7.
Bioinformatics ; 32(12): 1895-7, 2016 Jun 15.
Article in English | MEDLINE | ID: mdl-27153702

ABSTRACT

SweepFinder is a widely used program that implements a powerful likelihood-based method for detecting recent positive selection, or selective sweeps. Here, we present SweepFinder2, an extension of SweepFinder with increased sensitivity and robustness to the confounding effects of mutation rate variation and background selection. Moreover, SweepFinder2 has increased flexibility that enables the user to specify test sites, set the distance between test sites and utilize a recombination map. AVAILABILITY AND IMPLEMENTATION: SweepFinder2 is a freely available (www.personal.psu.edu/mxd60/sf2.html) software package that is written in C and can be run from a Unix command line. CONTACT: mxd60@psu.edu.


Subject(s)
Mutation Rate , Selection, Genetic , Software , Evolution, Molecular , Humans , Likelihood Functions
8.
Nature ; 530(7591): 429-33, 2016 Feb 25.
Article in English | MEDLINE | ID: mdl-26886800

ABSTRACT

It has been shown that Neanderthals contributed genetically to modern humans outside Africa 47,000-65,000 years ago. Here we analyse the genomes of a Neanderthal and a Denisovan from the Altai Mountains in Siberia together with the sequences of chromosome 21 of two Neanderthals from Spain and Croatia. We find that a population that diverged early from other modern humans in Africa contributed genetically to the ancestors of Neanderthals from the Altai Mountains roughly 100,000 years ago. By contrast, we do not detect such a genetic contribution in the Denisovan or the two European Neanderthals. We conclude that in addition to later interbreeding events, the ancestors of Neanderthals from the Altai Mountains and early modern humans met and interbred, possibly in the Near East, many thousands of years earlier than previously thought.


Subject(s)
Gene Flow/genetics , Neanderthals/genetics , Altitude , Animals , Bayes Theorem , Chromosomes, Human, Pair 21/genetics , Croatia/ethnology , Genome, Human/genetics , Genomics , Haplotypes/genetics , Heterozygote , Humans , Hybridization, Genetic/genetics , Phylogeny , Population Density , Siberia , Spain/ethnology , Time Factors
9.
Nat Genet ; 47(3): 276-83, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25599402

ABSTRACT

We describe a new computational method for estimating the probability that a point mutation at each position in a genome will influence fitness. These 'fitness consequence' (fitCons) scores serve as evolution-based measures of potential genomic function. Our approach is to cluster genomic positions into groups exhibiting distinct 'fingerprints' on the basis of high-throughput functional genomic data, then to estimate a probability of fitness consequences for each group from associated patterns of genetic polymorphism and divergence. We have generated fitCons scores for three human cell types on the basis of public data from ENCODE. In comparison with conventional conservation scores, fitCons scores show considerably improved prediction power for cis regulatory elements. In addition, fitCons scores indicate that 4.2-7.5% of nucleotides in the human genome have influenced fitness since the human-chimpanzee divergence, and they suggest that recent evolutionary turnover has had limited impact on the functional content of the genome.


Subject(s)
Genetic Fitness , Genome, Human , Point Mutation , Animals , Cell Line , Evolution, Molecular , Human Umbilical Vein Endothelial Cells , Humans , Pan troglodytes/genetics , Polymorphism, Genetic , Probability , Regulatory Sequences, Nucleic Acid
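
The two-step approach described in the abstract above (group positions by functional-genomic fingerprint, then estimate one score per group) can be illustrated with a minimal sketch. This is not the fitCons implementation: the fingerprint labels, the toy data, and especially the per-group estimator (depletion of polymorphism relative to an assumed neutral rate, floored at 0) are hypothetical stand-ins chosen here for illustration. The published method estimates the per-group probability from patterns of polymorphism and divergence.

```python
from collections import defaultdict


def group_scores(positions, neutral_poly_rate):
    """Group positions by fingerprint and assign one score per group.

    positions: iterable of (fingerprint, is_polymorphic) pairs, where
        fingerprint is any hashable tuple of discretized functional signals.
    neutral_poly_rate: assumed fraction of polymorphic sites under neutrality.

    Returns {fingerprint: score}, with score = 1 - observed/neutral, floored
    at 0. This estimator is a hypothetical placeholder, not the published one.
    """
    if neutral_poly_rate <= 0:
        raise ValueError("neutral_poly_rate must be positive")

    # fingerprint -> [number of sites, number of polymorphic sites]
    counts = defaultdict(lambda: [0, 0])
    for fingerprint, is_polymorphic in positions:
        counts[fingerprint][0] += 1
        counts[fingerprint][1] += int(is_polymorphic)

    scores = {}
    for fingerprint, (n_sites, n_poly) in counts.items():
        observed = n_poly / n_sites
        scores[fingerprint] = max(0.0, 1.0 - observed / neutral_poly_rate)
    return scores


# Toy input; the fingerprint labels are made up for this example.
constrained = ("open", "expressed")
unconstrained = ("closed", "silent")
toy_positions = [
    (constrained, False),
    (constrained, False),
    (constrained, True),
    (constrained, False),
    (unconstrained, True),
    (unconstrained, True),
]
toy_scores = group_scores(toy_positions, neutral_poly_rate=0.5)
```

Every position inherits the score of its group, so a genome-wide track follows from a lookup of each position's fingerprint in the returned dictionary.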
10.
Curr Opin Genet Dev ; 29: 15-21, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25156517

ABSTRACT

Human accelerated regions (HARs) are DNA sequences that changed very little throughout mammalian evolution, but then experienced a burst of changes in humans since divergence from chimpanzees. This unexpected evolutionary signature is suggestive of deeply conserved function that was lost or changed on the human lineage. Since their discovery, the actual roles of HARs in human evolution have remained somewhat elusive, due to their being almost exclusively non-coding sequences with no annotation. Ongoing research is beginning to crack this problem by leveraging new genome sequences, functional genomics data, computational approaches, and genetic assays to reveal that many HARs are developmental gene regulatory elements and RNA genes, most of which evolved their uniquely human mutations through positive selection before divergence of archaic hominins and diversification of modern humans.


Subject(s)
Conserved Sequence/genetics , DNA/genetics , Evolution, Molecular , Genome, Human/genetics , Hominidae/genetics , Animals , Chromosome Mapping , DNA/classification , Humans , Models, Genetic , Phylogeny
11.
PLoS Genet ; 10(5): e1004342, 2014.
Article in English | MEDLINE | ID: mdl-24831947

ABSTRACT

The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the "ancestral recombination graph" (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n-1 chromosomes, an operation we call "threading." Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. The patterns we observe near protein-coding genes are consistent with a primary influence from background selection rather than hitchhiking, although we cannot rule out a contribution from recurrent selective sweeps.


Subject(s)
Evolution, Molecular , Genome, Human , Recombination, Genetic , Selection, Genetic/genetics , Algorithms , Computer Simulation , Humans , Markov Chains , Models, Genetic , Monte Carlo Method
12.
PLoS Genet ; 9(8): e1003684, 2013.
Article in English | MEDLINE | ID: mdl-23966869

ABSTRACT

GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts about 1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. They are also significantly enriched for disease-associated polymorphisms, suggesting that they contribute to the fixation of deleterious alleles. The gBGC tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages. They supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available.


Subject(s)
Evolution, Molecular , Gene Conversion/genetics , Pan troglodytes/genetics , Phylogeny , Selection, Genetic , Animals , Base Sequence , Chromosome Mapping , Genome , Humans , Mammals , Models, Theoretical , Recombination, Genetic , Sequence Alignment
13.
Nat Genet ; 45(7): 723-9, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23749186

ABSTRACT

For decades, it has been hypothesized that gene regulation has had a central role in human evolution, yet much remains unknown about the genome-wide impact of regulatory mutations. Here we use whole-genome sequences and genome-wide chromatin immunoprecipitation and sequencing data to demonstrate that natural selection has profoundly influenced human transcription factor binding sites since the divergence of humans from chimpanzees 4-6 million years ago. Our analysis uses a new probabilistic method, called INSIGHT, for measuring the influence of selection on collections of short, interspersed noncoding elements. We find that, on average, transcription factor binding sites have experienced somewhat weaker selection than protein-coding genes. However, the binding sites of several transcription factors show clear evidence of adaptation. Several measures of selection are strongly correlated with predicted binding affinity. Overall, regulatory elements seem to contribute substantially to both adaptive substitutions and deleterious polymorphisms with key implications for human evolution and disease.


Subject(s)
Genome, Human , Selection, Genetic/genetics , Transcription Factors/metabolism , Animals , Base Sequence , Binding Sites/genetics , Chromosome Mapping , Computer Simulation , Genome, Human/genetics , Genome-Wide Association Study , Humans , Models, Genetic , Models, Statistical , Mutation/physiology , Regulatory Sequences, Nucleic Acid/genetics , Substrate Specificity
14.
Mol Biol Evol ; 29(11): 3309-20, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22617954

ABSTRACT

The prominent role of Horizontal Gene Transfer (HGT) in the evolution of bacteria is now well documented, but few studies have differentiated between evolutionary events that predominantly cause genes in one lineage to be replaced by homologs from another lineage ("replacing HGT") and events that result in the addition of substantial new genomic material ("additive HGT"). Herein, we make use of the distinct phylogenetic signatures of replacing and additive HGTs in a genome-wide study of the important human pathogen Streptococcus pyogenes (SPY) and its close relatives S. dysgalactiae subspecies equisimilis (SDE) and S. dysgalactiae subspecies dysgalactiae (SDD). Using recently developed statistical models and computational methods, we find evidence for abundant gene flow of both kinds within each of the SPY and SDE clades and of reduced levels of exchange between SPY and SDD. In addition, our analysis strongly supports a pronounced asymmetry in SPY-SDE gene flow, favoring the SPY-to-SDE direction. This finding is of particular interest in light of the recent increase in virulence of pathogenic SDE. We find much stronger evidence for SPY-SDE gene flow among replacing than among additive transfers, suggesting a primary influence from homologous recombination between co-occurring SPY and SDE cells in human hosts. Putative virulence genes are correlated with transfer events, but this correlation is found to be driven by additive, not replacing, HGTs. The genes affected by additive HGTs are enriched for functions having to do with transposition, recombination, and DNA integration, consistent with previous findings, whereas replacing HGTs seem to influence a more diverse set of genes. Additive transfers are also found to be associated with evidence of positive selection. These findings shed new light on the manner in which HGT has shaped pathogenic bacterial genomes.


Subject(s)
Gene Transfer, Horizontal/genetics , Phylogeny , Streptococcus/genetics , Gene Duplication/genetics , Genes, Bacterial/genetics , Genes, Essential/genetics , Humans , Models, Genetic , Selection, Genetic
15.
Mol Biol Evol ; 29(3): 1047-57, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22075116

ABSTRACT

GC-biased gene conversion (gBGC) is a recombination-associated evolutionary process that accelerates the fixation of guanine or cytosine alleles, regardless of their effects on fitness. gBGC can increase the overall rate of substitutions, a hallmark of positive selection. Many fast-evolving genes and noncoding sequences in the human genome have GC-biased substitution patterns, suggesting that gBGC, in contrast to adaptive processes, may have driven the human changes in these sequences. To investigate this hypothesis, we developed a substitution model for DNA sequence evolution that quantifies the nonlinear interacting effects of selection and gBGC on substitution rates and patterns. Based on this model, we used a series of lineage-specific likelihood ratio tests to evaluate sequence alignments for evidence of changes in mode of selection, action of gBGC, or both. With a false positive rate of less than 5% for individual tests, we found that the majority (76%) of previously identified human accelerated regions are best explained without gBGC, whereas a substantial minority (19%) are best explained by the action of gBGC alone. Further, more than half (55%) have substitution rates that significantly exceed local estimates of the neutral rate, suggesting that these regions may have been shaped by positive selection rather than by relaxation of constraint. By distinguishing the effects of gBGC, relaxation of constraint, and positive selection we provide an integrated analysis of the evolutionary forces that shaped the fastest evolving regions of the human genome, which facilitates the design of targeted functional studies of adaptation in humans.


Subject(s)
Evolution, Molecular , Gene Conversion/genetics , Genome, Human/genetics , Models, Genetic , Selection, Genetic , Base Composition/genetics , Base Sequence , Humans , Likelihood Functions , Sequence Alignment
16.
Nature ; 478(7370): 476-82, 2011 Oct 12.
Article in English | MEDLINE | ID: mdl-21993624

ABSTRACT

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.


Subject(s)
Evolution, Molecular , Genome, Human/genetics , Genome/genetics , Mammals/genetics , Animals , Disease , Exons/genetics , Genomics , Health , Humans , Molecular Sequence Annotation , Phylogeny , RNA/classification , RNA/genetics , Selection, Genetic/genetics , Sequence Alignment , Sequence Analysis, DNA
17.
Nat Genet ; 43(10): 1031-4, 2011 Sep 18.
Article in English | MEDLINE | ID: mdl-21926973

ABSTRACT

Whole-genome sequences provide a rich source of information about human evolution. Here we describe an effort to estimate key evolutionary parameters based on the whole-genome sequences of six individuals from diverse human populations. We used a Bayesian, coalescent-based approach to obtain information about ancestral population sizes, divergence times and migration rates from inferred genealogies at many neutrally evolving loci across the genome. We introduce new methods for accommodating gene flow between populations and integrating over possible phasings of diploid genotypes. We also describe a custom pipeline for genotype inference to mitigate biases from heterogeneous sequencing technologies and coverage levels. Our analysis indicates that the San population of southern Africa diverged from other human populations approximately 108-157 thousand years ago, that Eurasians diverged from an ancestral African population 38-64 thousand years ago, and that the effective population size of the ancestors of all modern humans was ∼9,000.


Subject(s)
Genetics, Population , Genome, Human , Population Density , Bayes Theorem , Chromosome Mapping , Evolution, Molecular , Gene Flow , Genetic Drift , Genetic Variation , Humans , Models, Genetic , Population Dynamics , Sequence Alignment , Validation Studies as Topic
18.
PLoS One ; 6(2): e17034, 2011 Feb 14.
Article in English | MEDLINE | ID: mdl-21340033

ABSTRACT

The recent release of twenty-two new genome sequences has dramatically increased the data available for mammalian comparative genomics, but twenty of these new sequences are currently limited to ∼2× coverage. Here we examine the extent of sequencing error in these 2× assemblies, and its potential impact in downstream analyses. By comparing 2× assemblies with high-quality sequences from the ENCODE regions, we estimate the rate of sequencing error to be 1-4 errors per kilobase. While this error rate is fairly modest, sequencing error can still have surprising effects. For example, an apparent lineage-specific insertion in a coding region is more likely to reflect sequencing error than a true biological event, and the length distribution of coding indels is strongly distorted by error. We find that most errors are contributed by a small fraction of bases with low quality scores, in particular, by the ends of reads in regions of single-read coverage in the assembly. We explore several approaches for automatic sequencing error mitigation (SEM), making use of the localized nature of sequencing error, the fact that it is well predicted by quality scores, and information about errors that comes from comparisons across species. Our automatic methods for error mitigation cannot replace the need for additional sequencing, but they do allow substantial fractions of errors to be masked or eliminated at the cost of modest amounts of over-correction, and they can reduce the impact of error in downstream phylogenomic analyses. Our error-mitigated alignments are available for download.


Subject(s)
Databases, Nucleic Acid/standards , Molecular Sequence Annotation/standards , Research Design , Sequence Analysis, DNA/standards , Animals , Chromosome Mapping/methods , Genome/genetics , Genomics/methods , Humans , Mammals/genetics , Molecular Sequence Annotation/methods , Sequence Analysis, DNA/methods
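
One idea in the abstract above, that sequencing errors concentrate in bases with low quality scores and can therefore be masked, can be sketched as follows. This is a simplified illustration rather than the authors' SEM procedure: the quality threshold of 20, the use of "N" as the masking character, and both function names are assumptions made here, and the cross-species comparisons mentioned in the abstract are not modelled.

```python
def mask_low_quality(sequence, qualities, min_quality=20):
    """Replace bases whose quality score is below min_quality with 'N'.

    sequence: string of bases.
    qualities: per-base quality scores, same length as sequence.
    min_quality: hypothetical cutoff; not a value taken from the paper.
    """
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have the same length")
    return "".join(
        base if quality >= min_quality else "N"
        for base, quality in zip(sequence, qualities)
    )


def errors_per_kilobase(n_errors, n_bases):
    """Convert an error count over n_bases into errors per 1,000 bases."""
    if n_bases <= 0:
        raise ValueError("n_bases must be positive")
    return 1000.0 * n_errors / n_bases


# Toy read: two low-quality positions are masked, the rest are kept.
masked = mask_low_quality("ACGTAC", [40, 35, 10, 38, 5, 30])

# Toy count: 3 mismatches against a reference over 1,500 aligned bases.
rate = errors_per_kilobase(3, 1500)
```

Masking trades a small loss of correct bases for the removal of likely errors, which corresponds to the modest over-correction the abstract mentions.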
19.
Brief Bioinform ; 12(1): 41-51, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21278375

ABSTRACT

The PHylogenetic Analysis with Space/Time models (PHAST) software package consists of a collection of command-line programs and supporting libraries for comparative genomics. PHAST is best known as the engine behind the Conservation tracks in the University of California, Santa Cruz (UCSC) Genome Browser. However, it also includes several other tools for phylogenetic modeling and functional element identification, as well as utilities for manipulating alignments, trees and genomic annotations. PHAST has been in development since 2002 and has now been downloaded more than 1000 times, but so far it has been released only as provisional ('beta') software. Here, we describe the first official release (v1.0) of PHAST, with improved stability, portability and documentation and several new features. We outline the components of the package and detail recent improvements. In addition, we introduce a new interface to the PHAST libraries from the R statistical computing environment, called RPHAST, and illustrate its use in a series of vignettes. We demonstrate that RPHAST can be particularly useful in applications involving both large-scale phylogenomics and complex statistical analyses. The R interface also makes the PHAST libraries accessible to non-C programmers, and is useful for rapid prototyping. PHAST v1.0 and RPHAST v1.0 are available for download at http://compgen.bscb.cornell.edu/phast, under the terms of an unrestrictive BSD-style license. RPHAST can also be obtained from the Comprehensive R Archive Network (CRAN; http://cran.r-project.org).


Subject(s)
Genomics/methods , Phylogeny , Software , Databases, Genetic , Genome , Information Storage and Retrieval/methods , Internet
20.
PLoS Biol ; 8(8): e1000451, 2010 Aug 10.
Article in English | MEDLINE | ID: mdl-20711490

ABSTRACT

Domestic dogs exhibit tremendous phenotypic diversity, including a greater variation in body size than any other terrestrial mammal. Here, we generate a high density map of canine genetic variation by genotyping 915 dogs from 80 domestic dog breeds, 83 wild canids, and 10 outbred African shelter dogs across 60,968 single-nucleotide polymorphisms (SNPs). Coupling this genomic resource with external measurements from breed standards and individuals as well as skeletal measurements from museum specimens, we identify 51 regions of the dog genome associated with phenotypic variation among breeds in 57 traits. The complex traits include average breed body size and external body dimensions and cranial, dental, and long bone shape and size with and without allometric scaling. In contrast to the results from association mapping of quantitative traits in humans and domesticated plants, we find that across dog breeds, a small number of quantitative trait loci (≤3) explain the majority of phenotypic variation for most of the traits we studied. In addition, many genomic regions show signatures of recent selection, with most of the highly differentiated regions being associated with breed-defining traits such as body size, coat characteristics, and ear floppiness. Our results demonstrate the efficacy of mapping multiple traits in the domestic dog using a database of genotyped individuals and highlight the important role human-directed selection has played in altering the genetic architecture of key traits in this important species.


Subject(s)
Animals, Domestic/anatomy & histology , Animals, Domestic/genetics , Dogs/anatomy & histology , Genetic Variation , Animals , Body Size , Genome , Genome-Wide Association Study , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait Loci