ABSTRACT
Mycobacterium abscessus (Mab) is a multidrug-resistant pathogen increasingly responsible for severe pulmonary infections. Analysis of whole-genome sequences (WGS) of Mab demonstrates dense genetic clustering of clinical isolates collected from disparate geographic locations. This has been interpreted as supporting patient-to-patient transmission, but epidemiological studies have contradicted this interpretation. Here, we present evidence for a slowing of the Mab molecular clock rate coincident with the emergence of phylogenetic clusters. We performed phylogenetic inference using publicly available WGS from 483 Mab patient isolates. We implemented a subsampling approach in combination with coalescent analysis to estimate the molecular clock rate along the long internal branches of the tree; the estimates indicate a faster long-term molecular clock rate than that of branches within phylogenetic clusters. We used ancestry simulation to predict the effects of clock rate variation on phylogenetic clustering and found that the degree of clustering in the observed phylogeny is more easily explained by a clock rate slowdown than by transmission. We also find that phylogenetic clusters are enriched in mutations affecting DNA repair machinery and report that clustered isolates have lower spontaneous mutation rates in vitro. We propose that Mab adaptation to the host environment through variation in DNA repair genes affects the organism's mutation rate and that this manifests as phylogenetic clustering. These results challenge the model that phylogenetic clustering in Mab is explained by person-to-person transmission and inform our understanding of transmission inference in emerging, facultative pathogens.
Subject(s)
Mycobacterium abscessus, Humans, Mycobacterium abscessus/genetics, Mutation Rate, Phylogeny, Mutation
ABSTRACT
The rate at which plants grow is a major functional trait in plant ecology. However, little is known about its evolution in natural populations. Here, we investigate evolutionary and environmental factors shaping variation in the growth rate of Arabidopsis thaliana. We used plant diameter as a proxy to monitor plant growth over time in environments that mimicked latitudinal differences in the intensity of natural light radiation, across a set of 278 genotypes sampled within four broad regions, including an outgroup set of genotypes from China. A field experiment conducted under natural conditions confirmed the ecological relevance of the observed variation. All genotypes markedly expanded their rosette diameter when the light supply was decreased, demonstrating that environmental plasticity is a predominant source of variation to adapt plant size to prevailing light conditions. Yet, we detected significant levels of genetic variation both in growth rate and growth plasticity. Genome-wide association studies revealed that only 2 single nucleotide polymorphisms associate with genetic variation for growth above Bonferroni confidence levels. However, marginally associated variants were significantly enriched among genes with an annotated role in growth and stress reactions. Polygenic scores computed from marginally associated variants confirmed the polygenic basis of growth variation. For both light regimes, phenotypic divergence between the most distantly related population (China) and the various regions in Europe is smaller than the variation observed within Europe, indicating that the evolution of growth rate is likely to be constrained by stabilizing selection. We observed that Spanish genotypes, however, reach a significantly larger size than Northern European genotypes. 
Tests of adaptive divergence and analysis of the individual burden of deleterious mutations reveal that adaptive processes have played a more important role in shaping regional differences in rosette growth than maladaptive evolution.
Subject(s)
Physiological Adaptation/genetics, Arabidopsis/genetics, Multifactorial Inheritance/genetics, Genetic Selection, Acclimatization/genetics, Arabidopsis/growth & development, China, Europe, Genetic Variation/genetics, Population Genetics, Genotype, Phenotype, Plant Development/genetics
ABSTRACT
During range expansion, edge populations are expected to face increased genetic drift, which in turn can alter and potentially compromise adaptive dynamics, preventing the removal of deleterious mutations and slowing down adaptation. Here, we contrast populations of the European subspecies Arabidopsis lyrata ssp. petraea, which expanded its Northern range after the last glaciation. We document a sharp decline in effective population size in the range-edge population and observe that nonsynonymous variants segregate at higher frequencies. We detect a 4.9% excess of derived nonsynonymous variants per individual in the range-edge population, suggesting an increase of the genomic burden of deleterious mutations. Inference of the fitness effects of mutations and modeling of allele frequencies under the explicit demographic history of each population predicts a depletion of rare deleterious variants in the range-edge population, but an enrichment for fixed ones, consistent with the bottleneck effect. However, the demographic history of the range-edge population predicts a small net decrease in per-individual fitness. Consistent with this prediction, the range-edge population is not impaired in its growth and survival measured in a common garden experiment. We further observe that the allelic diversity at the self-incompatibility locus, which ensures strict outcrossing and evolves under negative frequency-dependent selection, has remained unchanged. Genomic footprints indicative of selective sweeps are broader in the Northern population but not less frequent. We conclude that the outcrossing species A. lyrata ssp. petraea shows a strong resilience to the effect of range expansion.
Subject(s)
Arabidopsis/genetics, Genetic Load, Plant Dispersal, Gene Flow, Recessive Genes, Genetic Fitness, Plant Genome, Population Dynamics, Genetic Selection
ABSTRACT
Knowledge of mutation rates is crucial for calibrating population genetics models of demographic history in units of years. However, mutation rates remain challenging to estimate because of the need to identify extremely rare events. We estimated the nuclear mutation rate in wolves by identifying de novo mutations in a pedigree of seven wolves. Putative de novo mutations were discovered by whole-genome sequencing and were verified by Sanger sequencing of parents and offspring. Using stringent filters and an estimate of the false negative rate in the remaining observable genome, we obtain an estimate of ~4.5 × 10⁻⁹ per base pair per generation and provide conservative bounds between 2.6 × 10⁻⁹ and 7.1 × 10⁻⁹. Although our estimate is consistent with recent mutation rate estimates from ancient DNA (4.0 × 10⁻⁹ and 3.0-4.5 × 10⁻⁹), it suggests a wider possible range. We also examined the consequences of our rate and the accompanying interval for dating several critical events in canid demographic history. For example, applying our full range of rates to coalescent models of dog and wolf demographic history implies a wide set of possible divergence times between the ancestral populations of dogs and extant Eurasian wolves (16,000-64,000 years ago) although our point estimate indicates a date between 25,000 and 33,000 years ago. Aside from one study in mice, ours provides the only direct mammalian mutation rate outside of primates and is likely to be vital to future investigations of mutation rate evolution.
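As an illustration of how a pedigree-based rate of this kind is typically assembled, the sketch below divides the number of verified de novo mutations, corrected for an estimated false negative rate, by the number of haploid genome transmissions surveyed. This is a common convention and not necessarily the authors' exact pipeline; the function name, its parameters, and the input values in the usage line are hypothetical and are not data from the study.

```python
def per_generation_mutation_rate(n_de_novo, callable_bp, n_offspring,
                                 false_negative_rate=0.0):
    """Estimate mutations per base pair per generation from pedigree counts.

    All arguments are hypothetical illustration inputs:
      n_de_novo            verified de novo mutations summed over offspring
      callable_bp          base pairs passing filters in each offspring
      n_offspring          number of offspring examined
      false_negative_rate  estimated fraction of true mutations missed
    """
    if not 0.0 <= false_negative_rate < 1.0:
        raise ValueError("false_negative_rate must be in [0, 1)")
    if callable_bp <= 0 or n_offspring <= 0:
        raise ValueError("callable_bp and n_offspring must be positive")
    # Scale the observed count up to account for mutations that were missed.
    corrected_count = n_de_novo / (1.0 - false_negative_rate)
    # Each diploid offspring carries two transmitted haploid genomes.
    haploid_sites = 2.0 * callable_bp * n_offspring
    return corrected_count / haploid_sites


# Hypothetical inputs chosen only to show the arithmetic.
rate = per_generation_mutation_rate(10, 1_000_000_000, 5)
print(f"{rate:.2e} per base pair per generation")
```

Because the callable genome and the false negative rate both enter the denominator or the correction, uncertainty in either one widens the interval around the point estimate.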
ABSTRACT
De novo mutations occur at substantially different rates depending on genomic location, sequence context and DNA strand. The success of methods to estimate selection intensity, infer demographic history and map rare disease genes depends strongly on assumptions about the local mutation rate. Here we present Roulette, a genome-wide mutation rate model at base-pair resolution that incorporates known determinants of local mutation rate. Roulette is shown to be more accurate than existing models. We use Roulette to refine the estimates of population growth within Europe by incorporating the full range of human mutation rates. The analysis of significant deviations from the model predictions revealed a tenfold increase in mutation rate in nearly all genes transcribed by polymerase III (Pol III), suggesting a new mutagenic mechanism. We also detected an elevated mutation rate within transcription factor binding sites restricted to sites actively used in testis and residing in promoters.
Subject(s)
Mutagens, Mutation Rate, RNA Polymerase III, Genetic Transcription, Humans, Male, DNA/genetics, Mutagenesis, Mutation, Nucleotidyltransferases, Genetic Promoter Regions/genetics, Genetic Transcription/genetics, RNA Polymerase III/metabolism
ABSTRACT
A multitude of demographic, health, and genetic factors are associated with the risk of developing severe COVID-19 following infection by SARS-CoV-2. There is a need to perform studies across human societies and to investigate the full spectrum of genetic variation of the virus. Using data from 869 COVID-19 patients in Bahrain between March 2020 and March 2021, we analyzed paired viral sequencing and non-genetic host data to understand host and viral determinants of severe COVID-19. We estimated the effects of demographic variables specific to the Bahrain population and found that the impact of health factors is largely consistent with other populations. To extend beyond the common variants of concern in the Spike protein analyzed by previous studies, we used a viral burden approach and detected a protective effect of low-frequency missense viral mutations in the RNA-dependent RNA polymerase (Pol) gene on disease severity. Our results contribute to the survey of severe COVID-19 in diverse populations and highlight the benefits of studying rare viral mutations.
ABSTRACT
The genetic architecture of human diseases and complex traits has been extensively studied, but little is known about the relationship of causal disease effect sizes between proximal SNPs, which have largely been assumed to be independent. We introduce a new method, LD SNP-pair effect correlation regression (LDSPEC), to estimate the correlation of causal disease effect sizes of derived alleles between proximal SNPs, depending on their allele frequencies, LD, and functional annotations; LDSPEC produced robust estimates in simulations across various genetic architectures. We applied LDSPEC to 70 diseases and complex traits from the UK Biobank (average N=306K), meta-analyzing results across diseases/traits. We detected significantly nonzero effect correlations for proximal SNP pairs (e.g., -0.37±0.09 for low-frequency positive-LD 0-100bp SNP pairs) that decayed with distance (e.g., -0.07±0.01 for low-frequency positive-LD 1-10kb), varied with allele frequency (e.g., -0.15±0.04 for common positive-LD 0-100bp), and varied with LD between SNPs (e.g., +0.12±0.05 for common negative-LD 0-100bp) (because we consider derived alleles, positive-LD and negative-LD SNP pairs may yield very different results). We further determined that SNP pairs with shared functions had stronger effect correlations that spanned longer genomic distances, e.g., -0.37±0.08 for low-frequency positive-LD same-gene promoter SNP pairs (average genomic distance of 47kb (due to alternative splicing)) and -0.32±0.04 for low-frequency positive-LD H3K27ac 0-1kb SNP pairs. Consequently, SNP-heritability estimates were substantially smaller than estimates of the sum of causal effect size variances across all SNPs (ratio of 0.87±0.02 across diseases/traits), particularly for certain functional annotations (e.g., 0.78±0.01 for common Super enhancer SNPs), even though these quantities are widely assumed to be equal.
We recapitulated our findings via forward simulations with an evolutionary model involving stabilizing selection, implicating the action of linkage masking, whereby haplotypes containing linked SNPs with opposite effects on disease have reduced effects on fitness and escape negative selection.
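The gap between SNP-heritability and the sum of causal effect size variances can be illustrated with a two-SNP toy model. The sketch below is not LDSPEC; it assumes two standardized genotypes with LD correlation r, and zero-mean random effects that are independent of the genotypes and have correlation rho with each other. Under those assumptions the expected genetic variance is var1 + var2 + 2·r·rho·sqrt(var1·var2), so negatively correlated effects on positive-LD pairs pull heritability below the sum of variances. All names and numeric inputs are hypothetical.

```python
import math


def expected_genetic_variance(var1, var2, ld_r, effect_corr):
    """Expected variance of b1*x1 + b2*x2 for standardized genotypes x1, x2.

    Assumes zero-mean effects b1, b2 with variances var1, var2 and
    correlation effect_corr, drawn independently of the genotypes,
    whose correlation (LD) is ld_r.
    """
    covariance_term = 2.0 * ld_r * effect_corr * math.sqrt(var1 * var2)
    return var1 + var2 + covariance_term


def heritability_to_variance_sum_ratio(var1, var2, ld_r, effect_corr):
    """Ratio of expected genetic variance to the sum of effect variances."""
    total = var1 + var2
    if total <= 0:
        raise ValueError("effect variances must sum to a positive value")
    return expected_genetic_variance(var1, var2, ld_r, effect_corr) / total


# Hypothetical pair: positive LD with negatively correlated effects.
print(heritability_to_variance_sum_ratio(1.0, 1.0, ld_r=0.5, effect_corr=-0.4))
# Independent effects recover the usual assumption that the ratio equals 1.
print(heritability_to_variance_sum_ratio(1.0, 1.0, ld_r=0.5, effect_corr=0.0))
```

With equal effect variances the ratio reduces to 1 + r·rho, which makes the direction of the effect easy to read off: it falls below 1 whenever LD and effect correlation have opposite signs.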
ABSTRACT
Numerous studies have found evidence that GWAS loci experience negative selection, which increases in intensity with the effect size of identified variants. However, there is also accumulating evidence that this selection is not entirely mediated by the focal trait and contains a substantial pleiotropic component. Understanding how selective constraint shapes phenotypic variation requires advancing models capable of balancing these and other components of selection, as well as empirical analyses capable of inferring this balance and how it is generated by the underlying biology. We first review the classic theory connecting phenotypic selection to selection at individual loci as well as approaches and findings from recent analyses of negative selection in GWAS data. We then discuss geometric theories of pleiotropic selection with the potential to guide future modeling efforts. Recent findings revealing the nature of pleiotropic genetic variation provide clues to which genetic relationships are important and should be incorporated into analyses of selection, while findings that effect sizes vary between populations indicate that GWAS measurements could be misleading if effect sizes have also changed throughout human history.
ABSTRACT
Many diseases exhibit population-specific causal effect sizes with trans-ethnic genetic correlations significantly less than 1, limiting trans-ethnic polygenic risk prediction. We develop a new method, S-LDXR, for stratifying squared trans-ethnic genetic correlation across genomic annotations, and apply S-LDXR to genome-wide summary statistics for 31 diseases and complex traits in East Asians (average N = 90K) and Europeans (average N = 267K) with an average trans-ethnic genetic correlation of 0.85. We determine that squared trans-ethnic genetic correlation is 0.82× (s.e. 0.01) depleted in the top quintile of background selection statistic, implying more population-specific causal effect sizes. Accordingly, causal effect sizes are more population-specific in functionally important regions, including conserved and regulatory regions. In regions surrounding specifically expressed genes, causal effect sizes are most population-specific for skin and immune genes, and least population-specific for brain genes. Our results could potentially be explained by stronger gene-environment interaction at loci impacted by selection, particularly positive selection.
Subject(s)
Population Genetics/methods, Genome-Wide Association Study/methods, Linkage Disequilibrium, Single Nucleotide Polymorphism, Genetic Selection, Algorithms, Asian People/genetics, Genomics/methods, Haplotypes/genetics, Humans, Genetic Models, White People/genetics
ABSTRACT
Neutral models for quantitative trait evolution are useful for identifying phenotypes under selection. These models often assume normally distributed phenotypes. This assumption may be violated when a trait is affected by relatively few variants or when the effects of those variants arise from skewed or heavy tailed distributions. Molecular phenotypes such as gene expression levels may have these properties. To accommodate deviations from normality, models making fewer assumptions about the underlying genetics and patterns of variation are needed. Here, we develop a general neutral model for quantitative trait variation using a coalescent approach. This model allows interpretation of trait distributions in terms of familiar population genetic parameters because it is based on the coalescent. We show how the normal distribution resulting from the infinitesimal limit, where the number of loci grows large as the effect size per mutation becomes small, depends only on expected pairwise coalescent times. We then demonstrate how deviations from normality depend on demography through the distribution of coalescence times as well as through genetic parameters. In particular, population growth events exacerbate deviations while bottlenecks reduce them. We demonstrate the practical applications of this model by showing how to sample from the neutral distribution of QST, the ratio of the variance between subpopulations to that in the overall population. We further show it is likely impossible to distinguish sparsity from skewed or heavy tailed mutational effects using only sampled trait values. The model analyzed here greatly expands the parameter space for neutral trait models.
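The ratio described at the end of this abstract, variance between subpopulations over variance in the overall population, can be computed from sampled trait values as in the sketch below. It follows only the verbal definition given here, using population (not sample) variances and weighting subpopulation means by sample size; it does not reproduce the coalescent sampling procedure of the paper, and estimators used in practice may differ. The function name and the trait values are hypothetical.

```python
from statistics import fmean


def between_to_total_variance_ratio(groups):
    """Variance among subpopulation means divided by the overall variance.

    `groups` is a list of non-empty lists of trait values, one list per
    subpopulation. Subpopulation means are weighted by sample size.
    """
    values = [v for group in groups for v in group]
    n = len(values)
    grand_mean = fmean(values)
    total_var = sum((v - grand_mean) ** 2 for v in values) / n
    if total_var == 0:
        raise ValueError("trait has no variance; ratio is undefined")
    between_var = sum(
        len(group) * (fmean(group) - grand_mean) ** 2 for group in groups
    ) / n
    return between_var / total_var


# Hypothetical trait values for two subpopulations.
print(between_to_total_variance_ratio([[0.0, 2.0], [4.0, 6.0]]))
```

Because the between-subpopulation and within-subpopulation components sum to the total variance under this weighting, the ratio always lies between 0 and 1.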
Subject(s)
Demography/statistics & numerical data, Genetic Models, Population/genetics, Heritable Quantitative Trait, Molecular Evolution, Humans
ABSTRACT
St. Louis encephalitis virus (SLEV; Flaviviridae; Flavivirus) is a member of the Japanese encephalitis serocomplex and a close relative of West Nile virus (WNV). Although SLEV remains endemic to the US, both levels of activity and geographical dispersal are relatively constrained when compared to the widespread distribution of WNV. In recent years, WNV appears to have displaced SLEV in California, yet both viruses currently coexist in Texas and several other states. It has become clear that viral swarm characterization is required if we are to fully evaluate the relationship between viral genomes, viral evolution, and epidemiology. Mutant swarm size and composition may be particularly important for arboviruses, which require replication not only in diverse tissues but also divergent hosts. In order to evaluate temporal, spatial, and host-specific patterns in the SLEV mutant swarm, we determined the size, composition, and phylogeny of the intrahost swarm within primary mosquito isolates from both Texas and California. Results indicate a general trend of decreasing intrahost diversity over time in both locations, with recent isolates being highly genetically homogeneous. Additionally, phylogenetic analyses provide detailed information on the relatedness of minority variants both within and among strains and demonstrate how both geographic isolation and seasonal maintenance have shaped the viral swarm. Overall, these data provide insight into how time, space, and unique transmission cycles influence the SLEV mutant swarm and how understanding these processes can ultimately lead to a better understanding of arbovirus evolution and epidemiology.