Results 1 - 20 of 82
1.
Nature ; 626(7997): 119-127, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38200310

ABSTRACT

The evolution of reproductive barriers is the first step in the formation of new species and can help us understand the diversification of life on Earth. These reproductive barriers often take the form of hybrid incompatibilities, in which alleles derived from two different species no longer interact properly in hybrids [1-3]. Theory predicts that hybrid incompatibilities may be more likely to arise at rapidly evolving genes [4-6] and that incompatibilities involving multiple genes should be common [7,8], but there has been sparse empirical data to evaluate these predictions. Here we describe a mitonuclear incompatibility involving three genes whose protein products are in physical contact within respiratory complex I of naturally hybridizing swordtail fish species. Individuals homozygous for mismatched protein combinations do not complete embryonic development or die as juveniles, whereas those heterozygous for the incompatibility have reduced complex I function and unbalanced representation of parental alleles in the mitochondrial proteome. We find that the effects of different genetic interactions on survival are non-additive, highlighting subtle complexity in the genetic architecture of hybrid incompatibilities. Finally, we document the evolutionary history of the genes involved, showing signals of accelerated evolution and evidence that an incompatibility has been transferred between species via hybridization.


Subject(s)
Cell Nucleus, Electron Transport Complex I, Fishes, Lethal Genes, Genetic Speciation, Genetic Hybridization, Mitochondrial Proteins, Animals, Alleles, Electron Transport Complex I/genetics, Fishes/classification, Fishes/embryology, Fishes/genetics, Fishes/growth & development, Homozygote, Lethal Genes/genetics, Species Specificity, Embryonic Development/genetics, Mitochondrial Proteins/genetics, Cell Nucleus/genetics, Heterozygote, Molecular Evolution
2.
Nature ; 609(7929): 994-997, 2022 09.
Article in English | MEDLINE | ID: mdl-35952714

ABSTRACT

Accurate and timely detection of recombinant lineages is crucial for interpreting genetic variation, reconstructing epidemic spread, identifying selection and variants of interest, and accurately performing phylogenetic analyses [1-4]. During the SARS-CoV-2 pandemic, genomic data generation has exceeded the capacities of existing analysis platforms, thereby crippling real-time analysis of viral evolution [5]. Here, we use a new phylogenomic method to search a nearly comprehensive SARS-CoV-2 phylogeny for recombinant lineages. In a 1.6 million sample tree from May 2021, we identify 589 recombination events, which indicate that around 2.7% of sequenced SARS-CoV-2 genomes have detectable recombinant ancestry. Recombination breakpoints are inferred to occur disproportionately in the 3' portion of the genome that contains the spike protein. Our results highlight the need for timely analyses of recombination for pinpointing the emergence of recombinant lineages with the potential to increase transmissibility or virulence of the virus. We anticipate that this approach will empower comprehensive real-time tracking of viral recombination during the SARS-CoV-2 pandemic and beyond.


Subject(s)
COVID-19, Viral Genome, Pandemics, Phylogeny, Genetic Recombination, SARS-CoV-2, COVID-19/epidemiology, COVID-19/transmission, COVID-19/virology, Viral Genome/genetics, Humans, Mutation, Genetic Recombination/genetics, SARS-CoV-2/genetics, SARS-CoV-2/pathogenicity, Genetic Selection/genetics, Coronavirus Spike Glycoprotein/genetics, Virulence/genetics
3.
PLoS Genet ; 19(11): e1011062, 2023 Nov.
Article in English | MEDLINE | ID: mdl-38015992

ABSTRACT

Admixture, the exchange of genetic information between distinct source populations, is thought to be a major source of adaptive genetic variation. Unlike mutation events, which periodically generate single alleles, admixture can introduce many selected alleles simultaneously. As such, the effects of linkage between selected alleles may be especially pronounced in admixed populations. However, existing tools for identifying selected mutations within admixed populations only account for selection at a single site, overlooking phenomena such as linkage among proximal selected alleles. Here, we develop and extensively validate a method for identifying and quantifying the individual effects of multiple linked selected sites on a chromosome in admixed populations. Our approach numerically calculates the expected local ancestry landscape in an admixed population for a given multi-locus selection model, and then maximizes the likelihood of the model. After applying this method to admixed populations of Drosophila melanogaster and Passer italiae, we found that the impacts between linked sites may be an important contributor to natural selection in admixed populations. Furthermore, for the situations we considered, the selection coefficients and number of selected sites are overestimated in analyses that do not consider the effects of linkage among selected sites. Our results imply that linkage among selected sites may be an important evolutionary force in admixed populations. This tool provides a powerful generalized method to investigate these crucial phenomena in diverse populations.


Subject(s)
Drosophila melanogaster, Population Genetics, Animals, Drosophila melanogaster/genetics, Genetic Selection
4.
Proc Natl Acad Sci U S A ; 120(33): e2301411120, 2023 08 15.
Article in English | MEDLINE | ID: mdl-37552755

ABSTRACT

The acquisition of novel sexually dimorphic traits poses an evolutionary puzzle: How do new traits arise and become sex-limited? Recently acquired color vision, sexually dimorphic in animals like primates and butterflies, presents a compelling model for understanding how traits become sex-biased. For example, some Heliconius butterflies uniquely possess UV (ultraviolet) color vision, which correlates with the expression of two differentially tuned UV-sensitive rhodopsins, UVRh1 and UVRh2. To discover how such traits become sexually dimorphic, we studied Heliconius charithonia, which exhibits female-specific UVRh1 expression. We demonstrate that females, but not males, discriminate different UV wavelengths. Through whole-genome shotgun sequencing and assembly of the H. charithonia genome, we discovered that UVRh1 is present on the W chromosome, making it obligately female-specific. By knocking out UVRh1, we show that UVRh1 protein expression is absent in mutant female eye tissue, as in wild-type male eyes. A PCR survey of UVRh1 sex-linkage across the genus shows that species with female-specific UVRh1 expression lack UVRh1 gDNA in males. Thus, acquisition of sex linkage is sufficient to achieve female-specific expression of UVRh1, though this does not preclude other mechanisms, like cis-regulatory evolution, from also contributing. Moreover, both this event and mutations leading to differential UV opsin sensitivity occurred early in the history of Heliconius. These results suggest a path for acquiring sexual dimorphism distinct from existing mechanistic models. We propose a model where gene traffic to heterosomes (the W or the Y) genetically partitions a trait by sex before a phenotype shifts (spectral tuning of UV sensitivity).


Subject(s)
Butterflies, Color Vision, Animals, Female, Color Vision/genetics, Butterflies/genetics, Butterflies/metabolism, Eye/metabolism, Opsins/genetics, Opsins/metabolism, Rhodopsin/metabolism
5.
Genome Res ; 32(11-12): 2092-2106, 2022.
Article in English | MEDLINE | ID: mdl-36351772

ABSTRACT

High-throughput short-read sequencing has taken on a central role in research and diagnostics. Hundreds of different assays take advantage of Illumina short-read sequencers, the predominant short-read sequencing technology available today. Although other short-read sequencing technologies exist, the ubiquity of Illumina sequencers in sequencing core facilities and the high capital costs of these technologies have limited their adoption. Among a new generation of sequencing technologies, Oxford Nanopore Technologies (ONT) holds a unique position because the ONT MinION, an error-prone long-read sequencer, is associated with little to no capital cost. Here we show that we can make short-read Illumina libraries compatible with the ONT MinION by using the rolling circle to concatemeric consensus (R2C2) method to circularize and amplify the short library molecules. This results in longer DNA molecules containing tandem repeats of the original short library molecules. This longer DNA is ideally suited for the ONT MinION, and after sequencing, the tandem repeats in the resulting raw reads can be converted into high-accuracy consensus reads with similar error rates to that of the Illumina MiSeq. We highlight this capability by producing and benchmarking RNA-seq, ChIP-seq, and regular and target-enriched Tn5 libraries. We also explore the use of this approach for rapid evaluation of sequencing library metrics by implementing a real-time analysis workflow.
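
The step of turning tandem repeats into a consensus read can be illustrated with a minimal Python sketch. It is not the pipeline used in the paper: it assumes, purely for illustration, that every repeat copy has exactly the same length (no insertions or deletions), so copies can be cut at fixed offsets and compared column by column. Real nanopore reads contain indels, so an actual implementation would need to align the copies first. The function names `split_repeats` and `majority_consensus` are hypothetical.

```python
from collections import Counter


def split_repeats(raw_read: str, unit_len: int) -> list[str]:
    """Cut a concatemeric read into consecutive copies of fixed length.

    Assumption: all copies have the same length, which real reads do not guarantee.
    """
    return [raw_read[i:i + unit_len]
            for i in range(0, len(raw_read) - unit_len + 1, unit_len)]


def majority_consensus(copies: list[str]) -> str:
    """Take a per-position majority vote across equal-length copies."""
    consensus = []
    for column in zip(*copies):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Three noisy copies of the same 8-base molecule, each with one substitution error.
raw_read = "ACCTTGCA" + "ACGTTGGA" + "TCGTTGCA"
copies = split_repeats(raw_read, unit_len=8)
print(majority_consensus(copies))  # prints ACGTTGCA
```

Because each error falls at a different position, two of the three copies agree at every column and the vote recovers the original molecule. More copies per read would tolerate a higher per-copy error rate.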


Subject(s)
Nanopores, DNA Sequence Analysis/methods, Gene Library, High-Throughput Nucleotide Sequencing/methods, Chromatin Immunoprecipitation Sequencing
6.
Proc Natl Acad Sci U S A ; 119(14): e2115608119, 2022 04 05.
Article in English | MEDLINE | ID: mdl-35349333

ABSTRACT

Significance: In marine ecosystems, transmission of microbial symbionts between host generations occurs predominantly through the environment. Yet, it remains largely unknown how host genetics, symbiont competition, environmental conditions, and geography shape the composition of symbionts acquired by individual hosts. To address this question, we applied population genomic approaches to four species of deep-sea hydrothermal vent snails that live in association with chemosynthetic bacteria. Our analyses show that environment is more important to strain-level symbiont composition than host genetics and that symbiont strains show genetic variation indicative of adaptation to the distinct geochemical conditions at each vent site. This corroborates a long-standing hypothesis that hydrothermal vent invertebrates affiliate with locally adapted symbiont strains to cope with the variable conditions characterizing their habitats.


Subject(s)
Hydrothermal Vents, Bacteria/genetics, Ecosystem, Hydrothermal Vents/microbiology, Metagenomics, Symbiosis/genetics
7.
Proc Natl Acad Sci U S A ; 119(48): e2209766119, 2022 11 29.
Article in English | MEDLINE | ID: mdl-36417430

ABSTRACT

There is massive variation in intron numbers across eukaryotic genomes, yet the major drivers of intron content during evolution remain elusive. Rapid intron loss and gain in some lineages contrast with long-term evolutionary stasis in others. Episodic intron gain could be explained by recently discovered specialized transposons called Introners, but so far Introners are only known from a handful of species. Here, we performed a systematic search across 3,325 eukaryotic genomes and identified 27,563 Introner-derived introns in 175 genomes (5.2%). Species with Introners span remarkable phylogenetic diversity, from animals to basal protists, representing lineages whose last common ancestor dates to over 1.7 billion years ago. Aquatic organisms were 6.5 times more likely to contain Introners than terrestrial organisms. Introners exhibit mechanistic diversity but most are consistent with DNA transposition, indicating that Introners have evolved convergently hundreds of times from nonautonomous transposable elements. Transposable elements and aquatic taxa are associated with high rates of horizontal gene transfer, suggesting that this combination of factors may explain the punctuated and biased diversity of species containing Introners. More generally, our data suggest that Introners may explain the episodic nature of intron gain across the eukaryotic tree of life. These results illuminate the major source of ongoing intron creation in eukaryotic genomes.


Subject(s)
DNA Transposable Elements, Eukaryota, Animals, Introns/genetics, Eukaryota/genetics, DNA Transposable Elements/genetics, Phylogeny, Eukaryotic Cells
8.
Proc Natl Acad Sci U S A ; 119(19): e2119382119, 2022 05 10.
Article in English | MEDLINE | ID: mdl-35512091

ABSTRACT

Sex chromosomes play a special role in the evolution of reproductive barriers between species. Here we describe conflicting roles of nascent sex chromosomes on patterns of introgression in an experimental hybrid swarm. Drosophila nasuta and Drosophila albomicans are recently diverged, fully fertile sister species that have different sex chromosome systems. The fusion of an autosome (Muller CD) with the ancestral X and Y gave rise to neo-sex chromosomes in D. albomicans, while Muller CD remains unfused in D. nasuta. We found that a large block containing overlapping inversions on the neo-sex chromosome stood out as the strongest barrier to introgression. Intriguingly, the neo-sex chromosome introgression barrier is asymmetrical and sex-dependent. Female hybrids showed significant D. albomicans-biased introgression on Muller CD (neo-X excess), while males showed heterosis with excessive (neo-X, D. nasuta Muller CD) genotypes. We used a population genetic model to dissect the interplay of sex chromosome drive, heterospecific pairing incompatibility between the neo-sex chromosomes and unfused Muller CD, neo-Y disadvantage, and neo-X advantage in generating the observed sex chromosome genotypes in females and males. We show that moderate neo-Y disadvantage and D. albomicans-specific meiotic drive are required to observe female-specific D. albomicans-biased introgression in this system, together with pairing incompatibility and neo-X advantage. In conclusion, this hybrid swarm between a young species pair sheds light onto the multifaceted roles of neo-sex chromosomes in a sex-dependent asymmetrical introgression barrier at a species boundary.


Subject(s)
Sex Chromosomes, Y Chromosome, Animals, Drosophila/genetics, Molecular Evolution, Sex Chromosomes/genetics
9.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36453872

ABSTRACT

SUMMARY: Treenome Browser is a web browser tool to interactively visualize millions of genomes alongside huge phylogenetic trees. AVAILABILITY AND IMPLEMENTATION: Treenome Browser for SARS-CoV-2 can be accessed at cov2tree.org, or at taxonium.org for user-provided trees. Source code and documentation are available at github.com/theosanderson/taxonium and docs.taxonium.org/en/latest/treenome.html. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
COVID-19, Genomics, Humans, Phylogeny, SARS-CoV-2/genetics, Genome, Software
10.
Mol Ecol ; : e17362, 2024 Apr 29.
Article in English | MEDLINE | ID: mdl-38682494

ABSTRACT

The black abalone, Haliotis cracherodii, is a large, long-lived marine mollusc that inhabits rocky intertidal habitats along the coast of California and Mexico. In 1985, populations were impacted by a bacterial disease known as withering syndrome (WS) that wiped out >90% of individuals, leading to the closure of all U.S. black abalone fisheries since 1993. Current conservation strategies include restoring diminished populations by translocating healthy individuals. However, population collapse on this scale may have dramatically lowered genetic diversity and strengthened geographic differentiation, making translocation-based recovery contentious. Additionally, the current prevalence of WS remains unknown. To address these uncertainties, we sequenced and analysed the genomes of 133 black abalone individuals from across their present range. We observed no spatial genetic structure among black abalone, with the exception of a single chromosomal inversion that increases in frequency with latitude. Outside the inversion, genetic differentiation between sites is minimal and does not scale with either geographic distance or environmental dissimilarity. Genetic diversity appears uniformly high across the range. Demographic inference does indicate a severe population bottleneck beginning just 15 generations in the past, but this decline is short lived, with present-day size far exceeding the pre-bottleneck status quo. Finally, we find the bacterial agent of WS is equally present across the sampled range, but only in 10% of individuals. The lack of population genetic structure, uniform diversity and prevalence of WS bacteria indicates that translocation could be a valid and low-risk means of population restoration for black abalone species' recovery.

11.
Syst Biol ; 72(5): 1039-1051, 2023 11 01.
Article in English | MEDLINE | ID: mdl-37232476

ABSTRACT

Phylogenetics has been foundational to SARS-CoV-2 research and public health policy, assisting in genomic surveillance, contact tracing, and assessing emergence and spread of new variants. However, phylogenetic analyses of SARS-CoV-2 have often relied on tools designed for de novo phylogenetic inference, in which all data are collected before any analysis is performed and the phylogeny is inferred once from scratch. SARS-CoV-2 data sets do not fit this mold. There are currently over 14 million sequenced SARS-CoV-2 genomes in online databases, with tens of thousands of new genomes added every day. Continuous data collection, combined with the public health relevance of SARS-CoV-2, invites an "online" approach to phylogenetics, in which new samples are added to existing phylogenetic trees every day. The extremely dense sampling of SARS-CoV-2 genomes also invites a comparison between likelihood and parsimony approaches to phylogenetic inference. Maximum likelihood (ML) and pseudo-ML methods may be more accurate when there are multiple changes at a single site on a single branch, but this accuracy comes at a large computational cost, and the dense sampling of SARS-CoV-2 genomes means that these instances will be extremely rare because each internal branch is expected to be extremely short. Therefore, it may be that approaches based on maximum parsimony (MP) are sufficiently accurate for reconstructing phylogenies of SARS-CoV-2, and their simplicity means that they can be applied to much larger data sets. Here, we evaluate the performance of de novo and online phylogenetic approaches, as well as ML, pseudo-ML, and MP frameworks for inferring large and dense SARS-CoV-2 phylogenies. Overall, we find that online phylogenetics produces similar phylogenetic trees to de novo analyses for SARS-CoV-2, and that MP optimization with UShER and matOptimize produces equivalent SARS-CoV-2 phylogenies to some of the most popular ML and pseudo-ML inference tools. 
MP optimization with UShER and matOptimize is thousands of times faster than presently available implementations of ML, and online phylogenetics is faster than de novo inference. Our results therefore suggest that parsimony-based methods like UShER and matOptimize represent an accurate and more practical alternative to established ML implementations for large SARS-CoV-2 phylogenies and could be successfully applied to other similar data sets with particularly dense sampling and short branch lengths.
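
To make the parsimony criterion concrete, the following Python sketch implements the classic Fitch algorithm, which counts the minimum number of changes a fixed tree needs to explain the bases observed at one site. This is a textbook illustration only, not the UShER or matOptimize implementation. The nested-tuple tree encoding and the name `fitch_score` are assumptions made for the example, and the recursive traversal would not scale to trees with millions of tips.

```python
def fitch_score(tree, leaf_states):
    """Return the minimum number of state changes at one site (Fitch algorithm).

    tree: a strictly binary tree as nested tuples of leaf names,
          e.g. (("A", "B"), ("C", "D")).
    leaf_states: dict mapping each leaf name to its observed base.
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            # Leaf: the only possible state is the observed one.
            return {leaf_states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:
            # The children can agree, so no change is needed at this node.
            return common
        # The children disagree: at least one change is required here.
        changes += 1
        return left | right

    visit(tree)
    return changes


tree = (("A", "B"), ("C", "D"))
print(fitch_score(tree, {"A": "G", "B": "G", "C": "T", "D": "G"}))  # prints 1
print(fitch_score(tree, {"A": "G", "B": "T", "C": "G", "D": "T"}))  # prints 2
```

The score for a whole alignment is the sum of this count over all sites. With very dense sampling most branches carry zero or one change, which is the situation in which the abstract argues that parsimony and likelihood tend to agree.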


Subject(s)
COVID-19, SARS-CoV-2, Humans, Phylogeny, Probability, Genomics
12.
Genome Res ; 30(1): 85-94, 2020 01.
Article in English | MEDLINE | ID: mdl-31857444

ABSTRACT

Transfer RNA (tRNA) genes are among the most highly transcribed genes in the genome owing to their central role in protein synthesis. However, there is evidence for a broad range of gene expression across tRNA loci. This complexity, combined with difficulty in measuring transcript abundance and high sequence identity across transcripts, has severely limited our collective understanding of tRNA gene expression regulation and evolution. We establish sequence-based correlates to tRNA gene expression and develop a tRNA gene classification method that does not require, but benefits from, comparative genomic information and achieves accuracy comparable to molecular assays. We observe that guanine + cytosine (G + C) content and CpG density surrounding tRNA loci is exceptionally well correlated with tRNA gene activity, supporting a prominent regulatory role of the local genomic context in combination with internal sequence features. We use our tRNA gene activity predictions in conjunction with a comprehensive tRNA gene ortholog set spanning 29 placental mammals to estimate the evolutionary rate of functional changes among orthologs. Our method adds a new dimension to large-scale tRNA functional prediction and will help prioritize characterization of functional tRNA variants. Its simplicity and robustness should enable development of similar approaches for other clades, as well as exploration of functional diversification of members of large gene families.
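
The two sequence features named in the abstract, G + C content and CpG density, are simple to compute. The Python sketch below shows one plausible definition of each. The abstract does not specify the window size around a tRNA locus or the exact normalization, so the per-base definitions and the function names `gc_content` and `cpg_density` are assumptions for illustration, not the authors' feature extraction.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in the sequence that are G or C."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_density(seq: str) -> float:
    """Number of CpG dinucleotides (C followed by G) per base of sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    return seq.count("CG") / len(seq)


# Hypothetical flanking sequence around a locus.
flank = "ACGCGTTA"
print(gc_content(flank))   # prints 0.5
print(cpg_density(flank))  # prints 0.25
```

In practice these values would be computed over a fixed-size window on each side of every tRNA gene and then used as inputs to a classifier.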


Subject(s)
Genome, Genomics, Transfer RNA, Animals, Computational Biology/methods, CpG Islands, DNA Methylation, Genetic Epigenesis, Epigenomics/methods, Genomics/methods, Mammals, Mice, Phylogeny, Transfer RNA/genetics
13.
Bioinformatics ; 38(15): 3734-3740, 2022 08 02.
Article in English | MEDLINE | ID: mdl-35731204

ABSTRACT

MOTIVATION: Phylogenetic tree optimization is necessary for precise analysis of evolutionary and transmission dynamics, but existing tools are inadequate for handling the scale and pace of data produced during the coronavirus disease 2019 (COVID-19) pandemic. One transformative approach, online phylogenetics, aims to incrementally add samples to an ever-growing phylogeny, but there are no previously existing approaches that can efficiently optimize this vast phylogeny under the time constraints of the pandemic. RESULTS: Here, we present matOptimize, a fast and memory-efficient phylogenetic tree optimization tool based on parsimony that can be parallelized across multiple CPU threads and nodes, and provides orders of magnitude improvement in runtime and peak memory usage compared to existing state-of-the-art methods. We have developed this method particularly to address the pressing need during the COVID-19 pandemic for daily maintenance and optimization of a comprehensive SARS-CoV-2 phylogeny. matOptimize is currently helping refine on a daily basis possibly the largest-ever phylogenetic tree, containing millions of SARS-CoV-2 sequences. AVAILABILITY AND IMPLEMENTATION: The matOptimize code is freely available as part of the UShER package (https://github.com/yatisht/usher) and can also be installed via bioconda (https://bioconda.github.io/recipes/usher/README.html). All scripts we used to perform the experiments in this manuscript are available at https://github.com/yceh/matOptimize-experiments. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
COVID-19, SARS-CoV-2, Humans, Phylogeny, SARS-CoV-2/genetics, Pandemics, Software
14.
PLoS Comput Biol ; 18(4): e1010056, 2022 04.
Article in English | MEDLINE | ID: mdl-35486906

ABSTRACT

Sequence simulators are fundamental tools in bioinformatics, as they allow us to test data processing and inference tools, and are an essential component of some inference methods. The ongoing surge in available sequence data is, however, testing the limits of our bioinformatics software. One example is the large number of SARS-CoV-2 genomes available, which are beyond the processing power of many methods, and simulating such large datasets is also proving difficult. Here, we present a new algorithm and software for efficiently simulating sequence evolution along extremely large trees (e.g. >100,000 tips) when the branches of the tree are short, as is typical in genomic epidemiology. Our algorithm is based on the Gillespie approach, and it implements an efficient multi-layered search tree structure that provides high computational efficiency by taking advantage of the fact that only a small proportion of the genome is likely to mutate at each branch of the considered phylogeny. Our open source software allows easy integration with other Python packages as well as a variety of evolutionary models, including indel models and new hypermutability models that we developed to more realistically represent SARS-CoV-2 genome evolution.
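
The Gillespie approach mentioned in the abstract can be sketched for a single branch as follows. This Python example is a simplified illustration, not the software described in the paper: it assumes every site mutates at the same rate and that all base changes are equally likely (a Jukes-Cantor-style model), so the mutating site can be drawn uniformly. The multi-layered search tree in the abstract exists to pick sites efficiently when rates differ, and it is not reproduced here. The name `evolve_branch` and the parameter `rate_per_site` are hypothetical.

```python
import random

BASES = "ACGT"


def evolve_branch(seq, branch_length, rng, rate_per_site=1.0):
    """Simulate substitutions along one branch with the Gillespie approach.

    Waiting times between events are exponential with the total rate of the
    whole sequence, so work is proportional to the number of mutations rather
    than to the sequence length.
    Returns the child sequence and a list of (position, old_base, new_base).
    """
    seq = list(seq)
    total_rate = rate_per_site * len(seq)
    events = []
    t = rng.expovariate(total_rate)  # time of the first event
    while t < branch_length:
        pos = rng.randrange(len(seq))  # uniform, because all sites share one rate
        old = seq[pos]
        new = rng.choice([b for b in BASES if b != old])
        seq[pos] = new
        events.append((pos, old, new))
        t += rng.expovariate(total_rate)  # time of the next event
    return "".join(seq), events


rng = random.Random(42)
parent = "ACGT" * 250  # 1,000 sites
child, events = evolve_branch(parent, branch_length=0.01, rng=rng)
print(len(events), "substitutions on this branch")
```

With 1,000 sites and a branch length of 0.01 expected substitutions per site, about ten events are expected on average, so only a handful of positions are touched. That sparsity on short branches is the property the abstract says the algorithm exploits.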


Subject(s)
COVID-19, Pandemics, Algorithms, COVID-19/epidemiology, Computer Simulation, Molecular Evolution, Humans, Phylogeny, SARS-CoV-2/genetics, Software
15.
PLoS Comput Biol ; 18(8): e1010409, 2022 08.
Article in English | MEDLINE | ID: mdl-36001646

ABSTRACT

Accurate simulation of complex biological processes is an essential component of developing and validating new technologies and inference approaches. As an effort to help contain the COVID-19 pandemic, large numbers of SARS-CoV-2 genomes have been sequenced from most regions in the world. More than 5.5 million viral sequences are publicly available as of November 2021. Many studies estimate viral genealogies from these sequences, as these can provide valuable information about the spread of the pandemic across time and space. Additionally, such data are a rich source of information about molecular evolutionary processes including natural selection, for example allowing the identification of new variants with transmissibility and immunity evasion advantages. To our knowledge, there is no framework that is both efficient and flexible enough to simulate the pandemic to approximate world-scale scenarios and generate viral genealogies of millions of samples. Here, we introduce a new fast simulator, VGsim, which addresses the problem of simulating genealogies under epidemiological models. The simulation process is split into two phases. During the forward run, the algorithm generates a chain of population-level events reflecting the dynamics of the pandemic using a hierarchical version of the Gillespie algorithm. During the backward run, a coalescent-like approach generates a tree genealogy of samples conditioning on the chain of population-level events generated during the forward run. Our software can model complex population structure, epistasis and immunity escape.


Subject(s)
COVID-19, Pandemics, COVID-19/epidemiology, Computer Simulation, Humans, SARS-CoV-2/genetics, Software
16.
PLoS Genet ; 16(8): e1008935, 2020 08.
Article in English | MEDLINE | ID: mdl-32841233

ABSTRACT

Bacterial symbionts bring a wealth of functions to the associations they participate in, but by doing so, they endanger the genes and genomes underlying these abilities. When bacterial symbionts become obligately associated with their hosts, their genomes are thought to decay towards an organelle-like fate due to decreased homologous recombination and inefficient selection. However, numerous associations exist that counter these expectations, especially in marine environments, possibly due to ongoing horizontal gene flow. Despite extensive theoretical treatment, no empirical study thus far has connected these underlying population genetic processes with long-term evolutionary outcomes. By sampling marine chemosynthetic bacterial-bivalve endosymbioses that range from primarily vertical to strictly horizontal transmission, we tested this canonical theory. We found that transmission mode strongly predicts homologous recombination rates, and that exceedingly low recombination rates are associated with moderate genome degradation in the marine symbionts with nearly strict vertical transmission. Nonetheless, even the most degraded marine endosymbiont genomes are occasionally horizontally transmitted and are much larger than their terrestrial insect symbiont counterparts. Therefore, horizontal transmission and recombination enable efficient natural selection to maintain intermediate symbiont genome sizes and substantial functional genetic variation.


Subject(s)
Bacteria/pathogenicity, Bivalvia/microbiology, Horizontal Gene Transfer, Bacterial Genome, Genetic Recombination, Symbiosis/genetics, Animals, Bacteria/genetics, Bivalvia/genetics, Molecular Evolution, Genetic Variation
17.
PLoS Genet ; 16(11): e1009175, 2020 11.
Article in English | MEDLINE | ID: mdl-33206635

ABSTRACT

The SARS-CoV-2 pandemic has led to unprecedented, nearly real-time genetic tracing due to the rapid community sequencing response. Researchers immediately leveraged these data to infer the evolutionary relationships among viral samples and to study key biological questions, including whether host viral genome editing and recombination are features of SARS-CoV-2 evolution. This global sequencing effort is inherently decentralized and must rely on data collected by many labs using a wide variety of molecular and bioinformatic techniques. There is thus a strong possibility that systematic errors associated with lab- or protocol-specific practices affect some sequences in the repositories. We find that some recurrent mutations in reported SARS-CoV-2 genome sequences have been observed predominantly or exclusively by single labs, co-localize with commonly used primer binding sites and are more likely to affect the protein-coding sequences than other similarly recurrent mutations. We show that their inclusion can affect phylogenetic inference on scales relevant to local lineage tracing, and make it appear as though there has been an excess of recurrent mutation or recombination among viral lineages. We suggest how samples can be screened and problematic variants removed, and we plan to regularly inform the scientific community with our updated results as more SARS-CoV-2 genome sequences are shared (https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473 and https://virological.org/t/masking-strategies-for-sars-cov-2-alignments/480). We also develop tools for comparing and visualizing differences among very large phylogenies and we show that consistent clade- and tree-based comparisons can be made between phylogenies produced by different groups. These will facilitate evolutionary inferences and comparisons among phylogenies produced for a wide array of purposes. 
Building on the SARS-CoV-2 Genome Browser at UCSC, we present a toolkit to compare, analyze and combine SARS-CoV-2 phylogenies, find and remove potential sequencing errors and establish a widely shared, stable clade structure for a more accurate scientific inference and discourse.


Subject(s)
Viral Genome/genetics, Phylogeny, SARS-CoV-2/genetics, Algorithms, COVID-19, Computational Biology, Molecular Evolution, Humans, Viral RNA/genetics, Sequence Alignment, Whole Genome Sequencing
18.
Mol Biol Evol ; 38(5): 2152-2165, 2021 05 04.
Article in English | MEDLINE | ID: mdl-33502512

ABSTRACT

Adaptive introgression, the flow of adaptive genetic variation between species or populations, has attracted significant interest in recent years and it has been implicated in a number of cases of adaptation, from pesticide resistance and immunity, to local adaptation. Despite this, methods for identification of adaptive introgression from population genomic data are lacking. Here, we present Ancestry_HMM-S, a hidden Markov model-based method for identifying genes undergoing adaptive introgression and quantifying the strength of selection acting on them. Through extensive validation, we show that this method performs well on moderately sized data sets for realistic population and selection parameters. We apply Ancestry_HMM-S to a data set of an admixed Drosophila melanogaster population from South Africa and we identify 17 loci which show signatures of adaptive introgression, four of which have previously been shown to confer resistance to insecticides. Ancestry_HMM-S provides a powerful method for inferring adaptive introgression in data sets that are typically collected when studying admixed populations. This method will enable powerful insights into the genetic consequences of admixture across diverse populations. Ancestry_HMM-S can be downloaded from https://github.com/jesvedberg/Ancestry_HMM-S/.


Subject(s)
Biological Adaptation/genetics, Genetic Introgression, Genetic Models, Genetic Selection, Software, Algorithms, Animals, Drosophila melanogaster/genetics, Markov Chains
19.
Mol Biol Evol ; 38(12): 5819-5824, 2021 12 09.
Article in English | MEDLINE | ID: mdl-34469548

ABSTRACT

The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations, as well as Nextstrain clade and Pango lineage labels at clade roots. As of June 9, 2021, our SARS-CoV-2 MAT consists of 834,521 sequences and provides a comprehensive view of the virus' evolutionary history using public data. We also present matUtils, a command-line utility for rapidly querying, interpreting, and manipulating the MATs. Our daily-updated SARS-CoV-2 MAT database and matUtils software are available at http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/ and https://github.com/yatisht/usher, respectively.
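
The idea behind a mutation-annotated tree, storing mutations on branches instead of a full genome per sample, can be sketched in a few lines of Python. This is a conceptual illustration under stated assumptions, not the on-disk format of the database or the matUtils API: the `Node` class, the 0-based `(position, new_base)` mutation encoding, and the function `genomes_by_sample` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Mutations on the branch leading to this node, as (0-based position, new base).
    mutations: list = field(default_factory=list)
    children: list = field(default_factory=list)


def genomes_by_sample(root, reference):
    """Rebuild each sample's sequence by applying mutations from root to tip."""
    result = {}
    stack = [(root, list(reference))]
    while stack:
        node, parent_seq = stack.pop()
        seq = parent_seq.copy()  # never modify the parent's sequence in place
        for pos, base in node.mutations:
            seq[pos] = base
        if not node.children:
            result[node.name] = "".join(seq)
        for child in node.children:
            stack.append((child, seq))
    return result


# A toy tree with three samples and a 4-base reference.
s1 = Node("s1")
s2 = Node("s2", mutations=[(3, "A")])
inner = Node("inner", mutations=[(1, "T")], children=[s1, s2])
s3 = Node("s3", mutations=[(0, "G")])
root = Node("root", children=[inner, s3])

for name, seq in sorted(genomes_by_sample(root, "ACGT").items()):
    print(name, seq)
# prints:
# s1 ATGT
# s2 ATGA
# s3 GCGT
```

The mutation shared by `s1` and `s2` is stored once on their common branch, which is why this kind of encoding stays compact when hundreds of thousands of closely related genomes are added.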


Subject(s)
Molecular Evolution, Phylogeny, SARS-CoV-2, COVID-19/virology, Humans, Mutation, SARS-CoV-2/genetics, Software