1.
Proc Natl Acad Sci U S A ; 119(48): e2209766119, 2022 11 29.
Article in English | MEDLINE | ID: mdl-36417430

ABSTRACT

There is massive variation in intron numbers across eukaryotic genomes, yet the major drivers of intron content during evolution remain elusive. Rapid intron loss and gain in some lineages contrast with long-term evolutionary stasis in others. Episodic intron gain could be explained by recently discovered specialized transposons called Introners, but so far Introners are only known from a handful of species. Here, we performed a systematic search across 3,325 eukaryotic genomes and identified 27,563 Introner-derived introns in 175 genomes (5.2%). Species with Introners span remarkable phylogenetic diversity, from animals to basal protists, representing lineages whose last common ancestor dates to over 1.7 billion years ago. Aquatic organisms were 6.5 times more likely to contain Introners than terrestrial organisms. Introners exhibit mechanistic diversity but most are consistent with DNA transposition, indicating that Introners have evolved convergently hundreds of times from nonautonomous transposable elements. Transposable elements and aquatic taxa are associated with high rates of horizontal gene transfer, suggesting that this combination of factors may explain the punctuated and biased diversity of species containing Introners. More generally, our data suggest that Introners may explain the episodic nature of intron gain across the eukaryotic tree of life. These results illuminate the major source of ongoing intron creation in eukaryotic genomes.


Subject(s)
DNA Transposable Elements , Eukaryota , Animals , Introns/genetics , Eukaryota/genetics , DNA Transposable Elements/genetics , Phylogeny , Eukaryotic Cells
2.
Mol Biol Evol ; 40(4)2023 04 04.
Article in English | MEDLINE | ID: mdl-36947073

ABSTRACT

The genomic landscape of transposable elements (TEs) varies dramatically across species, with some TEs demonstrating greater success in colonizing particular lineages than others. In mammals, long interspersed nuclear element (LINE) retrotransposons are typically more common than any other TE. Here, we report an unusual genomic landscape of TEs in the deer mouse, Peromyscus maniculatus. In contrast to other previously examined mammals, long terminal repeat elements occupy more of the deer mouse genome than LINEs (11% and 10%, respectively). This pattern reflects a combination of relatively low LINE activity and a massive invasion of lineage-specific endogenous retroviruses (ERVs). Deer mouse ERVs exhibit diverse origins spanning the retroviral phylogeny suggesting they have been host to a wide range of exogenous retroviruses. Notably, we trace the origin of one ERV lineage, which arose ∼5-18 million years ago, to a close relative of feline leukemia virus, revealing inter-ordinal horizontal transmission. Several lineage-specific ERV subfamilies have very high copy numbers, with the top five most abundant accounting for ∼2% of the genome. We also observe a massive amplification of Kruppel-associated box domain-containing zinc finger genes, which likely control ERV activity and whose expansion may have been facilitated by ectopic recombination between ERVs. Finally, we find evidence that ERVs directly impacted the evolutionary trajectory of LINEs by outcompeting them for genomic sites and frequently disrupting autonomous LINE copies. Together, our results illuminate the genomic ecology that shaped the unique deer mouse TE landscape, shedding light on the evolutionary processes that give rise to variation in mammalian genome structure.


Subject(s)
Endogenous Retroviruses , Peromyscus , Animals , Cats , Peromyscus/genetics , DNA Transposable Elements , Genomics , Retroelements/genetics , Endogenous Retroviruses/genetics , Mammals/genetics , Molecular Evolution , Phylogeny
3.
PLoS Genet ; 16(11): e1009175, 2020 11.
Article in English | MEDLINE | ID: mdl-33206635

ABSTRACT

The SARS-CoV-2 pandemic has led to unprecedented, nearly real-time genetic tracing due to the rapid community sequencing response. Researchers immediately leveraged these data to infer the evolutionary relationships among viral samples and to study key biological questions, including whether host viral genome editing and recombination are features of SARS-CoV-2 evolution. This global sequencing effort is inherently decentralized and must rely on data collected by many labs using a wide variety of molecular and bioinformatic techniques. There is thus a strong possibility that systematic errors associated with lab- or protocol-specific practices affect some sequences in the repositories. We find that some recurrent mutations in reported SARS-CoV-2 genome sequences have been observed predominantly or exclusively by single labs, co-localize with commonly used primer binding sites, and are more likely to affect the protein-coding sequences than other similarly recurrent mutations. We show that their inclusion can affect phylogenetic inference on scales relevant to local lineage tracing, and make it appear as though there has been an excess of recurrent mutation or recombination among viral lineages. We suggest how samples can be screened and problematic variants removed, and we plan to regularly inform the scientific community with our updated results as more SARS-CoV-2 genome sequences are shared (https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473 and https://virological.org/t/masking-strategies-for-sars-cov-2-alignments/480). We also develop tools for comparing and visualizing differences among very large phylogenies and we show that consistent clade- and tree-based comparisons can be made between phylogenies produced by different groups. These will facilitate evolutionary inferences and comparisons among phylogenies produced for a wide array of purposes.
Building on the SARS-CoV-2 Genome Browser at UCSC, we present a toolkit to compare, analyze and combine SARS-CoV-2 phylogenies, find and remove potential sequencing errors and establish a widely shared, stable clade structure for a more accurate scientific inference and discourse.


Subject(s)
Viral Genome/genetics , Phylogeny , SARS-CoV-2/genetics , Algorithms , COVID-19 , Computational Biology , Molecular Evolution , Humans , Viral RNA/genetics , Sequence Alignment , Whole Genome Sequencing
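
The screening idea in the abstract of record 3 (recurrent mutations reported predominantly or exclusively by a single lab) can be illustrated with a minimal Python sketch. This is not the authors' method: the function name, the input layout of (mutation, lab) pairs, and both thresholds are assumptions chosen for illustration, and the sketch counts raw observations rather than recurrence on a phylogeny.

```python
from collections import Counter, defaultdict


def lab_dominated_mutations(observations, min_count=3, min_fraction=0.8):
    """Flag mutations whose observations come mostly from one lab.

    observations: iterable of (mutation, lab) pairs, one pair per
    sequence that carries the mutation (hypothetical input layout).
    min_count and min_fraction are illustrative thresholds only.
    """
    labs_by_mutation = defaultdict(Counter)
    for mutation, lab in observations:
        labs_by_mutation[mutation][lab] += 1

    flagged = {}
    for mutation, labs in labs_by_mutation.items():
        total = sum(labs.values())
        top_lab, top_count = labs.most_common(1)[0]
        # Require enough observations, then check single-lab dominance.
        if total >= min_count and top_count / total >= min_fraction:
            flagged[mutation] = (top_lab, top_count, total)
    return flagged
```

A flagged mutation is only a candidate for closer inspection, for example against primer binding sites as the abstract describes; the sketch does not decide whether a variant is an artifact.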
4.
Genome Biol ; 25(1): 4, 2024 Jan 02.
Article in English | MEDLINE | ID: mdl-38166955

ABSTRACT

Transposable elements (TEs) are important drivers of genome evolution. Nonetheless, TE annotation remains a complex and challenging task. As more genomes from phylogenetically diverse species are published, a comprehensive pipeline for accurate annotation of diverse TEs is increasingly important. Recently, Ou et al. (Genome Biol. 20:275, 2019) developed a new comprehensive pipeline, Extensive De novo Transposable element Annotator (EDTA), and benchmarked its performance on the genomes of three species: maize, wheat, and fruit fly. Because TE landscapes can vary tremendously across species, we tested EDTA's performance on four additional genomes with different TE landscapes: mouse, zebrafish, zebra finch, and chicken. Our analysis reveals that EDTA faces challenges with repeat classification in these genomes and underperforms overall relative to its benchmark dataset. Notably, EDTA consistently misclassifies non-LTR retrotransposons as DNA transposons, resulting in erroneous TE annotations for species with considerable repertoires of non-LTR retrotransposons. Overall, we set expectations for EDTA's performance on genomes spanning additional diversity, urge caution when using EDTA on genomes with divergent TE repertoires from the species on which it was initially benchmarked, and hope to motivate the development of methods that are robust to both the diversity of TEs and TE landscapes observed across species.


Subject(s)
Benchmarking , DNA Transposable Elements , Animals , Mice , Retroelements , Zebrafish , Edetic Acid , Drosophila
5.
bioRxiv ; 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38585780

ABSTRACT

The evolutionary mechanisms that drive the emergence of genome architecture remain poorly understood but can now be assessed with unprecedented power due to the massive accumulation of genome assemblies spanning phylogenetic diversity1,2. Transposable elements (TEs) are a rich source of large-effect mutations since they directly and indirectly drive genomic structural variation and changes in gene expression3. Here, we demonstrate universal patterns of TE compartmentalization across eukaryotic genomes spanning ~1.7 billion years of evolution, in which TEs colocalize with gene families under strong predicted selective pressure for dynamic evolution and involved in specific functions. For non-pathogenic species these genes represent families involved in defense, sensory perception and environmental interaction, whereas for pathogenic species, TE-compartmentalized genes are highly enriched for pathogenic functions. Many TE-compartmentalized gene families display signatures of positive selection at the molecular level. Furthermore, TE-compartmentalized genes exhibit an excess of high-frequency alleles for polymorphic TE insertions in fruit fly populations. We postulate that these patterns reflect selection for adaptive TE insertions as well as TE-associated structural variants. This process may drive the emergence of a shared TE-compartmentalized genome architecture across diverse eukaryotic lineages.

6.
Curr Biol ; 33(1): 189-196.e4, 2023 01 09.
Article in English | MEDLINE | ID: mdl-36543167

ABSTRACT

Spliceosomal introns, which interrupt nuclear genes, are ubiquitous features of eukaryotic nuclear genes.1 Spliceosomal intron evolution is complex, with different lineages ranging from virtually zero to thousands of newly created introns.2,3,4,5 This punctate phylogenetic distribution could be explained if intron creation is driven by specialized transposable elements ("Introners"), with Introner-containing lineages undergoing frequent intron gain.6,7,8,9,10 Fragmentation of nuclear genes by spliceosomal introns reaches its apex in dinoflagellates, which have some twenty introns per gene11,12; however, little is known about dinoflagellate intron evolution. We reconstructed intron evolution in five dinoflagellate genomes, revealing a dynamic history of intron gain. We find evidence for historical creation of introns in all five species and identify recently active Introners in 4/5 studied species. In one species, Polarella glacialis, we find an unprecedented diversity of Introners, with recent Introner insertion leading to creation of some 12,253 introns, and with 15 separate families of Introners accounting for at least 100 introns each. These Introner families show diverse mechanisms of mobilization and intron creation. Comparison within and between Introner families provides evidence that biases in the so-called intron phase, intron position relative to codon periodicity, could be driven by Introner insertion site requirements.9,13,14 Finally, we report additional transformations of the spliceosomal system in dinoflagellates, including widespread loss of ancestral introns, and novelties of tolerated and favored donor sequence motifs. These results reveal unappreciated diversity of intron-creating elements and spliceosomal evolutionary capacity and highlight the complex evolutionary dependencies shaping genome structures.


Subject(s)
DNA Transposable Elements , Dinoflagellida , Introns/genetics , Phylogeny , DNA Transposable Elements/genetics , Dinoflagellida/genetics , Molecular Evolution , Spliceosomes/genetics
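
The abstract of record 6 defines intron phase as intron position relative to codon periodicity. Under the usual convention, the phase is the number of coding bases upstream of the intron taken modulo 3 (phase 0 falls between codons, phases 1 and 2 fall inside a codon). A minimal Python sketch, assuming the input is a list of coding-exon lengths per gene in transcript order (a hypothetical layout, with function names chosen here for illustration):

```python
from collections import Counter


def intron_phase(coding_bases_before_intron: int) -> int:
    """Return 0, 1, or 2: where the intron falls relative to the codon frame."""
    if coding_bases_before_intron < 0:
        raise ValueError("number of upstream coding bases must be non-negative")
    return coding_bases_before_intron % 3


def phase_distribution(exon_lengths_per_gene):
    """Count introns by phase.

    exon_lengths_per_gene: list of lists of coding-exon lengths in
    transcript order (assumed to exclude untranslated regions).
    """
    counts = Counter()
    for exon_lengths in exon_lengths_per_gene:
        upstream = 0
        # An intron follows every coding exon except the last one.
        for length in exon_lengths[:-1]:
            upstream += length
            counts[intron_phase(upstream)] += 1
    return counts
```

Comparing such a distribution between Introner-derived introns and other introns is one simple way to look for the phase biases the abstract mentions; the sketch itself makes no claim about how the authors measured them.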
7.
BMC Res Notes ; 14(1): 189, 2021 May 17.
Article in English | MEDLINE | ID: mdl-34001211

ABSTRACT

OBJECTIVE: The SARS-CoV-2 pandemic has prompted one of the most extensive and expeditious genomic sequencing efforts in history. Each viral genome is accompanied by a set of metadata which supplies important information such as the geographic origin of the sample, age of the host, and the lab at which the sample was sequenced, and is integral to epidemiological efforts and public health direction. Here, we interrogate some shortcomings of metadata within the GISAID database to raise awareness of common errors and inconsistencies that may affect data-driven analyses and provide possible avenues for resolutions. RESULTS: Our analysis reveals a startling prevalence of spelling errors and inconsistent naming conventions, which together occur in an estimated ~9.8% and ~11.6% of "originating lab" and "submitting lab" GISAID metadata entries, respectively. We also find numerous ambiguous entries which provide very little information about the actual source of a sample and could easily associate with multiple sources worldwide. Importantly, all of these issues can impair the ability and accuracy of association studies by deceptively causing a group of samples to identify with multiple sources when they truly all identify with one source, or vice versa.


Subject(s)
COVID-19 , SARS-CoV-2 , Viral Genome/genetics , Genomics , Humans , Metadata , Phylogeny
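
The spelling errors and inconsistent naming conventions described in record 7 cause one lab to appear under several names. A minimal Python sketch of one possible cleanup step, using only the standard library: the function names and the similarity cutoff are assumptions for illustration, and this is not the procedure used in the paper.

```python
import difflib
import re


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    name = re.sub(r"[^a-z0-9 ]+", " ", name.lower())
    return " ".join(name.split())


def group_lab_names(names, cutoff=0.9):
    """Group raw lab names whose normalized forms are near-identical.

    Returns a dict mapping a canonical normalized name to the raw
    spellings assigned to it. The cutoff is an illustrative choice.
    """
    groups = {}
    for raw in names:
        key = normalize(raw)
        # Look for an existing canonical name that is similar enough.
        match = difflib.get_close_matches(key, list(groups), n=1, cutoff=cutoff)
        groups.setdefault(match[0] if match else key, []).append(raw)
    return groups
```

String similarity alone cannot resolve the ambiguous entries the abstract describes, since a name that fits many institutions worldwide carries too little information to assign; such entries would still need manual review.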
8.
Nat Genet ; 53(6): 809-816, 2021 06.
Article in English | MEDLINE | ID: mdl-33972780

ABSTRACT

As the SARS-CoV-2 virus spreads through human populations, the unprecedented accumulation of viral genome sequences is ushering in a new era of 'genomic contact tracing', that is, using viral genomes to trace local transmission dynamics. However, because the viral phylogeny is already so large, and will undoubtedly grow many fold, placing new sequences onto the tree has emerged as a barrier to real-time genomic contact tracing. Here, we resolve this challenge by building an efficient tree-based data structure encoding the inferred evolutionary history of the virus. We demonstrate that our approach greatly improves the speed of phylogenetic placement of new samples and data visualization, making it possible to complete the placements under the constraints of real-time contact tracing. Thus, our method addresses an important need for maintaining a fully updated reference phylogeny. We make these tools available to the research community through the University of California Santa Cruz SARS-CoV-2 Genome Browser to enable rapid cross-referencing of information in new virus sequences with an ever-expanding array of molecular and structural biology data. The methods described here will empower research and genomic contact tracing for SARS-CoV-2 specifically for laboratories worldwide.


Subject(s)
COVID-19/epidemiology , COVID-19/virology , Computational Biology/methods , Phylogeny , SARS-CoV-2/classification , SARS-CoV-2/genetics , Software , Algorithms , Computational Biology/standards , Genetic Databases , Viral Genome , Humans , Molecular Sequence Annotation , Mutation , Web Browser
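
Record 8 describes a tree-based data structure that encodes the inferred evolutionary history of the virus and speeds up placement of new samples. The abstract does not spell out the algorithm, so the Python sketch below is only a toy illustration of the general idea: each node stores the mutations on its branch, and a new sample is attached where its mutation set differs least from the mutations accumulated along the path from the root. The class and function names are hypothetical, and the sketch ignores back-mutations and ambiguous bases.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    mutations: frozenset = frozenset()  # mutations on the branch leading here
    children: list = field(default_factory=list)


def place_sample(root: Node, sample_mutations):
    """Return (node name, cost) for the best placement of a sample.

    Cost is the size of the symmetric difference between the sample's
    mutations and those inherited along the root-to-node path, a
    simplified stand-in for a parsimony score.
    """
    sample = frozenset(sample_mutations)
    best_node, best_cost = None, None
    stack = [(root, frozenset(root.mutations))]
    while stack:
        node, inherited = stack.pop()
        cost = len(inherited ^ sample)
        if best_cost is None or cost < best_cost:
            best_node, best_cost = node, cost
        for child in node.children:
            # A child inherits everything on the path plus its own branch.
            stack.append((child, inherited | child.mutations))
    return best_node.name, best_cost
```

Because mutations are stored once per branch rather than once per sample, a single traversal is enough to score every candidate attachment point, which is the kind of saving that makes placement on a very large tree practical.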
9.
Science ; 372(6542): 592-600, 2021 05 07.
Article in English | MEDLINE | ID: mdl-33958470

ABSTRACT

The mammalian sex chromosome system (XX female/XY male) is ancient and highly conserved. The sex chromosome karyotype of the creeping vole (Microtus oregoni) represents a long-standing anomaly, with an X chromosome that is unpaired in females (X0) and exclusively maternally transmitted. We produced a highly contiguous male genome assembly, together with short-read genomes and transcriptomes for both sexes. We show that M. oregoni has lost an independently segregating Y chromosome and that the male-specific sex chromosome is a second X chromosome that is largely homologous to the maternally transmitted X. Both maternally inherited and male-specific sex chromosomes carry fragments of the ancestral Y chromosome. Consequences of this recently transformed sex chromosome system include Y-like degeneration and gene amplification on the male-specific X, expression of ancestral Y-linked genes in females, and X inactivation of the male-specific chromosome in male somatic cells. The genome of M. oregoni elucidates the processes that shape the gene content and dosage of mammalian sex chromosomes and exemplifies a rare case of plasticity in an ancient sex chromosome system.


Subject(s)
Abnormal Karyotype , Arvicolinae/genetics , Sex Determination Processes/genetics , X Chromosome/genetics , Animals , Base Sequence , Female , Gene Amplification , sry Genes , Haplotypes , Male , Maternal Inheritance , X Chromosome Inactivation , Y Chromosome/genetics
10.
bioRxiv ; 2020 Sep 28.
Article in English | MEDLINE | ID: mdl-33024970

ABSTRACT

As the SARS-CoV-2 virus spreads through human populations, the unprecedented accumulation of viral genome sequences is ushering in a new era of "genomic contact tracing" - that is, using viral genome sequences to trace local transmission dynamics. However, because the viral phylogeny is already so large - and will undoubtedly grow many fold - placing new sequences onto the tree has emerged as a barrier to real-time genomic contact tracing. Here, we resolve this challenge by building an efficient, tree-based data structure encoding the inferred evolutionary history of the virus. We demonstrate that our approach improves the speed of phylogenetic placement of new samples and data visualization by orders of magnitude, making it possible to complete the placements under real-time constraints. Our method also provides the key ingredient for maintaining a fully updated reference phylogeny. We make these tools available to the research community through the UCSC SARS-CoV-2 Genome Browser to enable rapid cross-referencing of information in new virus sequences with an ever-expanding array of molecular and structural biology data. The methods described here will empower research and genomic contact tracing for laboratories worldwide. SOFTWARE AVAILABILITY: UShER is available to users through the UCSC Genome Browser at https://genome.ucsc.edu/cgi-bin/hgPhyloPlace. The source code and detailed instructions on how to compile and run UShER are available from https://github.com/yatisht/usher.
