ABSTRACT
Accurate species phylogenies are a prerequisite for all evolutionary research. Teleosts are the largest and most diversified group of extant vertebrates, but relationships among their three oldest extant lineages remain unresolved. On the basis of seven new high-quality genome assemblies in Elopomorpha (tarpons, eels), we revisited the topology of the deepest branches of the teleost phylogeny using two independent phylogenomic approaches, one based on gene sequences and one based on chromosomal rearrangements. These analyses converged on a single scenario that unambiguously places Elopomorpha and Osteoglossomorpha (arapaima, elephantnose fish) in a monophyletic group that is sister to all other teleosts, i.e., the Clupeocephala lineage (zebrafish, medaka). This finding resolves more than 50 years of controversy on the evolutionary relationships of these lineages and highlights the power of combining different levels of genome-wide information to solve complex phylogenies.
Subject(s)
Biological Evolution, Fishes, Animals, Eels/classification, Eels/genetics, Fishes/classification, Fishes/genetics, Genome, Phylogeny, Zebrafish/classification, Zebrafish/genetics
ABSTRACT
Ancestral sequence reconstruction is a fundamental aspect of molecular evolution studies and can trace small-scale sequence modifications through the evolution of genomes and species. In contrast, fine-grained reconstructions of ancestral genome organizations are still in their infancy, limiting our ability to draw comprehensive views of genome and karyotype evolution. Here we reconstruct the detailed gene contents and organizations of 624 ancestral vertebrate, plant, fungal, metazoan and protist genomes, 183 of which are near-complete chromosomal gene order reconstructions. As expected, reconstructed ancestral genomes are similar to their descendants in terms of gene content, and they agree precisely with reference cytogenetic and in silico reconstructions when these are available. By comparing successive ancestral genomes along the phylogenetic tree, we estimate the intra- and interchromosomal rearrangement history of all major vertebrate clades at high resolution. This freely available resource makes it possible to follow evolutionary processes at genomic scales in chronological order, across multiple clades and without relying on a single extant species as reference.
Subject(s)
Eukaryota, Genome, Animals, Eukaryota/genetics, Phylogeny, Chromosomes, Genomics
ABSTRACT
Teleost fishes are ancient tetraploids descended from an ancestral whole-genome duplication that may have contributed to the impressive diversification of this clade. Whole-genome duplications can occur via self-doubling (autopolyploidy) or via hybridization between different species (allopolyploidy). The mode of tetraploidization shapes the evolutionary processes by which duplicated genomes return to diploid meiotic pairing and by which duplicated genes subsequently diverge genetically (cytological and genetic rediploidization). How teleosts became tetraploid remains unresolved, leaving a fundamental gap in the interpretation of their functional evolution. Because of the whole-genome duplication, identifying orthologous and paralogous genomic regions across teleosts is challenging, which hinders genome-wide investigations into their polyploid history. Here, we combine a tailored gene phylogeny methodology with a state-of-the-art ancestral karyotype reconstruction to establish the first high-resolution comparative atlas of paleopolyploid regions across 74 teleost genomes. We then leverage this atlas to investigate how rediploidization occurred in teleosts at the genome-wide level. We find that some duplicated regions maintained tetraploidy for more than 60 million years, with three chromosome pairs diverging genetically only after the separation of major teleost families. This evidence suggests that the teleost ancestor was an autopolyploid. Further, we find evidence for biased gene retention along several duplicated chromosomes, contradicting the current paradigm that asymmetrical evolution is specific to allopolyploids. Altogether, our results offer novel insights into genome evolutionary dynamics following ancient polyploidizations in vertebrates.
ABSTRACT
Whole genome sequencing (WGS) is increasingly used to diagnose medical conditions of genetic origin. While both coding and non-coding DNA variants contribute to a wide range of diseases, most patients who receive a WGS-based diagnosis today harbour a protein-coding mutation. Functional interpretation and prioritization of non-coding variants remain a persistent challenge, and disease-causing non-coding variants remain largely unidentified. Depending on the disease, WGS fails to identify a candidate variant in 20-80% of patients, severely limiting the usefulness of sequencing for personalised medicine. Here we present FINSURF, a machine-learning approach to predict the functional impact of non-coding variants in regulatory regions. FINSURF outperforms state-of-the-art methods, owing in particular to an optimized selection of control variants during training. In addition to ranking candidate variants, FINSURF breaks down the score for each variant into contributions from individual annotations, facilitating the evaluation of their functional relevance. We applied FINSURF to a diverse set of 30 diseases with described causative non-coding mutations, and it correctly placed the disease-causative non-coding variant within the top ten hits in 22 cases. FINSURF is available as an online server as well as custom browser tracks, and provides a quick and efficient solution to prioritize candidate non-coding variants in realistic clinical settings.
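The evaluation reported for FINSURF (causative variant within the top ten hits in 22 of 30 diseases, about 73%) is a top-k recovery measure. A minimal sketch of that bookkeeping follows, assuming each case is represented as a dictionary of per-variant scores plus the ID of the known causative variant. The variant IDs and scores are invented for illustration, and this is not FINSURF's code or scoring model.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical input: a dict mapping variant IDs to predicted impact scores,
# paired with the ID of the known disease-causing variant for that case.
Case = Tuple[Dict[str, float], str]

def rank_variants(scores: Dict[str, float]) -> List[str]:
    """Sort variant IDs from highest to lowest score (ties broken by ID)."""
    return sorted(scores, key=lambda v: (-scores[v], v))

def causative_in_top_k(case: Case, k: int = 10) -> bool:
    """True if the known causative variant ranks among the k best-scored variants."""
    scores, causative = case
    return causative in rank_variants(scores)[:k]

def top_k_recovery(cases: Sequence[Case], k: int = 10) -> Tuple[int, int]:
    """Return (cases solved within the top k, total number of cases)."""
    hits = sum(causative_in_top_k(case, k) for case in cases)
    return hits, len(cases)

# Toy example with k=2: the causative variant ranks 2nd in the first case
# and 3rd (last) in the second case, so only the first case is solved.
toy_cases: List[Case] = [
    ({"v1": 0.9, "v2": 0.2, "v3": 0.5}, "v3"),
    ({"v1": 0.1, "v2": 0.8, "v3": 0.7}, "v1"),
]
print(top_k_recovery(toy_cases, k=2))  # (1, 2)
```

With real data, `k=10` and 30 cases would reproduce the kind of "22 of 30" summary given in the abstract.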
Subject(s)
Machine Learning, Software, Humans, Mutation, Whole Genome Sequencing
ABSTRACT
Genomicus is a database and web server dedicated to comparative genomics in eukaryotes. Its main functionality is to graphically represent the conservation of genomic blocks between multiple genomes, either locally around a specific gene of interest or genome-wide through karyotype comparisons. Since its first release in 2010, Genomicus has been synchronized with 60 Ensembl releases and has gained functions that expand the types of analyses users can perform. Today, five public instances of Genomicus support a total of 1,029 extant genomes and 621 ancestral reconstructions from all eukaryotic kingdoms available in the Ensembl and Ensembl Genomes databases, complemented by four additional instances specific to taxonomic groups of interest. New visualization and query tools are described in this manuscript. Genomicus is freely available at http://www.genomicus.bio.ens.psl.eu/genomicus.
Subject(s)
Genetic Databases, Eukaryota/genetics, Molecular Evolution, Genome/genetics, Eukaryota/classification, Genomics, Humans, Internet, Phylogeny, Software, Synteny/genetics
ABSTRACT
Haitian (HA) and African American (AA) men have the highest prostate cancer (PCa) and colorectal cancer (CRC) age-adjusted mortality rates compared with other racial/ethnic groups worldwide. One contributing factor to mortality differences is that a low percentage of age-eligible HA and AA men screen for PCa and CRC, even when healthcare access and insurance are available. Reasons for cancer screening disparities may include differences in knowledge, preferences and willingness among HA and AA men. However, limited information exists on whether HA and AA men are knowledgeable about, and willing to be screened for, PCa and CRC. Moreover, understanding the preferences and willingness of HA and AA men to use cancer screening tests completed at home is of paramount importance given the current pandemic. We used a cross-sectional study design to assess HA and AA men's knowledge, preferences and willingness to use at-home PCa and CRC screening tests. Survey items were developed from existing surveys assessing CRC knowledge and willingness to screen. Institutional Review Board approval was obtained to invite persons who identified as male, at least 18 years of age and Black (as either AA and/or HA) to complete the survey. A total of 36 Black men completed the survey; 42% self-identified as both 'African American' and 'Haitian' (AA/HA), 44% identified only as AA, and 14% identified only as HA. Regardless of race or ethnicity, 75% of all participants were 45 years or younger (range: 18-85). Although more than 80% of all participants had heard of PCa and CRC, only 50% of participants aged 50 years or older had been screened for CRC. The majority of participants (AA/HA = 67%; HA = 80%; AA = 56%) were unaware of at-home CRC screening tests; however, 80% of AA/HA men and 60% of HA men were willing to use an at-home CRC screening test, compared with 44% of AA men.
ABSTRACT
RapGreen is a modular software package targeted at scientists handling large datasets for phylogenetic analysis. Its primary function is the graphical visualization and exploration of large trees. In addition, RapGreen offers a tree pattern search function to seek evolutionary scenarios among large collections of phylogenetic trees. Other functionalities include reconciliation with a given species tree, which detects duplication and loss events during evolution, and tree rooting. Last but not least, RapGreen can integrate heterogeneous data while visualizing and otherwise analyzing phylogenetic trees.
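Reconciliation of a gene tree with a species tree is classically done by mapping each gene-tree node to the lowest common ancestor (LCA) of its species in the species tree; a node that maps to the same species-tree node as one of its children is inferred to be a duplication. The sketch below is a minimal, generic illustration of that LCA mapping on nested tuples, not RapGreen's implementation. It assumes a rooted gene tree and a hypothetical `species_of` function that derives the species from a gene label; loss counting and rooting are omitted.

```python
from typing import Callable, FrozenSet, List, Union

# A tree is either a leaf label (str) or a tuple of subtrees.
Tree = Union[str, tuple]

def leaf_set(tree: Tree) -> FrozenSet[str]:
    """All leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of every node of the species tree."""
    result = [leaf_set(tree)]
    if not isinstance(tree, str):
        for child in tree:
            result.extend(clades(child))
    return result

def count_duplications(gene_tree: Tree, species_tree: Tree,
                       species_of: Callable[[str], str]) -> int:
    """Count duplication nodes in a gene tree by LCA mapping."""
    species_clades = clades(species_tree)

    def lca(species: FrozenSet[str]) -> FrozenSet[str]:
        # Smallest species-tree clade containing all the given species.
        return min((c for c in species_clades if species <= c), key=len)

    duplications = 0

    def visit(node: Tree) -> FrozenSet[str]:
        nonlocal duplications
        if isinstance(node, str):
            return lca(frozenset([species_of(node)]))
        child_maps = [visit(child) for child in node]
        here = lca(frozenset().union(*child_maps))
        # Same mapping as a child: duplication; otherwise: speciation.
        if any(m == here for m in child_maps):
            duplications += 1
        return here

    visit(gene_tree)
    return duplications

species_tree = (("human", "mouse"), "chicken")
gene_tree = ((("human_a", "mouse_a"), ("human_b", "mouse_b")), "chicken_a")
print(count_duplications(gene_tree, species_tree, lambda g: g.split("_")[0]))  # 1
```

In this toy gene tree, the node joining the two human/mouse subtrees maps to the human/mouse ancestor, as do both of its children, so it is the single inferred duplication.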
ABSTRACT
The bowfin (Amia calva) is a ray-finned fish that possesses a unique suite of ancestral and derived phenotypes, which are key to understanding vertebrate evolution. The phylogenetic position of bowfin as a representative of neopterygian fishes, its archetypical body plan and its unduplicated and slowly evolving genome make bowfin a central species for the genomic exploration of ray-finned fishes. Here we present a chromosome-level genome assembly for bowfin that enables gene-order analyses, settling long-debated neopterygian phylogenetic relationships. We examine chromatin accessibility and gene expression through bowfin development to investigate the evolution of immune, scale, respiratory and fin skeletal systems and identify hundreds of gene-regulatory loci conserved across vertebrates. These resources connect developmental evolution among bony fishes, further highlighting the bowfin's importance for illuminating vertebrate biology and diversity in the genomic era.
Subject(s)
Biological Evolution, Molecular Evolution, Genome/genetics, Rajidae/genetics, Rajidae/physiology, Animals, Chromatin/genetics, Fishes, Rajidae/immunology, Whole Genome Sequencing
ABSTRACT
Whole-genome duplications (WGDs) have major impacts on the evolution of species, as they produce new gene copies contributing substantially to adaptation, isolation, phenotypic robustness, and evolvability. They result in large, complex gene families with recurrent gene losses in descendant species that sequence-based phylogenetic methods fail to reconstruct accurately. As a result, orthologs and paralogs are difficult to identify reliably in WGD-descended species, which hinders the exploration of functional consequences of WGDs. Here, we present Synteny-guided CORrection of Paralogies and Orthologies (SCORPiOs), a novel method to reconstruct gene phylogenies in the context of a known WGD event. WGDs generate large duplicated syntenic regions, which SCORPiOs systematically leverages as a complement to sequence evolution to infer the evolutionary history of genes. We applied SCORPiOs to the 320-My-old WGD at the origin of teleost fish. We find that almost one in four teleost gene phylogenies in the Ensembl database (3,394) are inconsistent with their syntenic contexts. For 70% of these gene families (2,387), we were able to propose an improved phylogenetic tree consistent with both the molecular substitution distances and the local syntenic information. We show that these synteny-guided phylogenies are more congruent with the species tree, with sequence evolution and with expected expression conservation patterns than those produced by state-of-the-art methods. Finally, we show that synteny-guided gene trees emphasize contributions of WGD paralogs to evolutionary innovations in the teleost clade.
Subject(s)
Genetic Techniques, Phylogeny, Polyploidy, Algorithms, Animals, Biological Evolution, Chromosome Duplication, Fishes/genetics, Multigene Family
ABSTRACT
ANISEED (https://www.aniseed.cnrs.fr) is the main model organism database for the worldwide community of scientists working on tunicates, the vertebrate sister group. Information provided for each species includes functionally annotated gene and transcript models with orthology relationships within tunicates, and with echinoderms, cephalochordates and vertebrates. Beyond genes, the system describes other genetic elements, including repeated elements and cis-regulatory modules. Gene expression profiles for several thousand genes are formalized in both wild-type and experimentally manipulated conditions, using formal anatomical ontologies. These data can be explored through three complementary types of browsers, each offering a different viewpoint. A developmental browser summarizes the information in a gene- or territory-centric manner. Advanced genomic browsers integrate the genetic features surrounding genes or gene sets within a species. A Genomicus synteny browser explores the conservation of local gene order across deuterostomes. This new release covers an extended taxonomic range of 14 species, including for the first time a non-ascidian species, the appendicularian Oikopleura dioica. Functional annotations, provided for each species, were enhanced through a combination of manual curation of gene models and the development of an improved orthology detection pipeline. Finally, gene expression profiles and anatomical territories can be explored in 4D online through the newly developed Morphonet morphogenetic browser.
Subject(s)
Genetic Databases, Gene Expression Profiling, Genome, Software, Urochordata/genetics, Animals, Binding Sites, Cephalochordata/genetics, Computer Graphics, Computer Simulation, Echinodermata/genetics, Molecular Evolution, Gene Order, Genomics, In Situ Hybridization, Internet, Molecular Sequence Annotation, Phylogeny, Programming Languages, RNA-Seq, Synteny, User-Computer Interface, Vertebrates/genetics
ABSTRACT
Vertebrates have greatly elaborated the basic chordate body plan and evolved highly distinctive genomes that have been sculpted by two whole-genome duplications. Here we sequence the genome of the Mediterranean amphioxus (Branchiostoma lanceolatum) and characterize DNA methylation, chromatin accessibility, histone modifications and transcriptomes across multiple developmental stages and adult tissues to investigate the evolution of the regulation of the chordate genome. Comparisons with vertebrates identify an intermediate stage in the evolution of differentially methylated enhancers, and a high conservation of gene expression and its cis-regulatory logic between amphioxus and vertebrates that is maximal at an earlier, mid-embryonic phylotypic period. We analyse regulatory evolution after whole-genome duplications and find that, in vertebrates, over 80% of broadly expressed gene families with multiple paralogues derived from whole-genome duplications have members that restricted their ancestral expression and underwent specialization rather than subfunctionalization. Counter-intuitively, paralogues that restricted their expression increased the complexity of their regulatory landscapes. These data pave the way for a better understanding of the regulatory principles that underlie key vertebrate innovations.
Subject(s)
Gene Expression Regulation, Genomics, Lancelets/genetics, Vertebrates/genetics, Animals, Body Patterning/genetics, DNA Methylation, Humans, Lancelets/embryology, Molecular Sequence Annotation, Genetic Promoter Regions, Transcriptome/genetics
ABSTRACT
BACKGROUND: It has been proposed that more than 450 million years ago, two successive whole genome duplications took place in a marine chordate lineage before leading to the common ancestor of vertebrates. A precise reconstruction of these founding events would provide a framework to better understand the impact of these early whole genome duplications on extant vertebrates. RESULTS: We reconstruct the evolution of chromosomes at the beginning of vertebrate evolution. We first compare 61 extant animal genomes to reconstruct the highly contiguous order of genes in a 326-million-year-old ancestral Amniota genome. In this genome, we establish a well-supported list of duplicated genes originating from the two whole genome duplications to identify tetrads of duplicated chromosomes. From this, we reconstruct a chronology in which a pre-vertebrate genome composed of 17 chromosomes duplicated to 34 chromosomes and was subject to seven chromosome fusions before duplicating again into 54 chromosomes. After the separation of the lineage of Gnathostomata (jawed vertebrates) from Cyclostomata (extant jawless fish), four more fusions took place to form the ancestral Euteleostomi (bony vertebrates) genome of 50 chromosomes. CONCLUSIONS: These results firmly establish the occurrence of two whole genome duplications in the lineage that precedes the ancestor of vertebrates, resolving in particular the ambiguity raised by the analysis of the lamprey genome. This work provides a foundation for studying the evolution of vertebrate chromosomes from the standpoint of a common ancestor and particularly the pattern of duplicate gene retention and loss that resulted in the gene composition of extant vertebrate genomes.
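The chromosome counts in this chronology can be checked with simple arithmetic, assuming that each whole genome duplication doubles the chromosome number and that each fusion joins two chromosomes into one (removing one chromosome). The intermediate count of 27 chromosomes before the second duplication is derived here, not stated in the abstract. The function name and parameters below are illustrative only.

```python
def karyotype_history(pre_vertebrate: int = 17, fusions_between_wgds: int = 7,
                      gnathostome_fusions: int = 4) -> list:
    """Chromosome counts along the proposed scenario (each fusion removes one chromosome)."""
    steps = [("pre-vertebrate genome", pre_vertebrate)]
    n = pre_vertebrate * 2            # first whole genome duplication
    steps.append(("after first WGD", n))
    n -= fusions_between_wgds         # fusions between the two duplications
    steps.append((f"after {fusions_between_wgds} fusions", n))
    n *= 2                            # second whole genome duplication
    steps.append(("after second WGD", n))
    n -= gnathostome_fusions          # fusions after the Gnathostomata/Cyclostomata split
    steps.append(("ancestral Euteleostomi genome", n))
    return steps

for label, count in karyotype_history():
    print(f"{label}: {count}")
# pre-vertebrate genome: 17
# after first WGD: 34
# after 7 fusions: 27
# after second WGD: 54
# ancestral Euteleostomi genome: 50
```

The sequence 17, 34, 27, 54, 50 is consistent with every count reported in the abstract.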
Subject(s)
Chromosomes/genetics, Molecular Evolution, Genome, Vertebrates/genetics, Animals, Gene Duplication, Human Genome, Genomics, Humans, Karyotype, Genetic Models, Phylogeny, Species Specificity
ABSTRACT
Since 2010, the Genomicus web server has been available online at http://genomicus.biologie.ens.fr/genomicus. This graphical browser provides access to comparative genomic analyses in four different phyla (Vertebrates, Plants, Fungi, and non-vertebrate Metazoans). Users can analyse genomic information from extant species, as well as ancestral gene content and gene order for vertebrates and flowering plants, in an integrated evolutionary context. New analysis and visualization tools have recently been implemented in Genomicus Vertebrate. Karyotype structures from several genomes can now be compared along an evolutionary pathway (Multi-KaryotypeView), and synteny blocks can be computed and visualized between any two genomes (PhylDiagView).
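As a rough illustration of what computing synteny blocks between two genomes involves, here is a naive sketch that groups genes into runs conserved in adjacency and orientation. It is not the PhylDiag algorithm behind PhylDiagView. It assumes each genome is a single ordered list of gene-family IDs with one copy per family per genome, and it simply skips genes that have no homolog in the other genome (a simplification that tolerates unmatched genes inside a block).

```python
from typing import Dict, List, Sequence

def synteny_blocks(genome_a: Sequence[str], genome_b: Sequence[str],
                   min_len: int = 2) -> List[List[str]]:
    """Runs of genes that are adjacent, in a consistent direction, in both genomes."""
    pos_b: Dict[str, int] = {gene: i for i, gene in enumerate(genome_b)}
    shared = [gene for gene in genome_a if gene in pos_b]  # drop genes absent from B
    blocks: List[List[str]] = []
    current: List[str] = []
    step = 0  # +1: same orientation, -1: inverted, 0: not yet known
    for gene in shared:
        if current:
            delta = pos_b[gene] - pos_b[current[-1]]
            if delta in (1, -1) and (step == 0 or delta == step):
                current.append(gene)  # block continues
                step = delta
                continue
            if len(current) >= min_len:
                blocks.append(current)  # block broken: keep it if long enough
        current, step = [gene], 0
    if len(current) >= min_len:
        blocks.append(current)
    return blocks

genome_a = ["a", "b", "c", "x", "d", "e", "f"]
genome_b = ["c", "b", "a", "f", "e", "d"]
print(synteny_blocks(genome_a, genome_b))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

Both toy blocks are inverted in genome B relative to genome A; gene "x" has no homolog in B and is ignored.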
Subject(s)
Genetic Databases, Molecular Evolution, Karyotype, Phylogeny, Synteny, Algorithms, Animals, Data Display, Fungi/genetics, Genome, Plants/genetics, Software, Vertebrates/genetics
ABSTRACT
ANISEED (www.aniseed.cnrs.fr) is the main model organism database for tunicates, the sister-group of vertebrates. This release gives access to annotated genomes, gene expression patterns, and anatomical descriptions for nine ascidian species. It provides increased integration with external molecular and taxonomy databases, better support for epigenomics datasets, in particular RNA-seq, ChIP-seq and SELEX-seq, and features novel interactive interfaces for existing and novel datatypes. In particular, the cross-species navigation and comparison is enhanced through a novel taxonomy section describing each represented species and through the implementation of interactive phylogenetic gene trees for 60% of tunicate genes. The gene expression section displays the results of RNA-seq experiments for the three major model species of solitary ascidians. Gene expression is controlled by the binding of transcription factors to cis-regulatory sequences. A high-resolution description of the DNA-binding specificity for 131 Ciona robusta (formerly C. intestinalis type A) transcription factors by SELEX-seq is provided and used to map candidate binding sites across the Ciona robusta and Phallusia mammillata genomes. Finally, use of a WashU Epigenome browser enhances genome navigation, while a Genomicus server was set up to explore microsynteny relationships within tunicates and with vertebrates, Amphioxus, echinoderms and hemichordates.
Subject(s)
Genetic Databases, Datasets as Topic, Genome, Urochordata/genetics, Animals, Biological Evolution, Ciona intestinalis/genetics, DNA/metabolism, Data Mining, Molecular Evolution, Gene Expression, Gene Ontology, Internet, Molecular Sequence Annotation, Phylogeny, Protein Binding, Species Specificity, Transcription Factors/metabolism, Genetic Transcription, Vertebrates/genetics, Web Browser
ABSTRACT
BACKGROUND: Polycomb Repressive Complexes 2 (PRC2) are multi-protein chromatin modifiers that are evolutionarily conserved among eukaryotes and play key roles in the regulation of gene expression, notably through the trimethylation of lysine 27 of histone H3 (H3K27me3). Although PRC2-mediated gene regulation has been studied in many organisms, few studies have explored in depth the evolutionary conservation of PRC2 targets. RESULTS: Here, we compare the H3K27me3 epigenomic profiles for the two closely related species Arabidopsis thaliana and Arabidopsis lyrata and the more distant species Arabis alpina, three Brassicaceae that diverged from each other within the past 24 million years. Using a robust set of gene orthologs present in the three species, we identify two classes of evolutionarily conserved PRC2 targets, which are characterized by either developmentally plastic or developmentally constrained H3K27me3 marking across species. Constrained H3K27me3 marking is associated with higher conservation of promoter sequence information content and higher nucleosome occupancy compared to plastic H3K27me3 marking. Moreover, gene orthologs with constrained H3K27me3 marking exhibit a higher degree of tissue specificity and tend to be involved in developmental functions, whereas gene orthologs with plastic H3K27me3 marking preferentially encode proteins associated with metabolism and stress responses. In addition, gene orthologs with constrained H3K27me3 marking are the predominant contributors to higher-order chromosome organization. CONCLUSIONS: Our findings indicate that developmentally plastic and constrained H3K27me3 marking define two evolutionarily conserved modes of PRC2-mediated gene regulation that are associated with distinct selective pressures operating at multiple scales, from DNA sequence to gene function and chromosome architecture.
Subject(s)
Brassicaceae/genetics, Genetic Epigenesis, Molecular Evolution, Plant Gene Expression Regulation, Histone Code, Polycomb Repressive Complex 2/metabolism, Arabidopsis/genetics, Arabis/genetics, Base Sequence, Plant Chromosomes, Conserved Sequence, Gene Duplication, Genetic Promoter Regions, Transcriptome
ABSTRACT
BACKGROUND: Brassicaceae is a family of green plants of high scientific and economic interest, including thale cress (Arabidopsis thaliana), cruciferous vegetables (cabbages) and rapeseed. RESULTS: We reconstruct an evolutionary framework of Brassicaceae composed of high-resolution ancestral karyotypes using the genomes of modern A. thaliana, Arabidopsis lyrata, Capsella rubella, Brassica rapa and Thellungiella parvula. The ancestral Brassicaceae karyotype (Brassicaceae lineages I and II) is composed of eight protochromosomes and 20,037 ordered and oriented protogenes. After speciation, it evolved into the genomes of the ancestral Camelineae karyotype (eight protochromosomes and 22,085 ordered protogenes) and the proto-Calepineae karyotype (seven protochromosomes and 21,035 ordered protogenes). CONCLUSIONS: The three inferred ancestral karyotype genomes are shown here to be powerful tools to unravel the reticulated evolutionary history of extant Brassicaceae genomes regarding the fate of ancestral genes and genomic compartments, particularly centromeres and evolutionary breakpoints. This new resource should accelerate research in comparative genomics and translational research by facilitating the transfer of genomic information from model systems to species of agronomic interest.
Subject(s)
Brassicaceae/genetics, Molecular Evolution, Plant Genome, Centromere, Plant Chromosomes, Plant Genes, Karyotype, Polyploidy
ABSTRACT
Enhancers can regulate the transcription of genes over long genomic distances. This is thought to lead to selection against genomic rearrangements within such regions that may disrupt this functional linkage. Here we test this concept experimentally using the human X chromosome. We describe a scoring method to identify evolutionary maintenance of linkage between conserved noncoding elements and neighbouring genes. Chromatin marks associated with enhancer function are strongly correlated with this linkage score. We test more than 1,000 putative enhancers by transgenesis assays in zebrafish to ascertain the identity of the target gene. The majority of active enhancers drive transgenic expression in a pattern consistent with the known expression of a linked gene. These results show that evolutionary maintenance of linkage is a reliable predictor of an enhancer's function, and provide new information to discover the genetic basis of diseases caused by the mis-regulation of gene expression.
Subject(s)
Human X Chromosome/genetics, Genetic Enhancer Elements/genetics, Gene Expression/genetics, Genetic Linkage/genetics, Genetic Selection/genetics, Animals, Genetically Modified Animals, Molecular Evolution, Gene Rearrangement/genetics, Humans, Zebrafish
ABSTRACT
The Genomicus web server (http://www.genomicus.biologie.ens.fr/genomicus) is a visualization tool for comparative genomics in four different phyla (Vertebrate, Fungi, Metazoan and Plants). It provides access to genomic information from extant species, as well as ancestral gene content and gene order for vertebrates and flowering plants. Here we present the new features available for vertebrate genomes, with a focus on new graphical tools. The entry interface to the database has been improved, two pairwise genome comparison tools are now available (KaryoView and MatrixView), and the multiple genome comparison tools (PhyloView and AlignView) offer three new kinds of representation and a more intuitive menu. These new developments have been implemented in the Genomicus portal dedicated to vertebrates, which allows the analysis of 68 extant animal genomes as well as 58 ancestral reconstructed genomes. The Genomicus server also provides access to ancestral gene orders, to facilitate evolutionary and comparative genomics studies, as well as to computationally predicted regulatory interactions, through the representation of conserved non-coding elements with their putative gene targets.
Subject(s)
Genetic Databases, Genomics, Animals, Base Sequence, Computer Graphics, Conserved Sequence, Intergenic DNA/chemistry, Molecular Evolution, Gene Order, Genome, Humans, Internet, Phylogeny, Protein Sequence Analysis, Vertebrates/genetics
ABSTRACT
Comparative genomics combined with phylogenetic reconstructions are powerful approaches to study the evolution of genes and genomes. However, the current rapid expansion of the volume of genomic information makes it increasingly difficult to interrogate, integrate and synthesize comparative genome data while taking into account the maximum breadth of information available. GenomicusPlants (http://www.genomicus.biologie.ens.fr/genomicus-plants) is an extension of the Genomicus web server that addresses this issue by allowing users to explore flowering plant genomes in an intuitive way, across the broadest evolutionary scales. Extant genomes of 26 flowering plants can be analyzed, as well as 23 ancestral reconstructed genomes. Ancestral gene order provides a long-term chronological view of gene order evolution, greatly facilitating comparative genomics and evolutionary studies. Four main interfaces ('views') are available: (i) PhyloView combines phylogenetic trees with comparisons of genomic loci across any number of genomes; (ii) AlignView projects loci of interest against all other genomes to visualize their topological conservation; (iii) MatrixView compares two genomes in a classical dotplot representation; and (iv) KaryoView visualizes chromosome karyotypes 'painted' with the colours of another genome of interest. All four views are interconnected and offer many customizable features.