ABSTRACT
Overlapping genes pose an evolutionary dilemma as one DNA sequence evolves under the selection pressures of multiple proteins. Here, we perform systematic statistical and mutational analyses of the overlapping HIV-1 genes tat and rev and engineer exhaustive libraries of non-overlapped viruses to perform deep mutational scanning of each gene independently. We find a "segregated" organization in which overlapped sites encode functional residues of one gene or the other, but never both. Furthermore, this organization eliminates unfit genotypes, providing a fitness advantage to the population. Our comprehensive analysis reveals the extraordinary manner in which HIV minimizes the constraint of overlapping genes and repurposes that constraint to its own advantage. Thus, overlaps are not just consequences of evolutionary constraints, but rather can provide population fitness advantages.
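The abstract does not name the statistic behind its "statistical analyses", but the Entropy subject heading suggests per-site sequence variability. The following is a minimal sketch of one common choice: Shannon entropy computed per column of aligned nucleotide sequences, with gaps ignored. It assumes this measure; it is not the authors' pipeline, and the function names are hypothetical. Low-entropy (conserved) columns would mark positions under stronger constraint.

```python
import math
from collections import Counter


def column_entropy(column, gap_chars="-"):
    """Shannon entropy, in bits, of one alignment column (gap characters ignored)."""
    residues = [c for c in column.upper() if c not in gap_chars]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    # H = sum over symbols of p * log2(1/p); 0.0 for a fully conserved column
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def per_site_entropy(alignment):
    """Entropy at every position of a list of equal-length aligned sequences."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy("".join(seq[i] for seq in alignment)) for i in range(length)]


seqs = ["ATGC", "ATGA", "ATCA", "ATCC"]  # toy alignment, not HIV-1 data
print(per_site_entropy(seqs))  # [0.0, 0.0, 1.0, 1.0]
```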
Subject(s)
Biological Evolution, HIV-1/genetics, tat Gene Products, Human Immunodeficiency Virus/genetics, Entropy, Genetic Fitness, HIV Infections/virology, Humans, Mutation, Open Reading Frames, rev Gene Products, Human Immunodeficiency Virus/genetics
ABSTRACT
The divergence of chimpanzee and bonobo provides one of the few examples of recent hominid speciation [1,2]. Here we describe a fully annotated, high-quality bonobo genome assembly, which was constructed without guidance from reference genomes by applying a multiplatform genomics approach. We generate a bonobo genome assembly in which more than 98% of genes are completely annotated and 99% of the gaps are closed, including the resolution of about half of the segmental duplications and almost all of the full-length mobile elements. We compare the bonobo genome to those of other great apes [1,3-5] and identify more than 5,569 fixed structural variants that specifically distinguish the bonobo and chimpanzee lineages. We focus on genes that have been lost, changed in structure or expanded in the last few million years of bonobo evolution. We produce a high-resolution map of incomplete lineage sorting and estimate that around 5.1% of the human genome is genetically closer to chimpanzee or bonobo and that more than 36.5% of the genome shows incomplete lineage sorting if we consider a deeper phylogeny including gorilla and orangutan. We also show that 26% of the segments of incomplete lineage sorting between human and chimpanzee or human and bonobo are non-randomly distributed and that genes within these clustered segments show significant excess of amino acid replacement compared to the rest of the genome.
Subject(s)
Evolution, Molecular, Genome/genetics, Genomics, Pan paniscus/genetics, Phylogeny, Animals, Eukaryotic Initiation Factor-4A/genetics, Female, Genes, Gorilla gorilla/genetics, Molecular Sequence Annotation/standards, Pan troglodytes/genetics, Pongo/genetics, Segmental Duplications, Genomic, Sequence Analysis, DNA
ABSTRACT
For more than two decades, the UCSC Genome Browser database (https://genome.ucsc.edu) has provided high-quality genomics data visualization and genome annotations to the research community. As the field of genomics grows and more data become available, new modes of display are required to accommodate new technologies. New features released this past year include a Hi-C heatmap display, a phased family trio display for VCF files, and various track visualization improvements. To keep data up to date, gene annotations were updated, including GENCODE Genes, NCBI RefSeq Genes, and Ensembl Genes. New data tracks added for human and mouse genomes include the ENCODE registry of candidate cis-regulatory elements, promoters from the Eukaryotic Promoter Database, and NCBI RefSeq Select and Matched Annotation from NCBI and EMBL-EBI (MANE). Within weeks of learning about the coronavirus outbreak, UCSC released a genome browser, with detailed annotation tracks, for the SARS-CoV-2 RNA reference assembly.
Subject(s)
COVID-19/prevention & control, Computational Biology/methods, Databases, Genetic, Genome/genetics, Genomics/methods, SARS-CoV-2/genetics, Animals, COVID-19/epidemiology, COVID-19/virology, Data Curation/methods, Epidemics, Humans, Internet, Mice, Molecular Sequence Annotation/methods, SARS-CoV-2/physiology, Software
ABSTRACT
The SARS-CoV-2 pandemic has led to unprecedented, nearly real-time genetic tracing due to the rapid community sequencing response. Researchers immediately leveraged these data to infer the evolutionary relationships among viral samples and to study key biological questions, including whether host viral genome editing and recombination are features of SARS-CoV-2 evolution. This global sequencing effort is inherently decentralized and must rely on data collected by many labs using a wide variety of molecular and bioinformatic techniques. There is thus a strong possibility that systematic errors associated with lab- or protocol-specific practices affect some sequences in the repositories. We find that some recurrent mutations in reported SARS-CoV-2 genome sequences have been observed predominantly or exclusively by single labs, co-localize with commonly used primer binding sites and are more likely to affect the protein-coding sequences than other similarly recurrent mutations. We show that their inclusion can affect phylogenetic inference on scales relevant to local lineage tracing, and make it appear as though there has been an excess of recurrent mutation or recombination among viral lineages. We suggest how samples can be screened and problematic variants removed, and we plan to regularly inform the scientific community with our updated results as more SARS-CoV-2 genome sequences are shared (https://virological.org/t/issues-with-sars-cov-2-sequencing-data/473 and https://virological.org/t/masking-strategies-for-sars-cov-2-alignments/480). We also develop tools for comparing and visualizing differences among very large phylogenies and we show that consistent clade- and tree-based comparisons can be made between phylogenies produced by different groups. These will facilitate evolutionary inferences and comparisons among phylogenies produced for a wide array of purposes.
Building on the SARS-CoV-2 Genome Browser at UCSC, we present a toolkit to compare, analyze and combine SARS-CoV-2 phylogenies, find and remove potential sequencing errors and establish a widely shared, stable clade structure for a more accurate scientific inference and discourse.
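One signal described above, recurrent mutations observed predominantly by a single lab, can be sketched as a simple screen. This is a minimal illustration assuming input as (mutation, lab) pairs; the thresholds, mutation names, and lab names are hypothetical, and the published screening also considers primer co-localization and recurrence on the tree, which this sketch omits.

```python
from collections import Counter, defaultdict


def flag_single_lab_mutations(observations, min_count=5, lab_share_threshold=0.8):
    """Flag recurrent mutations whose observations are concentrated in one lab.

    observations: iterable of (mutation, lab) pairs, one pair per sequence
    carrying the mutation. Returns {mutation: (dominant_lab, share)}.
    The default thresholds are illustrative assumptions, not published cut-offs.
    """
    labs_by_mutation = defaultdict(Counter)
    for mutation, lab in observations:
        labs_by_mutation[mutation][lab] += 1

    flagged = {}
    for mutation, labs in labs_by_mutation.items():
        total = sum(labs.values())
        top_lab, top_count = labs.most_common(1)[0]
        share = top_count / total
        if total >= min_count and share >= lab_share_threshold:
            flagged[mutation] = (top_lab, share)
    return flagged


# Hypothetical observations: C100T comes almost entirely from LabA.
obs = [("C100T", "LabA")] * 9 + [("C100T", "LabB")]
obs += [("G200A", lab) for lab in ["LabA", "LabB", "LabC", "LabD", "LabE"]]
print(flag_single_lab_mutations(obs))  # {'C100T': ('LabA', 0.9)}
```

Flagged variants would then be candidates for masking rather than automatic removal, since a single lab may simply sample a genuinely distinct local lineage.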
Subject(s)
Genome, Viral/genetics, Phylogeny, SARS-CoV-2/genetics, Algorithms, COVID-19, Computational Biology, Evolution, Molecular, Humans, RNA, Viral/genetics, Sequence Alignment, Whole Genome Sequencing
ABSTRACT
Co-option of transposable elements (TEs) to become part of existing or new enhancers is an important mechanism for evolution of gene regulation. However, contributions of lineage-specific TE insertions to recent regulatory adaptations remain poorly understood. Gibbons present a suitable model to study these contributions as they have evolved a lineage-specific TE called LAVA (LINE-AluSz-VNTR-AluLIKE), which is still active in the gibbon genome. The LAVA retrotransposon is thought to have played a role in the emergence of the highly rearranged structure of the gibbon genome by disrupting transcription of cell cycle genes. In this study, we investigated whether LAVA may have also contributed to the evolution of gene regulation by adopting enhancer function. We characterized fixed and polymorphic LAVA insertions across multiple gibbons and found 96 LAVA elements overlapping enhancer chromatin states. Moreover, LAVA was enriched in multiple transcription factor binding motifs, was bound by an important transcription factor (PU.1), and was associated with higher levels of gene expression in cis. We found gibbon-specific signatures of purifying/positive selection at 27 LAVA insertions. Two of these insertions were fixed in the gibbon lineage and overlapped with enhancer chromatin states, representing putative co-opted LAVA enhancers. These putative enhancers were located within genes encoding SETD2 and RAD9A, two proteins that facilitate accurate repair of DNA double-strand breaks and prevent chromosomal rearrangement mutations. Co-option of LAVA in these genes may have influenced regulation of processes that preserve genome integrity. Our findings highlight the importance of considering lineage-specific TEs in studying evolution of gene regulatory elements.
Subject(s)
Genome, Hylobates/genetics, Retroelements, Animals, Chromatin/genetics, Evolution, Molecular, Gene Expression Regulation, Hylobates/classification, Mutagenesis, Insertional, Regulatory Sequences, Nucleic Acid, Species Specificity
ABSTRACT
Viruses are obligate parasites that rely heavily on host cellular processes for replication. The small number of proteins typically encoded by a virus is faced with selection pressures that lead to the evolution of distinctive structural properties, allowing each protein to maintain its function under constraints such as small genome size, high mutation rate, and rapidly changing fitness conditions. One common strategy for this evolution is to utilize small building blocks to generate protein oligomers that assemble in multiple ways, thereby diversifying protein function and regulation. In this review, we discuss specific cases that illustrate how oligomerization is used to generate a single defined functional state, to modulate activity via different oligomeric states, or to generate multiple functional forms via different oligomeric states.
Subject(s)
Protein Multimerization, Viral Proteins/chemistry, Virus Diseases/virology, Viruses/chemistry, Animals, Capsid/chemistry, Capsid/immunology, Capsid/metabolism, Ebolavirus/chemistry, Ebolavirus/immunology, Ebolavirus/metabolism, Flavivirus/chemistry, Flavivirus/immunology, Flavivirus/metabolism, Flavivirus Infections/immunology, Flavivirus Infections/metabolism, Flavivirus Infections/virology, HIV/chemistry, HIV/immunology, HIV/metabolism, HIV Infections/immunology, HIV Infections/metabolism, HIV Infections/virology, Hemorrhagic Fever, Ebola/immunology, Hemorrhagic Fever, Ebola/metabolism, Hemorrhagic Fever, Ebola/virology, Humans, Models, Molecular, Protein Conformation, Viral Proteins/immunology, Viral Proteins/metabolism, Virus Diseases/immunology, Virus Diseases/metabolism, Virus Replication, Viruses/immunology, Viruses/metabolism
ABSTRACT
Overlapping coding regions balance selective forces between multiple genes. One possible way to divide the nucleotide sequence is for the predominant selective force on each nucleotide to be attributable to just one gene. While this arrangement has been observed in regions in which one gene is structured and the other is disordered, we sought to explore how overlapping genes balance constraints when both protein products are structured over the same sequence. We use a combination of sequence analysis, functional assays, and selection experiments to examine an overlapped region in HIV-1 that encodes helical regions in both Env and Rev. We find that functional segregation occurs even in this overlap, with each protein spacing its functional residues in a manner that allows a mutable non-binding face of one helix to encode important functional residues on a charged face in the other helix. Additionally, our experiments reveal novel and critical functional residues in Env and have implications for the therapeutic targeting of HIV-1.
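The segregation idea rests on two pieces of bookkeeping: which codon position a shared nucleotide occupies in each reading frame, and where each residue falls on a helical wheel. The sketch below assumes two same-strand ORFs offset by one nucleotide and an ideal alpha-helix (100 degrees per residue); the coordinates and function names are hypothetical and are not the actual Env/Rev overlap.

```python
def codon_position(nt, orf_start):
    """Return (codon_index, position_in_codon) for 0-based nucleotide `nt` in an ORF
    starting at 0-based `orf_start`, or None if `nt` lies upstream of the ORF."""
    offset = nt - orf_start
    if offset < 0:
        return None
    return offset // 3, offset % 3 + 1  # codon positions are numbered 1, 2, 3


def helical_angle(residue, helix_start):
    """Angle in degrees of a residue on an ideal alpha-helical wheel (100 degrees per residue)."""
    return (100 * (residue - helix_start)) % 360


# Two hypothetical ORFs on the same strand, the second shifted by +1 nucleotide.
frame_a_start, frame_b_start = 0, 1
for nt in range(6):
    print(nt, codon_position(nt, frame_a_start), codon_position(nt, frame_b_start))
```

With this offset, a nucleotide at the often synonymous third codon position in one frame sits at the second position in the other, so a site that is mutable for one protein can still be constrained by the other. Mapping the affected residues through `helical_angle` then shows whether they cluster on one face of each helix.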
Subject(s)
HIV-1, HIV-1/chemistry, HIV-1/genetics, Open Reading Frames
ABSTRACT
BACKGROUND: Nearly half the human genome consists of repeat elements, most of which are retrotransposons, and many of which play important biological roles. However, repeat elements pose several unique challenges to current bioinformatic analyses and visualization tools, as short repeat sequences can map to multiple genomic loci, resulting in their misclassification and misinterpretation. In fact, sequence data mapping to repeat elements are often discarded from analysis pipelines. Therefore, there is a continued need for standardized tools and techniques to interpret genomic data of repeats. RESULTS: We present the UCSC Repeat Browser, which consists of a complete set of human repeat reference sequences derived from annotations made by the commonly used program RepeatMasker. The UCSC Repeat Browser also provides an alignment from the human genome to these references, uses it to map the standard human genome annotation tracks, and presents all of them as a comprehensive interface to facilitate work with repetitive elements. It also provides processed tracks of multiple publicly available datasets of particular interest to the repeat community, including ChIP-seq datasets for KRAB Zinc Finger Proteins (KZNFs), a family of proteins known to bind and repress certain classes of repeats. We used the UCSC Repeat Browser in combination with these datasets, as well as RepeatMasker annotations in several non-human primates, to trace the independent trajectories of species-specific evolutionary battles between LINE 1 retroelements and their repressors. Furthermore, we document at https://repeatbrowser.ucsc.edu how researchers can map their own human genome annotations to these reference repeat sequences. CONCLUSIONS: The UCSC Repeat Browser allows easy and intuitive visualization of genomic data on consensus repeat elements, circumventing the problem of multi-mapping, in which sequencing reads of repeat elements map to multiple locations on the human genome.
Because all data are placed on a reference consensus, multiple datasets and annotation tracks can easily be overlaid to reveal complex evolutionary histories of repeats in a single interactive window. Specifically, we use this approach to retrace the history of several primate-specific LINE-1 families across apes and discover several species-specific routes of evolution that correlate with the emergence and binding of KZNFs.
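Projecting genome annotations onto a consensus through an alignment can be illustrated with ungapped alignment blocks, similar in spirit to the blocks of UCSC chain/liftOver files. This is a minimal sketch assuming 0-based, same-strand, non-overlapping blocks; the class name and coordinates are hypothetical, and the Repeat Browser's actual pipeline is not shown.

```python
from bisect import bisect_right


class BlockMapper:
    """Map positions in one genomic repeat copy onto consensus coordinates.

    blocks: (genome_start, consensus_start, length) triples describing ungapped
    aligned blocks, 0-based, same strand, non-overlapping.
    """

    def __init__(self, blocks):
        self.blocks = sorted(blocks)
        self.starts = [block[0] for block in self.blocks]

    def to_consensus(self, pos):
        """Return the consensus coordinate for genomic `pos`, or None if unaligned."""
        i = bisect_right(self.starts, pos) - 1  # last block starting at or before pos
        if i < 0:
            return None
        g_start, c_start, length = self.blocks[i]
        if pos < g_start + length:
            return c_start + (pos - g_start)
        return None  # pos falls in an insertion relative to the consensus


# Copy 1 carries a 10-bp insertion at 1050-1059; copy 2 aligns without gaps.
copy1 = BlockMapper([(1000, 0, 50), (1060, 50, 40)])
copy2 = BlockMapper([(5000, 0, 90)])
print(copy1.to_consensus(1010), copy2.to_consensus(5010))  # 10 10
print(copy1.to_consensus(1055), copy1.to_consensus(1060))  # None 50
```

Signals from two different genomic copies land on the same consensus coordinate (10 above), which is how projecting onto a consensus sidesteps having to assign multi-mapping reads to a single locus.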
ABSTRACT
Many postdoctoral researchers apply for faculty positions knowing relatively little about the hiring process or what is needed to secure a job offer. To address this lack of knowledge, we conducted a survey of applicants for faculty positions: the survey ran between May 2018 and May 2019 and received 317 responses. We analyzed the responses to explore the interplay between various scholarly metrics and hiring outcomes. We concluded that, above a certain threshold, the benchmarks traditionally used to measure research success (including funding, number of publications, and the journals published in) were unable to completely differentiate applicants with and without job offers. Respondents also reported that the hiring process was unnecessarily stressful, time-consuming, and lacking in feedback, irrespective of outcome. Our findings suggest that there is considerable scope to improve the transparency of the hiring process.
Subject(s)
Career Mobility, Faculty/statistics & numerical data, Research Personnel/statistics & numerical data, Achievement, Female, Humans, Job Application, Knowledge, Male, Publishing, Research, Surveys and Questionnaires, Universities
ABSTRACT
HIV-1 Rev is an essential viral regulatory protein that facilitates the nuclear export of intron-containing viral mRNAs. It is organized into structured, functionally well-characterized motifs joined by less understood linker regions. Our recent competitive deep mutational scanning study confirmed many known constraints in Rev's established motifs, but also identified positions of mutational plasticity, most notably in surrounding linker regions. Here, we probe the mutational limits of these linkers by testing the activities of multiple truncation and mass substitution mutations. We find that these regions possess previously unknown structural, functional or regulatory roles, not apparent from systematic point mutational approaches. Specifically, the N- and C-termini of Rev contribute to protein stability; mutations in a turn that connects the two main helices of Rev have different effects in different contexts; and a linker region which connects the second helix of Rev to its nuclear export sequence has structural requirements for function. Thus, Rev function extends beyond its characterized motifs, and is tuned by determinants within seemingly plastic portions of its sequence. Additionally, Rev's ability to tolerate many of these massive truncations and substitutions illustrates the overall mutational and functional robustness inherent in this viral protein.
Subject(s)
HIV-1/chemistry, rev Gene Products, Human Immunodeficiency Virus/chemistry, Amino Acid Motifs, HEK293 Cells, HIV-1/growth & development, HIV-1/metabolism, Humans, Mutation, Protein Domains, Protein Stability, Structure-Activity Relationship, rev Gene Products, Human Immunodeficiency Virus/genetics, rev Gene Products, Human Immunodeficiency Virus/metabolism
ABSTRACT
HIV replication requires the nuclear export of essential, intron-containing viral RNAs. To facilitate export, HIV encodes the viral accessory protein Rev, which binds unspliced and partially spliced viral RNAs and creates a ribonucleoprotein complex that recruits the cellular chromosome maintenance factor 1 (CRM1) export machinery. Exporting RNAs in this manner bypasses the necessity for complete splicing as a prerequisite for mRNA export, and allows intron-containing RNAs to reach the cytoplasm intact for translation and virus packaging. Recent structural studies have revealed that this entire complex exhibits remarkable plasticity at many levels of organization, including RNA folding, protein-RNA recognition, multimer formation, and host factor recruitment. In this review, we explore each aspect of plasticity from structural, functional, and possible therapeutic viewpoints. WIREs RNA 2016, 7:470-486. doi: 10.1002/wrna.1342