ABSTRACT
Integrating human genomics and proteomics can help elucidate disease mechanisms, identify clinical biomarkers and discover drug targets [1-4]. Because previous proteogenomic studies have focused on common variation via genome-wide association studies, the contribution of rare variants to the plasma proteome remains largely unknown. Here we identify associations between rare protein-coding variants and 2,923 plasma protein abundances measured in 49,736 UK Biobank individuals. Our variant-level exome-wide association study identified 5,433 rare genotype-protein associations, of which 81% were undetected in a previous genome-wide association study of the same cohort [5]. We then looked at aggregate signals using gene-level collapsing analysis, which revealed 1,962 gene-protein associations. Of the 691 gene-level signals from protein-truncating variants, 99.4% were associated with decreased protein levels. STAB1 and STAB2, encoding scavenger receptors involved in plasma protein clearance, emerged as pleiotropic loci, with 77 and 41 protein associations, respectively. We demonstrate the utility of our publicly accessible resource through several applications. These include detailing an allelic series in NLRC4, identifying potential biomarkers for a fatty liver disease-associated variant in HSD17B13 and bolstering phenome-wide association studies by integrating protein quantitative trait loci with protein-truncating variants in collapsing analyses. Finally, we uncover distinct proteomic consequences of clonal haematopoiesis (CH), including an association between TET2-CH and increased FLT3 levels. Our results highlight a considerable role for rare variation in plasma protein abundance and the value of proteogenomics in therapeutic discovery.
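The gene-level collapsing analysis mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the sample and variant identifiers, and the use of a plain difference in means (rather than a regression with covariates) are all assumptions made for illustration.

```python
from statistics import mean

def collapse_carriers(genotypes, qualifying_variants):
    """Collapse variants to the gene level.

    genotypes: dict mapping sample ID -> set of variant IDs carried in one gene
    qualifying_variants: set of variant IDs that pass the (hypothetical) filter,
        for example rare protein-truncating variants
    Returns the set of samples carrying at least one qualifying variant.
    """
    return {s for s, variants in genotypes.items() if variants & qualifying_variants}

def carrier_effect(abundance, carriers):
    """Difference in mean protein abundance, carriers minus non-carriers."""
    in_group = [v for s, v in abundance.items() if s in carriers]
    out_group = [v for s, v in abundance.items() if s not in carriers]
    if not in_group or not out_group:
        raise ValueError("need at least one carrier and one non-carrier")
    return mean(in_group) - mean(out_group)

# Toy data (hypothetical): v1 and v2 are qualifying, v9 is not.
genotypes = {"s1": {"v1"}, "s2": set(), "s3": {"v2", "v9"}, "s4": {"v9"}}
abundance = {"s1": -1.0, "s2": 0.5, "s3": -2.0, "s4": 0.5}

carriers = collapse_carriers(genotypes, {"v1", "v2"})
effect = carrier_effect(abundance, carriers)
```

A negative effect in this toy example mirrors the direction the abstract reports for most protein-truncating signals; a real analysis would test significance and adjust for covariates.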
Subject(s)
Biological Specimen Banks , Blood Proteins , Genetic Association Studies , Genomics , Proteomics , Humans , Alleles , Biomarkers/blood , Blood Proteins/analysis , Blood Proteins/genetics , Databases, Factual , Exome/genetics , Hematopoiesis , Mutation , Plasma/chemistry , United Kingdom
ABSTRACT
African swine fever virus (ASFV) has a major global economic impact. With a case fatality in domestic pigs approaching 100%, it currently presents the largest threat to animal farming. Although genomic differences between attenuated and highly virulent ASFV strains have been identified, the molecular determinants for virulence at the level of gene expression have remained opaque. Here, we characterize the transcriptome of ASFV genotype II Georgia 2007/1 (GRG) during infection of the physiologically relevant host cells, porcine macrophages. In this study, we applied cap analysis gene expression sequencing (CAGE-seq) to map the 5' ends of viral mRNAs at 5 and 16 h postinfection. A bioinformatics analysis of the sequence context surrounding the transcription start sites (TSSs) enabled us to characterize the global early and late promoter landscape of GRG. We compared transcriptome maps of the GRG isolate and the lab-attenuated BA71V strain that highlighted GRG virulence-specific transcripts belonging to multigene families, including two predicted MGF 100 genes, I7L and I8L. In parallel, we monitored transcriptome changes in the infected host macrophage cells. Of the 9,384 macrophage genes studied, transcripts for 652 host genes were differentially regulated between 5 and 16 h postinfection compared with only 25 between uninfected cells and 5 h postinfection. NF-κB activated genes and lysosome components such as S100 were upregulated, and chemokines such as CCL24, CXCL2, CXCL5, and CXCL8 were downregulated. IMPORTANCE African swine fever virus (ASFV) causes hemorrhagic fever in domestic pigs, with case fatality rates approaching 100% and no approved vaccines or antivirals. The highly virulent ASFV Georgia 2007/1 strain (GRG) was the first isolated when ASFV spread from Africa to the Caucasus region in 2007, then spreading through Eastern Europe and, more recently, across Asia.
We used an RNA-based next-generation sequencing technique called CAGE-seq to map the starts of viral genes across the GRG DNA genome. This has allowed us to investigate which viral genes are expressed during early or late stages of infection and how this is controlled, comparing their expression to the nonvirulent ASFV-BA71V strain to identify key genes that play a role in virulence. In parallel, we investigated how host cells respond to infection, which revealed how the ASFV suppresses components of the host immune response to ultimately win the arms race against its porcine host.
Subject(s)
African Swine Fever Virus , African Swine Fever , Host Microbial Interactions , Macrophages , Viral Proteins , African Swine Fever/immunology , African Swine Fever/virology , African Swine Fever Virus/genetics , African Swine Fever Virus/immunology , Animals , Gene Expression Profiling , Georgia (Republic) , Host Microbial Interactions/immunology , Macrophages/immunology , Macrophages/virology , Sus scrofa , Swine , Transcriptome , Viral Proteins/genetics , Viral Proteins/immunology
ABSTRACT
African swine fever virus (ASFV) causes hemorrhagic fever in domestic pigs, presenting the biggest global threat to animal farming in recorded history. Despite the importance of ASFV, little is known about the mechanisms and regulation of ASFV transcription. Using RNA sequencing methods, we have determined total RNA abundance, transcription start sites, and transcription termination sites at single-nucleotide resolution. This allowed us to characterize DNA consensus motifs of early and late ASFV core promoters, as well as a polythymidylate sequence determinant for transcription termination. Our results demonstrate that ASFV utilizes alternative transcription start sites between early and late stages of infection and that ASFV RNA polymerase (RNAP) undergoes promoter-proximal transcript slippage at 5' ends of transcription units, adding quasitemplated AU- and AUAU-5' extensions to mRNAs. Here, we present the first much-needed genome-wide transcriptome study that provides unique insight into ASFV transcription and serves as a resource to aid future functional analyses of ASFV genes which are essential to combat this devastating disease. IMPORTANCE African swine fever virus (ASFV) causes incurable and often lethal hemorrhagic fever in domestic pigs. In 2020, ASF presents an acute and global animal health emergency that has the potential to devastate entire national economies as effective vaccines or antiviral drugs are not currently available (according to the Food and Agriculture Organization of the United Nations). With major outbreaks ongoing in Eastern Europe and Asia, urgent action is needed to advance our knowledge about the fundamental biology of ASFV, including the mechanisms and temporal control of gene expression.
A thorough understanding of RNAP and transcription factor function, and of the sequence context of their promoter motifs, as well as accurate knowledge of which genes are expressed when and the amino acid sequence of the encoded proteins, is direly needed for the development of antiviral drugs and vaccines.
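The polythymidylate termination determinant described in this abstract suggests a simple sequence scan. The sketch below only locates runs of consecutive T residues on a DNA strand; the function name and the minimum run length of 7 are assumptions chosen for illustration, since the abstract does not state a threshold.

```python
import re

def find_poly_t(sequence, min_len=7):
    """Return (start, end) half-open spans of T runs of at least min_len.

    min_len=7 is an illustrative assumption, not a value from the abstract.
    """
    if min_len < 1:
        raise ValueError("min_len must be positive")
    pattern = re.compile("T{%d,}" % min_len)
    return [m.span() for m in pattern.finditer(sequence.upper())]

# Toy sequence (hypothetical): one run of eight Ts and one run of three Ts.
toy = "ACG" + "T" * 8 + "GCA" + "TTT" + "A"
spans = find_poly_t(toy)
```

Only the eight-residue run passes the assumed threshold here; in practice such hits would be compared with mapped transcript 3' ends rather than treated as terminators on their own.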
Subject(s)
African Swine Fever Virus/genetics , African Swine Fever/prevention & control , Transcription, Genetic/genetics , Amino Acid Sequence , Animals , Genome, Viral , Hemorrhagic Fevers, Viral/virology , Sus scrofa/virology , Swine/virology , Transcription Termination, Genetic , Transcriptional Activation/genetics , Transcriptome/genetics , Viral Proteins/genetics
ABSTRACT
Following publication of the original article [1], we have been notified that some important information was omitted by the authors from the Competing interests section. The declaration should read as below.
ABSTRACT
PIN-like domains constitute a widespread superfamily of nucleases, diverse in terms of the reaction mechanism, substrate specificity, biological function and taxonomic distribution. Proteins with PIN-like domains are involved in central cellular processes, such as DNA replication and repair, mRNA degradation, transcription regulation and ncRNA maturation. In this work, we identify and classify the most complete set of PIN-like domains to provide the first comprehensive analysis of sequence-structure-function relationships within the whole PIN domain-like superfamily. Transitive sequence searches using highly sensitive methods for remote homology detection led to the identification of several new families, including representatives of Pfam (DUF1308, DUF4935) and CDD (COG2454), and 23 other families not classified in the public domain databases. Further sequence clustering revealed relationships between individual sequence clusters and showed heterogeneity within some families, suggesting a possible functional divergence. With five structural groups, 70 defined clusters, over 100,000 proteins, and broad biological functions, the PIN domain-like superfamily constitutes one of the largest and most diverse nuclease superfamilies. Detailed analyses of sequences and structures, domain architectures, and genomic contexts allowed us to predict biological function of several new families, including new toxin-antitoxin components, proteins involved in tRNA/rRNA maturation and transcription/translation regulation.
Subject(s)
Deoxyribonucleases/chemistry , Deoxyribonucleases/classification , Ribonucleases/chemistry , Ribonucleases/classification , Amino Acid Sequence , Bacteria/enzymology , Bacteria/genetics , Bacteriophages/enzymology , Bacteriophages/genetics , Binding Sites , Biocatalysis , Crystallography, X-Ray , Deoxyribonucleases/genetics , Deoxyribonucleases/metabolism , Fungi/enzymology , Fungi/genetics , Humans , Kinetics , Models, Molecular , Protein Binding , Protein Conformation, alpha-Helical , Protein Conformation, beta-Strand , Protein Domains , Protein Structure, Tertiary , Ribonucleases/genetics , Ribonucleases/metabolism , Sequence Alignment , Sequence Homology, Amino Acid , Substrate Specificity
ABSTRACT
The His-Me finger endonucleases, also known as HNH or ββα-metal endonucleases, form a large and diverse protein superfamily. The His-Me finger domain can be found in proteins that play an essential role in cells, including genome maintenance, intron homing, host defense and target offense. Its overall structural compactness and non-specificity make it a perfectly-tailored pathogenic module that participates on both sides of inter- and intra-organismal competition. An extremely low sequence similarity across the superfamily makes it difficult to identify and classify new His-Me fingers. Using state-of-the-art distant homology detection methods, we provide an updated and systematic classification of His-Me finger proteins. In this work, we identified over 100 000 proteins and clustered them into 38 groups, of which three groups are new and cannot be found in any existing public domain database of protein families. Based on an analysis of sequences, structures, domain architectures, and genomic contexts, we provide a careful functional annotation of the poorly characterized members of this superfamily. Our results may inspire further experimental investigations that should address the predicted activity and clarify the potential substrates, to provide more detailed insights into the fundamental biological roles of these proteins.
Subject(s)
Catalytic Domain , Endonucleases/classification , Endonucleases/metabolism , Protein Folding , Amino Acid Sequence , Binding Sites , DNA/chemistry , Endonucleases/genetics , Sequence Alignment
ABSTRACT
RNA has been found to play an ever-increasing role in a variety of biological processes. The function of most non-coding RNA molecules depends on their structure. Comparing and classifying macromolecular 3D structures is of crucial importance for structure-based function inference and it is used in the characterization of functional motifs and in structure prediction by comparative modeling. However, compared to the numerous methods for protein structure superposition, there are few tools dedicated to the superimposing of RNA 3D structures. Here, we present SupeRNAlign (v1.3.1), a new method for flexible superposition of RNA 3D structures, and SupeRNAlign-Coffee, a workflow that combines SupeRNAlign with T-Coffee for inferring structure-based sequence alignments. The methods have been benchmarked against eight other methods for RNA structural superposition and alignment. The benchmark included 151 structures from 32 RNA families (with a total of 1734 pairwise superpositions). The accuracy of superpositions was assessed by comparing structure-based sequence alignments to the reference alignments from the Rfam database. SupeRNAlign and SupeRNAlign-Coffee achieved significantly higher scores than most of the benchmarked methods: SupeRNAlign generated the most accurate sequence alignments among the structure superposition methods, and SupeRNAlign-Coffee performed best among the sequence alignment methods.
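The abstract states that accuracy was assessed by comparing structure-based sequence alignments with reference alignments, but it does not name the scoring function. As a minimal sketch, one common choice is the fraction of reference residue pairs that the test alignment reproduces (a sum-of-pairs style score); the function names and toy alignments below are assumptions, not the benchmark's actual code.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Residue index pairs (i, j) that share a column in a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # running residue indices in the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs

def pair_recovery(test, reference):
    """Fraction of reference aligned pairs also present in the test alignment."""
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        return 0.0
    return len(aligned_pairs(*test) & ref_pairs) / len(ref_pairs)

# Toy pairwise alignments of the same two sequences (hypothetical data).
reference = ("GGCA-U", "GG-AAU")
test = ("GGCAU", "GGAAU")
score = pair_recovery(test, reference)
```

In this toy case the ungapped test alignment recovers three of the four reference pairs, so the score is 0.75.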
Subject(s)
RNA/chemistry , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Software , Models, Molecular , Nucleic Acid Conformation
ABSTRACT
The D-2-hydroxyacid dehydrogenase (2HADH) family illustrates a complex evolutionary history with multiple lateral gene transfers and gene duplications and losses. As a result, the exact functional annotation of individual members can be extrapolated to a very limited extent. Here, we revise the previous simplified view on the classification of the 2HADH family; specifically, we show that the previously delineated glyoxylate/hydroxypyruvate reductase (GHPR) subfamily consists of two evolutionarily separate subfamilies, GHRA and GHRB. We compare two representatives of these subfamilies from Sinorhizobium meliloti (SmGhrA and SmGhrB), employing a combination of biochemical, structural, and bioinformatics approaches. Our kinetic results show that both enzymes reduce several 2-ketocarboxylic acids with overlapping, but not equivalent, substrate preferences. SmGhrA and SmGhrB show highest activity with glyoxylate and hydroxypyruvate, respectively; in addition, only SmGhrB reduces 2-keto-D-gluconate, and only SmGhrA reduces pyruvate (with low efficiency). We present nine crystal structures of both enzymes in apo forms and in complexes with cofactors and substrates/substrate analogues. In particular, we determined a crystal structure of SmGhrB with 2-keto-D-gluconate, which is the biggest substrate cocrystallized with a 2HADH member. The structures reveal significant differences between SmGhrA and SmGhrB, both in the overall structure and within the substrate-binding pocket, offering insight into the molecular basis for the observed substrate preferences and subfamily differences. In addition, we provide an overview of all GHRA and GHRB structures complexed with a ligand in the active site.
Subject(s)
Alcohol Oxidoreductases/chemistry , Aldehyde Oxidoreductases/chemistry , Bacterial Proteins/chemistry , Hydroxypyruvate Reductase/chemistry , Sinorhizobium meliloti/enzymology , Alcohol Oxidoreductases/classification , Alcohol Oxidoreductases/genetics , Alcohol Oxidoreductases/metabolism , Aldehyde Oxidoreductases/classification , Aldehyde Oxidoreductases/genetics , Aldehyde Oxidoreductases/metabolism , Bacterial Proteins/classification , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Crystallography, X-Ray , Hydroxypyruvate Reductase/classification , Hydroxypyruvate Reductase/genetics , Hydroxypyruvate Reductase/metabolism , Kinetics , Models, Molecular , Phylogeny , Protein Conformation , Sinorhizobium meliloti/chemistry , Sinorhizobium meliloti/genetics , Sinorhizobium meliloti/metabolism , Substrate Specificity
ABSTRACT
BACKGROUND: The family of D-isomer specific 2-hydroxyacid dehydrogenases (2HADHs) contains a wide range of oxidoreductases with various metabolic roles as well as biotechnological applications. Despite a vast amount of biochemical and structural data for various representatives of the family, the long and complex evolution and broad sequence diversity hinder functional annotations for uncharacterized members. RESULTS: We report an in-depth phylogenetic analysis, followed by mapping of available biochemical and structural data on the reconstructed phylogenetic tree. The analysis suggests that some subfamilies comprising enzymes with similar yet broad substrate specificity profiles diverged early in the evolution of 2HADHs. Based on the phylogenetic tree, we present a revised classification of the family that comprises 22 subfamilies, including 13 new subfamilies not studied biochemically. We summarize characteristics of the nine biochemically studied subfamilies by aggregating all available sequence, biochemical, and structural data, providing comprehensive descriptions of the active site, cofactor-binding residues, and potential roles of specific structural regions in substrate recognition. In addition, we concisely present our analysis as an online 2HADH enzymes knowledgebase. CONCLUSIONS: The knowledgebase enables navigation over the 2HADHs classification, search through collected data, and functional predictions of uncharacterized 2HADHs. Future characterization of the new subfamilies may result in discoveries of enzymes with novel metabolic roles and with properties beneficial for biotechnological applications.
Subject(s)
Alcohol Oxidoreductases/chemistry , Alcohol Oxidoreductases/classification , Knowledge Bases , Alcohol Oxidoreductases/metabolism , Amino Acid Sequence , Catalytic Domain , Coenzymes/metabolism , Likelihood Functions , Phylogeny , Substrate Specificity
ABSTRACT
BACKGROUND: Coagulase-negative staphylococci (CoNS) are commensal bacteria on human skin. Staphylococcus lugdunensis is a unique CoNS which produces various virulence factors and may, like S. aureus, cause severe infections, particularly in hospital settings. Unlike other staphylococci, it remains highly susceptible to antimicrobials, and genome-based phylogenetic studies have evidenced a highly conserved genome that distinguishes it from all other staphylococci. RESULTS: We demonstrate that S. lugdunensis possesses a closed pan-genome with a very limited number of new genes, in contrast to other staphylococci that have an open pan-genome. Whole-genome nucleotide and amino acid identity levels are also higher than in other staphylococci. We identified numerous genetic barriers to horizontal gene transfer that might explain this result. The S. lugdunensis genome has multiple operons encoding restriction-modification, CRISPR/Cas and toxin/antitoxin systems. We also identified a new PIN-like domain-associated protein that might belong to a larger operon, comprising a metalloprotease, that could function as a new toxin/antitoxin or detoxification system. CONCLUSION: We show that S. lugdunensis has a unique genome profile within staphylococci, with a closed pan-genome and several systems to prevent horizontal gene transfer. Its virulence in clinical settings does not rely on its ability to acquire and exchange antibiotic resistance genes or other virulence factors as shown for other staphylococci.
Subject(s)
Gene Transfer, Horizontal/genetics , Genome, Bacterial , Staphylococcus lugdunensis/genetics , CRISPR-Cas Systems/genetics , Humans , Phylogeny , Sequence Analysis, DNA , Staphylococcal Infections/microbiology , Virulence , Virulence Factors/genetics
ABSTRACT
This paper is a report of a second round of RNA-Puzzles, a collective and blind experiment in three-dimensional (3D) RNA structure prediction. Three puzzles, Puzzles 5, 6, and 10, represented sequences of three large RNA structures with limited or no homology with previously solved RNA molecules. A lariat-capping ribozyme, as well as riboswitches complexed to adenosylcobalamin and tRNA, were predicted by seven groups using RNAComposer, ModeRNA/SimRNA, Vfold, Rosetta, DMD, MC-Fold, 3dRNA, and AMBER refinement. Some groups derived models using data from state-of-the-art chemical-mapping methods (SHAPE, DMS, CMCT, and mutate-and-map). The comparisons between the predictions and the three subsequently released crystallographic structures, solved at diffraction resolutions of 2.5-3.2 Å, were carried out automatically using various sets of quality indicators. The comparisons clearly demonstrate the state of present-day de novo prediction abilities as well as the limitations of these state-of-the-art methods. All of the best prediction models have similar topologies to the native structures, which suggests that computational methods for RNA structure prediction can already provide useful structural information for biological problems. However, the prediction accuracy for non-Watson-Crick interactions, key to proper folding of RNAs, is low and some predicted models had high Clash Scores. These two difficulties point to some of the continuing bottlenecks in RNA structure prediction. All submitted models are available for download at http://ahsoka.u-strasbg.fr/rnapuzzles/.
Subject(s)
Computational Biology/methods , RNA/chemistry , Crystallography, X-Ray , Models, Molecular , Nucleic Acid Conformation , RNA, Messenger/chemistry , RNA, Transfer/chemistry , Software
ABSTRACT
α2β1 integrin is one of the most important collagen-binding receptors, and it has been implicated in numerous thrombotic and immune diseases. α2β1 integrin is a potent tumour suppressor, and its downregulation is associated with increased metastasis and poor prognosis in breast cancer. Currently, very little is known about the mechanism that regulates the cell-surface expression and trafficking of α2β1 integrin. Here, using a quantitative fluorescence-microscopy-based RNAi assay, we investigated the impact of 386 cytoskeleton-associated or -regulatory genes on α2 integrin endocytosis and found that 122 of these affected the intracellular accumulation of α2 integrin. Of these, 83 were found to be putative regulators of α2 integrin trafficking and/or expression, with no observed effect on the internalization of epidermal growth factor (EGF) or transferrin. Further interrogation and validation of the siRNA screen revealed a role for KIF15, a microtubule-based molecular motor, as a significant inhibitor of the endocytic trafficking of α2 integrin. Our data suggest a novel role for KIF15 in mediating plasma membrane localization of the alternative clathrin adaptor Dab2, thus impinging on pathways that regulate α2 integrin internalization.
Subject(s)
Adaptor Proteins, Signal Transducing/metabolism , Breast Neoplasms/genetics , Cell Membrane/metabolism , Integrin alpha2beta1/metabolism , Kinesins/metabolism , Tumor Suppressor Proteins/metabolism , Apoptosis Regulatory Proteins , Collagen/metabolism , Cytoskeleton/genetics , Endocytosis/genetics , Female , Genetic Testing/methods , HeLa Cells , Humans , Integrin alpha2beta1/genetics , Kinesins/genetics , Microscopy, Fluorescence , Neoplasm Metastasis , Protein Binding/genetics , Protein Transport/genetics , RNA Interference , RNA, Small Interfering/genetics
ABSTRACT
Prokaryotic ribosomal protein genes are typically grouped within highly conserved operons. In many cases, one or more of the encoded proteins bind not only to a specific site in the ribosomal RNA but also to a motif localized within their own mRNA, and thereby regulate expression of the operon. In this study, we computationally predicted an RNA motif present in many bacterial phyla within the 5' untranslated region of operons encoding ribosomal proteins S6 and S18. We demonstrated that the S6:S18 complex binds to this motif, which we hereafter refer to as the S6:S18 complex-binding motif (S6S18CBM). This motif is a conserved CCG sequence presented in a bulge flanked by a stem and a hairpin structure. A similar structure containing a CCG trinucleotide forms the S6:S18 complex binding site in 16S ribosomal RNA. We have constructed a 3D structural model of an S6:S18 complex with S6S18CBM, which suggests that the CCG trinucleotide in a specific structural context may be specifically recognized by the S18 protein. This prediction was supported by site-directed mutagenesis of both RNA and protein components. These results provide a molecular basis for understanding protein-RNA recognition and suggest that the S6S18CBM is involved in an auto-regulatory mechanism.
Subject(s)
Bacterial Proteins/metabolism , Nucleic Acid Conformation , RNA, Bacterial/metabolism , RNA, Messenger/metabolism , RNA, Ribosomal/metabolism , Ribosomal Protein S6/metabolism , Ribosomal Proteins/metabolism , 5' Untranslated Regions/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Base Pairing , Base Sequence , Binding Sites , Electrophoretic Mobility Shift Assay , Escherichia coli/genetics , Escherichia coli/metabolism , Models, Molecular , Molecular Sequence Data , Operon/genetics , Protein Binding , Protein Structure, Tertiary , RNA, Bacterial/chemistry , RNA, Bacterial/genetics , RNA, Messenger/chemistry , RNA, Messenger/genetics , RNA, Ribosomal/chemistry , RNA, Ribosomal/genetics , Ribosomal Protein S6/chemistry , Ribosomal Protein S6/genetics , Ribosomal Proteins/chemistry , Ribosomal Proteins/genetics , Ribosomes/chemistry , Ribosomes/genetics , Ribosomes/metabolism , Sequence Homology, Nucleic Acid , Thermus thermophilus/genetics , Thermus thermophilus/metabolism
ABSTRACT
Protein-RNA interactions play fundamental roles in many biological processes, such as regulation of gene expression, RNA splicing, and protein synthesis. The understanding of these processes improves as new structures of protein-RNA complexes are solved and the molecular details of interactions analyzed. However, experimental determination of protein-RNA complex structures by high-resolution methods is tedious and difficult. Therefore, studies on protein-RNA recognition and complex formation present major technical challenges for macromolecular structural biology. Alternatively, protein-RNA interactions can be predicted by computational methods. Although less accurate than experimental measurements, theoretical models of macromolecular structures can be sufficiently accurate to prompt functional hypotheses and guide e.g. identification of important amino acid or nucleotide residues. In this article we present an overview of strategies and methods for computational modeling of protein-RNA complexes, including software developed in our laboratory, and illustrate it with practical examples of structural predictions.
Subject(s)
Computational Biology/methods , Escherichia coli Proteins/chemistry , RNA, Ribosomal, 16S/chemistry , RNA-Binding Proteins/chemistry , Riboswitch/genetics , Software , Bacillus subtilis/chemistry , Binding Sites , Databases, Protein , Escherichia coli/chemistry , Molecular Conformation , Molecular Docking Simulation , Protein Binding , Thermoanaerobacter/chemistry
ABSTRACT
In addition to mRNAs whose primary function is transmission of genetic information from DNA to proteins, numerous other classes of RNA molecules exist, which are involved in a variety of functions, such as catalyzing biochemical reactions or performing regulatory roles. In analogy to proteins, the function of RNAs depends on their structure and dynamics, which are largely determined by the ribonucleotide sequence. Experimental determination of high-resolution RNA structures is both laborious and difficult, and therefore, the majority of known RNAs remain structurally uncharacterized. To address this problem, computational structure prediction methods were developed that simulate either the physical process of RNA structure formation ("Greek science" approach) or utilize information derived from known structures of other RNA molecules ("Babylonian science" approach). All computational methods suffer from various limitations that make them generally unreliable for structure prediction of long RNA sequences. However, in many cases, the limitations of computational and experimental methods can be overcome by combining these two complementary approaches with each other. In this work, we review computational approaches for RNA structure prediction, with emphasis on implementations (particular programs) that can utilize restraints derived from experimental analyses. We also list experimental approaches, whose results can be relatively easily used by computational methods. Finally, we describe case studies where computational and experimental analyses were successfully combined to determine RNA structures that would remain out of reach for each of these approaches applied separately.
Subject(s)
Models, Molecular , Nucleic Acid Conformation , RNA/chemistry , Algorithms , Base Pairing , Computational Biology/methods , Evolution, Molecular , RNA/genetics , Solvents , Thermodynamics
ABSTRACT
The emergence of biobank-level datasets offers new opportunities to discover novel biomarkers and develop predictive algorithms for human disease. Here, we present an ensemble machine-learning framework (machine learning with phenotype associations, MILTON) utilizing a range of biomarkers to predict 3,213 diseases in the UK Biobank. Leveraging the UK Biobank's longitudinal health record data, MILTON predicts incident disease cases undiagnosed at time of recruitment, largely outperforming available polygenic risk scores. We further demonstrate the utility of MILTON in augmenting genetic association analyses in a phenome-wide association study of 484,230 genome-sequenced samples, along with 46,327 samples with matched plasma proteomics data. This resulted in improved signals for 88 known (P < 1 × 10⁻⁸) gene-disease relationships alongside 182 gene-disease relationships that did not achieve genome-wide significance in the nonaugmented baseline cohorts. We validated these discoveries in the FinnGen biobank alongside two orthogonal machine-learning methods built for gene-disease prioritization. All extracted gene-disease associations and incident disease predictive biomarkers are publicly available (http://milton.public.cgr.astrazeneca.com).
Subject(s)
Biological Specimen Banks , Biomarkers , Genetic Predisposition to Disease , Genome-Wide Association Study , Machine Learning , Humans , United Kingdom , Genome-Wide Association Study/methods , Case-Control Studies , Multifactorial Inheritance/genetics , Proteomics/methods , Phenotype , Polymorphism, Single Nucleotide , Algorithms , Multiomics , UK Biobank
ABSTRACT
In eukaryotes, histone paralogues form obligate heterodimers such as H3/H4 and H2A/H2B that assemble into octameric nucleosome particles. Archaeal histones are dimeric and assemble on DNA into 'hypernucleosome' particles of varying sizes with each dimer wrapping 30 bp of DNA. These are composed of canonical and variant histone paralogues, but the function of these variants is poorly understood. Here, we characterise the structure and function of the histone paralogue MJ1647 from Methanocaldococcus jannaschii that has a unique C-terminal extension enabling homotetramerisation. The 1.9 Å X-ray structure of a dimeric MJ1647 species, structural modelling of the tetramer, and site-directed mutagenesis reveal that the C-terminal tetramerization module consists of two alpha helices in a handshake arrangement. Unlike canonical histones, MJ1647 tetramers can bridge two DNA molecules in vitro. Using single-molecule tethered particle motion and DNA binding assays, we show that MJ1647 tetramers bind ~60 bp DNA and compact DNA in a highly cooperative manner. We furthermore show that MJ1647 effectively competes with the transcription machinery to block access to the promoter in vitro. To the best of our knowledge, MJ1647 is the first histone shown to have DNA bridging properties, which has important implications for genome structure and gene expression in archaea.
Subject(s)
DNA , Histones , Histones/genetics , DNA/genetics , Archaea/genetics , Biological Assay , Eukaryota , Polymers
ABSTRACT
Large reference datasets of protein-coding variation in human populations have allowed us to determine which genes and genic subregions are intolerant to germline genetic variation. There is also a growing number of genes implicated in severe Mendelian diseases that overlap with genes implicated in cancer. We hypothesized that cancer-driving mutations might be enriched in genic subregions that are depleted of germline variation relative to somatic variation. We introduce a new metric, OncMTR (oncology missense tolerance ratio), which uses 125,748 exomes in the Genome Aggregation Database (gnomAD) to identify these genic subregions. We demonstrate that OncMTR can significantly predict driver mutations implicated in hematologic malignancies. Divergent OncMTR regions were enriched for cancer-relevant protein domains, and overlaying OncMTR scores on protein structures identified functionally important protein residues. Last, we performed a rare-variant, gene-based collapsing analysis on an independent set of 394,694 exomes from the UK Biobank and found that OncMTR markedly improves genetic signals for hematologic malignancies.
Subject(s)
Germ-Line Mutation , Hematologic Neoplasms , Germ Cells , Hematologic Neoplasms/genetics , Humans
ABSTRACT
Recruitment of RNA polymerase and initiation factors to the promoter is the only known target for transcription activation and repression in archaea. Whether any of the subsequent steps towards productive transcription elongation are involved in regulation is not known. We characterised how the basal transcription machinery is distributed along genes in the archaeon Saccharolobus solfataricus. We discovered a distinct early elongation phase where RNA polymerases sequentially recruit the elongation factors Spt4/5 and Elf1 to form the transcription elongation complex (TEC) before the TEC escapes into productive transcription. TEC escape is rate-limiting for transcription output during exponential growth. Oxidative stress causes changes in TEC escape that correlate with changes in the transcriptome. Our results thus establish that TEC escape contributes to the basal promoter strength and facilitates transcription regulation. Impaired TEC escape coincides with the accumulation of initiation factors at the promoter and recruitment of termination factor aCPSF1 to the early TEC. This suggests two possible mechanisms for how TEC escape limits transcription: physical blocking of upstream RNA polymerases during transcription initiation, and premature termination of early TECs.
Subject(s)
Promoter Regions, Genetic , Sulfolobus solfataricus/genetics , Transcription Elongation, Genetic , CRISPR-Cas Systems/genetics , DNA/metabolism , DNA-Directed RNA Polymerases/metabolism , Oxidative Stress/genetics , Regression Analysis , Sulfolobus solfataricus/growth & development
ABSTRACT
Inhibition of RNA polymerase (RNAP) plays an important role in the regulation of transcription in response to environmental changes and in the virus-host relationship. Here we present the high-resolution structures of two RNAP-inhibitor complexes that provide the structural bases underlying RNAP inhibition in archaea. The Acidianus two-tailed virus encodes the RIP factor, which binds inside the DNA-binding channel of RNAP and inhibits transcription by occluding the binding sites for nucleic acid and the transcription initiation factor TFB. Infection with the Sulfolobus Turreted Icosahedral Virus induces the expression of the host factor TFS4, which binds in the RNAP funnel similarly to eukaryotic transcript cleavage factors. However, TFS4 allosterically induces a widening of the DNA-binding channel, which disrupts the trigger loop and bridge helix motifs. Importantly, the conformational changes induced by TFS4 are closely related to inactivated states of RNAP in other domains of life, indicating a deep evolutionary conservation of allosteric RNAP inhibition.