ABSTRACT
Genomic variation can impact normal biological function in complex ways, so understanding variant effects requires a broad range of data to be coherently assimilated. Whilst the volume of human variant data and relevant annotations has increased, the corresponding increase in the breadth of participating fields, standards and versioning means that moving between genomic, coding, protein and structure positions is increasingly complex. In turn, this makes investigating variants in diverse formats and assimilating annotations from different resources challenging. ProtVar addresses these issues to facilitate the contextualization and interpretation of human missense variation with unparalleled flexibility and ease of accessibility for use by the broadest range of researchers. By precalculating all possible variants in the human proteome, it offers near-instantaneous mapping between all relevant data types. It also combines data and analyses from a plethora of resources to bring together genomic, protein sequence and function annotations as well as structural insights and predictions to better understand the likely effect of missense variation in humans. It is offered as an intuitive web server (https://www.ebi.ac.uk/protvar) where data can be explored and downloaded, and can be accessed programmatically via an API.
Subject(s)
Missense Mutation , Software , Humans , Protein Databases , Molecular Sequence Annotation , Proteome/genetics , Proteins/genetics , Proteins/chemistry , Internet , Genomics/methods
ABSTRACT
MOTIVATION: Understanding the protein structural context and patterning on proteins of genomic variants can help to separate benign from pathogenic variants and reveal molecular consequences. However, mapping genomic coordinates to protein structures is non-trivial, complicated by alternative splicing and transcript evidence. RESULTS: Here we present VarMap, a web tool for mapping a list of chromosome coordinates to canonical UniProt sequences and associated protein 3D structures, including validation checks, and annotating them with structural information. AVAILABILITY AND IMPLEMENTATION: https://www.ebi.ac.uk/thornton-srv/databases/VarMap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
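The core arithmetic behind mapping a chromosome coordinate to a residue number can be sketched as follows. This is a minimal illustration and not VarMap's implementation: the function name `genomic_to_protein`, the interval layout, and the example coordinates are all hypothetical, and the sketch assumes a single transcript with non-overlapping coding exon intervals, ignoring the alternative splicing and validation issues the abstract mentions.

```python
def genomic_to_protein(pos, cds_exons, strand="+"):
    """Map a genomic position to (residue number, position within codon).

    Hypothetical helper for illustration only.
    cds_exons: list of (start, end) coding intervals, 1-based inclusive,
               non-overlapping, for ONE transcript (an assumption).
    strand:    "+" or "-".
    Returns None when the position is not inside any coding interval.
    """
    # Walk exons in transcription order: ascending on "+", descending on "-".
    exons = sorted(cds_exons, reverse=(strand == "-"))
    offset = 0  # 0-based offset into the spliced coding sequence
    for start, end in exons:
        if start <= pos <= end:
            offset += (pos - start) if strand == "+" else (end - pos)
            # Three bases per codon; report 1-based residue and codon position.
            return offset // 3 + 1, offset % 3 + 1
        offset += end - start + 1
    return None


# Made-up coordinates: two coding exons of 5 and 11 bases.
example_exons = [(100, 104), (200, 210)]
print(genomic_to_protein(201, example_exons, "+"))
print(genomic_to_protein(150, example_exons, "+"))
```

A real mapping would also need to choose among transcripts and check that the reference base agrees with the expected codon, which is where most of the difficulty described above lies.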
Subject(s)
Genomics , Software , Amino Acid Sequence , Protein Databases , Molecular Sequence Annotation , Proteins
ABSTRACT
Mineral surfaces are often proposed as the sites of critical processes in the emergence of life. Clay minerals in particular are thought to play significant roles in the origin of life including polymerizing, concentrating, organizing, and protecting biopolymers. In these scenarios, the impact of minerals on biopolymer folding is expected to influence evolutionary processes. These processes include both the initial emergence of functional structures in the presence of the mineral and the subsequent transition away from the mineral-associated niche. The initial evolution of function depends upon the number and distribution of sequences capable of functioning in the presence of the mineral, and the transition to new environments depends upon the overlap between sequences that evolve on the mineral surface and sequences that can perform the same functions in the mineral's absence. To examine these processes, we evolved self-cleaving ribozymes in vitro in the presence or absence of Na-saturated montmorillonite clay mineral particles. Starting from a shared population of random sequences, RNA populations were evolved in parallel, along separate evolutionary trajectories. Comparative sequence analysis and activity assays show that the impact of this clay mineral on functional structure selection was minimal; it neither prevented common structures from emerging, nor did it promote the emergence of new structures. This suggests that montmorillonite does not improve RNA's ability to evolve functional structures; however, it also suggests that RNAs that do evolve in contact with montmorillonite retain the same structures in mineral-free environments, potentially facilitating an evolutionary transition away from a mineral-associated niche.
Subject(s)
Minerals/chemistry , Catalytic RNA/chemistry , Aluminum Silicates , Clay , Surface Properties
ABSTRACT
The importance of elucidating the three-dimensional structures of RNA molecules is becoming increasingly clear. However, traditional protein structural techniques such as NMR and X-ray crystallography have several important drawbacks when probing long RNA molecules. Single-molecule Förster resonance energy transfer (smFRET) has emerged as a useful alternative, as it allows native sequences to be probed in physiological conditions and allows multiple conformations to be probed simultaneously. This review describes the method of generating a three-dimensional RNA structure from smFRET data, from the biochemical probing of the secondary structure to the computational refinement of the final model.
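The link between a measured FRET efficiency and an inter-dye distance is the standard Förster relation, E = 1 / (1 + (r / R0)^6). The sketch below shows only that conversion, not the review's full pipeline. The function names are illustrative, and the Förster radius of 54 Å is an assumed placeholder, since the real value depends on the dye pair and conditions.

```python
def fret_efficiency(r, r0):
    """Förster relation: transfer efficiency at donor-acceptor distance r.

    r and r0 (the Förster radius) must be in the same units, e.g. angstroms.
    """
    return 1.0 / (1.0 + (r / r0) ** 6)


def fret_distance(efficiency, r0):
    """Invert the Förster relation to estimate distance from efficiency."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


# R0 = 54.0 is an assumed, illustrative Förster radius in angstroms.
ASSUMED_R0 = 54.0
print(fret_distance(0.5, ASSUMED_R0))  # at E = 0.5 the distance equals R0
```

Because efficiency falls off with the sixth power of distance, estimates are most reliable near R0 and become poorly constrained when the efficiency approaches 0 or 1.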
Subject(s)
Fluorescence Resonance Energy Transfer , RNA/chemistry , Base Sequence , Fluorescence Polarization , Fluorescent Dyes/chemistry , Molecular Models , Nucleic Acid Conformation , RNA/ultrastructure , Staining and Labeling
ABSTRACT
Similarities and differences between amino acids define the rates at which they substitute for one another within protein sequences and the patterns by which these sequences form protein structures. However, there exist many ways to measure similarity, whether one considers the molecular attributes of individual amino acids, the roles that they play within proteins, or some nuanced contribution of each. One popular approach to representing these relationships is to divide the 20 amino acids of the standard genetic code into groups, thereby forming a simplified amino acid alphabet. Here, we develop a method to compare or combine different simplified alphabets, and apply it to 34 simplified alphabets from the scientific literature. We use this method to show that while different suggestions vary and agree in non-intuitive ways, they combine to reveal a consensus view of amino acid similarity that is clearly rooted in physico-chemistry.
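To make the idea of a simplified alphabet concrete, the sketch below represents an alphabet as a partition of the 20 amino acids, rewrites a sequence in the reduced alphabet, and scores how often two alphabets agree on whether a pair of amino acids belongs together. This pairwise agreement score is a generic partition comparison chosen for illustration, not the comparison method developed in the paper, and the two groupings are made-up examples rather than any of the 34 published alphabets.

```python
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_lookup(groups):
    """Map each amino acid to the first letter of its group."""
    lookup = {}
    for group in groups:
        for aa in group:
            if aa in lookup:
                raise ValueError(f"{aa} appears in more than one group")
            lookup[aa] = group[0]
    if set(lookup) != set(AMINO_ACIDS):
        raise ValueError("groups must cover exactly the 20 standard amino acids")
    return lookup


def reduce_sequence(seq, groups):
    """Rewrite a protein sequence using one representative letter per group."""
    lookup = build_lookup(groups)
    return "".join(lookup[aa] for aa in seq)


def pair_agreement(groups_a, groups_b):
    """Fraction of amino acid pairs that both alphabets treat the same way
    (both place the pair together, or both keep it apart)."""
    a, b = build_lookup(groups_a), build_lookup(groups_b)
    pairs = list(combinations(AMINO_ACIDS, 2))
    agree = sum((a[x] == a[y]) == (b[x] == b[y]) for x, y in pairs)
    return agree / len(pairs)


# Two hypothetical groupings, for illustration only.
ALPHABET_A = ["AVLIMFWC", "GPSTNQYH", "DE", "KR"]
ALPHABET_B = ["AVLIMC", "FWYH", "GP", "STNQ", "DE", "KR"]

print(reduce_sequence("MKDE", ALPHABET_A))
print(pair_agreement(ALPHABET_A, ALPHABET_B))
```

Averaging such a score over many alphabets is one simple way to look for a consensus, though the paper's own procedure may differ.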
Subject(s)
Amino Acids/chemistry , Amino Acids/classification , Protein Sequence Analysis/methods , Algorithms , Genetic Code , Proteins/chemistry , Sequence Alignment
ABSTRACT
Constrained Coding Regions (CCRs) in the human genome have been derived from DNA sequencing data of large cohorts of healthy control populations, available in the Genome Aggregation Database (gnomAD) [1]. They identify regions depleted of protein-changing variants and thus identify segments of the genome that have been constrained during human evolution. By mapping these DNA-defined regions from genomic coordinates onto the corresponding protein positions and combining this information with protein annotations, we have explored the distribution of CCRs and compared their co-occurrence with different protein functional features, previously annotated at the amino acid level in public databases. As expected, our results reveal that functional amino acids involved in interactions with DNA/RNA, protein-protein contacts and catalytic sites are the protein features most likely to be highly constrained for variation in the control population. More surprisingly, we also found that linear motifs, linear interacting peptides (LIPs), disorder-order transitions upon binding with other protein partners and liquid-liquid phase separating (LLPS) regions are also strongly associated with high constraint for variability. We also compared intra-species constraints in the human CCRs with inter-species conservation and functional residues to explore how such CCRs may contribute to the analysis of protein variants. As has been previously observed, CCRs are only weakly correlated with conservation, suggesting that intraspecies constraints complement interspecies conservation and can provide more information to interpret variant effects.
Subject(s)
Human Genome , Open Reading Frames , Proteins , Humans , Base Sequence , Human Genome/genetics , Genomics , Proteins/genetics , Chromosome Mapping
ABSTRACT
Loss-of-function of DDX3X is a leading cause of neurodevelopmental disorders (NDD) in females. DDX3X is also a somatically mutated cancer driver gene proposed to have tumour promoting and suppressing effects. We perform saturation genome editing of DDX3X, testing in vitro the functional impact of 12,776 nucleotide variants. We identify 3432 functionally abnormal variants, in three distinct classes. We train a machine learning classifier to identify functionally abnormal variants of NDD-relevance. This classifier has at least 97% sensitivity and 99% specificity to detect variants pathogenic for NDD, substantially out-performing in silico predictors, and resolving up to 93% of variants of uncertain significance. Moreover, functionally-abnormal variants can account for almost all of the excess nonsynonymous DDX3X somatic mutations seen in DDX3X-driven cancers. Systematic maps of variant effects generated in experimentally tractable cell types have the potential to transform clinical interpretation of both germline and somatic disease-associated variation.
Subject(s)
Neoplasms , Neurodevelopmental Disorders , Female , Humans , Gene Editing , Virulence , Neurodevelopmental Disorders/genetics , Neoplasms/genetics , Germ Cells , Germ-Line Mutation , DEAD-box RNA Helicases/genetics
ABSTRACT
VarSite is a web server mapping known disease-associated variants from UniProt and ClinVar, together with natural variants from gnomAD, onto protein 3D structures in the Protein Data Bank. The analyses are primarily image-based and provide both an overview for each human protein, as well as a report for any specific variant of interest. The information can be useful in assessing whether a given variant might be pathogenic or benign. The structural annotations for each position in the protein include protein secondary structure, interactions with ligand, metal, DNA/RNA, or other protein, and various measures of a given variant's possible impact on the protein's function. The 3D locations of the disease-associated variants can be viewed interactively via the 3dmol.js JavaScript viewer, as well as in RasMol and PyMOL. Users can search for specific variants, or sets of variants, by providing the DNA coordinates of the base change(s) of interest. Additionally, various agglomerative analyses are given, such as the mapping of disease and natural variants onto specific Pfam or CATH domains. The server is freely accessible to all at: https://www.ebi.ac.uk/thornton-srv/databases/VarSite.
Subject(s)
Genetic Databases , Proteins/chemistry , Proteins/genetics , Cloud Computing , Computational Biology , Genetic Predisposition to Disease , Genetic Variation , Humans , Molecular Models , Protein Conformation , User-Computer Interface
ABSTRACT
We estimated the genome-wide contribution of recessive coding variation in 6040 families from the Deciphering Developmental Disorders study. The proportion of cases attributable to recessive coding variants was 3.6% in patients of European ancestry, compared with 50% explained by de novo coding mutations. It was higher (31%) in patients with Pakistani ancestry, owing to elevated autozygosity. Half of this recessive burden is attributable to known genes. We identified two genes not previously associated with recessive developmental disorders, KDM5B and EIF3F, and functionally validated them with mouse and cellular models. Our results suggest that recessive coding variants account for a small fraction of currently undiagnosed nonconsanguineous individuals, and that the roles of noncoding variants, incomplete penetrance, and polygenic mechanisms need further exploration.
Subject(s)
Developmental Disabilities/genetics , Recessive Genes , Genetic Code , Genetic Variation , Penetrance , Animals , Animal Disease Models , Eukaryotic Initiation Factor-3/genetics , Europe , Genome-Wide Association Study , Humans , Jumonji Domain-Containing Histone Demethylases/genetics , Mice , Nuclear Proteins/genetics , Pakistan , Phylogeny , Repressor Proteins/genetics
ABSTRACT
We have detected a concentration of boron in martian clay far in excess of that in any previously reported extra-terrestrial object. This enrichment indicates that the chemistry necessary for the formation of ribose, a key component of RNA, could have existed on Mars since the formation of early clay deposits, contemporary to the emergence of life on Earth. Given the greater similarity of Earth and Mars early in their geological history, and the extensive disruption of Earth's earliest mineralogy by plate tectonics, we suggest that the conditions for prebiotic ribose synthesis may be better understood by further Mars exploration.
Subject(s)
Aluminum Silicates/chemistry , Boron/analysis , Extraterrestrial Environment/chemistry , Mars , Clay , Earth (Planet) , Exobiology , Geology , Origin of Life
ABSTRACT
HIV-1 genomic RNA has a noncoding 5' region containing sequential conserved structural motifs that control many parts of the life cycle. Very limited data exist on their three-dimensional (3D) conformation and, hence, how they work structurally. To assemble a working model, we experimentally reassessed secondary structure elements of a 240-nt region and used single-molecule distances, derived from fluorescence resonance energy transfer, between defined locations in these elements as restraints to drive folding of the secondary structure into a 3D model with an estimated resolution below 10 Å. The folded 3D model satisfying the data is consensual with short nuclear-magnetic-resonance-solved regions and reveals previously unpredicted motifs, offering insight into earlier functional assays. It is a 3D representation of this entire region, with implications for RNA dimerization and protein binding during regulatory steps. The structural information of this highly conserved region of the virus has the potential to reveal promising therapeutic targets.