Results 1 - 20 of 69
1.
J Biol Chem; 300(5): 107250, 2024 May.
Article in English | MEDLINE | ID: mdl-38569935

ABSTRACT

The process of heme binding to a protein is prevalent in almost all forms of life to control many important biological properties, such as O2-binding, electron transfer, gas sensing or to build catalytic power. In these cases, heme typically binds tightly (irreversibly) to a protein in a discrete heme binding pocket, with one or two heme ligands provided most commonly to the heme iron by His, Cys or Tyr residues. Heme binding can also be used as a regulatory mechanism, for example in transcriptional regulation or ion channel control. When used as a regulator, heme binds more weakly, with different heme ligations and without the need for a discrete heme pocket. This makes the characterization of heme regulatory proteins difficult, and new approaches are needed to predict and understand the heme-protein interactions. We apply a modified version of the ProFunc bioinformatics tool to identify heme-binding sites in a test set of heme-dependent regulatory proteins taken from the Protein Data Bank and AlphaFold models. The potential heme binding sites identified can be easily visualized in PyMol and, if necessary, optimized with RosettaDOCK. We demonstrate that the methodology can be used to identify heme-binding sites in proteins, including in cases where there is no crystal structure available, but the methodology is more accurate when the quality of the structural information is high. The ProFunc tool, with the modification used in this work, is publicly available at https://www.ebi.ac.uk/thornton-srv/databases/profunc and can be readily adopted for the examination of new heme binding targets.


Subjects
Heme; Protein Binding; Humans; Binding Sites; Computational Biology/methods; Computer Simulation; Databases, Protein; Heme/metabolism; Heme/chemistry; Hemeproteins/metabolism; Hemeproteins/chemistry; Hemeproteins/genetics; Models, Molecular; Protein Structure, Tertiary
2.
Blood; 142(24): 2055-2068, 2023 Dec 14.
Article in English | MEDLINE | ID: mdl-37647632

ABSTRACT

Rare genetic diseases affect millions, and identifying causal DNA variants is essential for patient care. Therefore, it is imperative to estimate the effect of each independent variant and improve their pathogenicity classification. Our study of 140 214 unrelated UK Biobank (UKB) participants found that each of them carries a median of 7 variants previously reported as pathogenic or likely pathogenic. We focused on 967 diagnostic-grade gene (DGG) variants for rare bleeding, thrombotic, and platelet disorders (BTPDs) observed in 12 367 UKB participants. By association analysis, for a subset of these variants, we estimated effect sizes for platelet count and volume, and odds ratios for bleeding and thrombosis. Variants causal of some autosomal recessive platelet disorders revealed phenotypic consequences in carriers. Loss-of-function variants in MPL, which cause chronic amegakaryocytic thrombocytopenia if biallelic, were unexpectedly associated with increased platelet counts in carriers. We also demonstrated that common variants identified by genome-wide association studies (GWAS) for platelet count or thrombosis risk may influence the penetrance of rare variants in BTPD DGGs on their associated hemostasis disorders. Network-propagation analysis applied to an interactome of 18 410 nodes and 571 917 edges showed that GWAS variants with large effect sizes are enriched in DGGs and their first-order interactors. Finally, we illustrate the modifying effect of polygenic scores for platelet count and thrombosis risk on disease severity in participants carrying rare variants in TUBB1 or PROC and PROS1, respectively. Our findings demonstrate the power of association analyses using large population datasets in improving pathogenicity classifications of rare variants.


Subjects
Genome-Wide Association Study; Thrombosis; Humans; Biological Specimen Banks; Hemostasis; Hemorrhage/genetics; Rare Diseases
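
The abstract of record 2 reports odds ratios for bleeding and thrombosis among variant carriers. As a minimal sketch of what such a quantity measures, the following computes an odds ratio with a Wald 95% confidence interval from a 2x2 table. The function name and the counts are hypothetical and not taken from the study, which may well have used regression models with covariates rather than a raw table.

```python
import math


def odds_ratio(carriers_with, carriers_without, noncarriers_with, noncarriers_without):
    """Return (odds ratio, lower, upper) using a Wald 95% confidence interval."""
    a, b, c, d = carriers_with, carriers_without, noncarriers_with, noncarriers_without
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction: add 0.5 to every cell of a sparse table.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# Hypothetical counts for illustration only.
estimate, lower, upper = odds_ratio(12, 88, 500, 9500)
print(f"OR = {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

An interval that excludes 1 suggests an association between carrier status and the outcome under this simple model.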
3.
BMC Bioinformatics; 21(1): 586, 2020 Dec 29.
Article in English | MEDLINE | ID: mdl-33375946

ABSTRACT

BACKGROUND: Proteases are key drivers in many biological processes, in part due to their specificity towards their substrates. However, depending on the family and molecular function, they can also display substrate promiscuity which can also be essential. Databases compiling specificity matrices derived from experimental assays have provided valuable insights into protease substrate recognition. Despite this, there are still gaps in our knowledge of the structural determinants. Here, we compile a set of protease crystal structures with bound peptide-like ligands to create a protocol for modelling substrates bound to protease structures, and for studying observables associated to the binding recognition. RESULTS: As an application, we modelled a subset of protease-peptide complexes for which experimental cleavage data are available to compare with informational entropies obtained from protease-specificity matrices. The modelled complexes were subjected to conformational sampling using the Backrub method in Rosetta, and multiple observables from the simulations were calculated and compared per peptide position. We found that some of the calculated structural observables, such as the relative accessible surface area and the interaction energy, can help characterize a protease's substrate recognition, giving insights for the potential prediction of novel substrates by combining additional approaches. CONCLUSION: Overall, our approach provides a repository of protease structures with annotated data, and an open source computational protocol to reproduce the modelling and dynamic analysis of the protease-peptide complexes.


Subjects
Models, Molecular; Peptide Hydrolases/metabolism; Peptides/chemistry; Peptides/metabolism; Automation; Ligands; Peptide Hydrolases/chemistry; Protein Conformation; Software; Substrate Specificity
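
Record 3 compares structural observables with informational entropies obtained from protease-specificity matrices. A rough sketch of that entropy, assuming each matrix column holds observed amino acid counts at one substrate position, is the Shannon entropy in bits. The toy matrix and the function name below are hypothetical, and the authors' exact normalization is not stated in the abstract.

```python
import math


def positional_entropy(counts):
    """Shannon entropy (bits) of one specificity-matrix column.

    counts maps amino acid -> number of observed substrates with that
    residue at the position.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("position has no observations")
    entropy = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            entropy -= p * math.log2(p)
    return entropy


# Hypothetical toy matrix: P1 is selective, P2 is promiscuous.
specificity = {
    "P2": {"A": 5, "L": 5, "V": 5, "G": 5},
    "P1": {"R": 18, "K": 2},
    "P1'": {"S": 10, "A": 10},
}
for position, counts in specificity.items():
    print(position, round(positional_entropy(counts), 3))
```

Low entropy marks a position where the protease is selective; high entropy marks a position that tolerates many residues.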
4.
Bioinformatics; 35(22): 4854-4856, 2019 Nov 01.
Article in English | MEDLINE | ID: mdl-31192369

ABSTRACT

MOTIVATION: Understanding the protein structural context and patterning on proteins of genomic variants can help to separate benign from pathogenic variants and reveal molecular consequences. However, mapping genomic coordinates to protein structures is non-trivial, complicated by alternative splicing and transcript evidence. RESULTS: Here we present VarMap, a web tool for mapping a list of chromosome coordinates to canonical UniProt sequences and associated protein 3D structures, including validation checks, and annotating them with structural information. AVAILABILITY AND IMPLEMENTATION: https://www.ebi.ac.uk/thornton-srv/databases/VarMap. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genomics; Software; Amino Acid Sequence; Databases, Protein; Molecular Sequence Annotation; Proteins
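
Record 4 notes that mapping genomic coordinates to protein positions is non-trivial. The core arithmetic for a single transcript can be sketched as follows, assuming the coding segments of the exons are already known. The function name and coordinates are hypothetical; this is not VarMap's implementation, which additionally handles transcript selection, alignment to canonical UniProt sequences, and validation checks.

```python
def genomic_to_residue(position, cds_exons, strand="+"):
    """Map a 1-based genomic position to (residue number, codon offset).

    cds_exons: list of (start, end) inclusive coding segments in genomic
    coordinates. Returns None when the position is not coding.
    """
    # Walk the segments in transcription order.
    ordered = sorted(cds_exons, reverse=(strand == "-"))
    cds_offset = 0  # coding bases consumed by earlier segments
    for start, end in ordered:
        if start <= position <= end:
            within = position - start if strand == "+" else end - position
            cds_index = cds_offset + within  # 0-based index in the CDS
            return cds_index // 3 + 1, cds_index % 3
        cds_offset += end - start + 1
    return None


# Hypothetical two-exon transcript.
exons = [(100, 104), (200, 210)]
print(genomic_to_residue(201, exons))        # plus strand
print(genomic_to_residue(104, exons, "-"))   # minus strand
print(genomic_to_residue(150, exons))        # intronic position
```

Codons split across an exon boundary fall out naturally because the offset counts coding bases rather than genomic distance.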
5.
Hum Mol Genet; 26(3): 519-526, 2017 Feb 01.
Article in English | MEDLINE | ID: mdl-28053047

ABSTRACT

Haploinsufficiency in DYRK1A is associated with a recognizable developmental syndrome, though the mechanism of action of pathogenic missense mutations is currently unclear. Here we present 19 de novo mutations in this gene, including five missense mutations, identified by the Deciphering Developmental Disorder study. Protein structural analysis reveals that the missense mutations are either close to the ATP or peptide binding-sites within the kinase domain, or are important for protein stability, suggesting they lead to a loss of the protein's function mechanism. Furthermore, there is some correlation between the magnitude of the change and the severity of the resultant phenotype. A comparison of the distribution of the pathogenic mutations along the length of DYRK1A with that of natural variants, as found in the ExAC database, confirms that mutations in the N-terminal end of the kinase domain are more disruptive of protein function. In particular, pathogenic mutations occur in significantly closer proximity to the ATP and the substrate peptide than the natural variants. Overall, we suggest that de novo dominant mutations in DYRK1A account for nearly 0.5% of severe developmental disorders due to substantially reduced kinase function.


Subjects
Autistic Disorder/genetics; Developmental Disabilities/genetics; Intellectual Disability/genetics; Protein Serine-Threonine Kinases/genetics; Protein-Tyrosine Kinases/genetics; Autistic Disorder/pathology; Developmental Disabilities/physiopathology; Female; Haploinsufficiency/genetics; Humans; Intellectual Disability/pathology; Male; Mutation; Mutation, Missense; Pedigree; Phenotype; Protein Conformation; Protein Serine-Threonine Kinases/chemistry; Protein-Tyrosine Kinases/chemistry; Structure-Activity Relationship; Dyrk Kinases
6.
Hum Mol Genet; 25(5): 927-35, 2016 Mar 01.
Article in English | MEDLINE | ID: mdl-26740553

ABSTRACT

We present a generic, multidisciplinary approach for improving our understanding of novel missense variants in recently discovered disease genes exhibiting genetic heterogeneity, by combining clinical and population genetics with protein structural analysis. Using six new de novo missense diagnoses in TBL1XR1 from the Deciphering Developmental Disorders study, together with population variation data, we show that the β-propeller structure of the ubiquitous WD40 domain provides a convincing way to discriminate between pathogenic and benign variation. Children with likely pathogenic mutations in this gene have severely delayed language development, often accompanied by intellectual disability, autism, dysmorphology and gastrointestinal problems. Amino acids affected by likely pathogenic missense mutations are either crucial for the stability of the fold, forming part of a highly conserved symmetrically repeating hydrogen-bonded tetrad, or located at the top face of the β-propeller, where 'hotspot' residues affect the binding of β-catenin to the TBLR1 protein. In contrast, those altered by population variation are significantly less likely to be spatially clustered towards the top face or to be at buried or highly conserved residues. This result is useful not only for interpreting benign and pathogenic missense variants in this gene, but also in other WD40 domains, many of which are associated with disease.


Subjects
Developmental Disabilities/diagnosis; Developmental Disabilities/genetics; Genetic Heterogeneity; Mutation, Missense; Nuclear Proteins/chemistry; Receptors, Cytoplasmic and Nuclear/chemistry; Repressor Proteins/chemistry; beta Catenin/chemistry; Amino Acid Sequence; Child; Child, Preschool; Developmental Disabilities/metabolism; Developmental Disabilities/pathology; Female; Gene Expression; Genetics, Population; Humans; Hydrogen Bonding; Male; Models, Molecular; Molecular Sequence Data; Nuclear Proteins/genetics; Nuclear Proteins/metabolism; Prognosis; Protein Binding; Protein Domains; Protein Structure, Secondary; Receptors, Cytoplasmic and Nuclear/genetics; Receptors, Cytoplasmic and Nuclear/metabolism; Repressor Proteins/genetics; Repressor Proteins/metabolism; Sequence Alignment; beta Catenin/genetics; beta Catenin/metabolism
7.
Nucleic Acids Res; 44(W1): W416-23, 2016 Jul 08.
Article in English | MEDLINE | ID: mdl-27151195

ABSTRACT

Many applications, such as protein design, homology modeling, flexible docking, etc. require the prediction of a protein's optimal side-chain conformations from just its amino acid sequence and backbone structure. Side-chain prediction (SCP) is an NP-hard energy minimization problem. Here, we present BetaSCPWeb which efficiently computes a conformation close to optimal using a geometry-prioritization method based on the Voronoi diagram of spherical atoms. Its outputs are visual, textual and PDB file format. The web server is free and open to all users at http://voronoi.hanyang.ac.kr/betascpweb with no login requirement.


Subjects
Internet; Mathematics; Proteins/chemistry; Software; Algorithms; Amino Acid Sequence; Databases, Protein; Models, Molecular; Protein Conformation; Thermodynamics
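
Record 7 describes side-chain prediction as an NP-hard energy minimization problem. To make the combinatorial nature concrete, here is a brute-force sketch over a tiny discrete rotamer set, assuming the total energy is a sum of per-residue and pairwise terms. This is not the geometry-prioritization method used by BetaSCPWeb; the function name and all energies are hypothetical.

```python
from itertools import product


def best_rotamers(self_energy, pair_energy):
    """Exhaustively find the rotamer assignment with the lowest total energy.

    self_energy[i][r]        energy of rotamer r at residue i
    pair_energy[(i, j)][r][s] interaction energy of rotamer r at residue i
                              with rotamer s at residue j (i < j)
    """
    n = len(self_energy)
    best_total, best_choice = float("inf"), None
    # One candidate per element of the Cartesian product of rotamer indices.
    for choice in product(*(range(len(e)) for e in self_energy)):
        total = sum(self_energy[i][choice[i]] for i in range(n))
        total += sum(
            table[choice[i]][choice[j]] for (i, j), table in pair_energy.items()
        )
        if total < best_total:
            best_total, best_choice = total, choice
    return best_choice, best_total


# Hypothetical energies for three residues with two rotamers each.
self_energy = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]
pair_energy = {
    (0, 1): [[5.0, 0.0], [0.0, 0.0]],  # clash between rotamer 0 of residues 0 and 1
    (1, 2): [[0.0, 0.0], [0.0, 4.0]],  # clash between rotamer 1 of residues 1 and 2
}
choice, total = best_rotamers(self_energy, pair_energy)
print(choice, round(total, 3))
```

The search space is the product of the rotamer counts, so it grows exponentially with the number of residues, which is why practical tools rely on heuristics or pruning.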
8.
Nucleic Acids Res; 43(W1): W413-8, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25904629

ABSTRACT

Molecular cavities, which include voids and channels, are critical for molecular function. We present a webserver, BetaCavityWeb, which computes these cavities for a given molecular structure and a given spherical probe, and reports their geometrical properties: volume, boundary area, buried area, etc. The server's algorithms are based on the Voronoi diagram of atoms and its derivative construct: the beta-complex. The correctness of the computed result and computational efficiency are both mathematically guaranteed. BetaCavityWeb is freely accessible at the Voronoi Diagram Research Center (VDRC) (http://voronoi.hanyang.ac.kr/betacavityweb).


Subjects
Molecular Conformation; Software; Algorithms; Internet; Models, Molecular; Protein Conformation
9.
Nucleic Acids Res; 43(Database issue): D376-81, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25348408

ABSTRACT

The latest version of the CATH-Gene3D protein structure classification database (4.0, http://www.cathdb.info) provides annotations for over 235,000 protein domain structures and includes 25 million domain predictions. This article provides an update on the major developments in the 2 years since the last publication in this journal including: significant improvements to the predictive power of our functional families (FunFams); the release of our 'current' putative domain assignments (CATH-B); a new, strictly non-redundant data set of CATH domains suitable for homology benchmarking experiments (CATH-40) and a number of improvements to the web pages.


Subjects
Databases, Protein; Molecular Sequence Annotation; Protein Structure, Tertiary; Genomics; Internet; Protein Structure, Tertiary/genetics; Proteins/classification
10.
Nucleic Acids Res; 42(Database issue): D292-6, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24153109

ABSTRACT

PDBsum, http://www.ebi.ac.uk/pdbsum, is a website providing numerous pictorial analyses of each entry in the Protein Data Bank. It portrays the structural features of all proteins, DNA and ligands in the entry, as well as depicting the interactions between them. The latest features, described here, include annotation of human protein sequences with their naturally occurring amino acid variants, dynamic graphs showing the relationships between related protein domain architectures, analyses of ligand binding clusters across different experimental determinations of the same protein, analyses of tunnels in proteins and new search options.


Subjects
Databases, Protein; Protein Conformation; Cluster Analysis; Computer Graphics; Drug Design; Genetic Variation; Humans; Internet; Ligands; Protein Structure, Tertiary; Proteins/chemistry; Proteins/genetics
11.
BMC Bioinformatics; 15: 379, 2014 Nov 18.
Article in English | MEDLINE | ID: mdl-25403510

ABSTRACT

BACKGROUND: Enzyme active sites can be connected to the exterior environment by one or more channels passing through the protein. Despite our current knowledge of enzyme structure and function, surprisingly little is known about how often channels are present or about any structural features such channels may have in common. RESULTS: Here, we analyze the long channels (i.e. >15 Å) leading to the active sites of 4,306 enzyme structures. We find that over 64% of enzymes contain two or more long channels, their typical length being 28 Å. We show that amino acid compositions of the channel significantly differ both to the composition of the active site, surface and interior of the protein. CONCLUSIONS: The majority of enzymes have buried active sites accessible via a network of access channels. This indicates that enzymes tend to have buried active sites, with channels controlling access to, and egress from, them, and that suggests channels may play a key role in helping determine enzyme substrate.


Subjects
Amino Acids/chemistry; Enzymes/chemistry; Ion Channels/physiology; Amino Acids/genetics; Catalytic Domain; Enzymes/genetics; Humans; Models, Molecular; Protein Conformation
12.
Proteins; 82(9): 1829-49, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24677176

ABSTRACT

Molecular external structure is important for molecular function, with voids on the surface and interior being one of the most important features. Hence, recognition of molecular voids and accurate computation of their geometrical properties, such as volume, area and topology, are crucial, yet most popular algorithms are based on the crude use of sampling points and thus are approximations even with a significant amount of computation. In this article, we propose an analytic approach to the problem using the Voronoi diagram of atoms and the beta-complex. The correctness and efficiency of the proposed algorithm is mathematically proved and experimentally verified. The benchmark test clearly shows the superiority of BetaVoid to two popular programs: VOIDOO and CASTp. The proposed algorithm is implemented in the BetaVoid program which is freely available at the Voronoi Diagram Research Center (http://voronoi.hanyang.ac.kr).


Subjects
Models, Molecular; Molecular Conformation; Protein Folding; Proteins/ultrastructure; Algorithms; Computer Simulation; Protein Structure, Tertiary
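
Record 12 contrasts sampling-point algorithms, which only approximate geometric properties, with an analytic approach. The trade-off can be illustrated on a much simpler problem than void detection: the volume of the union of two overlapping spheres, estimated by grid sampling and compared with the closed-form value. This sketch does not implement the Voronoi diagram or beta-complex used by BetaVoid; the function names and the test geometry are assumptions made for illustration.

```python
import math


def union_volume_by_sampling(spheres, step):
    """Approximate the volume of a union of spheres by counting grid points.

    spheres: list of (x, y, z, radius). Each grid cell of side `step`
    contributes step**3 if its centre lies inside any sphere.
    """
    lo = [min(s[k] - s[3] for s in spheres) for k in range(3)]
    hi = [max(s[k] + s[3] for s in spheres) for k in range(3)]
    counts = [max(1, math.ceil((hi[k] - lo[k]) / step)) for k in range(3)]
    inside = 0
    for i in range(counts[0]):
        x = lo[0] + (i + 0.5) * step
        for j in range(counts[1]):
            y = lo[1] + (j + 0.5) * step
            for k in range(counts[2]):
                z = lo[2] + (k + 0.5) * step
                if any(
                    (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= r * r
                    for cx, cy, cz, r in spheres
                ):
                    inside += 1
    return inside * step ** 3


def two_sphere_union_exact(r1, r2, d):
    """Closed-form union volume for two partially overlapping spheres.

    Valid when |r1 - r2| < d < r1 + r2, where d is the centre distance.
    """
    lens = (
        math.pi
        * (r1 + r2 - d) ** 2
        * (d * d + 2 * d * (r1 + r2) - 3 * (r1 - r2) ** 2)
        / (12 * d)
    )
    return 4 / 3 * math.pi * (r1 ** 3 + r2 ** 3) - lens


spheres = [(0.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, 1.0)]
exact = two_sphere_union_exact(1.0, 1.0, 1.0)
for step in (0.4, 0.2, 0.1):
    approx = union_volume_by_sampling(spheres, step)
    print(f"step={step}: sampled={approx:.4f} exact={exact:.4f}")
```

The sampled value approaches the exact one only as the grid is refined, and the cost grows with the cube of the resolution, which is the kind of limitation that motivates an analytic formulation.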
13.
PLoS Comput Biol; 9(12): e1003382, 2013.
Article in English | MEDLINE | ID: mdl-24348229

ABSTRACT

The 1000 Genomes Project data provides a natural background dataset for amino acid germline mutations in humans. Since the direction of mutation is known, the amino acid exchange matrix generated from the observed nucleotide variants is asymmetric and the mutabilities of the different amino acids are very different. These differences predominantly reflect preferences for nucleotide mutations in the DNA (especially the high mutation rate of the CpG dinucleotide, which makes arginine mutability very much higher than other amino acids) rather than selection imposed by protein structure constraints, although there is evidence for the latter as well. The variants occur predominantly on the surface of proteins (82%), with a slight preference for sites which are more exposed and less well conserved than random. Mutations to functional residues occur about half as often as expected by chance. The disease-associated amino acid variant distributions in OMIM are radically different from those expected on the basis of the 1000 Genomes dataset. The disease-associated variants preferentially occur in more conserved sites, compared to 1000 Genomes mutations. Many of the amino acid exchange profiles appear to exhibit an anti-correlation, with common exchanges in one dataset being rare in the other. Disease-associated variants exhibit more extreme differences in amino acid size and hydrophobicity. More modelling of the mutational processes at the nucleotide level is needed, but these observations should contribute to an improved prediction of the effects of specific variants in humans.


Subjects
Amino Acids/genetics; Databases, Genetic; Genome, Human; Humans; Mutation; Proteins/chemistry; Proteins/genetics
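
Record 13 describes an asymmetric amino acid exchange matrix and per-residue mutabilities derived from variants whose direction of mutation is known. A minimal sketch of both quantities, assuming the input is a list of (reference, variant) residue pairs plus background residue counts, is shown below. The data and function names are hypothetical, and the study's own normalization may differ.

```python
from collections import Counter, defaultdict


def exchange_matrix(substitutions):
    """Count directed amino acid exchanges (reference -> variant)."""
    matrix = defaultdict(Counter)
    for ref, alt in substitutions:
        if ref != alt:
            matrix[ref][alt] += 1
    return matrix


def mutability(matrix, background):
    """Observed variants per occurrence of each reference amino acid."""
    return {
        aa: sum(matrix[aa].values()) / n for aa, n in background.items() if n
    }


# Hypothetical observations: arginine mutates more often than it is created.
substitutions = [
    ("R", "H"), ("R", "H"), ("R", "C"), ("R", "Q"),
    ("H", "R"),
    ("A", "V"),
]
background = {"R": 10, "H": 10, "A": 20}

matrix = exchange_matrix(substitutions)
print(matrix["R"]["H"], matrix["H"]["R"])  # direction matters
print(mutability(matrix, background))
```

Because the direction is kept, the count for R to H need not equal the count for H to R, which is what makes the matrix asymmetric.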
14.
Nat Rev Genet; 9(2): 141-51, 2008 Feb.
Article in English | MEDLINE | ID: mdl-18160966

ABSTRACT

Detailed knowledge of the three-dimensional structures of biological molecules has had an enormous impact on all areas of biological science, including genetics, as structure can reveal the fine details of how molecules perform their biological functions. Here we consider how changes in protein sequence affect the corresponding 3D structure, and describe how structural information about proteins, DNA and chromatin has shed light on gene regulatory mechanisms and the storage and transmission of epigenetic information. Finally, we describe how structure determination is benefiting from the high-throughput technologies of the worldwide structural genomics projects.


Subjects
Imaging, Three-Dimensional/methods; Molecular Biology/methods; Proteins/chemistry; Proteins/ultrastructure; Amino Acid Sequence; Animals; Chromatin/chemistry; Databases, Protein; Epigenesis, Genetic/physiology; Evolution, Molecular; Gene Expression Regulation/physiology; Genomics/methods; Humans; Models, Molecular; Molecular Sequence Data; Mutagenesis, Insertional/physiology; Nucleic Acid Conformation; Point Mutation/physiology; Proteins/genetics; Sequence Deletion/physiology; Structure-Activity Relationship
15.
Nucleic Acids Res; 40(Database issue): D776-82, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22006843

ABSTRACT

FunTree is a new resource that brings together sequence, structure, phylogenetic, chemical and mechanistic information for structurally defined enzyme superfamilies. Gathering together this range of data into a single resource allows the investigation of how novel enzyme functions have evolved within a structurally defined superfamily as well as providing a means to analyse trends across many superfamilies. This is done not only within the context of an enzyme's sequence and structure but also the relationships of their reactions. Developed in tandem with the CATH database, it currently comprises 276 superfamilies covering ~1800 (70%) of sequence assigned enzyme reactions. Central to the resource are phylogenetic trees generated from structurally informed multiple sequence alignments using both domain structural alignments supplemented with domain sequences and whole sequence alignments based on commonality of multi-domain architectures. These trees are decorated with functional annotations such as metabolite similarity as well as annotations from manually curated resources such as the Catalytic Site Atlas and MACiE for enzyme mechanisms. The resource is freely available through a web interface: www.ebi.ac.uk/thorton-srv/databases/FunTree.


Subjects
Databases, Protein; Enzymes/chemistry; Enzymes/classification; Biological Evolution; Enzymes/metabolism; Phylogeny; Protein Structure, Tertiary; Sequence Alignment; Sequence Analysis, Protein
16.
Acta Crystallogr D Biol Crystallogr; 69(Pt 12): 2395-402, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24311580

ABSTRACT

Identifying which ligands might bind to a protein before crystallization trials could provide a significant saving in time and resources. LigSearch, a web server aimed at predicting ligands that might bind to and stabilize a given protein, has been developed. Using a protein sequence and/or structure, the system searches against a variety of databases, combining available knowledge, and provides a clustered and ranked output of possible ligands. LigSearch can be accessed at http://www.ebi.ac.uk/thornton-srv/databases/LigSearch.


Subjects
Databases, Pharmaceutical; Databases, Protein; Proteins/metabolism; Software; Binding Sites; Expert Systems; Internet; Ligands; Models, Molecular; Protein Binding; Proteins/chemistry; Search Engine
17.
Nat Methods; 7(3 Suppl): S42-55, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20195256

ABSTRACT

Structural biology is rapidly accumulating a wealth of detailed information about protein function, binding sites, RNA, large assemblies and molecular motions. These data are increasingly of interest to a broader community of life scientists, not just structural experts. Visualization is a primary means for accessing and using these data, yet visualization is also a stumbling block that prevents many life scientists from benefiting from three-dimensional structural data. In this review, we focus on key biological questions where visualizing three-dimensional structures can provide insight and describe available methods and tools.


Subjects
Image Processing, Computer-Assisted; Macromolecular Substances; Crystallography, X-Ray; Internet; Models, Molecular; Molecular Conformation
18.
Biopolymers; 99(3): 183-8, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23023892

ABSTRACT

In the 40 years since its inception, the Protein Data Bank (PDB) has amassed over 80,000 experimentally determined structural models of proteins, plus many models of DNA and RNA fragments. The majority of the protein models have contributed, in some way, to an understanding of their particular protein's function, be it through the conformation of its catalytic residues, the details of its interactions with other proteins, substrate molecules, DNA, and so on. However, the totality of the data in the PDB provides a rich source of more generalized knowledge about proteins, their molecular biology, and evolution. Here, we describe how the focus of protein structural analysis has developed over the past 40 years. © 2012 Wiley Periodicals, Inc. Biopolymers 99: 183-188, 2013.


Subjects
Databases, Protein/history; Information Storage and Retrieval; Proteins/chemistry; History, 20th Century; History, 21st Century; Models, Molecular; Protein Tyrosine Phosphatases/chemistry
19.
PLoS Comput Biol; 8(3): e1002403, 2012.
Article in English | MEDLINE | ID: mdl-22396634

ABSTRACT

In order to understand the evolution of enzyme reactions and to gain an overview of biological catalysis we have combined sequence and structural data to generate phylogenetic trees in an analysis of 276 structurally defined enzyme superfamilies, and used these to study how enzyme functions have evolved. We describe in detail the analysis of two superfamilies to illustrate different paradigms of enzyme evolution. Gathering together data from all the superfamilies supports and develops the observation that they have all evolved to act on a diverse set of substrates, whilst the evolution of new chemistry is much less common. Despite that, by bringing together so much data, we can provide a comprehensive overview of the most common and rare types of changes in function. Our analysis demonstrates on a larger scale than previously studied, that modifications in overall chemistry still occur, with all possible changes at the primary level of the Enzyme Commission (E.C.) classification observed to a greater or lesser extent. The phylogenetic trees map out the evolutionary route taken within a superfamily, as well as all the possible changes within a superfamily. This has been used to generate a matrix of observed exchanges from one enzyme function to another, revealing the scale and nature of enzyme evolution and that some types of exchanges between and within E.C. classes are more prevalent than others. Surprisingly a large proportion (71%) of all known enzyme functions are performed by this relatively small set of 276 superfamilies. This reinforces the hypothesis that relatively few ancient enzymatic domain superfamilies were progenitors for most of the chemistry required for life.


Subjects
Enzymes/chemistry; Enzymes/physiology; Evolution, Molecular; Sequence Analysis, Protein/methods; Amino Acid Sequence; Molecular Sequence Data; Structure-Activity Relationship
20.
J Mol Biol; 435(2): 167892, 2023 Jan 30.
Article in English | MEDLINE | ID: mdl-36410474

ABSTRACT

Constrained Coding Regions (CCRs) in the human genome have been derived from DNA sequencing data of large cohorts of healthy control populations, available in the Genome Aggregation Database (gnomAD) [1]. They identify regions depleted of protein-changing variants and thus identify segments of the genome that have been constrained during human evolution. By mapping these DNA-defined regions from genomic coordinates onto the corresponding protein positions and combining this information with protein annotations, we have explored the distribution of CCRs and compared their co-occurrence with different protein functional features, previously annotated at the amino acid level in public databases. As expected, our results reveal that functional amino acids involved in interactions with DNA/RNA, protein-protein contacts and catalytic sites are the protein features most likely to be highly constrained for variation in the control population. More surprisingly, we also found that linear motifs, linear interacting peptides (LIPs), disorder-order transitions upon binding with other protein partners and liquid-liquid phase separating (LLPS) regions are also strongly associated with high constraint for variability. We also compared intra-species constraints in the human CCRs with inter-species conservation and functional residues to explore how such CCRs may contribute to the analysis of protein variants. As has been previously observed, CCRs are only weakly correlated with conservation, suggesting that intraspecies constraints complement interspecies conservation and can provide more information to interpret variant effects.


Subjects
Genome, Human; Open Reading Frames; Proteins; Humans; Base Sequence; Genome, Human/genetics; Genomics; Proteins/genetics; Chromosome Mapping