ABSTRACT
Proteolytic signaling, or regulated proteolysis, is an essential part of many important pathways such as Notch, Wnt, and Hedgehog. How the structure of the cleaved substrate regions influences the efficacy of proteolytic processing remains underexplored. Here, we analyzed the relative importance in proteolysis of various structural features derived from substrate sequences, using a dataset of more than 5000 experimentally verified proteolytic events captured in CutDB. Solvent accessibility was recognized as an essential property of a proteolytically processed polypeptide chain. Proteolytic events were found to be nearly uniformly distributed among the three types of secondary structure, although with some enrichment in loops. Cleavages in α-helices were relatively abundant in regions apparently prone to unfolding, while cleavages in β-structures tended to be located at the periphery of β-sheets. Applying the same statistical procedures to proteolytic events divided into separate sets according to the catalytic class of the protease gave consistent results and confirmed that the structural mechanisms of proteolysis are universal. The estimated predictive power of sequence-derived structural features turned out to be sufficiently high to provide a rationale for their use in bioinformatic prediction of proteolytic events.
Subjects
Amino Acid Sequence , Computational Biology/methods , Proteins/chemistry , Proteins/metabolism , Proteolysis , Statistical Models , Protein Conformation , ROC Curve
ABSTRACT
RegTransBase is a manually curated database of regulatory interactions in prokaryotes that captures the knowledge in public scientific literature using a controlled vocabulary. Although several databases describing interactions between regulatory proteins and their binding sites are already being maintained, they either focus mostly on the model organisms Escherichia coli and Bacillus subtilis or are entirely computationally derived. RegTransBase describes a large number of regulatory interactions reported in many organisms and contains the following types of experimental data: activation or repression of transcription by an identified direct regulator; determination of the transcriptional regulatory function of a protein (or RNA) that binds directly to DNA (or RNA); mapping or prediction of a binding site for a regulatory protein; and characterization of regulatory mutations. Currently, RegTransBase content is derived from about 3000 relevant articles describing over 7000 experiments in relation to 128 microbes. It contains data on the regulation of about 7500 genes and evidence for 6500 interactions with 650 regulators. RegTransBase also contains manually created position weight matrices (PWMs) that can be used to identify candidate regulatory sites in over 60 species. RegTransBase is available at http://regtransbase.lbl.gov.
Subjects
Bacterial Proteins/metabolism , Nucleic Acid Databases , Bacterial Gene Expression Regulation , Bacterial Genome , Transcriptional Regulatory Elements , Transcription Factors/metabolism , Binding Sites , Internet , User-Computer Interface
ABSTRACT
We develop the means to mine for associative features in biological data. A hybrid reasoning schema for deterministic machine learning, and its implementation via logic programming, is presented. The methodology of mining for correlations between features is illustrated by prediction tasks for protein secondary structure and phylogenetic profiles. The suggested methodology leads to a clearer approach to the hierarchical classification of proteins and a novel way to represent evolutionary relationships. A comparative analysis of Jasmine and other statistical and deterministic systems (including Explanation-Based Learning and Inductive Logic Programming) is outlined. The advantages of deterministic over statistical data mining approaches for high-level exploration of correlation structure are analyzed.