1.
Proc Natl Acad Sci U S A ; 121(21): e2400260121, 2024 May 21.
Article in English | MEDLINE | ID: mdl-38743624

ABSTRACT

We introduce ZEPPI (Z-score Evaluation of Protein-Protein Interfaces), a framework to evaluate structural models of a complex based on sequence coevolution and conservation involving residues in protein-protein interfaces. The ZEPPI score is calculated by comparing metrics for an interface to those obtained from randomly chosen residues. Since contacting residues are defined by the structural model, this obviates the need to account for indirect interactions. Further, although ZEPPI relies on species-paired multiple sequence alignments, its focus on interfacial residues allows it to leverage quite shallow alignments. ZEPPI can be implemented on a proteome-wide scale and is applied here to millions of structural models of dimeric complexes in the Escherichia coli and human interactomes found in the PrePPI database. PrePPI's scoring function is based primarily on the evaluation of protein-protein interfaces, and ZEPPI adds a new feature to this analysis through the incorporation of evolutionary information. ZEPPI performance is evaluated through applications to experimentally determined complexes and to decoys from the CASP-CAPRI experiment. As we discuss, the standard CAPRI scores used to evaluate docking models are based on model quality and not on the ability to give yes/no answers as to whether two proteins interact. ZEPPI is able to detect weak signals from PPI models that the CAPRI scores define as incorrect and, similarly, to identify potential PPIs defined as low confidence by the current PrePPI scoring function. A number of examples that illustrate how the combination of PrePPI and ZEPPI can yield functional hypotheses are provided.
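The comparison of an interface metric against randomly chosen residues can be sketched as a Z-score calculation. This is a minimal illustration only, not the published ZEPPI implementation: the function name `interface_z_score`, the dictionary of per-pair metric values, and the sampling scheme (random pair sets of the same size as the interface) are all assumptions made for the example.

```python
import random
import statistics


def interface_z_score(pair_scores, interface_pairs, n_samples=1000, seed=0):
    """Z-score of the mean interface metric against random residue-pair sets.

    pair_scores: dict mapping (i, j) residue pairs to a metric value
        (hypothetical input, e.g. a coevolution score).
    interface_pairs: list of (i, j) pairs in contact in the structural model.
    """
    rng = random.Random(seed)
    all_pairs = list(pair_scores)
    k = len(interface_pairs)

    # Mean metric over the residue pairs that the model places in contact.
    observed = statistics.fmean([pair_scores[p] for p in interface_pairs])

    # Background distribution: means over random pair sets of the same size.
    background = []
    for _ in range(n_samples):
        sample = rng.sample(all_pairs, k)
        background.append(statistics.fmean([pair_scores[p] for p in sample]))

    mu = statistics.fmean(background)
    sigma = statistics.stdev(background)
    return (observed - mu) / sigma
```

A large positive value indicates that the modeled interface carries more signal than a random selection of residues would; the score is undefined when the background has zero variance.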


Subjects
Proteome; Proteome/metabolism; Humans; Protein Interaction Mapping/methods; Models, Molecular; Escherichia coli/metabolism; Escherichia coli/genetics; Databases, Protein; Protein Binding; Escherichia coli Proteins/metabolism; Escherichia coli Proteins/chemistry; Escherichia coli Proteins/genetics; Proteins/chemistry; Proteins/metabolism; Sequence Alignment
2.
J Biol Chem ; 296: 100562, 2021.
Article in English | MEDLINE | ID: mdl-33744294

ABSTRACT

Systems biology is a data-heavy field that focuses on systems-wide depictions of biological phenomena, necessarily sacrificing a detailed characterization of individual components. As an example, genome-wide protein interaction networks are widely used in systems biology and continuously extended and refined as new sources of evidence become available. Despite the vast amount of information about individual protein structures and protein complexes that has accumulated in the past 50 years in the Protein Data Bank, the data, computational tools, and language of structural biology are not an integral part of systems biology. However, increasing effort has been devoted to this integration, and the related literature is reviewed here. Relationships between proteins that are detected via structural similarity offer a rich source of information not available from sequence similarity, and homology modeling can be used to leverage Protein Data Bank structures to produce 3D models for a significant fraction of many proteomes. A number of structure-informed genomic and cross-species (i.e., virus-host) interactomes will be described, and the unique information they provide will be illustrated with a number of examples. Tissue- and tumor-specific interactomes have also been developed through computational strategies that exploit patient information and through genetic interactions available from increasingly sensitive screens. Strategies to integrate structural information with these alternate data sources will be described. Finally, efforts to link protein structure space with chemical compound space offer novel sources of information in drug design, off-target identification, and the identification of targets for compounds found to be effective in phenotypic screens.


Subjects
Databases, Protein; Proteins/chemistry; Systems Biology; Protein Conformation; Protein Interaction Maps
3.
Hum Genet ; 139(11): 1443-1454, 2020 Nov.
Article in English | MEDLINE | ID: mdl-32514796

ABSTRACT

Dilated cardiomyopathy (DCM) is among the most frequent forms of cardiomyopathy and is mainly characterized by cardiac dilatation and reduced systolic function. Although most cases of DCM are classified as sporadic, 20-30% of cases show a heritable pattern. Familial forms of DCM are genetically heterogeneous, and mutations in several genes have been identified that most commonly play a role in cytoskeleton and sarcomere-associated processes. Still, a large number of familial cases remain unsolved. Here, we report five individuals from three independent families who presented with severe dilated cardiomyopathy during the neonatal period. Using whole-exome sequencing (WES), we identified causative, compound heterozygous missense variants in RPL3L (ribosomal protein L3-like) in all the affected individuals. The identified variants co-segregated with the disease in each of the three families and were absent or very rare in the human population, in line with an autosomal recessive inheritance pattern. They are located within the conserved RPL3 domain of the protein and were classified as deleterious by several in silico prediction tools. RPL3L is one of the four non-canonical riboprotein genes and it encodes the 60S ribosomal protein L3-like protein that is highly expressed only in cardiac and skeletal muscle. Three-dimensional homology modeling and in silico analysis of the affected residues in RPL3L indicate that the identified changes specifically alter the interaction of RPL3L with the RNA components of the 60S ribosomal subunit and thus destabilize its binding to the 60S subunit. In conclusion, we report that bi-allelic pathogenic variants in RPL3L are causative of an early-onset, severe neonatal form of dilated cardiomyopathy, and we show for the first time that cytoplasmic ribosomal proteins are involved in the pathogenesis of non-syndromic cardiomyopathies.


Subjects
Cardiomyopathy, Dilated/genetics; Mutation, Missense/genetics; Ribosomal Proteins/genetics; Ribosomes/genetics; Alleles; Exome/genetics; Female; Heart/physiopathology; Humans; Infant; Infant, Newborn; Male; Muscle, Skeletal/physiopathology; Pedigree; Phenotype; RNA/genetics; Ribosomal Protein L3
4.
N Engl J Med ; 376(8): 742-754, 2017 02 23.
Article in English | MEDLINE | ID: mdl-28121514

ABSTRACT

BACKGROUND: The DiGeorge syndrome, the most common of the microdeletion syndromes, affects multiple organs, including the heart, the nervous system, and the kidney. It is caused by deletions on chromosome 22q11.2; the genetic driver of the kidney defects is unknown. METHODS: We conducted a genomewide search for structural variants in two cohorts: 2080 patients with congenital kidney and urinary tract anomalies and 22,094 controls. We performed exome and targeted resequencing in samples obtained from 586 additional patients with congenital kidney anomalies. We also carried out functional studies using zebrafish and mice. RESULTS: We identified heterozygous deletions of 22q11.2 in 1.1% of the patients with congenital kidney anomalies and in 0.01% of population controls (odds ratio, 81.5; P=4.5×10⁻¹⁴). We localized the main drivers of renal disease in the DiGeorge syndrome to a 370-kb region containing nine genes. In zebrafish embryos, an induced loss of function in snap29, aifm3, and crkl resulted in renal defects; the loss of crkl alone was sufficient to induce defects. Five of 586 patients with congenital urinary anomalies had newly identified, heterozygous protein-altering variants, including a premature termination codon, in CRKL. The inactivation of Crkl in the mouse model induced developmental defects similar to those observed in patients with congenital urinary anomalies. CONCLUSIONS: We identified a recurrent 370-kb deletion at the 22q11.2 locus as a driver of kidney defects in the DiGeorge syndrome and in sporadic congenital kidney and urinary tract anomalies. Of the nine genes at this locus, SNAP29, AIFM3, and CRKL appear to be critical to the phenotype, with haploinsufficiency of CRKL emerging as the main genetic driver. (Funded by the National Institutes of Health and others.)


Subjects
Adaptor Proteins, Signal Transducing/genetics; Chromosome Deletion; DiGeorge Syndrome/genetics; Haploinsufficiency; Kidney/abnormalities; Nuclear Proteins/genetics; Urinary Tract/abnormalities; Adolescent; Animals; Child; Chromosomes, Human, Pair 22; Exome; Female; Heterozygote; Humans; Infant; Infant, Newborn; Male; Mice; Models, Animal; Sequence Analysis, DNA; Young Adult; Zebrafish
5.
Proc Natl Acad Sci U S A ; 114(52): 13685-13690, 2017 12 26.
Article in English | MEDLINE | ID: mdl-29229851

ABSTRACT

We report a template-based method, LT-scanner, which scans the human proteome using protein structural alignment to identify proteins that are likely to bind ligands that are present in experimentally determined complexes. A scoring function that rapidly accounts for binding site similarities between the template and the proteins being scanned is a crucial feature of the method. The overall approach is first tested based on its ability to predict the residues on the surface of a protein that are likely to bind small-molecule ligands. The algorithm that we present, LBias, is shown to compare very favorably to existing algorithms for binding site residue prediction. LT-scanner's performance is evaluated based on its ability to identify known targets of Food and Drug Administration (FDA)-approved drugs and it too proves to be highly effective. The specificity of the scoring function that we use is demonstrated by the ability of LT-scanner to identify the known targets of FDA-approved kinase inhibitors based on templates involving other kinases. Combining sequence with structural information further improves LT-scanner performance. The approach we describe is extendable to the more general problem of identifying binding partners of known ligands even if they do not appear in a structurally determined complex, although this will require the integration of methods that combine protein structure and chemical compound databases.


Subjects
Databases, Protein; Genome; Protein Kinase Inhibitors/chemistry; Proteins; Ligands; Proteins/chemistry; Proteins/genetics; Proteins/metabolism
6.
Nature ; 490(7421): 556-60, 2012 Oct 25.
Article in English | MEDLINE | ID: mdl-23023127

ABSTRACT

The genome-wide identification of pairs of interacting proteins is an important step in the elucidation of cell regulatory mechanisms. Much of our present knowledge derives from high-throughput techniques such as the yeast two-hybrid assay and affinity purification, as well as from manual curation of experiments on individual systems. A variety of computational approaches based, for example, on sequence homology, gene co-expression and phylogenetic profiles, have also been developed for the genome-wide inference of protein-protein interactions (PPIs). Yet comparative studies suggest that the development of accurate and complete repertoires of PPIs is still in its early stages. Here we show that three-dimensional structural information can be used to predict PPIs with an accuracy and coverage that are superior to predictions based on non-structural evidence. Moreover, an algorithm, termed PrePPI, which combines structural information with other functional clues, is comparable in accuracy to high-throughput experiments, yielding over 30,000 high-confidence interactions for yeast and over 300,000 for human. Experimental tests of a number of predictions demonstrate the ability of the PrePPI algorithm to identify unexpected PPIs of considerable biological interest. The surprising effectiveness of three-dimensional structural information can be attributed to the use of homology models combined with the exploitation of both close and remote geometric relationships between proteins.


Subjects
Algorithms; Protein Interaction Mapping/methods; Protein Interaction Maps; Proteins/chemistry; Proteins/metabolism; Proteomics/methods; Animals; Bayes Theorem; Brain/metabolism; Cadherins/metabolism; High-Throughput Screening Assays; Humans; Matrix Attachment Region Binding Proteins/metabolism; Mice; Models, Molecular; PPAR gamma/metabolism; Phylogeny; Protein Binding; Protein Conformation; Protein Kinases/chemistry; Protein Kinases/metabolism; Proteome/chemistry; Proteome/metabolism; ROC Curve; Reproducibility of Results; Saccharomyces cerevisiae/chemistry; Saccharomyces cerevisiae/metabolism; Suppressor of Cytokine Signaling Proteins/metabolism; Transcription Factors/metabolism
7.
Neurogenetics ; 17(1): 43-9, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26576547

ABSTRACT

Protein phosphatase 2A (PP2A) is a heterotrimeric protein serine/threonine phosphatase and is involved in a broad range of cellular processes. PPP2R5D is a regulatory B subunit of PP2A and plays an important role in regulating key neuronal and developmental processes such as PI3K/AKT and glycogen synthase kinase 3 beta (GSK3β)-mediated cell growth, chromatin remodeling, and gene transcriptional regulation. Using whole-exome sequencing (WES), we identified four de novo variants in PPP2R5D in a total of seven unrelated individuals with intellectual disability (ID) and other shared clinical characteristics, including autism spectrum disorder, macrocephaly, hypotonia, seizures, and dysmorphic features. Among the four variants, two have been previously reported and two are novel. All four amino acids are highly conserved among the PP2A subunit family, and all change a negatively charged acidic glutamic acid (E) to a positively charged basic lysine (K) and are predicted to disrupt the PP2A subunit binding and impair the dephosphorylation capacity. Our data provide further support for PPP2R5D as a genetic cause of ID.


Subjects
Autistic Disorder/genetics; Intellectual Disability/genetics; Megalencephaly/genetics; Muscle Hypotonia/genetics; Mutation, Missense; Protein Phosphatase 2/genetics; Adolescent; Autism Spectrum Disorder/epidemiology; Autism Spectrum Disorder/genetics; Autistic Disorder/epidemiology; Child; Child, Preschool; DNA Mutational Analysis; Female; Genetic Association Studies; Genetic Predisposition to Disease; Humans; Infant; Intellectual Disability/epidemiology; Male; Megalencephaly/epidemiology; Muscle Hypotonia/epidemiology; Polymorphism, Single Nucleotide
8.
PLoS Comput Biol ; 11(5): e1004248, 2015 May.
Article in English | MEDLINE | ID: mdl-25938916

ABSTRACT

We describe a method to predict protein-protein interactions (PPIs) formed between structured domains and short peptide motifs. We take an integrative approach based on consensus patterns of known motifs in databases, structures of domain-motif complexes from the PDB and various sources of non-structural evidence. We combine this set of clues using a Bayesian classifier that reports the likelihood of an interaction and obtain significantly improved prediction performance when compared to individual sources of evidence and to previously reported algorithms. Our Bayesian approach was integrated into PrePPI, a structure-based PPI prediction method that, so far, has been limited to interactions formed between two structured domains. Around 80,000 new domain-motif mediated interactions were predicted, thus enhancing PrePPI's coverage of the human protein interactome.


Subjects
Protein Interaction Mapping/statistics & numerical data; Algorithms; Bayes Theorem; Computational Biology; Databases, Protein/statistics & numerical data; Genome, Human; Humans; Likelihood Functions; Models, Biological; Protein Interaction Domains and Motifs; Proteomics/statistics & numerical data; Support Vector Machine
9.
Nucleic Acids Res ; 41(Database issue): D828-33, 2013 01.
Article in English | MEDLINE | ID: mdl-23193263

ABSTRACT

PrePPI (http://bhapp.c2b2.columbia.edu/PrePPI) is a database that combines predicted and experimentally determined protein-protein interactions (PPIs) using a Bayesian framework. Predicted interactions are assigned probabilities of being correct, which are derived from calculated likelihood ratios (LRs) by combining structural, functional, evolutionary and expression information, with the most important contribution coming from structure. Experimentally determined interactions are compiled from a set of public databases that manually collect PPIs from the literature and are also assigned LRs. A final probability is then assigned to every interaction by combining the LRs for both predicted and experimentally determined interactions. The current version of PrePPI contains ∼2 million PPIs with a probability greater than ∼0.1, of which ∼60 000 PPIs for yeast and ∼370 000 PPIs for human are considered high confidence (probability > 0.5). The PrePPI database constitutes an integrated resource that enables users to examine aggregate information on PPIs, including both known and potentially novel interactions, and that provides structural models for many of the PPIs.
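The step from likelihood ratios to a final probability can be illustrated with a short sketch. It assumes the usual naive Bayes treatment, in which LRs from independent evidence sources are multiplied and then applied to prior odds; the abstract does not state the exact combination rule or the prior, so the function names and the `prior_odds` value in the usage note are hypothetical.

```python
import math


def combine_likelihood_ratios(lrs):
    """Combine per-evidence likelihood ratios, assuming independent sources."""
    return math.prod(lrs)


def posterior_probability(combined_lr, prior_odds):
    """Convert a combined LR and prior odds into a posterior probability."""
    posterior_odds = combined_lr * prior_odds
    return posterior_odds / (1.0 + posterior_odds)
```

For example, `posterior_probability(combine_likelihood_ratios([10, 5, 2]), prior_odds=0.01)` multiplies the three LRs to 100 and, with the assumed prior odds, gives posterior odds of about 1. Under this formulation a probability above 0.5 corresponds to posterior odds above 1.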


Subjects
Databases, Protein; Multiprotein Complexes/chemistry; Protein Interaction Mapping; Bayes Theorem; Humans; Internet; Protein Conformation; User-Computer Interface
10.
Nucleic Acids Res ; 39(Web Server issue): W357-61, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21672961

ABSTRACT

We describe MarkUs, a web server for analysis and comparison of the structural and functional properties of proteins. In contrast to a 'structure in/function out' approach to protein function annotation, the server is designed to be highly interactive and to allow flexibility in the examination of possible functions, suggested either automatically by various similarity measures or specified by a user directly. This is combined with tools that allow a user to assess independently whether or not a suggested function is consistent with the bioinformatic and biophysical properties of a given query structure, further allowing the user to generate testable hypotheses. The server is available at http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:Mark-Us.


Subjects
Proteins/chemistry; Software; Bacterial Proteins/chemistry; Internet; Protein Conformation; Proteins/physiology; Structure-Activity Relationship
11.
Nucleic Acids Res ; 39(Web Server issue): W283-7, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21609948

ABSTRACT

We describe PredUs, an interactive web server for the prediction of protein-protein interfaces. Potential interfacial residues for a query protein are identified by 'mapping' contacts from known interfaces of the query protein's structural neighbors to surface residues of the query. We calculate a score for each residue to be interfacial with a support vector machine. Results can be visualized in a molecular viewer and a number of interactive features allow users to tailor a prediction to a particular hypothesis. The PredUs server is available at: http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:PredUs.


Subjects
Multiprotein Complexes/chemistry; Protein Interaction Mapping/methods; Software; Algorithms; Artificial Intelligence; Binding Sites; Internet; Models, Molecular; Protein Conformation
12.
Proc Natl Acad Sci U S A ; 107(24): 10896-901, 2010 Jun 15.
Article in English | MEDLINE | ID: mdl-20534496

ABSTRACT

With the advent of Systems Biology, the prediction of whether two proteins form a complex has become a problem of increased importance. A variety of experimental techniques have been applied to the problem, but three-dimensional structural information has not been widely exploited. Here we explore the range of applicability of such information by analyzing the extent to which the location of binding sites on protein surfaces is conserved among structural neighbors. We find, as expected, that interface conservation is most significant among proteins that have a clear evolutionary relationship, but that there is a significant level of conservation even among remote structural neighbors. This finding is consistent with recent evidence that information available from structural neighbors, independent of classification, should be exploited in the search for functional insights. The value of such structural information is highlighted through the development of a new protein interface prediction method, PredUs, that identifies what residues on protein surfaces are likely to participate in complexes with other proteins. The performance of PredUs, as measured through comparisons with other methods, suggests that relationships across protein structure space can be successfully exploited in the prediction of protein-protein interactions.


Subjects
Protein Interaction Domains and Motifs; Protein Interaction Mapping; Proteins/chemistry; Binding Sites; Conserved Sequence; Databases, Protein; Models, Molecular; Multiprotein Complexes/chemistry; Protein Interaction Mapping/statistics & numerical data; Proteins/genetics; Sequence Alignment; Structural Homology, Protein; Systems Biology
13.
Res Sq ; 2023 Sep 18.
Article in English | MEDLINE | ID: mdl-37790387

ABSTRACT

We introduce ZEPPI (Z-score Evaluation of Protein-Protein Interfaces), a framework to evaluate structural models of a complex based on sequence co-evolution and conservation involving residues in protein-protein interfaces. The ZEPPI score is calculated by comparing metrics for an interface to those obtained from randomly chosen residues. Since contacting residues are defined by the structural model, this obviates the need to account for indirect interactions. Further, although ZEPPI relies on species-paired multiple sequence alignments, its focus on interfacial residues allows it to leverage quite shallow alignments. ZEPPI performance is evaluated through applications to experimentally determined complexes and to decoys from the CASP-CAPRI experiment. ZEPPI can be implemented on a proteome-wide scale as evidenced by calculations on millions of structural models of dimeric complexes in the E. coli and human interactomes found in the PrePPI database. A number of examples that illustrate how these tools can yield novel functional hypotheses are provided.

14.
bioRxiv ; 2023 02 28.
Article in English | MEDLINE | ID: mdl-36909476

ABSTRACT

We present an updated version of the Predicting Protein-Protein Interactions (PrePPI) webserver which predicts PPIs on a proteome-wide scale. PrePPI combines structural and non-structural clues within a Bayesian framework to compute a likelihood ratio (LR) for essentially every possible pair of proteins in a proteome; the current database is for the human interactome. The structural modeling (SM) clue is derived from template-based modeling and its application on a proteome-wide scale is enabled by a unique scoring function used to evaluate a putative complex. The updated version of PrePPI leverages AlphaFold structures that are parsed into individual domains. As has been demonstrated in earlier applications, PrePPI performs extremely well as measured by receiver operating characteristic curves derived from testing on E. coli and human protein-protein interaction (PPI) databases. A PrePPI database of ~1.3 million human PPIs can be queried with a webserver application that comprises multiple functionalities for examining query proteins, template complexes, 3D models for predicted complexes, and related features (https://honiglab.c2b2.columbia.edu/PrePPI). PrePPI is a state-of-the-art resource that offers an unprecedented structure-informed view of the human interactome.

15.
J Mol Biol ; 435(14): 168052, 2023 07 15.
Article in English | MEDLINE | ID: mdl-36933822

ABSTRACT

We present an updated version of the Predicting Protein-Protein Interactions (PrePPI) webserver which predicts PPIs on a proteome-wide scale. PrePPI combines structural and non-structural evidence within a Bayesian framework to compute a likelihood ratio (LR) for essentially every possible pair of proteins in a proteome; the current database is for the human interactome. The structural modeling (SM) component is derived from template-based modeling and its application on a proteome-wide scale is enabled by a unique scoring function used to evaluate a putative complex. The updated version of PrePPI leverages AlphaFold structures that are parsed into individual domains. As has been demonstrated in earlier applications, PrePPI performs extremely well as measured by receiver operating characteristic curves derived from testing on E. coli and human protein-protein interaction (PPI) databases. A PrePPI database of ∼1.3 million human PPIs can be queried with a webserver application that comprises multiple functionalities for examining query proteins, template complexes, 3D models for predicted complexes, and related features (https://honiglab.c2b2.columbia.edu/PrePPI). PrePPI is a state-of-the-art resource that offers an unprecedented structure-informed view of the human interactome.


Subjects
Databases, Protein; Protein Interaction Mapping; Proteome; Humans; Bayes Theorem; Escherichia coli/metabolism; Proteome/metabolism
16.
Protein Sci ; 32(4): e4594, 2023 04.
Article in English | MEDLINE | ID: mdl-36776141

ABSTRACT

We describe the Predicting Protein-Compound Interactions (PrePCI) database which comprises over 5 billion predicted interactions between 6.8 million chemical compounds and 19,797 human proteins. PrePCI relies on a proteome-wide database of structural models based on both traditional modeling techniques and the AlphaFold Protein Structure Database. Sequence- and structural similarity-based metrics are established between template proteins, T, in the Protein Data Bank that bind compounds, C, and query proteins in the model database, Q. When the metrics exceed threshold values, it is assumed that C also binds to Q with a likelihood ratio (LR) derived from machine learning. If the relationship is based on structural similarity, the LR is based on a scoring function that measures the extent to which C is compatible with the binding site of Q as described in the LT-scanner algorithm. For every predicted complex derived in this way, chemical similarity based on the Tanimoto coefficient identifies other small molecules that may bind to Q. An overall LR for the binding of C to Q is obtained from Naive Bayesian statistics. The PrePCI database can be queried by entering a UniProt ID or gene name for a protein to obtain a list of compounds predicted to bind to it along with associated LRs. Alternatively, entering an identifier for the compound outputs a list of proteins it is predicted to bind. Specific applications of the database to lead discovery, elucidation of drug mechanism of action, and biological function annotation are described.
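The Tanimoto coefficient mentioned above is the ratio of shared to total features between two compounds. The sketch below assumes each compound is represented as a set of fingerprint bit indices; that representation and the function name `tanimoto` are illustrative choices, not details taken from PrePCI.

```python
def tanimoto(fingerprint_a, fingerprint_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(fingerprint_a), set(fingerprint_b)
    union = a | b
    if not union:
        # Two empty fingerprints share nothing measurable; return 0 by convention.
        return 0.0
    return len(a & b) / len(union)
```

With fingerprints `{1, 2, 3}` and `{2, 3, 4}`, two bits are shared out of four distinct bits, so the coefficient is 0.5. Identical fingerprints score 1.0 and disjoint ones score 0.0.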


Subjects
Databases, Chemical; Proteins; Humans; Bayes Theorem; Proteins/chemistry; Algorithms; Databases, Protein
17.
J Struct Funct Genomics ; 13(3): 171-6, 2012 Sep.
Article in English | MEDLINE | ID: mdl-22592539

ABSTRACT

Protein domain family PF11267 (DUF3067) is a family of proteins of unknown function found in both bacteria and eukaryotes. Here we present the solution NMR structure of the 102-residue Alr2454 protein from Nostoc sp. PCC 7120, which constitutes the first structural representative from this conserved protein domain family. The structure of Nostoc sp. Alr2454 adopts a novel protein fold.


Subjects
Bacterial Proteins/chemistry; Magnetic Resonance Spectroscopy/methods; Nostoc/chemistry; Amino Acid Sequence; Bacterial Proteins/genetics; Cloning, Molecular; Escherichia coli/chemistry; Escherichia coli/genetics; Genes, Bacterial; Molecular Sequence Data; Nostoc/genetics; Protein Conformation; Protein Folding; Protein Structure, Tertiary; Sequence Alignment; Solutions/chemistry
18.
J Struct Funct Genomics ; 13(1): 1-7, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22223187

ABSTRACT

High-quality NMR structures of the homo-dimeric proteins Bvu3908 (69 residues per monomer) from Bacteroides vulgatus and Bt2368 (74 residues) from Bacteroides thetaiotaomicron reveal the presence of winged helix-turn-helix (wHTH) motifs mediating tight complex formation. Such homo-dimer formation by winged HTH motifs is otherwise found only in two DNA-binding proteins with known structure: the C-terminal wHTH domain of transcriptional activator FadR from E. coli and protein TubR from B. thuringiensis, which is involved in plasmid DNA segregation. However, the relative orientation of the wHTH motifs is different and residues involved in DNA-binding are not conserved in Bvu3908 and Bt2368. Hence, the proteins of the present study are not very likely to bind DNA, but are likely to exhibit a function that has thus far not been ascribed to homo-dimers formed by winged HTH motifs. The structures of Bvu3908 and Bt2368 are the first atomic resolution structures for PFAM family PF10771, a family of unknown function (DUF2582) currently containing 128 members.


Subjects
Bacterial Proteins/chemistry; Bacteroides/chemistry; Protein Multimerization; Bacterial Proteins/genetics; Bacteroides/genetics; Helix-Turn-Helix Motifs; Nuclear Magnetic Resonance, Biomolecular/methods; Protein Structure, Quaternary; Protein Structure, Tertiary
19.
PLoS Comput Biol ; 7(10): e1002175, 2011 Oct.
Article in English | MEDLINE | ID: mdl-21998567

ABSTRACT

Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is "optimal" in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are "suboptimal" in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for "modelability", we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.
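The dynamic programming baseline that the abstract contrasts with its own method can be sketched as a standard Needleman-Wunsch global alignment score. This shows only what "optimal in terms of the DP score" means; it is not the fragment-based method the authors present, and the scoring values (match +1, mismatch -1, linear gap -1) are placeholder assumptions rather than a realistic substitution matrix.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of sequences a and b (linear gap cost)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            ))
        prev = curr
    return prev[-1]
```

The function returns only the best score. Recovering the alignment itself, or the lower-scoring alternative alignments discussed in the abstract, would require keeping the full matrix and tracing back through it.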


Subjects
Sequence Alignment/statistics & numerical data; Structural Homology, Protein; Algorithms; Computational Biology; Databases, Protein/statistics & numerical data; Models, Molecular; Software Design
20.
Nucleic Acids Res ; 38(Web Server issue): W550-4, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20525783

ABSTRACT

The construction of a homology model for a protein can involve a number of decisions requiring the integration of different sources of information and the application of different modeling tools depending on the particular problem. Functional information can be especially important in guiding the modeling process, but such information is not generally integrated into modeling pipelines. Pudge is a flexible, interactive protein structure prediction server, which is designed with these issues in mind. By dividing the modeling into five stages (template selection, alignment, model building, model refinement and model evaluation) and providing various tools to visualize, analyze and compare the results at each stage, we enable a flexible modeling strategy that can be tailored to the needs of a given problem. Pudge is freely available at http://wiki.c2b2.columbia.edu/honiglab_public/index.php/Software:PUDGE.


Subjects
Software; Structural Homology, Protein; Bacterial Proteins/chemistry; Internet; Models, Molecular; User-Computer Interface