1.
PLoS Comput Biol ; 17(1): e1008474, 2021 01.
Article in English | MEDLINE | ID: mdl-33406091

ABSTRACT

Intrinsically disordered regions (IDRs) are prevalent in the eukaryotic proteome. Common functional roles of IDRs include forming flexible linkers or undergoing allosteric folding-upon-binding. Recent studies have suggested an additional functional role for IDRs: generating steric pressure on the plasma membrane during endocytosis, via molecular crowding. However, in order to accomplish useful functions, such crowding needs to be regulated in space (e.g., endocytic hotspots) and time (e.g., during vesicle formation). In this work, we explore binding-induced regulation of IDR steric volume. We simulate the IDRs of two proteins from Clathrin-mediated endocytosis (CME) to see if their conformational spaces are regulated via binding-induced expansion. Using Monte-Carlo computational modeling of excluded volumes, we generate large conformational ensembles (3 million) for the IDRs of Epsin and Eps15 and dock the conformers to the alpha subunit of Adaptor Protein 2 (AP2α), their CME binding partner. Our results show that as more molecules of AP2α are bound, the Epsin-derived ensemble shows a significant increase in global dimensions, measured as the radius of Gyration (RG) and the end-to-end distance (EED). Unlike Epsin, Eps15-derived conformers that permit AP2α binding at one motif were found to be more likely to accommodate binding of AP2α at other motifs, suggesting a tendency toward co-accessibility of binding motifs. Co-accessibility was not observed for any pair of binding motifs in Epsin. Thus, we speculate that the disordered regions of Epsin and Eps15 perform different roles during CME, with accessibility in Eps15 allowing it to act as a recruiter of AP2α molecules, while binding-induced expansion of the Epsin disordered region could impose steric pressure and remodel the plasma membrane during vesicle formation.
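
The two global dimensions named in the abstract above, radius of gyration (RG) and end-to-end distance (EED), can be illustrated with a minimal sketch. It assumes a conformer is given as a list of (x, y, z) points with equal weights; the toy chain and the function names are hypothetical and are not taken from the authors' software.

```python
import math

def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) points."""
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    # Root of the mean squared distance from the centroid.
    mean_sq = sum(math.dist(p, centroid) ** 2 for p in coords) / n
    return math.sqrt(mean_sq)

def end_to_end_distance(coords):
    """Distance between the first and last point of the chain."""
    return math.dist(coords[0], coords[-1])

# Hypothetical toy conformer: four points on a straight line, 1 unit apart.
chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
rg = radius_of_gyration(chain)
eed = end_to_end_distance(chain)
print(f"RG = {rg:.3f}, EED = {eed:.3f}")
```

Comparing the distribution of these two values between bound and unbound sub-ensembles is one way to read the "increase in global dimensions" the abstract reports.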


Subjects
Adaptor Protein Complex 2 , Adaptor Proteins, Vesicular Transport , Intrinsically Disordered Proteins , Adaptor Protein Complex 2/chemistry , Adaptor Protein Complex 2/metabolism , Adaptor Proteins, Vesicular Transport/chemistry , Adaptor Proteins, Vesicular Transport/metabolism , Cell Membrane/chemistry , Cell Membrane/metabolism , Clathrin/chemistry , Clathrin/metabolism , Endocytosis/physiology , Humans , Intrinsically Disordered Proteins/chemistry , Intrinsically Disordered Proteins/metabolism , Molecular Docking Simulation , Protein Binding , Protein Conformation
2.
J Biol Chem ; 290(45): 27280-27296, 2015 Nov 06.
Article in English | MEDLINE | ID: mdl-26370083

ABSTRACT

ATP synthesis is a critical and universal life process carried out by ATP synthases. Whereas eukaryotic and prokaryotic ATP synthases are well characterized, archaeal ATP synthases are relatively poorly understood. The hyperthermophilic archaeal parasite, Nanoarchaeum equitans, lacks several subunits of the ATP synthase and is suspected to be energetically dependent on its host, Ignicoccus hospitalis. This suggests that this ATP synthase might be a rudimentary machine. Here, we report the crystal structures and biophysical studies of the regulatory subunit, NeqB, the apo-NeqAB, and NeqAB in complex with nucleotides, ADP, and adenylyl-imidodiphosphate (non-hydrolysable analog of ATP). NeqB is ∼20 amino acids shorter at its C terminus than its homologs, but this does not impede its binding with NeqA to form the complex. The heterodimeric NeqAB complex assumes a closed, rigid conformation irrespective of nucleotide binding; this differs from its homologs, which require conformational changes for catalytic activity. Thus, although N. equitans possesses an ATP synthase core A3B3 hexameric complex, it might not function as a bona fide ATP synthase.


Subjects
ATP Synthetase Complexes/chemistry , Archaeal Proteins/chemistry , Nanoarchaeota/enzymology , ATP Synthetase Complexes/genetics , ATP Synthetase Complexes/metabolism , Amino Acid Sequence , Archaeal Proteins/genetics , Archaeal Proteins/metabolism , Catalytic Domain , Crystallography, X-Ray , Enzyme Activation , Hydrogen Bonding , Models, Molecular , Molecular Sequence Data , Nanoarchaeota/genetics , Phylogeny , Protein Conformation , Protein Structure, Quaternary , Protein Subunits , Sequence Homology, Amino Acid , Static Electricity , Structural Homology, Protein
3.
PLoS Comput Biol ; 10(4): e1003532, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24722239

ABSTRACT

Mechanical stretch-induced tyrosine phosphorylation in the proline-rich 306-residue substrate domain (CasSD) of p130Cas (or BCAR1) has eluded an experimentally validated structural understanding. Cellular p130Cas tyrosine phosphorylation is shown to function in areas without internal actomyosin contractility, sensing force at the leading edge of cell migration. Circular dichroism shows CasSD is intrinsically disordered with dominant polyproline type II conformations. Strongly conserved in placental mammals, the proline-rich sequence exhibits a pseudo-repeat unit with variation hotspots 2-9 residues before substrate tyrosine residues. Atomic-force microscopy pulling experiments show CasSD requires minimal extension force and exhibits infrequent, random regions of weak stability. Proteolysis, light scattering and ultracentrifugation results show that a monomeric intrinsically disordered form persists for CasSD in solution with an expanded hydrodynamic radius. All-atom 3D conformer sampling with the TraDES package yields ensembles in agreement with experiment when coil-biased sampling is used, matching the experimental radius of gyration. Increasing β-sampling propensities increases the number of prolate conformers. Combining the results, we conclude that CasSD has no stable compact structure and is unlikely to efficiently autoinhibit phosphorylation. Taking into consideration the structural propensity of CasSD and the fact that it is known to bind to LIM domains, we propose a model of how CasSD and LIM domain family of transcription factor proteins may function together to regulate phosphorylation of CasSD and effect mechanosensing.


Subjects
Crk-Associated Substrate Protein/chemistry , Intrinsically Disordered Proteins/chemistry , Mechanotransduction, Cellular , Biophysics , Microscopy, Atomic Force , Protein Unfolding
4.
BMC Bioinformatics ; 12 Suppl 13: S13, 2011.
Article in English | MEDLINE | ID: mdl-22372892

ABSTRACT

BACKGROUND: LRP6 is a membrane protein crucial in the initiation of canonical Wnt/β-catenin signalling. Its function is dependent on its proline-serine rich intracellular domain. LRP6 has five PPP(S/T)P motifs that are phosphorylated during activation, starting with the site closest to the membrane. Like all long proline rich regions, there is no stable 3D structure for this isolated, contiguous region. RESULTS: In our study, we use a computational simulation tool to sample the conformational space of the LRP6 intracellular domain, under the spatial constraints imposed by (a) the membrane and (b) the close approach of the neighboring intracellular molecular complex, which is assembled on Frizzled when Wnt binds to both LRP6 and Frizzled on the opposite side of the membrane. We observe that an elongated form dominates in the LRP6 intracellular domain structure ensemble. This elongation could relieve conformational auto-inhibition of the PPP(S/T)PX(S/T) motif binding sites and allow GSK3 and CK1 to approach their phosphorylation sites, thereby activating LRP6 and the downstream pathway. CONCLUSIONS: We propose a model in which the conformation of the LRP6 intracellular domain is elongated before activation. This is based on the intrusion of the Frizzled complex into the ensemble space of the proline rich region of LRP6, which alters the shape of its available ensemble space. To test whether this observed ensemble conformational change is sequence dependent, we did a control simulation with a hypothetical sequence with 50% proline and 50% serine in alternating residues. We confirm that this ensemble neighbourhood-based conformational change is independent of sequence and conclude that it is likely found in all proline rich sequences. These observations help us understand the nature of proline rich regions which are both unstructured and which seem to evolve at a higher rate of mutation, while maintaining sequence composition.


Subjects
Low Density Lipoprotein Receptor-Related Protein-6/chemistry , Low Density Lipoprotein Receptor-Related Protein-6/metabolism , Signal Transduction , Wnt Signaling Pathway , Glycogen Synthase Kinase 3/metabolism , Humans , Phosphorylation , Protein Folding , Protein Structure, Tertiary , beta Catenin/metabolism
5.
Pac Symp Biocomput ; 25: 183-194, 2020.
Article in English | MEDLINE | ID: mdl-31797596

ABSTRACT

Proteins with intrinsically disordered regions (IDRs) have large hydrodynamic radii, compared with globular proteins of equivalent weight. Recent experiments showed that IDRs with large radii can create steric pressure to drive membrane curvature during Clathrin-mediated endocytosis (CME). Epsin and Eps15 are two CME proteins with IDRs that contain multiple motifs for binding the adaptor protein AP2, but the impact of AP2-binding on these IDRs is unknown. Some IDRs acquire binding-induced function by forming a folded quaternary structure, but we hypothesize that the IDRs of Epsin and/or Eps15 acquire binding-induced function by increasing their steric volume. We explore this hypothesis in silico by generating conformational ensembles of the IDRs of Epsin (4 million structures) or Eps15 (3 million structures), then estimating the impact of AP2-binding on Radius of Gyration (RG). Results show that the ensemble of Epsin IDR conformations that accommodate AP2 binding has a right-shifted distribution of RG (larger radii) than the unbound Epsin ensemble. In contrast, the ensemble of Eps15 IDR conformations has comparable RG distribution between AP2-bound and unbound. We speculate that AP2 triggers the Epsin IDR to function through binding-induced-expansion, which could increase steric pressure and membrane bending during CME.


Subjects
Adaptor Proteins, Vesicular Transport , Computational Biology , Endocytosis , Humans
6.
PLoS Comput Biol ; 3(9): 1783-9, 2007 Sep.
Article in English | MEDLINE | ID: mdl-17892321

ABSTRACT

The multitude of functions performed in the cell are largely controlled by a set of carefully orchestrated protein interactions often facilitated by specific binding of conserved domains in the interacting proteins. Interacting domains commonly exhibit distinct binding specificity to short and conserved recognition peptides called binding profiles. Although many conserved domains are known in nature, only a few have well-characterized binding profiles. Here, we describe a novel predictive method known as domain-motif interactions from structural topology (D-MIST) for elucidating the binding profiles of interacting domains. A set of domains and their corresponding binding profiles were derived from extant protein structures and protein interaction data and then used to predict novel protein interactions in yeast. A number of the predicted interactions were verified experimentally, including new interactions of the mitotic exit network, RNA polymerases, nucleotide metabolism enzymes, and the chaperone complex. These results demonstrate that new protein interactions can be predicted exclusively from sequence information.


Subjects
Algorithms , Models, Chemical , Protein Interaction Mapping/methods , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence , Binding Sites , Computer Simulation , Conserved Sequence , Feasibility Studies , Models, Molecular , Molecular Sequence Data , Protein Binding , Protein Structure, Tertiary , Structure-Activity Relationship
7.
Nat Biotechnol ; 20(10): 991-7, 2002 Oct.
Article in English | MEDLINE | ID: mdl-12355115

ABSTRACT

High-throughput methods for detecting protein interactions, such as mass spectrometry and yeast two-hybrid assays, continue to produce vast amounts of data that may be exploited to infer protein function and regulation. As this article went to press, the pool of all published interaction information on Saccharomyces cerevisiae was 15,143 interactions among 4,825 proteins, and power-law scaling supports an estimate of 20,000 specific protein interactions. To investigate the biases, overlaps, and complementarities among these data, we have carried out an analysis of two high-throughput mass spectrometry (HMS)-based protein interaction data sets from budding yeast, comparing them to each other and to other interaction data sets. Our analysis reveals 198 interactions among 222 proteins common to both data sets, many of which reflect large multiprotein complexes. It also indicates that a "spoke" model that directly pairs bait proteins with associated proteins is roughly threefold more accurate than a "matrix" model that connects all proteins. In addition, we identify a large, previously unsuspected nucleolar complex of 148 proteins, including 39 proteins of unknown function. Our results indicate that existing large-scale protein interaction data sets are nonsaturating and that integrating many different experimental data sets yields a clearer biological view than any single method alone.
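
The "spoke" and "matrix" models compared in the abstract above differ only in which pairs are inferred from one purification. The sketch below shows that difference on a made-up bait and prey list; the protein names and function names are hypothetical, and this is not the authors' analysis code.

```python
from itertools import combinations

def spoke_pairs(bait, preys):
    """Spoke model: pair the bait with each co-purified protein."""
    return {frozenset((bait, p)) for p in preys if p != bait}

def matrix_pairs(bait, preys):
    """Matrix model: pair every protein in the purification with every other."""
    members = {bait, *preys}
    return {frozenset(pair) for pair in combinations(sorted(members), 2)}

# Hypothetical purification: one bait that pulled down three proteins.
spoke = spoke_pairs("BAIT1", ["P1", "P2", "P3"])
matrix = matrix_pairs("BAIT1", ["P1", "P2", "P3"])
print(len(spoke), "spoke pairs;", len(matrix), "matrix pairs")
```

For a purification with n proteins in total, the spoke model yields n - 1 pairs and the matrix model n(n - 1)/2, so the matrix model proposes many more pairs per experiment.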


Subjects
Databases, Protein , Protein Interaction Mapping/methods , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Sequence Alignment/methods , Chromatography, Liquid/methods , Database Management Systems , Genome, Fungal , Macromolecular Substances , Mass Spectrometry/methods , Multiprotein Complexes , Proteome , Reproducibility of Results , Sensitivity and Specificity , Sequence Analysis, Protein , Species Specificity
8.
BMC Bioinformatics ; 7: 152, 2006 Mar 17.
Article in English | MEDLINE | ID: mdl-16545112

ABSTRACT

BACKGROUND: Accurate small molecule binding site information for a protein can facilitate studies in drug docking, drug discovery and function prediction, but small molecule binding site protein sequence annotation is sparse. The Small Molecule Interaction Database (SMID), a database of protein domain-small molecule interactions, was created using structural data from the Protein Data Bank (PDB). More importantly it provides a means to predict small molecule binding sites on proteins with a known or unknown structure and unlike prior approaches, removes large numbers of false positive hits arising from transitive alignment errors, non-biologically significant small molecules and crystallographic conditions that overpredict ion binding sites. DESCRIPTION: Using a set of co-crystallized protein-small molecule structures as a starting point, SMID interactions were generated by identifying protein domains that bind to small molecules, using NCBI's Reverse Position Specific BLAST (RPS-BLAST) algorithm. SMID records are available for viewing at http://smid.blueprint.org. The SMID-BLAST tool provides accurate transitive annotation of small-molecule binding sites for proteins not found in the PDB. Given a protein sequence, SMID-BLAST identifies domains using RPS-BLAST and then lists potential small molecule ligands based on SMID records, as well as their aligned binding sites. A heuristic ligand score is calculated based on E-value, ligand residue identity and domain entropy to assign a level of confidence to hits found. SMID-BLAST predictions were validated against a set of 793 experimental small molecule interactions from the PDB, of which 472 (60%) of predicted interactions identically matched the experimental small molecule and of these, 344 had greater than 80% of the binding site residues correctly identified. Further, we estimate that 45% of predictions which were not observed in the PDB validation set may be true positives. 
CONCLUSION: By focusing on protein domain-small molecule interactions, SMID is able to cluster similar interactions and detect subtle binding patterns that would not otherwise be obvious. Using SMID-BLAST, small molecule targets can be predicted for any protein sequence, with the only limitation being that the small molecule must exist in the PDB. Validation results and specific examples within illustrate that SMID-BLAST has a high degree of accuracy in terms of predicting both the small molecule ligand and binding site residue positions for a query protein.


Subjects
Databases, Protein , Documentation/methods , Information Storage and Retrieval/methods , Protein Interaction Mapping/methods , Proteins/chemistry , Proteins/classification , Sequence Analysis, Protein/methods , Binding Sites , Database Management Systems , Ligands , Protein Binding , Sequence Alignment/methods
9.
FEBS Lett ; 580(6): 1649-53, 2006 Mar 06.
Article in English | MEDLINE | ID: mdl-16494871

ABSTRACT

A complete set of 6300 small molecule ligands was extracted from the protein data bank, and deposited online in PubChem as data source 'SMID'. This set's major improvement over prior methods is the inclusion of cyclic polypeptides and branched polysaccharides, including an unambiguous nomenclature, in addition to normal monomeric ligands. Only the best available example of each ligand structure is retained, and an additional dataset is maintained containing co-ordinates for all examples of each structure. Attempts are made to correct ambiguous atomic elements and other common errors, and a perception algorithm was used to determine bond order and aromaticity when no other information was available.


Subjects
Databases, Protein , Ligands , Proteins/chemistry , Molecular Structure
10.
J Mol Biol ; 350(5): 1061-73, 2005 Jul 29.
Article in English | MEDLINE | ID: mdl-15978619

ABSTRACT

The identification and annotation of protein domains provides a critical step in the accurate determination of molecular function. Both computational and experimental methods of protein structure determination may be deterred by large multi-domain proteins or flexible linker regions. Knowledge of domains and their boundaries may reduce the experimental cost of protein structure determination by allowing researchers to work on a set of smaller and possibly more successful alternatives. Current domain prediction methods often rely on sequence similarity to conserved domains and as such are poorly suited to detect domain structure in poorly conserved or orphan proteins. We present here a simple computational method to identify protein domain linkers and their boundaries from sequence information alone. Our domain predictor, Armadillo (http://armadillo.blueprint.org), uses any amino acid index to convert a protein sequence to a smoothed numeric profile from which domains and domain boundaries may be predicted. We derived an amino acid index called the domain linker propensity index (DLI) from the amino acid composition of domain linkers using a non-redundant structure dataset. The index indicates that Pro and Gly show a propensity for linker residues while small hydrophobic residues do not. Armadillo predicts domain linker boundaries from Z-score distributions and obtains 35% sensitivity with DLI in a two-domain, single-linker dataset (within +/-20 residues from linker). The combination of DLI and an entropy-based amino acid index increases the overall Armadillo sensitivity to 56% for two domain proteins. Moreover, Armadillo achieves 37% sensitivity for multi-domain proteins, surpassing most other prediction methods. Armadillo provides a simple, but effective method by which prediction of domain boundaries can be obtained with reasonable sensitivity. 
Armadillo should prove to be a valuable tool for rapidly delineating protein domains in poorly conserved proteins or those with no sequence neighbors. As a first-line predictor, domain meta-predictors could yield improved results with Armadillo predictions.
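
The general idea described above, converting a sequence to a smoothed numeric profile with an amino acid index and then flagging high Z-scores, can be sketched as follows. The index values, window size, and Z-score cutoff are all hypothetical placeholders chosen for illustration; they are not the published DLI values or Armadillo's actual parameters.

```python
from statistics import mean, pstdev

# HYPOTHETICAL index values for illustration only (not the published DLI).
TOY_INDEX = {"P": 1.0, "G": 0.8, "S": 0.4, "A": -0.5, "V": -0.8, "L": -0.6}

def smoothed_profile(seq, index, window=5):
    """Map residues to index values, then average over a sliding window."""
    raw = [index.get(aa, 0.0) for aa in seq]
    half = window // 2
    profile = []
    for i in range(len(raw)):
        lo, hi = max(0, i - half), min(len(raw), i + half + 1)
        profile.append(mean(raw[lo:hi]))
    return profile

def z_scores(values):
    """Standardize values against their own mean and standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

def candidate_linker_positions(seq, index, window=5, z_cut=1.0):
    """Positions whose smoothed score is at least z_cut deviations above the mean."""
    z = z_scores(smoothed_profile(seq, index, window))
    return [i for i, score in enumerate(z) if score >= z_cut]

# Toy sequence: two hydrophobic blocks joined by a Pro/Gly-rich stretch (10..19).
seq = "AVLAVLAVLA" + "PGPGSPGPGS" + "AVLAVLAVLA"
hits = candidate_linker_positions(seq, TOY_INDEX)
print(hits)
```

In this toy case the flagged positions fall inside the Pro/Gly-rich stretch, which matches the abstract's statement that Pro and Gly show a propensity for linkers while small hydrophobic residues do not.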


Subjects
Models, Molecular , Proteins/chemistry , Software , Structural Homology, Protein , Amino Acid Sequence , Conserved Sequence , Protein Structure, Tertiary
11.
Nucleic Acids Res ; 31(1): 248-50, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12519993

ABSTRACT

The Biomolecular Interaction Network Database (BIND: http://bind.ca) archives biomolecular interaction, complex and pathway information. A web-based system is available to query, view and submit records. BIND continues to grow with the addition of individual submissions as well as interaction data from the PDB and a number of large-scale interaction and complex mapping experiments using yeast two hybrid, mass spectrometry, genetic interactions and phage display. We have developed a new graphical analysis tool that provides users with a view of the domain composition of proteins in interaction and complex records to help relate functional domains to protein interactions. An interaction network clustering tool has also been developed to help focus on regions of interest. Continued input from users has helped further mature the BIND data specification, which now includes the ability to store detailed information about genetic interactions. The BIND data specification is available as ASN.1 and XML DTD.


Subjects
Databases, Protein , Proteins/metabolism , Amino Acid Sequence , Animals , Computer Graphics , Macromolecular Substances , Protein Interaction Mapping , Protein Structure, Tertiary , Proteins/chemistry , Proteins/physiology , Sequence Alignment/methods
12.
Cancer Res ; 62(5): 1284-8, 2002 Mar 01.
Article in English | MEDLINE | ID: mdl-11888892

ABSTRACT

Human colorectal, endometrial, and gastric cancers with defective DNA mismatch repair (MMR) have microsatellite instability, a unique molecular alteration characterized by widespread frameshift mutations of repetitive DNA sequences. We developed "Kangaroo," a bioinformatics program for searches in nucleotide and protein sequence databases, and performed an in silico genome scan for DNA coding microsatellites that may have novel mutations in MMR-deficient cancers. Examination of 29 previously untested coding polyadenines revealed widespread mutations in MMR-deficient colorectal cancers, with the highest frequencies in ERCC5, CASP8AP2, p72, RAD50, CDC25, RECQL1, CBF2, RACK7, GRK4, and DNAPK (range, 10-33%). This algorithm allows comprehensive mutation profiling of MMR-deficient cancers, an important step in understanding the pathogenesis of these neoplasms.


Subjects
Base Pair Mismatch , Colorectal Neoplasms/genetics , Microsatellite Repeats , Mutation , Algorithms , Computational Biology , DNA Repair , Humans
13.
FEBS Lett ; 579(21): 4685-91, 2005 Aug 29.
Article in English | MEDLINE | ID: mdl-16098521

ABSTRACT

A novel chemical ontology based on chemical functional groups automatically, objectively assigned by a computer program, was developed to categorize small molecules. It has been applied to PubChem and the small molecule interaction database to demonstrate its utility as a basic pharmacophore search system. Molecules can be compared using a semantic similarity score based on functional group assignments rather than 3D shape, which succeeds in identifying small molecules known to bind a common binding site. This ontology will serve as a powerful tool for searching chemical databases and identifying key functional groups responsible for biological activities.


Subjects
Macromolecular Substances/chemistry , Semantics , Software , Binding Sites , Databases, Factual , Models, Molecular , Molecular Conformation
14.
Bioinformatics ; 20 Suppl 1: i55-62, 2004 Aug 04.
Article in English | MEDLINE | ID: mdl-15262781

ABSTRACT

MOTIVATION: A growing body of research has concentrated on the identification and definition of conserved sequence motifs. It is widely recognized that these conserved sequence and structural units often mediate protein functions and interactions. The continuing advancements in high-throughput experiments necessitate the development of computational methods to critically assess the results. In this work, we analyzed high-throughput protein complexes using the domain composition of their protein constituents. Domains that mediate similar or related functions may consistently co-occur in protein complexes. RESULTS: We analyzed Saccharomyces cerevisiae protein complexes from curated and high-throughput experimental datasets to identify statistically significant functional associations between domains. The resulting correlations are represented as domain networks that form the basis of comparison between the datasets, as well as to binary protein interactions. The results show that the curated datasets produce domain networks that map to known biological assemblies, such as ribosome, RNA polymerase, proteasome regulators, transcription initiation and histones. Furthermore, many of these domain correlations were also found in binary protein interactions. In contrast, the high-throughput datasets contain one large network of domain associations. High connectivity of RNA processing and binding domains in the high-throughput datasets reflects the abundance of RNA binding proteins in yeast, in agreement with a previous report that identified a nucleolar protein cluster, possibly mediated by rRNA, from these complexes. AVAILABILITY: The software is available upon request from the authors and is dependent on the NCBI C++ toolkit.


Subjects
Databases, Protein , Saccharomyces cerevisiae Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Motifs , Amino Acid Sequence , Conserved Sequence , Molecular Sequence Data , Protein Structure, Tertiary , Statistics as Topic
16.
BMC Bioinformatics ; 3: 13, 2002 May 08.
Article in English | MEDLINE | ID: mdl-12019022

ABSTRACT

BACKGROUND: The BLAST algorithm compares biological sequences to one another in order to determine shared motifs and common ancestry. However, the comparison of all non-redundant (NR) sequences against all other NR sequences is a computationally intensive task. We developed NBLAST as a cluster computer implementation of the BLAST family of sequence comparison programs for the purpose of generating pre-computed BLAST alignments and neighbour lists of NR sequences. RESULTS: NBLAST performs the heuristic BLAST algorithm and generates an exhaustive database of alignments, but it computes only the upper triangle of the possible N² alignments, where N is the number of sequences to be compared. A task-partitioning algorithm allows for cluster computing across all cluster nodes and the NBLAST master process produces a BLAST sequence alignment database and a list of sequence neighbours for each sequence record. The resulting sequence alignment and neighbour databases are used to serve the SeqHound query system through a C/C++ and PERL Application Programming Interface (API). CONCLUSIONS: NBLAST offers a local alternative to the NCBI's remote Entrez system for pre-computed BLAST alignments and neighbour queries. On our 216-processor 450 MHz PIII cluster, NBLAST requires ~24 hrs to compute neighbours for 850000 proteins currently in the non-redundant protein database.
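
Restricting an all-against-all comparison to the upper triangle, and splitting that work across nodes, can be sketched as below. This is only an illustration: it assumes self-comparisons are excluded and uses a simple round-robin split, neither of which is stated in the abstract, and it is not NBLAST's actual task-partitioning algorithm.

```python
from itertools import combinations

def upper_triangle_pairs(n):
    """Index pairs (i, j) with i < j: each unordered pair appears exactly once."""
    return combinations(range(n), 2)

def partition_round_robin(pairs, n_workers):
    """Deal pairs out to workers in turn (an illustrative scheme only)."""
    buckets = [[] for _ in range(n_workers)]
    for k, pair in enumerate(pairs):
        buckets[k % n_workers].append(pair)
    return buckets

n_sequences = 6
buckets = partition_round_robin(upper_triangle_pairs(n_sequences), n_workers=4)
total = sum(len(b) for b in buckets)
print(total, "comparisons instead of", n_sequences ** 2)
```

Because a comparison of sequence i with sequence j also covers j with i, visiting only the pairs with i < j roughly halves the work relative to the full N² grid.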


Subjects
Algorithms , Computational Biology/statistics & numerical data , Genetic Variation/genetics , Cluster Analysis , Computational Biology/methods , Database Management Systems/statistics & numerical data , Sequence Alignment/methods
17.
BMC Bioinformatics ; 3: 20, 2002 Jul 31.
Article in English | MEDLINE | ID: mdl-12150718

ABSTRACT

BACKGROUND: Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools to perform simple searches, such as identifying transcription binding sites, protein motifs, or repetitive DNA sequences. However, in many cases simple pattern-matching searches can reveal a wealth of information. We present in this paper a regular expression pattern-matching tool that was used to identify short repetitive DNA sequences in human coding regions for the purpose of identifying potential mutation sites in mismatch repair deficient cells. RESULTS: Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding region sequences in ten different organisms. The program is implemented to facilitate a wide range of queries with no restriction on the length or complexity of the query expression. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/ and the source code is freely distributed at http://sourceforge.net/projects/slritools/. CONCLUSION: A low-level simple pattern-matching application can prove to be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a human colorectal cancer variant that is characterized by a high frequency of mutations in coding regions containing mononucleotide repeats.
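
The kind of query described above, finding mononucleotide repeats in a coding sequence with a regular expression, might look like the following sketch. It uses Python's `re` module as a stand-in; Kangaroo's own query syntax is not given in the abstract, and the sequence, the minimum run length of 8, and the function name are hypothetical.

```python
import re

def find_mononucleotide_runs(seq, base="A", min_len=8):
    """Return (start, length) for each run of `base` at least `min_len` long."""
    pattern = re.compile(f"{re.escape(base)}{{{min_len},}}")  # e.g. A{8,}
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq)]

# Hypothetical coding sequence containing two long poly-A tracts.
cds = "ATGGCC" + "A" * 9 + "GGTAAAACC" + "A" * 8 + "TGA"
runs = find_mononucleotide_runs(cds)
print(runs)
```

The short AAAA stretch in the middle is ignored because it falls below the minimum length, so only the two long tracts are reported.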


Subjects
Sequence Alignment/methods , Software , Base Pair Mismatch/genetics , Computational Biology/methods , DNA Repair/genetics , Databases, Genetic , Gene Expression Profiling/methods , Humans , Internet , Mutation , Short Interspersed Nucleotide Elements/genetics
18.
BMC Bioinformatics ; 4: 2, 2003 Jan 13.
Article in English | MEDLINE | ID: mdl-12525261

ABSTRACT

BACKGROUND: Recent advances in proteomics technologies such as two-hybrid, phage display and mass spectrometry have enabled us to create a detailed map of biomolecular interaction networks. Initial mapping efforts have already produced a wealth of data. As the size of the interaction set increases, databases and computational methods will be required to store, visualize and analyze the information in order to effectively aid in knowledge discovery. RESULTS: This paper describes a novel graph theoretic clustering algorithm, "Molecular Complex Detection" (MCODE), that detects densely connected regions in large protein-protein interaction networks that may represent molecular complexes. The method is based on vertex weighting by local neighborhood density and outward traversal from a locally dense seed protein to isolate the dense regions according to given parameters. The algorithm has the advantage over other graph clustering methods of having a directed mode that allows fine-tuning of clusters of interest without considering the rest of the network and allows examination of cluster interconnectivity, which is relevant for protein networks. Protein interaction and complex information from the yeast Saccharomyces cerevisiae was used for evaluation. CONCLUSION: Dense regions of protein interaction networks can be found, based solely on connectivity data, many of which correspond to known protein complexes. The algorithm is not affected by a known high rate of false positives in data from high-throughput interaction techniques. The program is available from ftp://ftp.mshri.on.ca/pub/BIND/Tools/MCODE.
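
Vertex weighting by local neighbourhood density, the first stage mentioned above, can be sketched in simplified form. Here each vertex is weighted by the level of the highest k-core of its closed neighbourhood multiplied by that core's density; self-loops are ignored and the outward traversal stage is omitted. The toy graph and function names are hypothetical, and this should be read as an approximation rather than the MCODE program itself.

```python
from itertools import combinations

def density(nodes, adj):
    """Edges present among `nodes` divided by the maximum possible (no self-loops)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    return edges / (n * (n - 1) / 2)

def highest_k_core(nodes, adj):
    """Return (k, core) for the highest non-empty k-core of the induced subgraph."""
    best_k, best = 0, set(nodes)
    core, k = set(nodes), 1
    while True:
        # Repeatedly drop vertices with fewer than k neighbours inside the core.
        changed = True
        while changed:
            drop = {u for u in core if len(adj[u] & core) < k}
            core -= drop
            changed = bool(drop)
        if not core:
            return best_k, best
        best_k, best = k, set(core)
        k += 1

def vertex_weight(v, adj):
    """Weight = k-core level times density of the vertex's closed neighbourhood core."""
    k, core = highest_k_core(adj[v] | {v}, adj)
    return k * density(core, adj)

# Hypothetical network: a 4-clique (a, b, c, d) with a pendant vertex e on d.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "e"},
    "e": {"d"},
}
weights = {v: vertex_weight(v, adj) for v in adj}
print(weights)
```

Vertices inside the clique receive a higher weight than the pendant vertex, so a clique member would be picked as the seed from which a dense region is grown.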


Subjects
Protein Interaction Mapping/methods , Algorithms , Cluster Analysis , Computational Biology/methods , Computer Graphics , Macromolecular Substances , Predictive Value of Tests , Proteomics/methods , Saccharomyces cerevisiae Proteins/chemistry , Saccharomyces cerevisiae Proteins/metabolism , Software Validation
19.
BMC Bioinformatics ; 3: 39, 2002 Dec 17.
Article in English | MEDLINE | ID: mdl-12487631

ABSTRACT

BACKGROUND: An organism's ability to adapt to its particular environmental niche is of fundamental importance to its survival and proliferation. In the largest study of its kind, we sought to identify and exploit the amino-acid signatures that make species-specific protein adaptation possible across 100 complete genomes. RESULTS: Environmental niche was determined to be a significant factor in variability from correspondence analysis using the amino acid composition of over 360,000 predicted open reading frames (ORFs) from 17 archaea, 76 bacteria and 7 eukaryote complete genomes. Additionally, we found clusters of phylogenetically unrelated archaea and bacteria that share similar environments by amino acid composition clustering. Composition analyses of conservative, domain-based homology modeling suggested an enrichment of small hydrophobic residues Ala, Gly, Val and charged residues Asp, Glu, His and Arg across all genomes. However, larger aromatic residues Phe, Trp and Tyr are reduced in folds, and these results were not affected by low complexity biases. We derived two simple log-odds scoring functions from ORFs (CG) and folds (CF) for each of the complete genomes. CF achieved an average cross-validation success rate of 85 +/- 8% whereas the CG detected 73 +/- 9% species-specific sequences when competing against all other non-redundant CG. Continuously updated results are available at http://genome.mshri.on.ca. CONCLUSION: Our analysis of amino acid compositions from the complete genomes provides stronger evidence for species-specific and environmental residue preferences in genomic sequences as well as in folds. Scoring functions derived from this work will be useful in future protein engineering experiments and possibly in identifying horizontal transfer events.
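
A composition-based log-odds score of the kind mentioned in the abstract can be sketched as follows. The abstract does not state the functional form of CG or CF, so this is an assumed, generic version: each residue is scored by the log ratio of its frequency in one genome to its frequency in a background set, and a sequence score is the sum over its residues. The names `composition`, `log_odds_table` and `score`, and the pseudocount, are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequences, pseudocount=1.0):
    """Relative amino acid frequencies, smoothed so no residue has frequency zero."""
    counts = Counter()
    for seq in sequences:
        counts.update(c for c in seq if c in AMINO_ACIDS)
    total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def log_odds_table(genome_seqs, background_seqs):
    """Per-residue log-odds of genome composition against background composition."""
    genome = composition(genome_seqs)
    background = composition(background_seqs)
    return {a: math.log(genome[a] / background[a]) for a in AMINO_ACIDS}


def score(seq, table):
    """Sum of per-residue log-odds; positive means the sequence resembles the genome."""
    return sum(table[c] for c in seq if c in table)


# Toy data (not from the paper): an Ala-rich "genome" against a Trp-rich background.
table = log_odds_table(["AAAAAG"], ["WWWWWG"])
ala_score = score("AAA", table)
trp_score = score("WWW", table)
```

In this toy setup an Ala-rich query scores positive and a Trp-rich query scores negative. Assigning a sequence to a species would then amount to comparing its scores across one such table per genome.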


Subjects
Computational Biology/methods , Protein Folding , Proteins/chemistry , Physiological Adaptation/genetics , Animals , Archaeal Proteins/chemistry , Archaeal Proteins/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Caenorhabditis elegans Proteins/chemistry , Caenorhabditis elegans Proteins/genetics , Fungal Proteins/chemistry , Fungal Proteins/genetics , Genome , Archaeal Genome , Bacterial Genome , Fungal Genome , Human Genome , Humans , Predictive Value of Tests , Secondary Protein Structure/genetics , Proteins/genetics , Proteome/chemistry , Proteome/genetics , Proteomics/methods , Species Specificity
20.
BMC Bioinformatics ; 3: 32, 2002 Oct 25.
Article in English | MEDLINE | ID: mdl-12401134

ABSTRACT

BACKGROUND: SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. RESULTS: SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. CONCLUSIONS: The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit.


Subjects
Computational Biology/methods , Genetic Databases , Software , Amino Acid Sequence , Base Sequence , Genetic Databases/classification , Information Storage and Retrieval/methods , Internet , Genetic Models , Molecular Models , Molecular Sequence Data , Structure-Activity Relationship