1.
BMC Genomics ; 17(Suppl 13): 1025, 2016 Dec 22.
Article in English | MEDLINE | ID: mdl-28155657

ABSTRACT

BACKGROUND: The ability to sequence the transcriptomes of single cells using single-cell RNA-seq technologies represents a paradigm shift: scientists can now investigate the complex biology of a heterogeneous population of cells one cell at a time. However, to date, there has been no suitable computational methodology for analysing such an intricate deluge of data, in particular techniques that help identify the differences in transcriptomic profiles between cellular subtypes. In this paper, we describe a novel methodology for the analysis of single-cell RNA-seq data, obtained from neocortical cells and neural progenitor cells, using machine learning algorithms (Support Vector Machine (SVM) and Random Forest (RF)). RESULTS: Thirty-eight key transcripts were identified, using the SVM-based recursive feature elimination (SVM-RFE) method of feature selection, as best differentiating developing neocortical cells from neural progenitor cells in the SVM and RF classifiers built. These genes also possessed higher discriminative power (enhanced prediction accuracy) than those selected by commonly used statistical techniques or geneset-based approaches. Downstream network reconstruction analysis was carried out to unravel hidden general regulatory networks, in which novel interactions could be further validated in wet-lab experiments and could be useful candidate targets for the treatment of neuronal developmental diseases. CONCLUSION: The novel approach reported here is able to identify transcripts, with reported neuronal involvement, that optimally differentiate neocortical cells from neural progenitor cells. It is believed to be extensible and applicable to other single-cell RNA-seq expression profiles, such as those used to study cancer progression and treatment within a highly heterogeneous tumour.


Subject(s)
Brain/metabolism , Gene Expression Profiling , Machine Learning , Organogenesis/genetics , Single-Cell Analysis , Transcriptome , Algorithms , Biomarkers , Brain/embryology , Brain/growth & development , Models, Statistical , Neurogenesis/genetics , Organ Specificity , Reproducibility of Results , Single-Cell Analysis/methods , Support Vector Machine
2.
BMC Genomics ; 14 Suppl 5: S15, 2013.
Article in English | MEDLINE | ID: mdl-24564427

ABSTRACT

BACKGROUND: Many biological processes are carried out by proteins interacting with each other in the form of protein complexes. However, large-scale detection of protein complexes has remained constrained by experimental limitations. As such, computational detection of protein complexes by applying clustering algorithms on the abundantly available protein-protein interaction (PPI) networks is an important alternative. However, many current algorithms have overlooked the importance of selecting seeds for expansion into clusters without excluding important proteins and including many noisy ones, while ensuring a high degree of functional homogeneity amongst the proteins detected for the complexes. RESULTS: We designed a novel method called Probabilistic Local Walks (PLW) which clusters regions in a PPI network with high functional similarity to find protein complex cores with high precision and efficiency in O (|V| log |V| + |E|) time. A seed selection strategy, which prioritises seeds with dense neighbourhoods, was devised. We defined a topological measure, called common neighbour similarity, to estimate the functional similarity of two proteins given the number of their common neighbours. CONCLUSIONS: Our proposed PLW algorithm achieved the highest F-measure (recall and precision) when compared to 11 state-of-the-art methods on yeast protein interaction data, with an improvement of 16.7% over the next highest score. Our experiments also demonstrated that our seed selection strategy is able to increase algorithm precision when applied to three previous protein complex mining techniques. AVAILABILITY: The software, datasets and predicted complexes are available at http://wonglkd.github.io/PLW.
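
The two ingredients named in this abstract, a common-neighbour similarity and a seed selection strategy that favours dense neighbourhoods, can be illustrated with a minimal Python sketch. It is not the PLW implementation: the abstract does not give the formulas, so the Jaccard form of the similarity, the clustering-coefficient style density, and all function names below are assumptions made for illustration only.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbour_similarity(adj, u, v):
    """ASSUMED form: Jaccard index of the two neighbour sets.

    The paper's exact definition may differ; this only captures the idea
    that more shared neighbours means higher similarity.
    """
    nu, nv = adj[u] - {v}, adj[v] - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def neighbourhood_density(adj, node):
    """Fraction of neighbour pairs that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


def rank_seeds(adj):
    """Order candidate seeds: densest neighbourhood first, then degree, then name."""
    return sorted(adj, key=lambda n: (-neighbourhood_density(adj, n), -len(adj[n]), n))


# Toy network (hypothetical protein names).
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
adj = build_adjacency(edges)
seeds = rank_seeds(adj)
```

In this toy network, proteins A and B sit in a fully connected neighbourhood and are ranked ahead of C, whose neighbours are only partly linked, and of D, which has a single neighbour.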


Subject(s)
Computational Biology/methods , Fungal Proteins/analysis , Yeasts/metabolism , Algorithms , Protein Interaction Mapping , Software
3.
Bioinformatics ; 28(20): 2640-7, 2012 Oct 15.
Article in English | MEDLINE | ID: mdl-22923290

ABSTRACT

BACKGROUND: Identifying disease genes from the human genome is an important but challenging task in biomedical research. Machine learning methods can be applied to discover new disease genes based on the known ones. Existing machine learning methods typically use the known disease genes as the positive training set P and the unknown genes as the negative training set N (a non-disease gene set does not exist) to build classifiers that identify new disease genes from the unknown genes. However, such classifiers are actually built from a noisy negative set N, as there can be unknown disease genes in N itself. As a result, the classifiers do not perform as well as they could. RESULT: Instead of treating the unknown genes as negative examples in N, we treat them as an unlabeled set U. We design a novel positive-unlabeled (PU) learning algorithm, PUDI (PU learning for disease gene identification), to build a classifier using P and U. We first partition U into four sets, namely, reliable negative set RN, likely positive set LP, likely negative set LN and weak negative set WN. Weighted support vector machines are then used to build a multi-level classifier based on the four training sets and the positive training set P to identify disease genes. Our experimental results demonstrate that our proposed PUDI algorithm outperformed the existing methods significantly. CONCLUSION: The proposed PUDI algorithm is able to identify disease genes more accurately by treating the unknown data more appropriately as an unlabeled set U instead of a negative set N. Given that many machine learning problems in biomedical research involve positive and unlabeled data instead of negative data, it is possible that the machine learning methods for these problems can be further improved by adopting PU learning methods, as we have done here for disease gene identification. AVAILABILITY AND IMPLEMENTATION: The executable program and data are available at http://www1.i2r.a-star.edu.sg/~xlli/PUDI/PUDI.html.


Subject(s)
Artificial Intelligence , Disease/genetics , Genes , Algorithms , Humans , Support Vector Machine
4.
J Proteome Res ; 10(12): 5285-95, 2011 Dec 02.
Article in English | MEDLINE | ID: mdl-22004555

ABSTRACT

Many biologically important protein-protein interactions (PPIs) have been found to be mediated by short linear motifs (SLiMs). These interactions are mediated by the binding of a protein domain, often with a nonlinear interaction interface, to a SLiM. We propose a method called D-SLIMMER to mine for SLiMs in PPI data on the basis of the interaction density between a nonlinear motif (i.e., a protein domain) in one protein and a SLiM in the other protein. Our results on a benchmark of 113 experimentally verified reference SLiMs showed that D-SLIMMER notably outperformed existing methods for discovering domain-SLiM interaction motifs. To illustrate the significance of the SLiMs detected, we highlighted two SLiMs discovered from the PPI data by D-SLIMMER that are variants of a known ELM SLiM, as well as a literature-backed SLiM that is yet to be listed in the reference databases. We also presented a novel SLiM predicted by D-SLIMMER that was strongly supported by the existing biological literature. These examples showed that D-SLIMMER is able to find SLiMs that are biologically relevant.


Subject(s)
Algorithms , Data Mining/methods , Protein Interaction Domains and Motifs , Protein Interaction Mapping/methods , Software , Amino Acid Motifs , Amino Acid Sequence , Animals , Computational Biology/methods , Databases, Protein , Humans , Mice , Molecular Sequence Data , Reproducibility of Results , Sequence Alignment , Sequence Analysis, Protein/methods
5.
Bioinformatics ; 26(8): 1036-42, 2010 Apr 15.
Article in English | MEDLINE | ID: mdl-20167627

ABSTRACT

MOTIVATION: An important class of protein interactions involves the binding of a protein's domain to a short linear motif (SLiM) on its interacting partner. Extracting such motifs, either experimentally or computationally, is challenging because of their weak binding and high degree of degeneracy. The recent rapid increase in available protein structures provides an excellent opportunity to study SLiMs directly from their 3D structures. RESULTS: Using domain interface extraction (Diet), we characterized 452 distinct SLiMs from the Protein Data Bank (PDB), of which 155 are validated to varying degrees: 40 have literature validation, 54 are supported by at least one domain-peptide structural instance, and another 61 are overrepresented in high-throughput PPI data. We further observed that the lacklustre coverage of existing computational SLiM detection methods could be due to the common assumption that most SLiMs occur outside globular domain regions. In fact, 198 of the 452 SLiMs that we report are found on domain-domain interfaces; some of them are implicated in autoimmune and neurodegenerative diseases. We suggest that these SLiMs would be useful for designing inhibitors against the pathogenic protein complexes underlying these diseases. Our findings show that 3D structure-based SLiM detection algorithms can provide more complete coverage of SLiM-mediated protein interactions than current sequence-based approaches.


Subject(s)
Genomics/methods , Protein Interaction Domains and Motifs , Software , Amino Acid Motifs , Databases, Protein , Sequence Analysis, Protein/methods
6.
Nucleic Acids Res ; 37(Database issue): D858-62, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18948286

ABSTRACT

Parkinson's disease (PD) is the second most common neurodegenerative disorder, affecting millions of people. Both environmental and genetic factors play important roles in its causation and development. Genetic analysis has shown that over 100 genes are correlated with the etiology and pathology of PD. However, accessing genetic information in a consistent and fruitful way is not an easy task. The Mutation Database for Parkinson's Disease (MDPD) is designed to fulfill the need for information integration so that users can easily retrieve, inspect and enhance their knowledge of PD. The database contains 2391 entries on 202 genes extracted from 576 publications and manually examined by biomedical researchers. Each genetic substitution and the resulting impact are clearly labelled and linked to the primary reference. Every reported gene has a summary page that provides information on the variation impact, mutation type, the studied population, mutation position and reference collection. In addition, MDPD provides a unique functionality for users to compare the types of mutations among ethnic groups. As such, we hope that MDPD will serve as a valuable tool to bridge the gap between genetic analysis and clinical practice. MDPD is publicly accessible at http://datam.i2r.a-star.edu.sg/mdpd/.


Subject(s)
Databases, Genetic , Mutation , Parkinson Disease/genetics , Humans , Polymorphism, Single Nucleotide , Systems Integration
7.
BMC Bioinformatics ; 11 Suppl 7: S8, 2010 Oct 15.
Article in English | MEDLINE | ID: mdl-21106130

ABSTRACT

BACKGROUND: Protein-protein interactions (PPIs) play important roles in various cellular processes. However, the low quality of current PPI data detected by high-throughput screening techniques has diminished the potential usefulness of the data. We need to develop a method to address the high noise and incompleteness of PPI data, namely, to filter out inaccurate protein interactions (false positives) and predict putative protein interactions (false negatives). RESULTS: In this paper, we propose a novel two-step method to integrate diverse biological and computational sources of supporting evidence for reliable PPIs. The first step, interaction binning or InterBIN, groups PPIs together to more accurately estimate the likelihood (Bin-Confidence score) that the protein pairs interact for each biological or computational evidence source. The second step, interaction classification or InterCLASS, integrates the collected Bin-Confidence scores to build classifiers and identify reliable interactions. CONCLUSIONS: We performed comprehensive experiments on two benchmark yeast PPI datasets. The experimental results showed that our proposed method can effectively eliminate false positives in detected PPIs and identify false negatives by predicting novel yet reliable PPIs. Our proposed method also performed significantly better than using any individual evidence source alone, illustrating the importance of integrating various biological and computational sources of data and evidence.


Subject(s)
Computational Biology/methods , Saccharomyces cerevisiae Proteins/metabolism , Protein Interaction Mapping/methods , Reproducibility of Results , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Software
8.
BMC Genomics ; 11 Suppl 1: S3, 2010 Feb 10.
Article in English | MEDLINE | ID: mdl-20158874

ABSTRACT

BACKGROUND: Most proteins form macromolecular complexes to perform their biological functions. However, experimentally determined protein complex data, especially for complexes involving more than two protein partners, remain relatively limited even with current state-of-the-art high-throughput experimental techniques. Nevertheless, many techniques (such as yeast-two-hybrid) have enabled systematic screening of pairwise protein-protein interactions en masse. Thus computational approaches for detecting protein complexes from protein interaction data are useful complements to the limited experimental methods. They can be used together with the experimental methods for mapping the interactions of proteins to understand how different proteins are organized into higher-level substructures to perform various cellular functions. RESULTS: Given the abundance of pairwise protein interaction data from high-throughput genome-wide experimental screenings, a protein interaction network can be constructed by treating individual proteins as nodes and the existence of a physical interaction between a pair of proteins as a link. This binary protein interaction graph can then be used for detecting protein complexes using graph clustering techniques. In this paper, we review and evaluate the state-of-the-art techniques for computational detection of protein complexes, and discuss some promising research directions in this field. CONCLUSIONS: Experimental results with yeast protein interaction data show that the interaction subgraphs discovered by various computational methods matched well with actual protein complexes. In addition, the computational approaches have improved in performance over the years. Further improvements could be achieved if the quality of the underlying protein interaction data is adequately taken into account to minimize the undesirable effects of irrelevant and noisy sources, and if the various kinds of biological evidence are better incorporated into the detection process to exploit the increasing wealth of available biological knowledge.
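
The graph construction described in the RESULTS of this abstract (proteins as nodes, physical interactions as links) can be sketched in a few lines of Python. The clustering step shown here is only a stand-in: enumerating maximal cliques with the basic Bron-Kerbosch algorithm is one simple way to surface densely connected candidate groups, and is not claimed to be any of the methods reviewed in the paper. Function names and the toy protein identifiers are assumptions.

```python
def build_ppi_graph(interactions):
    """Proteins become nodes; each pairwise interaction becomes an undirected link."""
    graph = {}
    for a, b in interactions:
        if a == b:
            continue  # self-interactions are ignored in this sketch
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def maximal_cliques(graph, min_size=3):
    """Enumerate maximal cliques (basic Bron-Kerbosch, no pivoting).

    Stand-in for a real complex-detection method: fully connected groups
    are the strictest possible notion of a dense region.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already explored
        if not p and not x:
            if len(r) >= min_size:
                cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return sorted(cliques)


# Toy interaction list (hypothetical protein identifiers).
interactions = [
    ("P1", "P2"), ("P2", "P3"), ("P1", "P3"),
    ("P3", "P4"), ("P4", "P5"), ("P5", "P3"),
]
graph = build_ppi_graph(interactions)
candidates = maximal_cliques(graph)
```

Real interaction data are noisy and incomplete, so practical methods relax the all-pairs requirement of a clique; the sketch only shows how a binary interaction graph feeds a graph-clustering step.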


Subject(s)
Proteins/analysis , Systems Biology/methods , Biometry , Gene Expression Profiling , Humans , Protein Binding , Protein Interaction Mapping , Proteins/metabolism
9.
BMC Bioinformatics ; 10: 169, 2009 Jun 02.
Article in English | MEDLINE | ID: mdl-19486541

ABSTRACT

BACKGROUND: Detecting protein complexes is an important and challenging task in the post-genomic era. As increasing amounts of protein-protein interaction (PPI) data become available, we are able to identify protein complexes from PPI networks. However, most current studies detect protein complexes based solely on the observation that dense regions in PPI networks may correspond to protein complexes, and fail to consider the inherent organization within protein complexes. RESULTS: To provide insights into the organization of protein complexes, this paper presents a novel core-attachment based method (COACH) which detects protein complexes in two stages. It first detects protein-complex cores as the "hearts" of protein complexes and then includes attachments into these cores to form biologically meaningful structures. We evaluate and analyze our predicted protein complexes from two aspects. First, we perform a comprehensive comparison between our proposed method and existing techniques by comparing the predicted complexes against benchmark complexes. Second, we validate the core-attachment structures using various biological evidence and knowledge. CONCLUSION: Our proposed COACH method has been applied to two different yeast PPI networks, and the experimental results show that COACH performs significantly better than the state-of-the-art techniques. In addition, the identified complexes with core-attachment structures are demonstrated to match very well with existing biological knowledge and thus provide more insights for future biological study.


Subject(s)
Multiprotein Complexes , Protein Interaction Mapping/methods , Proteins , Software , Algorithms , Data Interpretation, Statistical , Databases, Protein , Multiprotein Complexes/chemistry , Multiprotein Complexes/metabolism , Protein Interaction Domains and Motifs , Proteins/chemistry , Proteins/metabolism , Reproducibility of Results
10.
J Bioinform Comput Biol ; 6(3): 415-33, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18574857

ABSTRACT

The biological mechanisms through which proteins interact with one another are best revealed by studying the structural interfaces between interacting proteins. Protein-protein interfaces can be extracted from three-dimensional (3D) structural data of protein complexes and then clustered to derive biological insights. However, conventional protein interface clustering methods lack computational scalability and statistical support. In this work, we present a new method named "PPiClust" to systematically encode, cluster, and analyze similar 3D interface patterns in protein complexes efficiently. Experimental results showed that our method is effective in discovering visually consistent and statistically significant clusters of interfaces, and at the same time sufficiently time-efficient to be performed on a single computer. The interface clusters are also useful for uncovering the structural basis of protein interactions. Analysis of the resulting interface clusters revealed groups of structurally diverse proteins having similar interface patterns. We also found, in some of the interface clusters, the presence of well-known linear binding motifs which were noncontiguous in the primary sequences. These results suggest that PPiClust can discover not only statistically significant, but also biologically significant, protein interface clusters from protein complex structural data.


Subject(s)
Protein Conformation , Cluster Analysis , Protein Binding/physiology , Proteins/chemistry , Structure-Activity Relationship
11.
Bioinformatics ; 22(16): 1998-2004, 2006 Aug 15.
Article in English | MEDLINE | ID: mdl-16787971

ABSTRACT

MOTIVATION: Experimental limitations in high-throughput protein-protein interaction detection methods have resulted in low-quality interaction datasets that contain sizable fractions of false positives and false negatives. Small-scale, focused experiments are then needed to complement the high-throughput methods to extract true protein interactions. However, the naturally vast interactomes would require much more scalable approaches. RESULTS: We describe a novel method called IRAP* as a computational complement for repurification of the highly erroneous experimentally derived protein interactomes. Our method involves an iterative process of removing interactions that are confidently identified as false positives and adding interactions detected as false negatives into the interactomes. Identification of both false positives and false negatives is performed in IRAP* using interaction confidence measures based on network topological metrics. Potential false positives are identified amongst the detected interactions as those with very low computed confidence values, while potential false negatives are discovered as the undetected interactions with high computed confidence values. Our results from applying IRAP* to large-scale interaction datasets generated by the popular yeast-two-hybrid assays for yeast, fruit fly and worm showed that the computationally repurified interaction datasets contained potentially lower fractions of false positive and false negative errors based on functional homogeneity. AVAILABILITY: The confidence indices for PPIs in yeast, fruit fly and worm as computed by our method can be found at our website http://www.comp.nus.edu.sg/~chenjin/fpfn.
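
The iterative loop described in this abstract (drop detected interactions with very low confidence, add undetected ones with high confidence, repeat) can be sketched as follows. This is not IRAP*: the abstract does not specify the confidence measure, so the shared-neighbour Jaccard score used here, the thresholds `low` and `high`, and the function names are all placeholder assumptions chosen to keep the example self-contained.

```python
from itertools import combinations


def confidence(adj, u, v):
    """PLACEHOLDER topological confidence: Jaccard index of neighbour sets."""
    nu, nv = adj[u] - {v}, adj[v] - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def repurify(edges, low=0.1, high=0.6, max_rounds=10):
    """Iteratively remove low-confidence edges and add high-confidence non-edges.

    Assumes no self-interactions in `edges`; thresholds are illustrative.
    """
    current = {frozenset(e) for e in edges}
    nodes = sorted({n for e in current for n in e})
    for _ in range(max_rounds):
        # Rebuild the adjacency map from the current edge set.
        adj = {n: set() for n in nodes}
        for e in current:
            u, v = tuple(e)
            adj[u].add(v)
            adj[v].add(u)
        # Candidate false positives: detected interactions with low confidence.
        removed = {e for e in current if confidence(adj, *tuple(e)) < low}
        # Candidate false negatives: undetected pairs with high confidence.
        added = {
            frozenset((u, v))
            for u, v in combinations(nodes, 2)
            if frozenset((u, v)) not in current and confidence(adj, u, v) > high
        }
        if not removed and not added:
            break  # converged: nothing left to remove or add
        current = (current - removed) | added
    return sorted(tuple(sorted(e)) for e in current)


# Toy network: A, B, C, D are densely linked except C-D; A-X is an isolated spur.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("A", "X")]
cleaned = repurify(edges)
```

On the toy input the spur A-X is removed because its endpoints share no neighbours, while the missing C-D link is added because C and D have identical neighbour sets; a second round changes nothing, so the loop stops.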


Subject(s)
Computational Biology/methods , Protein Interaction Mapping , Proteomics/methods , Animals , Caenorhabditis elegans , Computer Simulation , Databases, Protein , Drosophila , False Positive Reactions , Protein Binding , Saccharomyces cerevisiae/metabolism , Sequence Analysis, Protein , Software , Two-Hybrid System Techniques
12.
BMC Bioinformatics ; 7 Suppl 4: S23, 2006 Dec 12.
Article in English | MEDLINE | ID: mdl-17217516

ABSTRACT

BACKGROUND: Quantitative simultaneous monitoring of the expression levels of thousands of genes under various experimental conditions is now possible using microarray experiments. However, there are still gaps in whole-genome functional annotation of genes using gene expression data. RESULTS: In this paper, we propose a novel technique called Fuzzy Nearest Clusters for genome-wide functional annotation of unclassified genes. The technique consists of two steps: an initial hierarchical clustering step to detect homogeneous co-expressed gene subgroups or clusters in each possibly heterogeneous functional class, followed by a classification step to predict the functional roles of the unclassified genes based on their similarities to the detected functional clusters. CONCLUSION: Our experimental results with yeast gene expression data showed that the proposed method can accurately predict the genes' functions, even for genes with multiple functional roles, and that its prediction performance is mostly independent of the underlying heterogeneity of the complex functional classes, in contrast to other conventional gene function prediction approaches.


Subject(s)
Cluster Analysis , Fuzzy Logic , Gene Expression Profiling/methods , Gene Expression/physiology , Oligonucleotide Array Sequence Analysis/methods , Proteome/metabolism , Signal Transduction/physiology , Algorithms , Artificial Intelligence , Computer Simulation , Models, Biological , Proteome/genetics
13.
BMC Bioinformatics ; 7: 502, 2006 Nov 16.
Article in English | MEDLINE | ID: mdl-17107624

ABSTRACT

BACKGROUND: An important class of interaction switches for biological circuits and disease pathways are short binding motifs. However, the biological experiments to find these binding motifs are often laborious and expensive. With the availability of protein interaction data, novel binding motifs can be discovered computationally by applying standard motif extraction algorithms to sets of protein sequences that each interact with either a common protein or a protein group with similar properties. The underlying assumption is that proteins with common interacting partners will share some common binding motifs. Although novel binding motifs have been discovered with such an approach, it is not applicable if a protein interacts with very few other proteins or when prior knowledge of protein groups is unavailable or erroneous. Experimental noise in the input interaction data can further degrade the already poor performance of such approaches. RESULTS: We propose a novel approach of finding correlated short sequence motifs from protein-protein interaction data to effectively circumvent the above-mentioned limitations. Correlated motifs are those motifs that consistently co-occur only in pairs of interacting protein sequences, and could possibly interact with each other directly or indirectly to mediate interactions. We adopted the (l, d)-motif model and formulated finding the correlated motifs as an (l, d)-motif pair finding problem. We present both an exact algorithm, D-MOTIF, and its approximation algorithm, D-STAR, to solve this problem. Evaluation on extensive simulated data showed that our approach not only eliminated the need for any prior protein grouping, but is also more robust in extracting motifs from noisy interaction data. Application to two biological datasets (SH3 interaction network and TGFbeta signaling network) demonstrates that the approach can extract correlated motifs that correspond to actual interacting subsequences. CONCLUSION: The correlated motif approach outlined in this paper is able to find correlated linear motifs from sparse and noisy interaction data. This, in turn, will expedite the discovery of novel linear binding motifs, and facilitate the studies of biological pathways mediated by them.
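
In the (l, d)-motif model mentioned in this abstract, a motif of length l is taken to occur in a sequence if some length-l window differs from it in at most d positions. The Python sketch below illustrates that matching rule and a naive count of interacting pairs supporting a candidate motif pair. It is not D-MOTIF or D-STAR, which search for such pairs; the function names, the scoring by simple counting, and the toy sequences are assumptions for illustration.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs(motif, sequence, d):
    """(l, d) occurrence: some window of len(motif) is within d mismatches."""
    l = len(motif)
    return any(
        hamming(motif, sequence[i:i + l]) <= d
        for i in range(len(sequence) - l + 1)
    )


def pair_support(motif_a, motif_b, d, sequences, interactions):
    """Count interacting pairs where one partner carries motif_a and the other motif_b."""
    support = 0
    for p, q in interactions:
        sp, sq = sequences[p], sequences[q]
        forward = occurs(motif_a, sp, d) and occurs(motif_b, sq, d)
        reverse = occurs(motif_a, sq, d) and occurs(motif_b, sp, d)
        if forward or reverse:
            support += 1
    return support


# Toy data (hypothetical sequences and interactions).
sequences = {
    "P1": "AAPPLPAA",
    "P2": "GGWYDGG",
    "P3": "AAPALPAA",
    "P4": "KKKKKKK",
}
interactions = [("P1", "P2"), ("P3", "P2"), ("P1", "P4")]
support_d1 = pair_support("PPLP", "WYD", 1, sequences, interactions)
support_d0 = pair_support("PPLP", "WYD", 0, sequences, interactions)
```

Allowing one mismatch lets the degenerate instance PALP in P3 count as an occurrence of PPLP, so two interactions support the motif pair; with exact matching only the P1-P2 interaction does.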


Subject(s)
Computational Biology/methods , Protein Interaction Mapping/methods , Algorithms , Amino Acid Motifs , Databases, Protein , Models, Biological , Models, Molecular , Models, Statistical , Programming Languages , Protein Structure, Tertiary , Proteins/chemistry , Sensitivity and Specificity , Sequence Analysis, Protein/methods , Signal Transduction , Software
14.
Genome Inform ; 17(2): 284-97, 2006.
Article in English | MEDLINE | ID: mdl-17514831

ABSTRACT

High-throughput experimental methods, such as yeast-two-hybrid and phage display, have fairly high levels of false positives (and false negatives). Thus the list of protein-protein interactions detected by such experiments would need additional wet laboratory validation. It would be useful if the list could be prioritized in some way. Advances in computational techniques for assessing the reliability of protein-protein interactions detected by such high-throughput methods are reviewed in this paper, with a focus on techniques that rely only on topological information of the protein interaction network derived from such high-throughput experiments. In particular, we discuss indices that are abstract mathematical characterizations of networks of reliable protein-protein interactions--e.g., "interaction generality" (IG), "interaction reliability by alternative pathways" (IRAP), and "functional similarity weighting" (FSWeight). We also present indices that are based on explicit motifs associated with true-positive protein interactions--e.g., "new interaction generality" (IG2) and "meso-scale motifs" (NeMoFinder).


Subject(s)
Promoter Regions, Genetic , Protein Interaction Mapping/methods , Proteins/metabolism , Transcription Initiation Site , Transcription, Genetic , Alternative Splicing , Amino Acid Motifs , Amino Acid Sequence , Animals , Computational Biology/methods , DNA, Complementary , Gene Expression , Humans , Mice , Protein Binding , Proteins/chemistry , Proteins/classification , Proteins/genetics , Reproducibility of Results , Sequence Analysis, Protein
15.
Nucleic Acids Res ; 32(Web Server issue): W73-5, 2004 Jul 01.
Article in English | MEDLINE | ID: mdl-15215354

ABSTRACT

InterWeaver is a web server for discovering potential protein interactions with online evidence automatically extracted from protein interaction databases, literature abstracts, domain fusion events and domain interactions. Given a new protein sequence, the server identifies potential interaction partners using two approaches. In the homology-based approach, the system performs sequence homology searches to find similar proteins in other species, and then searches the protein interaction databases and the biomedical literature for interaction partners. In the domain-based approach, the system detects the domains in the input protein sequence and searches databases of domain fusion events and putative domain interactions to suggest potential interacting partners. The results are compiled into a personalized and downloadable interaction report to aid biologists in their discovery of protein interactions. InterWeaver is freely available for academic users at http://interweaver.i2r.a-star.edu.sg/.


Subject(s)
Protein Interaction Mapping , Sequence Analysis, Protein , Software , Computational Biology , Internet , Protein Structure, Tertiary , Proteins/metabolism , Sequence Homology, Amino Acid
16.
Nucleic Acids Res ; 32(Web Server issue): W69-72, 2004 Jul 01.
Article in English | MEDLINE | ID: mdl-15215353

ABSTRACT

ADVICE (Automated Detection and Validation of Interaction by Co-Evolution) is a web tool for predicting and validating protein-protein interactions using the observed co-evolution between interacting proteins. Interacting proteins are known to share similar evolutionary histories since they undergo coordinated evolutionary changes to preserve interactions and functionalities. The web tool automates a commonly adopted methodology to quantify the similarities in proteins' evolutionary histories for postulating potential protein-protein interactions. ADVICE can also be used to validate experimental data against spurious protein interactions by identifying those that have few similarities in their evolutionary histories. The web tool accepts a list of protein sequences or sequence pairs as input and retrieves orthologous sequences to compute the similarities in the proteins' evolutionary histories. To facilitate hypothesis generation, detected co-evolved proteins can be visualized as a network at the website. ADVICE is available at http://advice.i2r.a-star.edu.sg.


Subject(s)
Evolution, Molecular , Protein Interaction Mapping , Sequence Analysis, Protein , Software , Internet , Proteins/genetics , Proteins/metabolism , Reproducibility of Results , Sequence Homology, Amino Acid , User-Computer Interface
17.
Nucleic Acids Res ; 31(1): 251-4, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12519994

ABSTRACT

Advances in proteomics technology have enabled new proteins to be discovered at an unprecedented speed, and high-throughput experimental methods have been developed to detect protein interactions and complexes en masse. Such a bottom-up, data-driven approach has resulted in data that may be uninformative or potentially erroneous, requiring further validation and annotation. The InterDom database focuses on providing supporting evidence for the detected protein interactions based on putative protein domain interactions. Using an integrative approach, InterDom derives potential domain interactions by combining data from multiple sources, ranging from domain fusions, protein interactions and complexes, to scientific literature. The InterDom database is available at http://InterDom.lit.org.sg.


Subject(s)
Databases, Protein , Protein Structure, Tertiary , Proteins/chemistry , Proteins/metabolism , Internet , Macromolecular Substances , Proteins/genetics , Recombinant Fusion Proteins/chemistry , Recombinant Fusion Proteins/metabolism , Reproducibility of Results , Two-Hybrid System Techniques
18.
Bioinformatics ; 19 Suppl 2: ii93-102, 2003 Oct.
Article in English | MEDLINE | ID: mdl-14534178

ABSTRACT

METHODS AND RESULTS: We introduce a new method to discover many diversified and significant rules from high dimensional profiling data. We also propose to aggregate the discriminating power of these rules for reliable predictions. The discovered rules are found to contain low-ranked features; these features are found to be sometimes necessary for classifiers to achieve perfect accuracy. The use of low-ranked but essential features in our method is in contrast to the prevailing use of an ad-hoc number of only top-ranked features. On a wide range of data sets, our method displayed highly competitive accuracy compared to the best performance of other kinds of classification models. In addition to accuracy, our method also provides comprehensible rules to help elucidate the translation between raw data and useful knowledge.


Subject(s)
Algorithms , Biomarkers, Tumor/analysis , Diagnosis, Computer-Assisted/methods , Gene Expression Profiling/methods , Neoplasm Proteins/analysis , Neoplasms/diagnosis , Neoplasms/metabolism , Humans , Reproducibility of Results , Sensitivity and Specificity
20.
Genome Inform ; 16(2): 260-9, 2005.
Article in English | MEDLINE | ID: mdl-16901108

ABSTRACT

While recent technological advances have made available large datasets of experimentally detected pairwise protein-protein interactions, there is still a lack of experimentally determined protein complex data. To make up for this lack of protein complex data, we explore the mining of existing protein interaction graphs for protein complexes. This paper proposes a novel graph mining algorithm to detect the dense neighborhoods (highly connected regions) in an interaction graph which may correspond to protein complexes. Our algorithm first locates local cliques for each graph vertex (protein) and then merges the detected local cliques according to their affinity to form maximal dense regions. We present experimental results with yeast protein interaction data to demonstrate the effectiveness of our proposed method. Compared with other existing techniques, our predicted complexes match or overlap significantly better with the known protein complexes in the MIPS benchmark database. Novel protein complexes were also predicted to help biologists in their search for new protein complexes.


Subject(s)
Algorithms , Multiprotein Complexes/physiology , Protein Interaction Mapping/statistics & numerical data , Multiprotein Complexes/chemistry , Predictive Value of Tests , Protein Interaction Mapping/methods , Saccharomyces cerevisiae Proteins/chemistry