Search | BVS Enfermagem Research Portal

1.

A machine learning approach for the identification of key markers involved in brain development from single-cell transcriptomic data.

Hu, Yongli; Hase, Takeshi; Li, Hui Peng; Prabhakar, Shyam; Kitano, Hiroaki; Ng, See Kiong; Ghosh, Samik; Wee, Lawrence Jin Kiat.

BMC Genomics ; 17(Suppl 13): 1025, 2016 Dec 22.

Article in English | MEDLINE | ID: mdl-28155657

ABSTRACT

BACKGROUND: The ability to sequence the transcriptomes of single cells using single-cell RNA-seq technologies represents a shift in the scientific paradigm: scientists can now investigate the complex biology of a heterogeneous population of cells one cell at a time. To date, however, there has been no suitable computational methodology for analysing such an intricate deluge of data, in particular techniques that aid the identification of the transcriptomic profile differences between cellular subtypes. In this paper, we describe a novel methodology for the analysis of single-cell RNA-seq data, obtained from neocortical cells and neural progenitor cells, using machine learning algorithms (Support Vector Machine (SVM) and Random Forest (RF)). RESULTS: Thirty-eight key transcripts were identified, using the SVM-based recursive feature elimination (SVM-RFE) method of feature selection, to best differentiate developing neocortical cells from neural progenitor cells in the SVM and RF classifiers built. These genes also possessed higher discriminative power (enhanced prediction accuracy) than those selected by commonly used statistical techniques or geneset-based approaches. Downstream network reconstruction analysis was carried out to unravel hidden general regulatory networks, in which novel interactions could be further validated by wet-lab experimentation and could be useful candidates to target in the treatment of neuronal developmental diseases. CONCLUSION: The novel approach reported here is able to identify transcripts, with reported neuronal involvement, that optimally differentiate neocortical cells from neural progenitor cells. It is believed to be extensible and applicable to other single-cell RNA-seq expression profiles, such as those from studies of cancer progression and treatment within a highly heterogeneous tumour.
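
The recursive feature elimination idea named in this abstract can be illustrated with a small, hedged sketch. It is not the authors' pipeline: the linear SVM below is a deliberately simple subgradient-descent trainer written for illustration, and the function names (`train_linear_svm`, `svm_rfe`), hyperparameters and toy data are all assumptions. The only element taken from the SVM-RFE technique itself is the loop that retrains the classifier and drops the feature with the smallest squared weight.

```python
import random


def train_linear_svm(X, y, epochs=200, lr=0.01, lam=0.01, seed=0):
    """Fit a linear SVM by subgradient descent on the regularised hinge loss.

    X is a list of feature vectors, y a list of labels in {-1, +1}.
    Hyperparameters are illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            violated = margin < 1  # sample is inside the margin or misclassified
            for j in range(n_features):
                grad = lam * w[j] - (y[i] * X[i][j] if violated else 0.0)
                w[j] -= lr * grad
            if violated:
                b += lr * y[i]
    return w, b


def svm_rfe(X, y, n_keep):
    """Return indices of the n_keep features that survive elimination."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        # Ranking criterion: the feature with the smallest squared weight goes.
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        del active[worst]
    return active


# Toy data: feature 0 separates the classes, features 1 and 2 are noise.
X_toy = [[2.0, 0.1, -0.2], [1.8, -0.1, 0.1], [2.2, 0.0, 0.2],
         [-2.0, 0.1, -0.1], [-1.9, -0.2, 0.2], [-2.1, 0.2, 0.0]]
y_toy = [1, 1, 1, -1, -1, -1]
selected = svm_rfe(X_toy, y_toy, n_keep=1)
```

In practice one would use an established SVM implementation and eliminate features in batches; the sketch removes one feature per round only to keep the logic visible.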

Subjects

Brain/metabolism, Gene Expression Profiling, Machine Learning, Organogenesis/genetics, Single-Cell Analysis, Transcriptome, Algorithms, Biomarkers, Brain/embryology, Brain/growth & development, Statistical Models, Neurogenesis/genetics, Organ Specificity, Reproducibility of Results, Single-Cell Analysis/methods, Support Vector Machine

2.

PLW: Probabilistic Local Walks for detecting protein complexes from protein interaction networks.

Wong, Daniel; Li, Xiao-Li; Wu, Min; Zheng, Jie; Ng, See-Kiong.

BMC Genomics ; 14 Suppl 5: S15, 2013.

Article in English | MEDLINE | ID: mdl-24564427

ABSTRACT

BACKGROUND: Many biological processes are carried out by proteins interacting with each other in the form of protein complexes. However, large-scale detection of protein complexes has remained constrained by experimental limitations. As such, computational detection of protein complexes by applying clustering algorithms on the abundantly available protein-protein interaction (PPI) networks is an important alternative. However, many current algorithms have overlooked the importance of selecting seeds for expansion into clusters in a way that neither excludes important proteins nor includes many noisy ones, while ensuring a high degree of functional homogeneity amongst the proteins detected for the complexes. RESULTS: We designed a novel method called Probabilistic Local Walks (PLW) which clusters regions in a PPI network with high functional similarity to find protein complex cores with high precision and efficiency in O(|V| log |V| + |E|) time. A seed selection strategy, which prioritises seeds with dense neighbourhoods, was devised. We defined a topological measure, called common neighbour similarity, to estimate the functional similarity of two proteins given the number of their common neighbours. CONCLUSIONS: Our proposed PLW algorithm achieved the highest F-measure (combining recall and precision) when compared to 11 state-of-the-art methods on yeast protein interaction data, with an improvement of 16.7% over the next highest score. Our experiments also demonstrated that our seed selection strategy is able to increase algorithm precision when applied to three previous protein complex mining techniques. AVAILABILITY: The software, datasets and predicted complexes are available at http://wonglkd.github.io/PLW.
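
The two ingredients described in this abstract, a similarity based on common neighbours and a preference for seeds with dense neighbourhoods, can be sketched as follows. This is only an illustration: the abstract does not give the formulas, so the Jaccard-style normalisation in `common_neighbour_score` and the local density in `neighbourhood_density` are assumptions, as are all function names and the toy network.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {protein: set of partners}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbour_score(adj, u, v):
    """Share of neighbours that u and v have in common (assumed normalisation)."""
    nu, nv = adj[u] - {v}, adj[v] - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def neighbourhood_density(adj, node):
    """Fraction of possible edges that exist among the neighbours of node."""
    nbrs = sorted(adj[node])
    if len(nbrs) < 2:
        return 0.0
    present = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return present / (len(nbrs) * (len(nbrs) - 1) / 2)


def rank_seeds(adj):
    """Order candidate seeds from densest to sparsest neighbourhood."""
    return sorted(adj, key=lambda n: (-neighbourhood_density(adj, n), n))


# Toy network: a triangle A-B-C with a tail C-D-E.
toy = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])
seed_order = rank_seeds(toy)
```

Ties are broken alphabetically here purely to make the ordering deterministic.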

Subjects

Computational Biology/methods, Fungal Proteins/analysis, Yeasts/metabolism, Algorithms, Protein Interaction Mapping, Software

3.

Positive-unlabeled learning for disease gene identification.

Yang, Peng; Li, Xiao-Li; Mei, Jian-Ping; Kwoh, Chee-Keong; Ng, See-Kiong.

Bioinformatics ; 28(20): 2640-7, 2012 Oct 15.

Article in English | MEDLINE | ID: mdl-22923290

ABSTRACT

BACKGROUND: Identifying disease genes from the human genome is an important but challenging task in biomedical research. Machine learning methods can be applied to discover new disease genes based on the known ones. Existing machine learning methods typically use the known disease genes as the positive training set P and the unknown genes as the negative training set N (a true non-disease gene set does not exist) to build classifiers to identify new disease genes from the unknown genes. However, such classifiers are actually built from a noisy negative set N, as there can be unknown disease genes in N itself. As a result, the classifiers do not perform as well as they could. RESULT: Instead of treating the unknown genes as negative examples in N, we treat them as an unlabeled set U. We design a novel positive-unlabeled (PU) learning algorithm, PUDI (PU learning for disease gene identification), to build a classifier using P and U. We first partition U into four sets, namely, reliable negative set RN, likely positive set LP, likely negative set LN and weak negative set WN. Weighted support vector machines are then used to build a multi-level classifier based on the four training sets and positive training set P to identify disease genes. Our experimental results demonstrate that our proposed PUDI algorithm significantly outperformed the existing methods. CONCLUSION: The proposed PUDI algorithm is able to identify disease genes more accurately by treating the unknown data more appropriately as unlabeled set U instead of negative set N. Given that many machine learning problems in biomedical research do involve positive and unlabeled data instead of negative data, it is possible that the machine learning methods for these problems can be further improved by adopting PU learning methods, as we have done here for disease gene identification. AVAILABILITY AND IMPLEMENTATION: The executable program and data are available at http://www1.i2r.a-star.edu.sg/~xlli/PUDI/PUDI.html.
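
The step of splitting the unlabeled set U into graded subsets can be pictured with a toy sketch. The abstract does not say how PUDI performs the partition, so everything below is an assumption made for illustration: genes are scored by cosine similarity to the centroid of the positive set, and three hypothetical cut-offs assign them to RN, LN, WN or LP in order of increasing similarity. A weighted classifier would then trust RN most and WN least; that stage is not shown.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def partition_unlabeled(positives, unlabeled, cuts=(0.25, 0.5, 0.75)):
    """Split unlabeled genes into RN, LN, WN and LP by similarity to P.

    positives and unlabeled map gene name -> feature vector.
    The cut-offs are hypothetical and exist only for this sketch.
    """
    centre = centroid(list(positives.values()))
    parts = {"RN": [], "LN": [], "WN": [], "LP": []}
    for gene, vec in unlabeled.items():
        s = cosine(vec, centre)
        if s < cuts[0]:
            parts["RN"].append(gene)   # far from known disease genes
        elif s < cuts[1]:
            parts["LN"].append(gene)
        elif s < cuts[2]:
            parts["WN"].append(gene)
        else:
            parts["LP"].append(gene)   # looks like a known disease gene
    return parts


P_toy = {"g1": [1.0, 0.0], "g2": [1.0, 0.2]}
U_toy = {"u1": [1.0, 0.1], "u2": [0.0, 1.0], "u3": [1.0, 2.0], "u4": [1.0, 3.0]}
parts = partition_unlabeled(P_toy, U_toy)
```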

Subjects

Artificial Intelligence, Disease/genetics, Genes, Algorithms, Humans, Support Vector Machine

4.

D-SLIMMER: domain-SLiM interaction motifs miner for sequence based protein-protein interaction data.

Hugo, Willy; Ng, See-Kiong; Sung, Wing-Kin.

J Proteome Res ; 10(12): 5285-95, 2011 Dec 02.

Article in English | MEDLINE | ID: mdl-22004555

ABSTRACT

Many biologically important protein-protein interactions (PPIs) have been found to be mediated by short linear motifs (SLiMs). These interactions are mediated by the binding of a protein domain, often with a nonlinear interaction interface, to a SLiM. We propose a method called D-SLIMMER to mine for SLiMs in PPI data on the basis of the interaction density between a nonlinear motif (i.e., a protein domain) in one protein and a SLiM in the other protein. Our results on a benchmark of 113 experimentally verified reference SLiMs showed that D-SLIMMER notably outperformed existing methods in discovering domain-SLiM interaction motifs. To illustrate the significance of the SLiMs detected, we highlighted two SLiMs discovered from the PPI data by D-SLIMMER that are variants of the known ELM SLiM, as well as a literature-backed SLiM that is yet to be listed in the reference databases. We also presented a novel SLiM predicted by D-SLIMMER that was strongly supported by the existing biological literature. These examples showed that D-SLIMMER is able to find SLiMs that are biologically relevant.

Subjects

Algorithms, Data Mining/methods, Protein Interaction Domains and Motifs, Protein Interaction Mapping/methods, Software, Amino Acid Motifs, Amino Acid Sequence, Animals, Computational Biology/methods, Protein Databases, Humans, Mice, Molecular Sequence Data, Reproducibility of Results, Sequence Alignment, Protein Sequence Analysis/methods

5.

SLiM on Diet: finding short linear motifs on domain interaction interfaces in Protein Data Bank.

Hugo, Willy; Song, Fushan; Aung, Zeyar; Ng, See-Kiong; Sung, Wing-Kin.

Bioinformatics ; 26(8): 1036-42, 2010 Apr 15.

Article in English | MEDLINE | ID: mdl-20167627

ABSTRACT

MOTIVATION: An important class of protein interactions involves the binding of a protein's domain to a short linear motif (SLiM) on its interacting partner. Extracting such motifs, either experimentally or computationally, is challenging because of their weak binding and high degree of degeneracy. The recent rapid increase in available protein structures provides an excellent opportunity to study SLiMs directly from their 3D structures. RESULTS: Using domain interface extraction (Diet), we characterized 452 distinct SLiMs from the Protein Data Bank (PDB), of which 155 are validated in varying degrees: 40 have literature validation, 54 are supported by at least one domain-peptide structural instance, and another 61 have overrepresentation in high-throughput PPI data. We further observed that the lacklustre coverage of existing computational SLiM detection methods could be due to the common assumption that most SLiMs occur outside globular domain regions. Of the 452 SLiMs that we report, 198 are actually found on domain-domain interfaces; some of them are implicated in autoimmune and neurodegenerative diseases. We suggest that these SLiMs would be useful for designing inhibitors against the pathogenic protein complexes underlying these diseases. Our findings show that 3D structure-based SLiM detection algorithms can provide a more complete coverage of SLiM-mediated protein interactions than current sequence-based approaches.

Subjects

Genomics/methods, Protein Interaction Domains and Motifs, Software, Amino Acid Motifs, Protein Databases, Protein Sequence Analysis/methods

6.

MDPD: an integrated genetic information resource for Parkinson's disease.

Tang, Suisheng; Zhang, Zhuo; Kavitha, Gopalakrishnan; Tan, Eng-King; Ng, See Kiong.

Nucleic Acids Res ; 37(Database issue): D858-62, 2009 Jan.

Article in English | MEDLINE | ID: mdl-18948286

ABSTRACT

Parkinson's disease (PD) is the second most common neurodegenerative disorder affecting millions of people. Both environmental and genetic factors play important roles in its causation and development. Genetic analysis has shown that over 100 genes are correlated with the etiology and pathology of PD. However, accessing genetic information in a consistent and fruitful way is not an easy task. The Mutation Database for Parkinson's Disease (MDPD) is designed to fulfill the need for information integration so that users can easily retrieve, inspect and enhance their knowledge on PD. The database contains 2391 entries on 202 genes extracted from 576 publications and manually examined by biomedical researchers. Each genetic substitution and the resulting impact are clearly labelled and linked to its primary reference. Every reported gene has a summary page that provides information on the variation impact, mutation type, the studied population, mutation position and reference collection. In addition, MDPD provides a unique functionality for users to compare the differences on the type of mutations among ethnic groups. As such, we hope that MDPD will serve as a valuable tool to bridge the gap between genetic analysis and clinical practice. MDPD is publicly accessible at http://datam.i2r.a-star.edu.sg/mdpd/.

Subjects

Genetic Databases, Mutation, Parkinson Disease/genetics, Humans, Single Nucleotide Polymorphism, Systems Integration

7.

Integrating diverse biological and computational sources for reliable protein-protein interactions.

Wu, Min; Li, Xiaoli; Chua, Hon Nian; Kwoh, Chee-Keong; Ng, See-Kiong.

BMC Bioinformatics ; 11 Suppl 7: S8, 2010 Oct 15.

Article in English | MEDLINE | ID: mdl-21106130

ABSTRACT

BACKGROUND: Protein-protein interactions (PPIs) play important roles in various cellular processes. However, the low quality of current PPI data detected from high-throughput screening techniques has diminished the potential usefulness of the data. We need to develop a method to address the high data noise and incompleteness of PPI data, namely, to filter out inaccurate protein interactions (false positives) and predict putative protein interactions (false negatives). RESULTS: In this paper, we propose a novel two-step method to integrate diverse biological and computational sources of supporting evidence for reliable PPIs. The first step, interaction binning or InterBIN, groups PPIs together to more accurately estimate the likelihood (Bin-Confidence score) that the protein pairs interact for each biological or computational evidence source. The second step, interaction classification or InterCLASS, integrates the collected Bin-Confidence scores to build classifiers and identify reliable interactions. CONCLUSIONS: We performed comprehensive experiments on two benchmark yeast PPI datasets. The experimental results showed that our proposed method can effectively eliminate false positives in detected PPIs and identify false negatives by predicting novel yet reliable PPIs. Our proposed method also performed significantly better than using any individual evidence source alone, illustrating the importance of integrating various biological and computational sources of data and evidence.
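
The binning step can be pictured with a short sketch. It assumes, hypothetically, that each protein pair carries a raw score from one evidence source and a 0/1 label from some gold-standard set, that bins are of equal width, and that the confidence of a bin is simply the fraction of labelled-true pairs inside it. None of these choices is stated in the abstract, and the function name `bin_confidence` is made up for this example.

```python
def bin_confidence(scores, labels, n_bins=4):
    """Replace each raw evidence score by the confidence of its bin.

    scores: raw values from one evidence source, one per protein pair.
    labels: 1 if the pair is a known interaction, else 0 (assumed gold standard).
    Returns one bin-level confidence per input pair, in the same order.
    """
    lo, hi = min(scores), max(scores)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width when all scores are equal

    def index(s):
        # The maximum score falls into the last bin rather than a new one.
        return min(int((s - lo) / width), n_bins - 1)

    counts = [[0, 0] for _ in range(n_bins)]  # [known interactions, total pairs]
    for s, lab in zip(scores, labels):
        k = index(s)
        counts[k][0] += lab
        counts[k][1] += 1
    conf = [pos / total if total else 0.0 for pos, total in counts]
    return [conf[index(s)] for s in scores]


raw = [0.0, 0.1, 0.5, 0.6, 0.9, 1.0]
known = [0, 0, 0, 1, 1, 1]
binned = bin_confidence(raw, known, n_bins=2)
```

Pooling pairs in this way gives a smoother estimate than treating each raw score on its own; the per-source confidences could then serve as features for a classifier, which is the role the abstract assigns to the second step.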

Subjects

Computational Biology/methods, Saccharomyces cerevisiae Proteins/metabolism, Protein Interaction Mapping/methods, Reproducibility of Results, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism, Saccharomyces cerevisiae Proteins/genetics, Software

8.

Computational approaches for detecting protein complexes from protein interaction networks: a survey.

Li, Xiaoli; Wu, Min; Kwoh, Chee-Keong; Ng, See-Kiong.

BMC Genomics ; 11 Suppl 1: S3, 2010 Feb 10.

Article in English | MEDLINE | ID: mdl-20158874

ABSTRACT

BACKGROUND: Most proteins form macromolecular complexes to perform their biological functions. However, experimentally determined protein complex data, especially for complexes involving more than two protein partners, remain relatively limited with current state-of-the-art high-throughput experimental techniques. Nevertheless, many techniques (such as yeast-two-hybrid) have enabled systematic screening of pairwise protein-protein interactions en masse. Thus computational approaches for detecting protein complexes from protein interaction data are useful complements to the limited experimental methods. They can be used together with the experimental methods for mapping the interactions of proteins to understand how different proteins are organized into higher-level substructures to perform various cellular functions. RESULTS: Given the abundance of pairwise protein interaction data from high-throughput genome-wide experimental screenings, a protein interaction network can be constructed from protein interaction data by considering individual proteins as the nodes, and the existence of a physical interaction between a pair of proteins as a link. This binary protein interaction graph can then be used for detecting protein complexes using graph clustering techniques. In this paper, we review and evaluate the state-of-the-art techniques for computational detection of protein complexes, and discuss some promising research directions in this field. CONCLUSIONS: Experimental results with yeast protein interaction data show that the interaction subgraphs discovered by various computational methods matched well with actual protein complexes. In addition, the computational approaches have also improved in performance over the years. Further improvements could be achieved if the quality of the underlying protein interaction data can be considered adequately to minimize the undesirable effects from the irrelevant and noisy sources, and the various kinds of biological evidence can be better incorporated into the detection process to maximize the exploitation of the increasing wealth of biological knowledge available.
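
How well predicted subgraphs "match" reference complexes is usually quantified with an overlap score plus precision, recall and F-measure. The sketch below shows one such scheme under stated assumptions: the squared-overlap score and the 0.2 matching threshold are conventions often seen in this literature, not details given in the abstract, and the function names are invented for the example.

```python
def overlap_score(predicted, reference):
    """Squared shared-member count, normalised by the two set sizes."""
    shared = len(predicted & reference)
    return shared * shared / (len(predicted) * len(reference))


def precision_recall_f(predicted_sets, reference_sets, threshold=0.2):
    """Score a list of predicted complexes against reference complexes.

    A predicted complex counts as correct if it matches any reference
    complex at or above the threshold, and symmetrically for recall.
    """
    matched_pred = sum(
        any(overlap_score(p, r) >= threshold for r in reference_sets)
        for p in predicted_sets
    )
    matched_ref = sum(
        any(overlap_score(p, r) >= threshold for p in predicted_sets)
        for r in reference_sets
    )
    precision = matched_pred / len(predicted_sets)
    recall = matched_ref / len(reference_sets)
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


predicted = [{"A", "B", "C"}, {"X", "Y"}]
reference = [{"A", "B", "C", "D"}, {"M", "N"}]
scores = precision_recall_f(predicted, reference)
```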

Subjects

Proteins/analysis, Systems Biology/methods, Biometry, Gene Expression Profiling, Humans, Protein Binding, Protein Interaction Mapping, Proteins/metabolism

9.

A core-attachment based method to detect protein complexes in PPI networks.

Wu, Min; Li, Xiaoli; Kwoh, Chee-Keong; Ng, See-Kiong.

BMC Bioinformatics ; 10: 169, 2009 Jun 02.

Article in English | MEDLINE | ID: mdl-19486541

ABSTRACT

BACKGROUND: Detecting protein complexes is an important and challenging task in the post-genomic era. As an increasing amount of protein-protein interaction (PPI) data becomes available, we are able to identify protein complexes from PPI networks. However, most current studies detect protein complexes based solely on the observation that dense regions in PPI networks may correspond to protein complexes, and fail to consider the inherent organization within protein complexes. RESULTS: To provide insights into the organization of protein complexes, this paper presents a novel core-attachment based method (COACH) which detects protein complexes in two stages. It first detects protein-complex cores as the "hearts" of protein complexes and then includes attachments into these cores to form biologically meaningful structures. We evaluate and analyze our predicted protein complexes from two aspects. First, we perform a comprehensive comparison between our proposed method and existing techniques by comparing the predicted complexes against benchmark complexes. Second, we also validate the core-attachment structures using various biological evidence and knowledge. CONCLUSION: Our proposed COACH method has been applied on two different yeast PPI networks and the experimental results show that COACH performs significantly better than the state-of-the-art techniques. In addition, the identified complexes with core-attachment structures are demonstrated to match very well with existing biological knowledge and thus provide more insights for future biological study.
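
The second stage, adding attachments to an already detected core, can be sketched briefly. The rule used here, that a neighbouring protein joins the complex if it interacts with more than half of the core members, is an assumption made for illustration; the abstract does not state the criterion. Core detection itself is not shown, and the core is simply passed in.

```python
def attach_to_core(adj, core, min_fraction=0.5):
    """Extend a core with neighbours that touch enough of its members.

    adj: adjacency map {protein: set of partners}.
    core: iterable of proteins forming the core (assumed to be given).
    min_fraction: assumed cut-off on the fraction of core members contacted.
    """
    core = set(core)
    candidates = set().union(*(adj[p] for p in core)) - core
    attachments = {
        v for v in candidates
        if len(adj[v] & core) / len(core) > min_fraction
    }
    return core | attachments


toy_adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "E"},
    "D": {"A", "B"},
    "E": {"C"},
}
complex_members = attach_to_core(toy_adj, {"A", "B", "C"})
```

In the toy network, D contacts two of the three core proteins and is attached, while E contacts only one and is left out.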

Subjects

Multiprotein Complexes, Protein Interaction Mapping/methods, Proteins, Software, Algorithms, Statistical Data Interpretation, Protein Databases, Multiprotein Complexes/chemistry, Multiprotein Complexes/metabolism, Protein Interaction Domains and Motifs, Proteins/chemistry, Proteins/metabolism, Reproducibility of Results

10.

PPiClust: efficient clustering of 3D protein-protein interaction interfaces.

Aung, Zeyar; Tan, Soon-Heng; Ng, See-Kiong; Tan, Kian-Lee.

J Bioinform Comput Biol ; 6(3): 415-33, 2008 Jun.

Article in English | MEDLINE | ID: mdl-18574857

ABSTRACT

The biological mechanisms through which proteins interact with one another are best revealed by studying the structural interfaces between interacting proteins. Protein-protein interfaces can be extracted from three-dimensional (3D) structural data of protein complexes and then clustered to derive biological insights. However, conventional protein interface clustering methods lack computational scalability and statistical support. In this work, we present a new method named "PPiClust" to systematically encode, cluster, and analyze similar 3D interface patterns in protein complexes efficiently. Experimental results showed that our method is effective in discovering visually consistent and statistically significant clusters of interfaces, and at the same time sufficiently time-efficient to be performed on a single computer. The interface clusters are also useful for uncovering the structural basis of protein interactions. Analysis of the resulting interface clusters revealed groups of structurally diverse proteins having similar interface patterns. We also found, in some of the interface clusters, the presence of well-known linear binding motifs which were noncontiguous in the primary sequences. These results suggest that PPiClust can discover not only statistically significant, but also biologically significant, protein interface clusters from protein complex structural data.

Subjects

Protein Conformation, Cluster Analysis, Protein Binding/physiology, Proteins/chemistry, Structure-Activity Relationship

11.

Increasing confidence of protein interactomes using network topological metrics.

Chen, Jin; Hsu, Wynne; Lee, Mong Li; Ng, See-Kiong.

Bioinformatics ; 22(16): 1998-2004, 2006 Aug 15.

Article in English | MEDLINE | ID: mdl-16787971

ABSTRACT

MOTIVATION: Experimental limitations in high-throughput protein-protein interaction detection methods have resulted in low-quality interaction datasets that contain sizable fractions of false positives and false negatives. Small-scale, focused experiments are then needed to complement the high-throughput methods to extract true protein interactions. However, the naturally vast interactomes would require much more scalable approaches. RESULTS: We describe a novel method called IRAP* as a computational complement for repurification of the highly erroneous experimentally derived protein interactomes. Our method involves an iterative process of removing interactions that are confidently identified as false positives and adding interactions detected as false negatives into the interactomes. Identification of both false positives and false negatives is performed in IRAP* using interaction confidence measures based on network topological metrics. Potential false positives are identified amongst the detected interactions as those with very low computed confidence values, while potential false negatives are discovered as the undetected interactions with high computed confidence values. Our results from applying IRAP* on large-scale interaction datasets generated by the popular yeast-two-hybrid assays for yeast, fruit fly and worm showed that the computationally repurified interaction datasets contained potentially lower fractions of false positive and false negative errors based on functional homogeneity. AVAILABILITY: The confidence indices for PPIs in yeast, fruit fly and worm as computed by our method can be found at our website http://www.comp.nus.edu.sg/~chenjin/fpfn.
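
The iterative remove-and-add loop described in this abstract can be shown schematically. The confidence measure below is a stand-in chosen for brevity (the shared-neighbour fraction of the two proteins); it is not the IRAP* measure, whose definition the abstract does not give. The thresholds `low` and `high`, the round limit and the function names are likewise assumptions.

```python
from itertools import combinations


def confidence(adj, u, v):
    """Stand-in topological confidence: shared-neighbour fraction of u and v."""
    nu, nv = adj[u] - {v}, adj[v] - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def repurify(edges, low=0.1, high=0.6, max_rounds=10):
    """Iteratively drop low-confidence edges and add high-confidence non-edges."""
    edges = {frozenset(e) for e in edges}
    nodes = sorted(set().union(*edges))
    for _ in range(max_rounds):
        adj = {n: set() for n in nodes}
        for e in edges:
            u, v = tuple(e)
            adj[u].add(v)
            adj[v].add(u)
        # Detected interactions with very low confidence: likely false positives.
        remove = {e for e in edges if confidence(adj, *tuple(e)) < low}
        # Undetected pairs with high confidence: likely false negatives.
        add = {
            frozenset((u, v))
            for u, v in combinations(nodes, 2)
            if frozenset((u, v)) not in edges and confidence(adj, u, v) > high
        }
        if not remove and not add:
            break  # the network has stabilised
        edges = (edges - remove) | add
    return edges


noisy = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("A", "E")]
cleaned = repurify(noisy)
```

On the toy input, the isolated edge A-E is removed and the missing edge C-D is added, after which a second pass finds nothing further to change.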

Subjects

Computational Biology/methods, Protein Interaction Mapping, Proteomics/methods, Animals, Caenorhabditis elegans, Computer Simulation, Protein Databases, Drosophila, False Positive Reactions, Protein Binding, Saccharomyces cerevisiae/metabolism, Protein Sequence Analysis, Software, Two-Hybrid System Techniques

12.

Systematic gene function prediction from gene expression data by using a fuzzy nearest-cluster method.

Li, Xiao-Li; Tan, Yin-Chet; Ng, See-Kiong.

BMC Bioinformatics ; 7 Suppl 4: S23, 2006 Dec 12.

Article in English | MEDLINE | ID: mdl-17217516

ABSTRACT

BACKGROUND: Quantitative simultaneous monitoring of the expression levels of thousands of genes under various experimental conditions is now possible using microarray experiments. However, there are still gaps toward whole-genome functional annotation of genes using the gene expression data. RESULTS: In this paper, we propose a novel technique called Fuzzy Nearest Clusters for genome-wide functional annotation of unclassified genes. The technique consists of two steps: an initial hierarchical clustering step to detect homogeneous co-expressed gene subgroups or clusters in each possibly heterogeneous functional class, followed by a classification step to predict the functional roles of the unclassified genes based on their corresponding similarities to the detected functional clusters. CONCLUSION: Our experimental results with yeast gene expression data showed that the proposed method can accurately predict the genes' functions, even for genes with multiple functional roles, and that its prediction performance is the most independent of the underlying heterogeneity of the complex functional classes when compared with other conventional gene function prediction approaches.
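
The classification step can be illustrated with a small sketch. It assumes the clustering step has already produced labelled cluster centroids, and it borrows the inverse-distance weighting familiar from fuzzy nearest-neighbour classifiers; the exact membership formula of the paper is not given in the abstract, so the weighting, the fuzzifier `m`, the reporting threshold and the function names are all assumptions.

```python
import math


def fuzzy_memberships(profile, clusters, m=2.0):
    """Fuzzy membership of an expression profile in each functional class.

    clusters: list of (function_label, centroid) pairs; a class may own
    several clusters, and only its nearest cluster contributes.
    """
    weights = {}
    for label, centre in clusters:
        d = max(math.dist(profile, centre), 1e-9)  # guard against zero distance
        w = 1.0 / d ** (2.0 / (m - 1.0))
        weights[label] = max(weights.get(label, 0.0), w)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def predict_functions(profile, clusters, threshold=0.25):
    """Report every class whose membership reaches the (assumed) threshold."""
    memberships = fuzzy_memberships(profile, clusters)
    return sorted(label for label, mu in memberships.items() if mu >= threshold)


toy_clusters = [
    ("ribosome", [1.0, 0.0]),
    ("ribosome", [0.8, 0.2]),
    ("stress", [0.0, 1.0]),
    ("transport", [5.0, 5.0]),
]
predicted_roles = predict_functions([0.5, 0.5], toy_clusters)
```

Because memberships are graded rather than exclusive, a gene lying between two clusters can be assigned more than one function, which is how a scheme of this kind accommodates genes with multiple roles.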

Subjects

Cluster Analysis, Fuzzy Logic, Gene Expression Profiling/methods, Gene Expression/physiology, Oligonucleotide Array Sequence Analysis/methods, Proteome/metabolism, Signal Transduction/physiology, Algorithms, Artificial Intelligence, Computer Simulation, Biological Models, Proteome/genetics

13.

A correlated motif approach for finding short linear motifs from protein interaction networks.

Tan, Soon-Heng; Hugo, Willy; Sung, Wing-Kin; Ng, See-Kiong.

BMC Bioinformatics ; 7: 502, 2006 Nov 16.

Article in English | MEDLINE | ID: mdl-17107624

ABSTRACT

BACKGROUND: An important class of interaction switches for biological circuits and disease pathways are short binding motifs. However, the biological experiments to find these binding motifs are often laborious and expensive. With the availability of protein interaction data, novel binding motifs can be discovered computationally by applying standard motif extraction algorithms to sets of protein sequences that each interact with either a common protein or a protein group with similar properties. The underlying assumption is that proteins with common interacting partners will share some common binding motifs. Although novel binding motifs have been discovered with such approaches, they are not applicable if a protein interacts with very few other proteins or when prior knowledge of protein groups is unavailable or erroneous. Experimental noise in the input interaction data can further degrade the already poor performance of such approaches. RESULTS: We propose a novel approach of finding correlated short sequence motifs from protein-protein interaction data to effectively circumvent the above-mentioned limitations. Correlated motifs are those motifs that consistently co-occur only in pairs of interacting protein sequences, and could possibly interact with each other directly or indirectly to mediate interactions. We adopt the (l, d)-motif model and formulate finding the correlated motifs as an (l, d)-motif pair finding problem. We present both an exact algorithm, D-MOTIF, and an approximation algorithm, D-STAR, to solve this problem. Evaluation on extensive simulated data showed that our approach not only eliminated the need for any prior protein grouping, but is also more robust in extracting motifs from noisy interaction data. Application on two biological datasets (SH3 interaction network and TGFbeta signaling network) demonstrates that the approach can extract correlated motifs that correspond to actual interacting subsequences. CONCLUSION: The correlated motif approach outlined in this paper is able to find correlated linear motifs from sparse and noisy interaction data. This, in turn, will expedite the discovery of novel linear binding motifs, and facilitate the studies of biological pathways mediated by them.
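
In the (l, d)-motif model, a motif of length l is said to occur in a sequence if some window of length l differs from it in at most d positions. The sketch below checks occurrences under that definition and counts how many interacting pairs carry a given motif pair, one motif on each partner. It only scores a candidate pair; it is not D-MOTIF or D-STAR, which search for such pairs, and the function names and toy sequences are made up for the example.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs(motif, sequence, d):
    """True if some window of the sequence is within d mismatches of the motif."""
    l = len(motif)
    return any(
        hamming(motif, sequence[i:i + l]) <= d
        for i in range(len(sequence) - l + 1)
    )


def pair_support(motif_a, motif_b, interactions, sequences, d):
    """Count interacting pairs with motif_a on one partner and motif_b on the other."""
    count = 0
    for p, q in interactions:
        sp, sq = sequences[p], sequences[q]
        forward = occurs(motif_a, sp, d) and occurs(motif_b, sq, d)
        reverse = occurs(motif_a, sq, d) and occurs(motif_b, sp, d)
        if forward or reverse:
            count += 1
    return count


toy_sequences = {
    "P1": "MKPPLPPRA",
    "P2": "GGALYDYAA",
    "P3": "AAPPVPPRG",
    "P4": "TTTTTTTT",
}
toy_interactions = [("P1", "P2"), ("P3", "P2"), ("P1", "P4")]
support_d1 = pair_support("PPLPPR", "ALYDY", toy_interactions, toy_sequences, d=1)
support_d0 = pair_support("PPLPPR", "ALYDY", toy_interactions, toy_sequences, d=0)
```

Allowing one mismatch lets the degenerate instance in P3 count towards the pair's support, which is the purpose of the d parameter.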

Subjects

Computational Biology/methods, Protein Interaction Mapping/methods, Algorithms, Amino Acid Motifs, Protein Databases, Biological Models, Molecular Models, Statistical Models, Programming Languages, Tertiary Protein Structure, Proteins/chemistry, Sensitivity and Specificity, Protein Sequence Analysis/methods, Signal Transduction, Software

14.

Increasing confidence of protein-protein interactomes.

Chen, Jin; Chua, Hon Nian; Hsu, Wynne; Lee, Mong-Li; Ng, See-Kiong; Saito, Rintaro; Sung, Wing-Kin; Wong, Limsoon.

Genome Inform ; 17(2): 284-97, 2006.

Article in English | MEDLINE | ID: mdl-17514831

ABSTRACT

High-throughput experimental methods, such as yeast-two-hybrid and phage display, have fairly high levels of false positives (and false negatives). Thus the list of protein-protein interactions detected by such experiments would need additional wet laboratory validation. It would be useful if the list could be prioritized in some way. Advances in computational techniques for assessing the reliability of protein-protein interactions detected by such high-throughput methods are reviewed in this paper, with a focus on techniques that rely only on topological information of the protein interaction network derived from such high-throughput experiments. In particular, we discuss indices that are abstract mathematical characterizations of networks of reliable protein-protein interactions, e.g., "interaction generality" (IG), "interaction reliability by alternative pathways" (IRAP), and "functional similarity weighting" (FSWeight). We also present indices that are based on explicit motifs associated with true-positive protein interactions, e.g., "new interaction generality" (IG2) and "meso-scale motifs" (NeMoFinder).

Subjects

Genetic Promoter Regions, Protein Interaction Mapping/methods, Proteins/metabolism, Transcription Initiation Site, Genetic Transcription, Alternative Splicing, Amino Acid Motifs, Amino Acid Sequence, Animals, Computational Biology/methods, Complementary DNA, Gene Expression, Humans, Mice, Protein Binding, Proteins/chemistry, Proteins/classification, Proteins/genetics, Reproducibility of Results, Protein Sequence Analysis

15.

InterWeaver: interaction reports for discovering potential protein interaction partners with online evidence.

Zhang, Zhuo; Ng, See-Kiong.

Nucleic Acids Res ; 32(Web Server issue): W73-5, 2004 Jul 01.

Article in English | MEDLINE | ID: mdl-15215354

ABSTRACT

InterWeaver is a web server for discovering potential protein interactions with online evidence automatically extracted from protein interaction databases, literature abstracts, domain fusion events and domain interactions. Given a new protein sequence, the server identifies potential interaction partners using two approaches. In the homology-based approach, the system performs sequence homology searches to find similar proteins in other species, and then searches the protein interaction databases and the biomedical literature for interaction partners. In the domain-based approach, the system detects the domains in the input protein sequence and searches databases of domain fusion events and putative domain interactions to suggest potential interacting partners. The results are compiled into a personalized and downloadable interaction report to aid biologists in their discovery of protein interactions. InterWeaver is freely available for academic users at http://interweaver.i2r.a-star.edu.sg/.

Subjects

Protein Interaction Mapping, Protein Sequence Analysis, Software, Computational Biology, Internet, Tertiary Protein Structure, Proteins/metabolism, Amino Acid Sequence Homology

16.

ADVICE: Automated Detection and Validation of Interaction by Co-Evolution.

Tan, Soon-Heng; Zhang, Zhuo; Ng, See-Kiong.

Nucleic Acids Res ; 32(Web Server issue): W69-72, 2004 Jul 01.

Article in English | MEDLINE | ID: mdl-15215353

ABSTRACT

ADVICE (Automated Detection and Validation of Interaction by Co-Evolution) is a web tool for predicting and validating protein-protein interactions using the observed co-evolution between interacting proteins. Interacting proteins are known to share similar evolutionary histories since they undergo coordinated evolutionary changes to preserve interactions and functionalities. The web tool automates a commonly adopted methodology to quantify the similarities in proteins' evolutionary histories for postulating potential protein-protein interactions. ADVICE can also be used to validate experimental data against spurious protein interactions by identifying those that have few similarities in their evolutionary histories. The web tool accepts a list of protein sequences or sequence pairs as input and retrieves orthologous sequences to compute the similarities in the proteins' evolutionary histories. To facilitate hypothesis generation, detected co-evolved proteins can be visualized as a network at the website. ADVICE is available at http://advice.i2r.a-star.edu.sg.

Subjects

Molecular Evolution, Protein Interaction Mapping, Protein Sequence Analysis, Software, Internet, Proteins/genetics, Proteins/metabolism, Reproducibility of Results, Amino Acid Sequence Homology, User-Computer Interface
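
The ADVICE abstract above does not name the similarity measure it automates. As a minimal sketch, assuming a mirror-tree style score (the Pearson correlation between two proteins' pairwise evolutionary distance matrices, with rows and columns in the same species order), the idea could look as follows. The function names and the toy matrices are hypothetical and are not taken from the ADVICE server.

```python
from math import sqrt


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def coevolution_score(dist_a, dist_b):
    """Assumed score: correlation between two proteins' ortholog distance matrices.

    Both matrices must cover the same species in the same order.
    """
    if len(dist_a) != len(dist_b):
        raise ValueError("distance matrices must cover the same species")
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Hypothetical distances between orthologs of each protein in three species.
protein_a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
protein_b = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]  # same pattern as protein_a, scaled
protein_c = [[0, 3, 2], [3, 0, 1], [2, 1, 0]]  # reversed pattern

score_ab = coevolution_score(protein_a, protein_b)
score_ac = coevolution_score(protein_a, protein_c)
```

Under this assumption, a score near 1 would suggest similar evolutionary histories (a candidate interaction), while a low score would flag a reported interaction as possibly spurious.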

17.

InterDom: a database of putative interacting protein domains for validating predicted protein interactions and complexes.

Ng, See-Kiong; Zhang, Zhuo; Tan, Soon-Heng; Lin, Kui.

Nucleic Acids Res ; 31(1): 251-4, 2003 Jan 01.

Article in English | MEDLINE | ID: mdl-12519994

ABSTRACT

Advances in proteomics technology have enabled new proteins to be discovered at an unprecedented speed, and high throughput experimental methods have been developed to detect protein interactions and complexes en masse. Such a bottom-up, data-driven approach has resulted in data that may be uninformative or potentially erroneous, requiring further validation and annotation. The InterDom database focuses on providing supporting evidence for the detected protein interactions based on putative protein domain interactions. Using an integrative approach, InterDom derives potential domain interactions by combining data from multiple sources, ranging from domain fusions, protein interactions and complexes, to scientific literature. The InterDom database is available at http://InterDom.lit.org.sg.

Subjects

Protein Databases, Tertiary Protein Structure, Proteins/chemistry, Proteins/metabolism, Internet, Macromolecular Substances, Proteins/genetics, Recombinant Fusion Proteins/chemistry, Recombinant Fusion Proteins/metabolism, Reproducibility of Results, Two-Hybrid System Techniques

18.

Discovery of significant rules for classifying cancer diagnosis data.

Li, Jinyan; Liu, Huiqing; Ng, See-Kiong; Wong, Limsoon.

Bioinformatics ; 19 Suppl 2: ii93-102, 2003 Oct.

Article in English | MEDLINE | ID: mdl-14534178

ABSTRACT

METHODS AND RESULTS: We introduce a new method to discover many diversified and significant rules from high dimensional profiling data. We also propose to aggregate the discriminating power of these rules for reliable predictions. The discovered rules are found to contain low-ranked features; these features are found to be sometimes necessary for classifiers to achieve perfect accuracy. The use of low-ranked but essential features in our method is in contrast to the prevailing use of an ad-hoc number of only top-ranked features. On a wide range of data sets, our method displayed highly competitive accuracy compared to the best performance of other kinds of classification models. In addition to accuracy, our method also provides comprehensible rules to help elucidate the translation between raw data and useful knowledge.

Subjects

Algorithms, Tumor Biomarkers/analysis, Computer-Assisted Diagnosis/methods, Gene Expression Profiling/methods, Neoplasm Proteins/analysis, Neoplasms/diagnosis, Neoplasms/metabolism, Humans, Reproducibility of Results, Sensitivity and Specificity

19.

Brief overview of bioinformatics activities in Singapore.

Eisenhaber, Frank; Kwoh, Chee-Keong; Ng, See-Kiong; Sung, Wing-Kin; Wong, Limsoon.

PLoS Comput Biol ; 5(9): e1000508, 2009 Sep.

Article in English | MEDLINE | ID: mdl-19779544

Subjects

Biomedical Research, Computational Biology, Singapore

20.

Interaction graph mining for protein complexes using local clique merging.

Li, Xiao-Li; Tan, Soon-Heng; Foo, Chuan-Sheng; Ng, See-Kiong.

Genome Inform ; 16(2): 260-9, 2005.

Article in English | MEDLINE | ID: mdl-16901108

ABSTRACT

While recent technological advances have made available large datasets of experimentally-detected pairwise protein-protein interactions, there is still a lack of experimentally-determined protein complex data. To make up for this lack of protein complex data, we explore the mining of existing protein interaction graphs for protein complexes. This paper proposes a novel graph mining algorithm to detect the dense neighborhoods (highly connected regions) in an interaction graph which may correspond to protein complexes. Our algorithm first locates local cliques for each graph vertex (protein) and then merges the detected local cliques according to their affinity to form maximal dense regions. We present experimental results with yeast protein interaction data to demonstrate the effectiveness of our proposed method. Compared with other existing techniques, our predicted complexes match or overlap significantly better with the known protein complexes in the MIPS benchmark database. Novel protein complexes were also predicted to help biologists in their search for new protein complexes.

Subjects

Algorithms, Multiprotein Complexes/physiology, Protein Interaction Mapping/statistics & numerical data, Multiprotein Complexes/chemistry, Predictive Value of Tests, Protein Interaction Mapping/methods, Saccharomyces cerevisiae Proteins/chemistry
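
The two steps named in the abstract above (locate a local clique around each vertex, then merge cliques by affinity) can be illustrated with a simplified sketch. It is not the authors' implementation: the greedy pruning rule, the affinity measure |A ∩ B|² / (|A| · |B|), the threshold values and all function names are assumptions made for illustration.

```python
from itertools import combinations


def density(nodes, adj):
    """Fraction of possible edges that are present among `nodes`."""
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    return 2.0 * edges / (n * (n - 1))


def local_clique(v, adj):
    """Greedily shrink v's closed neighborhood until it is fully connected."""
    nodes = {v} | adj[v]
    while len(nodes) > 2 and density(nodes, adj) < 1.0:
        # Drop the neighbor with the fewest links inside the current set.
        candidates = sorted(nodes - {v})
        worst = min(candidates, key=lambda u: len(adj[u] & nodes))
        nodes.remove(worst)
    return frozenset(nodes)


def affinity(a, b):
    """Assumed overlap measure between two vertex sets."""
    return len(a & b) ** 2 / (len(a) * len(b))


def merge_regions(regions, threshold):
    """Repeatedly merge any two regions whose affinity reaches the threshold."""
    regions = set(regions)
    merged = True
    while merged:
        merged = False
        for a, b in combinations(sorted(regions, key=sorted), 2):
            if affinity(a, b) >= threshold:
                regions -= {a, b}
                regions.add(a | b)
                merged = True
                break
    return regions


def find_dense_regions(edges, threshold=0.5):
    """Build an adjacency map, find local cliques, then merge them."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cliques = {local_clique(v, adj) for v in adj}
    return merge_regions(cliques, threshold)


# Hypothetical toy interaction graph: two overlapping triangles plus one isolated pair.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("E", "F")]
regions = find_dense_regions(toy_edges, threshold=0.4)
```

In this toy graph the local cliques {A, B, C} and {B, C, D} share two of three members, so with the assumed threshold of 0.4 they merge into one dense region, while the pair {E, F} stays separate. A stricter threshold keeps the two triangles apart.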
