Results 1 - 20 of 24
1.
Sci Total Environ ; 950: 175385, 2024 Nov 10.
Article in English | MEDLINE | ID: mdl-39122048

ABSTRACT

In silico modelling can accelerate the ecotoxicological assessment of hazardous chemicals without risky in vivo experiments that are subject to ethical regulation. To date, the prevailing strategy of one model per species does not generalize well to multi-species modelling. In this work, we propose a new strategy of one model for multiple species to facilitate knowledge transfer across aquatic species. The available lethal concentration values of 4952 pesticides on 651 fish species are aggregated into one toxicity response matrix. Using this matrix alone, we attempt to unravel fish toxicosis-phylogenesis relationships and pesticide toxicity-structure relationships via clustering techniques, including non-negative matrix factorization (NMF) and hierarchical clustering. The clustering results suggest that (1) close NMF weights indicate close species-toxicosis and pesticide-toxicity profiles; (2) species toxicosis patterns are related to species phylogenetic relationships; and (3) close pesticide-toxicity profiles indicate similar atom-pair structural fingerprints. These environmental, chemical and biological insights can serve as expert knowledge for environmentalists to manually infer knowledge about untested species/pesticides from tested ones, and they also support building in silico models from species phylogenetic and pesticide structural points of view. Besides unravelling the mechanisms behind toxicity response, we adopt stratified cross validation and an external test to validate the reliability of using NMF to predict missing toxicity values. An independent test on external data shows that NMF achieves an R² of 0.8404-0.9397 on four fish species. In the context of toxicity prediction, non-negative matrix factorization can be viewed as a model based on quantitative activity-activity relationships (QAAR), and it provides an alternative approach to inferring toxicity values for untested species from tested species.
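
The idea of using NMF to fill in missing toxicity values can be illustrated with a minimal sketch. It assumes multiplicative updates restricted to the observed entries, which is one common way to fit NMF with missing data and is not necessarily the procedure used in the paper. The function name `masked_nmf`, the toy species and pesticide factors, and the rank-1 setting are all hypothetical.

```python
import random


def masked_nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factorize V ~ W * H with non-negative factors, using only observed cells.

    V is a list of rows; missing values are marked with None.
    Assumption: multiplicative updates computed over observed entries only.
    """
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    observed = [(i, j) for i in range(m) for j in range(n) if V[i][j] is not None]

    def predict(i, j):
        return sum(W[i][k] * H[k][j] for k in range(rank))

    for _ in range(n_iter):
        # Update the row factors W (species side in the toy example).
        num = [[0.0] * rank for _ in range(m)]
        den = [[0.0] * rank for _ in range(m)]
        for i, j in observed:
            p = predict(i, j)
            for k in range(rank):
                num[i][k] += V[i][j] * H[k][j]
                den[i][k] += p * H[k][j]
        for i in range(m):
            for k in range(rank):
                W[i][k] *= num[i][k] / (den[i][k] + eps)

        # Update the column factors H (pesticide side in the toy example).
        num = [[0.0] * n for _ in range(rank)]
        den = [[0.0] * n for _ in range(rank)]
        for i, j in observed:
            p = predict(i, j)
            for k in range(rank):
                num[k][j] += W[i][k] * V[i][j]
                den[k][j] += W[i][k] * p
        for k in range(rank):
            for j in range(n):
                H[k][j] *= num[k][j] / (den[k][j] + eps)
    return W, H


# Hypothetical toy matrix: 3 species x 4 pesticides with exact rank-1 structure.
species_factor = [1.0, 2.0, 3.0]
pesticide_factor = [1.0, 2.0, 4.0, 0.5]
V = [[s * p for p in pesticide_factor] for s in species_factor]
V[2][3] = None  # hide one value (true value 3.0 * 0.5 = 1.5)

W, H = masked_nmf(V, rank=1)
predicted = W[2][0] * H[0][3]
print(round(predicted, 3))  # should be close to the hidden value 1.5
```

On real data the rank would be chosen by cross validation, and the hidden cells would be the species-pesticide pairs that have never been tested.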


Subject(s)
Fishes , Pesticides , Chemical Water Pollutants , Pesticides/toxicity , Chemical Water Pollutants/toxicity , Animals , Cluster Analysis , Ecotoxicology , Aquatic Organisms/drug effects
2.
J Chem Ecol ; 49(11-12): 681-695, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37779180

ABSTRACT

Natural products (NPs), or secondary metabolites, are small molecules naturally synthesized by enzymes or enzyme complexes encoded by chromosomally clustered biosynthesis genes (also called biosynthetic gene clusters, BGCs). They mediate the bioecological interactions between host and microbiota and provide a natural reservoir for screening drug-like therapeutic pharmaceuticals. In this work, we propose a multi-label learning framework to functionally annotate natural products solely from the biosynthetic gene clusters that catalyse their synthesis, without experimentally resolving NP structures. All chemical classes and bioactivities constitute the label space, and the sequence domains of the biosynthetic gene clusters that catalyse the biosynthesis of natural products constitute the feature space. In this multi-label learning framework, a joint representation of features (BGC domains) and labels (natural product annotations) is efficiently learnt in an integral, low-dimensional space to accurately define the inter-class boundaries and to scale to learning problems with many imbalanced labels. Computational results on experimental data show that the proposed framework achieves satisfactory multi-label learning performance, and that the learnt patterns of BGC domains are transferable across bacteria, or even across kingdoms, for instance from bacteria to Arabidopsis thaliana. Lastly, taking Arabidopsis thaliana and its rhizosphere microbiome as an example, we propose a pipeline that combines existing BGC identification tools with the proposed framework to find and functionally annotate novel natural products for downstream bioecological studies of plant-microbiota-soil interactions and plant environmental adaptation.


Subject(s)
Arabidopsis , Biological Products , Microbiota , Computational Biology/methods , Arabidopsis/genetics , Microbiota/genetics , Multigene Family , Biosynthetic Pathways/genetics
3.
Sci Rep ; 11(1): 17619, 2021 Sep 02.
Article in English | MEDLINE | ID: mdl-34475500

ABSTRACT

Understanding drug-drug interactions is an essential step in reducing the risk of adverse drug events before drugs are co-prescribed clinically. Existing methods, which commonly integrate heterogeneous data to increase model performance, often suffer from high model complexity. As such, elucidating the molecular mechanisms underlying drug-drug interactions while preserving rational biological interpretability is a challenging task in computational modeling for drug discovery. In this study, we investigate drug-drug interactions via the associations between the genes that two drugs target. For this purpose, we propose a simple drug target profile representation to depict drugs and drug pairs, from which an l2-regularized logistic regression model is built to predict drug-drug interactions. Furthermore, we define several statistical metrics in the context of human protein-protein interaction networks and signaling pathways to measure the interaction intensity, interaction efficacy and action range between two drugs. Large-scale empirical studies, including both cross validation and independent tests, show that the proposed drug target profile-based machine learning framework outperforms existing data integration-based methods. The proposed statistical metrics show that two drugs readily interact when they target common genes, when their target genes connect via short paths in protein-protein interaction networks, or when their target genes are located in signaling pathways that cross-talk. The unravelled mechanisms could provide biological insights into potential adverse drug reactions of co-prescribed drugs.
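
A target-profile representation followed by l2-regularized logistic regression can be sketched as follows. This is an illustration under stated assumptions and not the authors' implementation: the gene universe `GENES`, the drug names, the choice to represent a drug pair as the element-wise sum of two binary profiles, and the plain gradient-descent trainer are all hypothetical.

```python
import math

GENES = ["G1", "G2", "G3", "G4"]  # hypothetical gene universe


def profile(targets):
    """Binary target profile of one drug over the gene universe."""
    return [1.0 if g in targets else 0.0 for g in GENES]


def pair_features(targets_a, targets_b):
    """Assumed pair representation: element-wise sum of the two profiles."""
    return [x + y for x, y in zip(profile(targets_a), profile(targets_b))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_l2_logreg(X, y, lam=0.01, lr=0.5, epochs=2000):
    """Batch gradient descent on mean log-loss plus (lam / 2) * ||w||^2."""
    n, d = len(X), len(X[0])
    w, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the l2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(bias + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        bias -= lr * grad_b
    return w, bias


# Hypothetical drugs and their target genes.
drugs = {
    "A": {"G1", "G2"}, "B": {"G1"}, "E": {"G2"},
    "C": {"G3"}, "D": {"G3", "G4"}, "F": {"G4"},
}
interacting = [("A", "B"), ("A", "E"), ("B", "E")]
non_interacting = [("C", "D"), ("D", "F"), ("C", "F")]

X = [pair_features(drugs[p], drugs[q]) for p, q in interacting + non_interacting]
y = [1.0] * len(interacting) + [0.0] * len(non_interacting)
w, bias = train_l2_logreg(X, y)


def predict_proba(targets_a, targets_b):
    x = pair_features(targets_a, targets_b)
    return sigmoid(bias + sum(wj * xj for wj, xj in zip(w, x)))


print(predict_proba(drugs["A"], drugs["B"]) > 0.5)  # toy interacting pair
```

Because the model is linear in the gene-level features, each learnt weight can be read as the contribution of one target gene, which is the kind of interpretability the abstract emphasizes.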


Subject(s)
Drug Discovery , Drug Interactions , Machine Learning , Computational Biology/methods , Drug Discovery/methods , Humans , Protein Interaction Maps/drug effects , Signal Transduction/drug effects
4.
Comput Struct Biotechnol J ; 18: 100-113, 2020.
Article in English | MEDLINE | ID: mdl-31956393

ABSTRACT

Pathogen-host protein interactions are fundamental for pathogens to manipulate host signaling pathways and subvert host immune defense. For most pathogens, very few or no experimental studies have been conducted to investigate their signaling cross-talk with the host. In this study, we propose a computational framework to validate the biological assumption that human protein-protein interaction (PPI) networks alone are sufficient to infer pathogen-host PPIs via pathogen functional mimicry. Pathogen functional mimicry assumes that a pathogen functionally mimics and substitutes for host counterpart proteins in order to take part in or hijack host cellular processes. Through pathogen functional mimicry defined via gene ontology (GO) semantic similarity, we first use the known human PPIs as templates to infer pathogen-host PPIs, and the inferred PPIs are then used as training data to build an l2-regularized logistic regression model for novel pathogen-host PPI prediction. Independent tests on experimental data from human immunodeficiency virus and Francisella tularensis validate the effectiveness of the proposed pathogen functional mimicry technique. Performance comparisons also show that the proposed technique outperforms the existing pathogen sequence mimicry approaches and transfer learning methods. The proposed framework provides a new avenue for studying experimentally less-studied pathogens in the worst-case scenario in which very few or no experimental pathogen-host PPIs are available. As two case studies, we apply the proposed framework to Salmonella typhimurium and human respiratory syncytial virus to reconstruct the pathogen-host PPI networks and further investigate the interference of these two pathogens with the human immune signaling and transcription regulatory systems.

5.
Biomolecules ; 9(11), 2019 Oct 25.
Article in English | MEDLINE | ID: mdl-31717703

ABSTRACT

Understanding the physical arrangement of subunits within protein complexes potentially provides valuable clues about how the subunits work together and how the complexes function. Most recent research focuses on identifying protein complexes as a whole and seldom studies the inner structures of complexes. In this study, we propose a computational framework to predict direct contacts and substructures within protein complexes. In this framework, we first train an l2-regularized logistic regression model to learn the patterns of direct and indirect interactions within complexes, from which physical subunit interaction networks are predicted. Then, to infer substructures within complexes, we apply a graph clustering method, maximum modularity clustering (MMC), to partially connected networks and a gene ontology (GO) semantic similarity-based functional clustering to fully connected networks. Computational results show that the proposed framework achieves fairly good cross validation and independent test performance in detecting direct contacts between subunits. Functional analyses further demonstrate the rationality of partitioning the subunits into substructures via the MMC algorithm and functional clustering.


Subject(s)
Theoretical Models , Protein Interaction Maps , Algorithms , Cluster Analysis , Gene Ontology , Humans
6.
Int J Mol Sci ; 20(20), 2019 Oct 12.
Article in English | MEDLINE | ID: mdl-31614890

ABSTRACT

Rapid reconstruction of genome-scale protein-protein interaction (PPI) networks is instrumental in understanding cellular processes, disease pathogenesis and drug reactions. However, the lack of experimentally verified negative data (i.e., pairs of proteins that do not interact) is still a major issue that needs to be properly addressed in computational modeling. In this study, we take advantage of the very limited experimentally verified negative data from Negatome to infer more negative data for computational modeling. We assume that the paralogs or orthologs of two non-interacting proteins also do not interact with high probability. We term this assumption "Neglog"; it is to some extent supported by paralogous/orthologous structure conservation. To reduce the risk of bias toward the negative data from Negatome, we combine Neglog with less biased random sampling at a certain ratio to construct the training data. L2-regularized logistic regression is used as the base classifier to counteract noise and to train on a large dataset. Computational results show that the proposed Neglog method outperforms the pure random sampling method and has sound biological interpretability. In addition, we find that an independent test on negative data is indispensable for bias control, a step that existing studies usually neglect. Lastly, we use the Neglog method to validate the PPIs in STRING, and the validated PPIs are supported by gene ontology (GO) enrichment analyses.
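
The Neglog assumption can be made concrete with a short sketch. It assumes a lookup table that maps each protein to its paralogs/orthologs, and it includes each protein itself as a candidate so that mixed pairs are also generated. The function name `neglog_negatives`, the protein identifiers and the filtering against known positives are hypothetical choices, not details taken from the paper.

```python
from itertools import product


def neglog_negatives(known_negatives, homologs, known_positives=frozenset()):
    """Infer extra non-interacting pairs from homologs of known negatives.

    known_negatives: iterable of (protein_a, protein_b) pairs that do not interact.
    homologs: dict mapping a protein to its paralogs/orthologs (hypothetical table).
    known_positives: set of frozenset pairs that must never be labeled negative.
    """
    inferred = set()
    for a, b in known_negatives:
        original = frozenset((a, b))
        candidates_a = [a] + list(homologs.get(a, ()))
        candidates_b = [b] + list(homologs.get(b, ()))
        for ha, hb in product(candidates_a, candidates_b):
            pair = frozenset((ha, hb))
            if len(pair) < 2 or pair == original:
                continue  # skip self-pairs and the seed pair itself
            if pair in known_positives:
                continue  # never contradict an experimentally known interaction
            inferred.add(pair)
    return inferred


# Hypothetical toy data.
known_negatives = [("P1", "P2")]
homologs = {"P1": ["P1a"], "P2": ["P2a", "P2b"]}
known_positives = {frozenset(("P1a", "P2b"))}

inferred = neglog_negatives(known_negatives, homologs, known_positives)
print(sorted(sorted(pair) for pair in inferred))
```

In the setting the abstract describes, these inferred pairs would then be mixed with randomly sampled pairs at a chosen ratio before training the classifier.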


Subject(s)
Genomics/methods , Protein Interaction Mapping/methods , Amino Acid Sequence Homology , Software , Humans , Machine Learning , Protein Interaction Maps
7.
Pharmaceutics ; 11(9), 2019 Sep 09.
Article in English | MEDLINE | ID: mdl-31505805

ABSTRACT

Drug repurposing plays an important role in screening old drugs for new therapeutic efficacy. The existing methods commonly treat prediction of drug-target interaction as a problem of binary classification, in which a large number of randomly sampled drug-target pairs accounting for over 50% of the entire training dataset are necessarily required. Such a large number of negative examples that do not come from experimental observations inevitably decrease the credibility of predictions. In this study, we propose a multi-label learning framework to find new uses for old drugs and discover new drugs for known target genes. In the framework, each drug is treated as a class label and its target genes are treated as the class-specific training data to train a supervised learning model of l2-regularized logistic regression. As such, the inter-drug associations are explicitly modelled into the framework and all the class-specific training data come from experimental observations. In addition, the data constraint is less demanding, for instance, the chemical substructures of a drug are no longer needed and the novel target genes are inferred only from the underlying patterns of the known genes targeted by the drug. Stratified multi-label cross-validation shows that 84.9% of known target genes have at least one drug correctly recognized, and the proposed framework correctly recognizes 86.73% of the independent test drug-target interactions (DTIs) from DrugBank. These results show that the proposed framework could generalize well in the large drug/class space without the information of drug chemical structures and target protein structures. Furthermore, we use the trained model to predict new drugs for the known target genes, identify new genes for the old drugs, and infer new associations between old drugs and new disease phenotypes via the OMIM database. 
Gene ontology (GO) enrichment analyses and the disease associations reported in recent literature provide supporting evidence for the computational results, which potentially shed light on new clinical therapies for new and/or old disease phenotypes.

8.
BMC Genomics ; 19(1): 505, 2018 Jun 28.
Article in English | MEDLINE | ID: mdl-29954330

ABSTRACT

BACKGROUND: Bacterial invasive infection and the host immune response are fundamental to understanding pathogenesis and to discovering effective therapeutic drugs. However, there have been very few experimental studies on the signaling cross-talk between bacteria and the human host to date. METHODS: In this work, taking M. tuberculosis H37Rv (MTB), which is co-evolving with its human host, as an example, we propose a general computational framework that exploits the known bacterial pathogen protein interaction networks in the STRING database to predict pathogen-host protein interactions and their signaling cross-talk. In this framework, significant interlogs are derived from the known pathogen protein interaction networks to train a predictive l2-regularized logistic regression model. RESULTS: The computational results show that the proposed method achieves excellent cross validation performance as well as low predicted positive rates on the less significant interlogs and non-interlogs, indicating a low risk of false discovery. We further conduct gene ontology (GO) and pathway enrichment analyses of the predicted pathogen-host protein interaction networks, which potentially provide insights into the machinery by which M. tuberculosis H37Rv targets human genes and signaling pathways. In addition, we analyse the pathogen-host protein interactions related to drug resistance, inhibition of which potentially provides an alternative solution to M. tuberculosis H37Rv drug resistance. CONCLUSIONS: The proposed machine learning framework has been verified to be effective for predicting bacteria-host protein interactions via known bacterial protein interaction networks. For the vast majority of bacterial pathogens that lack experimental studies of bacteria-host protein interactions, this framework is expected to be generally applicable. The predicted protein interaction networks between M. tuberculosis H37Rv and Homo sapiens, provided in the Additional files, promise to find applications in two fields: (1) providing an alternative solution to drug resistance; (2) revealing the patterns by which M. tuberculosis H37Rv genes target human immune signaling pathways.


Subject(s)
Bacterial Proteins/metabolism , Host-Pathogen Interactions/genetics , Mycobacterium tuberculosis/metabolism , Protein Interaction Maps/genetics , Signal Transduction/genetics , Tuberculosis/genetics , Area Under Curve , Genetic Databases , Bacterial Drug Resistance/genetics , Gene Ontology , Humans , Immune System/metabolism , Immune System/microbiology , Logistic Models , ROC Curve , Tuberculosis/immunology , Tuberculosis/microbiology , Tuberculosis/pathology
9.
J Proteome Res ; 17(5): 1749-1760, 2018 May 04.
Article in English | MEDLINE | ID: mdl-29611419

ABSTRACT

Bacterial protein-protein interaction (PPI) networks are significant to reveal the machinery of signal transduction and drug resistance within bacterial cells. The database STRING has collected a large number of bacterial pathogen PPI networks, but most of the data are of low quality without being experimentally or computationally validated, thus restricting its further biomedical applications. We exploit the experimental data via four solutions to enhance the quality of M. tuberculosis H37Rv (MTB) PPI networks in STRING. Computational results show that the experimental data derived jointly by two-hybrid and copurification approaches are the most reliable to train an L2-regularized logistic regression model for MTB PPI network validation. On the basis of the validated MTB PPI networks, we further study the three problems via breadth-first graph search algorithm: (1) discovery of MTB drug-resistance pathways through searching for the paths between known drug-target genes and drug-resistance genes, (2) choosing potential cotarget genes via searching for the critical genes located on multiple pathways, and (3) choosing essential drug-target genes via analysis of network degree distribution. In addition, we further combine the validated MTB PPI networks with human PPI networks to analyze the potential pharmacological risks of known and candidate drug-target genes from the point of view of system pharmacology. The evidence from protein structure alignment demonstrates that the drugs that act on MTB target genes could also adversely act on human signaling pathways.


Subject(s)
Bacterial Proteins/metabolism , Computer Simulation , Drug Resistance , Mycobacterium tuberculosis/chemistry , Protein Interaction Maps , Algorithms , Humans , Logistic Models , Risk , Signal Transduction/drug effects
10.
Integr Biol (Camb) ; 9(7): 595-606, 2017 Jul 17.
Article in English | MEDLINE | ID: mdl-28524201

ABSTRACT

Recognition of indirect interactions is instrumental to in silico reconstruction of signaling pathways and sheds light on the exploration of unknown physical paths between two indirectly interacting genes. However, very limited computational methods have explicitly exploited the indirect interactions with experimental evidence thus far. In this work, we attempt to distinguish direct versus indirect interactions in human functional protein-protein interaction (PPI) networks via a predictive l2-regularized logistic regression model built on the experimental data. The l2-regularized logistic regression method is adopted to counteract the potential homolog noise and reduce the computational complexity on large training data. Computational results show that the proposed model demonstrates promising performance even though the training data are highly skewed. From the 304 799 PPIs that are curated in several databases, the proposed method detects 23 131 indirect interactions, most of which have been verified by the breadth-first graph search algorithm to find dozens of physical paths between the interacting partners. Pathway enrichment analysis shows that most of the physical paths can be mapped onto more than one human signaling pathway, indicating that there do exist a series of biochemical signals between the two indirectly interacting genes. The interactome-scale computational results promise to provide useful cues to the following applications: (1) exploration of unknown physical PPIs or physical paths between two indirectly interacting genes; (2) amending or extending the existing signaling pathways; (3) recognition of the physical PPIs for druggable target discovery.
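
The breadth-first graph search used to find a physical path between two indirectly interacting genes can be sketched as below. The network is assumed to be an undirected graph built from a list of physical PPIs. The gene names, the helper `build_network` and the choice to return a single shortest path (the abstract reports dozens of paths per pair) are hypothetical simplifications.

```python
from collections import deque


def build_network(edges):
    """Build an undirected adjacency map from a list of physical PPIs."""
    network = {}
    for a, b in edges:
        network.setdefault(a, set()).add(b)
        network.setdefault(b, set()).add(a)
    return network


def shortest_path(network, source, target):
    """Breadth-first search; returns one shortest path or None if unreachable."""
    if source == target:
        return [source]
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in network.get(node, ()):
            if neighbor in parents:
                continue  # already visited
            parents[neighbor] = node
            if neighbor == target:
                # Walk back through the parent links to rebuild the path.
                path = []
                current = neighbor
                while current is not None:
                    path.append(current)
                    current = parents[current]
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical physical PPI network.
network = build_network([
    ("GENE_A", "GENE_B"),
    ("GENE_B", "GENE_C"),
    ("GENE_C", "GENE_D"),
    ("GENE_A", "GENE_E"),
])

path = shortest_path(network, "GENE_A", "GENE_D")
print(path)  # ['GENE_A', 'GENE_B', 'GENE_C', 'GENE_D']
```

A pair predicted to interact indirectly is supported when such a path exists through physical interactions; the intermediate genes are candidates for mapping onto known signaling pathways.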


Subject(s)
Protein Interaction Maps , Algorithms , Computational Biology , Computer Simulation , Protein Databases , Gene Ontology , Humans , Logistic Models , Biological Models , Protein Interaction Maps/genetics , Signal Transduction
11.
Sci Rep ; 6: 36453, 2016 Nov 07.
Article in English | MEDLINE | ID: mdl-27819359

ABSTRACT

Protein-protein interaction (PPI) networks are naturally viewed as infrastructure to infer signalling pathways. The descriptors of signal events between two interacting proteins such as upstream/downstream signal flow, activation/inhibition relationship and protein modification are indispensable for inferring signalling pathways from PPI networks. However, such descriptors are not available in most cases as most PPI networks are seldom semantically annotated. In this work, we extend ℓ2-regularized logistic regression to the scenario of multi-label learning for predicting the activation/inhibition relationships in human PPI networks. The phenomenon that both activation and inhibition relationships exist between two interacting proteins is computationally modelled by multi-label learning framework. The problem of GO (gene ontology) sparsity is tackled by introducing the homolog knowledge as independent homolog instances. ℓ2-regularized logistic regression is accordingly adopted here to penalize the homolog noise and to reduce the computational complexity of the double-sized training data. Computational results show that the proposed method achieves satisfactory multi-label learning performance and outperforms the existing phenotype correlation method on the experimental data of Drosophila melanogaster. Several predictions have been validated against recent literature. The predicted activation/inhibition relationships in human PPI networks are provided in the supplementary file for further biomedical research.


Subject(s)
Computational Biology/methods , Proteins/metabolism , Algorithms , Brain-Derived Neurotrophic Factor/chemistry , Brain-Derived Neurotrophic Factor/metabolism , Protein Databases , Gene Ontology , Humans , Interleukins/chemistry , Interleukins/metabolism , Logistic Models , Protein Interaction Maps , Proteins/chemistry , Androgen Receptors/chemistry , Androgen Receptors/metabolism , Signal Transduction
12.
Sci Rep ; 6: 30612, 2016 Jul 29.
Article in English | MEDLINE | ID: mdl-27470517

ABSTRACT

Epstein-Barr virus (EBV) plays important roles in the origin and progression of human carcinomas, e.g. diffuse large B cell tumors and T cell lymphomas. Discovering the human genes and signaling pathways targeted by EBV is vital to understanding EBV tumorigenesis. In this study, we propose a noise-tolerant homolog knowledge transfer method to reconstruct functional protein-protein interaction (PPI) networks between Epstein-Barr virus and Homo sapiens. The training set is augmented via homolog instances, and the homolog noise is counteracted by a support vector machine (SVM). Additionally, we propose two definitions of subcellular co-localization (stringent and relaxed), from which physical PPI networks are further derived. Computational results show that the proposed method achieves sound cross validation and independent test performance. In the space of 648,672 EBV-human protein pairs, we obtain 51,485 functional interactions (7.94%), 869 stringent physical PPIs and 46,050 relaxed physical PPIs. Fifty-eight pieces of evidence from the latest database and recent literature are found to validate the model. This study reveals that Epstein-Barr virus interferes with normal human cell processes such as cholesterol homeostasis, blood coagulation, EGFR binding, p53 binding, Notch signaling and Hedgehog signaling. The proteome-wide predictions are provided in the supplementary file for further biomedical research.


Subject(s)
Computer Simulation , Nucleic Acid Databases , Epstein-Barr Virus Infections , Human Herpesvirus 4 , Signal Transduction/genetics , Epstein-Barr Virus Infections/genetics , Epstein-Barr Virus Infections/metabolism , Human Herpesvirus 4/genetics , Human Herpesvirus 4/metabolism , Humans
13.
Sci Rep ; 5: 17983, 2015 Dec 09.
Article in English | MEDLINE | ID: mdl-26648121

ABSTRACT

Signaling pathways play important roles in understanding the underlying mechanism of cell growth, cell apoptosis, organismal development and pathways-aberrant diseases. Protein-protein interaction (PPI) networks are commonly-used infrastructure to infer signaling pathways. However, PPI networks generally carry no information of upstream/downstream relationship between interacting proteins, which retards our inferring the signal flow of signaling pathways. In this work, we propose a simple feature construction method to train a SVM (support vector machine) classifier to predict PPI upstream/downstream relations. The domain based asymmetric feature representation naturally embodies domain-domain upstream/downstream relations, providing an unconventional avenue to predict the directionality between two objects. Moreover, we propose a semantically interpretable decision function and a macro bag-level performance metric to satisfy the need of two-instance depiction of an interacting protein pair. Experimental results show that the proposed method achieves satisfactory cross validation performance and independent test performance. Lastly, we use the trained model to predict the PPIs in HPRD, Reactome and IntAct. Some predictions have been validated against recent literature.


Subject(s)
Computational Biology/methods , Protein Interaction Mapping/methods , Protein Interaction Maps , Algorithms , Protein Databases , ErbB Receptors , Humans , Biological Models , ROC Curve , Reproducibility of Results , Signal Transduction , Transforming Growth Factors/metabolism
14.
Sci Rep ; 5: 8034, 2015 Jan 26.
Article in English | MEDLINE | ID: mdl-25620466

ABSTRACT

Protein-protein interaction (PPI) prediction is generally treated as a problem of binary classification wherein negative data sampling is still an open problem to be addressed. The commonly used random sampling is prone to yield less representative negative data with considerable false negatives. Meanwhile rational constraints are seldom exerted on model selection to reduce the risk of false positive predictions for most of the existing computational methods. In this work, we propose a novel negative data sampling method based on one-class SVM (support vector machine, SVM) to predict proteome-wide protein interactions between HTLV retrovirus and Homo sapiens, wherein one-class SVM is used to choose reliable and representative negative data, and two-class SVM is used to yield proteome-wide outcomes as predictive feedback for rational model selection. Computational results suggest that one-class SVM is more suited to be used as negative data sampling method than two-class PPI predictor, and the predictive feedback constrained model selection helps to yield a rational predictive model that reduces the risk of false positive predictions. Some predictions have been validated by the recent literature. Lastly, gene ontology based clustering of the predicted PPI networks is conducted to provide valuable cues for the pathogenesis of HTLV retrovirus.


Subject(s)
Deltaretrovirus/genetics , Protein Interaction Maps/genetics , Proteins/genetics , Proteome/genetics , Algorithms , Computational Biology , Protein Databases , Gene Ontology , Humans , Support Vector Machine
15.
BMC Bioinformatics ; 16: 417, 2015 Dec 30.
Article in English | MEDLINE | ID: mdl-26718335

ABSTRACT

BACKGROUND: Signaling pathways play important roles in the life processes of cell growth, cell apoptosis and organism development. At present, the signal transduction networks are far from complete. As an effective complement to experimental methods, computational modeling is suited to rapidly reconstructing signaling pathways at low cost. To our knowledge, existing computational methods seldom combine more than three signaling pathways in one predictive model for discovering novel signaling components and modeling the cross-talk between signaling pathways. RESULTS: In this work, we propose a multi-label multi-instance transfer learning method to simultaneously reconstruct 27 human signaling pathways and model their cross-talk. Computational results show that the proposed method demonstrates satisfactory multi-label learning performance and rational proteome-wide predictions. Some predicted signaling components or pathway-targeted proteins have been validated by recent literature. The predicted signaling components are further linked to pathways using experimentally derived protein-protein interactions (PPIs) to reconstruct the human signaling pathways. The map of cross-talk via common signaling components and common signaling PPIs is thus conveniently inferred, providing valuable insights into the regulatory and cooperative relationships between signaling pathways. Lastly, gene ontology enrichment analysis is conducted to gain statistical knowledge about the reconstructed human signaling pathways. CONCLUSIONS: The multi-label learning framework has been demonstrated to be effective in this work for modeling the phenomenon that a signaling protein can belong to more than one signaling pathway. As a result, novel signaling components and pathway-targeted proteins are predicted to simultaneously reconstruct multiple human signaling pathways and the static map of their cross-talk for further biomedical research.


Subject(s)
Biological Models , Signal Transduction , Protein Databases , Gene Ontology , Humans , Protein Interaction Mapping , Protein Transport , Reproducibility of Results , Staining and Labeling
16.
PLoS One ; 9(10): e110488, 2014.
Article in English | MEDLINE | ID: mdl-25330226

ABSTRACT

Pathogen-host protein-protein interaction (PPI) plays an important role in revealing the underlying pathogenesis of viruses and bacteria. The need of rapidly mapping proteome-wide pathogen-host interactome opens avenues for and imposes burdens on computational modeling. For Salmonella typhimurium, only 62 interactions with human proteins are reported to date, and the computational modeling based on such a small training data is prone to yield model overfitting. In this work, we propose a multi-instance transfer learning method to reconstruct the proteome-wide Salmonella-human PPI networks, wherein the training data is augmented by homolog knowledge transfer in the form of independent homolog instances. We use AdaBoost instance reweighting to counteract the noise from homolog instances, and deliberately design three experimental settings to validate the assumption that the homolog instances are effective to address the problems of data scarcity and data unavailability. The experimental results show that the proposed method outperforms the existing models and some predictions are validated by the findings from recent literature. Lastly, we conduct gene ontology based clustering analysis of the predicted networks to provide insights into the pathogenesis of Salmonella.


Subject(s)
Host-Pathogen Interactions/genetics , Protein Interaction Maps/genetics , Salmonella Infections/genetics , Salmonella typhimurium/genetics , Databases, Protein , Gene Ontology , Humans , Proteome/genetics , Salmonella Infections/microbiology , Salmonella typhimurium/pathogenicity
17.
BMC Bioinformatics ; 15: 245, 2014 Jul 18.
Article in English | MEDLINE | ID: mdl-25037487

ABSTRACT

BACKGROUND: Human T-cell leukemia viruses (HTLV) tend to induce fatal human diseases such as Adult T-cell Leukemia (ATL) by targeting human T lymphocytes. Identifying the protein-protein interactions (PPIs) between HTLV viruses and Homo sapiens is one of the significant approaches to revealing the underlying mechanism of HTLV infection and host defence. At present, because biological experiments are labor-intensive and expensive, the identified part of the HTLV-human PPI networks is rather small. Although recent years have witnessed much progress in computational modeling for reconstructing pathogen-host PPI networks, data scarcity and data unavailability remain two major challenges to be addressed. To our knowledge, no computational method for proteome-wide HTLV-human PPI network reconstruction has been reported. RESULTS: In this work we develop a Multi-instance AdaBoost method to conduct homolog knowledge transfer for computationally reconstructing proteome-wide HTLV-human PPI networks. In this method, the homolog knowledge in the form of gene ontology (GO) is treated as an auxiliary homolog instance to address the problems of data scarcity and data unavailability, while potential negative knowledge transfer is automatically attenuated by AdaBoost instance reweighting. The cross validation experiments show that homolog knowledge transfer in the form of independent homolog instances can effectively enrich the feature information and substitute for the missing GO information. Moreover, the independent tests show that the method can validate 70.3% of the recently curated interactions, significantly exceeding the 2.1% recognition rate of the HT-Y2H experiment. We have used the method to reconstruct the proteome-wide HTLV-human PPI networks and further conducted gene ontology based clustering of the predicted networks for further biomedical research. The gene ontology based clustering analysis of the predictions provides much biological insight into the pathogenesis of HTLV retroviruses. CONCLUSIONS: The Multi-instance AdaBoost method can effectively address the problems of data scarcity and data unavailability for proteome-wide HTLV-human PPI network reconstruction. The gene ontology based clustering analysis of the predictions reveals some important signaling pathways and biological modules that HTLV retroviruses are likely to target.
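
The abstract above names AdaBoost instance reweighting but does not give its update rule. As a rough illustration only, the following sketch shows one round of the textbook discrete AdaBoost weight update; how the authors adapt it to multi-instance learning and homolog instances is not stated here, and the function name `adaboost_reweight` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def adaboost_reweight(
    weights: Sequence[float],
    labels: Sequence[int],
    preds: Sequence[int],
) -> Tuple[float, List[float]]:
    """One round of textbook discrete AdaBoost (labels and predictions in {-1, +1}).

    Returns the classifier weight alpha and the renormalized instance weights.
    This is the generic rule, not the paper's multi-instance variant.
    """
    total = sum(weights)
    # Weighted error of the current weak classifier.
    err = sum(w for w, y, h in zip(weights, labels, preds) if y != h) / total
    if not 0.0 < err < 0.5:
        raise ValueError("weak classifier must have weighted error in (0, 0.5)")
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Correctly classified instances are scaled down, misclassified ones up.
    raw = [
        w * math.exp(-alpha if y == h else alpha)
        for w, y, h in zip(weights, labels, preds)
    ]
    z = sum(raw)
    return alpha, [r / z for r in raw]


# Four equally weighted instances; the weak classifier gets the last one wrong.
alpha, new_weights = adaboost_reweight(
    [0.25, 0.25, 0.25, 0.25], [1, 1, -1, -1], [1, 1, -1, 1]
)
```

After one round the misclassified instance carries half of the total weight, a standard property of this update.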


Subject(s)
Deltaretrovirus/genetics , Protein Interaction Mapping/methods , Proteomics/methods , Cluster Analysis , Gene Ontology , Host-Pathogen Interactions , Humans
18.
J Theor Biol ; 340: 105-10, 2014 Jan 07.
Article in English | MEDLINE | ID: mdl-24050851

ABSTRACT

Membrane proteins play important roles in molecular trans-membrane transport, ligand-receptor recognition, cell-cell interaction, enzyme catalysis, host immune defense response and infectious disease pathways. To date, discriminating membrane proteins remains a challenging problem for both experimental determination and computational modeling. This work presents an SVM ensemble based transfer learning model for membrane protein discrimination (SVM-TLM). To reduce the data constraints on computational modeling, this method investigates the effectiveness of transferring homolog knowledge to the target membrane proteins under the framework of probability weighted ensemble learning. Compared with the multiple kernel learning based transfer learning model, the method takes advantage of sparseness-based SVM optimization on large data and is thus more computationally efficient for large protein data analysis. Experiments on a large membrane protein benchmark dataset show that SVM-TLM achieves significantly better cross validation performance than the baseline model.


Subject(s)
Computational Biology/methods , Membrane Proteins/chemistry , Support Vector Machine , Algorithms , Cell Communication , Cell Membrane/metabolism , Computer Simulation , Databases, Protein , Ligands , Models, Theoretical , Normal Distribution , Reproducibility of Results , Software , Time Factors
19.
PLoS One ; 8(11): e79606, 2013.
Article in English | MEDLINE | ID: mdl-24260261

ABSTRACT

Reconstruction of host-pathogen protein interaction networks is of great significance for revealing the underlying microbial pathogenesis. However, the current experimentally derived networks are generally small and should be augmented by computational methods for less-biased biological inference. From the point of view of computational modelling, data scarcity, data unavailability and negative data sampling are the three major problems for host-pathogen protein interaction network reconstruction. In this work, we address these three concerns and propose a probability weighted ensemble transfer learning model for HIV-human protein interaction prediction (PWEN-TLM), where a support vector machine (SVM) is adopted as the individual classifier of the ensemble model. In the model, data scarcity and data unavailability are tackled by homolog knowledge transfer. The importance of homolog knowledge is measured by the ROC-AUC metric of the individual classifiers, whose outputs are probability weighted to yield the final decision. In addition, we validate the assumption that homolog knowledge alone is sufficient to train a satisfactory model for host-pathogen protein interaction prediction. Thus the model is more robust against data unavailability, with less demanding data constraints. As regards negative data construction, experiments show that exclusiveness of subcellular co-localized proteins is unbiased and more reliable than random sampling. Lastly, we analyze the overlapping predictions between our model and the existing models, and apply the model to recognizing novel host-pathogen PPIs for further biological research.
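
The abstract above says that individual classifiers are weighted by their ROC-AUC and that their probability outputs are combined into a final decision, but it does not give the exact formula. The sketch below assumes the simplest reading, weights proportional to AUC and a weighted average of probabilities; the function names `roc_auc` and `weighted_ensemble` and the toy scores are hypothetical.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Labels are 1 (positive) or 0 (negative); tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def weighted_ensemble(
    probs: Sequence[Sequence[float]], aucs: Sequence[float]
) -> List[float]:
    """Average per-classifier probabilities with weights proportional to AUC.

    Normalizing the AUC values into weights is an assumption of this sketch.
    """
    total = sum(aucs)
    weights = [a / total for a in aucs]
    n_examples = len(probs[0])
    return [
        sum(w * p[i] for w, p in zip(weights, probs)) for i in range(n_examples)
    ]


# Validation labels and scores from two hypothetical classifiers, e.g. one
# trained on target features and one trained on homolog features.
val_labels = [1, 1, 0, 0]
auc_target = roc_auc(val_labels, [0.9, 0.6, 0.4, 0.2])
auc_homolog = roc_auc(val_labels, [0.8, 0.3, 0.5, 0.1])

# Combined probability for one new protein pair.
combined = weighted_ensemble([[0.9], [0.2]], [auc_target, auc_homolog])
```

With these toy scores the less reliable classifier contributes less to the combined probability, which is the intent of AUC-based weighting.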


Subject(s)
HIV-1/metabolism , Proteins/metabolism , Databases, Protein , Humans , Models, Theoretical , Protein Interaction Mapping , Support Vector Machine
20.
J Theor Biol ; 310: 80-7, 2012 Oct 07.
Article in English | MEDLINE | ID: mdl-22750634

ABSTRACT

Recent years have witnessed much progress in computational modeling for protein subcellular localization. However, there are far fewer computational models for predicting plant protein subcellular multi-localization. In this paper, we propose a multi-label multi-kernel transfer learning model for predicting multiple subcellular locations of plant proteins (MLMK-TLM). The method proposes a multi-label confusion matrix and adapts one-against-all multi-class probabilistic outputs to the multi-label learning scenario, based on which we further extend our published work MK-TLM (multi-kernel transfer learning based on Chou's PseAAC formulation for protein submitochondria localization) to plant protein subcellular multi-localization. With proper homolog knowledge transfer, MLMK-TLM is applicable to novel plant protein subcellular localization in the multi-label learning scenario. Experiments on a plant protein benchmark dataset show that MLMK-TLM outperforms the baseline model. Unlike the existing models, MLMK-TLM also reports its misleading tendency, which is important for a comprehensive survey of the model's multi-labeling performance.


Subject(s)
Artificial Intelligence , Computational Biology/methods , Plant Proteins/metabolism , Sequence Homology, Amino Acid , Software , Databases, Protein , Protein Transport , Subcellular Fractions/metabolism