Search | Biblioteca Virtual en Salud Odontología. Uruguay

1.

N-GlycoPred: A hybrid deep learning model for accurate identification of N-glycosylation sites.

Hu, Fengzhu; Gao, Jie; Zheng, Jia; Kwoh, Cheekeong; Jia, Cangzhi.

Methods ; 227: 48-57, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38734394

ABSTRACT

Studies have shown that protein glycosylation in cells reflects the real-time dynamics of biological processes, and the occurrence and development of many diseases are closely related to protein glycosylation. Abnormal protein glycosylation can be used as a potential diagnostic and prognostic marker of a disease, as well as a therapeutic target and a new breakthrough point for exploring pathogenesis. To address the issue of significant differences in the prediction results of previous models for different species, we constructed a hybrid deep learning model N-GlycoPred on the basis of dual-layer convolution, a paired attention mechanism and BiLSTM for accurate identification of N-glycosylation sites. By adopting one-hot encoding or the AAindex, we specifically selected the optimum combination of features and deep learning frameworks for human and mouse to refine the models. Based on six independent test datasets, our N-GlycoPred model achieved an average AUC of 0.9553, which is 0.23% higher than MusiteDeep. The comparison results indicate that our model can serve as a powerful tool for N-glycosylation site prescreening for biological researchers.
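The one-hot encoding mentioned in the abstract can be illustrated with a minimal sketch. The window size, the padding symbol, and the function name `one_hot_window` are assumptions made here for illustration; the abstract does not state how N-GlycoPred builds its input windows.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_window(sequence, center, flank=3, pad="X"):
    """One-hot encode the residues within `flank` positions of `center`.

    Positions beyond the sequence ends, and any non-standard residue,
    become all-zero rows. `flank` and `pad` are hypothetical choices.
    """
    rows = []
    for pos in range(center - flank, center + flank + 1):
        residue = sequence[pos] if 0 <= pos < len(sequence) else pad
        rows.append([1 if residue == aa else 0 for aa in AMINO_ACIDS])
    return rows


# Toy peptide with a candidate asparagine (N) at index 2.
window = one_hot_window("MKNGSAT", center=2, flank=3)
```

Each row of `window` has 20 entries, so a window of 7 positions yields a 7 x 20 matrix that a convolutional layer could consume.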

Subject(s)

Deep Learning; Glycosylation; Humans; Animals; Mice

2.

Heterogeneous information network and its application to human health and disease.

Ding, Pingjian; Ouyang, Wenjue; Luo, Jiawei; Kwoh, Chee-Keong.

Brief Bioinform ; 21(4): 1327-1346, 2020 07 15.

Article in English | MEDLINE | ID: mdl-31566212

ABSTRACT

The molecular components of a human cell, with their functional interdependencies, form a complicated biological network. Diseases are mostly caused by perturbations of the interactions among multiple biomolecules, rather than by an abnormality of a single biomolecule. Furthermore, new biological functions and processes could be revealed by discovering novel relationships between biological entities. Hence, more and more biologists focus on studying the complex biological system instead of its individual components. The emergence of the heterogeneous information network (HIN) offers a promising way to systematically explore complicated and heterogeneous relationships between various molecules for apparently distinct phenotypes. In this review, we first present the basic definition of HIN and the biological system considered as a complex HIN. Then, we discuss the topological properties of HIN and how these can be applied to detect network motifs and functional modules. Afterwards, methodologies for discovering relationships between diseases and biomolecules are presented. Useful insights on how HIN aids drug development and the exploration of the human interactome are provided. Finally, we analyze the challenges and opportunities for uncovering combinatorial patterns in pharmacogenomics and for cell-type detection based on single-cell genomic data.

Subject(s)

Computational Biology/methods; Information Services/organization & administration; Drug Development; Genetic Predisposition to Disease; Humans; MicroRNAs/genetics; RNA, Long Noncoding/genetics

3.

Computational prediction of drug-target interactions using chemogenomic approaches: an empirical survey.

Ezzat, Ali; Wu, Min; Li, Xiao-Li; Kwoh, Chee-Keong.

Brief Bioinform ; 20(4): 1337-1357, 2019 07 19.

Article in English | MEDLINE | ID: mdl-29377981

ABSTRACT

Computational prediction of drug-target interactions (DTIs) has become an essential task in the drug discovery process. It narrows down the search space for interactions by suggesting potential interaction candidates for validation via wet-lab experiments that are well known to be expensive and time-consuming. In this article, we aim to provide a comprehensive overview and empirical evaluation on the computational DTI prediction techniques, to act as a guide and reference for our fellow researchers. Specifically, we first describe the data used in such computational DTI prediction efforts. We then categorize and elaborate the state-of-the-art methods for predicting DTIs. Next, an empirical comparison is performed to demonstrate the prediction performance of some representative methods under different scenarios. We also present interesting findings from our evaluation study, discussing the advantages and disadvantages of each method. Finally, we highlight potential avenues for further enhancement of DTI prediction performance as well as related research directions.

Subject(s)

Drug Development/methods; Drug Discovery/methods; Bayes Theorem; Cheminformatics; Computational Biology; Computer Simulation; Decision Trees; Drug Development/statistics & numerical data; Drug Discovery/statistics & numerical data; Drug Interactions; Drug Repositioning/methods; Drug Repositioning/statistics & numerical data; Fuzzy Logic; Humans; Least-Squares Analysis; Machine Learning; Models, Statistical; Pharmacogenomic Testing/methods; Pharmacogenomic Testing/statistics & numerical data; Support Vector Machine; Surveys and Questionnaires

4.

An improved random forest-based computational model for predicting novel miRNA-disease associations.

Yao, Dengju; Zhan, Xiaojuan; Kwoh, Chee-Keong.

BMC Bioinformatics ; 20(1): 624, 2019 Dec 03.

Article in English | MEDLINE | ID: mdl-31795954

ABSTRACT

BACKGROUND: A large body of evidence shows that miRNA regulates the expression of its target genes at the post-transcriptional level and that the dysregulation of miRNA is related to many complex human diseases. Accurately discovering disease-related miRNAs is conducive to exploring the pathogenesis and treatment of diseases. However, because experimental methods are time-consuming and expensive, predicting miRNA-disease associations by computational models has become a more economical and effective means. RESULTS: Inspired by the work of predecessors, we proposed an improved computational model based on random forest (RF) for identifying miRNA-disease associations (IRFMDA). First, the integrated similarity of diseases and the integrated similarity of miRNAs were calculated by combining the semantic similarity and Gaussian interaction profile kernel (GIPK) similarity of diseases, the functional similarity and GIPK similarity of miRNAs, respectively. Then, the integrated similarity of diseases and the integrated similarity of miRNAs were combined to represent each miRNA-disease relationship pair. Next, the miRNA-disease relationship pairs contained in the HMDD (v2.0) database were considered positive samples, and the randomly constructed miRNA-disease relationship pairs not included in HMDD (v2.0) were considered negative samples. Next, the feature selection based on the variable importance score of RF was performed to choose more useful features to represent samples to optimize the model's ability to infer miRNA-disease associations. Finally, an RF regression model was trained on the reduced sample space to score the unknown miRNA-disease associations. The AUCs of IRFMDA under local leave-one-out cross-validation (LOOCV), global LOOCV and 5-fold cross-validation achieved 0.8728, 0.9398 and 0.9363, which were better than several excellent models for predicting miRNA-disease associations. 
Moreover, case studies on oesophageal cancer, lymphoma and lung cancer showed that 94 (oesophageal cancer), 98 (lymphoma) and 100 (lung cancer) of the top 100 disease-associated miRNAs predicted by IRFMDA were supported by the experimental data in the dbDEMC (v2.0) database. CONCLUSIONS: Cross-validation and case studies demonstrated that IRFMDA is an excellent miRNA-disease association prediction model, and can provide guidance and help for experimental studies on the regulatory mechanism of miRNAs in complex human diseases in the future.
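The Gaussian interaction profile kernel (GIPK) similarity named in the abstract can be sketched as follows. The formula used here, exp(-gamma * ||IP(i) - IP(j)||^2) with gamma normalised by the mean squared profile norm, is the commonly used definition and is an assumption; the abstract itself does not give the formula or the bandwidth parameter, and `gip_kernel` and `gamma_prime` are names chosen for illustration.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between rows of `profiles`.

    profiles: list of binary interaction profiles, one per miRNA (or disease).
    At least one profile must be non-zero, otherwise the bandwidth is undefined.
    """
    sq_norms = [sum(v * v for v in p) for p in profiles]
    mean_sq_norm = sum(sq_norms) / len(profiles)
    gamma = gamma_prime / mean_sq_norm  # bandwidth scaled by the mean profile norm
    n = len(profiles)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * dist_sq)
    return kernel


# Toy association matrix: 3 miRNAs (rows) x 3 diseases (columns).
toy_profiles = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
K = gip_kernel(toy_profiles)
```

Identical profiles receive similarity 1, and the similarity decays as the profiles diverge. How IRFMDA integrates this kernel with the semantic and functional similarities is not specified in the abstract.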

Subject(s)

Algorithms; Computational Biology/methods; Computer Simulation; Genetic Association Studies; Genetic Predisposition to Disease; MicroRNAs/genetics; Area Under Curve; Humans; MicroRNAs/metabolism; Neoplasms/genetics; Risk Factors

5.

A context-free encoding scheme of protein sequences for predicting antigenicity of diverse influenza A viruses.

Zhou, Xinrui; Yin, Rui; Kwoh, Chee-Keong; Zheng, Jie.

BMC Genomics ; 19(Suppl 10): 936, 2018 Dec 31.

Article in English | MEDLINE | ID: mdl-30598102

ABSTRACT

BACKGROUND: The evolution of influenza A viruses leads to antigenic changes. Serological diagnosis of the antigenicity is usually labor-intensive, time-consuming and not suitable for early-stage detection. Computational prediction of the antigenic relationship between emerging and old strains of influenza viruses using viral sequences can facilitate large-scale antigenic characterization, especially for those viruses requiring high biosafety facilities, such as H5 and H7 influenza A viruses. However, most computational models require carefully designed subtype-specific features, thereby being restricted to only one subtype. METHODS: In this paper, we propose a Context-Free Encoding Scheme (CFreeEnS) for pairs of protein sequences, which encodes a protein sequence dataset into a numeric matrix and then feeds the matrix into a downstream machine learning model. CFreeEnS is not only free from subtype-specific selected features but also able to improve the accuracy of predicting the antigenicity of influenza. Since CFreeEnS is subtype-free, it is applicable to predicting the antigenicity of diverse influenza subtypes, hopefully saving the biologists from conducting serological assays for highly pathogenic strains. RESULTS: The accuracy of prediction on each subtype tested (A/H1N1, A/H3N2, A/H5N1, A/H9N2) is over 85%, and can be as high as 91.5%. This outperforms existing methods that use carefully designed subtype-specific features. Furthermore, we tested the CFreeEnS on the combined dataset of the four subtypes. The accuracy reaches 84.6%, much higher than the best performance 75.1% reported by other subtype-free models, i.e. regional band-based model and residue-based model, for predicting the antigenicity of influenza. Also, we investigate the performance of CFreeEnS when the model is trained and tested on different subtypes (i.e. transfer learning). 
The prediction accuracy using CFreeEnS is 84.3% when the model is trained on the A/H1N1 dataset and tested on the A/H5N1, better than the 75.2% using a regional band-based model. CONCLUSIONS: The CFreeEnS not only improves the prediction of antigenicity on datasets with only one subtype but also outperforms existing methods when tested on a combined dataset with four subtypes of influenza viruses.

Subject(s)

Antigens, Viral/immunology; Computational Biology/methods; Influenza A virus/immunology; Viral Proteins/chemistry; Influenza A Virus, H1N1 Subtype/immunology; Influenza A Virus, H3N2 Subtype/immunology; Influenza A Virus, H5N1 Subtype/immunology; Influenza A Virus, H9N2 Subtype/immunology; Viral Proteins/metabolism

6.

Computational analysis of the receptor binding specificity of novel influenza A/H7N9 viruses.

Zhou, Xinrui; Zheng, Jie; Ivan, Fransiskus Xaverius; Yin, Rui; Ranganathan, Shoba; Chow, Vincent T K; Kwoh, Chee-Keong.

BMC Genomics ; 19(Suppl 2): 88, 2018 May 09.

Article in English | MEDLINE | ID: mdl-29764421

ABSTRACT

BACKGROUND: Influenza viruses are undergoing continuous and rapid evolution. The fatal influenza A/H7N9 has drawn attention since the first wave of infections in March 2013, and raised more grave concerns with its increased potential to spread among humans. Experimental studies have revealed several host and virulence markers, indicating differential host binding preferences which can help estimate the potential of causing a pandemic. Here we systematically investigate the sequence pattern and structural characteristics of novel influenza A/H7N9 using computational approaches. RESULTS: The sequence analysis highlighted mutations in protein functional domains of influenza viruses. Molecular docking and molecular dynamics simulation revealed that the hemagglutinin (HA) of A/Taiwan/1/2017(H7N9) strain enhanced the binding with both avian and human receptor analogs, compared with the previous A/Shanghai/02/2013(H7N9) strain. The Molecular Mechanics - Poisson Boltzmann Surface Area (MM-PBSA) calculation revealed the change of residue-ligand interaction energy and detected the residues with conspicuous binding preference. CONCLUSION: The results are novel and specific to the emerging influenza A/Taiwan/1/2017(H7N9) strain compared with A/Shanghai/02/2013(H7N9). Its enhanced ability to bind human receptor analogs, which are abundant in the human upper respiratory tract, may be responsible for the recent outbreak. Residues showing binding preference were detected, which could facilitate monitoring the circulating influenza viruses.

Subject(s)

Computational Biology/methods; Hemagglutinin Glycoproteins, Influenza Virus/chemistry; Hemagglutinin Glycoproteins, Influenza Virus/genetics; Influenza A Virus, H7N9 Subtype/physiology; Mutation; Animals; Avian Proteins/metabolism; Birds; Hemagglutinin Glycoproteins, Influenza Virus/metabolism; Host Microbial Interactions; Humans; Influenza A Virus, H7N9 Subtype/classification; Influenza A Virus, H7N9 Subtype/genetics; Molecular Docking Simulation; Molecular Dynamics Simulation; Phylogeny; Protein Binding; Sequence Analysis, RNA/methods; Viral Proteins/chemistry; Viral Proteins/genetics

7.

Drug-target interaction prediction using ensemble learning and dimensionality reduction.

Ezzat, Ali; Wu, Min; Li, Xiao-Li; Kwoh, Chee-Keong.

Methods ; 129: 81-88, 2017 10 01.

Article in English | MEDLINE | ID: mdl-28549952

ABSTRACT

Experimental prediction of drug-target interactions is expensive, time-consuming and tedious. Fortunately, computational methods help narrow down the search space for interaction candidates to be further examined via wet-lab techniques. Nowadays, the number of attributes/features for drugs and targets, as well as the amount of their interactions, are increasing, making these computational methods inefficient or occasionally prohibitive. This motivates us to derive a reduced feature set for prediction. In addition, since ensemble learning techniques are widely used to improve the classification performance, it is also worthwhile to design an ensemble learning framework to enhance the performance for drug-target interaction prediction. In this paper, we propose a framework for drug-target interaction prediction leveraging both feature dimensionality reduction and ensemble learning. First, we conducted feature subspacing to inject diversity into the classifier ensemble. Second, we applied three different dimensionality reduction methods to the subspaced features. Third, we trained homogeneous base learners with the reduced features and then aggregated their scores to derive the final predictions. For base learners, we selected two classifiers, namely Decision Tree and Kernel Ridge Regression, resulting in two variants of ensemble models, EnsemDT and EnsemKRR, respectively. In our experiments, we utilized AUC (Area under ROC Curve) as an evaluation metric. We compared our proposed methods with various state-of-the-art methods under 5-fold cross validation. Experimental results showed EnsemKRR achieving the highest AUC (94.3%) for predicting drug-target interactions. In addition, dimensionality reduction helped improve the performance of EnsemDT. In conclusion, our proposed methods produced significant improvements for drug-target interaction prediction.
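Two steps described in the abstract, aggregating the scores of base learners and evaluating with AUC, can be sketched in a few lines. Averaging is assumed here as the aggregation rule, and the toy scores are invented for illustration; the abstract says only that scores were aggregated. The AUC is computed from its rank interpretation (the probability that a random positive outscores a random negative).

```python
def ensemble_scores(score_lists):
    """Average the per-sample scores of several base learners (assumed rule)."""
    n_learners = len(score_lists)
    return [sum(scores) / n_learners for scores in zip(*score_lists)]


def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data: four drug-target pairs scored by two hypothetical base learners.
labels = [1, 0, 1, 0]
learner_a = [0.9, 0.2, 0.4, 0.5]
learner_b = [0.7, 0.4, 0.8, 0.3]
auc_single = roc_auc(labels, learner_a)
auc_ensemble = roc_auc(labels, ensemble_scores([learner_a, learner_b]))
```

In this toy case the first learner alone misorders one pair, while the averaged scores rank every positive above every negative. The pairwise loop is quadratic and is meant for clarity, not for large datasets.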

Subject(s)

Algorithms; Computational Biology/methods; Drug Delivery Systems; Artificial Intelligence; Humans; ROC Curve

8.

The effect of genotype and in utero environment on interindividual variation in neonate DNA methylomes.

Teh, Ai Ling; Pan, Hong; Chen, Li; Ong, Mei-Lyn; Dogra, Shaillay; Wong, Johnny; MacIsaac, Julia L; Mah, Sarah M; McEwen, Lisa M; Saw, Seang-Mei; Godfrey, Keith M; Chong, Yap-Seng; Kwek, Kenneth; Kwoh, Chee-Keong; Soh, Shu-E; Chong, Mary F F; Barton, Sheila; Karnani, Neerja; Cheong, Clara Y; Buschdorf, Jan Paul; Stünkel, Walter; Kobor, Michael S; Meaney, Michael J; Gluckman, Peter D; Holbrook, Joanna D.

Genome Res ; 24(7): 1064-74, 2014 Jul.

Article in English | MEDLINE | ID: mdl-24709820

ABSTRACT

Integrating the genotype with epigenetic marks holds the promise of better understanding the biology that underlies the complex interactions of inherited and environmental components that define the developmental origins of a range of disorders. The quality of the in utero environment significantly influences health over the lifecourse. Epigenetics, and in particular DNA methylation marks, have been postulated as a mechanism for the enduring effects of the prenatal environment. Accordingly, neonate methylomes contain molecular memory of the individual in utero experience. However, interindividual variation in methylation can also be a consequence of DNA sequence polymorphisms that result in methylation quantitative trait loci (methQTLs) and, potentially, the interaction between fixed genetic variation and environmental influences. We surveyed the genotypes and DNA methylomes of 237 neonates and found 1423 punctuate regions of the methylome that were highly variable across individuals, termed variably methylated regions (VMRs), against a backdrop of homogeneity. MethQTLs were readily detected in neonatal methylomes, and genotype alone best explained ~25% of the VMRs. We found that the best explanation for 75% of VMRs was the interaction of genotype with different in utero environments, including maternal smoking, maternal depression, maternal BMI, infant birth weight, gestational age, and birth order. Our study sheds new light on the complex relationship between biological inheritance as represented by genotype and individual prenatal experience and suggests the importance of considering both fixed genetic variation and environmental factors in interpreting epigenetic variation.

Subject(s)

DNA Methylation; Environment; Epigenesis, Genetic; Gene-Environment Interaction; Genetic Heterogeneity; Genotype; Transcriptome; Computational Biology/methods; CpG Islands; Epigenomics/methods; Female; Humans; Infant, Newborn; Male; Polymorphism, Single Nucleotide; Pregnancy; Quantitative Trait Loci; Risk Factors

9.

Drug-target interaction prediction via class imbalance-aware ensemble learning.

Ezzat, Ali; Wu, Min; Li, Xiao-Li; Kwoh, Chee-Keong.

BMC Bioinformatics ; 17(Suppl 19): 509, 2016 Dec 22.

Article in English | MEDLINE | ID: mdl-28155697

ABSTRACT

BACKGROUND: Multiple computational methods for predicting drug-target interactions have been developed to facilitate the drug discovery process. These methods use available data on known drug-target interactions to train classifiers with the purpose of predicting new undiscovered interactions. However, a key challenge regarding this data that has not yet been addressed by these methods, namely class imbalance, is potentially degrading the prediction performance. Class imbalance can be divided into two sub-problems. Firstly, the number of known interacting drug-target pairs is much smaller than that of non-interacting drug-target pairs. This imbalance ratio between interacting and non-interacting drug-target pairs is referred to as the between-class imbalance. Between-class imbalance degrades prediction performance due to the bias in prediction results towards the majority class (i.e. the non-interacting pairs), leading to more prediction errors in the minority class (i.e. the interacting pairs). Secondly, there are multiple types of drug-target interactions in the data with some types having relatively fewer members (or are less represented) than others. This variation in representation of the different interaction types leads to another kind of imbalance referred to as the within-class imbalance. In within-class imbalance, prediction results are biased towards the better represented interaction types, leading to more prediction errors in the less represented interaction types. RESULTS: We propose an ensemble learning method that incorporates techniques to address the issues of between-class imbalance and within-class imbalance. Experiments show that the proposed method improves results over 4 state-of-the-art methods. In addition, we simulated cases for new drugs and targets to see how our method would perform in predicting their interactions. New drugs and targets are those for which no prior interactions are known. 
Our method displayed satisfactory prediction performance and was able to predict many of the interactions successfully. CONCLUSIONS: Our proposed method has improved the prediction performance over the existing work, thus proving the importance of addressing problems pertaining to class imbalance in the data.
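To make the between-class imbalance concrete, the sketch below shows one common countermeasure: giving every ensemble member all known interacting pairs plus an equally sized random sample of non-interacting pairs. This is a generic undersampling technique assumed here for illustration; the abstract does not say which techniques the authors' method uses, and `balanced_subsets` is a hypothetical helper.

```python
import random


def balanced_subsets(positives, negatives, n_learners, seed=0):
    """Build one balanced training set per ensemble member.

    Each set keeps every positive pair and draws the same number of
    negatives at random, so no member is dominated by the majority class.
    """
    rng = random.Random(seed)
    k = min(len(positives), len(negatives))
    subsets = []
    for _ in range(n_learners):
        sampled_negatives = rng.sample(negatives, k)
        subsets.append((list(positives), sampled_negatives))
    return subsets


# Toy data: 2 known interactions versus 6 unlabeled (assumed non-interacting) pairs.
known_pairs = [("d1", "t1"), ("d2", "t3")]
unknown_pairs = [("d1", "t2"), ("d1", "t3"), ("d2", "t1"),
                 ("d2", "t2"), ("d3", "t1"), ("d3", "t2")]
subsets = balanced_subsets(known_pairs, unknown_pairs, n_learners=3)
```

Because each member sees a different negative sample, the ensemble as a whole still covers much of the majority class. Within-class imbalance would need a separate treatment that this sketch does not attempt.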

Subject(s)

Algorithms; Computational Biology/methods; Drug Discovery/methods; Drug Interactions; Pharmaceutical Preparations/metabolism; Proteins/metabolism; Humans; Proteins/chemistry

10.

Fast, accurate, and reliable molecular docking with QuickVina 2.

Alhossary, Amr; Handoko, Stephanus Daniel; Mu, Yuguang; Kwoh, Chee-Keong.

Bioinformatics ; 31(13): 2214-6, 2015 Jul 01.

Article in English | MEDLINE | ID: mdl-25717194

ABSTRACT

MOTIVATION: The need for efficient molecular docking tools for high-throughput screening is growing alongside the rapid growth of drug-fragment databases. AutoDock Vina ('Vina') is a widely used docking tool with parallelization for speed. QuickVina ('QVina 1') then further enhanced the speed via a heuristic, requiring high exhaustiveness. With low exhaustiveness, its accuracy was compromised. We present in this article the latest version of QuickVina ('QVina 2') that inherits both the speed of QVina 1 and the reliability of the original Vina. RESULTS: We tested the efficacy of QVina 2 on the core set of PDBbind 2014. With the default exhaustiveness level of Vina (i.e. 8), a maximum of 20.49-fold and an average of 2.30-fold acceleration with a correlation coefficient of 0.967 for the first mode and 0.911 for the sum of all modes were attained over the original Vina. A tendency for higher acceleration with increased number of rotatable bonds as the design variables was observed. On the accuracy, Vina wins over QVina 2 on 30% of the data with average energy difference of only 0.58 kcal/mol. On the same dataset, GOLD produced RMSD smaller than 2 Å on 56.9% of the data while QVina 2 attained 63.1%. AVAILABILITY AND IMPLEMENTATION: The C++ source code of QVina 2 is available at (www.qvina.org). CONTACT: aalhossary@pmail.ntu.edu.sg SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
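The 2 Å RMSD criterion quoted in the results can be illustrated with a short sketch. It assumes that the docked and reference ligand poses list the same atoms in the same order and share a coordinate frame, and it ignores symmetry-equivalent atoms; the coordinates are invented, and this is not code from QVina 2.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) atoms."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("both poses must contain the same, non-zero number of atoms")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy poses: the docked pose is the crystal pose shifted by 1 Å along x.
crystal_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked_pose = [(x + 1.0, y, z) for x, y, z in crystal_pose]
deviation = rmsd(crystal_pose, docked_pose)
is_success = deviation < 2.0  # success threshold quoted in the abstract
```

A rigid 1 Å shift gives an RMSD of 1 Å, so this toy pose would count as a successful prediction under the 2 Å threshold.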

Subject(s)

Computational Biology/methods; Drug Design; Molecular Docking Simulation/methods; Proteins/chemistry; Software; Databases, Pharmaceutical; Humans; Ligands; Proteins/metabolism

11.

IFACEwat: the interfacial water-implemented re-ranking algorithm to improve the discrimination of near native structures for protein rigid docking.

Su, Chinh; Nguyen, Thuy-Diem; Zheng, Jie; Kwoh, Chee-Keong.

BMC Bioinformatics ; 15 Suppl 16: S9, 2014.

Article in English | MEDLINE | ID: mdl-25521441

ABSTRACT

BACKGROUND: Protein-protein docking is an in silico method to predict the formation of protein complexes. Due to limited computational resources, the protein-protein docking approach has been developed under the assumption of rigid docking, in which one of the two protein partners remains rigid during the protein associations and water contribution is ignored or implicitly presented. Despite obtaining a number of acceptable complex predictions, it seems to-date that most initial rigid docking algorithms still find it difficult or even fail to discriminate successfully the correct predictions from the other incorrect or false positive ones. To improve the rigid docking results, re-ranking is one of the effective methods that help re-locate the correct predictions in top high ranks, discriminating them from the other incorrect ones. RESULTS: Our results showed that the IFACEwat increased both the numbers of the near-native structures and improved their ranks as compared to the initial rigid docking ZDOCK3.0.2. In fact, the IFACEwat achieved a success rate of 83.8% for Antigen/Antibody complexes, which is 10% better than ZDOCK3.0.2. As compared to another re-ranking technique ZRANK, the IFACEwat obtains success rates of 92.3% (8% better) and 90% (5% better) respectively for medium and difficult cases. When comparing with the latest published re-ranking method F2Dock, the IFACEwat performed equivalently well or even better for several Antigen/Antibody complexes. CONCLUSIONS: With the inclusion of interfacial water, the IFACEwat improves mostly results of the initial rigid docking, especially for Antigen/Antibody complexes. The improvement is achieved by explicitly taking into account the contribution of water during the protein interactions, which was ignored or not fully presented by the initial rigid docking and other re-ranking techniques. 
In addition, the IFACEwat maintains sufficient computational efficiency of the initial docking algorithm, yet improves the ranks as well as the number of the near native structures found. As our implementation so far targeted to improve the results of ZDOCK3.0.2, and particularly for the Antigen/Antibody complexes, it is expected in the near future that more implementations will be conducted to be applicable for other initial rigid docking algorithms.

Subject(s)

Algorithms; Antigen-Antibody Complex/chemistry; Proteins/chemistry; Software; Water/chemistry; Antigen-Antibody Complex/metabolism; Computer Simulation; Humans; Hydrogen Bonding; Models, Molecular; Protein Binding; Protein Conformation

12.

Drug-target interaction prediction by learning from local information and neighbors.

Mei, Jian-Ping; Kwoh, Chee-Keong; Yang, Peng; Li, Xiao-Li; Zheng, Jie.

Bioinformatics ; 29(2): 238-45, 2013 Jan 15.

Article in English | MEDLINE | ID: mdl-23162055

ABSTRACT

MOTIVATION: In silico methods provide efficient ways to predict possible interactions between drugs and targets. A supervised learning approach, the bipartite local model (BLM), has recently been shown to be effective in the prediction of drug-target interactions. However, for drug-candidate compounds or target-candidate proteins that currently have no known interactions available, its purely 'local' model cannot be learned, and hence BLM may fail to make correct predictions for such new candidates. RESULTS: We present a simple procedure called neighbor-based interaction-profile inferring (NII) and integrate it into the existing BLM method to handle the new candidate problem. Specifically, the inferred interaction profile is treated as label information and is used for model learning of new candidates. This functionality is particularly important in practice to find targets for new drug-candidate compounds and identify targeting drugs for new target-candidate proteins. Consistently good performance of the new BLM-NII approach has been observed in the experiment for the prediction of interactions between drugs and four categories of target proteins. Especially for nuclear receptors, BLM-NII achieves the most significant improvement as this dataset contains many drugs/targets with no interactions in the cross-validation. This demonstrates the effectiveness of the NII strategy and also shows the great potential of BLM-NII for prediction of compound-protein interactions. CONTACT: jpmei@ntu.edu.sg SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
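The idea behind neighbor-based interaction-profile inferring can be sketched as a similarity-weighted sum of the neighbors' known profiles. The weighting and the rescaling to [0, 1] are assumptions made for this illustration; the abstract states only that a profile is inferred from neighbors and then used as label information.

```python
def infer_profile(similarities, known_profiles):
    """Infer an interaction profile for a new drug from its neighbors.

    similarities[k]   : similarity between the new drug and known drug k
    known_profiles[k] : 0/1 interaction profile of known drug k over all targets
    Returns one score per target, rescaled so the largest score is 1.
    """
    n_targets = len(known_profiles[0])
    raw = [
        sum(sim * profile[t] for sim, profile in zip(similarities, known_profiles))
        for t in range(n_targets)
    ]
    peak = max(raw)
    if peak == 0:
        return raw  # no neighbor evidence at all; leave the profile at zero
    return [value / peak for value in raw]


# Toy data: a new compound, two known drugs, three targets.
sims_to_known = [0.8, 0.2]
profiles = [[1, 0, 1],
            [0, 1, 1]]
inferred = infer_profile(sims_to_known, profiles)
```

The inferred scores could then stand in for the missing labels when a local model is trained for the new compound, which is the role the abstract assigns to the inferred profile.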

Subject(s)

Artificial Intelligence; Drug Discovery; Proteins/drug effects; Algorithms; Models, Theoretical; Pharmaceutical Preparations/chemistry; Proteins/chemistry; Receptors, Cytoplasmic and Nuclear/drug effects

13.

Molecular docking analysis of 2009-H1N1 and 2004-H5N1 influenza virus HLA-B*4405-restricted HA epitope candidates: implications for TCR cross-recognition and vaccine development.

Su, Chinh T T; Schönbach, Christian; Kwoh, Chee-Keong.

BMC Bioinformatics ; 14 Suppl 2: S21, 2013.

Article in English | MEDLINE | ID: mdl-23368875

ABSTRACT

BACKGROUND: The pandemic 2009-H1N1 influenza virus circulated in the human population and caused thousands of deaths worldwide. Studies on pandemic influenza vaccines have shown that T cell recognition of conserved epitopes and cross-reactive T cell responses are important when new strains emerge, especially in the absence of antibody cross-reactivity. In this work, using the HLA-B*4405 and DM1-TCR structure model, we systematically generated high-confidence conserved 2009-H1N1 T cell epitope candidates and investigated their potential cross-reactivity against H5N1 avian flu virus. RESULTS: Molecular docking analysis of differential DM1-TCR recognition of the 2009-H1N1 epitope candidates yielded a mosaic epitope (KEKMNTEFW) and potential H5N1 HA cross-reactive epitopes that could be applied as a multivalent peptide for influenza A vaccine development. Structural models of TCR cross-recognition between 2009-H1N1 and 2004-H5N1 revealed steric and topological effects of TCR contact residue mutations on TCR binding affinity. CONCLUSIONS: The results are novel with regard to HA epitopes and useful for developing possible vaccination strategies against the rapidly changing influenza viruses. Yet, the challenge of identifying epitope candidates that result in heterologous T cell immunity under natural influenza infection conditions can only be overcome if more structural data on the TCR repertoire become available.

Subject(s)

Epitopes, T-Lymphocyte/chemistry; HLA Antigens/chemistry; Influenza A Virus, H1N1 Subtype; Influenza A Virus, H5N1 Subtype; Molecular Docking Simulation; Cross Reactions; Protein Structure, Tertiary; Receptors, Antigen, T-Cell/chemistry

14.

Structural analysis of the novel influenza A (H7N9) viral Neuraminidase interactions with current approved neuraminidase inhibitors Oseltamivir, Zanamivir, and Peramivir in the presence of mutation R289K.

Tran-To Su, Chinh; Ouyang, Xuchang; Zheng, Jie; Kwoh, Chee-Keong.

BMC Bioinformatics ; 14 Suppl 16: S7, 2013.

Article in English | MEDLINE | ID: mdl-24564719

ABSTRACT

BACKGROUND: Since late March 2013, there has been another global health concern with a sudden wave of flu infections by a novel strain of avian influenza A (H7N9) virus in China. To date, there have been more than 100 infections with 23 deaths. It is more worrying as this viral strain had never previously been detected in humans and had only been found to be of low pathogenicity. Currently, there are 3 effective neuraminidase inhibitors for this H7N9 virus strain, i.e. oseltamivir, zanamivir, and peramivir. These drugs have been used for treatment of the H7N9 influenza in China. However, how these inhibitors work and affect the binding cavity of the novel H7N9 neuraminidase in the presence of potential mutations has not been disclosed. In our study, we investigate steric effects and subsequently show the conformational restraints of the inhibitor-binding site of the non-mutated and mutated H7N9 neuraminidase structures to different drug compounds. RESULTS: A combination of molecular docking and molecular dynamics simulation reveals that zanamivir forms a more favorable and stable complex than oseltamivir and peramivir when binding to the active site of the H7N9 neuraminidase. It is also likely that the novel influenza A (H7N9) virus has a higher probability of acquiring resistance to peramivir than to the other two inhibitors. Conformational changes induced by the mutation R289K cause a loss of hydrogen bonds between the inhibitors and the H7N9 viral neuraminidase in 2 out of 3 complexes. In addition, our results of binding-affinity relationships of the 3 inhibitors with the viral neuraminidase proteins of previous pandemics (H1N1, H5N1) and the current novel H7N9 reflected the extent of binding effectiveness of the 3 inhibitors to the novel H7N9 neuraminidase. CONCLUSIONS: The results are novel and specific for the A/Hangzhou/1/2013(H7N9) influenza strain. Furthermore, the protocol could be useful for further drug-binding analysis and for predicting future mutations through which the virus adapts and acquires resistance to the currently available drugs.

Subject(s)

Antiviral Agents/chemistry , Enzyme Inhibitors/chemistry , Influenza A Virus, H7N9 Subtype/enzymology , Neuraminidase/antagonists & inhibitors , Viral Proteins/antagonists & inhibitors , Acids, Carbocyclic , Antiviral Agents/pharmacology , Cyclopentanes/chemistry , Cyclopentanes/pharmacology , Drug Resistance, Viral , Enzyme Inhibitors/pharmacology , Guanidines/chemistry , Guanidines/pharmacology , Influenza A Virus, H7N9 Subtype/drug effects , Molecular Docking Simulation , Molecular Dynamics Simulation , Mutation , Neuraminidase/chemistry , Neuraminidase/genetics , Oseltamivir/chemistry , Oseltamivir/pharmacology , Viral Proteins/chemistry , Viral Proteins/genetics , Zanamivir/chemistry , Zanamivir/pharmacology

15.

Identifying protein complexes from heterogeneous biological data.

Wu, Min; Xie, Zhipeng; Li, Xiaoli; Kwoh, Chee-Keong; Zheng, Jie.

Proteins ; 81(11): 2023-33, 2013 Nov.

Article in English | MEDLINE | ID: mdl-23852772

ABSTRACT

With the increasing availability of diverse biological information for proteins, integration of heterogeneous data becomes more useful for many problems in proteomics, such as annotating protein functions and predicting novel protein-protein interactions. In this paper, we present an integrative approach called InteHC (Integrative Hierarchical Clustering) to identify protein complexes from multiple data sources. Although integrating multiple sources could effectively improve the coverage of the currently incomplete protein interactome (the false-negative issue), it could also introduce potential false-positive interactions that could hurt the performance of protein complex prediction. Our proposed InteHC method can effectively address these issues to facilitate accurate protein complex prediction, and it can be summarized in the following three steps. First, for each individual source/feature, InteHC computes a matrix of affinity scores for protein pairs, indicating their propensity to interact or to belong to the same complex. Second, InteHC computes a final score matrix, which is the weighted sum of affinity scores from individual sources. In particular, the weights indicating the reliability of individual sources are learned from a supervised model (i.e., a linear ranking SVM). Finally, a hierarchical clustering algorithm is performed on the final score matrix to generate clusters as predicted protein complexes. In our experiments, we compared the results obtained by our hierarchical clustering on each individual feature with those predicted by InteHC on the combined matrix. We observed that integration of heterogeneous data significantly benefits the identification of protein complexes. Moreover, a comprehensive comparison demonstrates that InteHC performs much better than 14 state-of-the-art approaches. All the experimental data and results can be downloaded from http://www.ntu.edu.sg/home/zhengjie/data/InteHC.
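
The three steps in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy affinity values, the hand-picked source weights (the abstract learns them with a linear ranking SVM), the average-linkage rule and the stopping threshold are all assumptions made for illustration.

```python
from itertools import combinations


def combine_scores(matrices, weights):
    """Step 2: weighted sum of per-source affinity scores, keyed by protein pair."""
    combined = {}
    for matrix, weight in zip(matrices, weights):
        for pair, score in matrix.items():
            key = frozenset(pair)
            combined[key] = combined.get(key, 0.0) + weight * score
    return combined


def average_linkage(cluster_a, cluster_b, scores):
    """Mean combined score over all cross-cluster pairs (missing pairs count as 0)."""
    total = sum(scores.get(frozenset((a, b)), 0.0)
                for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))


def hierarchical_clusters(proteins, scores, threshold):
    """Step 3: merge the two most similar clusters until no pair reaches the threshold."""
    clusters = [frozenset([p]) for p in proteins]
    while len(clusters) > 1:
        left, right = max(combinations(clusters, 2),
                          key=lambda pair: average_linkage(pair[0], pair[1], scores))
        if average_linkage(left, right, scores) < threshold:
            break
        clusters = [c for c in clusters if c not in (left, right)] + [left | right]
    return clusters


# Step 1 stand-in: two toy affinity sources (all values invented for illustration).
ppi = {("A", "B"): 1.0, ("B", "C"): 1.0}
coexpression = {("A", "B"): 0.8, ("A", "C"): 0.6, ("B", "C"): 0.9, ("C", "D"): 0.1}

# Hand-picked weights; a supervised model would normally supply these.
scores = combine_scores([ppi, coexpression], weights=[0.6, 0.4])
clusters = hierarchical_clusters(["A", "B", "C", "D"], scores, threshold=0.3)

# Clusters with at least two members are reported as predicted complexes.
complexes = [sorted(c) for c in clusters if len(c) > 1]
print(complexes)
```

In this toy run, proteins A, B and C end up in one predicted complex, while D stays a singleton because its combined affinity to the others falls below the threshold.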

Subject(s)

Proteins/chemistry , Algorithms , Databases, Protein , Protein Binding , Protein Interaction Mapping , Proteins/metabolism

16.

Positive-unlabeled learning for disease gene identification.

Yang, Peng; Li, Xiao-Li; Mei, Jian-Ping; Kwoh, Chee-Keong; Ng, See-Kiong.

Bioinformatics ; 28(20): 2640-7, 2012 Oct 15.

Article in English | MEDLINE | ID: mdl-22923290

ABSTRACT

BACKGROUND: Identifying disease genes from the human genome is an important but challenging task in biomedical research. Machine learning methods can be applied to discover new disease genes based on the known ones. Existing machine learning methods typically use the known disease genes as the positive training set P and the unknown genes as the negative training set N (a true non-disease gene set does not exist) to build classifiers that identify new disease genes among the unknown genes. However, classifiers of this kind are actually built from a noisy negative set N, as there can be unknown disease genes in N itself. As a result, the classifiers do not perform as well as they could. RESULT: Instead of treating the unknown genes as negative examples in N, we treat them as an unlabeled set U. We design a novel positive-unlabeled (PU) learning algorithm PUDI (PU learning for disease gene identification) to build a classifier using P and U. We first partition U into four sets, namely, reliable negative set RN, likely positive set LP, likely negative set LN and weak negative set WN. Weighted support vector machines are then used to build a multi-level classifier based on the four training sets and the positive training set P to identify disease genes. Our experimental results demonstrate that our proposed PUDI algorithm significantly outperformed the existing methods. CONCLUSION: The proposed PUDI algorithm is able to identify disease genes more accurately by treating the unknown data more appropriately as an unlabeled set U instead of a negative set N. Given that many machine learning problems in biomedical research involve positive and unlabeled data instead of negative data, it is possible that the machine learning methods for these problems can be further improved by adopting PU learning methods, as we have done here for disease gene identification.
AVAILABILITY AND IMPLEMENTATION: The executable program and data are available at http://www1.i2r.a-star.edu.sg/~xlli/PUDI/PUDI.html.
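
The partition-then-weight idea in the abstract above can be sketched as follows. The abstract does not say how U is partitioned or which weights are used, so everything here is an assumption for illustration only: the cosine similarity to the centroid of P, the three cut-offs, the ordering of LN and WN along that similarity scale, the label assigned to each set, the per-set weights and all function names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def partition_unlabeled(positives, unlabeled, cuts=(0.25, 0.5, 0.75)):
    """Split U into RN, LN, WN and LP by similarity to the positive centroid.

    The similarity measure and the cut-offs are illustrative assumptions.
    """
    center = centroid(positives)
    parts = {"RN": [], "LN": [], "WN": [], "LP": []}
    for gene, features in unlabeled.items():
        similarity = cosine(features, center)
        if similarity < cuts[0]:
            parts["RN"].append(gene)
        elif similarity < cuts[1]:
            parts["LN"].append(gene)
        elif similarity < cuts[2]:
            parts["WN"].append(gene)
        else:
            parts["LP"].append(gene)
    return parts


# Hypothetical (label, weight) per set: the more trusted the set, the larger the weight.
SET_LABEL_WEIGHT = {"P": (1, 1.0), "LP": (1, 0.5),
                    "WN": (-1, 0.3), "LN": (-1, 0.6), "RN": (-1, 1.0)}

# Toy feature vectors (invented values).
positives = [[1.0, 0.0], [0.9, 0.1]]
unlabeled = {"g1": [1.0, 0.1], "g2": [0.0, 1.0], "g3": [0.4, 1.0], "g4": [1.0, 1.2]}

parts = partition_unlabeled(positives, unlabeled)

# Flatten into (gene, label, weight) rows that a weighted classifier could consume.
training_rows = [(gene, *SET_LABEL_WEIGHT[name])
                 for name, genes in parts.items() for gene in genes]
print(parts)
```

The resulting labels and weights could then be passed to any classifier that accepts per-sample weights, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`.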

Subject(s)

Artificial Intelligence , Disease/genetics , Genes , Algorithms , Humans , Support Vector Machine

17.

Self-Supervised Contrastive Representation Learning for Semi-Supervised Time-Series Classification.

Eldele, Emadeldeen; Ragab, Mohamed; Chen, Zhenghua; Wu, Min; Kwoh, Chee-Keong; Li, Xiaoli; Guan, Cuntai.

IEEE Trans Pattern Anal Mach Intell ; 45(12): 15604-15618, 2023 Dec.

Article in English | MEDLINE | ID: mdl-37639415

ABSTRACT

Learning time-series representations when only unlabeled data or few labeled samples are available can be a challenging task. Recently, contrastive self-supervised learning has shown great improvement in extracting useful representations from unlabeled data via contrasting different augmented views of data. In this work, we propose a novel Time-Series representation learning framework via Temporal and Contextual Contrasting (TS-TCC) that learns representations from unlabeled data with contrastive learning. Specifically, we propose time-series-specific weak and strong augmentations and use their views to learn robust temporal relations in the proposed temporal contrasting module, besides learning discriminative representations by our proposed contextual contrasting module. Additionally, we conduct a systematic study of time-series data augmentation selection, which is a key part of contrastive learning. We also extend TS-TCC to the semi-supervised learning settings and propose a Class-Aware TS-TCC (CA-TCC) that benefits from the available few labeled data to further improve representations learned by TS-TCC. Specifically, we leverage the robust pseudo labels produced by TS-TCC to realize a class-aware contrastive loss. Extensive experiments show that the linear evaluation of the features learned by our proposed framework performs comparably with the fully supervised training. Additionally, our framework shows high efficiency in few labeled data and transfer learning scenarios.
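
To make the idea of contrasting two augmented views concrete, here is a minimal sketch. It is not the TS-TCC code: the specific augmentations (scaling plus jitter as the weak view, segment permutation plus jitter as the strong view), the NT-Xent form of the loss, the temperature and all function names are assumptions, and raw vectors stand in for the encoder outputs a real model would contrast.

```python
import math
import random


def weak_augment(series, rng, sigma=0.05):
    """Weak view (assumed): random global scaling plus small Gaussian jitter."""
    scale = rng.gauss(1.0, sigma)
    return [scale * x + rng.gauss(0.0, sigma) for x in series]


def strong_augment(series, rng, segments=4, sigma=0.1):
    """Strong view (assumed): shuffle contiguous segments, then add jitter."""
    size = max(1, math.ceil(len(series) / segments))
    chunks = [series[i:i + size] for i in range(0, len(series), size)]
    rng.shuffle(chunks)
    return [x + rng.gauss(0.0, sigma) for chunk in chunks for x in chunk]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nt_xent(view_a, view_b, temperature=0.2):
    """Mean NT-Xent loss; embedding i in one view is the positive of i in the other."""
    z = view_a + view_b
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        positive = cosine(z[i], z[(i + n) % (2 * n)]) / temperature
        others = [cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        total += math.log(sum(math.exp(s) for s in others)) - positive
    return total / (2 * n)


rng = random.Random(0)
series = [float(t) for t in range(10)]
weak_view = weak_augment(series, rng)
strong_view = strong_augment(series, rng)

# Toy "embeddings": matching pairs should give a lower loss than mismatched pairs.
embeddings = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = nt_xent(embeddings, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = nt_xent(embeddings, [[0.0, 1.0], [1.0, 0.0]])
print(aligned_loss < swapped_loss)
```

The loss is small when each sample's two views are closer to each other than to the views of other samples, which is the property a contrastive objective rewards.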

18.

Self-supervised Learning for Label-Efficient Sleep Stage Classification: A Comprehensive Evaluation.

Eldele, Emadeldeen; Ragab, Mohamed; Chen, Zhenghua; Wu, Min; Kwoh, Chee-Keong; Li, Xiaoli.

IEEE Trans Neural Syst Rehabil Eng ; PP, 2023 Feb 14.

Article in English | MEDLINE | ID: mdl-37022869

ABSTRACT

The past few years have witnessed a remarkable advance in deep learning for EEG-based sleep stage classification (SSC). However, the success of these models is attributed to possessing a massive amount of labeled data for training, limiting their applicability in real-world scenarios. In such scenarios, sleep labs can generate a massive amount of data, but labeling can be expensive and time-consuming. Recently, the self-supervised learning (SSL) paradigm has emerged as one of the most successful techniques to overcome label scarcity. In this paper, we evaluate the efficacy of SSL to boost the performance of existing SSC models in the few-labels regime. We conduct a thorough study on three SSC datasets, and we find that fine-tuning the pretrained SSC models with only 5% of labeled data can achieve performance competitive with supervised training on full labels. Moreover, self-supervised pretraining helps SSC models to be more robust to data imbalance and domain shift problems.

19.

Structural analysis of the hot spots in the binding between H1N1 HA and the 2D1 antibody: do mutations of H1N1 from 1918 to 2009 affect much on this binding?

Liu, Qian; Hoi, Steven C H; Su, Chinh T T; Li, Zhenhua; Kwoh, Chee-Keong; Wong, Limsoon; Li, Jinyan.

Bioinformatics ; 27(18): 2529-36, 2011 Sep 15.

Article in English | MEDLINE | ID: mdl-21784793

ABSTRACT

MOTIVATION: Worldwide and substantial mortality caused by the 2009 H1N1 influenza A has stimulated a new surge of research on H1N1 viruses. An epitope conservation has been learned in the HA1 protein that allows antibodies to cross-neutralize both 1918 and 2009 H1N1. However, few works have thoroughly studied the binding hot spots in those two antigen-antibody interfaces which are responsible for the antibody cross-neutralization. RESULTS: We apply predictive methods to identify binding hot spots at the epitope sites of the HA1 proteins and at the paratope sites of the 2D1 antibody. We find that the six mutations at the HA1's epitope from 1918 to 2009 should not harm its binding to 2D1. Instead, the change of binding free energy on the whole exhibits an increased tendency after these mutations, making the binding stronger. This is consistent with the observation that the 1918 H1N1 neutralizing antibody can cross-react with 2009 H1N1. We identified three distinguished hot spot residues, including Lys(166), common between the two epitopes. These common hot spots again can explain why 2D1 cross-reacted. We believe that these hot spot residues are mutation candidates which may help H1N1 viruses to evade the immune system. We also identified eight residues at the paratope site of 2D1, five from its heavy chain and three from its light chain, that are predicted to be energetically important in the HA1 recognition. The identification of these hot spot residues and their structural analysis are potentially useful to fight against H1N1 viruses. CONTACT: jinyan.li@uts.edu.au AVAILABILITY: Z-score is available at http://155.69.2.25/liuqian/indexz.py SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Binding Sites, Antibody/genetics , Epitopes/genetics , Influenza A Virus, H1N1 Subtype/genetics , Influenza, Human/genetics , Antibodies/genetics , Antibodies/immunology , Antibodies, Neutralizing/immunology , Binding Sites, Antibody/immunology , Cross Reactions , Epitopes/immunology , Humans , Influenza A Virus, H1N1 Subtype/immunology , Influenza, Human/immunology , Mutation , Protein Binding/genetics , Protein Binding/immunology , Proteins/genetics , Proteins/immunology , Sequence Alignment

20.

Epigenetic functions enriched in transcription factors binding to mouse recombination hotspots.

Wu, Min; Kwoh, Chee-Keong; Przytycka, Teresa M; Li, Jing; Zheng, Jie.

Proteome Sci ; 10 Suppl 1: S11, 2012 Jun 21.

Article in English | MEDLINE | ID: mdl-22759569

ABSTRACT

The regulatory mechanism of recombination is a fundamental problem in genomics, with wide applications in genome-wide association studies, birth-defect diseases, molecular evolution, cancer research, etc. In mammalian genomes, recombination events cluster into short genomic regions called "recombination hotspots". Recently, a 13-mer motif enriched in hotspots was identified as a candidate cis-regulatory element of human recombination hotspots; moreover, a zinc finger protein, PRDM9, binds to this motif and is associated with variation of the recombination phenotype in human and mouse genomes, and is thus a trans-acting regulator of recombination hotspots. However, this pair of cis- and trans-regulators covers only a fraction of hotspots, so other regulators of recombination hotspots remain to be discovered. In this paper, we propose an approach to predicting additional trans-regulators from DNA-binding proteins by comparing the enrichment of their binding sites in hotspots. Applying this approach to newly mapped mouse hotspots genome-wide, we confirmed that PRDM9 is a major trans-regulator of hotspots. In addition, a list of top candidate trans-regulators of mouse hotspots is reported. Using GO analysis, we observed that the top genes are enriched for histone modification functions, highlighting the epigenetic regulatory mechanisms of recombination hotspots.
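
One way to compare the enrichment of binding sites in hotspots is a one-sided hypergeometric test per DNA-binding protein, sketched below. The abstract does not state which statistic the authors used, so the choice of test, the function name, the placeholder factor names and the toy counts are all assumptions for illustration.

```python
from math import comb


def enrichment_p_value(hot_with_site, n_hot, all_with_site, n_all):
    """One-sided hypergeometric p-value.

    Probability that at least `hot_with_site` of the `n_hot` hotspot regions
    carry a binding site, if hotspots were drawn at random from `n_all`
    regions of which `all_with_site` carry the site.
    """
    upper = min(n_hot, all_with_site)
    favourable = sum(comb(all_with_site, k) * comb(n_all - all_with_site, n_hot - k)
                     for k in range(hot_with_site, upper + 1))
    return favourable / comb(n_all, n_hot)


# Toy counts (invented): 20 regions in total, 10 of them hotspots.
N_ALL, N_HOT = 20, 10

# Per factor: (hotspots containing a site, all regions containing a site).
counts = {"TF_A": (7, 8), "TF_B": (4, 8)}

p_values = {tf: enrichment_p_value(hot, N_HOT, total, N_ALL)
            for tf, (hot, total) in counts.items()}

# Smaller p-value means stronger enrichment in hotspots.
ranking = sorted(p_values, key=p_values.get)
print(ranking)
```

In a genome-wide analysis the p-values would also need correction for multiple testing before candidate regulators are reported.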
