Results 1 - 20 of 83
1.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35136933

ABSTRACT

Advances in single-cell RNA sequencing (scRNA-seq) technologies enable the characterization of transcriptomic profiles at the cellular level and show great promise for bulk sample analysis, thereby offering opportunities to transfer gene signatures from scRNA-seq to bulk data. However, the gene expression signatures identified from single cells are typically inapplicable to bulk RNA-seq data due to the profiling differences between distinct sequencing technologies. Here, we propose single-cell pair-wise gene expression (scPAGE), a novel method to develop single-cell gene pair signatures (scGPSs) that benefit bulk RNA-seq classification and transfer knowledge across platforms. PAGE was adopted to tackle the challenge of profiling differences. We applied the method to acute myeloid leukemia (AML) and identified an scGPS from mouse scRNA-seq that discriminated between AML and control cells. The scGPS was validated in bulk RNA-seq datasets and demonstrated better performance (average area under the curve [AUC] = 0.96) than the conventional gene expression strategies (average AUC ≤ 0.88), suggesting its potential in disclosing the molecular mechanism of AML. The scGPS also outperformed its bulk counterpart, which highlighted the benefit of gene signature transfer. Furthermore, we confirmed the utility of scPAGE in sepsis as an example of other disease scenarios. scPAGE leveraged the advantages of single-cell profiles to enhance the analysis of bulk samples, revealing the great potential of transferring knowledge from single-cell to bulk transcriptome studies.


Subject(s)
Acute Myeloid Leukemia , Single-Cell Analysis , Animals , Gene Expression Profiling/methods , Acute Myeloid Leukemia/genetics , Mice , RNA-Seq , RNA Sequence Analysis/methods , Single-Cell Analysis/methods , Transcriptome
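
Illustrative note on record 1: the abstract attributes the cross-platform transfer to pair-wise gene expression, that is, to the relative ordering of two genes within a sample rather than their absolute values. The following is a minimal sketch of that idea only, not the authors' scPAGE implementation. The function name `pair_features`, the gene names and the toy profiles are all hypothetical.

```python
from itertools import combinations


def pair_features(expression, genes):
    """Map each gene pair (a, b) to 1 if a is expressed above b, else 0."""
    return {
        (a, b): int(expression[a] > expression[b])
        for a, b in combinations(genes, 2)
    }


# Hypothetical toy profiles: the same sample seen on two platforms whose
# scales differ by an increasing transformation (here 100 * x + 7).
single_cell = {"G1": 5.0, "G2": 1.0, "G3": 3.0}
bulk = {gene: 100.0 * value + 7.0 for gene, value in single_cell.items()}

genes = ["G1", "G2", "G3"]
sc_features = pair_features(single_cell, genes)
bulk_features = pair_features(bulk, genes)

# The within-sample ordering survives the change of scale.
print(sc_features == bulk_features)  # True
```

Because only within-sample comparisons are used, any strictly increasing rescaling of a profile leaves the features unchanged, which is one plausible reading of why such signatures tolerate profiling differences.
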
2.
Bioinformatics ; 39(5)2023 05 04.
Article in English | MEDLINE | ID: mdl-37084255

ABSTRACT

MOTIVATION: Human gut microbiota plays a vital role in maintaining body health. The dysbiosis of gut microbiota is associated with a variety of diseases. It is critical to uncover the associations between gut microbiota and disease states as well as other intrinsic or environmental factors. However, inferring alterations of individual microbial taxa based on relative abundance data likely leads to false associations and conflicting discoveries in different studies. Moreover, the effects of underlying factors and microbe-microbe interactions could lead to the alteration of larger sets of taxa. It might be more robust to investigate gut microbiota using groups of related taxa instead of the composition of individual taxa. RESULTS: We proposed a novel method to identify underlying microbial modules, i.e. groups of taxa with similar abundance patterns affected by a common latent factor, from longitudinal gut microbiota and applied it to inflammatory bowel disease (IBD). The identified modules demonstrated closer intragroup relationships, indicating potential microbe-microbe interactions and influences of underlying factors. Associations between the modules and several clinical factors were investigated, especially disease states. The IBD-associated modules performed better in stratifying the subjects compared with the relative abundance of individual taxa. The modules were further validated in external cohorts, demonstrating the efficacy of the proposed method in identifying general and robust microbial modules. The study reveals the benefit of considering the ecological effects in gut microbiota analysis and the great promise of linking clinical factors with underlying microbial modules. AVAILABILITY AND IMPLEMENTATION: https://github.com/rwang-z/microbial_module.git.


Subject(s)
Gastrointestinal Microbiome , Inflammatory Bowel Diseases , Humans , Microbial Interactions
3.
Bioinformatics ; 39(3)2023 03 01.
Article in English | MEDLINE | ID: mdl-36857587

ABSTRACT

MOTIVATION: Confusing acute inflammation caused by viral or bacterial infection with noninfectious inflammation can lead to missing the best window for therapy, resulting in poor prognoses. Diagnostic models based on host gene expression have been widely used to diagnose acute infections, but their clinical use has been hindered by limited generalizability across samples and cohorts, owing to the small sample sizes used for signature training and discovery. RESULTS: Here, we construct a large-scale dataset integrating multiple host transcriptomic data and analyze it using a sophisticated strategy which removes batch effect and extracts the common information from different cohorts based on the relative expression alteration of gene pairs. We assemble 2680 samples across 16 cohorts and separately build a gene pair signature (GPS) for bacterial, viral, and noninfected patients. The three GPSs are further assembled into an antibiotic decision model (bacterial-viral-noninfected GPS, bvnGPS) using multiclass neural networks, which is able to determine whether a patient has a bacterial infection, a viral infection, or no infection. bvnGPS can distinguish bacterial infection with area under the receiver operating characteristic curve (AUC) of 0.953 (95% confidence interval, 0.948-0.958) and viral infection with AUC of 0.956 (0.951-0.961) in the test set (N = 760). In the validation set (N = 147), bvnGPS also shows strong performance by attaining an AUC of 0.988 (0.978-0.998) on bacterial-versus-other and an AUC of 0.994 (0.984-1.000) on viral-versus-other. bvnGPS has the potential to be used in clinical practice and the proposed procedure provides insight into data integration, feature selection and multiclass classification for host transcriptomics data. AVAILABILITY AND IMPLEMENTATION: The code implementing bvnGPS is available at https://github.com/Ritchiegit/bvnGPS. The iPAGE algorithm was constructed and the neural network was trained in Python 3.7 with Scikit-learn 0.24.1 and PyTorch 1.7. The results were visualized with R 4.2, Python 3.7, and Matplotlib 3.3.4.


Subject(s)
Transcriptome , Virus Diseases , Humans , Computer Neural Networks , Bacteria , Virus Diseases/diagnosis , Virus Diseases/genetics , Inflammation
4.
Brief Bioinform ; 22(1): 581-588, 2021 01 18.
Article in English | MEDLINE | ID: mdl-32003790

ABSTRACT

Moonlighting proteins provide more options for cells to execute multiple functions without increasing the genome and transcriptome complexity. Although there have long been calls for computational methods for the prediction of moonlighting proteins, no method has been designed for determining moonlighting long noncoding ribonucleic acids (RNAs) (mlncRNAs). Previously, we developed an algorithm, MoonFinder, for the identification of mlncRNAs at the genome level based on the functional annotation and interactome data of lncRNAs and proteins. Here, we update MoonFinder to MoonFinder v2.0 by providing an extensive framework for the detection of protein modules and the establishment of RNA-module associations in humans. A novel measure, the moonlighting coefficient, was also proposed to assess the confidence of an ncRNA acting in a moonlighting manner. Moreover, we explored the expression characteristics of mlncRNAs in sepsis, in which we found that mlncRNAs tend to be upregulated and differentially expressed. Interestingly, the mlncRNAs are mutually exclusive in terms of coexpression when compared to the other lncRNAs. Overall, MoonFinder v2.0 is dedicated to the prediction of human mlncRNAs and thus bears great promise to serve as a valuable R package for worldwide research communities (https://cran.r-project.org/web/packages/MoonFinder/index.html). Also, our analyses provide the first attempt to characterize mlncRNA expression and coexpression properties in adult sepsis patients, which will facilitate the understanding of the interaction and expression patterns of mlncRNAs.


Subject(s)
Gene Regulatory Networks , Genomics/methods , Long Noncoding RNA/genetics , Sepsis/genetics , Humans , Protein Interaction Maps , Proteome/genetics , Proteome/metabolism , Long Noncoding RNA/metabolism , Sepsis/metabolism , Software
5.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34169324

ABSTRACT

The superior performance of machine-learning scoring functions for docking has caused a series of debates on whether it is due to learning knowledge from training data that are similar in some sense to the test data. With a systematically revised methodology and a blind benchmark realistically mimicking the process of prospective prediction of binding affinity, we have evaluated three broadly used classical scoring functions and five machine-learning counterparts calibrated with both random forest and extreme gradient boosting using both solo and hybrid features, showing for the first time that machine-learning scoring functions trained exclusively on a proportion of as low as 8% complexes dissimilar to the test set already outperform classical scoring functions, a percentage that is far lower than what has been recently reported on all the three CASF benchmarks. The performance of machine-learning scoring functions is underestimated due to the absence of similar samples in some artificially created training sets that discard the full spectrum of complexes to be found in a prospective environment. Given the inevitability of any degree of similarity contained in a large dataset, the criteria for scoring function selection depend on which one can make the best use of all available materials. Software code and data are provided at https://github.com/cusdulab/MLSF for interested readers to rapidly rebuild the scoring functions and reproduce our results, even to make extended analyses on their own benchmarks.


Subject(s)
Benchmarking/methods , Machine Learning , Statistical Models , Algorithms , Benchmarking/standards , Factual Databases , Ligands , Molecular Models , Molecular Conformation , Protein Binding , Regression Analysis , Reproducibility of Results , Workflow
6.
Bioinformatics ; 38(14): 3513-3522, 2022 07 11.
Article in English | MEDLINE | ID: mdl-35674358

ABSTRACT

MOTIVATION: Hepatocellular carcinoma (HCC) is a primary malignancy with a poor prognosis. Recently, multi-omics molecular-level measurement enables HCC diagnosis and prognosis prediction, which is crucial for early intervention of personalized therapy to diminish mortality. Here, we introduce a novel strategy utilizing DNA methylation and RNA expression data to achieve a multi-omics gene pair signature (GPS) for HCC discrimination. RESULTS: The immune genes with negative correlations between expression and promoter methylation are enriched in the highly connected cancer-related pathway network, which are considered as the candidates for HCC detection. After that, we separately construct a methylation GPS (mGPS) and an expression GPS (eGPS), and then assemble them as a meGPS with five gene pairs, in which the significant methylation and expression changes occur between HCC tumor and non-tumor groups. Reliable performance has been validated by independent tissue (age, gender and etiology) and blood datasets. This study proposes a procedure for multi-omics GPS identification and develops a novel HCC signature using both methylome and transcriptome data, suggesting potential molecular targets for the detection and therapy of HCC. AVAILABILITY AND IMPLEMENTATION: Models are available at https://github.com/bioinformaticStudy/meGPS.git. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Hepatocellular Carcinoma , Liver Neoplasms , Humans , Hepatocellular Carcinoma/diagnosis , Hepatocellular Carcinoma/genetics , Hepatocellular Carcinoma/metabolism , Transcriptome , Liver Neoplasms/diagnosis , Liver Neoplasms/genetics , Liver Neoplasms/metabolism , Epigenome , DNA Methylation , Tumor Biomarkers/genetics , Neoplastic Gene Expression Regulation
7.
Plant J ; 105(3): 708-720, 2021 02.
Article in English | MEDLINE | ID: mdl-33128829

ABSTRACT

Autophagy is a self-degradative process that is crucial for maintaining cellular homeostasis by removing damaged cytoplasmic components and recycling nutrients. Such an evolutionary conserved proteolysis process is regulated by the autophagy-related (Atg) proteins. The incomplete understanding of plant autophagy proteome and the importance of a proteome-wide understanding of the autophagy pathway prompted us to predict Atg proteins and regulators in Arabidopsis. Here, we developed a systems-level algorithm to identify autophagy-related modules (ARMs) based on protein subcellular localization, protein-protein interactions, and known Atg proteins. This generates a detailed landscape of the autophagic modules in Arabidopsis. We found that the newly identified genes in each ARM tend to be upregulated and coexpressed during the senescence stage of Arabidopsis. We also demonstrated that the Golgi apparatus ARM, ARM13, functions in the autophagy process by module clustering and functional analysis. To verify the in silico analysis, the Atg candidates in ARM13 that are functionally similar to the core Atg proteins were selected for experimental validation. Interestingly, two of the previously uncharacterized proteins identified from the ARM analysis, AGD1 and Sec14, exhibited bona fide association with the autophagy protein complex in plant cells, which provides evidence for a cross-talk between intracellular pathways and autophagy. Thus, the computational framework has facilitated the identification and characterization of plant-specific autophagy-related proteins and novel autophagy proteins/regulators in higher eukaryotes.


Subject(s)
Arabidopsis/metabolism , Autophagy-Related Proteins/metabolism , Autophagy/physiology , Algorithms , Arabidopsis/cytology , Arabidopsis/genetics , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Autophagy-Related Protein 8 Family/genetics , Autophagy-Related Protein 8 Family/metabolism , Autophagy-Related Proteins/genetics , Beclin-1/genetics , Beclin-1/metabolism , Computational Biology/methods , Plant Gene Expression Regulation , Reproducibility of Results
8.
BMC Genomics ; 22(1): 275, 2021 Apr 16.
Article in English | MEDLINE | ID: mdl-33863291

ABSTRACT

BACKGROUND: Sepsis is the major cause of death in intensive care units (ICUs) globally. Molecular detection enables rapid diagnosis that allows early intervention to minimize the death rate. Recent studies showed that long non-coding RNAs (lncRNAs) regulate proinflammatory genes and are related to the dysfunction of organs in sepsis. Identifying an lncRNA signature from absolute abundance is challenging because of technical variation and systematic experimental bias. RESULTS: Cohorts (n = 768) containing whole blood lncRNA profiling of sepsis patients in the Gene Expression Omnibus (GEO) database were included. We proposed a novel diagnostic strategy that made use of the relative expressions of lncRNA pairs, which are reversed between sepsis patients and normal controls (e.g. lncRNAi > lncRNAj in sepsis patients and lncRNAi < lncRNAj in normal controls), to identify 14 lncRNA pairs as a sepsis diagnostic signature (SepSigLnc). The signature was then applied to independent cohorts (n = 644) to evaluate its predictive performance across different ages and normalization methods. Compared with common machine learning models and existing signatures, SepSigLnc consistently attains better performance on the validation cohorts from the same age group (AUC = 0.990 and 0.995 in two cohorts) and across different groups (AUC = 0.878 on average), as well as on cohorts processed by an alternative normalization method (AUC = 0.953 on average). Functional analysis demonstrates that the lncRNA pairs in SepSigLnc are functionally similar and tend to be implicated in the same biological processes, including cell fate commitment and cellular response to steroid hormone stimulus. CONCLUSION: Our study identified 14 lncRNA pairs as a signature that can facilitate the diagnosis of septic patients at an intervenable point when clinical manifestations are not dramatic. Also, the computational procedure can be generalized to a standard procedure for discovering diagnostic molecular signatures.


Subject(s)
Brain Neoplasms/mortality , Long Noncoding RNA/genetics , Sepsis/diagnosis , Cohort Studies , Humans , Sepsis/genetics
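
Illustrative note on record 8: the abstract describes selecting lncRNA pairs whose relative expression is reversed between sepsis patients and controls. The sketch below shows one way such a selection and a simple per-sample score could look. It is not the published procedure: the `threshold` parameter, the function names, the scoring rule (fraction of pairs in the case-like order) and the toy data are all assumptions made for illustration.

```python
def reversed_pairs(cases, controls, pairs, threshold=0.8):
    """Keep pairs (i, j) with i > j in most cases and i < j in most controls."""

    def fraction_greater(samples, first, second):
        return sum(s[first] > s[second] for s in samples) / len(samples)

    selected = []
    for i, j in pairs:
        if (fraction_greater(cases, i, j) >= threshold
                and fraction_greater(controls, j, i) >= threshold):
            selected.append((i, j))
    return selected


def signature_score(sample, signature):
    """Fraction of signature pairs that show the case-like ordering."""
    return sum(sample[i] > sample[j] for i, j in signature) / len(signature)


# Hypothetical toy expression values for three transcripts A, B and C.
cases = [{"A": 9, "B": 2, "C": 5}, {"A": 8, "B": 1, "C": 6}]
controls = [{"A": 1, "B": 7, "C": 5}, {"A": 2, "B": 9, "C": 6}]
candidate_pairs = [("A", "B"), ("C", "A"), ("C", "B")]

signature = reversed_pairs(cases, controls, candidate_pairs)
print(signature)  # [('A', 'B'), ('C', 'B')]
print(signature_score({"A": 7, "B": 3, "C": 4}, signature))  # 1.0
print(signature_score({"A": 1, "B": 5, "C": 6}, signature))  # 0.5
```

The pair ("C", "A") is dropped because C is not above A in the toy cases, so it does not reverse between the two groups.
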
9.
Mol Med ; 27(1): 15, 2021 02 12.
Article in English | MEDLINE | ID: mdl-33579185

ABSTRACT

BACKGROUND: Cyclin-dependent kinases 2/4/6 (CDK2/4/6) play critical roles in cell cycle progression, and their deregulation is a hallmark of hepatocellular carcinoma (HCC). METHODS: We used a combination of computational and experimental approaches to discover a CDK2/4/6 triple inhibitor among FDA-approved small-molecule drugs for the treatment of HCC. RESULTS: We identified vanoxerine dihydrochloride as a new CDK2/4/6 inhibitor and a strong cytotoxic drug in human HCC QGY7703 and Huh7 cells (IC50: 3.79 µM for QGY7703 and 4.04 µM for Huh7 cells). In QGY7703 and Huh7 cells, vanoxerine dihydrochloride treatment caused G1 arrest, induced apoptosis, and reduced the expression of CDK2/4/6, cyclin D/E, and retinoblastoma protein (Rb), as well as the phosphorylation of CDK2/4/6 and Rb. A drug combination study indicated that vanoxerine dihydrochloride and 5-Fu produced synergistic cytotoxicity in vitro in Huh7 cells. Finally, in an in vivo study in BALB/C nude mice subcutaneously xenografted with Huh7 cells, vanoxerine dihydrochloride (40 mg/kg, i.p.) injection for 21 days produced significant anti-tumor activity (p < 0.05), which was comparable to that achieved by 5-Fu (10 mg/kg, i.p.), with the combination treatment resulting in a synergistic effect. Immunohistochemistry staining of the tumor tissues also revealed significantly reduced expression of Rb and CDK2/4/6 in the vanoxerine dihydrochloride treatment group. CONCLUSIONS: The present study is the first report identifying vanoxerine dihydrochloride as a new CDK2/4/6 triple inhibitor, and it demonstrates that this drug represents a novel therapeutic strategy for HCC treatment.


Subject(s)
Hepatocellular Carcinoma/drug therapy , Cyclin-Dependent Kinase 2/antagonists & inhibitors , Cyclin-Dependent Kinase 4/antagonists & inhibitors , Cyclin-Dependent Kinase 6/antagonists & inhibitors , Fluorouracil/administration & dosage , Liver Neoplasms/drug therapy , Piperazines/administration & dosage , Animals , Hepatocellular Carcinoma/metabolism , Tumor Cell Line , Cell Proliferation/drug effects , Cell Survival/drug effects , Cyclin-Dependent Kinase 2/metabolism , Cyclin-Dependent Kinase 4/metabolism , Cyclin-Dependent Kinase 6/metabolism , Down-Regulation , Drug Synergism , Female , Fluorouracil/pharmacology , Neoplastic Gene Expression Regulation/drug effects , Humans , Subcutaneous Injections , Liver Neoplasms/metabolism , Mice , Inbred BALB C Mice , Nude Mice , Phosphorylation/drug effects , Piperazines/pharmacology , Xenograft Model Antitumor Assays
10.
Evol Comput ; 29(2): 239-268, 2021 Jun 01.
Article in English | MEDLINE | ID: mdl-33047611

ABSTRACT

Genetic Programming is a method to automatically create computer programs based on the principles of evolution. The problem of deceptiveness caused by complex dependencies among components of programs is challenging. It is important because it can misguide Genetic Programming into creating suboptimal programs. Besides, a minor modification to a program may lead to a notable change in its behaviour and affect the final outputs. This article presents Grammar-Based Genetic Programming with Bayesian Classifiers (GBGPBC), in which the probabilistic dependencies among components of programs are captured using a set of Bayesian network classifiers. Our system was evaluated using a set of benchmark problems (the deceptive maximum problems, the royal tree problems, and the bipolar asymmetric royal tree problems). It was shown to be often more robust and more efficient in searching for the best programs than other related Genetic Programming approaches in terms of the total number of fitness evaluations. We studied what factors affect the performance of GBGPBC and discovered that robust variants of GBGPBC were consistently weakly correlated with some complexity measures. Furthermore, our approach has been applied to learn a ranking program on a set of customers in direct marketing. Our suggested solutions help companies to earn significantly more when compared with other solutions produced by several well-known machine learning algorithms, such as neural networks, logistic regression, and Bayesian networks.


Subject(s)
Algorithms , Computer Neural Networks , Bayes Theorem , Machine Learning , Software
11.
Bioinformatics ; 35(20): 3989-3995, 2019 10 15.
Article in English | MEDLINE | ID: mdl-30873528

ABSTRACT

MOTIVATION: Studies have shown that the accuracy of random forest (RF)-based scoring functions (SFs), such as RF-Score-v3, increases with more training samples, whereas that of classical SFs, such as X-Score, does not. Nevertheless, the impact of the similarity between training and test samples on this matter has not been studied in a systematic manner. It is therefore unclear how these SFs would perform when only trained on protein-ligand complexes that are highly dissimilar or highly similar to the test set. It is also unclear whether SFs based on machine learning algorithms other than RF can also improve accuracy with increasing training set size and to what extent they learn from dissimilar or similar training complexes. RESULTS: We present a systematic study to investigate how the accuracy of classical and machine-learning SFs varies with protein-ligand complex similarities between training and test sets. We considered three types of similarity metrics, based on the comparison of either protein structures, protein sequences or ligand structures. Regardless of the similarity metric, we found that incorporating a larger proportion of similar complexes to the training set did not make classical SFs more accurate. In contrast, RF-Score-v3 was able to outperform X-Score even when trained on just 32% of the most dissimilar complexes, showing that its superior performance owes considerably to learning from dissimilar training complexes to those in the test set. In addition, we generated the first SF employing Extreme Gradient Boosting (XGBoost), XGB-Score, and observed that it also improves with training set size while outperforming the rest of SFs. Given the continuous growth of training datasets, the development of machine-learning SFs has become very appealing. AVAILABILITY AND IMPLEMENTATION: https://github.com/HongjianLi/MLSF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Machine Learning , Ligands , Protein Binding , Proteins
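
Illustrative note on record 11: the study trains scoring functions on subsets of complexes ranked by their similarity to the test set, for example the 32% most dissimilar ones. A minimal sketch of building such a subset follows. It assumes each training complex already has a single precomputed similarity value to the test set. The function name, the dictionary layout and the toy values are hypothetical, and the similarity metric itself (protein structure, protein sequence or ligand structure) is left outside the sketch.

```python
def most_dissimilar_subset(train_ids, similarity_to_test, fraction):
    """Return the given fraction of training complexes least similar to the test set."""
    ranked = sorted(train_ids, key=lambda cid: similarity_to_test[cid])
    keep = max(1, int(fraction * len(ranked)))
    return ranked[:keep]


# Hypothetical similarity values in [0, 1]; higher means closer to the test set.
similarity_to_test = {"c1": 0.9, "c2": 0.1, "c3": 0.5, "c4": 0.3, "c5": 0.7}
train_ids = list(similarity_to_test)

subset = most_dissimilar_subset(train_ids, similarity_to_test, fraction=0.4)
print(subset)  # ['c2', 'c4']
```

Sweeping `fraction` from small values up to 1.0 and retraining at each step would give the kind of accuracy-versus-similarity curve the abstract discusses.
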
12.
BMC Bioinformatics ; 20(Suppl 26): 628, 2019 Dec 16.
Article in English | MEDLINE | ID: mdl-31839008

ABSTRACT

BACKGROUND: Development of new drugs is a time-consuming and costly process, and the cost is still increasing in recent years. However, the number of drugs approved by FDA every year per dollar spent on development is declining. Drug repositioning, which aims to find new use of existing drugs, attracts attention of pharmaceutical researchers due to its high efficiency. A variety of computational methods for drug repositioning have been proposed based on machine learning approaches, network-based approaches, matrix decomposition approaches, etc. RESULTS: We propose a novel computational method for drug repositioning. We construct and decompose three-dimensional tensors, which consist of the associations among drugs, targets and diseases, to derive latent factors reflecting the functional patterns of the three kinds of entities. The proposed method outperforms several baseline methods in recovering missing associations. Most of the top predictions are validated by literature search and computational docking. Latent factors are used to cluster the drugs, targets and diseases into functional groups. Topological Data Analysis (TDA) is applied to investigate the properties of the clusters. We find that the latent factors are able to capture the functional patterns and underlying molecular mechanisms of drugs, targets and diseases. In addition, we focus on repurposing drugs for cancer and discover not only new therapeutic use but also adverse effects of the drugs. In the in-depth study of associations among the clusters of drugs, targets and cancer subtypes, we find there exist strong associations between particular clusters. CONCLUSIONS: The proposed method is able to recover missing associations, discover new predictions and uncover functional clusters of drugs, targets and diseases. The clustering of drugs, targets and diseases, as well as the associations among the clusters, provides a new guiding framework for drug repositioning.


Subject(s)
Computational Biology , Drug Repositioning , Cluster Analysis , Computational Biology/methods , Drug Repositioning/methods , Humans , Machine Learning
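
Illustrative note on record 12: the abstract describes decomposing a drug-target-disease tensor into latent factors and using them to recover missing associations, but it does not name the decomposition. The sketch below assumes a CP-style (sum of rank-one components) model purely for illustration and shows only the scoring step, in which a candidate triple is scored from already-fitted factor vectors. All names and factor values are hypothetical.

```python
def cp_score(drug_vec, target_vec, disease_vec):
    """Reconstructed tensor entry under an assumed CP model."""
    return sum(a * b * c for a, b, c in zip(drug_vec, target_vec, disease_vec))


# Hypothetical rank-2 latent factors, as if obtained from a fitted decomposition.
drug_factors = {"drugX": [1.0, 0.0], "drugY": [0.2, 0.9]}
target_factors = {"targetP": [0.8, 0.1]}
disease_factors = {"diseaseM": [0.5, 0.0], "diseaseN": [0.0, 1.0]}


def triple_score(triple):
    drug, target, disease = triple
    return cp_score(drug_factors[drug], target_factors[target], disease_factors[disease])


candidates = [
    (drug, target, disease)
    for drug in drug_factors
    for target in target_factors
    for disease in disease_factors
]
ranked = sorted(candidates, key=triple_score, reverse=True)
print(ranked[0])  # ('drugX', 'targetP', 'diseaseM')
```

In a repositioning setting, triples absent from the observed tensor but ranked highly would be the candidate predictions to check against literature or docking.
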
13.
BMC Bioinformatics ; 20(1): 408, 2019 Jul 29.
Article in English | MEDLINE | ID: mdl-31357929

ABSTRACT

BACKGROUND: Understanding the phenotypic drug response on cancer cell lines plays a vital role in anti-cancer drug discovery and re-purposing. The Genomics of Drug Sensitivity in Cancer (GDSC) database provides open data for researchers in phenotypic screening to build and test their models. Previously, most research in these areas started from the molecular fingerprints or physicochemical features of drugs, instead of their structures. RESULTS: In this paper, a model called twin Convolutional Neural Network for drugs in SMILES format (tCNNS) is introduced for phenotypic screening. tCNNS uses one convolutional network to extract features for drugs from their simplified molecular input line entry specification (SMILES) format and another convolutional network to extract features for cancer cell lines from their genetic feature vectors. After that, a fully connected network is used to predict the interaction between the drugs and the cancer cell lines. When the training set and the testing set are divided based on the interaction pairs between drugs and cell lines, tCNNS achieves 0.826 and 0.831 for the mean and top quartile of the coefficient of determination (R2), and 0.909 and 0.912 for the mean and top quartile of the Pearson correlation (Rp), which are significantly better than those of the previous works (Ammad-Ud-Din et al., J Chem Inf Model 54:2347-9, 2014), (Haider et al., PLoS ONE 10:0144490, 2015), (Menden et al., PLoS ONE 8:61318, 2013). However, when the training set and the testing set are divided exclusively based on drugs or cell lines, the performance of tCNNS decreases significantly and Rp and R2 drop to barely above 0. CONCLUSIONS: Our approach is able to predict the drug effects on cancer cell lines with high accuracy, and its performance remains stable with less but high-quality data, and with fewer features for the cancer cell lines. tCNNS can also solve the problem of outliers in other feature space.
Besides achieving high scores in these statistical metrics, tCNNS also provides some insights into the phenotypic screening. However, the performance of tCNNS drops in the blind test.


Subject(s)
Antineoplastic Agents/therapeutic use , Deep Learning , Neoplasms/drug therapy , Computer Neural Networks , Antineoplastic Agents/pharmacology , Tumor Cell Line , Factual Databases , Genomics , Humans , Inhibitory Concentration 50 , Organ Specificity/drug effects , Phenotype , Regression Analysis
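
Illustrative note on record 13: the abstract reports both the coefficient of determination (R2) and the Pearson correlation (Rp). The sketch below uses the standard textbook definitions of the two metrics on hypothetical toy values. It is not taken from the tCNNS code, and the paper may compute them per drug or per cell line in ways not shown here.

```python
from math import sqrt


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical toy responses: predictions are perfectly ordered but shifted by 0.5.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

print(round(pearson(observed, predicted), 6))    # 1.0
print(round(r_squared(observed, predicted), 6))  # 0.8
```

The toy case shows why the two numbers are reported separately: a constant offset leaves the correlation at 1 while lowering R2, since R2 penalizes errors in the absolute values as well.
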
14.
BMC Bioinformatics ; 20(1): 23, 2019 Jan 14.
Article in English | MEDLINE | ID: mdl-30642247

ABSTRACT

BACKGROUND: Clustering molecular networks is a typical method in systems biology, which is effective in predicting protein complexes or functional modules. However, few studies have recognized that biological molecules are spatially and temporally regulated to form a dynamic cellular network and that only a subset of interactions take place at the same location in cells. RESULTS: In this study, considering the subcellular localization of proteins, we first construct a co-localization human protein interaction network (PIN) and systematically investigate the relationship between subcellular localization and biological functions. After that, we propose a Locational and Topological Overlap Model (LTOM) to preprocess the co-localization PIN to identify functional modules. LTOM requires the topological overlaps, i.e. the common partners shared by two proteins, to be annotated in the same localization as the two proteins. We observed that the model has better correspondence with the reference protein complexes and shows more relevance to cancers based on both human and yeast datasets and two clustering algorithms, ClusterONE and MCL. CONCLUSION: Taking protein localization and topological overlap into consideration can improve the performance of module detection from protein interaction networks.


Subject(s)
Algorithms , Computational Biology/methods , Protein Databases , Neoplasm Proteins/metabolism , Protein Interaction Mapping/methods , Protein Interaction Maps , Saccharomyces cerevisiae Proteins/metabolism , Humans , Neoplasm Proteins/chemistry , Saccharomyces cerevisiae Proteins/chemistry
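
Illustrative note on record 14: the abstract states that LTOM counts a common partner of two proteins only if it is annotated in the same localization as the two proteins. A minimal sketch of that filtering step follows. It is an interpretation of the sentence in the abstract, not the published LTOM code, and the function name, the data layout and the toy network are hypothetical.

```python
def colocalized_common_partners(network, localization, protein_a, protein_b):
    """Common partners of two proteins that share a compartment with both."""
    shared_compartments = localization[protein_a] & localization[protein_b]
    common_partners = network[protein_a] & network[protein_b]
    return {
        partner
        for partner in common_partners
        if localization[partner] & shared_compartments
    }


# Hypothetical toy interaction network (adjacency sets) and annotations.
network = {
    "P1": {"P2", "P3", "P4"},
    "P2": {"P1", "P3", "P4"},
    "P3": {"P1", "P2"},
    "P4": {"P1", "P2"},
}
localization = {
    "P1": {"nucleus"},
    "P2": {"nucleus", "cytosol"},
    "P3": {"nucleus"},
    "P4": {"membrane"},
}

overlap = colocalized_common_partners(network, localization, "P1", "P2")
print(overlap)  # {'P3'}
```

P4 interacts with both P1 and P2 but is annotated only to the membrane, so it does not count toward the overlap. A plain topological overlap would have counted both P3 and P4.
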
15.
Bioinformatics ; 34(20): 3519-3528, 2018 10 15.
Article in English | MEDLINE | ID: mdl-29771280

ABSTRACT

Motivation: Moonlighting proteins are a class of proteins having multiple distinct functions, which play essential roles in a variety of cellular and enzymatic functioning systems. Although there have long been calls for computational algorithms for the identification of moonlighting proteins, research on approaches to identify moonlighting long non-coding RNAs (lncRNAs) has never been undertaken. Here, we introduce a novel methodology, MoonFinder, for the identification of moonlighting lncRNAs. MoonFinder is a statistical algorithm identifying moonlighting lncRNAs without a priori knowledge through the integration of protein interactome, RNA-protein interactions and functional annotation of proteins. Results: We identify 155 moonlighting lncRNA candidates and uncover that they are a distinct class of lncRNAs characterized by specific sequence and cellular localization features. The non-coding genes that transcribe moonlighting lncRNAs tend to have shorter but more exons, and the moonlighting lncRNAs have a variable localization pattern with a high chance of residing in the cytoplasmic compartment in comparison to the other lncRNAs. Moreover, moonlighting lncRNAs and moonlighting proteins are rather mutually exclusive in terms of both their direct interactions and interacting partners. Our results also shed light on how the moonlighting candidates and their interacting proteins are implicated in the formation and development of cancers and other diseases. Availability and implementation: The code implementing MoonFinder is supplied as an R package in the supplementary material. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Proteins/genetics , RNA, Long Noncoding/genetics , Exons , Genomics , Humans , Sequence Analysis, RNA/methods
16.
J Proteome Res ; 16(8): 3019-3029, 2017 08 04.
Article in English | MEDLINE | ID: mdl-28707887

ABSTRACT

Spatial-temporal regulation among proteins forms dynamic networks in cells. Coexistence in common cell compartments can improve the biological reliability of protein-protein interactions. However, this is usually overlooked by most proteomic studies and leads to unrealistic discoveries. In this paper, we systematically characterize the interaction localization diversity in the human protein interactome using the localization coefficient, a novel metric proposed for assessing how diversely the interactions localize among cell compartments. Our analysis reveals the following: (1) the subcellular networks of the nucleus, cytosol, and mitochondrion are dense but the interactions tend to localize in specific cell compartments, whereas the subnetworks of the secretory pathway, membrane, and extracellular region are sparse but the interactions are diversely localized; (2) the housekeeping proteins tend to appear in multiple compartments, while the tissue-specific proteins present a relatively flat profile of localization breadth; (3) the autophagy proteins tend to diversely localize in multiple compartments, especially those with high connectivity, compared with the apoptosis proteins; (4) the proteins targeted by small-molecule drugs show no preference for compartments, whereas the proteins targeted by antibody-based drugs tend to belong to transmembrane regions with strong diversity. In summary, our analysis provides a comprehensive view of the subcellular localization of interacting proteins, demonstrates that localization diversity is an important feature of protein interactions, and shows its ability to highlight meaningful biological functions.


Subject(s)
Cell Compartmentation , Protein Interaction Maps/physiology , Proteome/analysis , Subcellular Fractions/chemistry , Humans , Intracellular Space/chemistry , Protein Interaction Mapping , Proteomics/methods , Spatio-Temporal Analysis , Subcellular Fractions/physiology
17.
J Pharmacol Sci ; 135(3): 114-120, 2017 Nov.
Article in English | MEDLINE | ID: mdl-29132796

ABSTRACT

Hyperuricemia, a long-term purine metabolic disorder, is a well-known risk factor for gout, hypertension and diabetes. In maintaining normal whole-body purine levels, xanthine oxidase (XOD) is a key enzyme in the purine metabolic pathway, as it catalyzes the oxidation of hypoxanthine to xanthine and finally to uric acid. Here we used the protein-ligand docking software idock to virtually screen potential XOD inhibitors from 3167 approved small compounds/drugs. The inhibitory activities of the ten compounds with the highest scores were tested on XOD in vitro. Interestingly, all ten compounds inhibited the activity of XOD to some degree. In particular, the anti-ulcerative-colitis drug olsalazine sodium demonstrated strong inhibitory activity against XOD (IC50 = 3.4 mg/L). Enzymatic kinetic studies revealed that the drug was a hybrid-type inhibitor of xanthine oxidase. Furthermore, the drug strikingly decreased serum urate levels and serum/hepatic activities of XOD in a dose-dependent manner in vivo. Thus, we demonstrated a successful process of hunting for compounds/drugs for hyperuricemia through virtual screening, supporting the potential use of olsalazine sodium in the treatment of hyperuricemia.


Subject(s)
Aminosalicylic Acids/pharmacology , Anti-Ulcer Agents/pharmacology , Uric Acid/blood , Xanthine Dehydrogenase/antagonists & inhibitors , Xanthine Dehydrogenase/metabolism , Aminosalicylic Acids/therapeutic use , Animals , Dose-Response Relationship, Drug , Drug Evaluation, Preclinical , Hyperuricemia/drug therapy , In Vitro Techniques , Male , Mice , Structure-Activity Relationship
18.
Nucleic Acids Res ; 43(Database issue): D578-82, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25274736

ABSTRACT

Increasing evidence reveals that diverse non-coding RNAs (ncRNAs) play critically important roles in viral infection. Viruses can use diverse ncRNAs to manipulate both cellular and viral gene expression to establish a host environment conducive to the completion of the viral life cycle. Many host cellular ncRNAs can also directly or indirectly influence viral replication and even target virus genomes. ViRBase (http://www.rna-society.org/virbase) aims to provide the scientific community with a resource for efficient browsing and visualization of virus-host ncRNA-associated interactions and interaction networks in viral infection. The current version of ViRBase documents more than 12,000 viral and cellular ncRNA-associated virus-virus, virus-host, host-virus and host-host interactions involving more than 460 non-redundant ncRNAs and 4400 protein-coding genes from more than 60 viruses and 20 hosts. Users can query, browse and manipulate these virus-host ncRNA-associated interactions. ViRBase will help uncover the generic organizing principles of cellular virus-host ncRNA-associated interaction networks in viral infection.


Subject(s)
Databases, Genetic , RNA, Untranslated/metabolism , Virus Diseases/genetics , Virus Diseases/virology , Binding Sites , Internet , Proteins/metabolism , Virus Diseases/metabolism , Viruses/metabolism
19.
Sensors (Basel) ; 18(1)2017 Dec 22.
Article in English | MEDLINE | ID: mdl-29271952

ABSTRACT

Urban air pollution has caused public concern globally because it seriously affects human life. Modern monitoring systems providing pollution information with high spatio-temporal resolution have been developed to identify personal exposures. However, these systems' hardware specifications and configurations are usually fixed according to the applications. They can be inconvenient to maintain, and difficult to reconfigure and expand with respect to sensing capabilities. This paper aims to tackle these issues by adopting the proposed Modular Sensor System (MSS) architecture and Universal Sensor Interface (USI), together with a modular design in the sensor node. A compact MSS sensor node is implemented and evaluated. It has expandable sensor modules with a plug-and-play feature and supports multiple Wireless Sensor Networks (WSNs). Evaluation results show that MSS sensor nodes can easily fit in different scenarios, adapt to reconfigurations dynamically, and detect low-concentration air pollution with high energy efficiency and good data accuracy. We anticipate that the effort required for system maintenance, adaptation, and evolution can be significantly reduced when deploying the system in the field.


Subject(s)
Air Pollution/analysis , Computers , Wireless Technology
20.
BMC Bioinformatics ; 17(Suppl 11): 308, 2016 Sep 22.
Article in English | MEDLINE | ID: mdl-28185549

ABSTRACT

BACKGROUND: Pose generation error is usually quantified as the difference between the geometry of the pose generated by the docking software and that of the same molecule co-crystallised with the considered protein. Surprisingly, the impact of this error on binding affinity prediction is yet to be systematically analysed across diverse protein-ligand complexes. RESULTS: Against commonly held views, we have found that pose generation error generally has a small impact on the accuracy of binding affinity prediction. This is also true for large pose generation errors and it is not only observed with machine-learning scoring functions, but also with classical scoring functions such as AutoDock Vina. Furthermore, we propose a procedure to correct a substantial part of this error, which consists of calibrating the scoring functions with re-docked, rather than co-crystallised, poses. In this way, the relationship between Vina-generated protein-ligand poses and their binding affinities is directly learned. As a result, test set performance after this error-correcting procedure is much closer to that of predicting the binding affinity in the absence of pose generation error (i.e. on crystal structures). We evaluated several strategies, obtaining better results for those using a single docked pose per ligand than those using multiple docked poses per ligand. CONCLUSIONS: Binding affinity prediction is often carried out on the docked pose of a known binder rather than its co-crystallised pose. Our results suggest that pose generation error is in general far less damaging for binding affinity prediction than is currently believed. Another contribution of our study is the proposal of a procedure that largely corrects for this error. The resulting machine-learning scoring function is freely available at http://istar.cse.cuhk.edu.hk/rf-score-4.tgz and http://ballester.marseille.inserm.fr/rf-score-4.tgz.
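
The abstract above describes pose generation error as the geometric difference between a docked pose and the co-crystallised pose of the same molecule. The following is a minimal sketch of one way to compute such a difference, assuming it is measured as the root-mean-square deviation (RMSD) over atoms listed in the same order in both poses. RMSD is a common convention in docking studies, but the abstract does not confirm which measure the authors used. The function name `pose_rmsd` and the coordinates are hypothetical.

```python
import math


def pose_rmsd(docked, crystal):
    """RMSD between two poses given as lists of (x, y, z) tuples.

    Assumes both lists describe the same atoms in the same order and that
    both poses share one coordinate frame (no superposition is performed).
    """
    if len(docked) != len(crystal) or not docked:
        raise ValueError("poses must be non-empty and have the same atom count")
    # Sum of squared distances between each pair of matched atoms.
    squared = sum(
        (a - b) ** 2
        for p, q in zip(docked, crystal)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(docked))


# Hypothetical two-atom ligand: the second atom is displaced by 2 along z.
crystal_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
error = pose_rmsd(docked_pose, crystal_pose)
```

No superposition step is included because a re-docked pose and its crystal pose are normally expressed in the coordinate frame of the same protein structure; that too is an assumption of this sketch.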


Subject(s)
Molecular Docking Simulation/standards , Nuclear Proteins/metabolism , Pyrazines/metabolism , Software , Transcription Factors/metabolism , Humans , Ligands , Nuclear Proteins/chemistry , Protein Binding , Protein Conformation , Pyrazines/chemistry , Transcription Factors/chemistry