Results 1 - 20 of 26
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38754408

ABSTRACT

MOTIVATION: The technology for analyzing single-cell multi-omics data has advanced rapidly and has provided comprehensive and accurate cellular information by exploring cell heterogeneity in genomics, transcriptomics, epigenomics, metabolomics and proteomics data. However, because of the high-dimensional and sparse characteristics of single-cell multi-omics data, as well as the limitations of various analysis algorithms, the clustering performance is generally poor. Matrix factorization is an unsupervised, dimensionality reduction-based method that can cluster individuals and discover related omics variables from different blocks. Here, we present a novel algorithm that performs joint dimensionality reduction learning and cell clustering analysis on single-cell multi-omics data using non-negative matrix factorization that we named scMNMF. We formulate the objective function of joint learning as a constrained optimization problem and derive the corresponding iterative formulas through alternating iterative algorithms. The major advantage of the scMNMF algorithm remains its capability to explore hidden related features among omics data. Additionally, the feature selection for dimensionality reduction and cell clustering mutually influence each other iteratively, leading to a more effective discovery of cell types. We validated the performance of the scMNMF algorithm using two simulated and five real datasets. The results show that scMNMF outperformed seven other state-of-the-art algorithms in various measurements. AVAILABILITY AND IMPLEMENTATION: scMNMF code can be found at https://github.com/yushanqiu/scMNMF.


Subject(s)
Algorithms; Single-Cell Analysis; Single-Cell Analysis/methods; Cluster Analysis; Humans; Genomics/methods; Computational Biology/methods; Proteomics/methods; Metabolomics/methods; Epigenomics/methods; Multiomics
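
The abstract above builds on non-negative matrix factorization. As a rough illustration only, the following sketch shows plain single-matrix NMF with the standard Lee-Seung multiplicative updates, followed by assigning each cell to its dominant factor. It is not the scMNMF algorithm: the joint multi-omics objective and its constraints are not reproduced here, and the function names (`nmf`, `cluster_labels`), the toy matrix and all parameter values are assumptions made for this example.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(X, W, H):
    """Squared Frobenius norm of X - W @ H."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))

def nmf(X, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate a non-negative X (features x cells) by W @ H with W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # Multiplicative update: H <- H * (W^T X) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # Multiplicative update: W <- W * (X H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H

def cluster_labels(H):
    """Assign each column (cell) to the factor with the largest coefficient."""
    return [max(range(len(H)), key=lambda k: H[k][j]) for j in range(len(H[0]))]

# Toy matrix (assumed data): 4 features x 4 cells with two obvious blocks.
X = [[5, 5, 0, 0],
     [4, 4, 0, 0],
     [0, 0, 3, 3],
     [0, 0, 6, 6]]
W, H = nmf(X, rank=2)
labels = cluster_labels(H)
```

Reading cluster labels off the largest entry in each column of `H` is one common convention; a joint method such as the one described in the abstract couples the factorization and the clustering instead of running them in sequence.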
2.
Brief Bioinform ; 24(6)2023 Sep 22.
Article in English | MEDLINE | ID: mdl-37965808

ABSTRACT

Spatial transcriptomics is a rapidly growing field that aims to comprehensively characterize tissue organization and architecture at single-cell or sub-cellular resolution using spatial information. Such techniques provide a solid foundation for the mechanistic understanding of many biological processes in both health and disease that cannot be obtained using traditional technologies. Several methods have been proposed to decipher the spatial context of spots in tissue using spatial information. However, when spatial information and gene expression profiles are integrated, most methods only consider the local similarity of spatial information. As they do not consider the global semantic structure, spatial domain identification methods encounter poor or over-smoothed clusters. We developed ConSpaS, a novel node representation learning framework that precisely deciphers spatial domains by integrating local and global similarities based on graph autoencoder (GAE) and contrastive learning (CL). The GAE effectively integrates spatial information using local similarity and gene expression profiles, thereby ensuring that cluster assignment is spatially continuous. To improve the characterization of the global similarity of gene expression data, we adopt CL to consider the global semantic information. We propose an augmentation-free mechanism to construct global positive samples and use a semi-easy sampling strategy to define negative samples. We validated ConSpaS on multiple tissue types and technology platforms by comparing it with existing typical methods. The experimental results confirmed that ConSpaS effectively improved the identification accuracy of spatial domains with biologically meaningful spatial patterns, and denoised gene expression data while maintaining the spatial expression pattern. Furthermore, our proposed method better depicted the spatial trajectory by integrating local and global similarities.


Subject(s)
Gene Expression Profiling; Learning; Histocompatibility Testing; Semantics
3.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37122068

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) technology attracts extensive attention in the biomedical field. It can be used to measure gene expression and analyze the transcriptome at the single-cell level, enabling the identification of cell types based on unsupervised clustering. Data imputation and dimension reduction are conducted before clustering because scRNA-seq has a high 'dropout' rate, noise and linear inseparability. However, performing dimension reduction, imputation and clustering independently cannot fully characterize the pattern of the scRNA-seq data, resulting in poor clustering performance. Herein, we propose a novel and accurate algorithm, SSNMDI, that utilizes a joint learning approach to simultaneously perform imputation, dimensionality reduction and cell clustering in a non-negative matrix factorization (NMF) framework. In addition, we integrate the cell annotation as prior information, then transform the joint learning into a semi-supervised NMF model. Through experiments on 14 datasets, we demonstrate that SSNMDI has a faster convergence speed, better dimensionality reduction performance and a more accurate cell clustering performance than previous methods, providing an accurate and robust strategy for analyzing scRNA-seq data. Biological analyses, including pseudotime analysis, gene ontology and survival analysis, are also conducted to validate the biological significance of our method. We believe that we are among the first to introduce imputation, partial label information, dimension reduction and clustering to the single-cell field. AVAILABILITY AND IMPLEMENTATION: The source code for SSNMDI is available at https://github.com/yushanqiu/SSNMDI.


Subject(s)
Gene Expression Profiling; Single-Cell Gene Expression Analysis; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods; Algorithms; Cluster Analysis
4.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38684178

ABSTRACT

MOTIVATION: Continuous advancements in single-cell RNA sequencing (scRNA-seq) technology have enabled researchers to further explore the study of cell heterogeneity, trajectory inference, identification of rare cell types, and neurology. Accurate scRNA-seq data clustering is crucial in single-cell sequencing data analysis. However, the high dimensionality, sparsity, and presence of "false" zero values in the data can pose challenges to clustering. Furthermore, current unsupervised clustering algorithms have not effectively leveraged prior biological knowledge, making cell clustering even more challenging. RESULTS: This study investigates a semisupervised clustering model called scTPC, which integrates the triplet constraint, pairwise constraint, and cross-entropy constraint based on deep learning. Specifically, the model begins by pretraining a denoising autoencoder based on a zero-inflated negative binomial distribution. Deep clustering is then performed in the learned latent feature space using triplet constraints and pairwise constraints generated from partially labeled cells. Finally, to address imbalanced cell-type datasets, a weighted cross-entropy loss is introduced to optimize the model. A series of experimental results on 10 real scRNA-seq datasets and five simulated datasets demonstrate that scTPC achieves accurate clustering with a well-designed framework. AVAILABILITY AND IMPLEMENTATION: scTPC is a Python-based algorithm, and the code is available from https://github.com/LF-Yang/Code or https://zenodo.org/records/10951780.


Subject(s)
Algorithms; Single-Cell Analysis; Single-Cell Analysis/methods; Cluster Analysis; Humans; Sequence Analysis, RNA/methods; RNA-Seq/methods; Deep Learning; Software; Single-Cell Gene Expression Analysis
5.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35437603

ABSTRACT

Each type of cancer usually has several subtypes with distinct clinical implications, and therefore the discovery of cancer subtypes is an important and urgent task in disease diagnosis and therapy. Using single-omics data to predict cancer subtypes is difficult because genomes are dysregulated and complicated by multiple molecular mechanisms, and therefore linking cancer genomes to cancer phenotypes is not an easy task. Using multi-omics data to effectively predict cancer subtypes is an area of much interest; however, integrating multi-omics data is challenging. Here, we propose a novel method of multi-omics data integration for clustering to identify cancer subtypes (MDICC) that integrates new affinity matrix and network fusion methods. Our experimental results show the effectiveness and generalization of the proposed MDICC model in identifying cancer subtypes, and its performance was better than those of currently available state-of-the-art clustering methods. Furthermore, the survival analysis demonstrates that MDICC delivered comparable or even better results than many typical integrative methods.


Subject(s)
Neoplasms; Cluster Analysis; Humans; Neoplasms/genetics; Survival Analysis
6.
PLoS Comput Biol ; 19(3): e1010939, 2023 03.
Article in English | MEDLINE | ID: mdl-36930678

ABSTRACT

During breast cancer metastasis, the developmental process epithelial-mesenchymal (EM) transition is abnormally activated. Transcriptional regulatory networks controlling EM transition are well-studied; however, alternative RNA splicing also plays a critical regulatory role during this process. Alternative splicing was proved to control the EM transition process, and RNA-binding proteins were determined to regulate alternative splicing. A comprehensive understanding of alternative splicing and the RNA-binding proteins that regulate it during EM transition and their dynamic impact on breast cancer remains largely unknown. To accurately study the dynamic regulatory relationships, time-series data of the EM transition process are essential. However, only cross-sectional data of epithelial and mesenchymal specimens are available. Therefore, we developed a pseudotemporal causality-based Bayesian (PCB) approach to infer the dynamic regulatory relationships between alternative splicing events and RNA-binding proteins. Our study sheds light on facilitating the regulatory network-based approach to identify key RNA-binding proteins or target alternative splicing events for the diagnosis or treatment of cancers. The data and code for PCB are available at: http://hkumath.hku.hk/~wkc/PCB(data+code).zip.


Subject(s)
Breast Neoplasms; Humans; Female; Breast Neoplasms/metabolism; Bayes Theorem; Cross-Sectional Studies; Cell Line, Tumor; Neoplastic Processes; RNA-Binding Proteins/genetics; RNA-Binding Proteins/metabolism; Alternative Splicing/genetics; Epithelial-Mesenchymal Transition/genetics
7.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34410342

ABSTRACT

MOTIVATION: The epithelial-mesenchymal transition (EMT) is a cellular-developmental process activated during tumor metastasis. Transcriptional regulatory networks controlling EMT are well studied; however, alternative RNA splicing also plays a critical regulatory role during this process. Unfortunately, a comprehensive understanding of alternative splicing (AS) and the RNA-binding proteins (RBPs) that regulate it during EMT remains largely unknown. Therefore, a great need exists to develop effective computational methods for predicting associations of RBPs and AS events. Dramatically increasing data sources that have direct and indirect information associated with RBPs and AS events have provided an ideal platform for inferring these associations. RESULTS: In this study, we propose a novel method for RBP-AS target prediction based on weighted data fusion with sparse matrix tri-factorization (WDFSMF for short) that simultaneously decomposes heterogeneous data source matrices into low-rank matrices to reveal hidden associations. WDFSMF can select and integrate data sources by assigning different weights to those sources, and these weights can be assigned automatically. In addition, WDFSMF can identify significant RBP complexes regulating AS events and eliminate noise and outliers from the data. Our proposed method achieves an area under the receiver operating characteristic curve (AUC) of 90.78%, which shows that WDFSMF can effectively predict RBP-AS event associations with higher accuracy compared with previous methods. Furthermore, this study identifies significant RBPs as complexes for AS events during EMT and provides solid ground for further investigation into RNA regulation during EMT and metastasis. WDFSMF is a general data fusion framework, and as such it can also be adapted to predict associations between other biological entities.


Subject(s)
Alternative Splicing; Computational Biology/methods; Epithelial-Mesenchymal Transition/genetics; Gene Expression Regulation, Neoplastic; RNA-Binding Proteins/metabolism; Algorithms; Computational Biology/standards; Humans; ROC Curve; Reproducibility of Results; Software
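
The abstract above reports its result as an area under the ROC curve (AUC). For readers unfamiliar with the metric, the sketch below computes AUC from its rank interpretation: the probability that a randomly chosen positive pair scores higher than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy scores are assumptions for illustration and are unrelated to the data in the paper.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: predicted association scores; labels: 1 for known associations,
    0 otherwise. Ties between a positive and a negative count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example (assumed data): one positive is ranked below one negative.
print(roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```

The pairwise form is quadratic in the number of examples; it is used here only because it makes the definition explicit.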
8.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33517359

ABSTRACT

MOTIVATION: The developmental process of epithelial-mesenchymal transition (EMT) is abnormally activated during breast cancer metastasis. Transcriptional regulatory networks that control EMT have been well studied; however, alternative RNA splicing plays a vital regulatory role during this process and the regulating mechanism needs further exploration. Because of the huge cost and complexity of biological experiments, the underlying mechanisms of alternative splicing (AS) and associated RNA-binding proteins (RBPs) that regulate the EMT process remain largely unknown. Thus, there is an urgent need to develop computational methods for predicting potential RBP-AS event associations during EMT. RESULTS: We developed a novel model for RBP-AS target prediction during EMT that is based on inductive matrix completion (RAIMC). Integrated RBP similarities were calculated based on RBP regulating similarity, and RBP Gaussian interaction profile (GIP) kernel similarity, while integrated AS event similarities were computed based on AS event module similarity and AS event GIP kernel similarity. Our primary objective was to complete missing or unknown RBP-AS event associations based on known associations and on integrated RBP and AS event similarities. In this paper, we identify significant RBPs for AS events during EMT and discuss potential regulating mechanisms. Our computational results confirm the effectiveness and superiority of our model over other state-of-the-art methods. Our RAIMC model achieved AUC values of 0.9587 and 0.9765 based on leave-one-out cross-validation (CV) and 5-fold CV, respectively, which are larger than the AUC values from the previous models. RAIMC is a general matrix completion framework that can be adopted to predict associations between other biological entities. We further validated the prediction performance of RAIMC on the genes CD44 and MAP3K7. RAIMC can identify the related regulating RBPs for isoforms of these two genes. 
AVAILABILITY AND IMPLEMENTATION: The source code for RAIMC is available at https://github.com/yushanqiu/RAIMC. CONTACT: zouquan@nclab.net.


Subject(s)
Alternative Splicing; Breast Neoplasms; Epithelial-Mesenchymal Transition/genetics; Gene Expression Regulation, Neoplastic; Gene Regulatory Networks; Neoplasm Proteins; RNA-Binding Proteins; Breast Neoplasms/genetics; Breast Neoplasms/metabolism; Female; Humans; Neoplasm Proteins/genetics; Neoplasm Proteins/metabolism; RNA-Binding Proteins/genetics; RNA-Binding Proteins/metabolism
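
The abstract above uses Gaussian interaction profile (GIP) kernel similarity. The sketch below shows the commonly used form of that kernel: each entity's interaction profile is a row of the binary association matrix, and similarity is exp(-gamma * squared distance), with gamma normalized by the mean squared profile norm. Whether RAIMC uses exactly this normalization is an assumption; the function name `gip_kernel`, the `gamma_prime` parameter and the toy matrix are illustrative only.

```python
import math

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    profiles: list of binary interaction profiles, e.g. one row per RBP with
    one column per AS event. Returns a symmetric similarity matrix.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    # Bandwidth normalized by the average squared profile norm.
    gamma = gamma_prime / mean_sq_norm
    return [[math.exp(-gamma * sum((a - b) ** 2
                                   for a, b in zip(profiles[i], profiles[j])))
             for j in range(n)]
            for i in range(n)]

# Toy association matrix (assumed data): 3 RBPs x 3 AS events.
A = [[1, 0, 1],
     [1, 0, 1],
     [0, 1, 0]]
K = gip_kernel(A)
```

Rows with identical profiles get similarity 1, and similarity decays as the profiles share fewer associations.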
9.
RNA ; 26(9): 1257-1267, 2020 09.
Article in English | MEDLINE | ID: mdl-32467311

ABSTRACT

During breast cancer metastasis, the developmental process epithelial-mesenchymal transition (EMT) is abnormally activated. Transcriptional regulatory networks controlling EMT are well-studied; however, alternative RNA splicing also plays a critical regulatory role during this process. A comprehensive understanding of alternative splicing (AS) and the RNA binding proteins (RBPs) that regulate it during EMT and their impact on breast cancer remains largely unknown. In this study, we annotated AS in the breast cancer TCGA data set and identified an AS signature that is capable of distinguishing epithelial and mesenchymal states of the tumors. This AS signature contains 25 AS events, among which nine showed increased exon inclusion and 16 showed exon skipping during EMT. This AS signature accurately assigns the EMT status of cells in the CCLE data set and robustly predicts patient survival. We further developed an effective computational method using bipartite networks to identify RBP-AS networks during EMT. This network analysis revealed the complexity of RBP regulation and nominated previously unknown RBPs that regulate EMT-associated AS events. This study highlights the importance of global AS regulation during EMT in cancer progression and paves the way for further investigation into RNA regulation in EMT and metastasis.


Subject(s)
Alternative Splicing/genetics; Breast Neoplasms/genetics; Epithelial-Mesenchymal Transition/genetics; RNA/genetics; Cell Line, Tumor; Exons/genetics; Female; Gene Expression Regulation, Neoplastic/genetics; Gene Regulatory Networks/genetics; Humans; MCF-7 Cells; RNA-Binding Proteins/genetics
10.
RNA ; 24(10): 1326-1338, 2018 10.
Article in English | MEDLINE | ID: mdl-30042172

ABSTRACT

The epithelial-mesenchymal transition (EMT) is a fundamental developmental process that is abnormally activated in cancer metastasis. Dynamic changes in alternative splicing occur during EMT. ESRP1 and hnRNPM are splicing regulators that promote an epithelial splicing program and a mesenchymal splicing program, respectively. The functional relationships between these splicing factors at the genome scale remain elusive. Comparing alternative splicing targets of hnRNPM and ESRP1 revealed that they coregulate a set of cassette exon events, with the majority showing discordant splicing regulation. Discordant splicing events regulated by hnRNPM show a positive correlation with splicing during EMT; however, concordant events do not, indicating that the role of hnRNPM in regulating alternative splicing during EMT is more complex than previously understood. Motif enrichment analysis near hnRNPM-ESRP1 coregulated exons identifies guanine-uridine rich motifs downstream from hnRNPM-repressed and ESRP1-enhanced exons, supporting a general model of competitive binding to these cis-elements to antagonize alternative splicing. The set of coregulated exons is enriched in genes associated with cell migration and cytoskeletal reorganization, which are pathways associated with EMT. Splicing levels of coregulated exons are associated with breast cancer patient survival and correlate with gene sets involved in EMT and breast cancer subtyping. This study identifies complex modes of interaction between hnRNPM and ESRP1 in regulation of splicing in disease-relevant contexts.


Subject(s)
Alternative Splicing; Epithelial-Mesenchymal Transition/genetics; Gene Expression Regulation; Heterogeneous-Nuclear Ribonucleoprotein Group M/metabolism; RNA-Binding Proteins/metabolism; Binding Sites; Breast Neoplasms/genetics; Breast Neoplasms/metabolism; Breast Neoplasms/mortality; Cell Line, Tumor; Exons; Female; Gene Expression Regulation, Neoplastic; Humans; Nucleotide Motifs; Prognosis; Protein Binding; Reproducibility of Results
11.
BMC Bioinformatics ; 17 Suppl 7: 240, 2016 Jul 25.
Article in English | MEDLINE | ID: mdl-27454116

ABSTRACT

BACKGROUND: Abnormalities in glycan biosynthesis have been conclusively related to various diseases, whereas the complexity of the glycosylation process has impeded the quantitative analysis of biochemical experimental data for the identification of glycoforms contributing to disease. To overcome this limitation, the automatic construction of glycosylation reaction networks in silico is a critical step. RESULTS: In this paper, a framework K2014 is developed to automatically construct N-glycosylation networks in MATLAB with the involvement of the 27 most-known enzyme reaction rules of 22 enzymes, as an extension of previous model KB2005. A toolbox named Glycosylation Network Analysis Toolbox (GNAT) is applied to define network properties systematically, including linkages, stereochemical specificity and reaction conditions of enzymes. Our network shows a strong ability to predict a wider range of glycans produced by the enzymes encountered in the Golgi Apparatus in human cell expression systems. CONCLUSIONS: Our results demonstrate a better understanding of the underlying glycosylation process and the potential of systems glycobiology tools for analyzing conventional biochemical or mass spectrometry-based experimental data quantitatively in a more realistic and practical way.


Subject(s)
Biosynthetic Pathways; Computer Simulation; Glycomics/methods; Models, Biological; Polysaccharides/biosynthesis; Glycosylation; Humans; Hydrolases/metabolism; Mass Spectrometry; Transferases/metabolism
12.
Article in English | MEDLINE | ID: mdl-38215334

ABSTRACT

Clustering is a common technique for statistical data analysis and is essential for developing precision medicine. Numerous computational methods have been proposed for integrating multi-omics data to identify cancer subtypes. However, most existing clustering models based on network fusion fail to preserve the consistency of the distribution of the data before and after fusion. Motivated by this observation, we aim to measure and minimize the distribution difference between networks, which may not be in the same space, to improve the performance of data fusion. We therefore developed a flexible clustering model, based on network fusion, that minimizes the distribution difference between the data before and after fusion by co-regularization; the model can be applied to both single- and multi-omics data. We propose a new network fusion model for single- and multi-omics data clustering for identifying cancer or cell subtypes based on co-regularized network fusion (SMCC). SMCC integrates low-rank subspace representation and entropy to fuse networks. In addition, it measures and minimizes the distribution difference between the similarity networks and the fusion network by co-regularization. The model can both reduce the noise interference in the source data and make the statistical characteristics of the fusion result closer to those of the source data. We evaluated the clustering performance of SMCC across 16 real single- and multi-omics datasets. The experimental results demonstrated that SMCC is superior to 17 state-of-the-art clustering methods. Moreover, it is effective for identifying cancer or cell subtypes, thereby promoting the development of precision medicine.

13.
IEEE/ACM Trans Comput Biol Bioinform ; 20(2): 1431-1444, 2023.
Article in English | MEDLINE | ID: mdl-37815942

ABSTRACT

Advances in single-cell RNA sequencing (scRNA-seq) technology provide an unbiased and high-throughput analysis of each cell at single-cell resolution, and further facilitate the development of cellular heterogeneity analysis. Despite the promise of scRNA-seq, the data generated by this method are sparse and noisy because of the presence of dropout events, which can greatly impact downstream analyses such as differential gene expression, cell type annotation, and lineage trajectory reconstruction. The development of effective and robust computational methods to address both dropout and clustering is thus urgently needed. In this study, we propose a flexible, accurate two-stage algorithm for single cell heterogeneity analysis via hierarchical clustering based on an optimal imputation strategy, called scHOIS. At the first stage, masked non-negative matrix factorization is applied to approximate the original observed scRNA-seq data, with optimal rank determined by variance analysis. At the second stage, hierarchical clustering is applied to group the imputed cells using Pearson correlation to measure similarity, with the optimal number of clusters determined by integrating three classical indexes. We performed extensive experiments on real-world datasets, which showed that scHOIS effectively and robustly distinguished cellular differences and that the clustering performance of this algorithm was superior to that of other state-of-the-art methods.


Subject(s)
Algorithms; Gene Expression Profiling; Sequence Analysis, RNA/methods; Cluster Analysis; Single-Cell Analysis/methods
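
The second stage described in the abstract above, hierarchical clustering of cells with Pearson correlation as the similarity measure, can be sketched as follows. This is a minimal illustration under stated assumptions: it uses average linkage with distance 1 - r and takes the number of clusters `k` as given, whereas the paper selects it by integrating three indexes that are not reproduced here. The linkage choice, the function names and the toy expression profiles are all assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)

def hierarchical_clusters(cells, k):
    """Average-linkage agglomerative clustering with distance 1 - Pearson r."""
    n = len(cells)
    dist = [[1.0 - pearson(cells[i], cells[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Toy imputed expression profiles (assumed data): one row per cell.
cells = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1], [8, 6, 4.5, 2]]
print(hierarchical_clusters(cells, k=2))  # [0, 0, 1, 1]
```

Correlation-based distance groups cells by the shape of their expression profile rather than its magnitude, which is why cells 0 and 1 cluster together despite differing in scale.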
14.
iScience ; 26(4): 106517, 2023 Apr 21.
Article in English | MEDLINE | ID: mdl-37123236

ABSTRACT

Epithelial-to-mesenchymal transition (EMT) is the underlying mechanism for tumor metastasis and shows the metastatic potential of tumor cells. Although the transcriptional regulation of EMT has been well studied, the role of alternative splicing (AS) regulation in EMT remains largely uncharacterized. The rapid accumulation of RNA-seq datasets has provided the opportunities for developing computational methods to associate mRNA isoform variations with EMT. In this study, we propose regularization models to identify significant AS events during EMT. Our experimental results confirm that the predicted AS events are closely related to apoptosis, focal adhesion-invadopodium shift and tight junction formation that are essential during EMT. Therefore, our study highlights the broad role of posttranscriptional regulation during EMT and identifies key subsets of AS events serving as distinct regulatory nodes.

15.
IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2714-2723, 2021.
Article in English | MEDLINE | ID: mdl-32386162

ABSTRACT

Clustering tumor metastasis samples from gene expression data at the whole genome level remains an arduous challenge, in particular when the number of experimental samples is small and the number of genes is huge. Here, we focus on the prediction of the epithelial-mesenchymal transition (EMT), which is an underlying mechanism of tumor metastasis, rather than on tumor metastasis itself, to avoid confounding effects of uncertainties derived from various factors. In this paper, we propose a novel model for predicting EMT based on multidimensional scaling (MDS) strategies, integrating entropy and random matrix detection strategies to determine the optimal number of dimensions in the low-dimensional space. We verified our proposed model with the gene expression data for EMT samples of breast cancer, and the experimental results demonstrated its superiority over state-of-the-art clustering methods. Furthermore, we developed a novel feature extraction method for selecting the significant genes and predicting the tumor metastasis. The source code is available at "https://github.com/yushanqiu/yushan.qiu-szu.edu.cn".


Subject(s)
Computational Biology/methods; Epithelial-Mesenchymal Transition/genetics; Multidimensional Scaling Analysis; Unsupervised Machine Learning; Breast Neoplasms/genetics; Breast Neoplasms/pathology; Cluster Analysis; Female; Humans; Neoplasm Metastasis/genetics; Transcriptome/genetics
16.
Article in English | MEDLINE | ID: mdl-29994681

ABSTRACT

The identification of drug side-effects is considered to be an important step in drug design, which could not only shorten the time but also reduce the cost of drug development. In this paper, we investigate the relationship between the potential side-effects of drug candidates and their chemical structures. The preliminary Regularized Regression (RR) model for drug side-effects prediction has promising features in the efficiency of model training and the existence of a closed form solution. It performs better than other state-of-the-art methods in terms of minimum accuracy and average accuracy. To examine more closely how drug structure is associated with side-effects, we further propose a weighted Generalized T-Student kernel (WGTS) SVM model from a structural risk minimization perspective. The SVM model proposed in this paper provides a better understanding of drug side-effects in the process of drug development. The usefulness of the WGTS model lies in its superior performance in a cross validation setting on 888 approved drugs with 1385 side-effects profiled from the SIDER database. This work is expected to shed light on intriguing studies that predict potential unidentified side-effects and suggest how we can avoid drug side-effects by the removal of some distinguished chemical structures.


Subject(s)
Computational Biology/methods; Drug-Related Side Effects and Adverse Reactions; Models, Statistical; Pharmaceutical Preparations/chemistry; Humans; Molecular Structure; Regression Analysis; Support Vector Machine
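
The abstract above notes that the regularized regression model admits a closed-form solution but does not state the penalty. Assuming a squared L2 (ridge) penalty, which is the standard case with such a solution, the estimate can be written as below. The symbols are assumptions for illustration: $X$ is the drug-by-structure feature matrix, $Y$ the drug-by-side-effect label matrix, $W$ the coefficient matrix and $\lambda > 0$ the regularization weight.

```latex
\hat{W} \;=\; \arg\min_{W}\; \lVert XW - Y \rVert_F^2 + \lambda \lVert W \rVert_F^2
\quad\Longrightarrow\quad
\hat{W} \;=\; \left( X^{\top} X + \lambda I \right)^{-1} X^{\top} Y
```

Setting the gradient $2X^{\top}(XW - Y) + 2\lambda W$ to zero gives the right-hand expression, and $X^{\top}X + \lambda I$ is invertible for any $\lambda > 0$, so training reduces to solving one linear system.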
17.
Artif Intell Med ; 95: 96-103, 2019 04.
Article in English | MEDLINE | ID: mdl-30352711

ABSTRACT

Identifying tumor metastasis signatures from gene expression data at the whole genome level remains an arduous challenge, particularly so when the number of genes is huge and the number of experimental samples is small. We focus on the prediction of the epithelial-mesenchymal transition (EMT), which is an underlying mechanism of tumor metastasis, here, rather than on tumor metastasis itself, to avoid confounding effects of uncertainties derived from various factors. We apply an extended LASSO model, L1/2-regularization model, as a feature selector, to identify significant RNA-binding proteins (RBPs) that contribute to regulating the EMT. We find that the L1/2-regularization model significantly outperforms LASSO in the EMT regulation problem. Furthermore, remarkable improvement in L1/2-regularization model classification performance can be achieved by incorporating extra information, specifically correlation values. We demonstrate that the L1/2-regularization model is applicable for identifying significant RBPs in biological research. Identified RBPs will facilitate study of the underlying mechanisms of the EMT.


Subject(s)
Epithelial-Mesenchymal Transition; RNA-Binding Proteins/physiology; Algorithms; Cell Line, Tumor; Humans; Models, Biological
18.
IEEE Access ; 7: 127745-127753, 2019.
Article in English | MEDLINE | ID: mdl-33598376

ABSTRACT

Boolean Network (BN) is a simple and popular mathematical model that has attracted significant attention from systems biology due to its capacity to reveal genetic regulatory network behavior. In addition, observability, as an important network feature, plays a vital role in deciphering the underlying mechanisms driving a genetic regulatory network and has been widely investigated. Prior studies examined observability of BNs and other complex networks. That said, observability of attractors, which can serve as biomarkers for disease, has not been fully examined in the literature. In this study, we formulated a new definition for singleton or cyclic attractor observability in BNs and developed an effective methodology to solve the resulting problem. We also showed that the complexity is O(Pmn), where P is the maximal period of a cyclic attractor, m is the number of attractors and n is the number of genes. Importantly, we have confirmed that our method can faithfully predict the expression pattern of segment polarity genes in Drosophila melanogaster and showed that it can effectively and efficiently handle the attractor observability problem.
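
To make the terms "singleton attractor" and "cyclic attractor" in the abstract above concrete, the sketch below enumerates the attractors of a small synchronous Boolean network by exhaustive simulation. It illustrates the objects being studied, not the observability method of the paper, and it scales exponentially in the number of nodes. The three-node network and the name `find_attractors` are assumptions for this example.

```python
from itertools import product

def find_attractors(update_fns):
    """Enumerate attractors of a synchronous Boolean network.

    update_fns[i] maps the current state tuple to the next value of node i.
    Returns a sorted list of attractors, each a tuple of states; a singleton
    attractor has length 1 and a cyclic attractor has length > 1.
    """
    n = len(update_fns)

    def step(state):
        return tuple(int(f(state)) for f in update_fns)

    attractors = set()
    for state in product((0, 1), repeat=n):
        seen = {}  # state -> position along the trajectory
        while state not in seen:
            seen[state] = len(seen)
            state = step(state)
        # `state` is the first repeated state; the cycle starts at its position.
        start = seen[state]
        cycle = [s for s, i in sorted(seen.items(), key=lambda kv: kv[1])
                 if i >= start]
        # Rotate to a canonical start so each cycle is recorded once.
        k = cycle.index(min(cycle))
        attractors.add(tuple(cycle[k:] + cycle[:k]))
    return sorted(attractors, key=lambda c: (len(c), c))

# Toy network (assumed): x0' = x1, x1' = x0, x2' = x0 AND x1.
fns = [lambda s: s[1], lambda s: s[0], lambda s: s[0] and s[1]]
for attractor in find_attractors(fns):
    print(attractor)
# ((0, 0, 0),)
# ((1, 1, 1),)
# ((0, 1, 0), (1, 0, 0))
```

In this toy network, node x2 alone separates the two singleton attractors, but it takes the value 0 on both (0, 0, 0) and the 2-cycle, so telling those apart requires observing x0 or x1 as well. That is the kind of question the observability problem formalizes.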

19.
BMC Syst Biol ; 12(Suppl 1): 7, 2018 04 11.
Article in English | MEDLINE | ID: mdl-29671395

ABSTRACT

BACKGROUND: Traditional drug discovery methods focused on the efficacy of drugs rather than their toxicity. However, toxicity and/or lack of efficacy are produced when unintended targets are affected in metabolic networks. Thus, identification of biological targets which can be manipulated to produce the desired effect with minimum side-effects has become an important and challenging topic. Efficient computational methods are required to identify the drug targets while incurring minimal side-effects. RESULTS: In this paper, we propose a graph-based computational damage model that summarizes the impact of enzymes on compounds in metabolic networks. An efficient method based on Integer Linear Programming formalism is then developed to identify the optimal enzyme-combination so as to minimize the side-effects. The identified target enzymes for known successful drugs are then verified by comparing the results with those in the existing literature. CONCLUSIONS: Side-effects reduction plays a crucial role in the study of drug development. A graph-based computational damage model is proposed, and theoretical analysis shows that the resulting problem is NP-complete. The proposed approaches can therefore contribute to the discovery of drug targets. Our developed software is available at http://hkumath.hku.hk/~wkc/APBC2018-metabolic-network.zip.


Subject(s)
Computational Biology/methods; Metabolic Networks and Pathways; Programming, Linear; Algorithms; Computer Graphics; Drug Discovery
20.
IET Syst Biol ; 11(1): 30-35, 2017 02.
Article in English | MEDLINE | ID: mdl-28303791

ABSTRACT

Boolean network (BN) is a popular mathematical model for revealing the behaviour of a genetic regulatory network. Furthermore, observability, an important network feature, plays a significant role in understanding the underlying network. Several studies have been done on analysis of observability of BNs and complex networks. However, the observability of attractor cycles, which can serve as biomarkers, has not yet been addressed in the literature. This is an important, interesting and challenging problem that deserves a detailed study. In this study, a novel problem of attractor observability in BNs was first proposed. Identification of the minimum set of consecutive nodes can be used to discriminate different attractors. Furthermore, it can serve as a biomarker for different disease types (represented as different attractor cycles). Then a novel integer programming method was developed to identify the desired set of nodes. The proposed approach is demonstrated and verified by numerical examples. The computational results further illustrate that the proposed model is effective and efficient.


Subject(s)
Gene Expression Regulation/genetics; Gene Regulatory Networks/genetics; Models, Genetic; Models, Statistical; Proteome/genetics; Signal Transduction/genetics; Algorithms; Computer Simulation; Humans