Results 1 - 20 of 47
1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38701413

ABSTRACT

With the emergence of large amounts of single-cell RNA sequencing (scRNA-seq) data, the exploration of computational methods has become critical in revealing biological mechanisms. Clustering is a representative approach for deciphering the cellular heterogeneity embedded in scRNA-seq data. However, due to the diversity of datasets, none of the existing single-cell clustering methods shows overwhelming performance on all datasets. Weighted ensemble methods have been proposed to integrate multiple results to improve heterogeneity analysis performance. These methods are usually weighted by the reliability of the base clustering results, ignoring the performance difference of the same base clustering on different cells. In this paper, we propose scEWE, a self-representative ensemble learning framework based on a high-order element-wise weighting strategy. By assigning different base clustering weights to individual cells, we construct and optimize the consensus matrix in a fine-grained way. In addition, we extract the high-order information between cells, which enhances the ability to represent the similarity relationships between cells. scEWE is experimentally shown to significantly outperform the state-of-the-art methods, which demonstrates the effectiveness of the method and supports its potential applications in complex single-cell data analysis problems.


Subjects
RNA Sequence Analysis, Single-Cell Analysis, Single-Cell Analysis/methods, Cluster Analysis, RNA Sequence Analysis/methods, Algorithms, Computational Biology/methods, Humans, RNA-Seq/methods
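The element-wise weighting idea in this abstract can be illustrated with a small sketch. It is not the scEWE algorithm: the function name `weighted_consensus`, the product form of the pair weight, and the toy labels and weights are all assumptions made for illustration. Each base clustering gets one weight per cell, and a pair of cells counts as co-clustered in proportion to the weights both cells give that clustering.

```python
def weighted_consensus(labelings, weights):
    """Build a cell-by-cell consensus matrix from several base clusterings.

    labelings[m][i] is the cluster label of cell i in base clustering m.
    weights[m][i] is a hypothetical per-cell weight of clustering m for cell i.
    """
    n = len(labelings[0])
    consensus = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            agree = 0.0   # weighted votes saying i and j share a cluster
            total = 0.0   # all weighted votes for this pair
            for labels, w in zip(labelings, weights):
                pair_weight = w[i] * w[j]  # assumed form, not from the paper
                total += pair_weight
                if labels[i] == labels[j]:
                    agree += pair_weight
            consensus[i][j] = agree / total if total > 0 else 0.0
    return consensus


# Two toy base clusterings of four cells; the second one is given
# zero weight on cell 1, so it has no say about pairs involving cell 1.
labelings = [[0, 0, 1, 1], [0, 1, 1, 1]]
weights = [[1.0, 1.0, 1.0, 1.0], [1.0, 0.0, 1.0, 1.0]]
consensus = weighted_consensus(labelings, weights)
```

With a single global weight per clustering, the pair (0, 1) would receive a split vote; with per-cell weights, the clustering that is unreliable on cell 1 is ignored for that pair only.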
2.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37864293

ABSTRACT

Inference of gene regulatory networks (GRNs) from gene expression profiles has been a central problem in systems biology and bioinformatics in the past decades. The rapid emergence of single-cell RNA sequencing (scRNA-seq) data brings new opportunities and challenges for GRN inference: the extensive dropouts and complicated noise structure may degrade the performance of contemporary gene regulatory models. Thus, there is an urgent need to develop more accurate methods for gene regulatory network inference in single-cell data that also account for the noise structure. In this paper, we extend the traditional structural equation modeling (SEM) framework with a flexible noise modeling strategy: we use Gaussian mixtures to approximate the complex stochastic nature of a biological system, since the Gaussian mixture framework can arguably serve as a universal approximator for any continuous distribution. The proposed non-Gaussian SEM framework is called NG-SEM, which can be optimized by iteratively performing the Expectation-Maximization algorithm and the weighted least-squares method. Moreover, the Akaike Information Criterion is adopted to select the number of components of the Gaussian mixture. To probe the accuracy and stability of our proposed method, we design a comprehensive variety of control experiments to systematically investigate the performance of NG-SEM under various conditions, including simulations and real biological data sets. Results on synthetic data demonstrate that this strategy can improve the performance of the traditional Gaussian SEM model, and results on real biological data sets verify that NG-SEM outperforms five other state-of-the-art methods.


Subjects
Gene Regulatory Networks, Single-Cell Gene Expression Analysis, Latent Class Analysis, Algorithms, Computational Biology/methods
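The model-selection step mentioned in this abstract (choosing the number of mixture components by the Akaike Information Criterion) can be sketched as follows. This is only an illustration of the standard AIC formula, 2k - 2 ln L; the function names, the one-dimensional parameter count, and the log-likelihood values are hypothetical and do not come from NG-SEM.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L); lower is better."""
    return 2 * n_params - 2 * log_likelihood


def mixture_params(n_components):
    """Free parameters of a 1-D Gaussian mixture (an assumed setting):
    K means + K variances + (K - 1) independent mixing weights."""
    return 3 * n_components - 1


def select_components(log_likelihoods):
    """Pick the component count K with the lowest AIC.

    log_likelihoods maps K to the maximized log-likelihood of a K-component fit.
    """
    return min(
        log_likelihoods,
        key=lambda k: aic(log_likelihoods[k], mixture_params(k)),
    )


# Made-up log-likelihoods: going from 2 to 3 components barely improves the fit,
# so the parameter penalty makes K = 2 the preferred choice.
fits = {1: -120.0, 2: -100.0, 3: -99.5}
best_k = select_components(fits)
```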
3.
Bioinformatics ; 39(7)2023 07 01.
Article in English | MEDLINE | ID: mdl-37382572

ABSTRACT

MOTIVATION: Simultaneous profiling of multi-omics single-cell data represents an exciting technological advance for understanding cellular states and heterogeneity. Cellular indexing of transcriptomes and epitopes by sequencing allows for parallel quantification of cell-surface protein expression and transcriptome profiling in the same cells; methylome and transcriptome sequencing from single cells allows for transcriptomic and epigenomic profiling in the same individual cells. However, effective integration methods for mining the heterogeneity of cells from noisy, sparse, and complex multi-modal data are increasingly needed. RESULTS: In this article, we propose scHoML, a multi-modal high-order neighborhood Laplacian matrix optimization framework for integrating multi-omics single-cell data. A hierarchical clustering method is presented for analyzing the optimal embedding representation and identifying cell clusters in a robust manner. By integrating high-order and multi-modal Laplacian matrices, this method can robustly represent complex data structures and allow for systematic analysis at the multi-omics single-cell level, thus promoting further biological discoveries. AVAILABILITY AND IMPLEMENTATION: Matlab code is available at https://github.com/jianghruc/scHoML.


Subjects
Algorithms, Multiomics, Gene Expression Profiling, Transcriptome, Cluster Analysis, Single-Cell Analysis
4.
Bioinformatics ; 39(5)2023 05 04.
Article in English | MEDLINE | ID: mdl-37079737

ABSTRACT

MOTIVATION: From a systematic perspective, it is crucial to infer and analyze gene regulatory networks (GRNs) from high-throughput single-cell RNA sequencing data. However, most existing GRN inference methods mainly focus on the network topology, and only a few of them consider how to explicitly describe the updating logic rules of regulation in GRNs to obtain their dynamics. Moreover, some inference methods also fail to deal with the over-fitting problem caused by the noise in time series data. RESULTS: In this article, we propose a novel embedded Boolean threshold network method called LogBTF, which effectively infers GRNs by integrating regularized logistic regression and Boolean threshold functions. First, the continuous gene expression values are converted into Boolean values and the elastic net regression model is adopted to fit the binarized time series data. Then, the estimated regression coefficients are applied to represent the unknown Boolean threshold function of the candidate Boolean threshold network as the dynamical equations. To overcome the multi-collinearity and over-fitting problems, a new and effective approach is designed to optimize the network topology by adding a perturbation design matrix to the input data and thereafter setting sufficiently small elements of the output coefficient vector to zero. In addition, a cross-validation procedure is incorporated into the Boolean threshold network model framework to strengthen the inference capability. Finally, extensive experiments on one simulated Boolean value dataset, dozens of simulation datasets, and three real single-cell RNA sequencing datasets demonstrate that the LogBTF method can infer GRNs from time series data more accurately than some alternative methods for GRN inference. AVAILABILITY AND IMPLEMENTATION: The source data and code are available at https://github.com/zpliulab/LogBTF.


Subjects
Algorithms, Gene Regulatory Networks, Time Factors, Computer Simulation, Gene Expression
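Two of the steps described in this abstract, binarizing expression values and applying a Boolean threshold function as the update rule, can be sketched in a few lines. This is a minimal illustration, not the LogBTF implementation: thresholding at the series mean, the function names, and the weights and biases are assumptions, and the elastic net fitting that would estimate the coefficients is left out.

```python
def binarize(series):
    """Convert continuous expression values to 0/1 by thresholding at the
    series mean (an assumed rule; other binarization schemes exist)."""
    mean = sum(series) / len(series)
    return [1 if value > mean else 0 for value in series]


def threshold_update(state, weights, biases):
    """One synchronous step of a Boolean threshold network.

    Gene i switches on when the weighted sum of the current state plus its
    bias is positive. In a regression-based method the weights would be the
    fitted coefficients; here they are hand-picked.
    """
    return [
        1 if sum(w * x for w, x in zip(row, state)) + bias > 0 else 0
        for row, bias in zip(weights, biases)
    ]


expression = [0.1, 0.9, 0.2, 0.8]
boolean_series = binarize(expression)

# Hypothetical two-gene network in which each gene is activated by the other.
weights = [[0.0, 1.0], [1.0, 0.0]]
biases = [-0.5, -0.5]
next_state = threshold_update([1, 0], weights, biases)
```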
5.
PLoS Comput Biol ; 19(3): e1010939, 2023 03.
Article in English | MEDLINE | ID: mdl-36930678

ABSTRACT

During breast cancer metastasis, the developmental process epithelial-mesenchymal (EM) transition is abnormally activated. Transcriptional regulatory networks controlling EM transition are well-studied; however, alternative RNA splicing also plays a critical regulatory role during this process. Alternative splicing was proved to control the EM transition process, and RNA-binding proteins were determined to regulate alternative splicing. A comprehensive understanding of alternative splicing and the RNA-binding proteins that regulate it during EM transition and their dynamic impact on breast cancer remains largely unknown. To accurately study the dynamic regulatory relationships, time-series data of the EM transition process are essential. However, only cross-sectional data of epithelial and mesenchymal specimens are available. Therefore, we developed a pseudotemporal causality-based Bayesian (PCB) approach to infer the dynamic regulatory relationships between alternative splicing events and RNA-binding proteins. Our study sheds light on facilitating the regulatory network-based approach to identify key RNA-binding proteins or target alternative splicing events for the diagnosis or treatment of cancers. The data and code for PCB are available at: http://hkumath.hku.hk/~wkc/PCB(data+code).zip.


Subjects
Breast Neoplasms, Humans, Female, Breast Neoplasms/metabolism, Bayes Theorem, Cross-Sectional Studies, Tumor Cell Line, Neoplastic Processes, RNA-Binding Proteins/genetics, RNA-Binding Proteins/metabolism, Alternative Splicing/genetics, Epithelial-Mesenchymal Transition/genetics
6.
IEEE Trans Neural Netw Learn Syst ; 34(2): 921-931, 2023 Feb.
Article in English | MEDLINE | ID: mdl-34428155

ABSTRACT

An autoencoder is a layered neural network whose structure can be viewed as consisting of an encoder, which compresses an input vector to a lower dimensional vector, and a decoder, which transforms the low-dimensional vector back to the original input vector (or one that is very similar). In this article, we explore the compressive power of autoencoders that are Boolean threshold networks by studying the numbers of nodes and layers that are required to ensure that each vector in a given set of distinct input binary vectors is transformed back to its original. We show that for any set of n distinct vectors there exists a seven-layer autoencoder with the optimal compression ratio (i.e., the size of the middle layer is logarithmic in n), but that there is a set of n vectors for which there is no three-layer autoencoder with a middle layer of logarithmic size. In addition, we present a kind of tradeoff: if the compression ratio is allowed to be considerably larger than the optimal, then there is a five-layer autoencoder. We also study the numbers of nodes and layers required only for encoding, and the results suggest that the decoding part is the bottleneck of autoencoding. For example, there always is a three-layer Boolean threshold encoder that compresses n vectors into a dimension that is twice the logarithm of n.

7.
Comput Biol Chem ; 100: 107747, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35932551

ABSTRACT

Recently, identifying robust biomarkers or signatures from gene expression profiling data has attracted much attention in computational biomedicine. The successful discovery of biomarkers for complex diseases such as spontaneous preterm birth (SPTB) and high-grade serous ovarian cancer (HGSOC) will be beneficial to reduce the risk of preterm birth and ovarian cancer among women for early detection and intervention. In this paper, we propose a stable machine learning-recursive feature elimination (StabML-RFE for short) strategy for screening robust biomarkers from high-throughput gene expression data. We employ eight popular machine learning methods, namely AdaBoost (AB), Decision Tree (DT), Gradient Boosted Decision Trees (GBDT), Naive Bayes (NB), Neural Network (NNET), Random Forest (RF), Support Vector Machine (SVM) and XGBoost (XGB), to train on all feature genes of training data, apply recursive feature elimination (RFE) to remove the least important features sequentially, and obtain eight gene subsets with feature importance ranking. Then we select the top-ranking features in each ranked subset as the optimal feature subset. We establish a stability metric aggregated with classification performance on test data to assess the robustness of the eight different feature selection techniques. Finally, StabML-RFE chooses the high-frequent features in the subsets of the combination with maximum stability value as robust biomarkers. Particularly, we verify the screened biomarkers not only via internal validation, functional enrichment analysis and literature check, but also via external validation on two real-world SPTB and HGSOC datasets respectively. Obviously, the proposed StabML-RFE biomarker discovery pipeline easily serves as a model for identifying diagnostic biomarkers for other complex diseases from omics data. The source code and data can be found at https://github.com/zpliulab/StabML-RFE.


Subjects
Ovarian Neoplasms, Premature Birth, Algorithms, Bayes Theorem, Biomarkers/metabolism, Female, Gene Expression, Humans, Newborn Infant, Machine Learning, Ovarian Neoplasms/diagnosis, Ovarian Neoplasms/genetics, Support Vector Machine
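The two generic building blocks named in this abstract, recursive feature elimination and the selection of features that recur across ranked subsets, can be sketched as below. This is an illustration under stated assumptions rather than the StabML-RFE pipeline: the importance scores are fixed toy numbers instead of the output of a trained model, the stability metric is omitted, and all names are hypothetical.

```python
from collections import Counter


def rfe(features, importance_fn, n_keep):
    """Recursive feature elimination: repeatedly re-score the remaining
    features and drop the least important one until n_keep are left."""
    remaining = list(features)
    while len(remaining) > n_keep:
        scores = importance_fn(remaining)  # would be a model refit in practice
        worst = min(remaining, key=lambda f: scores[f])
        remaining.remove(worst)
    return remaining


def frequent_features(subsets, min_count):
    """Keep features that appear in at least min_count of the given subsets."""
    counts = Counter(f for subset in subsets for f in set(subset))
    return sorted(f for f, c in counts.items() if c >= min_count)


# Toy importance function: fixed scores stand in for a fitted learner.
toy_scores = {"g1": 0.9, "g2": 0.1, "g3": 0.5, "g4": 0.3}
selected = rfe(["g1", "g2", "g3", "g4"],
               lambda feats: {f: toy_scores[f] for f in feats},
               n_keep=2)

# Subsets as they might come from three different learners (made up).
robust = frequent_features([["g1", "g3"], ["g1", "g2"], ["g3", "g1"]], min_count=2)
```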
8.
IEEE Trans Neural Netw Learn Syst ; 33(9): 4147-4159, 2022 Sep.
Article in English | MEDLINE | ID: mdl-33587712

ABSTRACT

We study the distribution of successor states in Boolean networks (BNs). The state vector y is called a successor of x if y = F(x) holds, where x, y ∈ {0,1}^n are state vectors and F is an ordered set of Boolean functions describing the state transitions. This problem is motivated by analyzing how information propagates via hidden layers in Boolean threshold networks (a discrete model of neural networks) and is kept or lost during time evolution in BNs. In this article, we measure the distribution via entropy and study how entropy changes via the transition from x to y, assuming that x is given uniformly at random. We focus on BNs consisting of exclusive OR (XOR) functions, canalyzing functions, and threshold functions. As a main result, we show that there exists a BN consisting of d-ary XOR functions, which preserves the entropy if d is odd and , whereas there does not exist such a BN if d is even. We also show that there exists a specific BN consisting of d-ary threshold functions, which preserves the entropy if [Formula: see text]. Furthermore, we theoretically analyze the upper and lower bounds of the entropy for BNs consisting of canalyzing functions and perform computational experiments using BN models of real biological networks.
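The quantity studied in this abstract, the entropy of the successor state y = F(x) when x is uniform over {0,1}^n, can be computed by brute force for small n. The sketch below is only an illustration of that definition; the function name and the two toy networks are assumptions and are not taken from the article.

```python
from collections import Counter
from itertools import product
from math import log2


def successor_entropy(functions, n):
    """Shannon entropy (in bits) of y = F(x) for x uniform over {0,1}^n.

    functions is an ordered list of Boolean functions, each mapping the
    full state tuple x to 0 or 1.
    """
    counts = Counter(
        tuple(f(x) for f in functions) for x in product((0, 1), repeat=n)
    )
    total = 2 ** n
    return -sum((c / total) * log2(c / total) for c in counts.values())


# A bijective toy network: all four successor states are equally likely,
# so the full 2 bits of entropy are preserved.
preserving = [lambda x: x[0] ^ x[1], lambda x: x[1]]

# Both nodes compute the same XOR: only two successor states occur,
# so one bit of entropy is lost.
lossy = [lambda x: x[0] ^ x[1], lambda x: x[0] ^ x[1]]

h_preserving = successor_entropy(preserving, 2)
h_lossy = successor_entropy(lossy, 2)
```

Exhaustive enumeration costs 2^n evaluations, so this check is only practical for small networks.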

9.
Sensors (Basel) ; 21(18)2021 Sep 21.
Article in English | MEDLINE | ID: mdl-34577516

ABSTRACT

Pallet management as a backbone of logistics and supply chain activities is essential to supply chain parties, while a number of regulations, standards and operational constraints are considered in daily operations. In recent years, pallet pooling has been unconventionally advocated to manage pallets in a closed-loop system to enhance the sustainability and operational effectiveness, but pitfalls in terms of service reliability, quality compliance and pallet limitation when using a single service provider may occur. Therefore, this study incorporates a decentralisation mechanism into the pallet management to formulate a technological eco-system for pallet pooling, namely Pallet as a Service (PalletaaS), raised by the foundation of consortium blockchain and Internet of things (IoT). Consortium blockchain is regarded as the blockchain 3.0 to facilitate more industrial applications, except cryptocurrency, and the synergy of integrating a consortium blockchain and IoT is thus investigated. The corresponding layered architecture is proposed to structure the system deployment in the industry, in which the location-inventory-routing problem for pallet pooling is formulated. To demonstrate the values of this study, a case analysis to illustrate the human-computer interaction and pallet pooling operations is conducted. Overall, this study standardises the decentralised pallet management in the closed-loop mechanism, resulting in a constructive impact to sustainable development in the logistics industry.


Subjects
Blockchain, Internet of Things, Humans, Reproducibility of Results
10.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34410342

ABSTRACT

MOTIVATION: The epithelial-mesenchymal transition (EMT) is a cellular-developmental process activated during tumor metastasis. Transcriptional regulatory networks controlling EMT are well studied; however, alternative RNA splicing also plays a critical regulatory role during this process. Unfortunately, a comprehensive understanding of alternative splicing (AS) and the RNA-binding proteins (RBPs) that regulate it during EMT remains largely unknown. Therefore, a great need exists to develop effective computational methods for predicting associations of RBPs and AS events. Dramatically increasing data sources that have direct and indirect information associated with RBPs and AS events have provided an ideal platform for inferring these associations. RESULTS: In this study, we propose a novel method for RBP-AS target prediction based on weighted data fusion with sparse matrix tri-factorization (WDFSMF for short) that simultaneously decomposes heterogeneous data source matrices into low-rank matrices to reveal hidden associations. WDFSMF can select and integrate data sources by assigning different weights to those sources, and these weights can be assigned automatically. In addition, WDFSMF can identify significant RBP complexes regulating AS events and eliminate noise and outliers from the data. Our proposed method achieves an area under the receiver operating characteristic curve (AUC) of 90.78%, which shows that WDFSMF can effectively predict RBP-AS event associations with higher accuracy compared with previous methods. Furthermore, this study identifies significant RBPs as complexes for AS events during EMT and provides solid ground for further investigation into RNA regulation during EMT and metastasis. WDFSMF is a general data fusion framework, and as such it can also be adapted to predict associations between other biological entities.


Subjects
Alternative Splicing, Computational Biology/methods, Epithelial-Mesenchymal Transition/genetics, Neoplastic Gene Expression Regulation, RNA-Binding Proteins/metabolism, Algorithms, Computational Biology/standards, Humans, ROC Curve, Reproducibility of Results, Software
11.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33517359

ABSTRACT

MOTIVATION: The developmental process of epithelial-mesenchymal transition (EMT) is abnormally activated during breast cancer metastasis. Transcriptional regulatory networks that control EMT have been well studied; however, alternative RNA splicing plays a vital regulatory role during this process and the regulating mechanism needs further exploration. Because of the huge cost and complexity of biological experiments, the underlying mechanisms of alternative splicing (AS) and associated RNA-binding proteins (RBPs) that regulate the EMT process remain largely unknown. Thus, there is an urgent need to develop computational methods for predicting potential RBP-AS event associations during EMT. RESULTS: We developed a novel model for RBP-AS target prediction during EMT that is based on inductive matrix completion (RAIMC). Integrated RBP similarities were calculated based on RBP regulating similarity and RBP Gaussian interaction profile (GIP) kernel similarity, while integrated AS event similarities were computed based on AS event module similarity and AS event GIP kernel similarity. Our primary objective was to complete missing or unknown RBP-AS event associations based on known associations and on integrated RBP and AS event similarities. In this paper, we identify significant RBPs for AS events during EMT and discuss potential regulating mechanisms. Our computational results confirm the effectiveness and superiority of our model over other state-of-the-art methods. Our RAIMC model achieved AUC values of 0.9587 and 0.9765 based on leave-one-out cross-validation (CV) and 5-fold CV, respectively, which are larger than the AUC values from the previous models. RAIMC is a general matrix completion framework that can be adopted to predict associations between other biological entities. We further validated the prediction performance of RAIMC on the genes CD44 and MAP3K7. RAIMC can identify the related regulating RBPs for isoforms of these two genes. AVAILABILITY AND IMPLEMENTATION: The source code for RAIMC is available at https://github.com/yushanqiu/RAIMC. CONTACT: zouquan@nclab.net.


Subjects
Alternative Splicing, Breast Neoplasms, Epithelial-Mesenchymal Transition/genetics, Neoplastic Gene Expression Regulation, Gene Regulatory Networks, Neoplasm Proteins, RNA-Binding Proteins, Breast Neoplasms/genetics, Breast Neoplasms/metabolism, Female, Humans, Neoplasm Proteins/genetics, Neoplasm Proteins/metabolism, RNA-Binding Proteins/genetics, RNA-Binding Proteins/metabolism
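The Gaussian interaction profile (GIP) kernel similarity mentioned in this abstract can be sketched as follows. The formula used here is the commonly cited definition, K(i, j) = exp(-γ ||p_i - p_j||²) with γ equal to a constant divided by the mean squared norm of the profiles; whether RAIMC uses exactly this normalization is an assumption, and the function name and the toy association matrix are hypothetical.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between rows of a binary
    association matrix (each row is one entity's interaction profile).

    Assumes at least one profile is non-zero, so the bandwidth is defined.
    """
    n = len(profiles)
    # Bandwidth normalized by the average squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Toy association matrix: three RBPs (rows) against two AS events (columns).
associations = [[1, 0], [1, 0], [0, 1]]
similarity = gip_kernel(associations)
```

Entities with identical known associations get similarity 1, and the similarity decays as their profiles diverge.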
12.
IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2714-2723, 2021.
Article in English | MEDLINE | ID: mdl-32386162

ABSTRACT

Clustering tumor metastasis samples from gene expression data at the whole genome level remains an arduous challenge, in particular, when the number of experimental samples is small and the number of genes is huge. We focus on the prediction of the epithelial-mesenchymal transition (EMT), which is an underlying mechanism of tumor metastasis, here, rather than tumor metastasis itself, to avoid confounding effects of uncertainties derived from various factors. In this paper, we propose a novel model in predicting EMT based on multidimensional scaling (MDS) strategies and integrating entropy and random matrix detection strategies to determine the optimal reduced number of dimension in low dimensional space. We verified our proposed model with the gene expression data for EMT samples of breast cancer and the experimental results demonstrated the superiority over state-of-the-art clustering methods. Furthermore, we developed a novel feature extraction method for selecting the significant genes and predicting the tumor metastasis. The source code is available at "https://github.com/yushanqiu/yushan.qiu-szu.edu.cn".


Subjects
Computational Biology/methods, Epithelial-Mesenchymal Transition/genetics, Multidimensional Scaling Analysis, Unsupervised Machine Learning, Breast Neoplasms/genetics, Breast Neoplasms/pathology, Cluster Analysis, Female, Humans, Neoplasm Metastasis/genetics, Transcriptome/genetics
13.
Article in English | MEDLINE | ID: mdl-29994681

ABSTRACT

The identification of drug side-effects is considered to be an important step in drug design, which could not only shorten the time but also reduce the cost of drug development. In this paper, we investigate the relationship between the potential side-effects of drug candidates and their chemical structures. The preliminary Regularized Regression (RR) model for drug side-effects prediction has promising features in the efficiency of model training and the existence of a closed form solution. It performs better than other state-of-the-art methods in terms of minimum accuracy and average accuracy. In order to investigate how drug structure is associated with side-effects, we further propose a weighted GTS (Generalized T-Student Kernel; WGTS) SVM model from a structural risk minimization perspective. The SVM model proposed in this paper provides a better understanding of drug side-effects in the process of drug development. The usefulness of the WGTS model lies in its superior performance in a cross-validation setting on 888 approved drugs with 1385 side-effects profiled from the SIDER database. This work is expected to shed light on studies that predict potential unidentified side-effects and suggest how we can avoid drug side-effects by the removal of some distinguished chemical structures.


Subjects
Computational Biology/methods, Drug-Related Side Effects and Adverse Reactions, Statistical Models, Pharmaceutical Preparations/chemistry, Humans, Molecular Structure, Regression Analysis, Support Vector Machine
14.
Artif Intell Med ; 95: 96-103, 2019 04.
Article in English | MEDLINE | ID: mdl-30352711

ABSTRACT

Identifying tumor metastasis signatures from gene expression data at the whole genome level remains an arduous challenge, particularly so when the number of genes is huge and the number of experimental samples is small. We focus on the prediction of the epithelial-mesenchymal transition (EMT), which is an underlying mechanism of tumor metastasis, here, rather than on tumor metastasis itself, to avoid confounding effects of uncertainties derived from various factors. We apply an extended LASSO model, L1/2-regularization model, as a feature selector, to identify significant RNA-binding proteins (RBPs) that contribute to regulating the EMT. We find that the L1/2-regularization model significantly outperforms LASSO in the EMT regulation problem. Furthermore, remarkable improvement in L1/2-regularization model classification performance can be achieved by incorporating extra information, specifically correlation values. We demonstrate that the L1/2-regularization model is applicable for identifying significant RBPs in biological research. Identified RBPs will facilitate study of the underlying mechanisms of the EMT.


Subjects
Epithelial-Mesenchymal Transition, RNA-Binding Proteins/physiology, Algorithms, Tumor Cell Line, Humans, Biological Models
15.
J Theor Biol ; 463: 1-11, 2019 02 21.
Article in English | MEDLINE | ID: mdl-30543810

ABSTRACT

It is known that many driver nodes are required to control complex biological networks. Previous studies imply that O(N) driver nodes are required in both linear complex network and Boolean network models with N nodes if an arbitrary state is specified as the target. In order to cope with this intrinsic difficulty, we consider a special case of the control problem in which the targets are restricted to attractors. For this special case, we mathematically prove under the uniform distribution of states in basins that the expected number of driver nodes is only O(log₂ N + log₂ M) for controlling Boolean networks, where M is the number of attractors. Since it is expected that M is not very large in many practical networks, the new model requires a much smaller number of driver nodes. This result is based on the discovery of novel relationships between control problems on Boolean networks and the coupon collector's problem, a well-known concept in combinatorics. We also provide lower bounds on the number of driver nodes as well as simulation results using artificial and realistic network data, which support our theoretical findings.


Subjects
Biological Models, Theoretical Models, Algorithms, Systems Biology/methods
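For readers unfamiliar with the coupon collector's problem named in this abstract, the classical result is that collecting all n equally likely coupon types takes n · (1 + 1/2 + ... + 1/n) draws in expectation. The sketch below computes that value exactly; it illustrates the textbook problem only and does not reproduce the article's argument linking it to driver nodes. The function name is hypothetical.

```python
from fractions import Fraction


def expected_draws(n_types):
    """Expected number of uniform random draws needed to see every one of
    n_types coupon types at least once: n * H_n, computed exactly."""
    harmonic = sum(Fraction(1, k) for k in range(1, n_types + 1))
    return n_types * harmonic


draws_for_three = expected_draws(3)  # 3 * (1 + 1/2 + 1/3)
```

Because the harmonic number H_n grows like ln n, the expectation grows like n ln n, which is the kind of logarithmic factor that appears in bounds derived from this problem.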
16.
BMC Syst Biol ; 12(Suppl 1): 7, 2018 04 11.
Article in English | MEDLINE | ID: mdl-29671395

ABSTRACT

BACKGROUND: Traditional drug discovery methods focused on the efficacy of drugs rather than their toxicity. However, toxicity and/or lack of efficacy are produced when unintended targets are affected in metabolic networks. Thus, identification of biological targets which can be manipulated to produce the desired effect with minimum side-effects has become an important and challenging topic. Efficient computational methods are required to identify the drug targets while incurring minimal side-effects. RESULTS: In this paper, we propose a graph-based computational damage model that summarizes the impact of enzymes on compounds in metabolic networks. An efficient method based on Integer Linear Programming formalism is then developed to identify the optimal enzyme-combination so as to minimize the side-effects. The identified target enzymes for known successful drugs are then verified by comparing the results with those in the existing literature. CONCLUSIONS: Side-effects reduction plays a crucial role in the study of drug development. A graph-based computational damage model is proposed and the theoretical analysis shows that the captured problem is NP-complete. The proposed approaches can therefore contribute to the discovery of drug targets. Our developed software is available at "http://hkumath.hku.hk/~wkc/APBC2018-metabolic-network.zip".


Subjects
Computational Biology/methods, Metabolic Networks and Pathways, Linear Programming, Algorithms, Computer Graphics, Drug Discovery
17.
IEEE Trans Neural Netw Learn Syst ; 29(4): 869-881, 2018 04.
Article in English | MEDLINE | ID: mdl-28129190

ABSTRACT

This paper studies the problem of exactly identifying the structure of a probabilistic Boolean network (PBN) from a given set of samples, where PBNs are probabilistic extensions of Boolean networks. Cheng et al. studied the problem while focusing on PBNs consisting of pairs of AND/OR functions. This paper considers PBNs consisting of Boolean threshold functions while focusing on those threshold functions that have unit coefficients. The treatment of Boolean threshold functions, and triplets and -tuplets of such functions, necessitates a deepening of the theoretical analyses. It is shown that wide classes of PBNs with such threshold functions can be exactly identified from samples under reasonable constraints, which include: 1) PBNs in which any number of threshold functions can be assigned provided that all have the same number of input variables and 2) PBNs consisting of pairs of threshold functions with different numbers of input variables. It is also shown that the problem of deciding the equivalence of two Boolean threshold functions is solvable in pseudopolynomial time but remains co-NP complete.

18.
IET Syst Biol ; 11(1): 30-35, 2017 02.
Article in English | MEDLINE | ID: mdl-28303791

ABSTRACT

The Boolean network (BN) is a popular mathematical model for revealing the behaviour of a genetic regulatory network. Furthermore, observability, an important network feature, plays a significant role in understanding the underlying network. Several studies have been done on analysis of observability of BNs and complex networks. However, the observability of attractor cycles, which can serve for biomarker detection, has not yet been addressed in the literature. This is an important, interesting and challenging problem that deserves a detailed study. In this study, a novel problem was first proposed on attractor observability in BNs. Identification of the minimum set of consecutive nodes can be used to discriminate different attractors. Furthermore, it can serve as a biomarker for different disease types (represented as different attractor cycles). Then a novel integer programming method was developed to identify the desired set of nodes. The proposed approach is demonstrated and verified by numerical examples. The computational results further illustrate that the proposed model is effective and efficient.


Subjects
Gene Expression Regulation/genetics, Gene Regulatory Networks/genetics, Genetic Models, Statistical Models, Proteome/genetics, Signal Transduction/genetics, Algorithms, Computer Simulation, Humans
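The discrimination problem in this abstract can be illustrated on a much simpler case. The sketch below handles only singleton attractors (fixed points), ignores the article's "consecutive nodes" condition, and swaps the integer programming formulation for a brute-force search; these simplifications, the function name, and the toy attractors are all assumptions.

```python
from itertools import combinations


def min_discriminating_nodes(attractors):
    """Smallest set of node indices whose values tell all given fixed-point
    attractors apart, found by exhaustive search over subsets by size.

    Returns None if no subset works (for example, duplicate attractors).
    """
    n = len(attractors[0])
    for size in range(1, n + 1):
        for nodes in combinations(range(n), size):
            # Project every attractor onto the candidate nodes.
            projections = {tuple(a[i] for i in nodes) for a in attractors}
            if len(projections) == len(attractors):
                return nodes
    return None


# Three hypothetical fixed points of a three-node network: no single node
# separates all of them, but nodes 0 and 1 together do.
attractors = [(0, 0, 1), (0, 1, 1), (1, 1, 0)]
observed = min_discriminating_nodes(attractors)
```

The search is exponential in the number of nodes, which is one reason an integer programming formulation is attractive for realistic networks.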
19.
BMC Syst Biol ; 11(Suppl 7): 138, 2017 12 21.
Article in English | MEDLINE | ID: mdl-29322919

ABSTRACT

BACKGROUND: Breast cancer is one of the leading causes of death for women. It is of great necessity to develop effective methods for breast cancer detection and diagnosis. Recent studies have focused on gene-based signatures for outcome predictions. Kernel SVM has attracted a lot of attention for its discriminative power in dealing with small sample pattern recognition problems. But how to select or construct an appropriate kernel for a specified problem still needs further investigation. RESULTS: Here we propose a novel kernel (Hadamard Kernel) in conjunction with Support Vector Machines (SVMs) to address the problem of breast cancer outcome prediction using gene expression data. Hadamard Kernel outperforms the classical kernels and the correlation kernel in terms of Area under the ROC Curve (AUC) values, where a number of real-world data sets are adopted to test the performance of different methods. CONCLUSIONS: Hadamard Kernel SVM is effective for breast cancer predictions, either in terms of prognosis or diagnosis. It may benefit patients by guiding therapeutic options. Apart from that, it would be a valuable addition to the current SVM kernel families. We hope it will contribute to the wider biology and related communities.


Subjects
Breast Neoplasms/diagnosis, Computational Biology/methods, Support Vector Machine, Factual Databases, Humans, Prognosis, ROC Curve
20.
BMC Syst Biol ; 11(Suppl 6): 115, 2017 12 14.
Article in English | MEDLINE | ID: mdl-29297357

ABSTRACT

BACKGROUND: Positive semi-definiteness is a critical property in kernel methods for Support Vector Machine (SVM) by which efficient solutions can be guaranteed through convex quadratic programming. However, a lot of similarity functions in applications do not produce positive semi-definite kernels. METHODS: We propose a projection method by constructing a projection matrix on indefinite kernels. As a generalization of the spectrum method (denoising method and flipping method), the projection method shows better or comparable performance compared to the corresponding indefinite kernel methods on a number of real world data sets. Under the Bregman matrix divergence theory, we can find a suggested optimal λ in the projection method using unconstrained optimization in kernel learning. In this paper we focus on optimal λ determination, in the pursuit of a precise optimal λ determination method in an unconstrained optimization framework. We developed a perturbed von-Neumann divergence to measure kernel relationships. RESULTS: We compared optimal λ determination with Logdet Divergence and perturbed von-Neumann Divergence, aiming at finding a better λ in the projection method. Results on a number of real world data sets show that the projection method with optimal λ by Logdet divergence demonstrates near optimal performance. The perturbed von-Neumann Divergence can also help determine a relatively better optimal projection method. CONCLUSIONS: The projection method is easy to use for dealing with indefinite kernels. The parameter embedded in the method can be determined through unconstrained optimization under Bregman matrix divergence theory. This may provide a new way in kernel SVMs for varied objectives.


Subjects
Supervised Machine Learning, Support Vector Machine, Algorithms, Artificial Intelligence, Datasets as Topic, Automated Pattern Recognition/methods