Results 1 - 8 of 8
1.
Bioinformatics ; 33(19): 3018-3027, 2017 Oct 01.
Article in English | MEDLINE | ID: mdl-28595376

ABSTRACT

MOTIVATION: High-throughput mRNA sequencing (RNA-Seq) is a powerful tool for quantifying gene expression. Identification of transcript isoforms that are differentially expressed in different conditions, such as in patients and healthy subjects, can provide insights into the molecular basis of diseases. Current transcript quantification approaches, however, do not take advantage of the shared information in the biological replicates, potentially decreasing sensitivity and accuracy. RESULTS: We present a novel hierarchical Bayesian model called Differentially Expressed Isoform detection from Multiple biological replicates (DEIsoM) for identifying differentially expressed (DE) isoforms from multiple biological replicates representing two conditions, e.g. multiple samples from healthy and diseased subjects. DEIsoM first estimates isoform expression within each condition by (1) capturing common patterns from sample replicates while allowing individual differences, and (2) modeling the uncertainty introduced by ambiguous read mapping in each replicate. Specifically, we introduce a Dirichlet prior distribution to capture the common expression pattern of replicates from the same condition, and treat the isoform expression of individual replicates as samples from this distribution. Ambiguous read mapping is modeled as a multinomial distribution, and ambiguous reads are assigned to the most probable isoform in each replicate. Additionally, DEIsoM couples efficient variational inference with a post-analysis method to improve the accuracy and speed of identification of DE isoforms over alternative methods. Application of DEIsoM to a hepatocellular carcinoma (HCC) dataset identifies biologically relevant DE isoforms. The relevance of these genes/isoforms to HCC is supported by principal component analysis (PCA), read coverage visualization, and the biological literature. AVAILABILITY AND IMPLEMENTATION: The software is available at https://github.com/hao-peng/DEIsoM. CONTACT: pengh@alumni.purdue.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
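The generative idea in this abstract (a shared Dirichlet prior per condition, replicate-level isoform proportions drawn from it, and reads assigned to isoforms multinomially) can be illustrated with a small simulation. This is only a sketch under simplifying assumptions and is not the DEIsoM implementation: the prior values in `alpha`, the function names, and the read count are hypothetical, and it ignores isoform length, read-to-isoform compatibility, and the variational inference step.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def simulate_condition(alpha, n_replicates, n_reads, rng):
    """Simulate per-replicate isoform proportions and read counts.

    All replicates of one condition share the same Dirichlet prior `alpha`,
    but each replicate gets its own draw of isoform proportions.
    """
    replicates = []
    for _ in range(n_replicates):
        # Replicate-level isoform proportions: one draw from the shared prior.
        theta = sample_dirichlet(alpha, rng)
        # Each read is assigned to one isoform with probability theta.
        assigned = rng.choices(range(len(alpha)), weights=theta, k=n_reads)
        counts = [assigned.count(i) for i in range(len(alpha))]
        replicates.append((theta, counts))
    return replicates

rng = random.Random(0)
alpha = [8.0, 2.0, 1.0]  # hypothetical prior over three isoforms of one gene
replicates = simulate_condition(alpha, n_replicates=3, n_reads=1000, rng=rng)
```

Larger values in `alpha` pull the replicates closer to a common expression pattern, while smaller values allow more individual variation between replicates.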


Subjects
Gene Expression Profiling, RNA Isoforms/metabolism, RNA Sequence Analysis, Bayes Theorem, Hepatocellular Carcinoma/genetics, Hepatocellular Carcinoma/metabolism, Computer Simulation, Humans, Liver Neoplasms/genetics, Liver Neoplasms/metabolism, Software
2.
Bioinformatics ; 29(16): 1987-96, 2013 Aug 15.
Article in English | MEDLINE | ID: mdl-23749986

ABSTRACT

MOTIVATION: By capturing various biochemical interactions, biological pathways provide insight into underlying biological processes. Given high-dimensional microarray or RNA-sequencing data, a critical challenge is how to integrate them with rich information from pathway databases to jointly select relevant pathways and genes for phenotype prediction or disease prognosis. Addressing this challenge can help us deepen biological understanding of phenotypes and diseases from a systems perspective. RESULTS: In this article, we propose a novel sparse Bayesian model for joint network and node selection. This model integrates information from networks (e.g. pathways) and nodes (e.g. genes) by a hybrid of conditional and generative components. For the conditional component, we propose a sparse prior based on graph Laplacian matrices, each of which encodes detailed correlation structures between network nodes. For the generative component, we use a spike and slab prior over network nodes. The integration of these two components, coupled with efficient variational inference, enables the selection of networks as well as correlated network nodes in the selected networks. Simulation results demonstrate improved predictive performance and selection accuracy of our method over alternative methods. Based on three expression datasets for cancer study and the KEGG pathway database, we selected relevant genes and pathways, many of which are supported by biological literature. In addition to pathway analysis, our method is expected to have a wide range of applications in selecting relevant groups of correlated high-dimensional biomarkers. AVAILABILITY: The code can be downloaded at www.cs.purdue.edu/homes/szhe/software.html. CONTACT: alanqi@purdue.edu.
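A graph Laplacian encodes how strongly connected nodes are tied together, which is why it is a natural ingredient for a prior over correlated genes in a pathway. The sketch below assumes the unnormalized Laplacian L = D - A and shows the quadratic form w^T L w, which equals the sum of squared weight differences across edges. The abstract does not give the exact form of the prior, so this is an illustration of the general idea, not the authors' model; the toy pathway and function names are hypothetical.

```python
def graph_laplacian(adjacency):
    """Return the unnormalized Laplacian L = D - A of an undirected graph."""
    n = len(adjacency)
    laplacian = [[0.0] * n for _ in range(n)]
    for i in range(n):
        degree = sum(adjacency[i])
        for j in range(n):
            diagonal = degree if i == j else 0.0
            laplacian[i][j] = diagonal - adjacency[i][j]
    return laplacian

def laplacian_penalty(weights, laplacian):
    """Compute w^T L w, which is small when connected nodes have similar weights."""
    n = len(weights)
    return sum(
        weights[i] * laplacian[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )

# Hypothetical three-gene pathway forming a chain: gene0 - gene1 - gene2.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
L = graph_laplacian(adjacency)
smooth = laplacian_penalty([1.0, 1.0, 1.0], L)   # identical weights on all genes
rough = laplacian_penalty([1.0, 0.0, -1.0], L)   # weights differ across each edge
```

A prior that penalizes this quantity favors weight vectors in which neighboring genes in the pathway receive similar coefficients.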


Subjects
Gene Expression Profiling, Gene Regulatory Networks, Algorithms, Bayes Theorem, Pancreatic Ductal Carcinoma/genetics, Pancreatic Ductal Carcinoma/metabolism, Colorectal Neoplasms/genetics, Colorectal Neoplasms/metabolism, Factual Databases, Genomics/methods, Humans, Diffuse Large B-Cell Lymphoma/genetics, Diffuse Large B-Cell Lymphoma/metabolism, Pancreatic Neoplasms/genetics, Pancreatic Neoplasms/metabolism
3.
JCO Clin Cancer Inform ; 7: e2300057, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37490642

ABSTRACT

PURPOSE: To determine prognostic and predictive clinical outcomes in metastatic hormone-sensitive prostate cancer (mHSPC) and metastatic castrate-resistant prostate cancer (mCRPC) on the basis of a combination of plasma-derived genomic alterations and lipid features in a longitudinal cohort of patients with advanced prostate cancer. METHODS: A multifeature classifier was constructed to predict clinical outcomes using plasma-based genomic alterations detected in 120 genes and 772 lipidomic species as informative features in a cohort of 71 patients with mHSPC and 144 patients with mCRPC. Outcomes of interest were collected over 11 years of follow-up. In the mHSPC state, these were early failure of androgen-deprivation therapy (ADT) and exceptional response to ADT; in the mCRPC state, they were early death (poor prognosis) and long-term survival. The approach was to build binary classification models that identified discriminative candidates with optimal weights to predict outcomes. To achieve this, we built multi-omic feature-based classifiers using traditional machine learning (ML) methods, including logistic regression with sparse regularization, multi-kernel Gaussian process regression, and support vector machines. RESULTS: The levels of specific ceramides (d18:1/14:0 and d18:1/17:0), and the presence of CHEK2 mutations, AR amplification, and RB1 deletion were identified as the most crucial factors associated with clinical outcomes. Using ML models, the optimal multi-omic feature combination yielded AUC scores of 0.751 for predicting mHSPC survival and 0.638 for predicting ADT failure; and in the mCRPC state, 0.687 for prognostication and 0.727 for exceptional survival. The models outperformed those built on a limited number of candidate features for developing multi-omic prognostic and predictive signatures. CONCLUSION: Using an ML approach that incorporates multiple omic features significantly improves the prediction accuracy for metastatic prostate cancer outcomes. Validation of these models will be needed in independent data sets in the future.
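The AUC scores reported above summarize how well a binary classifier ranks positive cases above negative ones. As a reminder of what the number means, the following sketch computes AUC directly from its pairwise definition. The labels and scores are made-up toy values, not data from the study, and the function name is hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve as the fraction of correctly ranked pairs.

    For every (positive, negative) pair, count 1 if the positive case has the
    higher score, 0.5 if the scores tie, and 0 otherwise.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy example: 1 = outcome occurred, 0 = it did not.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported values between 0.638 and 0.751 indicate moderate discrimination.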


Subjects
Castration-Resistant Prostatic Neoplasms, Male, Humans, Castration-Resistant Prostatic Neoplasms/diagnosis, Castration-Resistant Prostatic Neoplasms/genetics, Castration-Resistant Prostatic Neoplasms/therapy, Androgen Antagonists/therapeutic use, Lipidomics, Multiomics, Retrospective Studies, Genomics
4.
Sci Rep ; 12(1): 1355, 2022 Jan 25.
Article in English | MEDLINE | ID: mdl-35079127

ABSTRACT

Accurately predicting red blood cell (RBC) transfusion requirements in cardiothoracic (CT) surgery could improve blood inventory management and be used as a surrogate marker for assessing hemorrhage risk preoperatively. We developed a machine learning (ML) method to predict intraoperative RBC transfusions in CT surgery. A detailed database containing time-stamped clinical variables for all CT surgeries from 5/2014-6/2019 at a single center (n = 2410) was used for model development. After random forest feature selection, surviving features were inputs for ML algorithms using five-fold cross-validation. The dataset was updated with 437 additional cases from 8/2019-8/2020 for validation. We developed and validated a hybrid ML method given the skewed nature of the dataset. Our Gaussian Process (GP) regression ML algorithm accurately predicted RBC transfusion amounts of 0 and 1-3 units (root mean square error, RMSE 0.117 and 1.705, respectively) and our GP classification ML algorithm accurately predicted 4+ RBC units transfused (area under the curve, AUC = 0.826). The final prediction is the regression result if classification predicted <4 units transfused, or the classification result if 4+ units were predicted. We developed and validated an ML method to accurately predict intraoperative RBC transfusions in CT surgery using local data.
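The decision rule that combines the two models is simple enough to write down. The sketch below assumes the classifier output has already been thresholded into a boolean and that the regression output is a number of units; the function name, argument names, and the "4+" label are hypothetical choices for illustration, and the GP models themselves are not shown.

```python
def hybrid_prediction(regression_units, classifier_predicts_4_plus):
    """Combine regression and classification outputs into one prediction.

    If the classifier predicts 4 or more units, report that class; otherwise
    fall back to the regression estimate of the number of units.
    """
    if classifier_predicts_4_plus:
        return "4+"
    return regression_units

low_case = hybrid_prediction(1.7, classifier_predicts_4_plus=False)
high_case = hybrid_prediction(1.7, classifier_predicts_4_plus=True)
```

Splitting the problem this way lets the regression model focus on the common low-volume cases while the classifier handles the rare high-volume tail of a skewed dataset.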


Subjects
Machine Learning, Thoracic Surgery/methods, Female, Humans, Male, Middle Aged, Retrospective Studies, Risk Factors
5.
Neural Netw ; 130: 11-21, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32589587

ABSTRACT

Deep neural networks (DNNs) have achieved outstanding performance in a wide range of applications, e.g., image classification and natural language processing. Despite this performance, the huge number of parameters in DNNs makes them hard to train efficiently and to deploy on low-end devices with limited computing resources. In this paper, we explore the correlations in the weight matrices and approximate the weight matrices with low-rank block-term tensors. We call the resulting structure block-term tensor layers (BT-layers), which can be easily adapted to neural network models such as CNNs and RNNs. In particular, the inputs and outputs of BT-layers are reshaped into low-dimensional high-order tensors with similar or improved representation power. Extensive experiments demonstrate that BT-layers in CNNs and RNNs can achieve a very large compression ratio on the number of parameters while preserving or improving the representation power of the original DNNs.
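To get a feel for where the compression comes from, one can compare parameter counts of a dense layer and a block-term factorization. The sketch below assumes each block term is a Tucker-style product of one core tensor with R^d entries and d factor matrices of shape (I_k * J_k) x R, where the I_k and J_k are the reshaped input and output mode sizes. The abstract does not state this formula, so treat the count, the function names, and the example sizes as illustrative assumptions.

```python
import math

def dense_params(input_modes, output_modes):
    """Parameter count of a dense weight matrix of size prod(I_k) x prod(J_k)."""
    return math.prod(input_modes) * math.prod(output_modes)

def block_term_params(input_modes, output_modes, rank, n_blocks):
    """Parameter count of a block-term layer (assumed Tucker-style blocks).

    Each block holds d factor matrices of shape (I_k * J_k) x rank plus one
    core tensor with rank ** d entries.
    """
    d = len(input_modes)
    factors = sum(i * j * rank for i, j in zip(input_modes, output_modes))
    core = rank ** d
    return n_blocks * (factors + core)

# Hypothetical 256 x 256 layer reshaped into four modes of size 4 on each side.
input_modes = [4, 4, 4, 4]
output_modes = [4, 4, 4, 4]
dense = dense_params(input_modes, output_modes)
compressed = block_term_params(input_modes, output_modes, rank=2, n_blocks=2)
ratio = dense / compressed
```

Under these assumptions the factorized layer stores a few hundred parameters instead of tens of thousands, and the ratio grows quickly with the layer size for fixed rank and block count.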


Subjects
Natural Language Processing, Computer Neural Networks, Data Compression/methods
6.
IEEE Trans Neural Netw Learn Syst ; 30(2): 369-378, 2019 Feb.
Article in English | MEDLINE | ID: mdl-29994133

ABSTRACT

Kernel support vector machines (SVMs) deliver state-of-the-art results in many real-world nonlinear classification problems, but maintaining a large number of support vectors can be computationally demanding. Linear SVM, on the other hand, is highly scalable to large data but only suited for linearly separable problems. In this paper, we propose a novel approach called low-rank linearized SVM to scale up kernel SVM on limited resources. Our approach transforms a nonlinear SVM to a linear one via an approximate empirical kernel map computed from efficient kernel low-rank decompositions. We theoretically analyze the gap between the solutions of the approximate and optimal rank-k kernel map, which in turn provides guidance on the sampling scheme of the Nyström approximation. Furthermore, we extend it to a semisupervised metric learning scenario in which partially labeled samples can be exploited to further improve the quality of the low-rank embedding. Our approach inherits the rich representability of kernel SVM and the high efficiency of linear SVM. Experimental results demonstrate that our approach is more robust and achieves a better tradeoff between model representability and scalability than state-of-the-art algorithms for large-scale SVMs.
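The core idea of linearizing a kernel SVM with a Nyström approximation can be shown in a few lines: pick a small set of landmark points, and map every input x to an explicit feature vector whose inner products approximate the kernel. The sketch below uses an RBF kernel and a Cholesky factor of the landmark kernel matrix, so that phi(x) . phi(y) = k_x^T K_mm^{-1} k_y. It is a generic Nyström feature map, not the paper's algorithm; the landmark choice, `gamma`, and all function names are assumptions, and a linear SVM would then be trained on the mapped features.

```python
import math

def rbf(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower

def nystrom_map(landmarks, gamma):
    """Return a function mapping x to its Nystrom feature vector."""
    k_mm = [[rbf(a, b, gamma) for b in landmarks] for a in landmarks]
    lower = cholesky(k_mm)

    def features(x):
        k_x = [rbf(x, z, gamma) for z in landmarks]
        # Forward substitution: solve lower * phi = k_x for phi.
        phi = []
        for i in range(len(k_x)):
            partial = sum(lower[i][k] * phi[k] for k in range(i))
            phi.append((k_x[i] - partial) / lower[i][i])
        return phi

    return features

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

landmarks = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # hypothetical landmark set
features = nystrom_map(landmarks, gamma=1.0)
approx = dot(features(landmarks[0]), features(landmarks[1]))
exact = rbf(landmarks[0], landmarks[1], gamma=1.0)
```

On the landmark points themselves the approximation reproduces the kernel exactly; for other points its quality depends on how the landmarks are sampled, which is the question the abstract's theoretical analysis addresses.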

7.
IEEE Trans Neural Netw Learn Syst ; 30(1): 318-324, 2019 Jan.
Article in English | MEDLINE | ID: mdl-29994274

ABSTRACT

Link prediction is a fundamental problem in network modeling. One family of link prediction approaches treats network data as an exchangeable array whose entries can be explained by random functions (e.g., block models and Gaussian processes) over latent node factors. Despite their powerful ability to model missing links, these models tend to have high computational complexity and are therefore hard to apply to large networks. To address this problem, we develop a novel variational random function model by defining latent Gaussian processes on exchangeable arrays. This model not only inherits the ability of Gaussian processes to describe nonlinear interactions between nodes, but also enjoys a significant reduction in computational complexity. To make the model scalable to large network data, we develop an efficient key-value-free strategy under the map-reduce framework that greatly reduces inference time. Experimental results on large network data demonstrate both the efficacy and efficiency of the proposed method over state-of-the-art methods in network modeling.

8.
Pac Symp Biocomput ; : 300-11, 2014.
Article in English | MEDLINE | ID: mdl-24297556

ABSTRACT

A key step in the study of Alzheimer's disease (AD) is to identify associations between genetic variations and intermediate phenotypes (e.g., brain structures). At the same time, it is crucial to develop a noninvasive means for AD diagnosis. Although these two tasks (association discovery and disease diagnosis) have been treated separately by a variety of approaches, they are tightly coupled due to their common biological basis. We hypothesize that the two tasks can potentially benefit each other by a joint analysis, because (i) the association study discovers correlated biomarkers from different data sources, which may help improve diagnosis accuracy, and (ii) the disease status may help identify disease-sensitive associations between genetic variations and MRI features. Based on this hypothesis, we present a new sparse Bayesian approach for joint association study and disease diagnosis. In this approach, common latent features are extracted from different data sources based on sparse projection matrices and used to predict multiple disease severity levels based on Gaussian process ordinal regression; in return, the disease status is used to guide the discovery of relationships between the data sources. The sparse projection matrices not only reveal the associations but also select groups of biomarkers related to AD. To learn the model from data, we develop an efficient variational expectation maximization algorithm. Simulation results demonstrate that our approach achieves higher accuracy in both predicting ordinal labels and discovering associations between data sources than alternative methods. We apply our approach to an imaging genetics dataset of AD. Our joint analysis approach not only identifies meaningful and interesting associations between genetic variations, brain structures, and AD status, but also achieves significantly higher accuracy for predicting ordinal AD stages than the competing methods.
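Ordinal regression maps a continuous latent score onto ordered categories, such as disease severity levels, by cutting the real line at a set of thresholds. The sketch below uses a common probit formulation in which P(y = k | f) is the difference of two normal CDF values at adjacent thresholds. The abstract does not specify the likelihood used, so this is an assumed, generic form; the threshold values, `noise_std`, and function names are hypothetical, and the Gaussian process that would produce the latent value is not shown.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordinal_probabilities(latent, thresholds, noise_std=1.0):
    """Probability of each ordered category given a latent function value.

    `thresholds` must be increasing; K - 1 thresholds define K categories.
    """
    cuts = [-math.inf] + list(thresholds) + [math.inf]
    cdf = [normal_cdf((c - latent) / noise_std) for c in cuts]
    return [cdf[k + 1] - cdf[k] for k in range(len(cuts) - 1)]

# Hypothetical three-level severity scale with cut points at -1 and 1.
probs = ordinal_probabilities(latent=0.0, thresholds=[-1.0, 1.0])
```

A latent value near zero puts most of the probability on the middle category here, and moving the latent value up or down shifts mass toward the higher or lower severity levels.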


Subjects
Alzheimer Disease/diagnosis, Alzheimer Disease/genetics, Algorithms, Artificial Intelligence, Bayes Theorem, Biomarkers, Brain/pathology, Computational Biology, Computer Simulation, Genetic Databases/statistics & numerical data, Computer-Assisted Diagnosis/statistics & numerical data, Genetic Association Studies/statistics & numerical data, Humans, Magnetic Resonance Imaging, Statistical Models, Normal Distribution, Phenotype, Precision Medicine