1.
Bioinformatics ; 37(5): 650-658, 2021 05 05.
Article in English | MEDLINE | ID: mdl-33016988

ABSTRACT

MOTIVATION: High-throughput RNA sequencing has revolutionized the scope and depth of transcriptome analysis. Accurate reconstruction of a phenotype-specific transcriptome is challenging due to the noise and variability of RNA-seq data. This requires computational identification of transcripts from multiple samples of the same phenotype, given the underlying consensus transcript structure. RESULTS: We present a Bayesian method, integrated assembly of phenotype-specific transcripts (IntAPT), that identifies phenotype-specific isoforms from multiple RNA-seq profiles. IntAPT features a novel two-layer Bayesian model to capture the presence of isoforms at the group layer and to quantify the abundance of isoforms at the sample layer. A spike-and-slab prior is used to model the isoform expression and to enforce the sparsity of expressed isoforms. Dependencies between the existence of isoforms and their expression are modeled explicitly to facilitate parameter estimation. Model parameters are estimated iteratively using Gibbs sampling to infer the joint posterior distribution, from which the presence and abundance of isoforms can reliably be determined. Studies using both simulations and real datasets show that IntAPT consistently outperforms existing methods for phenotype-specific transcript assembly. Experimental results demonstrate that, despite sequencing errors, IntAPT exhibits robust performance across multiple samples, resulting in notably improved identification of expressed isoforms of low abundance. AVAILABILITY AND IMPLEMENTATION: The IntAPT package is available at http://github.com/henryxushi/IntAPT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling , Transcriptome , Bayes Theorem , Phenotype , RNA-Seq , Sequence Analysis, RNA , Software
2.
PLoS Comput Biol ; 17(7): e1009203, 2021 07.
Article in English | MEDLINE | ID: mdl-34292930

ABSTRACT

Transcription factors (TFs) often function as a module including both master factors and mediators binding at cis-regulatory regions to modulate nearby gene transcription. ChIP-seq profiling of multiple TFs makes it feasible to infer functional TF modules. However, when inferring TF modules based on co-localization of ChIP-seq peaks, often many weak binding events are missed, especially for mediators, resulting in incomplete identification of modules. To address this problem, we develop a ChIP-seq data-driven Gibbs Sampler to infer Modules (ChIP-GSM) using a Bayesian framework that integrates ChIP-seq profiles of multiple TFs. ChIP-GSM samples read counts of module TFs iteratively to estimate the binding potential of a module to each region and, across all regions, estimates the module abundance. Using inferred module-region probabilistic bindings as feature units, ChIP-GSM then employs logistic regression to predict active regulatory elements. Validation of ChIP-GSM predicted regulatory regions on multiple independent datasets sharing the same context confirms the advantage of using TF modules for predicting regulatory activity. In a case study of K562 cells, we demonstrate that the ChIP-GSM inferred modules form as groups, activate gene expression at different time points, and mediate diverse functional cellular processes. Hence, ChIP-GSM infers biologically meaningful TF modules and improves the prediction accuracy of regulatory region activities.


Subject(s)
Chromatin Immunoprecipitation Sequencing/methods , Gene Regulatory Networks , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Bayes Theorem , Binding Sites/genetics , Chromatin/genetics , Chromatin/metabolism , Chromatin Immunoprecipitation Sequencing/statistics & numerical data , Computational Biology , Enhancer Elements, Genetic , Epigenesis, Genetic , Gene Expression Regulation , Humans , K562 Cells , MCF-7 Cells , Models, Statistical , Promoter Regions, Genetic
3.
BMC Bioinformatics ; 22(1): 193, 2021 Apr 15.
Article in English | MEDLINE | ID: mdl-33858322

ABSTRACT

BACKGROUND: ChIP-seq combines chromatin immunoprecipitation assays with sequencing and identifies genome-wide binding sites for DNA binding proteins. While many binding sites have strong ChIP-seq 'peak' observations and are well captured, there are still regions bound by proteins weakly, with a relatively low ChIP-seq signal enrichment. These weak binding sites, especially those at promoters and enhancers, are functionally important because they also regulate nearby gene expression. Yet, it remains a challenge to accurately identify weak binding sites in ChIP-seq data due to the ambiguity in differentiating these weak binding sites from the amplified background DNAs. RESULTS: ChIP-BIT2 ( http://sourceforge.net/projects/chipbitc/ ) is a software package for ChIP-seq peak detection. ChIP-BIT2 employs a mixture model integrating protein and control ChIP-seq data and predicts strong or weak protein binding sites at promoters, enhancers, or other genomic locations. For binding sites at gene promoters, ChIP-BIT2 simultaneously predicts their target genes. ChIP-BIT2 has been validated on benchmark regions and tested using large-scale ENCODE ChIP-seq data, demonstrating its high accuracy and wide applicability. CONCLUSION: ChIP-BIT2 is an efficient ChIP-seq peak caller. It provides a better lens to examine weak binding sites and can refine or extend the existing binding site collection, providing additional regulatory regions for decoding the mechanism of gene expression regulation.


Subject(s)
High-Throughput Nucleotide Sequencing , Software , Bayes Theorem , Binding Sites , Chromatin Immunoprecipitation , Oligonucleotide Array Sequence Analysis , Sequence Analysis, DNA
4.
Bioinformatics ; 34(1): 56-63, 2018 01 01.
Article in English | MEDLINE | ID: mdl-28968634

ABSTRACT

Motivation: Recent advances in high-throughput RNA sequencing (RNA-seq) technologies have made it possible to reconstruct the full transcriptome of various types of cells. It is important to accurately assemble transcripts or identify isoforms for an improved understanding of molecular mechanisms in biological systems. Results: We have developed a novel Bayesian method, SparseIso, to reliably identify spliced isoforms from RNA-seq data. A spike-and-slab prior is incorporated into the Bayesian model to enforce the sparsity for isoform identification, effectively alleviating the problem of overfitting. A Gibbs sampling procedure is further developed to simultaneously identify and quantify transcripts from RNA-seq data. With the sampling approach, SparseIso estimates the joint distribution of all candidate transcripts, resulting in a significantly improved performance in detecting lowly expressed transcripts and multiple expressed isoforms of genes. Both simulation study and real data analysis have demonstrated that the proposed SparseIso method significantly outperforms existing methods for improved transcript assembly and isoform identification. Availability and implementation: The SparseIso package is available at http://github.com/henryxushi/SparseIso. Contact: xuan@vt.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
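
The spike-and-slab prior named in the abstract above can be illustrated with a minimal sketch. This is not the SparseIso model: it only draws from a generic spike-and-slab prior (a point mass at zero mixed with a Gaussian slab) to show why such a prior favors sparse solutions. The function name and all parameter values are hypothetical choices made for illustration.

```python
import random


def sample_spike_and_slab(n, inclusion_prob, slab_sd, rng):
    """Draw n values from a generic spike-and-slab prior.

    With probability inclusion_prob a value comes from the Gaussian "slab"
    (an isoform treated as expressed); otherwise it is exactly zero (the
    "spike", an isoform treated as absent). Illustrative assumption only,
    not the prior parameterization used by SparseIso.
    """
    draws = []
    for _ in range(n):
        if rng.random() < inclusion_prob:
            draws.append(rng.gauss(0.0, slab_sd))
        else:
            draws.append(0.0)
    return draws


rng = random.Random(0)  # fixed seed so the sketch is repeatable
draws = sample_spike_and_slab(10000, inclusion_prob=0.2, slab_sd=1.0, rng=rng)
zero_fraction = sum(1 for d in draws if d == 0.0) / len(draws)
print(f"fraction of exact zeros: {zero_fraction:.3f}")
```

With an inclusion probability of 0.2, roughly four out of five candidates are set exactly to zero, which is the sparsity behavior the abstract relies on to limit overfitting.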


Subject(s)
Alternative Splicing , High-Throughput Nucleotide Sequencing/methods , Models, Biological , Sequence Analysis, RNA/methods , Software , Bayes Theorem , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Cell Line , Cell Line, Tumor , Computational Biology/methods , Female , Humans , Transcriptome
5.
Bioinformatics ; 34(10): 1733-1740, 2018 05 15.
Article in English | MEDLINE | ID: mdl-29280996

ABSTRACT

Motivation: NGS techniques have been widely applied in genetic and epigenetic studies. Multiple ChIP-seq and RNA-seq profiles can now be jointly used to infer functional regulatory networks (FRNs). However, existing methods suffer from either oversimplified assumptions about transcription factor (TF) regulation or slow convergence of sampling for FRN inference from large-scale ChIP-seq and time-course RNA-seq data. Results: We developed an efficient Bayesian integration method (CRNET) for FRN inference using a two-stage Gibbs sampler to iteratively estimate hidden TF activities and the posterior probabilities of binding events. A novel statistical measure that jointly considers regulation strength and regression error enables the sampling process of CRNET to converge quickly, thus making CRNET very efficient for large-scale FRN inference. Experiments on synthetic and benchmark data showed a significantly improved performance of CRNET when compared with existing methods. CRNET was applied to breast cancer data to identify FRNs functional at promoter or enhancer regions in breast cancer MCF-7 cells. Transcription factor MYC is predicted as a key functional factor in both promoter and enhancer FRNs. We experimentally validated the regulation effects of MYC on CRNET-predicted target genes using appropriate RNAi approaches in MCF-7 cells. Availability and implementation: R scripts of CRNET are available at http://www.cbil.ece.vt.edu/software.htm. Contact: xuan@vt.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
RNA/genetics , Sequence Analysis, RNA/methods , Bayes Theorem , Breast Neoplasms/genetics , Humans , Promoter Regions, Genetic , Transcription Factors/metabolism
6.
Bioinformatics ; 33(2): 161-168, 2017 01 15.
Article in English | MEDLINE | ID: mdl-27616707

ABSTRACT

MOTIVATION: The advent of high-throughput DNA methylation profiling techniques has enabled accurate identification of differentially methylated genes for cancer research. The large number of measured loci facilitates whole genome methylation study, yet poses great challenges for differential methylation detection due to the high variability in tumor samples. RESULTS: We have developed a novel probabilistic approach, Differential Methylation detection using a hierarchical Bayesian model exploiting Local Dependency (DM-BLD), to detect differentially methylated genes based on a Bayesian framework. The DM-BLD approach features a joint model to capture both the local dependency of measured loci and the dependency of methylation change in samples. Specifically, the local dependency is modeled by a Leroux conditional autoregressive structure; the dependency of methylation changes is modeled by a discrete Markov random field. A hierarchical Bayesian model is developed to fully take into account the local dependency for differential analysis, in which differential states are embedded as hidden variables. Simulation studies demonstrate that DM-BLD outperforms existing methods for differential methylation detection, particularly when the methylation change is moderate and the variability of methylation in samples is high. DM-BLD has been applied to breast cancer data to identify important methylated genes (such as polycomb target genes and genes involved in transcription factor activity) associated with breast cancer recurrence. AVAILABILITY AND IMPLEMENTATION: A Matlab package of DM-BLD is available at http://www.cbil.ece.vt.edu/software.htm. CONTACT: xuan@vt.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Breast Neoplasms/genetics , DNA Methylation , Sequence Analysis, DNA/methods , Software , Bayes Theorem , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , DNA, Neoplasm , Female , Genomics/methods , Humans , Neoplasm Recurrence, Local/genetics
7.
Bioinformatics ; 33(2): 177-183, 2017 01 15.
Article in English | MEDLINE | ID: mdl-27659451

ABSTRACT

MOTIVATION: Whole genome DNA-sequencing (WGS) of paired tumor and normal samples has enabled the identification of somatic DNA changes in unprecedented detail. Large-scale identification of somatic structural variations (SVs) for a specific cancer type will deepen our understanding of driver mechanisms in cancer progression. However, the limited number of WGS samples, insufficient read coverage, and the impurity of tumor samples that contain normal and neoplastic cells limit reliable and accurate detection of somatic SVs. RESULTS: We present a novel pattern-based probabilistic approach, PSSV, to identify somatic structural variations from WGS data. PSSV features a mixture model with hidden states representing different mutation patterns; PSSV can thus differentiate heterozygous and homozygous SVs in each sample, enabling the identification of those somatic SVs with heterozygous mutations in normal samples and homozygous mutations in tumor samples. Simulation studies demonstrate that PSSV outperforms existing tools. PSSV has been successfully applied to breast cancer data to identify somatic SVs of key factors associated with breast cancer development. AVAILABILITY AND IMPLEMENTATION: An R package of PSSV is available at http://www.cbil.ece.vt.edu/software.htm. CONTACT: xuan@vt.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Breast Neoplasms/genetics , DNA Mutational Analysis/methods , DNA, Neoplasm , Genomic Structural Variation , Software , Female , Gene Expression Regulation, Neoplastic , Humans , Mutation , RNA, Messenger
8.
Nucleic Acids Res ; 44(7): e65, 2016 Apr 20.
Article in English | MEDLINE | ID: mdl-26704972

ABSTRACT

Chromatin immunoprecipitation with massively parallel DNA sequencing (ChIP-seq) has greatly improved the reliability with which transcription factor binding sites (TFBSs) can be identified from genome-wide profiling studies. Many computational tools have been developed to detect binding events or peaks; however, the robust detection of weak binding events remains a challenge for current peak calling tools. We have developed a novel Bayesian approach (ChIP-BIT) to reliably detect TFBSs and their target genes by jointly modeling binding signal intensities and binding locations of TFBSs. Specifically, a Gaussian mixture model is used to capture both binding and background signals in sample data. As a unique feature of ChIP-BIT, background signals are modeled by a local Gaussian distribution that is accurately estimated from the input data. Extensive simulation studies showed a significantly improved performance of ChIP-BIT in target gene prediction, particularly for detecting weak binding signals at gene promoter regions. We applied ChIP-BIT to find target genes from NOTCH3 and PBX1 ChIP-seq data acquired from MCF-7 breast cancer cells. TF knockdown experiments have initially validated about 30% of co-regulated target genes identified by ChIP-BIT as being differentially expressed in MCF-7 cells. Functional analysis on these genes further revealed the existence of crosstalk between Notch and Wnt signaling pathways.
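
To make the mixture-model idea in the abstract above concrete, the following is a minimal sketch of how a two-component Gaussian mixture turns a signal value into a posterior probability of binding. It is not the ChIP-BIT implementation: the function names, the single global background component, and every numeric parameter are assumptions chosen for illustration, whereas the abstract describes a locally estimated background distribution.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def binding_posterior(signal, prior_bound, bound_mean, bound_sd, bg_mean, bg_sd):
    """Posterior probability that a region is bound, by Bayes' rule.

    The signal is modeled as coming either from a "bound" Gaussian or from a
    "background" Gaussian; prior_bound is the prior weight of the bound
    component. All parameters are hypothetical inputs to this sketch.
    """
    bound = prior_bound * normal_pdf(signal, bound_mean, bound_sd)
    background = (1.0 - prior_bound) * normal_pdf(signal, bg_mean, bg_sd)
    return bound / (bound + background)


# A weak signal halfway between the two component means is ambiguous,
# while a signal at the bound mean is assigned mostly to the bound state.
weak = binding_posterior(3.0, 0.5, bound_mean=4.0, bound_sd=1.0, bg_mean=2.0, bg_sd=1.0)
strong = binding_posterior(4.0, 0.5, bound_mean=4.0, bound_sd=1.0, bg_mean=2.0, bg_sd=1.0)
print(weak, strong)
```

A sharper background estimate (smaller `bg_sd`) pushes the posterior for weak signals away from 0.5, which is one way to read the abstract's emphasis on accurate background modeling.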


Subject(s)
Chromatin Immunoprecipitation/methods , High-Throughput Nucleotide Sequencing/methods , Models, Statistical , Sequence Analysis, DNA/methods , Transcription Factors/metabolism , Bayes Theorem , Binding Sites , DNA-Binding Proteins/metabolism , Gene Expression Regulation , Humans , K562 Cells , MCF-7 Cells , Pre-B-Cell Leukemia Transcription Factor 1 , Proto-Oncogene Proteins/metabolism , Receptor, Notch3 , Receptors, Notch/metabolism
9.
Breast Cancer Res ; 19(1): 77, 2017 Jul 03.
Article in English | MEDLINE | ID: mdl-28673325

ABSTRACT

BACKGROUND: Maternal and paternal high-fat (HF) diet intake before and/or during pregnancy increases mammary cancer risk in several preclinical models. We studied whether maternal consumption of an HF diet that began at a time when the fetal primordial germ cells travel to the genital ridge and start differentiating into germ cells would result in a transgenerational inheritance of increased mammary cancer risk. METHODS: Pregnant C57BL/6NTac mouse dams were fed either a control AIN93G or isocaloric HF diet composed of corn oil high in n-6 polyunsaturated fatty acids between gestational days 10 and 20. Offspring in subsequent F1-F3 generations were fed only the control diet. RESULTS: Mammary tumor incidence induced by 7,12-dimethylbenz[a]anthracene was significantly higher in F1 (p < 0.016) and F3 generation offspring of HF diet-fed dams (p < 0.040) than in the control offspring. Further, tumor latency was significantly shorter (p < 0.028) and burden higher (p < 0.027) in F1 generation HF offspring, and similar trends were seen in F3 generation HF offspring. RNA sequencing was done on normal mammary glands to identify signaling differences that may predispose to increased breast cancer risk by maternal HF intake. Analysis revealed 1587 and 4423 differentially expressed genes between HF and control offspring in F1 and F3 generations, respectively, of which 48 genes were similarly altered in both generations. Quantitative real-time polymerase chain reaction analysis validated 13 chosen up- and downregulated genes in F3 HF offspring, but only downregulated genes in F1 HF offspring. Ingenuity Pathway Analysis identified upregulation of Notch signaling as a key alteration in HF offspring. Further, knowledge-fused differential dependency network analysis identified ten node genes that in the HF offspring were uniquely connected to genes linked to increased cancer risk (ANKEF1, IGFBP6, SEMA5B), increased resistance to cancer treatments (SLC26A3), poor prognosis (ID4, JAM3, TBX2), and impaired anticancer immunity (EGR3, ZBP1). CONCLUSIONS: We conclude that maternal HF diet intake during pregnancy induces a transgenerational increase in offspring mammary cancer risk in mice. The mechanisms of inheritance in the F3 generation may be different from the F1 generation because significantly more changes were seen in the transcriptome.


Subject(s)
Breast Neoplasms/metabolism , Diet, High-Fat , Fatty Acids, Omega-6/metabolism , Maternal Exposure , Prenatal Exposure Delayed Effects , Animals , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Disease Models, Animal , Female , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Heterografts , Male , Mammary Glands, Animal , Mice , Pregnancy , Reproducibility of Results
10.
Bioinformatics ; 31(14): 2412-4, 2015 Jul 15.
Article in English | MEDLINE | ID: mdl-25755273

ABSTRACT

UNLABELLED: Identification of protein interaction subnetworks is an important step to help us understand complex molecular mechanisms in cancer. In this paper, we develop a BMRF-Net package, implemented in Java and C++, to identify protein interaction subnetworks based on a bagging Markov random field (BMRF) framework. By integrating gene expression data and protein-protein interaction data, this software tool can be used to identify biologically meaningful subnetworks. A user friendly graphic user interface is developed as a Cytoscape plugin for the BMRF-Net software to deal with the input/output interface. The detailed structure of the identified networks can be visualized in Cytoscape conveniently. The BMRF-Net package has been applied to breast cancer data to identify significant subnetworks related to breast cancer recurrence. AVAILABILITY AND IMPLEMENTATION: The BMRF-Net package is available at http://sourceforge.net/projects/bmrfcjava/. The package is tested under Ubuntu 12.04 (64-bit), Java 7, glibc 2.15 and Cytoscape 3.1.0.


Subject(s)
Protein Interaction Mapping/methods , Software , Algorithms , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Female , Gene Expression , Humans , Markov Chains
11.
Bioinformatics ; 31(1): 137-9, 2015 Jan 01.
Article in English | MEDLINE | ID: mdl-25212756

ABSTRACT

SUMMARY: We develop a novel unsupervised deconvolution method, within a well-grounded mathematical framework, to dissect mixed gene expressions in heterogeneous tumor samples. We implement an R package, UNsupervised DecOnvolution (UNDO), that can be used to automatically detect cell-specific marker genes (MGs) located on the scatter radii of mixed gene expressions, estimate cellular proportions in each sample and deconvolute mixed expressions into cell-specific expression profiles. We demonstrate the performance of UNDO over a wide range of tumor-stroma mixing proportions, validate UNDO on various biologically mixed benchmark gene expression datasets and further estimate tumor purity in TCGA/CPTAC datasets. The highly accurate deconvolution results obtained suggest not only the existence of cell-specific MGs but also UNDO's ability to detect them blindly and correctly. Although the principal application here involves microarray gene expressions, our methodology can be readily applied to other types of quantitative molecular profiling data. AVAILABILITY AND IMPLEMENTATION: UNDO is available at http://bioconductor.org/packages.
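
The role that cell-specific marker genes play in the deconvolution described above can be shown with a small worked sketch. It is a simplified two-sample, two-cell-type calculation and not the UNDO algorithm: it assumes the two marker genes are already known, that each is expressed in exactly one cell type, and that the proportions in each sample sum to one. The function name and the example numbers are hypothetical.

```python
def estimate_mixing_matrix(marker1_mixed, marker2_mixed):
    """Recover a 2x2 matrix of cell-type proportions from two marker genes.

    marker1_mixed: (sample 1, sample 2) mixed expression of a gene expressed
                   only in cell type 1.
    marker2_mixed: the same for a gene expressed only in cell type 2.
    Each marker's mixed expression is a scaled column of the mixing matrix;
    the scales are fixed by requiring each sample's proportions to sum to 1.
    """
    (p1, p2), (q1, q2) = marker1_mixed, marker2_mixed
    det = p1 * q2 - q1 * p2
    if det == 0:
        raise ValueError("marker genes must point in different directions")
    # Solve u*p + v*q = (1, 1) for the two scale factors by Cramer's rule.
    u = (q2 - q1) / det
    v = (p1 - p2) / det
    return [[u * p1, v * q1], [u * p2, v * q2]]


# Ground truth used to build the toy data: sample 1 is 70% / 30%,
# sample 2 is 20% / 80%; the markers have pure expression 10 and 5.
mixing = estimate_mixing_matrix((7.0, 2.0), (1.5, 4.0))
print(mixing)
```

Row i of the result holds the estimated proportions of the two cell types in sample i. The hard part that the abstract attributes to UNDO, finding the marker genes blindly, is deliberately left out of this sketch.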


Subject(s)
Breast Neoplasms/genetics , Gene Expression Regulation, Neoplastic , Sequence Analysis, RNA/methods , Software , Algorithms , Computer Simulation , Female , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing , Humans , Tumor Cells, Cultured
12.
Bioinformatics ; 31(2): 287-9, 2015 Jan 15.
Article in English | MEDLINE | ID: mdl-25273109

ABSTRACT

UNLABELLED: We have developed an integrated molecular network learning method, within a well-grounded mathematical framework, to construct differential dependency networks with significant rewiring. This knowledge-fused differential dependency networks (KDDN) method, implemented as a Java Cytoscape app, can be used to optimally integrate prior biological knowledge with measured data to simultaneously construct both common and differential networks, to quantitatively assign model parameters and significant rewiring p-values and to provide user-friendly graphical results. The KDDN algorithm is computationally efficient and provides users with parallel computing capability using ubiquitous multi-core machines. We demonstrate the performance of KDDN on various simulations and real gene expression datasets, and further compare the results with those obtained by the most relevant peer methods. The acquired biologically plausible results provide new insights into network rewiring as a mechanistic principle and illustrate KDDN's ability to detect them efficiently and correctly. Although the principal application here involves microarray gene expressions, our methodology can be readily applied to other types of quantitative molecular profiling data. AVAILABILITY: Source code and compiled package are freely available for download at http://apps.cytoscape.org/apps/kddn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Regulatory Networks , Genes, cdc/genetics , Saccharomyces cerevisiae/genetics , Software , Systems Biology/methods , Algorithms , Cell Cycle/genetics , Models, Biological , Molecular Sequence Annotation , Oxidative Stress , Saccharomyces cerevisiae/cytology , Saccharomyces cerevisiae/metabolism
13.
BMC Genomics ; 16 Suppl 7: S10, 2015.
Article in English | MEDLINE | ID: mdl-26099273

ABSTRACT

BACKGROUND: Identification of protein interaction networks is a very important step for understanding the molecular mechanisms in cancer. Several methods have been developed to integrate protein-protein interaction (PPI) data with gene expression data for network identification. However, they often fail to model the dependency between genes in the network, which leaves many important genes, especially upstream genes, unidentified. It is necessary to develop a method to improve the network identification performance by incorporating the dependency between genes. RESULTS: We propose an approach for identifying protein interaction networks by incorporating mutual information (MI) into a Markov random field (MRF) based framework to model the dependency between genes. MI is widely used in information theory to measure the information shared between random variables. Unlike the traditional Pearson correlation test, MI is capable of capturing both linear and non-linear relationships between random variables. Among the existing MI estimators, we choose the k-nearest neighbor MI (kNN-MI) estimator, which has been shown to have minimum bias. The estimated MI is integrated with an MRF framework to model the gene dependency in the context of the network. Maximum a posteriori (MAP) estimation is applied to the MRF-based model to estimate the network score. To reduce the computational complexity of finding the optimal network, a probabilistic searching algorithm is implemented. We further increase the robustness and reproducibility of the results by applying a non-parametric bootstrapping method to measure the confidence level of the identified genes. To evaluate the performance of the proposed method, we test it on simulation data under different conditions. The experimental results show an improved accuracy in terms of subnetwork identification compared to existing methods. Furthermore, we applied our method to real breast cancer patient data; the identified protein interaction network shows a close association with the recurrence of breast cancer, which is supported by functional annotation. We also show that the identified subnetworks can be used to predict the recurrence status of cancer patients by survival analysis. CONCLUSIONS: We have developed an integrated approach for protein interaction network identification, which combines a Markov random field framework and mutual information to model the gene dependency in a PPI network. Improvements in subnetwork identification have been demonstrated with simulation datasets compared to existing methods. We then apply our method to breast cancer patient data to identify recurrence-related subnetworks. The experimental results show that the identified genes are enriched in the pathway and functional categories relevant to progression and recurrence of breast cancer. Finally, the survival analysis based on identified subnetworks achieves a good result in classifying the recurrence status of cancer patients.
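
The abstract's point that mutual information captures non-linear relationships that Pearson correlation misses can be checked with a short sketch. Note the substitution: the paper uses a k-nearest neighbor MI estimator, whereas this example uses a much simpler equal-width histogram (binned) estimator, chosen only to keep the code short. The function names, the bin count, and the toy data are all assumptions for illustration.

```python
import math
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def binned(values, n_bins):
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys, n_bins=4):
    """Histogram estimate of mutual information in nats (not the kNN estimator)."""
    bx, by = binned(xs, n_bins), binned(ys, n_bins)
    n = len(xs)
    joint, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items()
    )


# y depends on x deterministically, but the dependence is purely non-linear.
xs = [i / 100 for i in range(-100, 101)]
ys = [x * x for x in xs]
r = pearson(xs, ys)
mi = mutual_information(xs, ys)
print(f"Pearson r = {r:.3g}, binned MI = {mi:.3f} nats")
```

On this symmetric parabola the Pearson coefficient is essentially zero while the binned MI is clearly positive, which is the behavior the abstract appeals to when it prefers MI for modeling gene dependency.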


Subject(s)
Computational Biology/methods , Neoplasms/genetics , Protein Interaction Maps , Algorithms , Gene Expression Profiling/methods , Gene Regulatory Networks , Humans , Markov Chains , Models, Genetic , Neoplasms/metabolism , Protein Interaction Mapping/methods
14.
Nucleic Acids Res ; 41(2): e42, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23161673

ABSTRACT

Identification of differentially expressed subnetworks from protein-protein interaction (PPI) networks has become increasingly important to our global understanding of the molecular mechanisms that drive cancer. Several methods have been proposed for PPI subnetwork identification, but the dependency among network member genes is not explicitly considered, leaving many important hub genes largely unidentified. We present a new method, based on a bagging Markov random field (BMRF) framework, to improve subnetwork identification for mechanistic studies of breast cancer. The method follows a maximum a posteriori principle to form a novel network score that explicitly considers pairwise gene interactions in PPI networks, and it searches for subnetworks with maximal network scores. To improve their robustness across data sets, a bagging scheme based on bootstrapping samples is implemented to statistically select high confidence subnetworks. We first compared the BMRF-based method with existing methods on simulation data to demonstrate its improved performance. We then applied our method to breast cancer data to identify PPI subnetworks associated with breast cancer progression and/or tamoxifen resistance. The experimental results show that not only an improved prediction performance can be achieved by the BMRF approach when tested on independent data sets, but biologically meaningful subnetworks can also be revealed that are relevant to breast cancer and tamoxifen resistance.


Subject(s)
Breast Neoplasms/metabolism , Gene Expression Profiling/methods , Protein Interaction Mapping/methods , Antineoplastic Agents, Hormonal/therapeutic use , Breast Neoplasms/drug therapy , Breast Neoplasms/genetics , Cell Line, Tumor , Drug Resistance, Neoplasm , Female , Gene Regulatory Networks , Humans , Markov Chains , Protein Interaction Maps , Tamoxifen/therapeutic use
15.
BMC Bioinformatics ; 15 Suppl 9: S6, 2014.
Article in English | MEDLINE | ID: mdl-25252852

ABSTRACT

BACKGROUND: Recent advances in RNA sequencing (RNA-Seq) technology have offered unprecedented scope and resolution for transcriptome analysis. However, precise quantification of mRNA abundance and identification of differentially expressed genes are complicated by biological and technical variations in RNA-Seq data. RESULTS: We systematically study the variation in count data and dissect the sources of variation into between-sample variation and within-sample variation. A novel Bayesian framework is developed for joint estimation of gene-level mRNA abundance and differential state, which models the intrinsic variability in RNA-Seq to improve the estimation. Specifically, a Poisson-Lognormal model is incorporated into the Bayesian framework to model within-sample variation; a Gamma-Gamma model is then used to model between-sample variation, which accounts for over-dispersion of read counts among multiple samples. Simulation studies, where sequencing counts are synthesized based on parameters learned from real datasets, have demonstrated the advantage of the proposed method in both quantification of mRNA abundance and identification of differentially expressed genes. Moreover, performance comparison on data from the Sequencing Quality Control (SEQC) Project with ERCC spike-in controls has shown that the proposed method outperforms existing RNA-Seq methods in differential analysis. Application to a breast cancer dataset has further illustrated that the proposed Bayesian model can 'blindly' estimate sources of variation caused by sequencing biases. CONCLUSIONS: We have developed a novel Bayesian hierarchical approach to investigate within-sample and between-sample variations in RNA-Seq data. Simulation and real data applications have validated desirable performance of the proposed method. The software package is available at http://www.cbil.ece.vt.edu/software.htm.


Subject(s)
Bayes Theorem , Gene Expression Profiling/methods , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , Computer Simulation , Gene Expression Regulation , Humans , Models, Genetic , Software
16.
Bioinformatics ; 28(15): 1990-7, 2012 Aug 01.
Article in English | MEDLINE | ID: mdl-22595208

ABSTRACT

MOTIVATION: Identification of transcriptional regulatory networks (TRNs) is of significant importance in computational biology for cancer research, providing a critical building block to unravel disease pathways. However, existing methods for TRN identification suffer from the inclusion of excessive 'noise' in microarray data and false-positives in binding data, especially when applied to human tumor-derived cell line studies. More robust methods that can counteract the imperfection of data sources are therefore needed for reliable identification of TRNs in this context. RESULTS: In this article, we propose to establish a link between the quality of one target gene to represent its regulator and the uncertainty of its expression to represent other target genes. Specifically, an outlier sum statistic was used to measure the aggregated evidence for regulation events between target genes and their corresponding transcription factors. A Gibbs sampling method was then developed to estimate the marginal distribution of the outlier sum statistic, hence, to uncover underlying regulatory relationships. To evaluate the effectiveness of our proposed method, we compared its performance with that of an existing sampling-based method using both simulation data and yeast cell cycle data. The experimental results show that our method consistently outperforms the competing method in different settings of signal-to-noise ratio and network topology, indicating its robustness for biological applications. Finally, we applied our method to breast cancer cell line data and demonstrated its ability to extract biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer. AVAILABILITY AND IMPLEMENTATION: The Gibbs sampler MATLAB package is freely available at http://www.cbil.ece.vt.edu/software.htm. CONTACT: xuan@vt.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Breast Neoplasms/genetics , Computational Biology/methods , Gene Regulatory Networks , Software , Transcription Factors/genetics , Cell Cycle/genetics , Cell Line, Tumor , Computer Simulation , Female , Gene Expression Regulation, Neoplastic , Humans , Signal Transduction/genetics , Signal-To-Noise Ratio , Transcription Factors/metabolism
17.
Bioinformatics ; 27(5): 736-8, 2011 Mar 01.
Article in English | MEDLINE | ID: mdl-21186245

ABSTRACT

UNLABELLED: Phenotypic Up-regulated Gene Support Vector Machine (PUGSVM) is a cancer Biomedical Informatics Grid (caBIG™) analytical tool for multiclass gene selection and classification. PUGSVM addresses the problem of imbalanced class separability, small sample size and high gene space dimensionality, where multiclass gene markers are defined by the union of one-versus-everyone phenotypic upregulated genes, and used by a well-matched one-versus-rest support vector machine. PUGSVM provides a simple yet more accurate strategy to identify statistically reproducible mechanistic marker genes for characterization of heterogeneous diseases. AVAILABILITY: http://www.cbil.ece.vt.edu/caBIG-PUGSVM.htm.
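
The idea of one-versus-everyone upregulated genes can be sketched roughly as follows. This is an assumption-laden illustration, not PUGSVM's actual procedure: a gene is treated as a marker of a class when its mean expression in that class exceeds the mean in every other class by a fold threshold. The function name `ove_upregulated_genes`, the `min_fold` parameter, and the toy data are all hypothetical.

```python
from statistics import mean


def ove_upregulated_genes(expr, labels, min_fold=2.0):
    """Map each class to the genes upregulated in it versus every other class.

    expr maps gene name -> list of expression values (one per sample);
    labels gives the class of each sample. Illustrative sketch only.
    """
    classes = sorted(set(labels))
    markers = {c: [] for c in classes}
    for gene, values in expr.items():
        class_means = {
            c: mean([v for v, lab in zip(values, labels) if lab == c])
            for c in classes
        }
        ranked = sorted(class_means, key=class_means.get, reverse=True)
        top, runner_up = ranked[0], ranked[1]
        # Beating the runner-up by min_fold implies beating every other class.
        if class_means[runner_up] > 0 and \
                class_means[top] / class_means[runner_up] >= min_fold:
            markers[top].append(gene)
    return markers


# Toy data: three hypothetical genes, six samples, three classes.
labels = ["A", "A", "B", "B", "C", "C"]
expr = {
    "g1": [10, 12, 2, 3, 1, 2],
    "g2": [5, 5, 5, 6, 5, 4],
    "g3": [1, 1, 1, 2, 8, 9],
}
markers = ove_upregulated_genes(expr, labels)
# The multiclass marker set is the union over classes.
marker_union = sorted(g for genes in markers.values() for g in genes)
print(markers, marker_union)
```

The union of per-class markers would then serve as the feature set for a one-versus-rest classifier, as the abstract describes.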


Subject(s)
Computational Biology/methods , Genes, Neoplasm , Neoplasms/genetics , Software , Algorithms , Gene Expression Profiling , Humans , Neoplasms/classification , Phenotype
18.
Bioinformatics ; 27(7): 1036-8, 2011 Apr 01.
Article in English | MEDLINE | ID: mdl-21296752

ABSTRACT

UNLABELLED: Differential dependency network (DDN) is a caBIG® (cancer Biomedical Informatics Grid) analytical tool for detecting and visualizing statistically significant topological changes in transcriptional networks representing two biological conditions. Developed under caBIG®'s In Silico Research Centers of Excellence (ISRCE) Program, DDN enables differential network analysis and provides an alternative way of defining network biomarkers predictive of phenotypes. DDN also serves as a useful systems biology tool for users across biomedical research communities to infer how genetic, epigenetic or environmental variables may affect biological networks and clinical phenotypes. Besides the standalone Java application, we have also developed a Cytoscape plug-in, CytoDDN, to integrate network analysis and visualization seamlessly. AVAILABILITY: The Java and MATLAB source code can be downloaded at the authors' web site http://www.cbil.ece.vt.edu/software.htm.
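
To make "topological changes between two conditions" concrete, here is a deliberately simplified stand-in. It is not DDN's algorithm and performs no significance testing: it builds a thresholded correlation network per condition and reports the edges present in only one of them. The function names, the `threshold` parameter, and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def network_edges(expr, threshold):
    """Gene pairs whose absolute correlation reaches the threshold."""
    return {
        (g, h)
        for g, h in combinations(sorted(expr), 2)
        if abs(pearson(expr[g], expr[h])) >= threshold
    }


def differential_edges(expr_a, expr_b, threshold=0.8):
    """Edges that appear in exactly one of the two condition networks."""
    edges_a = network_edges(expr_a, threshold)
    edges_b = network_edges(expr_b, threshold)
    return {"only_a": edges_a - edges_b, "only_b": edges_b - edges_a}


# Toy data: the same three hypothetical genes measured under two conditions.
cond_a = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, -1, 1, -1]}
cond_b = {"g1": [1, 2, 3, 4], "g2": [4, 1, 3, 2], "g3": [2, 4, 6, 8]}
print(differential_edges(cond_a, cond_b))
```

A real differential analysis would also need to assess whether each rewired edge is statistically significant, which this sketch omits.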


Subject(s)
Gene Regulatory Networks , Software , Animals , Computational Biology , Epigenesis, Genetic , Female , Mammary Glands, Animal/metabolism , Rats , Systems Biology
19.
Drug Discov Today Dis Mech ; 9(1-2): e11-e17, 2012.
Article in English | MEDLINE | ID: mdl-23539064

ABSTRACT

Understanding the molecular changes that drive an acquired antiestrogen resistance phenotype is of major clinical relevance. Previous methodologies for addressing this question have taken a single gene/pathway approach and the resulting gains have been limited in terms of their clinical impact. Recent systems biology approaches allow for the integration of data from high throughput "-omics" technologies. We highlight recent advances in the field of antiestrogen resistance with a focus on transcriptomics, proteomics and methylomics.

20.
Neurocomputing (Amst) ; 92: 9-17, 2012 Sep 01.
Article in English | MEDLINE | ID: mdl-22773895

ABSTRACT

To construct biologically interpretable gene sets for muscular dystrophy (MD) sub-type classification, we propose a novel computational scheme to integrate protein-protein interaction (PPI) network data, functional gene set information, and mRNA profiling data. The workflow of the proposed scheme includes the following three major steps: firstly, we apply an affinity propagation clustering (APC) approach to identify gene sub-networks associated with each MD sub-type, in which a new distance metric is proposed for APC to combine PPI network information and gene-gene co-expression relationships; secondly, we further incorporate functional gene set knowledge, which complements the physical PPI information, into our scheme for biomarker identification; finally, based on the constructed sub-networks and gene set features, we apply multi-class support vector machines (MSVMs) for MD sub-type classification, which also highlights the biomarkers contributing to sub-type prediction. The experimental results show that our scheme can help identify sub-networks and gene sets that are more relevant to MD than those constructed by other conventional approaches. Moreover, our integrative strategy improves the prediction accuracy substantially, especially for those 'hard-to-classify' sub-types.
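
The abstract does not give the form of the proposed distance metric. As one hypothetical way to combine the two information sources, the sketch below mixes a co-expression distance (one minus absolute correlation) with a binary PPI distance (zero if the two proteins interact, one otherwise) and negates the result, since affinity propagation operates on similarities. The weight `alpha`, the function name, and the toy inputs are assumptions, not the authors' definition.

```python
def combined_similarity(genes, coexpr, ppi_edges, alpha=0.5):
    """Pairwise similarities (negative distances) for affinity propagation.

    coexpr maps frozenset({g, h}) -> correlation; ppi_edges is a set of
    frozenset({g, h}) pairs with a known interaction. Hypothetical metric.
    """
    similarity = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            pair = frozenset((g, h))
            coexpr_dist = 1.0 - abs(coexpr.get(pair, 0.0))
            ppi_dist = 0.0 if pair in ppi_edges else 1.0
            distance = alpha * coexpr_dist + (1.0 - alpha) * ppi_dist
            similarity[(g, h)] = -distance
    return similarity


# Toy inputs with hypothetical gene identifiers.
genes = ["g1", "g2", "g3"]
coexpr = {frozenset(("g1", "g2")): 0.9, frozenset(("g1", "g3")): 0.2}
ppi_edges = {frozenset(("g1", "g2"))}
print(combined_similarity(genes, coexpr, ppi_edges))
```

Pairs that are both co-expressed and physically interacting receive the highest similarity, so they are the most likely to be grouped into the same sub-network.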
