Results 1 - 20 of 124
1.
Int J Mol Sci ; 23(21), 2022 Oct 29.
Article in English | MEDLINE | ID: mdl-36361936

ABSTRACT

The idea of a digital twin has recently gained widespread attention. While, so far, it has been used predominantly for problems in engineering and manufacturing, it is believed that a digital twin also holds great promise for applications in medicine and health. However, a problem that severely hampers progress in these fields is the lack of a solid definition of the concept behind a digital twin that would be directly amenable for such big data-driven fields requiring a statistical data analysis. In this paper, we address this problem. We will see that the term 'digital twin', as used in the literature, is like a Matryoshka doll. For this reason, we unstack the concept via a data-centric machine learning perspective, allowing us to define its main components. As a consequence, we suggest using the term Digital Twin System instead of digital twin because this highlights its complex interconnected substructure. In addition, we address ethical concerns that result from treatment suggestions for patients based on simulated data and a possible lack of explainability of the underlying models.


Subject(s)
Machine Learning , Research Design , Humans , Big Data
2.
Brief Bioinform ; 19(3): 506-523, 2018 05 01.
Article in English | MEDLINE | ID: mdl-28069634

ABSTRACT

Large-scale perturbation databases, such as Connectivity Map (CMap) or Library of Integrated Network-based Cellular Signatures (LINCS), provide enormous opportunities for computational pharmacogenomics and drug design. A reason for this is that, in contrast to classical pharmacology focusing on one target at a time, the transcriptomics profiles provided by CMap and LINCS open the door for systems biology approaches on the pathway and network level. In this article, we provide a review of recent developments in computational pharmacogenomics with respect to CMap and LINCS and related applications.


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Pharmacogenetics , Small Molecule Libraries/pharmacology , Transcriptome , Databases, Factual , Gene Regulatory Networks , Humans
3.
BMC Cancer ; 19(1): 1176, 2019 Dec 03.
Article in English | MEDLINE | ID: mdl-31796020

ABSTRACT

BACKGROUND: Deciphering the meaning of the human DNA is an outstanding goal which would revolutionize medicine and the way we treat diseases. In recent years, non-coding RNAs have attracted much attention and have been shown to be functional in part. Yet the importance of these RNAs, especially for higher biological functions, remains under investigation. METHODS: In this paper, we analyze RNA-seq data, including non-coding and protein coding RNAs, from lung adenocarcinoma patients, a histologic subtype of non-small-cell lung cancer, with deep learning neural networks and other state-of-the-art classification methods. The purpose of our paper is three-fold. First, we compare the classification performance of different versions of deep belief networks with SVMs, decision trees and random forests. Second, we compare the classification capabilities of protein coding and non-coding RNAs. Third, we study the influence of feature selection on the classification performance. RESULTS: First, we find that deep belief networks perform at least competitively to other state-of-the-art classifiers. Second, data from non-coding RNAs perform better than coding RNAs across a number of different classification methods. This demonstrates the equivalence of predictive information as captured by non-coding RNAs compared to protein coding RNAs, conventionally used in computational diagnostics tasks. Third, we find that feature selection has in general a negative effect on the classification performance, which means that unfiltered data with all features give the best classification results. CONCLUSIONS: Our study is the first to use ncRNAs beyond miRNAs for the computational classification of cancer and for performing a direct comparison of the classification capabilities of protein coding RNAs and non-coding RNAs.


Subject(s)
Lung Neoplasms/classification , Lung Neoplasms/genetics , RNA, Messenger/metabolism , RNA, Untranslated/genetics , Computational Biology/methods , Decision Trees , Humans , Lung Neoplasms/pathology , Machine Learning , MicroRNAs/genetics , Neural Networks, Computer , RNA, Messenger/genetics , Sequence Analysis, RNA/methods
4.
Curr Genomics ; 20(1): 38-48, 2019 Jan.
Article in English | MEDLINE | ID: mdl-31015790

ABSTRACT

BACKGROUND: Cancer is a complex disease with a lucid etiology and in understanding the causation, we need to appreciate this complexity. OBJECTIVE: Here we are aiming to gain insights into the genetic associations of prostate cancer through a network-based systems approach using the BC3Net algorithm. METHODS: Specifically, we infer a prostate cancer Gene Regulatory Network (GRN) from a large-scale gene expression data set of 333 patient RNA-seq profiles obtained from The Cancer Genome Atlas (TCGA) database. RESULTS: We analyze the functional components of the inferred network by extracting subnetworks based on biological process information and interpret the role of known cancer genes within each process. Furthermore, we investigate the local landscape of prostate cancer genes and discuss pathological associations that may be relevant in the development of new targeted cancer therapies. CONCLUSION: Our network-based analysis provides a practical systems biology approach to reveal the collective gene-interactions of prostate cancer. This allows a close interpretation of biological activity in terms of the hallmarks of cancer.

5.
Entropy (Basel) ; 21(5), 2019 May 10.
Article in English | MEDLINE | ID: mdl-33267196

ABSTRACT

In this paper, we study several distance-based entropy measures on fullerene graphs. These include the topological information content of a graph $I_a(G)$, a degree-based entropy measure, the eccentric-entropy $I_{f_\sigma}(G)$, the Hosoya entropy $H(G)$ and, finally, the radial centric information entropy $H_{ecc}$. We compare these measures on two infinite classes of fullerene graphs denoted by $A_{12n+4}$ and $B_{12n+6}$. We have chosen these measures as they are easily computable and capture meaningful graph properties. To demonstrate the utility of these measures, we investigate the Pearson correlation between them on the fullerene graphs.
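
The abstract above mentions a degree-based entropy measure without giving its formula. As a rough illustration only, the sketch below assumes one common choice: the Shannon entropy of the distribution p_i = deg(i) / (sum of all degrees). The function name `degree_entropy` and the example graphs are hypothetical. The cube graph is used as a small 3-regular stand-in (it is not a fullerene); fullerene graphs are also 3-regular, so under this assumed definition their degree-based entropy would reduce to log2 of the number of vertices.

```python
import math

def degree_entropy(adjacency):
    """Shannon entropy (in bits) of p_i = deg(i) / sum of degrees.

    Assumed definition for illustration; the paper's exact measure may differ.
    `adjacency` maps each vertex to a list of its neighbours.
    """
    degrees = [len(neighbours) for neighbours in adjacency.values()]
    total = sum(degrees)
    return -sum((d / total) * math.log2(d / total) for d in degrees if d > 0)

# Cube graph: vertices 0..7, neighbours differ in exactly one bit (3-regular).
cube = {v: [v ^ 1, v ^ 2, v ^ 4] for v in range(8)}
print(degree_entropy(cube))   # 3.0, i.e. log2(8)

# Path on three vertices: degrees 1, 2, 1 give a non-uniform distribution.
path = {0: [1], 1: [0, 2], 2: [1]}
print(degree_entropy(path))   # 1.5
```

On any regular graph this assumed measure depends only on the vertex count, which is one reason distance-based measures can be more informative there.
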

6.
Brief Bioinform ; 17(3): 393-407, 2016 05.
Article in English | MEDLINE | ID: mdl-26342128

ABSTRACT

Transcriptome sequencing (RNA-seq) is gradually replacing microarrays for high-throughput studies of gene expression. The main challenge of analyzing microarray data is not in finding differentially expressed genes, but in gaining insights into the biological processes underlying phenotypic differences. To interpret experimental results from microarrays, gene set analysis (GSA) has become the method of choice, in particular because it incorporates pre-existing biological knowledge (in a form of functionally related gene sets) into the analysis. Here we provide a brief review of several statistically different GSA approaches (competitive and self-contained) that can be adapted from microarrays practice as well as those specifically designed for RNA-seq. We evaluate their performance (in terms of Type I error rate, power, robustness to the sample size and heterogeneity, as well as the sensitivity to different types of selection biases) on simulated and real RNA-seq data. Not surprisingly, the performance of various GSA approaches depends only on the statistical hypothesis they test and does not depend on whether the test was developed for microarrays or RNA-seq data. Interestingly, we found that competitive methods have lower power as well as robustness to the samples heterogeneity than self-contained methods, leading to poor results reproducibility. We also found that the power of unsupervised competitive methods depends on the balance between up- and down-regulated genes in tested gene sets. These properties of competitive methods have been overlooked before. Our evaluation provides a concise guideline for selecting GSA approaches, best performing under particular experimental settings in the context of RNA-seq.


Subject(s)
RNA/genetics , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Reproducibility of Results , Sample Size , Sequence Analysis, RNA
7.
BMC Bioinformatics ; 18(1): 61, 2017 Jan 24.
Article in English | MEDLINE | ID: mdl-28118818

ABSTRACT

BACKGROUND: Gene set analysis (in a form of functionally related genes or pathways) has become the method of choice for analyzing omics data in general and gene expression data in particular. There are many statistical methods that either summarize gene-level statistics for a gene set or apply a multivariate statistic that accounts for intergene correlations. Most available methods detect complex departures from the null hypothesis but lack the ability to identify the specific alternative hypothesis that rejects the null. RESULTS: GSAR (Gene Set Analysis in R) is an open-source R/Bioconductor software package for gene set analysis (GSA). It implements self-contained multivariate non-parametric statistical methods testing a complex null hypothesis against specific alternatives, such as differences in mean (shift), variance (scale), or net correlation structure. The package also provides a graphical visualization tool, based on the union of two minimum spanning trees, for correlation networks to examine the change in the correlation structures of a gene set between two conditions and highlight influential genes (hubs). CONCLUSIONS: Package GSAR provides a set of multivariate non-parametric statistical methods that test a complex null hypothesis against specific alternatives. The methods in package GSAR are applicable to any type of omics data that can be represented in a matrix format. The package, with detailed instructions and examples, is freely available under the GPL (>= 2) license from the Bioconductor web site.


Subject(s)
Computational Biology/methods , Databases, Genetic , Gene Expression Profiling , Software , Cell Line, Tumor , Gene Expression , Humans , Models, Theoretical , Multivariate Analysis , Phenotype , Sequence Analysis, RNA , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism
8.
BMC Bioinformatics ; 18(1): 325, 2017 Jul 04.
Article in English | MEDLINE | ID: mdl-28676075

ABSTRACT

BACKGROUND: sgnesR (Stochastic Gene Network Expression Simulator in R) is an R package that provides an interface to simulate gene expression data from a given gene network using the stochastic simulation algorithm (SSA). The package allows various options for delay parameters, which can easily be included in reactions for promoter delay, RNA delay and protein delay. A user can tune these parameters to model various types of reactions within a cell. As examples, we present two network models to generate expression profiles. We also demonstrate the inference of networks and the evaluation of association measures of edge and non-edge components from the generated expression profiles. RESULTS: The purpose of sgnesR is to enable an easy-to-use and quick implementation for generating realistic gene expression data from biologically relevant networks that can be user selected. CONCLUSIONS: sgnesR is freely available for academic use. The R package has been tested for R 3.2.0 under Linux, Windows and Mac OS X.


Subject(s)
Gene Regulatory Networks , User-Computer Interface , Algorithms , Gene Expression , Internet
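
The abstract above refers to the stochastic simulation algorithm (SSA). For readers unfamiliar with it, here is a minimal Python sketch of the basic Gillespie SSA for a single gene with constant production and first-order degradation. It is not sgnesR's interface and it omits the delay parameters the abstract describes; the function name, rate names and values are all hypothetical.

```python
import random

def gillespie_birth_death(k_prod, k_deg, t_end, seed=0):
    """Basic Gillespie SSA for mRNA copy number n.

    Reactions (illustrative model, no delays):
      production:   n -> n + 1  with propensity k_prod (must be > 0)
      degradation:  n -> n - 1  with propensity k_deg * n
    """
    rng = random.Random(seed)
    t, n = 0.0, 0
    times, counts = [t], [n]
    while True:
        a_prod = k_prod
        a_deg = k_deg * n
        a_total = a_prod + a_deg
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Pick the reaction with probability proportional to its propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts

times, counts = gillespie_birth_death(k_prod=10.0, k_deg=1.0, t_end=5.0, seed=1)
print(len(times), counts[-1])
```

For this toy model the copy number fluctuates around k_prod / k_deg at long times; delayed reactions, as in the package described above, would require queuing products for later release, which this sketch does not attempt.
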
9.
Bioinformatics ; 32(21): 3345-3347, 2016 11 01.
Article in English | MEDLINE | ID: mdl-27402900

ABSTRACT

MOTIVATION: Data from RNA-seq experiments provide us with many new possibilities to gain insights into biological and disease mechanisms of cellular functioning. However, the reproducibility and robustness of RNA-seq data analysis results is often unclear. This is in part attributed to the two counteracting goals of (i) a cost-efficient and (ii) an optimal experimental design leading to a compromise, e.g. in the sequencing depth of experiments. RESULTS: We introduce an R package called samExploreR that allows the subsampling (m out of n bootstrapping) of short-reads based on SAM files facilitating the investigation of sequencing depth related questions for the experimental design. Overall, this provides a systematic way for exploring the reproducibility and robustness of general RNA-seq studies. We exemplify the usage of samExploreR by studying the influence of the sequencing depth and the annotation on the identification of differentially expressed genes. AVAILABILITY AND IMPLEMENTATION: samExploreR is available as an R package from Bioconductor. CONTACT: v@bio-complexity.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
RNA/genetics , Sequence Analysis, RNA , Reproducibility of Results , Research Design , Software
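
The subsampling idea in the abstract above can be illustrated with a small Python sketch. It does not use samExploreR or read SAM files: reads are represented simply as a list of the gene each read maps to, and m of the n reads are drawn without replacement (an assumption; the package's exact sampling scheme is not stated in the abstract). The function name and toy data are hypothetical.

```python
import random
from collections import Counter

def subsample_counts(read_genes, fraction, seed=0):
    """Draw m = floor(fraction * n) reads at random and count them per gene.

    `read_genes` is a list with one gene label per mapped read
    (a stand-in for the records of a SAM file).
    """
    rng = random.Random(seed)
    m = int(len(read_genes) * fraction)
    sampled = rng.sample(read_genes, m)  # without replacement (assumption)
    return Counter(sampled)

# Toy library of 1000 reads over three genes.
reads = ["geneA"] * 600 + ["geneB"] * 300 + ["geneC"] * 100
counts = subsample_counts(reads, fraction=0.25, seed=42)
print(sum(counts.values()), dict(counts))
```

Repeating the draw with different seeds gives a distribution of count tables at the reduced depth, which is the kind of variability a depth-related design question would examine.
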
10.
BMC Bioinformatics ; 17: 129, 2016 Mar 18.
Article in English | MEDLINE | ID: mdl-26987731

ABSTRACT

BACKGROUND: It is generally acknowledged that a functional understanding of a biological system can only be obtained by an understanding of the collective of molecular interactions in the form of biological networks. Protein networks are one particular network type of special importance, because proteins form the functional base units of every biological cell. On a mesoscopic level of protein networks, modules are of significant importance because these building blocks may be the next elementary functional level above individual proteins, allowing insight into fundamental organizational principles of biological cells. RESULTS: In this paper, we provide a comparative analysis of five popular and four novel module detection algorithms. We study these module prediction methods for simulated benchmark networks as well as 10 biological protein interaction networks (PINs). A particular focus of our analysis is placed on the biological meaning of the predicted modules by utilizing the Gene Ontology (GO) database as gold standard for the definition of biological processes. Furthermore, we investigate the robustness of the results by perturbing the PINs, simulating in this way our incomplete knowledge of protein networks. CONCLUSIONS: Overall, our study reveals that there is a large heterogeneity among the different module prediction algorithms if one zooms in on the biological level of biological processes in the form of GO terms, and all methods are severely affected by a slight perturbation of the networks. However, we also find pathways that are enriched in multiple modules, which could provide important information about the hierarchical organization of the system.


Subject(s)
Algorithms , Gene Ontology , Protein Interaction Mapping , Animals , Cluster Analysis , Eukaryota/metabolism , Humans
11.
Bioinformatics ; 30(3): 360-8, 2014 Feb 01.
Article in English | MEDLINE | ID: mdl-24292935

ABSTRACT

MOTIVATION: To date, gene set analysis approaches primarily focus on identifying differentially expressed gene sets (pathways). Methods for identifying differentially coexpressed pathways also exist but are mostly based on aggregated pairwise correlations or other pairwise measures of coexpression. Instead, we propose Gene Sets Net Correlations Analysis (GSNCA), a multivariate differential coexpression test that accounts for the complete correlation structure between genes. RESULTS: In GSNCA, weight factors are assigned to genes in proportion to the genes' cross-correlations (intergene correlations). The problem of finding the weight vectors is formulated as an eigenvector problem with a unique solution. GSNCA tests the null hypothesis that for a gene set there is no difference in the weight vectors of the genes between two conditions. In simulation studies and the analyses of experimental data, we demonstrate that GSNCA captures changes in the structure of genes' cross-correlations rather than differences in the averaged pairwise correlations. Thus, GSNCA infers differences in coexpression networks, however, bypassing method-dependent steps of network inference. As an additional result from GSNCA, we define hub genes as genes with the largest weights and show that these genes correspond frequently to major and specific pathway regulators, as well as to genes that are most affected by the biological difference between two conditions. In summary, GSNCA is a new approach for the analysis of differentially coexpressed pathways that also evaluates the importance of the genes in the pathways, thus providing unique information that may result in the generation of novel biological hypotheses. AVAILABILITY AND IMPLEMENTATION: Implementation of the GSNCA test in R is available upon request from the authors.


Subject(s)
Gene Expression Profiling/methods , Gene Regulatory Networks , Cell Line, Tumor , Genes, p53 , Humans , Multivariate Analysis , Oligonucleotide Array Sequence Analysis
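
The abstract above states that gene weights are proportional to the genes' cross-correlations and are obtained from an eigenvector problem. The sketch below is one plausible reading of that statement, not the authors' implementation: it assumes each weight satisfies w_i ∝ Σ_{j≠i} |r_ij| w_j, so that w is the leading eigenvector of the absolute correlation matrix with a zeroed diagonal, and it normalizes the weights to sum to 1. The function name, normalization and toy matrix are hypothetical.

```python
def net_correlation_weights(corr, iterations=200):
    """Leading eigenvector of |corr| with zero diagonal, via power iteration.

    Assumed formulation for illustration: w_i proportional to
    sum over j != i of |r_ij| * w_j. Weights are scaled to sum to 1.
    """
    p = len(corr)
    a = [[abs(corr[i][j]) if i != j else 0.0 for j in range(p)]
         for i in range(p)]
    w = [1.0 / p] * p
    for _ in range(iterations):
        # Iterate on (A + I): same eigenvectors as A, but avoids the
        # oscillation plain power iteration can show on such matrices.
        new_w = [w[i] + sum(a[i][j] * w[j] for j in range(p))
                 for i in range(p)]
        total = sum(new_w)
        w = [x / total for x in new_w]
    return w

# Toy correlation matrix: gene 0 is strongly correlated with both others.
corr = [
    [1.0, 0.9, 0.8],
    [0.9, 1.0, 0.1],
    [0.8, 0.1, 1.0],
]
weights = net_correlation_weights(corr)
hub = max(range(len(weights)), key=weights.__getitem__)
print(weights, hub)
```

In this toy case the gene with the strongest overall cross-correlations receives the largest weight, matching the abstract's notion of a hub gene. Comparing weight vectors between two conditions, which is the test the abstract describes, is not shown here.
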
12.
Bioinformatics ; 30(19): 2834-6, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24928209

ABSTRACT

SUMMARY: NetBioV (Network Biology Visualization) is an R package that allows the visualization of large network data in biology and medicine. The purpose of NetBioV is to enable an organized and reproducible visualization of networks by emphasizing or highlighting specific structural properties that are of biological relevance. AVAILABILITY AND IMPLEMENTATION: NetBioV is freely available for academic use. The package has been tested for R 2.14.2 under Linux, Windows and Mac OS X. It is available from Bioconductor.


Subject(s)
Computational Biology/methods , Gene Regulatory Networks , Software , Algorithms , Arabidopsis/metabolism , Computer Graphics , Humans , Lymphoma, B-Cell/metabolism , Programming Languages , Reproducibility of Results
13.
BMC Cancer ; 15: 7, 2015 Jan 15.
Article in English | MEDLINE | ID: mdl-25588624

ABSTRACT

BACKGROUND: Oncology is a field that profits tremendously from the genomic data generated by high-throughput technologies, including next-generation sequencing. However, in order to exploit, integrate, visualize and interpret such high-dimensional data efficiently, non-trivial computational and statistical analysis methods are required that need to be developed in a problem-directed manner. DISCUSSION: For this reason, computational cancer biology aims to fill this gap. Unfortunately, computational cancer biology is not yet fully recognized as a coequal field in oncology, leading to a delay in its maturation and, as an immediate consequence, an under-exploration of high-throughput data for translational research. Here we argue that this imbalance, favoring 'wet lab-based activities', will be naturally rectified over time, if the next generation of scientists receives an academic education that provides a fair and competent introduction to computational biology and its manifold capabilities. Furthermore, we discuss a number of local educational provisions that can be implemented on university level to help in facilitating the process of harmonization.


Subject(s)
Computational Biology/education , Neoplasms , Computational Biology/methods , Data Interpretation, Statistical , Humans , Universities
14.
Nucleic Acids Res ; 41(7): e82, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23389952

ABSTRACT

In this article, we focus on the analysis of competitive gene set methods for detecting the statistical significance of pathways from gene expression data. Our main result is to demonstrate that some of the most frequently used gene set methods, GSEA, GSEArot and GAGE, are severely influenced by the filtering of the data in a way that such an analysis is no longer reconcilable with the principles of statistical inference, rendering the obtained results in the worst case inexpressive. A possible consequence of this is that these methods can increase their power by the addition of unrelated data and noise. Our results are obtained within a bootstrapping framework that allows a rigorous assessment of the robustness of results and enables power estimates. Our results indicate that when using competitive gene set methods, it is imperative to apply a stringent gene filtering criterion. However, even when genes are filtered appropriately, for gene expression data from chips that do not provide a genome-scale coverage of the expression values of all mRNAs, this is not enough for GSEA, GSEArot and GAGE to ensure the statistical soundness of the applied procedure. For this reason, for biomedical and clinical studies, we strongly advise against using GSEA, GSEArot and GAGE for such data sets.


Subject(s)
Gene Expression Profiling/methods , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Data Interpretation, Statistical , Female , Gene Expression Regulation, Neoplastic , Genomics/methods , Humans , Male , Oligonucleotide Array Sequence Analysis , Precursor Cell Lymphoblastic Leukemia-Lymphoma/genetics , Precursor Cell Lymphoblastic Leukemia-Lymphoma/metabolism , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism , Sample Size
15.
Genomics ; 103(5-6): 329-36, 2014.
Article in English | MEDLINE | ID: mdl-24691108

ABSTRACT

Although many methods have been developed for inference of biological networks, the validation of the resulting models has largely remained an unsolved problem. Here we present a framework for quantitative assessment of inferred gene interaction networks using knock-down data from cell line experiments. Using this framework we are able to show that network inference based on integration of prior knowledge derived from the biomedical literature with genomic data significantly improves the quality of inferred networks relative to other approaches. Our results also suggest that cell line experiments can be used to quantitatively assess the quality of networks inferred from tumor samples.


Subject(s)
Gene Expression Profiling , Gene Regulatory Networks , Cell Line, Tumor , Colorectal Neoplasms/genetics , Colorectal Neoplasms/metabolism , Humans , Transcriptome , Validation Studies as Topic
16.
Chin J Cancer ; 34(10): 427-38, 2015 Aug 08.
Article in English | MEDLINE | ID: mdl-26253000

ABSTRACT

BACKGROUND: Data from RNA-seq experiments provide a wealth of information about the transcriptome of an organism. However, the analysis of such data is very demanding. In this study, we aimed to establish robust analysis procedures that can be used in clinical practice. METHODS: We studied RNA-seq data from triple-negative breast cancer patients. Specifically, we investigated the subsampling of RNA-seq data. RESULTS: The main results of our investigations are as follows: (1) the subsampling of RNA-seq data gave biologically realistic simulations of sequencing experiments with smaller sequencing depth but not direct scaling of count matrices; (2) the saturation of results required an average sequencing depth larger than 32 million reads and an individual sequencing depth larger than 46 million reads; and (3) for an abrogated feature selection, higher moments of the distribution of all expressed genes had a higher sensitivity for signal detection than the corresponding mean values. CONCLUSIONS: Our results reveal important characteristics of RNA-seq data that must be understood before one can apply such an approach to translational medicine.


Subject(s)
Gene Expression Profiling , RNA , Triple Negative Breast Neoplasms , High-Throughput Nucleotide Sequencing , Humans , Transcriptome
17.
BMC Bioinformatics ; 15: 397, 2014 Dec 05.
Article in English | MEDLINE | ID: mdl-25475910

ABSTRACT

BACKGROUND: Over the last few years transcriptome sequencing (RNA-Seq) has almost completely taken over microarrays for high-throughput studies of gene expression. Currently, the most popular use of RNA-Seq is to identify genes which are differentially expressed between two or more conditions. Despite the importance of Gene Set Analysis (GSA) in the interpretation of the results from RNA-Seq experiments, the limitations of GSA methods developed for microarrays in the context of RNA-Seq data are not well understood. RESULTS: We provide a thorough evaluation of popular multivariate and gene-level self-contained GSA approaches on simulated and real RNA-Seq data. The multivariate approach employs multivariate non-parametric tests combined with popular normalizations for RNA-Seq data. The gene-level approach utilizes univariate tests designed for the analysis of RNA-Seq data to find gene-specific P-values and combines them into a pathway P-value using classical statistical techniques. Our results demonstrate that the Type I error rate and the power of multivariate tests depend only on the test statistics and are insensitive to the different normalizations. In general, standard multivariate GSA tests detect pathways that do not have any bias in terms of pathway size, percentage of differentially expressed genes, or average gene length in a pathway. In contrast, the Type I error rate and the power of gene-level GSA tests are heavily affected by the methods for combining P-values, and all aforementioned biases are present in detected pathways. CONCLUSIONS: Our result emphasizes the importance of using self-contained non-parametric multivariate tests for detecting differentially expressed pathways for RNA-Seq data and warns against applying gene-level GSA tests, especially because of their high level of Type I error rates for both simulated and real data.


Subject(s)
Gene Expression Profiling/methods , Genes, X-Linked , High-Throughput Nucleotide Sequencing/methods , RNA/genetics , Sequence Analysis, RNA/methods , Signal Transduction , Computer Simulation , Female , Humans , Lymphocytes/metabolism , Male , Software
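
The abstract above says gene-level P-values are combined into a pathway P-value "using classical statistical techniques" without naming them. Fisher's method is one such classical technique, used here purely as an assumed example: under the null and with independent tests, -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom. The function name and input values are hypothetical.

```python
import math

def fisher_combined_pvalue(p_values):
    """Combine independent P-values (each in (0, 1]) with Fisher's method.

    Statistic: X = -2 * sum(ln p_i), chi-square with 2k degrees of freedom.
    For even degrees of freedom the upper tail has a closed form:
      P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total

gene_p_values = [0.01, 0.02, 0.03]  # illustrative gene-level P-values
pathway_p = fisher_combined_pvalue(gene_p_values)
print(pathway_p)
```

Fisher's method assumes independent tests, whereas genes within a pathway are typically correlated; this is consistent with the abstract's finding that the choice of combination method strongly affects the Type I error rate of gene-level approaches.
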
18.
BMC Bioinformatics ; 15 Suppl 6: S6, 2014.
Article in English | MEDLINE | ID: mdl-25079297

ABSTRACT

Cancer is a complex disease that has proven to be difficult to understand on the single-gene level. For this reason a functional elucidation needs to take interactions among genes on a systems-level into account. In this study, we infer a colon cancer network from a large-scale gene expression data set by using the method BC3Net. We provide a structural and a functional analysis of this network and also connect its molecular interaction structure with the chromosomal locations of the genes enabling the definition of cis- and trans-interactions. Furthermore, we investigate the interaction of genes that can be found in close neighborhoods on the chromosomes to gain insight into regulatory mechanisms. To our knowledge this is the first study analyzing the genome-scale colon cancer network.


Subject(s)
Colonic Neoplasms/genetics , Gene Regulatory Networks , Colonic Neoplasms/metabolism , Computational Biology , Gene Expression Profiling , Humans , Proteins/genetics , Proteins/metabolism
20.
J Transl Med ; 12: 26, 2014 Jan 25.
Article in English | MEDLINE | ID: mdl-24460894

ABSTRACT

This is a report on the 4th international conference in 'Quantitative Biology and Bioinformatics in Modern Medicine' held in Belfast (UK), 19-20 September 2013. The aim of the conference was to bring together leading experts from a variety of different areas that are key for Systems Medicine to exchange novel findings and promote interdisciplinary ideas and collaborations.


Subject(s)
Computational Biology/methods , Medicine , Biomarkers, Tumor/metabolism , Databases, Genetic , Drug Discovery , Genomics , Humans , Pharmacogenetics , Signal Transduction , Systems Biology