Results 1 - 20 of 33
1.
Biostatistics ; 24(1): 85-107, 2022 12 12.
Article in English | MEDLINE | ID: mdl-34363680

ABSTRACT

Risk prediction models are a crucial tool in healthcare. Risk prediction models with a binary outcome (i.e., binary classification models) are often constructed using methodology which assumes the costs of different classification errors are equal. In many healthcare applications, this assumption is not valid, and the differences between misclassification costs can be quite large. For instance, in a diagnostic setting, the cost of misdiagnosing a person with a life-threatening disease as healthy may be larger than the cost of misdiagnosing a healthy person as a patient. In this article, we present Tailored Bayes (TB), a novel Bayesian inference framework which "tailors" model fitting to optimize predictive performance with respect to unbalanced misclassification costs. We use simulation studies to showcase when TB is expected to outperform standard Bayesian methods in the context of logistic regression. We then apply TB to three real-world applications (a cardiac surgery task, a breast cancer prognostication task, and a breast cancer tumor classification task) and demonstrate the improvement in predictive performance over standard methods.


Subject(s)
Breast Neoplasms; Models, Statistical; Humans; Female; Bayes Theorem; Logistic Models; Computer Simulation; Breast Neoplasms/diagnosis
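
To make the cost asymmetry in the abstract above concrete, here is a minimal Python sketch of the standard decision-theoretic cutoff for a binary classifier under unequal misclassification costs. It is not the Tailored Bayes method itself, which tailors model fitting rather than only the decision threshold; the function names and the example costs are hypothetical.

```python
def cost_sensitive_threshold(cost_fn: float, cost_fp: float) -> float:
    """Probability cutoff that minimises expected misclassification cost.

    cost_fn: cost of calling a diseased person healthy (false negative).
    cost_fp: cost of calling a healthy person diseased (false positive).
    """
    if cost_fn <= 0 or cost_fp <= 0:
        raise ValueError("costs must be positive")
    # Predicting 'healthy' costs p * cost_fn in expectation and predicting
    # 'diseased' costs (1 - p) * cost_fp, so 'diseased' is cheaper exactly
    # when p > cost_fp / (cost_fp + cost_fn).
    return cost_fp / (cost_fp + cost_fn)


def classify(prob_disease: float, cost_fn: float, cost_fp: float) -> int:
    """Return 1 (diseased) or 0 (healthy) for a predicted disease probability."""
    return int(prob_disease > cost_sensitive_threshold(cost_fn, cost_fp))


if __name__ == "__main__":
    # Hypothetical costs: a missed disease is nine times worse than a false alarm.
    print(cost_sensitive_threshold(cost_fn=9.0, cost_fp=1.0))
    print(classify(0.2, cost_fn=9.0, cost_fp=1.0))
    print(classify(0.2, cost_fn=1.0, cost_fp=1.0))
```

With equal costs the cutoff is the familiar 0.5. Raising the false-negative cost lowers the cutoff, so more borderline cases are flagged as diseased.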
2.
Biostatistics ; 21(2): 219-235, 2020 04 01.
Article in English | MEDLINE | ID: mdl-30192903

ABSTRACT

We consider high-dimensional regression over subgroups of observations. Our work is motivated by biomedical problems, where subsets of samples, representing for example disease subtypes, may differ with respect to underlying regression models. In the high-dimensional setting, estimating a different model for each subgroup is challenging due to limited sample sizes. Focusing on the case in which subgroup-specific models may be expected to be similar but not necessarily identical, we treat subgroups as related problem instances and jointly estimate subgroup-specific regression coefficients. This is done in a penalized framework, combining an $\ell_1$ term with an additional term that penalizes differences between subgroup-specific coefficients. This gives solutions that are globally sparse but that allow information-sharing between the subgroups. We present algorithms for estimation and empirical results on simulated data and using Alzheimer's disease, amyotrophic lateral sclerosis, and cancer datasets. These examples demonstrate the gains joint estimation can offer in prediction as well as in providing subgroup-specific sparsity patterns.


Subject(s)
Algorithms; Biomedical Research/methods; Biostatistics/methods; Prognosis; Regression Analysis; Alzheimer Disease/diagnosis; Amyotrophic Lateral Sclerosis/diagnosis; Computer Simulation; Humans; Neoplasms/drug therapy; Research Design
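
The penalized framework described in the abstract above can be sketched as an objective of the following form. The notation is an assumption made here for illustration, not taken from the paper: $K$ subgroups with data $(X_k, y_k)$ and $n_k$ samples each, coefficient vectors $\beta_k$, tuning parameters $\lambda, \gamma \ge 0$, a squared-error loss, and a squared $\ell_2$ norm for the difference term.

$$
\min_{\beta_1,\dots,\beta_K} \; \sum_{k=1}^{K} \frac{1}{n_k}\,\lVert y_k - X_k \beta_k \rVert_2^2 \;+\; \lambda \sum_{k=1}^{K} \lVert \beta_k \rVert_1 \;+\; \gamma \sum_{k < k'} \lVert \beta_k - \beta_{k'} \rVert_2^2
$$

Under these assumptions, $\gamma = 0$ gives a separate lasso fit for each subgroup, while increasing $\gamma$ pulls the subgroup-specific coefficients towards one another, which is the information-sharing the abstract refers to.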
3.
Nat Methods ; 13(4): 310-8, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26901648

ABSTRACT

It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense.


Subject(s)
Causality; Gene Regulatory Networks; Neoplasms/genetics; Protein Interaction Mapping/methods; Software; Systems Biology; Algorithms; Computational Biology; Computer Simulation; Gene Expression Profiling; Humans; Models, Biological; Signal Transduction; Tumor Cells, Cultured
4.
Bioinformatics ; 33(18): 2890-2896, 2017 Sep 15.
Article in English | MEDLINE | ID: mdl-28535188

ABSTRACT

MOTIVATION: Molecular pathways and networks play a key role in basic and disease biology. An emerging notion is that networks encoding patterns of molecular interplay may themselves differ between contexts, such as cell type, tissue or disease (sub)type. However, while statistical testing of differences in mean expression levels has been extensively studied, testing of network differences remains challenging. Furthermore, since network differences could provide important and biologically interpretable information to identify molecular subgroups, there is a need to consider the unsupervised task of learning subgroups and networks that define them. This is a nontrivial clustering problem, with neither subgroups nor subgroup-specific networks known at the outset. RESULTS: We leverage recent ideas from high-dimensional statistics for testing and clustering in the network biology setting. The methods we describe can be applied directly to most continuous molecular measurements and networks do not need to be specified beforehand. We illustrate the ideas and methods in a case study using protein data from The Cancer Genome Atlas (TCGA). This provides evidence that patterns of interplay between signalling proteins differ significantly between cancer types. Furthermore, we show how the proposed approaches can be used to learn subtypes and the molecular networks that define them. AVAILABILITY AND IMPLEMENTATION: As the Bioconductor package nethet. CONTACT: staedler.n@gmail.com or sach.mukherjee@dzne.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods; Neoplasms/metabolism; Cluster Analysis; Female; Humans; Neoplasm Proteins; Neoplasms/genetics; Signal Transduction
5.
Biostatistics ; 16(1): 47-59, 2015 Jan.
Article in English | MEDLINE | ID: mdl-24974316

ABSTRACT

The identification of predefined groups of genes ("gene-sets") which are differentially expressed between two conditions ("gene-set analysis", or GSA) is a very popular analysis in bioinformatics. GSA incorporates biological knowledge by aggregating over genes that are believed to be functionally related. This can enhance statistical power over analyses that consider only one gene at a time. However, currently available GSA approaches are based on univariate two-sample comparison of single genes. This means that they cannot test for multivariate hypotheses such as differences in covariance structure between the two conditions. Yet interplay between genes is a central aspect of biological investigation and it is likely that such interplay may differ between conditions. This paper proposes a novel approach for gene-set analysis that allows for truly multivariate hypotheses, in particular differences in gene-gene networks between conditions. Testing hypotheses concerning networks is challenging due to the nature of the underlying estimation problem. Our starting point is a recent, general approach for high-dimensional two-sample testing. We refine the approach and show how it can be used to perform multivariate, network-based gene-set testing. We validate the approach in simulated examples and show results using high-throughput data from several studies in cancer biology.


Subject(s)
Biostatistics/methods; Gene Expression/genetics; Gene Regulatory Networks/genetics; Models, Genetic; Models, Statistical; Humans; Neoplasms/genetics
6.
Bioinformatics ; 30(17): i468-74, 2014 Sep 01.
Article in English | MEDLINE | ID: mdl-25161235

ABSTRACT

MOTIVATION: Networks are widely used as structural summaries of biochemical systems. Statistical estimation of networks is usually based on linear or discrete models. However, the dynamics of biochemical systems are generally non-linear, suggesting that suitable non-linear formulations may offer gains with respect to causal network inference and aid in associated prediction problems. RESULTS: We present a general framework for network inference and dynamical prediction using time course data that is rooted in non-linear biochemical kinetics. This is achieved by considering a dynamical system based on a chemical reaction graph with associated kinetic parameters. Both the graph and kinetic parameters are treated as unknown; inference is carried out within a Bayesian framework. This allows prediction of dynamical behavior even when the underlying reaction graph itself is unknown or uncertain. Results, based on (i) data simulated from a mechanistic model of mitogen-activated protein kinase signaling and (ii) phosphoproteomic data from cancer cell lines, demonstrate that non-linear formulations can yield gains in causal network inference and permit dynamical prediction and uncertainty quantification in the challenging setting where the reaction graph is unknown. AVAILABILITY AND IMPLEMENTATION: MATLAB R2014a software is available to download from warwick.ac.uk/chrisoates. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Signal Transduction; Bayes Theorem; Cell Line, Tumor; Humans; Kinetics; MAP Kinase Signaling System; Models, Chemical
7.
Sci Rep ; 14(1): 11861, 2024 05 24.
Article in English | MEDLINE | ID: mdl-38789621

ABSTRACT

The Integrative Cluster subtypes (IntClusts) provide a framework for the classification of breast cancer tumors into 10 distinct groups based on copy number and gene expression, each with unique biological drivers of disease and clinical prognoses. Gene expression data are often lacking, and accurate classification of samples into IntClusts with copy number data alone is essential. Current classification methods achieve low accuracy when gene expression data are absent, warranting the development of new approaches to IntClust classification. Copy number data from 1980 breast cancer samples from METABRIC were used to train multiclass XGBoost machine learning algorithms (CopyClust). A piecewise constant fit was applied to the average copy number profile of each IntClust and unique breakpoints across the 10 profiles were identified and converted into ~500 genomic regions used as features for CopyClust. These models consisted of two approaches: a 10-class model with the final IntClust label predicted by a single multiclass model and a 6-class model with binary reclassification in which four pairs of IntClusts were combined for initial multiclass classification. Performance was validated on the TCGA dataset, with copy number data generated from both SNP arrays and WES platforms. CopyClust achieved 81% and 79% overall accuracy with the TCGA SNP and WES datasets, respectively, a nine-percentage point or greater improvement in overall IntClust subtype classification accuracy. CopyClust achieves a significant improvement over current methods in classification accuracy of IntClust subtypes for samples without available gene expression data and is an easily implementable algorithm for IntClust classification of breast cancer samples with copy number data.


Subject(s)
Algorithms; Breast Neoplasms; DNA Copy Number Variations; Machine Learning; Humans; Breast Neoplasms/genetics; Breast Neoplasms/classification; Female; DNA Copy Number Variations/genetics; Cluster Analysis; Gene Expression Profiling/methods
8.
Bioinformatics ; 28(18): 2342-8, 2012 Sep 15.
Article in English | MEDLINE | ID: mdl-22815361

ABSTRACT

MOTIVATION: Network inference approaches are widely used to shed light on regulatory interplay between molecular players such as genes and proteins. Biochemical processes underlying networks of interest (e.g. gene regulatory or protein signalling networks) are generally nonlinear. In many settings, knowledge is available concerning relevant chemical kinetics. However, existing network inference methods for continuous, steady-state data are typically rooted in statistical formulations, which do not exploit chemical kinetics to guide inference. RESULTS: Herein, we present an approach to network inference for steady-state data that is rooted in non-linear descriptions of biochemical mechanism. We use equilibrium analysis of chemical kinetics to obtain functional forms that are in turn used to infer networks using steady-state data. The approach we propose is directly applicable to conventional steady-state gene expression or proteomic data and does not require knowledge of either network topology or any kinetic parameters. We illustrate the approach in the context of protein phosphorylation networks, using data simulated from a recent mechanistic model and proteomic data from cancer cell lines. In the former, the true network is known and used for assessment, whereas in the latter, results are compared against known biochemistry. We find that the proposed methodology is more effective at estimating network topology than methods based on linear models. AVAILABILITY: mukherjeelab.nki.nl/CODE/GK_Kinetics.zip CONTACT: c.j.oates@warwick.ac.uk; s.mukherjee@nki.nl SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Proteomics; Systems Biology/methods; Breast Neoplasms/enzymology; Cell Line, Tumor; Female; Gene Regulatory Networks; Humans; Kinetics; MAP Kinase Signaling System; Markov Chains; Monte Carlo Method; Phosphorylation
9.
Bioinformatics ; 28(21): 2804-10, 2012 Nov 01.
Article in English | MEDLINE | ID: mdl-22923301

ABSTRACT

MOTIVATION: Protein signaling networks play a key role in cellular function, and their dysregulation is central to many diseases, including cancer. To shed light on signaling network topology in specific contexts, such as cancer, requires interrogation of multiple proteins through time and statistical approaches to make inferences regarding network structure. RESULTS: In this study, we use dynamic Bayesian networks to make inferences regarding network structure and thereby generate testable hypotheses. We incorporate existing biology using informative network priors, weighted objectively by an empirical Bayes approach, and exploit a connection between variable selection and network inference to enable exact calculation of posterior probabilities of interest. The approach is computationally efficient and essentially free of user-set tuning parameters. Results on data where the true, underlying network is known place the approach favorably relative to existing approaches. We apply these methods to reverse-phase protein array time-course data from a breast cancer cell line (MDA-MB-468) to predict signaling links that we independently validate using targeted inhibition. The methods proposed offer a general approach by which to elucidate molecular networks specific to biological context, including, but not limited to, human cancers. AVAILABILITY: http://mukherjeelab.nki.nl/DBN (code and data).


Subject(s)
Bayes Theorem; Breast Neoplasms/metabolism; Models, Molecular; Models, Statistical; Signal Transduction; Area Under Curve; Breast Neoplasms/pathology; Cell Communication; Cell Line, Tumor; Computer Simulation; Female; Humans; Probability
10.
BMC Bioinformatics ; 13: 94, 2012 May 11.
Article in English | MEDLINE | ID: mdl-22578440

ABSTRACT

BACKGROUND: An important question in the analysis of biochemical data is that of identifying subsets of molecular variables that may jointly influence a biological response. Statistical variable selection methods have been widely used for this purpose. In many settings, it may be important to incorporate ancillary biological information concerning the variables of interest. Pathway and network maps are one example of a source of such information. However, although ancillary information is increasingly available, it is not always clear how it should be used nor how it should be weighted in relation to primary data. RESULTS: We put forward an approach in which biological knowledge is incorporated using informative prior distributions over variable subsets, with prior information selected and weighted in an automated, objective manner using an empirical Bayes formulation. We employ continuous, linear models with interaction terms and exploit biochemically-motivated sparsity constraints to permit exact inference. We show an example of priors for pathway- and network-based information and illustrate our proposed method on both synthetic response data and by an application to cancer drug response data. Comparisons are also made to alternative Bayesian and frequentist penalised-likelihood methods for incorporating network-based information. CONCLUSIONS: The empirical Bayes method proposed here can aid prior elicitation for Bayesian variable selection studies and help to guard against mis-specification of priors. Empirical Bayes, together with the proposed pathway-based priors, results in an approach with a competitive variable selection performance. In addition, the overall procedure is fast, deterministic, and has very few user-set parameters, yet is capable of capturing interplay between molecular players. The approach presented is general and readily applicable in any setting with multiple sources of biological prior knowledge.


Subject(s)
Computer Simulation; Models, Biological; Neoplasms/metabolism; Antineoplastic Agents/pharmacology; Bayes Theorem; Biomarkers, Pharmacological/metabolism; Humans; Likelihood Functions; Probability; Research Design
11.
Bioinformatics ; 27(7): 994-1000, 2011 Apr 01.
Article in English | MEDLINE | ID: mdl-21317141

ABSTRACT

MOTIVATION: Networks and pathways are important in describing the collective biological function of molecular players such as genes or proteins. In many areas of biology, for example in cancer studies, available data may harbour undiscovered subtypes which differ in terms of network phenotype. That is, samples may be heterogeneous with respect to underlying molecular networks. This motivates a need for unsupervised methods capable of discovering such subtypes and elucidating the corresponding network structures. RESULTS: We exploit recent results in sparse graphical model learning to put forward a 'network clustering' approach in which data are partitioned into subsets that show evidence of underlying, subset-level network structure. This allows us to simultaneously learn subset-specific networks and corresponding subset membership under challenging small-sample conditions. We illustrate this approach on synthetic and proteomic data. AVAILABILITY: go.warwick.ac.uk/sachmukherjee/networkclustering.


Subject(s)
Models, Biological; Proteomics/methods; Cell Line, Tumor; Cluster Analysis; Computational Biology/methods; Computer Graphics; Humans; Neoplasms/metabolism; Signal Transduction
12.
Elife ; 11, 2022 08 31.
Article in English | MEDLINE | ID: mdl-36043458

ABSTRACT

Omics-based technologies are driving major advances in precision medicine, but efforts are still required to consolidate their use in drug discovery. In this work, we exemplify the use of multi-omics to support the development of 3-chloropiperidines, a new class of candidate anticancer agents. Combined analyses of transcriptome and chromatin accessibility elucidated the mechanisms underlying sensitivity to test agents. Furthermore, we implemented a new versatile strategy for the integration of RNA- and ATAC-seq (Assay for Transposase-Accessible Chromatin) data, able to accelerate and extend the standalone analyses of distinct omic layers. This platform guided the construction of a perturbation-informed basal signature that predicts the sensitivity of cancer cell lines and can further direct compound development against specific tumor types. Overall, this approach offers a scalable pipeline to support the early phases of drug discovery, aid the understanding of mechanisms, and potentially inform the positioning of therapeutics in the clinic.


Subject(s)
Chromatin; Transcriptome; Precision Medicine; RNA; Transposases/metabolism
13.
iScience ; 25(11): 105328, 2022 Nov 18.
Article in English | MEDLINE | ID: mdl-36310583

ABSTRACT

Population-scale datasets of healthy individuals capture genetic and environmental factors influencing gene expression. The expression variance of a gene of interest (GOI) can be exploited to set up a quasi loss- or gain-of-function "in population" experiment. We describe here an approach, huva (human variation), taking advantage of population-scale multi-layered data to infer gene function and relationships between phenotypes and expression. Within a reference dataset, huva derives two experimental groups with LOW or HIGH expression of the GOI, enabling the subsequent comparison of their transcriptional profile and functional parameters. We demonstrate that this approach robustly identifies the phenotypic relevance of a GOI allowing the stratification of genes according to biological functions, and we generalize this concept to almost 16,000 genes in the human transcriptome. Additionally, we describe how huva predicts monocytes to be the major cell type in the pathophysiology of STAT1 mutations, evidence validated in a clinical cohort.
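
The grouping step described in the abstract above (deriving LOW and HIGH expression groups for a gene of interest) can be illustrated with a small Python sketch. This is not the huva implementation: the function name, the sample identifiers, and the default fraction of 10% per group are assumptions made here for illustration.

```python
def split_low_high(expression: dict[str, float], fraction: float = 0.1):
    """Split samples into LOW and HIGH groups by expression of one gene.

    expression: maps a sample identifier to the gene's expression value.
    fraction:   share of samples placed in each group (hypothetical default).
    Returns (low_ids, high_ids), each ordered from lowest to highest expression.
    """
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    if len(expression) < 2:
        raise ValueError("need at least two samples to form two groups")
    # Rank samples from lowest to highest expression of the gene of interest.
    ranked = sorted(expression, key=expression.get)
    # Each group gets at least one sample.
    n = max(1, int(len(ranked) * fraction))
    return ranked[:n], ranked[-n:]


if __name__ == "__main__":
    # Hypothetical expression values for one gene across four samples.
    goi = {"s1": 5.0, "s2": 1.0, "s3": 9.0, "s4": 3.0}
    low, high = split_low_high(goi, fraction=0.25)
    print(low, high)
```

The two groups returned here would then be compared on their transcriptional profiles or other measured parameters, which is the "in population" experiment the abstract describes.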

14.
Bioinformatics ; 26(3): 355-62, 2010 Feb 01.
Article in English | MEDLINE | ID: mdl-19996165

ABSTRACT

MOTIVATION: Identifying regulatory modules is an important task in the exploratory analysis of gene expression time series data. Clustering algorithms are often used for this purpose. However, gene regulatory events may induce complex temporal features in a gene expression profile, including time delays, inversions and transient correlations, which are not well accounted for by current clustering methods. As the cost of microarray experiments continues to fall, the temporal resolution of time course studies is increasing. This has led to a need to take account of detailed temporal features of this kind. Thus, while standard clustering methods are both widely used and much studied, their shared shortcomings with respect to such temporal features motivate the work presented here. RESULTS: Here, we introduce a temporal clustering approach for high-dimensional gene expression data which takes account of time delays, inversions and transient correlations. We do so by exploiting a recently introduced, message-passing-based algorithm called Affinity Propagation (AP). We take account of temporal features of interest following an approximate but efficient dynamic programming approach due to Qian et al. The resulting approach is demonstrably effective in its ability to discern non-obvious temporal features, yet efficient and robust enough for routine use as an exploratory tool. We show results on validated transcription factor-target pairs in yeast and on gene expression data from a study of Arabidopsis thaliana under pathogen infection. The latter reveals a number of biologically striking findings. AVAILABILITY: Matlab code for our method is available at http://www.wsbc.warwick.ac.uk/stevenkiddle/tcap.html.


Subject(s)
Arabidopsis/genetics; Computational Biology/methods; Gene Expression Profiling/methods; Transcription Factors/genetics; Cluster Analysis; Databases, Genetic; Transcription, Genetic
15.
Proc Natl Acad Sci U S A ; 105(38): 14313-8, 2008 Sep 23.
Article in English | MEDLINE | ID: mdl-18799736

ABSTRACT

Recent years have seen much interest in the study of systems characterized by multiple interacting components. A class of statistical models called graphical models, in which graphs are used to represent probabilistic relationships between variables, provides a framework for formal inference regarding such systems. In many settings, the object of inference is the network structure itself. This problem of "network inference" is well known to be a challenging one. However, in scientific settings there is very often existing information regarding network connectivity. A natural idea then is to take account of such information during inference. This article addresses the question of incorporating prior information into network inference. We focus on directed models called Bayesian networks, and use Markov chain Monte Carlo to draw samples from posterior distributions over network structures. We introduce prior distributions on graphs capable of capturing information regarding network features including edges, classes of edges, degree distributions, and sparsity. We illustrate our approach in the context of systems biology, applying our methods to network inference in cancer signaling.


Subject(s)
Gene Regulatory Networks; Models, Biological; Neoplasms/metabolism; Signal Transduction; Bayes Theorem; Computer Simulation; ErbB Receptors/metabolism; Markov Chains; Monte Carlo Method; Systems Biology
16.
Bioinformatics ; 25(2): 265-71, 2009 Jan 15.
Article in English | MEDLINE | ID: mdl-19038985

ABSTRACT

MOTIVATION: Combinatorial effects, in which several variables jointly influence an output or response, play an important role in biological systems. In many settings, Boolean functions provide a natural way to describe such influences. However, the biochemical data with which we may wish to characterize such influences are usually subject to much variability. Furthermore, in high-throughput biological settings Boolean relationships of interest are very often sparse, in the sense of being embedded in an overall dataset of higher dimensionality. This motivates a need for statistical methods capable of making inferences regarding Boolean functions under conditions of noise and sparsity. RESULTS: We put forward a statistical model for sparse, noisy Boolean functions and methods for inference under the model. We focus on the case in which the form of the underlying Boolean function, as well as the number and identity of its inputs, are all unknown. We present results on synthetic data and on a study of signalling proteins in cancer biology.


Subject(s)
Computational Biology/methods; Models, Statistical; Neoplasms/metabolism; Algorithms; Proteins/chemistry; Proteins/metabolism
17.
Stat Comput ; 30(3): 697-719, 2020.
Article in English | MEDLINE | ID: mdl-32132772

ABSTRACT

Penalized likelihood approaches are widely used for high-dimensional regression. Although many methods have been proposed and the associated theory is now well developed, the relative efficacy of different approaches in finite-sample settings, as encountered in practice, remains incompletely understood. There is therefore a need for empirical investigations in this area that can offer practical insight and guidance to users. In this paper, we present a large-scale comparison of penalized regression methods. We distinguish between three related goals: prediction, variable selection and variable ranking. Our results span more than 2300 data-generating scenarios, including both synthetic and semisynthetic data (real covariates and simulated responses), allowing us to systematically consider the influence of various factors (sample size, dimensionality, sparsity, signal strength and multicollinearity). We consider several widely used approaches (Lasso, Adaptive Lasso, Elastic Net, Ridge Regression, SCAD, the Dantzig Selector and Stability Selection). We find considerable variation in performance between methods. Our results support a "no panacea" view, with no unambiguous winner across all scenarios or goals, even in this restricted setting where all data align well with the assumptions underlying the methods. The study allows us to make some recommendations as to which approaches may be most (or least) suitable given the goal and some data characteristics. Our empirical results complement existing theory and provide a resource to compare methods across a range of scenarios and metrics.

18.
iScience ; 23(1): 100780, 2020 Jan 24.
Article in English | MEDLINE | ID: mdl-31918046

ABSTRACT

Acute myeloid leukemia (AML) is a severe, mostly fatal hematopoietic malignancy. We were interested in whether transcriptomic-based machine learning could predict AML status without requiring expert input. Using 12,029 samples from 105 different studies, we present a large-scale study of machine learning-based prediction of AML in which we address key questions relating to the combination of machine learning and transcriptomics and their practical use. We find data-driven, high-dimensional approaches (in which multivariate signatures are learned directly from genome-wide data with no prior knowledge) to be accurate and robust. Importantly, these approaches are highly scalable with low marginal cost, essentially matching human expert annotation in a near-automated workflow. Our results support the notion that transcriptomics combined with machine learning could be used as part of an integrated -omics approach wherein risk prediction, differential diagnosis, and subclassification of AML are achieved by genomics while diagnosis could be assisted by transcriptomic-based machine learning.

19.
Methods Mol Biol ; 1883: 25-48, 2019.
Article in English | MEDLINE | ID: mdl-30547395

ABSTRACT

In this chapter, we review the problem of network inference from time-course data, focusing on a class of graphical models known as dynamic Bayesian networks (DBNs). We discuss the relationship of DBNs to models based on ordinary differential equations, and consider extensions to nonlinear time dynamics. We provide an introduction to time-varying DBN models, which allow for changes to the network structure and parameters over time. We also discuss causal perspectives on network inference, including issues around model semantics that can arise due to missing variables. We present a case study of applying time-varying DBNs to gene expression measurements over the life cycle of Drosophila melanogaster. We finish with a discussion of future perspectives, including possible applications of time-varying network inference to single-cell gene expression data.


Subject(s)
Computational Biology/methods; Gene Regulatory Networks; Models, Genetic; Animals; Bayes Theorem; Computational Biology/instrumentation; Computational Biology/trends; Drosophila melanogaster/genetics; Gene Expression Profiling/instrumentation; Gene Expression Profiling/methods; Nonlinear Dynamics; Single-Cell Analysis/instrumentation; Single-Cell Analysis/methods
20.
J Mach Learn Res ; 20: 127, 2019.
Article in English | MEDLINE | ID: mdl-31992961

ABSTRACT

This paper frames causal structure estimation as a machine learning task. The idea is to treat indicators of causal relationships between variables as 'labels' and to exploit available data on the variables of interest to provide features for the labelling task. Background scientific knowledge or any available interventional data provide labels on some causal relationships and the remainder are treated as unlabelled. To illustrate the key ideas, we develop a distance-based approach (based on bivariate histograms) within a manifold regularization framework. We present empirical results on three different biological data sets (including examples where causal effects can be verified by experimental intervention), that together demonstrate the efficacy and general nature of the approach as well as its simplicity from a user's point of view.
