Results 1 - 20 of 43
1.
Bioinform Biol Insights ; 18: 11779322241271535, 2024.
Article in English | MEDLINE | ID: mdl-39286768

ABSTRACT

Tumor heterogeneity is a challenge to designing effective and targeted therapies. Glioma-type identification depends on specific molecular and histological features, which are defined by the official World Health Organization (WHO) classification of the central nervous system (CNS). These guidelines are constantly updated to support the diagnosis process, which affects all the successive clinical decisions. In this context, the search for new potential diagnostic and prognostic targets, characteristic of each glioma type, is crucial to support the development of novel therapies. Based on The Cancer Genome Atlas (TCGA) glioma RNA-sequencing data set updated according to the 2016 and 2021 WHO guidelines, we proposed a 2-step variable selection approach for biomarker discovery. Our framework encompasses the graphical lasso algorithm to estimate sparse networks of genes carrying diagnostic information. These networks are then used as input for a regularized Cox survival regression model, allowing the identification of a smaller subset of genes with prognostic value. In each step, the results derived from the 2016 and 2021 classes were discussed and compared. For both WHO glioma classifications, our analysis identifies potential biomarkers, characteristic of each glioma type. Yet, better results were obtained for the WHO CNS classification in 2021, thereby supporting recent efforts to include molecular data on glioma classification.

2.
Sci Rep ; 14(1): 18105, 2024 08 05.
Article in English | MEDLINE | ID: mdl-39103384

ABSTRACT

In complex systems, it's crucial to uncover latent mechanisms and their context-dependent relationships. This is especially true in medical research, where identifying unknown cancer mechanisms and their impact on phenomena like drug resistance is vital. Directly observing these mechanisms is challenging due to measurement complexities, leading to an approach that infers latent mechanisms from observed variable distributions. Despite machine learning advancements enabling sophisticated generative models, their black-box nature complicates the interpretation of complex latent mechanisms. A promising method for understanding these mechanisms involves estimating latent factors through linear projection, though there's no assurance that inferences made under specific conditions will remain valid across contexts. We propose a novel solution, suggesting data, even from systems appearing complex, can often be explained by sparse dependencies among a few common latent factors, regardless of the situation. This simplification allows for modeling that yields significant insights across diverse fields. We demonstrate this with datasets from finance, where we capture societal trends from stock price movements, and medicine, where we uncover new insights into cancer drug resistance through gene expression analysis.


Subject(s)
Neoplasms , Humans , Neoplasms/genetics , Neoplasms/metabolism , Machine Learning , Drug Resistance, Neoplasm
3.
bioRxiv ; 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38328080

ABSTRACT

Background: Gene co-expression networks (GCNs) describe relationships among expressed genes key to maintaining cellular identity and homeostasis. However, the sample size of typical RNA-seq experiments, which is several orders of magnitude smaller than the number of genes, is too low to infer GCNs reliably. recount3, a publicly available dataset comprising 316,443 uniformly processed human RNA-seq samples, provides an opportunity to improve power for accurate network reconstruction and obtain biological insight from the resulting networks. Results: We compared alternate aggregation strategies to identify an optimal workflow for GCN inference by data aggregation and inferred three consensus networks: a universal network, a non-cancer network, and a cancer network in addition to 27 tissue context-specific networks. Central network genes from our consensus networks were enriched for evolutionarily constrained genes and ubiquitous biological pathways, whereas central context-specific network genes included tissue-specific transcription factors and factorization based on the hubs led to clustering of related tissue contexts. We discovered that annotations corresponding to context-specific networks inferred from aggregated data were enriched for trait heritability beyond known functional genomic annotations and were significantly more enriched when we aggregated over a larger number of samples. Conclusion: This study outlines best practices for GCN inference and evaluation by data aggregation. We recommend estimating and regressing confounders in each data set before aggregation and prioritizing large sample size studies for GCN reconstruction. Increased statistical power in inferring context-specific networks enabled the derivation of variant annotations that were enriched for concordant trait heritability independent of functional genomic annotations that are context-agnostic. While we observed strictly increasing held-out log-likelihood with data aggregation, we noted diminishing marginal improvements. Future directions aimed at alternate methods for estimating confounders and integrating orthogonal information from modalities such as Hi-C and ChIP-seq can further improve GCN inference.

4.
Multivariate Behav Res ; 59(3): 461-481, 2024.
Article in English | MEDLINE | ID: mdl-38247019

ABSTRACT

Network analysis has gained popularity as an approach to investigate psychological constructs. However, there are currently no guidelines for applied researchers when encountering missing values. In this simulation study, we compared the performance of a two-step EM algorithm with separated steps for missing handling and regularization, a combined direct EM algorithm, and pairwise deletion. We investigated conditions with varying network sizes, numbers of observations, missing data mechanisms, and percentages of missing values. These approaches are evaluated with regard to recovering population networks in terms of loss in the precision matrix, edge set identification and network statistics. The simulation showed adequate performance only in conditions with large samples (n≥500) or small networks (p = 10). Comparing the missing data approaches, the direct EM appears to be more sensitive and superior in nearly all chosen conditions. The two-step EM yields better results when the ratio of n/p is very large - being less sensitive but more specific. Pairwise deletion failed to converge across numerous conditions and yielded inferior results overall. Overall, direct EM is recommended in most cases, as it is able to mitigate the impact of missing data quite well, while modifications to two-step EM could improve its performance.


Subject(s)
Algorithms , Computer Simulation , Humans , Computer Simulation/statistics & numerical data , Data Interpretation, Statistical , Models, Statistical
5.
Neuroimage Clin ; 39: 103488, 2023.
Article in English | MEDLINE | ID: mdl-37660556

ABSTRACT

Notable success has been achieved in the study of neurodegenerative conditions using reduction techniques such as principal component analysis (PCA) and sparse inverse covariance estimation (SICE) in positron emission tomography (PET) data despite their widely differing approaches. In a recent study of SICE applied to metabolic scans from Parkinson's disease (PD) patients, we showed that by using PCA to prespecify disease-related partition layers, we were able to optimize maps of functional metabolic connectivity within the relevant networks. Here, we show the potential of SICE, enhanced by disease-specific subnetwork partitions, to identify key regional hubs and their connections, and track their associations in PD patients with increasing disease duration. This approach enabled the identification of a core zone that included elements of the striatum, pons, cerebellar vermis, and parietal cortex and provided a deeper understanding of progressive changes in their connectivity. This subnetwork constituted a robust invariant disease feature that was unrelated to phenotype. Mean expression levels for this subnetwork increased steadily in a group of 70 PD patients spanning a range of symptom durations between 1 and 21 years. The findings were confirmed in a validation sample of 69 patients with up to 32 years of symptoms. The common core elements represent possible targets for disease modification, while their connections to external regions may be better suited for symptomatic treatment.


Subject(s)
Cerebellar Vermis , Parkinson Disease , Humans , Parkinson Disease/diagnostic imaging , Tomography, X-Ray Computed , Corpus Striatum/diagnostic imaging , Disease Progression
6.
BioData Min ; 16(1): 26, 2023 Sep 26.
Article in English | MEDLINE | ID: mdl-37752578

ABSTRACT

Gliomas are primary malignant brain tumors with poor survival and high resistance to available treatments. Improving the molecular understanding of glioma and disclosing novel biomarkers of tumor development and progression could help to find novel targeted therapies for this type of cancer. Public databases such as The Cancer Genome Atlas (TCGA) provide an invaluable source of molecular information on cancer tissues. Machine learning tools show promise in dealing with the high dimension of omics data and extracting relevant information from it. In this work, network inference and clustering methods, namely Joint Graphical lasso and Robust Sparse K-means Clustering, were applied to RNA-sequencing data from TCGA glioma patients to identify shared and distinct gene networks among different types of glioma (glioblastoma, astrocytoma, and oligodendroglioma) and disclose new patient groups and the relevant genes behind groups' separation. The results obtained suggest that astrocytoma and oligodendroglioma have more similarities compared with glioblastoma, highlighting the molecular differences between glioblastoma and the other glioma subtypes. After a comprehensive literature search on the relevant genes pointed out by our analysis, we identified potential candidates for biomarkers of glioma. Further molecular validation of these genes is encouraged to understand their potential role in diagnosis and in the design of novel therapies.

7.
BMC Genomics ; 24(1): 213, 2023 Apr 25.
Article in English | MEDLINE | ID: mdl-37095447

ABSTRACT

BACKGROUND: Understanding the mechanisms underlying forage production and its biomass nutritive quality at the omics level is crucial for boosting the output of high-quality dry matter per unit of land. Despite the advent of multiple omics integration for the study of biological systems in major crops, investigations on forage species are still scarce. RESULTS: Our results identified substantial changes in gene co-expression and metabolite-metabolite network topologies as a result of genetic perturbation by hybridizing L. perenne with another species within the genus (L. multiflorum) relative to across genera (F. pratensis). However, conserved hub genes and hub metabolomic features were detected between pedigree classes, some of which were highly heritable and displayed one or more significant edges with agronomic traits in a weighted omics-phenotype network. In spite of tagging relevant biological molecules as, for example, the light-induced rice 1 (LIR1), hub features were not necessarily better explanatory variables for omics-assisted prediction than features stochastically sampled and all available regressors. CONCLUSIONS: The utilization of computational techniques for the reconstruction of co-expression networks facilitates the identification of key omic features that serve as central nodes and demonstrate correlation with the manifestation of observed traits. Our results also indicate a robust association between early multi-omic traits measured in a greenhouse setting and phenotypic traits evaluated under field conditions.


Subject(s)
Oryza , Poaceae , Multiomics , Phenotype , Metabolomics
8.
Bioinform Biol Insights ; 17: 11779322231152972, 2023.
Article in English | MEDLINE | ID: mdl-36865982

ABSTRACT

Global genetic networks provide additional information for the analysis of human diseases, beyond the traditional analysis that focuses on single genes or local networks. The Gaussian graphical model (GGM) is widely applied to learn genetic networks because it defines an undirected graph decoding the conditional dependence between genes. Many algorithms based on the GGM have been proposed for learning genetic network structures. Because the number of gene variables is typically far larger than the number of samples collected, and a real genetic network is typically sparse, the graphical lasso implementation of GGM has become a popular tool for inferring the conditional interdependence among genes. However, graphical lasso, although showing good performance in low dimensional data sets, is computationally expensive and inefficient or even unable to work directly on genome-wide gene expression data sets. In this study, the method of Monte Carlo Gaussian graphical model (MCGGM) was proposed to learn global genetic networks of genes. This method uses a Monte Carlo approach to sample subnetworks from genome-wide gene expression data and graphical lasso to learn the structures of the subnetworks. The learned subnetworks are then integrated to approximate a global genetic network. The proposed method was evaluated with a relatively small real data set of RNA-seq expression levels. The results indicate the proposed method shows a strong ability to decode interactions with high conditional dependence among genes. The method was then applied to genome-wide data sets of RNA-seq expression levels. The gene interactions with high interdependence from the estimated global networks show that most of the predicted gene-gene interactions have been reported in the literature as playing important roles in different human cancers. Also, the results validate the ability and reliability of the proposed method to identify high conditional dependences among genes in large-scale data sets.
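
The sample-estimate-integrate idea in the abstract above can be illustrated with a loose sketch. This is not the MCGGM algorithm itself: the function names, the averaging rule used to integrate subnetworks, and the ridge-based stand-in estimator are all assumptions made here for illustration. In the setting the abstract describes, a graphical lasso fit would take the place of the stand-in estimator.

```python
import numpy as np

def ridge_partial_corr(X, ridge=0.1):
    """Hypothetical stand-in for a graphical lasso fit on one subnetwork.

    Returns partial correlations derived from a ridge-regularized
    covariance matrix (rows of X are samples, columns are genes).
    """
    S = np.cov(X, rowvar=False)
    theta = np.linalg.inv(S + ridge * np.eye(S.shape[0]))
    d = np.sqrt(np.diag(theta))
    pcor = -theta / np.outer(d, d)
    np.fill_diagonal(pcor, 0.0)
    return pcor

def monte_carlo_network(X, subset_size, n_draws,
                        estimator=ridge_partial_corr, seed=0):
    """Average subnetwork estimates over random gene subsets.

    Averaging is an assumption; the abstract does not say how the
    learned subnetworks are integrated. subset_size must be >= 2.
    """
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    total = np.zeros((p, p))
    counts = np.zeros((p, p))
    for _ in range(n_draws):
        # Draw a random subset of genes without replacement.
        idx = rng.choice(p, size=subset_size, replace=False)
        sub = estimator(X[:, idx])
        # Accumulate the subnetwork estimate into the global matrix.
        total[np.ix_(idx, idx)] += sub
        counts[np.ix_(idx, idx)] += 1
    # Pairs never sampled together keep a score of zero.
    avg = np.divide(total, counts, out=np.zeros_like(total), where=counts > 0)
    return avg, counts

# Toy usage on random data: 50 samples, 6 "genes".
X_demo = np.random.default_rng(1).normal(size=(50, 6))
network, pair_counts = monte_carlo_network(X_demo, subset_size=3, n_draws=200)
```

The `pair_counts` matrix records how often each gene pair was sampled together, which gives a rough sense of how well supported each averaged edge score is.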

9.
J Appl Stat ; 49(16): 4278-4293, 2022.
Article in English | MEDLINE | ID: mdl-36353301

ABSTRACT

In disease screening, a biomarker combination developed by combining multiple markers tends to have a higher sensitivity than an individual marker. Parametric methods for marker combination rely on the inverse of covariance matrices, which is often a non-trivial problem for high-dimensional data generated by modern high-throughput technologies. Additionally, another common problem in disease diagnosis is the existence of a limit of detection (LOD) for an instrument - that is, when a biomarker's value falls below the limit, it cannot be observed and is assigned an NA value. To handle these two challenges in combining high-dimensional biomarkers in the presence of an LOD, we propose a resample-replace lasso procedure. We first impute the values below LOD and then use the graphical lasso method to estimate the means and precision matrices for the high-dimensional biomarkers. The simulation results show that our method outperforms alternative methods such as substituting LOD values for NA values or removing observations that have NA values. A real case analysis on a protein profiling study of glioblastoma patients on their survival status indicates that the biomarker combination obtained through the proposed method is more accurate in distinguishing between two groups.

10.
Stat Med ; 41(25): 5150-5187, 2022 11 10.
Article in English | MEDLINE | ID: mdl-36161666

ABSTRACT

Gaussian graphical models (GGMs) provide a framework for modeling conditional dependencies in multivariate data. In this tutorial, we provide an overview of GGM theory and a demonstration of various GGM tools in R. The mathematical foundations of GGMs are introduced with the goal of enabling the researcher to draw practical conclusions by interpreting model results. Background literature is presented, emphasizing methods recently developed for high-dimensional applications such as genomics, proteomics, or metabolomics. The application of these methods is illustrated using a publicly available dataset of gene expression profiles from 578 participants with ovarian cancer in The Cancer Genome Atlas. Stand-alone code for the demonstration is available as an RMarkdown file at https://github.com/katehoffshutta/ggmTutorial.


Subject(s)
Genomics , Humans , Normal Distribution
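
As a small illustration of how conditional dependencies are read off a Gaussian graphical model (entry 10 above), the sketch below converts a precision matrix into partial correlations using the standard identity rho_ij = -theta_ij / sqrt(theta_ii * theta_jj). The tutorial's own code is in R; this Python version, its function name, and the toy matrix are assumptions made for illustration only.

```python
import numpy as np

def partial_correlations(precision):
    """Convert a precision (inverse covariance) matrix to partial correlations.

    A zero off-diagonal entry means the two variables are conditionally
    independent given all the others, i.e. no edge in the graph.
    """
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Toy precision matrix for a chain graph 1 - 2 - 3:
# variables 1 and 3 interact only through variable 2.
theta = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])
pcor = partial_correlations(theta)
```

In this toy case the partial correlation between variables 1 and 2 is 0.5, while the entry for variables 1 and 3 is zero, reflecting the missing edge.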
11.
Entropy (Basel) ; 23(12)2021 Dec 02.
Article in English | MEDLINE | ID: mdl-34945929

ABSTRACT

We consider learning an undirected graphical model from sparse data. While several efficient algorithms have been proposed for graphical lasso (GL), the alternating direction method of multipliers (ADMM) is the main approach taken concerning joint graphical lasso (JGL). We propose proximal gradient procedures with and without a backtracking option for the JGL. These procedures are first-order methods and relatively simple, and the subproblems are solved efficiently in closed form. We further show the boundedness for the solution of the JGL problem and the iterates in the algorithms. The numerical results indicate that the proposed algorithms can achieve high accuracy and precision, and their efficiency is competitive with state-of-the-art algorithms.
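
To make the "closed form" remark above concrete, here is a minimal sketch of one proximal gradient step for the ordinary single-graph graphical lasso objective, not the joint (JGL) problem the paper addresses. The function names, the fixed step size, and the choice to leave the diagonal unpenalized are assumptions; no backtracking or positive-definiteness check is included.

```python
import numpy as np

def soft_threshold(a, tau):
    """Elementwise soft-thresholding: the closed-form proximal operator
    of tau * |a| (the l1 penalty)."""
    return np.sign(a) * np.maximum(np.abs(a) - tau, 0.0)

def proximal_gradient_step(theta, S, lam, step):
    """One step for: minimize -logdet(theta) + trace(S @ theta) + lam * ||theta||_1.

    theta: current positive-definite estimate of the precision matrix.
    S: sample covariance matrix.
    """
    # Gradient of the smooth part, -logdet(theta) + trace(S @ theta).
    grad = S - np.linalg.inv(theta)
    candidate = theta - step * grad
    # Proximal step for the l1 penalty, applied to off-diagonal entries.
    new_theta = soft_threshold(candidate, step * lam)
    # Assumption: the diagonal is not penalized.
    np.fill_diagonal(new_theta, np.diag(candidate))
    return new_theta
```

A full solver would repeat this step until convergence and shrink `step` (for example by backtracking) whenever the update fails to stay positive definite.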

12.
BMC Bioinformatics ; 22(1): 498, 2021 Oct 15.
Article in English | MEDLINE | ID: mdl-34654363

ABSTRACT

BACKGROUND: Identifying gene interactions is a topic of great importance in genomics, and approaches based on network models provide a powerful tool for studying these. Assuming a Gaussian graphical model, a gene association network may be estimated from multiomic data based on the non-zero entries of the inverse covariance matrix. Inferring such biological networks is challenging because of the high dimensionality of the problem, making traditional estimators unsuitable. The graphical lasso is constructed for the estimation of sparse inverse covariance matrices in such situations, using $\ell_1$-penalization on the matrix entries. The weighted graphical lasso is an extension in which prior biological information from other sources is integrated into the model. There are however issues with this approach, as it naïvely forces the prior information into the network estimation, even if it is misleading or does not agree with the data at hand. Further, if an associated network based on other data is used as the prior, the method often fails to utilize the information effectively. RESULTS: We propose a novel graphical lasso approach, the tailored graphical lasso, that aims to handle prior information of unknown accuracy more effectively. We provide an R package implementing the method, tailoredGlasso. Applying the method to both simulated and real multiomic data sets, we find that it outperforms the unweighted and weighted graphical lasso in terms of all performance measures we consider. In fact, the graphical lasso and weighted graphical lasso can be considered special cases of the tailored graphical lasso, and a parameter determined by the data measures the usefulness of the prior information. We also find that among a larger set of methods, the tailored graphical lasso is the most suitable for network inference from high-dimensional data with prior information of unknown accuracy. With our method, mRNA data are demonstrated to provide highly useful prior information for protein-protein interaction networks. CONCLUSIONS: The method we introduce utilizes useful prior information more effectively without involving any risk of loss of accuracy should the prior information be misleading.


Subject(s)
Algorithms , Gene Regulatory Networks , Genomics , Normal Distribution , Protein Interaction Maps
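
For reference, the penalized likelihoods behind the graphical lasso and its weighted variant (entry 12 above) are commonly written as follows. The symbols are assumptions not taken from the abstract: $S$ is the sample covariance matrix, $\Theta$ the precision matrix, $\lambda \ge 0$ the penalty level, and $P = (p_{ij})$ a hypothetical nonnegative penalty matrix built from prior information, with smaller $p_{ij}$ for edges the prior supports. The tailored graphical lasso's own transformation of the prior is not reproduced here.

```latex
% Graphical lasso: l1-penalized Gaussian log-likelihood
\hat{\Theta}_{\mathrm{glasso}}
  = \arg\max_{\Theta \succ 0}
    \left\{ \log\det\Theta - \operatorname{tr}(S\Theta)
            - \lambda \sum_{i \neq j} |\theta_{ij}| \right\}

% Weighted graphical lasso: entry-specific penalties from a prior
\hat{\Theta}_{\mathrm{wglasso}}
  = \arg\max_{\Theta \succ 0}
    \left\{ \log\det\Theta - \operatorname{tr}(S\Theta)
            - \lambda \sum_{i \neq j} p_{ij}\,|\theta_{ij}| \right\}
```

Setting every $p_{ij} = 1$ recovers the unweighted graphical lasso, consistent with the abstract's remark that it is a special case. Some formulations also penalize the diagonal entries.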
13.
Biostat Epidemiol ; 5(2): 189-206, 2021.
Article in English | MEDLINE | ID: mdl-35415380

ABSTRACT

This manuscript estimates the area under the receiver operating characteristic curve (AUC) of combined biomarkers in a high-dimensional setting. We propose a penalization approach to the inference of precision matrices in the presence of the limit of detection. A new version of the expectation-maximization algorithm is then proposed for the penalized likelihood, with the use of numerical integration and the graphical lasso method. The estimated precision matrix is then applied to the inference of AUCs. The proposed method outperforms the existing methods in numerical studies. We apply the proposed method to a data set from a brain tumor study. The results show a higher accuracy in the estimation of AUC compared with the existing methods.

14.
Front Genet ; 12: 760299, 2021.
Article in English | MEDLINE | ID: mdl-35154240

ABSTRACT

Biological networks are often inferred through Gaussian graphical models (GGMs) using gene or protein expression data only. GGMs identify conditional dependence by estimating a precision matrix between genes or proteins. However, conventional GGM approaches often ignore prior knowledge about protein-protein interactions (PPI). Recently, several groups have extended GGM to weighted graphical Lasso (wGlasso) and network-based gene set analysis (Netgsa) and have demonstrated the advantages of incorporating PPI information. However, these methods are either computationally intractable for large-scale data, or disregard weights in the PPI networks. To address these shortcomings, we extended the Netgsa approach and developed an augmented high-dimensional graphical Lasso (AhGlasso) method to incorporate edge weights in known PPI with omics data for global network learning. This new method outperforms weighted graphical Lasso-based algorithms with respect to computational time in simulated large-scale data settings while achieving better or comparable prediction accuracy of node connections. The total runtime of AhGlasso is approximately five times faster than weighted Glasso methods when the graph size ranges from 1,000 to 3,000 with a fixed sample size (n = 300). The runtime difference between AhGlasso and weighted Glasso increases when the graph size increases. Using proteomic data from a study on chronic obstructive pulmonary disease, we demonstrate that AhGlasso improves protein network inference compared to the Netgsa approach by incorporating PPI information.

15.
Biometrics ; 77(4): 1385-1396, 2021 12.
Article in English | MEDLINE | ID: mdl-32865813

ABSTRACT

We consider a novel problem, bi-level graphical modeling, in which multiple individual graphical models can be considered as variants of a common group-level graphical model and inference of both the group- and individual-level graphical models is of interest. Such a problem arises from many applications, including multi-subject neuro-imaging and genomics data analysis. We propose a novel and efficient statistical method, the random covariance model, to learn the group- and individual-level graphical models simultaneously. The proposed method can be nicely interpreted as a random covariance model that mimics the random effects model for mean structures in linear regression. It accounts for similarity between individual graphical models, identifies group-level connections that are shared by individuals, and simultaneously infers multiple individual-level networks. Compared to existing multiple graphical modeling methods that only focus on individual-level graphical modeling, our model learns the group-level structure underlying the multiple individual graphical models and enjoys computational efficiency that is particularly attractive for practical use. We further define a measure of degrees-of-freedom for the complexity of the model useful for model selection. We demonstrate the asymptotic properties of our method and show its finite-sample performance through simulation studies. Finally, we apply the method to our motivating clinical data, a multi-subject resting-state functional magnetic resonance imaging dataset collected from participants diagnosed with schizophrenia, identifying both individual- and group-level graphical models of functional connectivity.


Subject(s)
Connectome , Schizophrenia , Brain/diagnostic imaging , Computer Simulation , Humans , Magnetic Resonance Imaging/methods , Schizophrenia/diagnostic imaging
16.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32578841

ABSTRACT

The rapid accumulation of single-cell chromatin accessibility data offers a unique opportunity to investigate common and specific regulatory mechanisms across different cell types. However, existing methods for cis-regulatory network reconstruction using single-cell chromatin accessibility data were only designed for cells belonging to one cell type, and the resulting networks may not be directly comparable because cell numbers differ across cell types. Here, we adopt a computational method to jointly reconstruct cis-regulatory interaction maps (JRIM) of multiple cell populations based on patterns of co-accessibility in single-cell data. We applied JRIM to explore common and specific regulatory interactions across multiple tissues from a single-cell ATAC-seq dataset containing ~80 000 cells across 13 mouse tissues. Reconstructed common interactions among 13 tissues indeed relate to basic biological functions, and individual cis-regulatory networks show strong tissue specificity and functional relevance. More importantly, tissue-specific regulatory interactions are mediated by coordination of histone modifications and tissue-related TFs, and many of them may reveal novel regulatory mechanisms.


Subject(s)
Chromatin/genetics , Databases, Nucleic Acid , Gene Regulatory Networks , Sequence Analysis, DNA , Single-Cell Analysis , Transcription Factors/genetics , Animals , Mice , Organ Specificity , Transcription Factors/metabolism
17.
IEEE Trans Netw Sci Eng ; 8(4): 3019-3033, 2021.
Article in English | MEDLINE | ID: mdl-35224127

ABSTRACT

Graph matching consists of aligning the vertices of two unlabeled graphs in order to maximize the shared structure across networks; when the graphs are unipartite, this is commonly formulated as minimizing their edge disagreements. In this paper we address the common setting in which one of the graphs to match is a bipartite network and one is unipartite. Commonly, the bipartite networks are collapsed or projected into a unipartite graph, and graph matching proceeds as in the classical setting. This potentially leads to noisy edge estimates and loss of information. We formulate the graph matching problem between a bipartite and a unipartite graph using an undirected graphical model, and introduce methods to find the alignment with this model without collapsing. We theoretically demonstrate that our methodology is consistent, and provide non-asymptotic conditions that ensure exact recovery of the matching solution. In simulations and real data examples, we show how our methods can result in a more accurate matching than the naive approach of transforming the bipartite networks into unipartite, and we demonstrate the performance gains achieved by our method in simulated and real data networks, including a co-authorship-citation network pair, and brain structural and functional data.

18.
Neuroimage ; 226: 117568, 2021 02 01.
Article in English | MEDLINE | ID: mdl-33246128

ABSTRACT

In neurodegenerative disorders, a clearer understanding of the underlying aberrant networks facilitates the search for effective therapeutic targets and potential cures. [18F]-fluorodeoxyglucose (FDG) positron emission tomography (PET) imaging data of brain metabolism reflects the distribution of glucose consumption known to be directly related to neural activity. In FDG PET resting-state metabolic data, characteristic disease-related patterns have been identified in group analysis of various neurodegenerative conditions using principal component analysis of multivariate spatial covariance. Notably, among several parkinsonian syndromes, the identified Parkinson's disease-related pattern (PDRP) has been repeatedly validated as an imaging biomarker of PD in independent groups worldwide. Although the primary nodal associations of this network are known, its connectivity is not fully understood. Here, we describe a novel approach to elucidate functional principal component (PC) network connections by performing graph theoretical sparse network derivation directly within the disease relevant PC partition layer of the whole brain data rather than by searching for associations retrospectively in whole brain sparse representations. Using sparse inverse covariance estimation of each overlapping PC partition layer separately, a single coherent network is detected for each layer in contrast to more spatially modular segmentation in whole brain data analysis. Using this approach, the major nodal hubs of the PD disease network are identified and their characteristic functional pathways are clearly distinguished within the basal ganglia, midbrain and parietal areas. Network associations are further clarified using Laplacian spectral analysis of the adjacency matrices. In addition, the innate discriminative capacity of the eigenvector centrality of the graph derived networks in differentiating PD versus healthy external data provides evidence of their validity.


Subject(s)
Brain/diagnostic imaging , Parkinson Disease/diagnostic imaging , Adult , Aged , Brain/metabolism , Case-Control Studies , Female , Fluorodeoxyglucose F18 , Functional Neuroimaging , Humans , Image Processing, Computer-Assisted , Male , Middle Aged , Neural Pathways/diagnostic imaging , Neural Pathways/metabolism , Parkinson Disease/metabolism , Positron-Emission Tomography , Principal Component Analysis , Radiopharmaceuticals
19.
Genet Epidemiol ; 44(5): 408-424, 2020 07.
Article in English | MEDLINE | ID: mdl-32342572

ABSTRACT

Mediation analysis attempts to determine whether the relationship between an independent variable (e.g., exposure) and an outcome variable can be explained, at least partially, by an intermediate variable, called a mediator. Most methods for mediation analysis focus on one mediator at a time, although multiple mediators can be jointly analyzed by structural equation models (SEMs) that account for correlations among the mediators. We extend the use of SEMs for the analysis of multiple mediators by creating a sparse group lasso penalized model such that the penalty considers the natural groupings of parameters that determine mediation, as well as encourages sparseness of the model parameters. This provides a way to simultaneously evaluate many mediators and select those that have the most impact, a feature of modern penalized models. Simulations are used to illustrate the benefits and limitations of our approach, and application to a study of DNA methylation and reactive cortisol stress following childhood trauma discovered two novel methylation loci that mediate the association of childhood trauma scores with reactive cortisol stress levels. Our new methods are incorporated into R software called regmed.


Subject(s)
DNA Methylation , Models, Genetic , Models, Statistical , Software , Child , Computational Biology , Computer Simulation , Humans , Hydrocortisone/metabolism , Wounds and Injuries/metabolism
20.
Biostatistics ; 21(2): e1-e16, 2020 04 01.
Article in English | MEDLINE | ID: mdl-30203001

ABSTRACT

Graphical lasso is one of the most used estimators for inferring genetic networks. Despite its diffusion, there are several fields in applied research where the limits of detection of modern measurement technologies make the use of this estimator theoretically unfounded, even when the assumption of a multivariate Gaussian distribution is satisfied. Typical examples are data generated by polymerase chain reactions and flow cytometers. The combination of censoring and high-dimensionality makes inference of the underlying genetic networks from these data very challenging. In this article, we propose an $\ell_1$-penalized Gaussian graphical model for censored data and derive two EM-like algorithms for inference. We evaluate the computational efficiency of the proposed algorithms by an extensive simulation study and show that, when censored data are available, our proposal is superior to existing competitors both in terms of network recovery and parameter estimation. We apply the proposed method to gene expression data generated by microfluidic Reverse Transcription quantitative Polymerase Chain Reaction technology in order to make inference on the regulatory mechanisms of blood development. A software implementation of our method is available on GitHub (https://github.com/LuigiAugugliaro/cglasso).


Subject(s)
Algorithms , Gene Regulatory Networks , Normal Distribution , Computer Simulation , Humans , Reverse Transcriptase Polymerase Chain Reaction