ABSTRACT
Many challenging problems in biomedical research rely on understanding how variables are associated with each other and how they are influenced by genetic and environmental factors. Probabilistic graphical models (PGMs) are widely regarded as a natural, formal language for describing relationships among variables and have been used extensively to study complex diseases and traits. In this work, we propose methods that leverage observational Gaussian family data to learn a decomposition of undirected and directed acyclic PGMs according to the influence of genetic and environmental factors. Many structure learning algorithms rely heavily on conditional independence tests. For independent measurements of normally distributed variables, conditional independence can be tested with standard tests for zero partial correlation. In family data, the assumption of independent measurements does not hold, because measurements on related individuals are correlated, mainly due to genetic factors. Based on univariate polygenic linear mixed models, we propose tests that account for the familial dependence structure and allow us to assess, separately, the significance of the partial correlation due to genetic (between-family) factors and that due to other factors, denoted here as environmental (within-family) factors. We then extend standard structure learning algorithms, including the IC/PC and the really fast causal inference (RFCI) algorithms, to Gaussian family data. The algorithms learn the most likely PGM and its decomposition into two components, one explained by genetic factors and the other by environmental factors. The proposed methods are evaluated in simulation studies and applied to the Genetic Analysis Workshop 13 simulated dataset, which captures important features of the Framingham Heart Study.
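For independent (unrelated) Gaussian measurements, the standard zero-partial-correlation test mentioned above is commonly Fisher's z-test, which is the test PC-style algorithms call at each edge-removal step. The sketch below shows only that i.i.d. baseline, under the assumption of independent samples; it is not the family-data test proposed in this work, which replaces it with tests derived from polygenic linear mixed models. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length variable columns."""
    means = [fmean(col) for col in columns]
    centered = [[x - m for x in col] for col, m in zip(columns, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    p = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]


def partial_corr(R, i, j, S):
    """Partial correlation of variables i and j given the list S (recursive formula)."""
    if not S:
        return R[i][j]
    k, rest = S[0], S[1:]
    r_ij = partial_corr(R, i, j, rest)
    r_ik = partial_corr(R, i, k, rest)
    r_jk = partial_corr(R, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def fisher_z_test(R, n, i, j, S):
    """Two-sided p-value for H0: partial correlation of i and j given S is zero.

    Assumes n independent, jointly Gaussian observations (NOT valid for family data).
    """
    if n - len(S) - 3 <= 0:
        raise ValueError("sample size too small for the conditioning set")
    r = partial_corr(R, i, j, list(S))
    r = max(min(r, 0.999999), -0.999999)   # guard against |r| = 1
    z = 0.5 * math.log((1 + r) / (1 - r))  # Fisher's z-transform (atanh)
    stat = math.sqrt(n - len(S) - 3) * abs(z)
    return 2 * (1 - NormalDist().cdf(stat))


if __name__ == "__main__":
    # Simulated chain X -> Z -> Y with i.i.d. Gaussian noise (unrelated individuals).
    rng = random.Random(1)
    n = 500
    X = [rng.gauss(0, 1) for _ in range(n)]
    Z = [x + rng.gauss(0, 1) for x in X]
    Y = [z + rng.gauss(0, 1) for z in Z]
    R = correlation_matrix([X, Y, Z])
    print("X vs Y:", fisher_z_test(R, n, 0, 1, []))       # expected to be very small
    print("X vs Y | Z:", fisher_z_test(R, n, 0, 1, [2]))  # typically not significant
```

In a PC-type skeleton search, an edge i–j is removed when some conditioning set S yields a p-value above the chosen significance level; the family-data extension described above swaps this test for its between-family and within-family counterparts.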
Subject(s)
Algorithms, Statistical Models, Computer Simulation, Humans, Genetic Models, Theoretical Models, Normal Distribution
ABSTRACT
Given the lack of reliability and reproducibility in omics studies, more careful and robust methods are needed to overcome the existing challenges in multi-omics analysis. In conventional omics data analysis, signal intensity values (denoted by M and A values) are estimated without accounting for pixel-level uncertainties, which may reflect noise and systematic artifacts. For example, intensity values from two-color microarray data are estimated by taking the mean or median of the pixel intensities within each spot and are then subjected to within-slide normalization by LOWESS. Focusing on the estimation and normalization of gene expression profiles, we therefore propose a spot quantification method that takes pixel-level variability into account. In addition, to preserve relevant variation that may be removed by LOWESS normalization with poorly chosen parameters, we propose a parameter selection method that is parsimonious and considers intrinsic characteristics of microarray data, such as heteroskedasticity. The usefulness of the proposed methods is illustrated by an application to real intestinal metaplasia data. Compared with conventional approaches, the analysis is more robust and conservative, identifying fewer but more reliable differentially expressed genes. Preserving variability also allowed us to identify new differentially expressed genes. Using the proposed approach, we identified differentially expressed genes involved in pathways in cancer and confirmed some molecular markers already reported in the literature.
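The conventional pipeline described above (median spot summary, M and A values, LOWESS within-slide normalization) can be sketched as follows. This is a minimal illustration of the baseline, not the proposed pixel-aware quantification or parameter selection method; the LOWESS here is a simplified local linear fit with tricube weights and no robustness iterations, and the span parameter `frac` is a hypothetical fixed choice standing in for the parameter the abstract proposes to select.

```python
import math
from statistics import median


def spot_intensity(pixels):
    """Conventional spot summary: median of pixel intensities (ignores pixel-level variability)."""
    return median(pixels)


def ma_values(red, green):
    """M = log2(R/G) (log-ratio) and A = 0.5 * log2(R*G) (average log-intensity)."""
    M = [math.log2(r / g) for r, g in zip(red, green)]
    A = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return M, A


def lowess_fit(x, y, frac=0.3):
    """Simplified LOWESS: local linear fit with tricube weights, no robustness iterations."""
    n = len(x)
    k = max(2, math.ceil(frac * n))  # neighbours used in each local fit (the span)
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12    # bandwidth: distance to the k-th nearest point
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ym + slope * (x0 - xm))
    return fitted


def lowess_normalize(M, A, frac=0.3):
    """Within-slide normalization: subtract the LOWESS trend of M on A."""
    trend = lowess_fit(A, M, frac)
    return [m - t for m, t in zip(M, trend)]


if __name__ == "__main__":
    print(spot_intensity([120.0, 135.0, 4000.0, 128.0]))  # median is robust to the bright pixel
    # Hypothetical slide with an intensity-dependent dye bias M = 0.5 + 0.1 * A.
    A_true = [6 + 0.2 * i for i in range(50)]
    M_true = [0.5 + 0.1 * a for a in A_true]
    red = [2 ** (a + m / 2) for a, m in zip(A_true, M_true)]
    green = [2 ** (a - m / 2) for a, m in zip(A_true, M_true)]
    M, A = ma_values(red, green)
    M_norm = lowess_normalize(M, A, frac=0.3)
    print(max(abs(m) for m in M_norm))  # trend removed: close to 0
```

A smaller `frac` makes the fitted trend follow the data more closely, which is one way a poorly chosen span can absorb and remove biologically relevant variation along with the dye bias.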
ABSTRACT
BACKGROUND: Blood pressure (BP) is associated with carotid intima-media thickness (CIMT), but few studies have explored the association between BP variability and CIMT. We aimed to investigate this association in the baseline data of the Brazilian Longitudinal Study of Adult Health (ELSA-Brasil). METHODS: We analyzed data from 7,215 participants (56.0% women) without overt cardiovascular disease (CVD) or antihypertensive use. We used 10 BP readings taken in different positions during a 6-hour visit and defined BP variability as the SD of these readings. We performed a 2-step analysis. First, we linearly regressed CIMT values on the main effects and all-order interaction effects of age, sex, body mass index, race, diabetes diagnosis, dyslipidemia diagnosis, family history of premature CVD, smoking status, and ELSA-Brasil site, and calculated the residuals (residual CIMT). Second, we used partial least squares path analysis to investigate whether residual CIMT was associated with BP central tendency and BP variability. RESULTS: Systolic BP (SBP) variability was significantly associated with residual CIMT in models including the entire sample (path coefficient [PC]: 0.046; P < 0.001) and in women (PC: 0.046; P = 0.007), but not in men (PC: 0.037; P = 0.09). The loss of significance in men was probably due to the smaller subsample size, as the PCs did not differ significantly by sex. CONCLUSIONS: We found a small but significant association between SBP variability and CIMT values. This association was additive to that between SBP central tendency and CIMT values, supporting a role for high short-term SBP variability in atherosclerosis.
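The first analysis step (BP variability as the SD of repeated readings, and residual CIMT from a linear regression on main and all-order interaction effects) can be sketched as below. This is an illustration under stated assumptions, not the study's code: the SD is assumed to be the sample SD (n - 1 denominator), covariates are assumed to be numeric columns (categorical variables such as race or site would need dummy coding), and the partial least squares path analysis of step 2 is not shown. Function names and simulated data are hypothetical.

```python
import itertools
import statistics

import numpy as np


def bp_variability(readings):
    """BP variability as the SD of one participant's readings (sample SD assumed)."""
    return statistics.stdev(readings)


def full_interaction_design(X):
    """Intercept, main effects and all-order interactions (products of every column subset)."""
    n, p = X.shape
    cols = [np.ones(n)]
    for order in range(1, p + 1):
        for subset in itertools.combinations(range(p), order):
            cols.append(np.prod(X[:, list(subset)], axis=1))
    return np.column_stack(cols)


def residualize(y, X):
    """Step 1: residuals of y after an OLS fit on the full-interaction design.

    np.linalg.lstsq returns a minimum-norm solution if the design is rank-deficient
    (e.g. products of dummy columns from the same categorical variable are all zero).
    """
    D = full_interaction_design(X)
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    return y - D @ beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 200
    # Hypothetical covariates: age, BMI, sex coded 0/1.
    X = np.column_stack([rng.normal(50, 8, n), rng.normal(27, 4, n), rng.integers(0, 2, n)])
    cimt = 0.5 + 0.01 * X[:, 0] + rng.normal(0, 0.1, n)
    residual_cimt = residualize(cimt, X)
    sbp_sd = [bp_variability(rng.normal(125, 6, 10).tolist()) for _ in range(n)]
    print(residual_cimt[:3], sbp_sd[:3])
```

With p covariate columns the design has 2^p columns (intercept included), so the nine study variables already give 512 numeric columns before any dummy coding, which is why a large sample is needed for this residualization step.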